Google BigQuery Connectors > Part III: Data Integration with Google BigQuery Connector > Google BigQuery pushdown optimization > Pushdown optimization using a Google BigQuery V2 connection
  

Pushdown optimization using a Google BigQuery V2 connection

You can configure pushdown optimization for a mapping that contains a Google BigQuery V2 connection. Pushdown optimization enhances the mapping performance. You can configure full pushdown when you read data from an Google Cloud Storage source and write to an Google BigQuery target.

Read from and write to Google BigQuery

You can configure pushdown optimization in a mapping to read from and write to Google BigQuery using a Google BigQuery V2 connection.

Example

You work in a motorbike retail company with more than 30,000 dealerships and 2000 inspection centers globally. The company stores millions of records in Google BigQuery hosted on GCP. You want to use Data Integration to perform some transformations on the data before you write back to Google BigQuery.
Use a Google BigQuery V2 connection in the mapping to read from the Google BigQuery source and write the processed data to the Google BigQuery target. Configure full pushdown optimization in the mapping to enhance the performance.

Read from Google Cloud Storage and write to Google BigQuery

You can configure pushdown optimization for a mapping that uses a Google Cloud Storage connection in the Source transformation to read from Google Cloud Storage and a Google BigQuery V2 connection in the Target transformation to write to Google BigQuery.

Example

You work for a rapidly growing data science organization. Your organization develops software products to analyze financials, building financial graphs connecting people profiles, companies, jobs, advertisers, and publishers. The organization uses infrastructure based on Google Cloud Platform and stores its data in Google Cloud Storage files. The organization plans to implement a business intelligence service to build visualization and perform real-time analysis. You can load data from Google Cloud Storage to Google BigQuery by configuring the transformations to support the adequate data warehouse model and the consuming requirements.
Create an Google Cloud Storage V2 connection to read data form the Google Cloud Storage source. Create an Google BigQuery V2 connection and use pushdown optimization to write data to the Google BigQuery target to enhance the performance and reduce the cost involved.

Configuring pushdown optimization for a Google BigQuery mapping task

Perform the following steps to configure pushdown optimization for an Google BigQuery V2 mapping task:
    1. Create an Google Cloud Storage V2 connection and an Google BigQuery V2 connection.
    2. Create a mapping to read data from an Google Cloud Storage source and write data to an Google BigQuery target.
    3. Create a mapping task.
    1. a. Select the configured mapping.
    2. b. In the Pushdown Optimization section on the Schedule tab, set the pushdown optimization value to Full.
    3. c. Save the task and click Finish.
When you run the mapping task, the transformation logic is pushed to the Google BigQuery database.

Pushdown compatibility

You can configure the task to push transformations, variables, functions, and operators to the database.
When you use pushdown optimization, the Secure Agent converts the expression in the transformation by determining equivalent operators, variables, and functions in the database. If there is no equivalent operator, variable, and function, the Secure Agent processes the transformation logic.

Supported functions for Google BigQuery V2 mappings

The following table summarizes the availability of pushdown functions in an Google BigQuery database. Columns marked with an X indicate that the function can be pushed to the Google BigQuery database by using full pushdown optimization. Columns marked with a dash (-) symbol indicate that the function cannot be pushed to the database.
Function
Pushdown
Function
Pushdown
Function
Pushdown
ABORT()
-
INITCAP()
-
REG_MATCH()
-
ABS()
X
INSTR()
X
REG_REPLACE
-
ADD_TO_DATE()
-
IS_DATE()
X
REPLACECHR()
-
AES_DECRYPT()
-
IS_NUMBER()
-
REPLACESTR()
-
AES_ENCRYPT()
-
IS_SPACES()
-
REVERSE()
-
ASCII()
-
ISNULL()
X
ROUND(DATE)
-
AVG()
x
LAST()
-
ROUND(NUMBER)
-
CEIL()
X
LAST_DAY()
-
RPAD()
-
CHOOSE()
-
LEAST()
-
RTRIM()
X
CHR()
X
LENGTH()
X
SET_DATE_PART()
-
CHRCODE()
-
LN()
X
SIGN()
X
COMPRESS()
-
LOG()
X
SIN()
X
CONCAT()
X
LOOKUP
-
SINH()
X
COS()
X
LOWER()
X
SOUNDEX()
-
COSH()
X
LPAD()
-
SQRT()
X
COUNT()
X
LTRIM()
X
STDDEV()
X
CRC32()
-
MAKE_DATE_TIME()
-
SUBSTR()
-
CUME()
-
MAX()
X
SUM()
X
DATE_COMPARE()
-
MD5()
-
SYSTIMESTAMP()
X
DATE_DIFF()
-
MEDIAN()
-
TAN()
X
DECODE()
-
METAPHONE()
-
TANH()
-
DECODE_BASE64()
-
MIN()
X
TO_BIGINT
-
DECOMPRESS()
-
MOD()
X
TO_CHAR(DATE)
X
ENCODE_BASE64()
-
MOVINGAVG()
-
TO_CHAR(NUMBER)
X
EXP()
-
MOVINGSUM()
-
TO_DATE()
X
FIRST()
-
NPER()
-
TO_DECIMAL()
X
FLOOR()
X
PERCENTILE()
-
TO_FLOAT()
X
FV()
-
PMT()
-
TO_INTEGER()
X
GET_DATE_PART()
-
POWER()
X
TRUNC(DATE)
-
GREATEST()
-
PV()
-
TRUNC(NUMBER)
X
IIF()
-
RAND()
-
UPPER()
X
IN()
-
RATE()
-
VARIANCE()
X
INDEXOF()
-
REG_EXTRACT()
-

Supported operators for Google BigQuery V2 mappings

When you use pushdown optimization, the Secure Agent converts the expression in the transformation by determining equivalent operators in the database. If there is no equivalent operator, the Secure Agent processes the transformation logic.
The following table lists the pushdown operators that can be used in an Google BigQuery database. Columns marked with an X indicate that the operator can be pushed to the Google BigQuery database by using full pushdown optimization.
Operator
Pushdown
Operator
Pushdown
+
X
=
X
-
X
>=
X
*
X
<=
X
/
X
!=
X
%
X
AND
X
||
X
OR
X
>
X
NOT
X
<
X

Supported transformations for Google BigQuery V2 mappings

When you configure pushdown optimization, the Secure Agent tries to push the configured transformation to Google BigQuery.
The following table summarizes the availability of transformations that you can push down to Google BigQuery. Columns marked with an X indicate that the transformation can be pushed to Google BigQuery by using full pushdown optimization. Columns marked with a dash (-) symbol indicate that the transformation cannot be pushed to the database.
Transformations
Pushdown
Guidelines
Aggregator¹
X
You can perform the following aggregate calculations:
  • - AVG
  • - COUNT
  • - MAX
  • - MIN
  • - SUM
  • - STDDEV
  • - VARIANCE
When you configure an Aggregator transformation, you must use each of the incoming ports either in an aggregate function or in a group by field to define how to group data for aggregate expressions.
Expression
X
Filter
X
Joiner¹
X
Lookup¹
X
Both connected and unconnected lookups are applicable.
You can add the following lookups:
  • - Cached
  • - Uncached
  • - Unconnected with cached
When you configure a cached lookup, select the Multiple Matches property value as Report all rows in the lookup properties for pushdown optimization to work.
When you configure an unconnected Lookup transformation, consider the following rules:
  • - You must select the Multiple Matches property value as Report error in the unconnected lookup properties for pushdown optimization to work.
  • - You can only configure an Expression transformation for an output received from an unconnected lookup.
Sorter¹
X
Union¹
X
Router¹
X
Update Strategy¹
-
Sequence Generator
-
¹Applicable only for Google BigQuery sources.

Supported variables for Google BigQuery V2 mappings

When you use pushdown optimization, the Secure Agent converts the expression in the transformation by determining equivalent variables in the database. If there is no equivalent variable, the Secure Agent processes the transformation logic.
The following table lists the pushdown operators that can be used in an Google BigQuery database. Columns marked with an X indicate that the operator can be pushed to the Google BigQuery database by using full pushdown optimization.
Variable
Pushdown
SESSSTARTTIME
X
SYSDATE
-
WORKFLOWSTARTTIME
-

Supported data types for Google BigQuery V2 mappings

The following table lists the Google Cloud Storage data types based on the file format type that can be pushed to the Google BigQuery database:
File Format Type
Google Cloud Storage Data Type
Delimited
  • - BigInt
  • - Number
  • - String
Avro
  • - Binary
  • - Byte
  • - Double
  • - Float
  • - Int
  • - Long
  • - String
Parquet
  • - Binary
  • - Date
  • - Decimal
  • - Double
  • - Float
  • - Int32
  • - Int64
  • - Int96
  • - String
JSON
  • - Double
  • - Int
  • - Long
  • - String
The following table lists the Google BigQuery data types that can be used for pushdown optimization:
Google BigQuery Data Type
Transformation Data Type
Boolean
String
Date
Date/Time
DateTime
Date/Time
Float
Double
Integer
BigInt
Numeric
Decimal
Default precision 28, scale 10
Note: Though the Fields tab shows the scale as 10, Google BigQuery supports scale upto 9.
String
String
Byte
Byte
Time
Date/Time
Timestamp
Date/Time

Supported features for Google BigQuery V2 mappings

You must configure a Google BigQuery V2 connection with simple or hybrid mode when you enable pushdown optimization in a mapping task.
Note: If you configure a Google BigQuery V2 connection with complex mode, the Secure Agent logs an pushdown optimization validation error in the session logs file and the mappings run in the Informatica runtime environment without full pushdown.
When you configure pushdown optimization, the mappings support the following advance properties for a Google BigQuery V2 source:
When you configure pushdown optimization, the mappings support the following advance properties for a Google BigQuery V2 lookup:
When you configure pushdown optimization, the mappings support the following properties for an Google BigQuery V2 target:
Note: If you configure target advanced properties that are not supported, the Secure Agent either ignores the properties or logs an pushdown optimization validation error in the session logs file. If the Secure Agent logs an error in the session log, the mappings run in the Informatica runtime environment without full pushdown.

Supported features for Google Cloud Storage V2 source

When you configure pushdown optimization, the Google Cloud Storage V2 connection supports the following properties:
When you configure pushdown optimization, the mappings support the following properties for a Google Cloud Storage V2 source:

Configuring a custom query or an SQL override for the Google BigQuery source object

You can push down a custom query or an SQL override to Google BigQuery.
Before you run a task that contains a custom query as the source object or you configure an SQL override, you must set the Create Temporary View session property in the mapping task properties.
Note: If you do not set the Create Temporary View property, the mapping runs without pushdown optimization.
Perform the following task to set the property:
  1. 1. In the mapping task, navigate to the Pushdown Optimization section on the Schedule tab.
  2. 2. Select Create Temporary View.
  3. 3. Click Finish.

Optimizing full pushdown for multiple targets

When you configure a mapping to write to multiple Google BigQuery targets, you can further optimize the write operation when you configure full pushdown optimization.
To optimize, you can choose to configure an insert, update, upsert, delete, or data driven operation for multiple targets individually.
You can select the same Google BigQuery target table in multiple Target transformations and perform different operations for each of the Target transformations to run independent of each other.

Rules and guidelines for pushdown optimization

Use the following rules and guidelines when pushing functions to an Google BigQuery database:
Sources and targets
When you configure a Google BigQuery source or target, adhere to the following guidelines:
Lookup transformation
When you configure a Lookup transformation based on a Google BigQuery source, adhere to the following guidelines:
Functions
When you push functions to Google BigQuery, adhere to the following guidelines: