
Google BigQuery V2 targets in mappings and elastic mappings

To write data to a Google BigQuery target, configure a Google BigQuery object as the target in a Target transformation in a mapping or elastic mapping.
Specify the name and description of the Google BigQuery target. Configure the target properties and advanced properties for the target object in mappings and elastic mappings.
The following table describes the target properties that you can configure for a Google BigQuery target:
Property
Description
Connection
Name of the Google BigQuery V2 connection. Select a target connection, or click New Parameter to define a new parameter for the target connection.
Target Type
Type of the Google BigQuery target object.
You can write data to a single Google BigQuery target object or parameterize the object. You cannot write data to multiple objects.
Parameter²
Select an existing parameter for the target object or click New Parameter to define a new parameter for the target object. The Parameter property appears only if you select Parameter as the target type.
Object
Name of the Google BigQuery target object based on the target type selected.
Create New at Runtime
Creates a target at runtime.
Enter a name and a path for the target object, and select the source fields that you want to use. By default, all source fields are used.
You must specify a valid dataset ID for the Path attribute.
The target name can contain alphanumeric characters. You cannot use special characters in the target name except the underscore character (_).
You cannot parameterize the target at runtime.
Operation
You can select one of the following operations:
  • - Insert
  • - Update
  • - Upsert (Update or Insert)
  • - Delete
  • - Data Driven
Note: If you use complex connection mode, you cannot configure update, upsert, and delete operations.
Data Driven Condition¹
Flags rows for an insert, update, delete, or reject operation based on the data driven expression you specify.
You must specify the data driven condition for non-CDC sources. For CDC sources, you must leave the field empty as the rows in the CDC source tables are already marked with the operation types.
Note: Appears only when you select Data Driven as the operation type.
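For example, the following condition flags rows for delete when the hypothetical STATUS source field is D, flags rows for update when STATUS is U, and flags all other rows for insert:
IIF(STATUS = 'D', DD_DELETE, IIF(STATUS = 'U', DD_UPDATE, DD_INSERT))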
Update Columns
Specifies the temporary primary key columns to update, upsert, or delete target data. If the Google BigQuery target does not include a primary key column, and the mapping performs an update, upsert, or delete task operation, click Add to add a temporary key.
You can select multiple columns. By default, no columns are specified.
¹Applies only to mappings. The remaining properties are applicable for both mappings and elastic mappings.
²Does not apply when you perform a data driven operation.
The following table describes the advanced properties that you can configure for a Google BigQuery target:
Property
Description
Target Dataset ID
Optional. Overrides the Google BigQuery dataset name that you specified in the connection.
Update Mode
Determines the mode that the Secure Agent uses to update rows in the Google BigQuery target.
You can select one of the following modes:
  • - Update As Update. The Secure Agent updates all rows flagged for update if the entries exist.
  • - Update Else Insert. The Secure Agent first updates all rows flagged for update if the entries exist in the target. If the entries do not exist, the Secure Agent inserts the entries.
Default is Update As Update.
Not applicable when you perform a data driven operation.
Enable Data Driven¹
Implements data driven operation to honor flagged rows for an insert, update, delete, or reject operation based on the data driven condition.
Select this option when you select Data Driven as the target operation.
Enable Merge
Implements the Merge query to perform an update, upsert, delete, or data driven operation on a Google BigQuery target table.
If you select the Enable Data Driven property, you must select this option.
Default is not selected.
Update Override¹
Optional. Overrides the update SQL statement that the Secure Agent generates to update the Google BigQuery target.
Use the following format to define an update override query:
UPDATE `<project_name>.<dataset_name>.<table_name>` as <alias_name>
SET <alias_name>.<col_name1>=:<temp_table>.<col_name1>, <alias_name>.<col_name2>=:<temp_table>.<col_name2>
FROM <dataset_name>.:<temp_table>
WHERE <conditional expression>
For example,
UPDATE `project1.custdataset.cust_table1` as ab
SET ab.fld_str=:custtemp.fld_str, ab.fld_int=:custtemp.fld_int
FROM custdataset.:custtemp
WHERE ab.fld_string_req = :custtemp.fld_string_req
Not applicable when you perform a data driven operation.
Target Table Name
Optional. Overrides the Google BigQuery target table name that you specified in the Target transformation.
Note: If you specify an update override query, Google BigQuery V2 Connector ignores this property.
Create Disposition
Specifies whether Google BigQuery V2 Connector must create the target table if it does not exist.
You can select one of the following values:
  • - Create if needed. If the table does not exist, Google BigQuery V2 Connector creates the table.
  • - Create never. If the table does not exist, Google BigQuery V2 Connector does not create the table and displays an error message.
Create disposition is applicable only when you perform an insert operation on a Google BigQuery target.
Write Disposition
Specifies how Google BigQuery V2 Connector must write data in bulk mode if the target table already exists.
You can select one of the following values:
  • - Write append. If the target table exists, Google BigQuery V2 Connector appends the data to the existing data in the table.
  • - Write truncate. If the target table exists, Google BigQuery V2 Connector overwrites the existing data in the table.
  • - Write empty. If the target table exists and contains data, Google BigQuery V2 Connector displays an error and does not write the data to the target. Google BigQuery V2 Connector writes the data to the target only if the target table does not contain any data.
Write disposition is applicable for bulk mode.
Write disposition is applicable only when you perform an insert operation on a Google BigQuery target.
Write Mode
Specifies the mode to write data to the Google BigQuery target.
You can select one of the following modes:
  • - Bulk. Google BigQuery V2 Connector first writes the data to a staging file in Google Cloud Storage. When the staging file contains all the data, Google BigQuery V2 Connector loads the data from the staging file to the BigQuery target. Google BigQuery V2 Connector then deletes the staging file unless you configure the task to persist the staging file.
  • - Streaming¹. Google BigQuery V2 Connector directly writes data to the BigQuery target. Google BigQuery V2 Connector writes the data into the target row by row.
  • - CDC¹. Applies only when you capture changed data from a CDC source. In CDC mode, Google BigQuery V2 Connector captures changed data from any CDC source and writes the changed data to a Google BigQuery target table.
Default is Bulk mode.
Streaming mode is not applicable when you perform a data driven operation.
Streaming Template Table Suffix¹
Specify the suffix to add to the individual target tables that Google BigQuery V2 Connector creates based on the template target table.
This property applies to streaming mode.
If you select the Enable Merge option, Google BigQuery V2 Connector ignores this property.
Streaming mode is not applicable when you perform a data driven operation.
Rows per Streaming Request¹
Specifies the number of rows that Google BigQuery V2 Connector streams to the BigQuery target for each request.
Default is 500 rows.
The maximum row size that Google BigQuery V2 Connector can stream to the Google BigQuery target for each request is 10 MB.
This property applies to streaming mode.
Streaming mode is not applicable when you perform a data driven operation.
Staging file name
Name of the staging file that Google BigQuery V2 Connector creates in Google Cloud Storage before it loads the data to the Google BigQuery target.
This property applies to bulk mode.
Data Format of the staging file
Specifies the data format of the staging file. You can select one of the following data formats:
  • - Avro¹
  • - JSON (Newline Delimited). Supports flat and record data with nested and repeated fields.
  • - Parquet¹
  • - CSV¹. Supports flat data.
Note: In a .csv file, columns of the Timestamp data type are represented as floating point numbers that cause the milliseconds value to differ.
Only JSON format is applicable for elastic mappings.
This property applies to bulk and CDC mode.
Avro and Parquet formats are not applicable when you perform a data driven operation.
Persist Staging File After Loading
Indicates whether Google BigQuery V2 Connector must persist the staging file in Google Cloud Storage after it writes the data to the Google BigQuery target. You can persist the staging file if you want to archive the data for future reference.
By default, Google BigQuery V2 Connector deletes the staging file in Google Cloud Storage.
This property applies to bulk mode.
Enable Staging File Compression¹
Select this option to compress the staging file before Google BigQuery V2 Connector writes the data to Google Cloud Storage, and to decompress the staging file before it loads the data to the Google BigQuery target.
You can enable staging file compression to reduce cost and transfer time.
Job Poll Interval in Seconds¹
The number of seconds after which Google BigQuery V2 Connector polls the status of the write job operation.
Default is 10.
Number of Threads for Uploading Staging file¹
The number of files that Google BigQuery V2 Connector must create to upload the staging file in bulk mode.
Local Stage File Directory¹
Specifies the directory on your local machine where Google BigQuery V2 Connector stores the files temporarily before writing the data to the staging file in Google Cloud Storage.
This property applies to bulk mode.
Note: This property is not applicable when you use a serverless runtime environment.
Allow Quoted Newlines¹
Indicates whether Google BigQuery V2 Connector must allow quoted data sections that contain newline characters in a .csv file.
Field Delimiter¹
Specifies the field delimiter that Google BigQuery V2 Connector uses to separate fields in a .csv file.
Quote Character¹
Specifies the quote character to skip when you write data to Google BigQuery. When you write data to Google BigQuery and the source table contains the specified quote character, the task fails. Change the quote character value to a value that does not exist in the source table.
Default is double quotes.
Allow Jagged Rows¹
Indicates whether Google BigQuery V2 Connector must accept the rows without trailing columns in a .csv file.
Pre SQL¹
SQL statement that you want to run before writing data to the target.
For example, if you want to select records from the database before you write the records into the table, specify the following pre-SQL statement:
SELECT * FROM `api-project-80697026669.EMPLOYEE.RegionNation` LIMIT 1000
Pre SQL Configuration¹
Specify a pre-SQL configuration.
For example,
DestinationTable:PRESQL_TGT2,DestinationDataset:EMPLOYEE,
FlattenResults:False,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
Post SQL¹
SQL statement that you want to run after writing the data into the target.
For example, if you want to update records in a table after you write the records into the target table, specify the following post-SQL statement:
UPDATE [api-project-80697026669.EMPLOYEE.PERSONS_TGT_DEL]
SET phoneNumber.number =1000011, phoneNumber.areaCode=100 where fullname='John Doe'
Suppress post SQL on Error¹
Indicates whether the Secure Agent must abort the post-SQL query execution in case the task fails to write data to the Google BigQuery target table due to errors.
Default is not selected.
Post SQL Configuration¹
Specify a post-SQL configuration.
For example,
DestinationTable:POSTSQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:True,UseLegacySQL:False
Truncate target table¹
Truncates the Google BigQuery target table before loading data to the target.
Default is not selected.
Allow Duplicate Inserts
Indicates that the Secure Agent can insert duplicate rows into the Google BigQuery target.
Applicable only when you perform a data driven operation and DD_INSERT is specified in the data driven condition.
Default is not selected.
Forward Rejected Rows
Applicable only when you configure the DD_REJECT constant in the data driven condition to reject all the rows.
Otherwise, this property is not applicable for Google BigQuery V2 Connector.
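For example, to flag every row for reject, you can specify just the DD_REJECT constant as the data driven condition:
DD_REJECT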
¹Applies only to mappings. The remaining properties are applicable for both mappings and elastic mappings.

Mapping tasks with CDC sources

You can use Google BigQuery V2 Connector to capture changed data from any CDC source and write the changed data to a Google BigQuery target. Add the CDC sources in mappings, and then run the associated mapping tasks to write the changed data to the target. When you capture changed data from a CDC source, you can configure only a single Google BigQuery V2 Target transformation in a pipeline. However, you can configure multiple pipelines in a mapping to write changed data from multiple CDC sources to multiple Google BigQuery V2 targets.
When the mapping task processes the changed data from a CDC source such as Oracle Express CDC V2, Google BigQuery V2 Connector creates a state table and a staging table in Google BigQuery. When the changed data is received from the CDC source, Google BigQuery V2 Connector uploads the changed data to the staging table. Then, it generates a Job_Id and writes the Job_Id to the state table along with the restart information. Google BigQuery V2 Connector then merges the staging table with the actual target table in Google BigQuery.
Each time you run the mapping task, Google BigQuery V2 Connector creates the state table, if it does not exist, to store the state information. Google BigQuery V2 Connector uses the following naming convention for the state table name:
state_table_cdc_<MappingTaskID>_<UniqueIdentifierForTargetInstance(s)>
Similarly, Google BigQuery V2 Connector uses the following naming convention for the staging table name:
staging_table_cdc_<MappingTaskID>_<TargetInstanceName>
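For example, for a hypothetical mapping task with ID 2847 and a target instance named BQ_Target1, the staging table would be named:
staging_table_cdc_2847_BQ_Target1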

Mapping tasks with CDC sources example

Your organization needs to replicate real-time changed data from a mission-critical production system to minimize intrusive, non-critical work, such as offline reporting or analytical operations. You can use Google BigQuery V2 Connector to capture changed data from any CDC source and write the changed data to a Google BigQuery target. Add the CDC sources in mappings, and then run the associated mapping tasks to write the changed data to the target.
    1. In Data Integration, click New > Mapping > Create.
    The New Mapping dialog box appears.
    2. Enter a name and description for the mapping.
    3. On the Source transformation, specify a name and description in the general properties.
    4. On the Source tab, select any configured CDC connection and specify the required source properties.
    5. On the Target transformation, specify a name and description in the general properties.
    6. On the Target tab, perform the following steps to configure the target properties:
    a. In the Connection field, select the Google BigQuery V2 connection.
    b. In the Target Type field, select the type of the target object.
    c. In the Object field, select the required target object.
    d. In the Operation field, select Data Driven to properly handle insert, update, and delete records from the source.
    e. In the Data Driven Condition field, leave the field empty.
    f. In the Update Column field, select the key columns to upsert or update data to or delete data from Google BigQuery.
    g. In the Advanced Properties section, you must select CDC in the Write Mode property.
    h. Configure the remaining advanced target properties. Note that only a subset of the advanced target properties is applicable in CDC mode.
    7. On the Field Mapping tab, map the incoming fields to the target fields. You can manually map an incoming field to a target field or automatically map fields based on the field names.
    8. In the Actions menu, click New Mapping Task.
    The New Mapping Task page appears.
    9. In the Definition tab, enter the task name and select the configured mapping.
    10. In the CDC Runtime tab, specify the required properties for the selected CDC source.
    For more information about the CDC Runtime properties, see the source properties for the selected CDC source.
    11. In the Schedule tab, add the following properties in the Advanced Session Properties section:
    a. In the Commit on End of File field, select the value of the property as No.
    b. In the Recovery Strategy field, select the value of the property as Resume from last checkpoint.
    12. Click Save > Run the mapping task.
    Alternatively, you can create a schedule that runs the mapping task on a recurring basis without manual intervention. You can define the schedule to minimize the time between mapping task runs.
In Monitor, you can monitor the status of the logs after you run the task.

Rules and guidelines for Google BigQuery V2 CDC target

Consider the following guidelines when working with a Google BigQuery V2 change data capture (CDC) target:

Upsert task operation

When you perform an upsert operation on a Google BigQuery target, you must configure the upsert fields for the target table. You can use an ID field for standard objects. Ensure that you include the upsert field in the field mappings for the task.

Rules and Guidelines

Consider the following rules and guidelines when you perform an upsert operation on a Google BigQuery target without using Merge query:

Using Merge query for update, upsert, and delete operations

You can implement the Merge query to perform update, upsert, delete, and data driven operations on a Google BigQuery target.
To implement Merge query, select the Enable Merge option in the advanced target properties.
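For reference, the generated query follows the semantics of a BigQuery MERGE statement. The following sketch is only illustrative; the staging table, alias, and column names are hypothetical, and the actual query that Google BigQuery V2 Connector generates can differ:
MERGE `project1.custdataset.cust_table1` AS tgt
USING `project1.custdataset.cust_staging` AS src
ON tgt.fld_string_req = src.fld_string_req
WHEN MATCHED THEN UPDATE SET fld_str = src.fld_str, fld_int = src.fld_int
WHEN NOT MATCHED THEN INSERT (fld_string_req, fld_str, fld_int) VALUES (src.fld_string_req, src.fld_str, src.fld_int)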

Rules and Guidelines

Consider the following rules and guidelines when you use Merge query:

Data driven operation for mappings

When you flag rows for an insert, update, delete, or reject operation based on the data driven condition for a Google BigQuery target in a mapping, you must select the Enable Data Driven and Enable Merge properties.

Rules and Guidelines

Consider the following rules and guidelines when you perform a data driven operation on a Google BigQuery target: