Google BigQuery V2 sources in mappings and elastic mappings

To read data from Google BigQuery, configure a Google BigQuery object as the Source transformation in a mapping or elastic mapping.
Specify the name and description of the Google BigQuery source. Configure the source and advanced properties for the source object in mappings and elastic mappings.
The following table describes the source properties that you can configure for a Google BigQuery source:
Property
Description
Connection
Name of the Google BigQuery V2 source connection. Select a source connection, or click New Parameter to define a new parameter for the source connection.
Source Type
Type of the Google BigQuery source objects available.
Select Single Object, Multiple¹, Query¹, or Parameter.
When you run an elastic mapping, you cannot read data from multiple objects.
Object
Name of the Google BigQuery source object based on the source type selected.
Parameter
Select a parameter for the source object, or click New Parameter to define a new parameter for the source object. The Parameter property appears only if you select Parameter as the source type.
Query¹
Click Define Query and enter a valid custom query, as shown in the example after this table.
The Query property appears only if you select Query as the source type.
You can parameterize a custom query object at runtime in a mapping.
Filter
Configure a simple filter or an advanced filter to remove rows at the source. You can improve efficiency by filtering early in the data flow.
A simple filter includes a field name, operator, and value. Use an advanced filter to define a more complex filter condition, which can include multiple conditions using the AND or OR logical operators.
Only the simple filter is applicable for elastic mappings.
¹ Applies only to mappings. The remaining properties are applicable for both mappings and elastic mappings.
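For example, a valid custom query for the Query source type might look like the following standard SQL statement, where the project, dataset, and table names are hypothetical placeholders:
SELECT EMP_ID, EMP_NAME, EMP_REGION
FROM `my-project.EMPLOYEE.EMP_DETAILS`
WHERE EMP_REGION = 'WEST'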
The following table describes the advanced properties that you can configure for a Google BigQuery source:
Property
Description
Source Dataset ID
Optional. Overrides the Google BigQuery dataset name that you specified in the connection.
Source Table Name
Optional. Overrides the Google BigQuery table name that you specified in the Source transformation.
Number of Rows to Read
Specifies the number of rows to read from the Google BigQuery source table.
Allow Large Results¹
Determines whether Google BigQuery V2 Connector must produce arbitrarily large result tables to query large source tables.
If you select this option, you must specify a destination table to store the query results.
Query Results Table Name¹
Required if you select the Allow Large Results option.
Specifies the destination table name to store the query results. If the table is not present in the dataset, Google BigQuery V2 Connector creates the destination table with the name that you specify.
Job Poll Interval in Seconds¹
The number of seconds after which Google BigQuery V2 Connector polls the status of the read job operation.
Default is 10.
Read Mode
Specifies the read mode to read data from the Google BigQuery source.
You can select one of the following read modes:
  • Direct. In direct mode, Google BigQuery V2 Connector reads data directly from the Google BigQuery source table.
    Note: When you use hybrid or complex connection mode, you cannot use direct mode to read data from the Google BigQuery source.
  • Staging¹. In staging mode, Google BigQuery V2 Connector exports data from the Google BigQuery source into Google Cloud Storage. After the export is complete, Google BigQuery V2 Connector downloads the data from Google Cloud Storage into the local stage file and then reads data from the local stage file.
Default is Direct mode.
Number of Threads for Downloading Staging Files¹
Specifies the number of files that Google BigQuery V2 Connector downloads at a time to enable parallel download.
This property applies to staging mode.
Local Stage File Directory¹
Specifies the directory on your local machine where Google BigQuery V2 Connector stores the Google BigQuery source data temporarily before it reads the data.
This property applies to staging mode.
Note: This property is not applicable when you use a serverless runtime environment.
Staging File Name¹
Name of the staging file where data from the Google BigQuery source table is exported to Google Cloud Storage.
This property applies to staging mode.
Data Format of the staging file¹
Specifies the data format of the staging file. You can select one of the following data formats:
  • Avro
  • JSON (Newline Delimited). Supports flat and record data with nested and repeated fields.
  • CSV. Supports flat data.
    Note: In a .csv file, columns of the Timestamp data type are represented as floating point numbers that cause the milliseconds value to differ.
This property applies to staging mode.
Enable Staging File Compression¹
Indicates whether to compress the staging file in Google Cloud Storage before Google BigQuery V2 Connector reads data from the staging file.
You can enable staging file compression to reduce cost and transfer time.
This property applies to staging mode.
Persist Destination Table¹
Indicates whether Google BigQuery V2 Connector must persist the query results table after it reads data from the query results table.
By default, Google BigQuery V2 Connector deletes the query results table.
pre SQL¹
SQL statement that you want to run before reading data from the source.
For example, if you want to select records in the database before you read the records from the table, specify the following pre SQL statement:
SELECT * FROM [api-project-80697026669:EMPLOYEE.DEPARTMENT] LIMIT 1000;
pre SQL Configuration¹
Specify a pre SQL configuration.
For example,
DestinationTable:PRESQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:False,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
post SQL¹
SQL statement that you want to run after reading data from the source.
For example, if you want to update records in a table after you read the records from a source table, specify the following post SQL statement:
UPDATE [api-project-80697026669.EMPLOYEE.PERSONS_TGT_DEL]
SET phoneNumber.number=1000011, phoneNumber.areaCode=100 where fullname='John Doe'
post SQL Configuration¹
Specify a post SQL configuration.
For example,
DestinationTable:POSTSQL_SRC,DestinationDataset:EMPLOYEE,
FlattenResults:True,WriteDisposition:WRITE_TRUNCATE,UseLegacySql:False
SQL Override Query¹
Overrides the default SQL query used to read data from the Google BigQuery source.
Note: When you specify a SQL override query, you must specify a dataset name in the Source Dataset ID advanced source property.
Ensure that the list of selected columns, data types, and the order of the columns that appear in the query matches the columns, data types, and order in which they appear in the source object.
Note: Ensure that you map all the columns in the SQL override query, and only those columns, to the target.
Note: Does not apply when you enable partitioning.
Use Legacy SQL for SQL Override¹
Indicates that the SQL Override query is specified in legacy SQL.
Use the following format to specify a legacy SQL query for the SQL Override Query property:
SELECT <Col1, Col2, Col3> FROM [projectID:datasetID.tableName]
Clear this option to define a standard SQL override query.
Use the following format to specify a standard SQL query for the SQL Override Query property:
SELECT * FROM `projectID.datasetID.tableName`
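For example, the following legacy SQL and standard SQL queries are equivalent, assuming a hypothetical project named my-project with an EMPLOYEE dataset:
Legacy SQL:
SELECT EMP_ID, EMP_NAME, EMP_REGION FROM [my-project:EMPLOYEE.EMP_DETAILS]
Standard SQL:
SELECT EMP_ID, EMP_NAME, EMP_REGION FROM `my-project.EMPLOYEE.EMP_DETAILS`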
Retry Options¹
Comma-separated list to specify the following retry options:
  • Retry Count. The number of retry attempts to read data from Google BigQuery.
  • Retry Interval. The time in seconds to wait between each retry attempt.
  • Retry Exceptions. The list of exceptions separated by the pipe (|) character for which retries are made.
Use the following format to specify the retry options:
RetryCount:5,RetryInterval:1,RetryExceptions:java.net.ConnectException|java.io.IOException
Note: The retry options are available for preview. Preview functionality is supported for evaluation purposes but is unwarranted and is not production-ready. Informatica recommends that you use it in non-production environments only. Informatica intends to include the preview functionality in an upcoming release for production use, but might choose not to in accordance with changing market or technical circumstances. For more information, contact Informatica Global Customer Support. To use the functionality, your organization must have the appropriate licenses.
Number of Spark Partitions²
Specifies the maximum number of partitions that the Spark engine splits the data into.
Default is 1.
¹ Applies only to mappings.
² Applies only to elastic mappings.
The remaining properties are applicable for both mappings and elastic mappings.
You can set the tracing level in the advanced properties section to determine the amount of detail that logs contain.
The following table describes the tracing levels that you can configure:
Property
Description
Terse
The Secure Agent logs initialization information, error messages, and notification of rejected data.
Normal
The Secure Agent logs initialization and status information, errors encountered, and skipped rows due to transformation row errors. Summarizes session results, but not at the level of individual rows.
Verbose Initialization
In addition to normal tracing, the Secure Agent logs additional initialization details, names of index and data files used, and detailed transformation statistics.
Verbose Data
In addition to verbose initialization tracing, the Secure Agent logs each row that passes into the mapping. Also notes where the Secure Agent truncates string data to fit the precision of a column and provides detailed transformation statistics.
When you configure the tracing level to verbose data, the Secure Agent writes row data for all rows in a block when it processes a transformation.

Partitioning

When you read data from a Google BigQuery source and use simple or hybrid connection mode, you can configure key range partitioning to optimize the mapping performance at run time.

Key range partitioning

You can configure key range partitioning when you use a mapping task to read data from Google BigQuery sources and use simple or hybrid connection mode. With key range partitioning, the Secure Agent distributes rows of source data based on the fields that you define as partition keys. The Secure Agent compares the field value to the range values for each partition and sends rows to the appropriate partitions.
Use key range partitioning for columns that have an even distribution of data values. Otherwise, the partitions might have unequal size. For example, a column might have 10 rows between key values 1 and 1000 and the column might have 999 rows between key values 1001 and 2000. If the mapping includes multiple sources, use the same number of key ranges for each source.
When you define key range partitioning for a column, the Secure Agent reads only the rows that are within the specified partition ranges. For example, if you configure two partitions for a column with the ranges 10 through 20 and 30 through 40, the Secure Agent does not read rows with key values between 20 and 30 because those rows are not within the specified partition ranges.
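Conceptually, each partition reads rows as if a range filter were applied to the partition key. For the previous example, the two partitions would behave as if the following hypothetical filters were used; the actual queries that the Secure Agent generates might differ:
SELECT * FROM `my-project.EMPLOYEE.EMP_DETAILS` WHERE EMP_ID BETWEEN 10 AND 20
SELECT * FROM `my-project.EMPLOYEE.EMP_DETAILS` WHERE EMP_ID BETWEEN 30 AND 40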
You can configure a partition key for fields of the following data types:
Note: You cannot configure a partition key for Record data type columns and repeated columns.
You cannot use key range partitions when a mapping includes any of the following transformations:

Configuring key range partitioning

Perform the following steps to configure key range partitioning for Google BigQuery sources:
    1. In the Source Properties, click the Partitions tab.
    2. Select the required partition key from the list.
    3. Click Add New Key Range to define the number of partitions and the key ranges based on which the Secure Agent must partition data.
    Use a blank value for the start range to indicate the minimum value. Use a blank value for the end range to indicate the maximum value.
    The following image displays the Partitions tab:
    The image shows the partitions and the corresponding key ranges.
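For example, to split a hypothetical integer column such as EMP_ID across two partitions, you can leave the start range of the first partition blank and set its end range to 1000, and then set the start range of the second partition to 1001 and leave its end range blank. The first partition reads all rows with EMP_ID values up to 1000, and the second partition reads all remaining rows.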