
Microsoft Azure Data Lake Storage Gen2 targets in mappings

In a mapping, you can use a Microsoft Azure Data Lake Storage Gen2 object as a target.
When you use Microsoft Azure Data Lake Storage Gen2 target objects, you can select a Microsoft Azure Data Lake Storage Gen2 collection as the target. You can configure Microsoft Azure Data Lake Storage Gen2 target properties on the Target page of the Mapping wizard. When you write data to Microsoft Azure Data Lake Storage Gen2, you can use the Create Target option to create a target at run time. When you create a new target based on the source, you must remove all binary fields from the field mapping.
The following table describes the Microsoft Azure Data Lake Storage Gen2 target properties that you can configure in a Target transformation:
Property
Description
Connection
Name of the target connection. Select a target connection or click New Parameter to define a new parameter for the target connection.
When you switch between a non-parameterized and a parameterized Microsoft Azure Data Lake Storage Gen2 connection, the advanced property values are retained.
Target Type
Select Single Object or Parameter.
Object
Name of the target object. You can select an existing object or create a new target at runtime.
When you select Create New at Runtime, enter a name for the target object and select the source fields that you want to use. By default, all source fields are used.
The target name can contain alphanumeric characters. The only special characters that you can use in the file name are the period (.), underscore (_), at sign (@), dollar sign ($), and percent sign (%).
Ensure that the headers and file data do not contain special characters.
You can use parameters defined in a parameter file in the target name. When you select the Create Target option, you cannot parameterize the target at runtime.
Note: When you write data to a flat file created at runtime, the target flat file contains a blank line at the end of the file.
Parameter
Select an existing parameter for the target object or click New Parameter to define a new parameter for the target object. The Parameter property appears only if you select Parameter as the target type.
Format
Specifies the file format that the Microsoft Azure Data Lake Storage Gen2 Connector uses to write data to Microsoft Azure Data Lake Storage Gen2.
You can select the following file format types:
- Delimited
- Avro
- Parquet
- JSON
- ORC
Default is None.
If you select None as the format type, Microsoft Azure Data Lake Storage Gen2 Connector writes data to Microsoft Azure Data Lake Storage Gen2 files in binary format.
Formatting Options
Mandatory. Microsoft Azure Data Lake Storage Gen2 format options. Opens the Formatting Options dialog box to define the format of the file.
Configure the following format options:
- Schema Source: Specify the source of the schema. You can select the Read from data file or Import from schema file option.
  If you select an Avro, JSON, ORC, or Parquet format type and select the Read from data file option, you cannot configure the delimiter, escapeChar, and qualifier options.
  For any format type, if you select the Import from schema file option, you can only upload a schema file in the Schema File property field. You cannot configure the delimiter, escapeChar, and qualifier options.
- Data elements to sample2: Applicable only when you write JSON files in elastic mappings. Specify the number of rows to sample to find the best match and populate the metadata.
- Memory available to process data2: Applicable only when you write JSON files in elastic mappings. The memory that the parser uses to write the JSON sample schema and process it.
  The default value is 2 MB. If the file size is more than 2 MB, you might encounter an error. Set the value to the size of the file that you want to write.
- Schema File: You can upload a schema file. You cannot upload a schema file when you select the Create Target option.
- Delimiter: Character used to separate columns of data. You can configure characters such as a comma, tab, colon, or semicolon.
  Note: You cannot set a tab as a delimiter directly in the Delimiter field. To set a tab as a delimiter, type the tab character in a text editor, and then copy and paste it into the Delimiter field.
- EscapeChar: Character immediately preceding a column delimiter character embedded in an unquoted string, or immediately preceding the quote character in a quoted string.
- Qualifier: Quote character that defines the boundaries of data. You can set the qualifier to a single quote or a double quote.
- Qualifier Mode1: Applicable to mappings. Specify the qualifier behavior for the target object. You can select one of the following options:
  - Minimal. Default mode. Applies the qualifier only to data that contains the delimiter value or a special character. Otherwise, the Secure Agent does not apply the qualifier.
  - All. Applies the qualifier to all data.
- Code Page: Select the code page that the Secure Agent must use to read or write data.
  Microsoft Azure Data Lake Storage Gen2 Connector supports only UTF-8. Ignore the rest of the code pages.
- Header Line Number1: Applicable when you perform data preview. Specify the line number that you want to use as the header when you read data from Microsoft Azure Data Lake Storage Gen2. You can also read data from a file that does not have a header. To read data from a file with no header, set the value of the Header Line Number field to 0.
- First Data Row1: Applicable when you perform data preview. Specify the line number from which you want the Secure Agent to read data. You must enter a value that is greater than or equal to one.
  To read data from the header, the values of the Header Line Number and First Data Row fields must be the same. Default is 1.
- Target Header1: This property is not applicable when you read data from a Microsoft Azure Data Lake Storage Gen2 target.
- Distribution Column1: This property is not applicable when you write data to a Microsoft Azure Data Lake Storage Gen2 target.
- maxRowsToPreview: This property is not applicable when you write data to a Microsoft Azure Data Lake Storage Gen2 target.
- rowDelimiter1: Character used to separate rows of data. You can set the value to \r, \n, or \r\n.
Operation
Target operation. Select Insert. You can only insert data to a Microsoft Azure Data Lake Storage Gen2 target.
1 Applies only to mappings.
2 Applies only to elastic mappings.
The remaining properties are applicable for both mappings and elastic mappings.
Note: When you use the Create Target option and specify an object name with an extension that does not match the Format Type under Formatting Options, the Secure Agent ignores the format type that you specified under Formatting Options.
For example, if you select the Parquet format type and specify customer.avro as the object name in the Target Object dialog box, the Secure Agent ignores Parquet and creates an Avro target file.
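The Minimal and All qualifier modes described in the Formatting Options above behave like the quoting modes in Python's standard csv module. The following sketch, using illustrative data, shows the difference between the two modes:

```python
import csv
import io

# Illustrative row: the second field contains the delimiter and the
# third contains an embedded quote; the first field contains neither.
row = ["Alice", "New York, NY", 'says "hi"']

def render(quoting):
    # Write one row with a comma delimiter and a double-quote qualifier
    buf = io.StringIO()
    csv.writer(buf, delimiter=",", quotechar='"', quoting=quoting).writerow(row)
    return buf.getvalue().strip()

# Minimal: qualifier applied only to data containing the delimiter
# or a special character
print(render(csv.QUOTE_MINIMAL))  # Alice,"New York, NY","says ""hi"""

# All: qualifier applied to every field
print(render(csv.QUOTE_ALL))      # "Alice","New York, NY","says ""hi"""
```

Note that in both modes an embedded quote character is escaped by doubling it, which corresponds to using the quote character itself as the EscapeChar.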
The following table describes the advanced target properties for Microsoft Azure Data Lake Storage Gen2:
Advanced Target Property
Description
Concurrent Threads1
Number of concurrent connections used to write data to Microsoft Azure Data Lake Storage Gen2. When writing a large file, you can spawn multiple threads to process data. Configure Block Size to divide a large file into smaller parts.
Default is 4. Maximum is 10.
Filesystem Name Override
Overrides the default file system name.
Directory Override
Microsoft Azure Data Lake Storage Gen2 directory that you use to write data. Default is root directory. The Secure Agent creates the directory if it does not exist. The directory path specified at run time overrides the path specified while creating a connection.
You can specify an absolute or a relative directory path:
- Absolute path. The Secure Agent searches this directory path in the specified file system.
  Example of an absolute path: Dir1/Dir2
- Relative path. The Secure Agent searches this directory path in the native directory path of the object.
  Example of a relative path: /Dir1/Dir2
  When you use a relative path, the imported object path is added to the file path used during the metadata fetch at runtime.
File Name Override
Target object. Select the file to which you want to write data. The file specified at run time overrides the file specified in Object.
Write Strategy
Applicable to flat files in a mapping and to all file formats in an elastic mapping.
If the file exists in Microsoft Azure Data Lake Storage Gen2, you can select to overwrite or append the existing file.
When you append data in an elastic mapping, the data is appended as a new part file in the existing target directory.
Block Size1
Applicable to flat, Avro, and Parquet file formats. Divides a large file into parts of the specified block size. When you write a large file, divide the file into smaller parts and configure concurrent connections to spawn the required number of threads to process data in parallel.
Specify an integer value for the block size.
Default value in bytes for a flat file is 8388608 and maximum value is 104857600.
Default value in bytes for Avro and Parquet files is 134217728 and maximum value is 2147483647.
Compression Format
Compresses and writes data to the target. Select Gzip to write flat files. Select Snappy to write Avro, ORC, and Parquet complex files.
After you run a task, the target object name does not contain the extension .GZ or .Snappy.
You cannot write compressed JSON files.
Timeout Interval
Not applicable.
Interim Directory1
Optional. Applicable to flat files and JSON files.
Path to the staging directory in the Secure Agent machine.
Specify the staging directory where you want to stage the files when you write data to Microsoft Azure Data Lake Storage Gen2. Ensure that the directory has sufficient space and you have write permissions to the directory.
Default staging directory is /tmp.
You cannot specify an interim directory for elastic mappings.
You cannot specify an interim directory when you use the Hosted Agent.
Forward Rejected Rows1
Configure the transformation to either pass rejected rows to the next transformation or drop them.
1Applies only to mappings.
The remaining properties are applicable for both mappings and elastic mappings.
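The block size defaults and maximums listed above are plain binary byte counts. As a quick sanity check of the arithmetic:

```python
MB = 1024 ** 2  # bytes in a mebibyte

# Flat file: default 8 MB, maximum 100 MB
assert 8388608 == 8 * MB
assert 104857600 == 100 * MB

# Avro and Parquet: default 128 MB, maximum 2 GB - 1 byte
assert 134217728 == 128 * MB
assert 2147483647 == 2 * 1024 ** 3 - 1

print("block size constants verified")
```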

Specifying a target

You can use an existing target or create a target to hold the results of a mapping. If you choose to create the target, the agent creates the target when you run the task.
To specify the target properties, follow these steps:
    1. Select the Target transformation in the mapping.
    2. On the Incoming Fields tab, configure field rules to specify the fields to include in the target.
    3. To specify the target, click the Target tab.
    4. Select the target connection.
    5. For the target type, choose Single Object or Parameter.
    6. Specify the target object or parameter.
    Note: The Handle Special Characters option is not applicable to elastic mappings.
    7. Click Formatting Options if you want to configure the formatting options for the file, and click OK.
    8. Click Select and choose a target object. You can select an existing target object or create a new target object at run time and specify the object name.
    9. Specify Advanced properties for the target, if needed.

Target time stamps

When you create a target at run time in a mapping, you can append time stamp information to the file name to show when the file is created.
When you specify the file name for the target file, include special characters based on Linux STRFTIME function formats that the mapping task uses to include time stamp information in the file name. The time stamp is based on the organization's time zone.
You cannot append time stamp information to the file name in an elastic mapping.
The following table describes some common STRFTIME function formats that you might use in a mapping or mapping task:
Special Character
Description
%d
Day as a two-digit decimal number, with a range of 01-31.
%m
Month as a two-digit decimal number, with a range of 01-12.
%y
Year as a two-digit decimal number without the century, with a range of 00-99.
%Y
Year including the century, for example 2015.
%T
Applicable only to flat files. Time in 24-hour notation, equivalent to %H:%M:%S.
%H
Hour in 24-hour clock notation, with a range of 00-23.
%l
Hour in 12-hour clock notation, with a range of 01-12.
%M
Minute as a decimal, with a range of 00-59.
%S
Second as a decimal, with a range of 00-60.
%p
Either AM or PM.
Note: For complex files, instead of %T you can use the equivalent %H_%M_%S.
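These codes follow the same conventions as the C strftime function, so you can preview how a name will expand with any strftime implementation. A minimal sketch in Python, using a hypothetical file name pattern:

```python
from datetime import datetime

# Hypothetical target file name pattern using STRFTIME-style codes
pattern = "customer_%Y%m%d_%H%M%S.csv"

# A fixed point in time, for illustration only
run_time = datetime(2023, 4, 15, 9, 30, 0)

print(run_time.strftime(pattern))  # customer_20230415_093000.csv
```

At run time the mapping task expands the codes from the actual creation time in the organization's time zone, not from a fixed value as shown here.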

Microsoft Azure Data Lake Storage Gen2 target file parameterization

You can parameterize the file name and target folder location for Microsoft Azure Data Lake Storage Gen2 target objects and pass the file name and folder location at run time. If the folder does not exist, the Secure Agent creates the folder structure dynamically.

Microsoft Azure Data Lake Storage Gen2 target file parameterization through a parameter file

You can parameterize the Directory Override and File Name Override advanced target properties for a Microsoft Azure Data Lake Storage Gen2 target file using a parameter file.
To parameterize a Microsoft Azure Data Lake Storage Gen2 target file using a parameter file, create a Microsoft Azure Data Lake Storage Gen2 target object and add parameters in the target object name and target object path. Define the parameter that you added for the target object in the parameter file. Then, place the parameter file in the following location and run the mapping task:
<Informatica Cloud Secure Agent\apps\Data_Integration_Server\data\userparameters>
You can also save the parameter file in a cloud-hosted directory in Microsoft Azure Data Lake Storage Gen2.
You cannot use a parameter file if the mapping task is based on an elastic mapping.
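As an illustration, entries in the parameter file might look like the following, where the parameter names and values are hypothetical and the section heading depends on how you scope the parameters in your organization:

```
[Global]
$$targetDirectory=sales/2023/output
$$targetFileName=sales_daily.csv
```

The parameters referenced in the Directory Override and File Name Override properties are then resolved to these values when the mapping task runs.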