  

Amazon S3 V2 sources

You can use an Amazon S3 V2 object as a source in a mapping, elastic mapping, or mapping task.
When you configure the advanced source properties, configure properties specific to Amazon S3 V2. You can download Amazon S3 V2 files in multiple parts, specify the location of the staging directory, and decompress the data when you read data from Amazon S3.
The following table lists the encryption types supported for each file type:

Encryption Type                  | Avro File | Binary File¹ | Delimited | JSON File² | ORC File | Parquet File
Client-side encryption           | No        | Yes          | Yes       | No         | No       | No
Server-side encryption           | Yes       | Yes          | Yes       | Yes        | Yes      | Yes
Server-side encryption with KMS  | Yes       | Yes          | Yes       | Yes        | Yes      | Yes
Informatica encryption           | No        | Yes          | Yes       | No         | No       | No

¹ Applies only to mappings.
² Applies only to elastic mappings.
The remaining properties are applicable for both mappings and elastic mappings.

Data encryption in Amazon S3 V2 sources

You can decrypt data when you read binary and flat file sources from Amazon S3.

Client-side encryption for Amazon S3 V2 sources

Client-side encryption is a technique to encrypt data before transmitting the data to the Amazon S3 server.
You can read a client-side encrypted file in an Amazon S3 bucket. To read client-side encrypted files, you must provide a master symmetric key or customer master key in the connection properties. The Secure Agent decrypts the data by using the master symmetric key or customer master key. When you use a serverless runtime environment, you cannot configure client-side encryption for Amazon S3 V2 sources.
When you generate a client-side encrypted file using a third-party tool, metadata for the encrypted file is generated. To read an encrypted file from Amazon S3, you must upload the encrypted file and the metadata for the encrypted file to the Amazon S3 bucket.
The metadata must include the following keys when you upload the encrypted file:

Reading a client-side encrypted file

Perform the following tasks to read a client-side encrypted file:
  1. Provide the master symmetric key when you create an Amazon S3 V2 connection. Ensure that you provide a 256-bit AES encryption key in Base64 format.
  2. Copy the local_policy.jar and US_export_policy.jar files from the following directory:
     <Secure Agent installation directory>/jdk/jre/lib/security/policy/unlimited/
  3. Paste the files in the following directory:
     <Secure Agent installation directory>/jdk/jre/lib/security/
  4. Enable client-side encryption in the Source transformation advanced properties.
  5. Restart the Secure Agent.
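Step 1 requires a 256-bit AES key in Base64 format. A minimal sketch of generating such a key with the Python standard library (for illustration only; any tool that produces a 256-bit key in Base64 works):

```python
import base64
import os

# Generate 32 random bytes (256 bits) for the AES master symmetric key.
raw_key = os.urandom(32)

# Base64-encode the key, which is the format the connection property expects.
master_symmetric_key = base64.b64encode(raw_key).decode("ascii")

print(len(raw_key))  # 32
```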

Server-side encryption for Amazon S3 V2 sources

Server-side encryption is a technique to encrypt data using Amazon S3-managed encryption keys. Server-side encryption with KMS is a technique to encrypt data using the AWS KMS-managed customer master key.
Server-side encryption
To read a server-side encrypted file, select the encrypted file in the Amazon S3 V2 source.
Server-side encryption with KMS
To read a server-side encrypted file with KMS, specify the AWS KMS-managed customer master key in the Customer Master Key ID connection property and select the encrypted file in the Amazon S3 V2 source.
Note: You do not need to specify the encryption type in the advanced source properties.

Informatica encryption for Amazon S3 V2 sources

You can download a binary or flat file source that is encrypted using the Informatica crypto libraries to the local machine or staging location and decrypt the source files.
Informatica encryption is applicable only when you run mappings on the Secure Agent machine. To read a source file that is encrypted using the Informatica crypto libraries, perform the following tasks:
  1. Ensure that the organization administrator has enabled the Informatica crypto libraries license when you create an Amazon S3 V2 connection.
  2. Select Informatica Encryption as the encryption type in the advanced source properties.
When you read an Informatica encrypted source file and select Informatica Encryption as the encryption type, the data preview fails.
To preview the data successfully, select a dummy source file that contains the same metadata as the Informatica encrypted source file that you want to read. Enter the file name of the Informatica encrypted source file in the File Name advanced source property to override the file name of the dummy source file. Then, select Informatica Encryption as the encryption type in the advanced source properties.
Note: When you use Informatica encryption in a mapping, you cannot decrypt more than 1000 files.

Source types in Amazon S3 V2 sources

You can select the type of source from which you want to read data.
You can select the following type of sources from the Source Type option under the Amazon S3 V2 advanced source properties:
File
You must enter the bucket name that contains the Amazon S3 file. If applicable, include the folder name that contains the source file in the <bucket_name>/<folder_name> format.
Amazon S3 V2 Connector provides the option to override the value of the Folder Path and File Name properties during run time.
If you do not provide the bucket name and specify a folder path that starts with a slash (/) in the /<folder_name> format, the folder path is appended to the folder path that you specified in the connection properties.
For example, if you specify the /<dir2> folder path in this property and the <my_bucket1>/<dir1> folder path in the connection property, the resulting folder path is <my_bucket1>/<dir1>/<dir2>.
If you specify the <my_bucket1>/<dir1> folder path in the connection property and the <my_bucket2>/<dir2> folder path in this property, the Secure Agent reads the file from the <my_bucket2>/<dir2> folder path that you specify in this property.
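The folder path resolution described above can be sketched as a small helper function (illustrative only, not the connector's actual implementation; the function name is hypothetical):

```python
def resolve_folder_path(connection_path: str, property_path: str) -> str:
    """Resolve the effective source folder path.

    connection_path: folder path from the connection properties,
        for example "my_bucket1/dir1".
    property_path: folder path from this advanced source property.
    """
    if property_path.startswith("/"):
        # A path that starts with a slash is appended to the
        # connection folder path.
        return connection_path + property_path
    # Otherwise the path in this property overrides the connection path.
    return property_path

print(resolve_folder_path("my_bucket1/dir1", "/dir2"))
# my_bucket1/dir1/dir2
print(resolve_folder_path("my_bucket1/dir1", "my_bucket2/dir2"))
# my_bucket2/dir2
```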
Directory
Select the source file when you create the mapping, and select Directory as the source type at run time. When you select Directory as the Source Type option, the value of File Name is honored only when you use wildcard characters to specify the folder path or file name, or recursively read files from directories.
For the read operation, if you provide the Folder Path value during run time, the Secure Agent considers the value of the Folder Path from the advanced source properties. If you do not provide the Folder Path value during run time, the Secure Agent considers the value of the Folder Path that you specify during the connection creation.
Use the following rules and guidelines to select Directory as the source type:

Reading from multiple files

You can read multiple flat files from Amazon S3 and write the data to a target in a mapping.
You can use the following types of manifest files:

Custom manifest file

You can read multiple flat files from Amazon S3 and write the data to a target. To read multiple flat files, all files must be available in the same Amazon S3 bucket.
To read from multiple sources in the Amazon S3 bucket, create a .manifest file that lists each source file with its absolute path or directory path. You must specify the .manifest file name in the following format: <file_name>.manifest.
For example, the .manifest file contains source files in the following format:

{
"fileLocations":
[
{
"URIs":
[
"dir1/dir2/dir3/file_1.csv",
"dir1/dir2/dir3/file_2.csv",
"dir1/file_3.csv"
]
},
{
"URIPrefixes":
[
"dir1/dir2/dir3/",
"dir1/dir2/dir4/"
]
},
{
"WildcardURIs":
[
"dir1/dir2/dir3/*.csv"
]
}
],
"settings":
{
"stopOnFail": "true"
}
}
The custom manifest file contains the following tags:
You can specify URIs, URIPrefixes, WildcardURIs, or all sections within fileLocations in the .manifest file.
You cannot use wildcard characters to specify folder names. For example, the following entries are not supported: { "WildcardURIs": [ "multiread_wildcard/dir1*/", "multiread_wildcard/*/" ] }.
The Data Preview tab displays the data of the first file available in the URI specified in the .manifest file. If the URI section is empty, the first file in the folder specified in URIPrefixes is displayed.
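To sanity-check a custom .manifest file before a run, you can parse it with Python's standard json module. A sketch based on the tag names in the example above:

```python
import json

manifest_text = """
{
  "fileLocations": [
    {"URIs": ["dir1/dir2/dir3/file_1.csv", "dir1/file_3.csv"]},
    {"URIPrefixes": ["dir1/dir2/dir3/"]},
    {"WildcardURIs": ["dir1/dir2/dir3/*.csv"]}
  ],
  "settings": {"stopOnFail": "true"}
}
"""

manifest = json.loads(manifest_text)

# Collect entries from each optional section within fileLocations.
uris, prefixes, wildcards = [], [], []
for location in manifest["fileLocations"]:
    uris.extend(location.get("URIs", []))
    prefixes.extend(location.get("URIPrefixes", []))
    wildcards.extend(location.get("WildcardURIs", []))

stop_on_fail = manifest["settings"]["stopOnFail"] == "true"
```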

Amazon Redshift manifest file

You can use an Amazon Redshift manifest file created by the UNLOAD command to read multiple flat files from Amazon S3. All flat files must have the same metadata and must be available in the same Amazon S3 bucket.
Create a .manifest file and list all the source files with the URL that includes the bucket name and full object path for the file. You must specify the .manifest file name in the following format: <file_name>.manifest.
For example, the Amazon Redshift manifest file contains source files in the following format:
{
"entries": [
{"url": "s3://mybucket-alpha/2013-10-04-custdata", "mandatory":true},
{"url": "s3://mybucket-alpha/2013-10-05-custdata", "mandatory":true},
{"url": "s3://mybucket-beta/2013-10-04-custdata", "mandatory":true},
{"url": "s3://mybucket-beta/2013-10-05-custdata", "mandatory":true}
]
}
The Redshift manifest file format contains the following tags:
url
The url tag consists of the source file in the following format:
"url": "<endpoint name>://<folder path>/<filename>", "mandatory":<value>
mandatory
Amazon S3 V2 Connector uses the mandatory tag to determine whether to continue reading the remaining files listed in the .manifest file, based on the following scenarios:
By default, the value of the mandatory tag is false.
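A sketch of reading such a manifest with Python's json module, applying the default of false when the mandatory tag is absent:

```python
import json

manifest = json.loads("""
{
  "entries": [
    {"url": "s3://mybucket-alpha/2013-10-04-custdata", "mandatory": true},
    {"url": "s3://mybucket-beta/2013-10-04-custdata"}
  ]
}
""")

# The mandatory tag defaults to false when it is not specified.
files = [(entry["url"], entry.get("mandatory", False))
         for entry in manifest["entries"]]
```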

Wildcard characters

When you run an elastic mapping to read data from a Delimited, Avro, JSON, ORC, or Parquet file, you can use the ? and * wildcard characters to specify the folder path or file name.
To use wildcard characters for the folder path or file name, select the Allow Wildcard Characters option in the advanced read properties of the Amazon S3 V2 data object.
? (Question mark)
The question mark character (?) matches exactly one occurrence of any character. For example, if you enter the source file name as a?b.txt, the Secure Agent reads data from files with the following names:
* (Asterisk)
The asterisk character (*) matches zero or more occurrences of any character. For example, if you enter the source file name as a*b.txt, the Secure Agent reads data from files with the following names:
You can use the asterisk (*) wildcard to fetch all the files or only the files that match the name pattern. Specify the wildcard character in the following format:
If you specify abc*.txt, the Secure Agent reads all the file names starting with the term abc and ending with the .txt file extension. If you specify abc.*, the Secure Agent reads all the file names starting with the term abc regardless of the extension.
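The ? and * semantics above follow standard glob-style matching. Python's fnmatch module behaves the same way, so you can use it to test a pattern before you run the mapping:

```python
from fnmatch import fnmatchcase

# ? matches exactly one character.
assert fnmatchcase("aab.txt", "a?b.txt")
assert not fnmatchcase("ab.txt", "a?b.txt")

# * matches zero or more characters.
assert fnmatchcase("ab.txt", "a*b.txt")
assert fnmatchcase("a123b.txt", "a*b.txt")

# abc*.txt matches names that start with abc and end with .txt.
assert fnmatchcase("abc123.txt", "abc*.txt")

# abc.* matches names that start with abc regardless of the extension.
assert fnmatchcase("abc.json", "abc.*")
```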
Rules and guidelines for wildcard characters
Consider the following rules and guidelines when you use wildcard characters to specify the folder path or file name:

Recursively read files from directories

You can read objects stored in subdirectories in Amazon S3 V2 elastic mappings. You can use recursive read for Delimited, Avro, JSON, ORC, and Parquet files. The files that you read using recursive read must have the same metadata.
To enable recursive read, select the source type as Directory in the advanced source properties. Enable the Recursive Directory Read advanced source property to read objects stored in subdirectories.
You can also use recursive read when you specify wildcard characters in a folder path or file name. For example, you can use a wildcard character to recursively read files in the following ways:
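As a local illustration of combining a wildcard with recursive directory read, the following sketch builds a small directory tree and matches files at every depth with Python's glob module:

```python
import glob
import os
import tempfile

# Build a small directory tree with files at several depths.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "dir1", "dir2"))
for rel in ("a.csv",
            os.path.join("dir1", "b.csv"),
            os.path.join("dir1", "dir2", "c.csv")):
    open(os.path.join(root, rel), "w").close()

# ** with recursive=True descends into all subdirectories, which is
# analogous to enabling Recursive Directory Read with a *.csv wildcard.
matches = sorted(glob.glob(os.path.join(root, "**", "*.csv"), recursive=True))
relative = [os.path.relpath(m, root) for m in matches]
print(relative)
```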

Source partitioning

You can configure fixed partitioning to optimize the mapping performance at run time when you read data from Delimited, Avro, ORC, or Parquet files. You can configure fixed partitioning only on mappings.
The partition type controls how the agent distributes data among partitions at partition points. With partitioning, the Secure Agent distributes rows of source data based on the number of threads that you define as partitions.
Enable partitioning when you configure the Source transformation in the Mapping Designer.
On the Partitions tab for the Source transformation, you select fixed partitioning and enter the number of partitions based on the amount of data that you want to read. By default, the value of the Number of partitions field is one.
The following image shows the configured partitioning:
On the Partitions tab of the Source transformation, the partitioning type is Fixed and the number of partitions is set to 2.
The Secure Agent creates partitions according to the size of the Amazon S3 V2 source file. Each file name is appended with a number starting from 0 in the following format: <file name>_<number>
If you enable partitioning and the precision of a source column is less than the maximum data length in that column, you might receive unexpected results. For partitioning to work as expected, ensure that the precision of each source column is equal to or greater than the maximum data length in that column.
Note: If you configure partitioning for an Amazon S3 V2 source in a mapping to read from a manifest file, compressed .gz file, or a read directory file, the Secure Agent ignores the partition. However, the task runs successfully.
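The way fixed partitioning fans rows out to numbered outputs can be sketched as follows (round-robin distribution and the customer file name are assumptions for illustration; the Secure Agent's actual distribution strategy may differ):

```python
def fixed_partitions(rows, num_partitions):
    """Distribute source rows across a fixed number of partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)  # round-robin
    return partitions

rows = list(range(10))
parts = fixed_partitions(rows, 2)

# Each partition's output is numbered starting from 0: <file name>_<number>
names = ["customer_{}".format(n) for n in range(len(parts))]
print(names)  # ['customer_0', 'customer_1']
```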

Reading source objects path

When you import source objects, the Secure Agent appends a FileName field to the imported source object. The FileName field stores the absolute path of the source file from which the Secure Agent reads the data at run time.
For example, a directory contains a number of files and each file contains multiple records that you want to read. You select the directory as source type in the Amazon S3 V2 source advanced properties. When you run the mapping, the Secure Agent reads each record and stores the absolute path of the respective source file in the FileName field.
The FileName field is applicable to the following file formats:
When you use the FileName field in a source object, the Secure Agent reads file names and directory names differently for mappings and elastic mappings.
Feature        | Mapping                                                    | Elastic Mapping
File name      | xyz.amazonaws.com/aa.bb.bucket/1024/characterscheckfor1024 | s3a://<bucket_name>/customer.avro
Directory name | <absolute path of the file including the file name>        | s3a://<bucket_name>/avro/<directory_name>/<file_name>
Note: By default, the FileName field in a source object uses the endpoint format with a dash (-). For example, s3-us-west-2.amazonaws.com/<bucket_name>/automation/customer.avro.
To change the format of the FileName field to use a period (.), set the JVM option changeS3EndpointForFileNamePort = true. For example, s3.us-west-2.amazonaws.com/<bucket_name>/automation/customer.avro.
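The difference between the two endpoint formats amounts to the separator after the s3 prefix. A hypothetical helper shows the transformation (the real behavior is controlled by the JVM option, not by code you write; the bucket name is illustrative):

```python
def to_dot_format(file_name: str) -> str:
    """Rewrite the default s3-<region> prefix as s3.<region>."""
    if file_name.startswith("s3-"):
        return "s3." + file_name[len("s3-"):]
    return file_name

print(to_dot_format("s3-us-west-2.amazonaws.com/mybucket/automation/customer.avro"))
# s3.us-west-2.amazonaws.com/mybucket/automation/customer.avro
```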

Pushdown optimization

You can enable full pushdown optimization when you want to load data from Amazon S3 sources to your data warehouse in Amazon Redshift. While loading the data to Amazon Redshift, you can transform it according to your data warehouse model and requirements. When you enable full pushdown on a mapping task, the mapping logic is pushed to the AWS environment to leverage AWS commands. For more information, see the help for Amazon Redshift V2 Connector.
If your use case involves loading data to any other supported cloud data warehouse, see the connector help for the applicable cloud data warehouse.