Data compression in Amazon S3 V2 sources and targets

You can decompress data when you read from Amazon S3 and compress data when you write to Amazon S3.
The following table lists the supported source compression formats:
Compression format | Avro File | Binary File¹ | Delimited | JSON File² | ORC File | Parquet File
None               | Yes       | No           | Yes       | Yes        | Yes      | Yes
Bzip2              | No        | No           | No        | Yes        | No       | No
Deflate            | Yes       | No           | No        | No         | No       | No
Gzip               | No        | No           | Yes       | No         | No       | Yes
Lzo                | No        | No           | No        | No         | No       | No
Snappy             | Yes       | No           | No        | No         | Yes      | Yes
Zlib               | No        | No           | No        | No         | Yes      | No
¹ Applies only to mappings.
² Applies only to elastic mappings.
The remaining file formats apply to both mappings and elastic mappings.
The following table lists the supported target compression formats:
Compression format | Avro File | Binary File¹ | Delimited | JSON File² | ORC File | Parquet File
None               | Yes       | No           | Yes       | Yes        | Yes      | Yes
Bzip2              | No        | No           | No        | Yes        | No       | No
Deflate            | Yes       | No           | No        | Yes        | No       | No
Gzip               | No        | No           | Yes       | Yes        | No       | Yes
Lzo                | No        | No           | No        | No         | No       | No
Snappy             | Yes       | No           | No        | Yes        | Yes      | Yes
Zlib               | No        | No           | No        | No         | Yes      | No
¹ Applies only to mappings.
² Applies only to elastic mappings.
The remaining file formats apply to both mappings and elastic mappings.
Configure the compression format in the Compression Format option under the advanced source and target properties.
For the Avro, ORC, and Parquet file formats, support for the following compression formats is implicit, even though these compression formats do not appear in the Compression Format option under the advanced source properties:
Compression format | Avro File | ORC File | Parquet File
Deflate            | Yes       | No       | No
Snappy             | Yes       | Yes      | Yes
Zlib               | No        | Yes      | No
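The support can be implicit because these file formats record the compression codec inside the file itself rather than in the file name or in a connector setting. The following sketch is for illustration only and is not part of the connector; it assumes the pyarrow library is available and uses hypothetical file and column names to show a Snappy-compressed Parquet file being written and read back without any separate compression option:

    # Illustration only: the Snappy codec is stored in the Parquet file metadata,
    # so no explicit Compression Format setting is needed to read the file.
    # The pyarrow library, file name, and column names are assumptions.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table({"f_integer": [1, 2, 3], "f_varchar": ["a", "b", "c"]})

    # Write a Parquet file whose data pages are Snappy-compressed.
    pq.write_table(table, "sample_snappy.parquet", compression="snappy")

    # Reading it back needs no compression hint; the codec comes from the file metadata.
    restored = pq.read_table("sample_snappy.parquet")
    print(restored.schema)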

Reading a compressed flat file

When you run a mapping to read a compressed flat file, you must upload a schema file and select Gzip as the compression format. When you use the Gzip compression format, the flat file must use the .GZ file name extension.
    1. Select the required compressed flat file.
    2. Navigate to the Formatting Options property field.
    3. Select the Import from schema file option and upload the schema file.
    The following example shows a sample schema file for a flat file:
    {"Columns":[{"Name":"f_varchar","Type":"string","Precision":"256","Scale":"0"},{"Name":"f_char","Type":"string","Precision":"256","Scale":"0"},{"Name":"f_smallint","Type":"string","Precision":"256","Scale":"0"},{"Name":"f_integer","Type":"string","Precision":"256","Scale":"0"},{"Name":"f_bigint","Type":"string","Precision":"256","Scale":"0"},{"Name":"f_decimal_default","Type":"string","Precision":"256","Scale":"0"},{"Name":"f_real","Type":"string","Precision":"256","Scale":"0"},{"Name":"f_double_precision","Type":"string","Precision":"256","Scale":"0"},{"Name":"f_boolean","Type":"string","Precision":"256","Scale":"0"},{"Name":"f_date","Type":"string","Precision":"256","Scale":"0"},{"Name":"f_timestamp","Type":"string","Precision":"256","Scale":"0"}]}
    4. In the advanced source properties, set Compression Format to GZIP.
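As a companion to this procedure, the following sketch shows one possible way to stage such a file outside the connector. It assumes the Python gzip, shutil, and boto3 libraries, and the bucket name, object key, and local file names are hypothetical:

    # Illustration only: prepare a Gzip-compressed flat file with the .GZ extension
    # so that a mapping with Compression Format set to GZIP can read it.
    # The bucket, key, and file names below are assumptions.
    import gzip
    import shutil

    import boto3

    SOURCE_FILE = "customers.csv"          # uncompressed delimited file
    COMPRESSED_FILE = "customers.csv.GZ"   # .GZ extension expected for the Gzip format

    # Compress the flat file with Gzip.
    with open(SOURCE_FILE, "rb") as src, gzip.open(COMPRESSED_FILE, "wb") as dst:
        shutil.copyfileobj(src, dst)

    # Upload the compressed file to the S3 bucket that the connection points to.
    s3 = boto3.client("s3")
    s3.upload_file(COMPRESSED_FILE, "my-example-bucket", "input/customers.csv.GZ")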