2 Replies Latest reply on Dec 16, 2020 10:54 PM by Saurabh Shrivastava

    Read the latest .csv file from AWS S3 bucket

    Saurabh Shrivastava Guru

      Hello All,

       

      There is a bucket/folder in AWS S3 into which .csv files arrive on a daily basis. The naming convention of the .csv files is 2001DBBYLAST-YYYYMMDDHHMISSSSS.csv.

       

      This .csv file contains a dot (e.g. Serial No.) in a few column names, and a few other column names start with a number (e.g. 2106 Ttl GP). While importing the .csv file into Informatica, I found that column names containing a dot, or beginning with a numeric digit, are skipped from the source definition. So I asked the business to update the column names accordingly so that I could import the source into Informatica PowerCenter.
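
      As a side note, if renaming on the business side had not been possible, the header row could also have been cleaned up before import with a small shell step. A rough sketch only, assuming the header is the first line of the file and using illustrative replacement names:

      # Rewrite only line 1 (the header): drop the dot and prefix names that start with a digit
      # "Serial No." -> "Serial_No" and "2106 Ttl GP" -> "Col_2106_Ttl_GP" are illustrative choices
      sed -i '1s/Serial No\./Serial_No/; 1s/2106 Ttl GP/Col_2106_Ttl_GP/' 2001DBBYLAST-20201211120422257.csv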

       

      The business person has now updated the file (named it advent_test.csv, highlighted in the screenshot below) and I imported this file as a source. But when I execute the workflow, only the content of this advent_test.csv file is read every time. My requirement is that the latest file (in this case, 2001DBBYLAST-20201211120422257.csv) should be read when we execute the session.

       

      While importing, I specified the delimiter as a comma and the text qualifier as a double quote.
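
      For illustration only (these values are made up, not from the actual file), a record would look something like the below, where the double-quote qualifier keeps an embedded comma from being read as a field separator:

      "Serial_No","Ttl_GP","Customer"
      "1001","2534.75","Acme, Inc."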

      Can someone please help me with the below?

      1) How should I read the latest .csv file from the folder path?

       

      In this case, since S3 is being used as the source, we do not get many properties at the session level; refer to the screenshot below.

        • 1. Re: Read the latest .csv file from AWS S3 bucket
          Saurabh Shrivastava Guru

          Can someone please help me with my first question:

          1) How should I read the latest .csv file from the folder path?

          • 2. Re: Read the latest .csv file from AWS S3 bucket
            Saurabh Shrivastava Guru

            I have figured this out and am posting my approach, which worked absolutely fine.

             

            Steps:

            1. Create a script Get_AWSS3_Files.sh under your local Infa_shared directory on the Unix server.

            2. This script takes 2 parameters, as described in its header below.

            3. Create a config file, CONFIG_FILE_NAME.ini (referenced in the script below). This config file should contain only the access key and secret key details, as shown below.

            4. Create a command task in your workflow and specify the command as below (the two $$ parameters can be supplied via the workflow parameter file; see the sketch after these steps):

            sh <SCRIPTLOCATION>/Get_AWSS3_Files.sh $$AWS3_BUCKET_FOLDER_PATH $$INFA_DIR_PATH

            5. Kindly note that BUCKETNAME in the script should contain only the bucket name, without any folder name.
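
            By the way, the two $$ parameters used in the command task can be assigned through the workflow parameter file. A rough sketch only, where the folder name, workflow name, bucket and directory paths are all illustrative placeholders rather than the real ones:

            [MyFolder.WF:wf_Load_S3_CSV]
            $$AWS3_BUCKET_FOLDER_PATH=s3://BUCKETNAME/incoming/
            $$INFA_DIR_PATH=/infa_shared/SrcFiles/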

             

            --------------------------------Script starts here----------------------------------------

            #!/bin/bash

            #######################################################################################

            ### Parameters to Pass

            # $1 - AWS S3 bucket path, including folders if any

            # $2 - Informatica directory path (where the latest file is copied to)

            #######################################################################################

             

            # Point the AWS CLI at the config file holding the access key / secret key
            export AWS_CONFIG_FILE=CONFIG_FILE_DIRECTORY/CONFIG_FILE_NAME.ini

            # List every object under the given path, sort (each line begins with the
            # last-modified timestamp, so the newest object ends up last), keep the last
            # line, pull out the object key (4th column) and copy that file to the
            # Informatica directory.
            aws s3 cp "s3://BUCKETNAME/$(aws s3 ls "$1" --recursive | sort | tail -n 1 | awk '{print $4}')" "$2"

             

            --------------------------------Script ends here----------------------------------------
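
            To see why the one-liner picks the newest file: aws s3 ls <path> --recursive prints one line per object starting with its last-modified timestamp, so a plain sort puts the most recent object last. A made-up listing, for illustration only:

            2020-12-09 11:58:31    1048576 incoming/2001DBBYLAST-20201209115212118.csv
            2020-12-10 12:01:07    1050201 incoming/2001DBBYLAST-20201210120005342.csv
            2020-12-11 12:04:27    1051730 incoming/2001DBBYLAST-20201211120422257.csv

            tail -n 1 then keeps the last line, and awk '{print $4}' extracts the object key (incoming/2001DBBYLAST-20201211120422257.csv in this example), which is what gets copied to the Informatica directory.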

             

             

             

            --------------------------------Config file starts here----------------------------------------

            #######################################################################################

            #   Name: s3_config.ini

            #   Date:

            #   Author:

            #   Comments: AWS S3 Config File

            #   Last Updated:

            #######################################################################################

             

            [default]

            aws_access_key_id=********************************

            aws_secret_access_key=************************************

            # Optional, to define default region for this profile.

            region=us-east-1

            --------------------------------Config file ends here----------------------------------------
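
            One way to sanity-check the whole setup before wiring it into the command task is to run the script by hand on the Unix server. Everything below (script location, bucket/folder, target directory) is illustrative:

            # Example manual run
            sh /infa_shared/Scripts/Get_AWSS3_Files.sh s3://BUCKETNAME/incoming/ /infa_shared/SrcFiles/
            # The newest 2001DBBYLAST-*.csv from the folder should then appear under /infa_shared/SrcFiles/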