4 Replies Latest reply on Jan 29, 2021 5:45 AM by Akilan Chandrasekaran

    Extract the date from Input file and generate output target file as per the extracted date

    Saurabh Shrivastava Guru

      Hello All,

       

      Please help.

      I am copying files from 3 different AWS S3 folders/sources inside the same bucket to the local infa_shared unix directory, using the shell script below:

      #####################################

      ### Parameters to Pass 

      # $1 - AWS3 Bucket Path

      # $2 - INFA Directory Path

      #######################################################################################

       

      export AWS_CONFIG_FILE=/data/Scripts/MDM_TO_DP/Scripts/s3_config.ini

       

      aws s3 cp "s3://BUCKET/$(aws s3 ls "$1" --recursive | sort | tail -n 1 | awk '{print $4}')" "$2"

       

      #####################################

      The script above fetches the latest file, as per the timestamp, into the INFA directory path.
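To make the selection step concrete, here is a minimal sketch of the `sort | tail -n 1 | awk '{print $4}'` part of the pipeline, run on a made-up listing instead of a real bucket. `aws s3 ls --recursive` prints one line per object as date, time, size and key, so a plain sort orders the lines by last-modified time. The function name `pick_latest_key` and the sample listing are illustrative assumptions, not part of the original script.

```shell
#!/bin/sh
# Pick the key of the most recently modified object from
# `aws s3 ls --recursive` style output read on stdin.
# Each line has the form: DATE TIME SIZE KEY
pick_latest_key() {
    sort | tail -n 1 | awk '{print $4}'
}

# Made-up sample listing, for illustration only.
sample_listing='2021-01-06 12:02:50       1024 SRC1/2001DBBYLAST-20210106120250759.csv
2020-12-30 09:15:00       2048 SRC1/2001DBBYLAST-20201230091500123.csv'

printf '%s\n' "$sample_listing" | pick_latest_key
```

Note that `awk '{print $4}'` keeps only the text up to the first space in the key, so this approach assumes object keys contain no spaces.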

      I call this script from a Command task in the workflow, as below:

      Command 1: sh /data/Scripts/MDM_TO_DP/Scripts/Get_CarSales_AWS3_File_Test.sh $$AWS3_BUCKET_CarSales_SRC1 $$CarSales_SRC_DIR_SRC1

       

      Command 2: ls -t1 $$CarSales_SRC_DIR_SRC1/ | head -n 1 >> $$CarSales_TGT_DIR/Advent/SRC1_SourceFile_List.txt

       

      Command 2 appends the name of the file fetched from the respective bucket to the SRC1 file list. I have done the same for SRC2 and SRC3.

       

      After processing the file, I generate the target file, which should have a date appended to its name. The requirement is that this date must be taken from the date that is part of the source file name.

      SRC1 filename is in format: 2001DBBYLAST-20210106120250759.csv

      SRC2 filename is in format: ABC_US_SALES-20210120-080548.csv

      SRC3 filename is in format: xyzpq_ims012021071423.csv

       

      The target file name should have the date part in the form fileName_YYYYMMDD.csv
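One possible way to pull the date out of the three file name formats above is sketched here. The function name `extract_date` is hypothetical. For SRC1 and SRC2 the sketch takes the first 8 digits after the first hyphen. For SRC3 it ASSUMES the trailing 12 digits are MMDDYY followed by HHMMSS (so `012021071423` would be Jan 20, 2021, 07:14:23) and that the century is 20; that layout is a guess from the single sample name and should be confirmed against the real feed.

```shell
#!/bin/sh
# Print the YYYYMMDD date embedded in a source file name.
# $1 - file name (with or without a directory part)
extract_date() {
    name=$(basename "$1" .csv)
    case "$name" in
        *-*)
            # SRC1 / SRC2 style: NAME-YYYYMMDD... -> first 8 digits after the first hyphen
            printf '%s\n' "$name" | sed -n 's/^[^-]*-\([0-9]\{8\}\).*$/\1/p'
            ;;
        *)
            # SRC3 style (ASSUMPTION): name ends in MMDDYYHHMMSS, century is 20
            printf '%s\n' "$name" |
                sed -n 's/^.*\([0-9]\{2\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)[0-9]\{6\}$/20\3\1\2/p'
            ;;
    esac
}

extract_date "2001DBBYLAST-20210106120250759.csv"   # prints 20210106
extract_date "ABC_US_SALES-20210120-080548.csv"     # prints 20210120
extract_date "xyzpq_ims012021071423.csv"            # prints 20210120 under the assumption above
```

If a name matches neither pattern, the function prints nothing, which the caller can treat as an error.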

       

      The approach that was suggested to me is as follows:

      Get the date part from the source file name (for example, 01202021) and save it in the dynamic parameter file.

      While writing to the target, the target file name should be derived from the dynamic parameter file.
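A rough sketch of the parameter file step follows, assuming the date has already been extracted. It writes a parameter file with one section heading and one mapping parameter. The function name `write_param_file`, the folder, workflow and session names in the heading, and the parameter name `$$FILE_DATE` are all hypothetical placeholders; the real heading must match the actual folder, workflow and session.

```shell
#!/bin/sh
# Write a one-parameter Informatica parameter file.
# $1 - date string to store (for example 20210120)
# $2 - path of the parameter file to create (overwritten on each run)
# $3 - optional section heading; the default names are placeholders
write_param_file() {
    file_date=$1
    param_file=$2
    section=${3:-MyFolder.WF:wf_CarSales.ST:s_CarSales}   # hypothetical names
    {
        printf '[%s]\n' "$section"
        # $$FILE_DATE is a hypothetical mapping parameter name
        printf '$$FILE_DATE=%s\n' "$file_date"
    } > "$param_file"
}
```

Called as `write_param_file 20210120 /path/to/CarSales.par`, this would produce a two-line file that the session could read, with the target file name then built from `$$FILE_DATE` in the mapping.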

       

      One more requirement is that once a file has been processed, it should not be processed again. For SRC1, the last file came on Jan 06, and no file has come since. So every time my script fetches the Jan 06 file and processes it again. I am archiving the processed files in the Archived folder.

      Is there a way to check whether the file the script is about to fetch already exists in the archive folder? If it does, the script should not fetch that file into the infa_shared directory.
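The check described above could be sketched as follows: find the latest key first, then compare its base name against the archive folder before copying. The function names `is_archived` and `fetch_if_new` and the third argument (archive directory) are assumptions added for illustration, and the sketch assumes archived files keep their original names. `BUCKET` is the same placeholder used in the original script.

```shell
#!/bin/sh
# Return 0 if a file with the same base name already exists in the archive.
# $1 - S3 key or file name, $2 - archive directory
is_archived() {
    [ -e "$2/$(basename "$1")" ]
}

# Fetch the latest object under an S3 path unless it is already archived.
# $1 - AWS S3 bucket path, $2 - INFA directory path, $3 - archive directory
fetch_if_new() {
    latest_key=$(aws s3 ls "$1" --recursive | sort | tail -n 1 | awk '{print $4}')
    # Nothing listed: nothing to do.
    [ -n "$latest_key" ] || return 0
    if is_archived "$latest_key" "$3"; then
        echo "Skipping $latest_key: already present in $3"
        return 0
    fi
    aws s3 cp "s3://BUCKET/$latest_key" "$2"
}
```

The Command task would then call `fetch_if_new` with the archive path as an extra argument. When the latest file is skipped, the source directory stays empty, so the downstream file list and session would also need to handle the case where there is no new file.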