2 Replies Latest reply on Feb 2, 2021 6:48 AM by hayes5736

    Parse fixed width file containing multiple record layouts

    hayes5736 Active Member

      I am new to DEI and not sure if what I want to do is possible.


      I have a scenario where I am getting a fixed width text file. Inside the file I have a header record, trailer record and 12 different fixed width formats.


      All 12 formats have a record type value in the same positions, so I can consistently identify each of the 12 types. The number of fields and the field names differ from one record type to the next.


      I am wondering if there is a way to parse this file with DEI as opposed to creating a pre-processing step to split the file out into the 12 various record types.
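
      For reference, a minimal sketch of the kind of pre-processing split mentioned above, written in plain Python outside DEI. The position of the record type (first two characters) and the sample values are assumptions made up for illustration, not the real layout of the file.

```python
from collections import defaultdict

# Assumption for illustration: the record type code sits in the first
# two characters of every line (the real positions are not given).
RECORD_TYPE_SLICE = slice(0, 2)


def split_by_record_type(lines, type_slice=RECORD_TYPE_SLICE):
    """Group raw fixed-width lines by the record type code at type_slice."""
    groups = defaultdict(list)
    for line in lines:
        record = line.rstrip("\r\n")
        if not record:
            continue  # ignore blank lines
        groups[record[type_slice]].append(record)
    return dict(groups)


# Hypothetical sample: a header (HD), two detail types (01, 02), a trailer (TR).
sample = [
    "HD20210202",
    "01" + "ACME".ljust(10) + "00042",
    "02" + "WIDGET" + "0007",
    "01" + "GLOBEX".ljust(10) + "00017",
    "TR000003",
]

groups = split_by_record_type(sample)
for rec_type, records in groups.items():
    print(rec_type, len(records))
```

      Each group could then be written to its own file and read with a single fixed-width layout.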


      I'm coming from a PowerCenter environment and what I've learned so far in DEI is incredible. The functionality in DEI is amazing compared to PowerCenter Designer, so I am really hoping this new tool can solve my problem.


      Thank you.

        • 1. Re: Parse fixed width file containing multiple record layouts
          Krishnan Sreekandath Seasoned Veteran

          Hello Hayes,


          Can you please tell me whether you are getting the multiple-record file from a mainframe? If so, I think it would be easy to use the PWX NRDB reader with a datamap that contains all 12 record type layouts plus the header and trailer, and write the output to HDFS/Hive.


          A subsequent job in Hadoop pushdown mode (or Native) can then process that data in HDFS/Hive, if needed. Please note that we cannot use NRDB sources in Hadoop pushdown mode.


          If not, I am afraid we might not have an easy way to parse the different record types natively or using just the PDO, and the logic to separate/identify the 12 record types will have to be built into the mapping itself.


          • 2. Re: Parse fixed width file containing multiple record layouts
            hayes5736 Active Member

            I am getting the file from an external source, not an internal mainframe source.


            I can consistently identify the record type in the same positions for every non-header/footer row.


            Is there a way to dynamically route each record type through a different pipeline and somehow apply a control file to each pipeline?


            I know I can manually code a router transformation with 12 groups. Then from here, can I somehow apply the record format using some sort of control file? I would prefer not having to substring all of the fields for the 12 different formats.
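
            To make the "control file" idea concrete, here is a rough sketch in plain Python rather than DEI. It is not a DEI feature or API. The control file format (`record_type,field_name,start,length`), the field names, and the two-character record type position are all hypothetical. The point is only that the per-field offsets live in data, so no substring logic is hand-coded per format.

```python
import csv
import io

# Hypothetical control file: one row per field, with zero-based start offsets.
CONTROL_FILE = """\
record_type,field_name,start,length
01,customer,2,10
01,amount,12,5
02,product,2,6
02,quantity,8,4
"""


def load_layouts(control_text):
    """Build {record_type: [(field_name, start, length), ...]} from CSV text."""
    layouts = {}
    for row in csv.DictReader(io.StringIO(control_text)):
        layouts.setdefault(row["record_type"], []).append(
            (row["field_name"], int(row["start"]), int(row["length"]))
        )
    return layouts


def parse_record(line, layouts, type_slice=slice(0, 2)):
    """Slice one fixed-width line using the layout chosen by its record type."""
    rec_type = line[type_slice]
    layout = layouts.get(rec_type)
    if layout is None:
        return None  # header, trailer, or a type with no layout defined
    record = {"record_type": rec_type}
    for name, start, length in layout:
        record[name] = line[start:start + length].strip()
    return record


LAYOUTS = load_layouts(CONTROL_FILE)

# Hypothetical sample lines built to match the layouts above.
line_01 = "01" + "ACME".ljust(10) + "00042"
line_02 = "02" + "WIDGET" + "0007"

print(parse_record(line_01, LAYOUTS))
print(parse_record(line_02, LAYOUTS))
```

            Adding a 13th format in this sketch would mean adding rows to the control file, not writing new parsing code.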


            I'm also wondering if I could use a parser or data parser transformation? These transformations are new to me and I still don't understand them so I'm not sure if they could work for my use case.