13 Replies Latest reply on Nov 6, 2019 3:30 AM by Anand Rajakrishnan

    Parsing JSON file in Power Center

    Mani Ar Seasoned Veteran

      Hi,

       

      I have a requirement to pull data from client's API which spits the data in JSON format and we have to parse this data to convert it into RDBMS format to load into tables for further processing.

      I don't see informatica has a built in source qualifier for JSON file and not sure how to read JSON and process the file here.

      We use 10.2.6v

       

      Any help on this would really be appreciated.

       

      Thank you,

      Mani A

        • 1. Re: Parsing JSON file in Power Center
          Nico Heinze Guru

          I've never worked with JSON myself yet, but I do recall that most people who have written about such work here on the forum used Expression transformations to split up the JSON text into individual fields.

           

          Regards,

          Nico

          • 2. Re: Parsing JSON file in Power Center
            Veeru B Guru

            Hi Mani,

             

            You can try with java transformation to convert json to individual fields.

             

            Regards,

            Veeru

            • 3. Re: Parsing JSON file in Power Center
              Gokul Kumar New Member

              Hi Mani Kindly refer the below example, think would be helpful to you to parse json file using java transformation

              Sample JSON data set :

              [{"Name":"abc",

              "Age":"25",

              "Address":{"Address1":"abc","Address2":"xyz"}}

              ]

               

              step 1 : Put the file in a path and pass the path name with file name as a value in a port and provide it as input port to java transformation. (expression value : C:\\<filename>.json)

              Step 2: Java code for the above data set

              try {

              Object obj = new JSONParser().parse(new FileReader(File_name));

              JSONArray jo = (JSONArray) obj;

              for(int i = 0; i < jo.size(); i++)

              {

              JSONObject objects = (JSONObject)jo.get(i);

              String name = (String) objects.get("Name");

              String age = (String) objects.get("Age");

               

               

              JSONObject address = (JSONObject)objects.get("Address");

                          String address1 = (String)address.get("Address1");

                          String address2 = (String)address.get("Address2");

              Name=name ;//java output port should have port name as "Name"

              Age=age ;

              Address1=address1 ;

              Address2=address2;

              generateRow();

              }

                  }

               

               

              catch (Exception e) {

              logError("Unable to parse the file " + File_name);

              }

               

              Note : Download the JSON parser jar from internet and place it in informatica javalib path. "json-simple-1.1.jar" and refer the jar path in java transformation(in settings option).

              • 4. Re: Parsing JSON file in Power Center
                Shivam I Vats New Member

                Hi Gokul,

                 

                Thanks for this approach I was trying to execute what you had said but I am not able to understand the step 1 above. How should I put the file in a path and pass the path name with file name as a value in a port .

                 

                 

                Could you please elaborate this ?

                 

                Would be very thankful to you.

                 

                Regards

                Shivam Vats

                • 5. Re: Parsing JSON file in Power Center
                  Anand Rajakrishnan Active Member

                  If you have Oracle, you can do something like this to pass the file path:

                   

                  In the SQ, give the override SQL as: SELECT "<file path>" as DUAL

                   

                  Then route the output to the Java transformation and follow what Gokul has suggested.

                  • 6. Re: Parsing JSON file in Power Center
                    Nico Heinze Guru

                    A more generic way to feed a file name into a mapping is this:

                    Create a flat file source definition with one long field (e.g. a String(1000) and set the field delimiter character to \037 (yes, I mean it, this is the ASCII control character "Unit Separator" which I haven't seen in use at least since the early 1980's). This is the actual file name, you can join it into other data streams e.g. via a Joiner transformation.

                    For this flat file source, set the Input File Type in the session to Command (instead of File) and feed the file name into it using an "echo" command, like this:

                        echo $PMSourceFileDir\customers.csv

                     

                    That's enough.

                     

                    Regards,

                    Nico

                    1 of 1 people found this helpful
                    • 7. Re: Parsing JSON file in Power Center
                      Shivam I Vats New Member

                      Hi Nico,

                       

                      Thanks for your reply. Unfortunately I still can't access a file using the method you told. I think I did not understand how exactly I need to pass JSON as a source. I made a flat file and applied COLUMN delimiter as \037 and passed the Input File type command as you had suggested echo $ followed by the path but I am not getting the desired results.

                       

                      I have 2 questions for you.

                      A) The field name that we make in FF in the beginning should that be empty string?

                      B)Delimiter should be Row delimiter or Column or something else ?

                       

                      Please do let me know if you any other method in PowerCenter through which we can read the JSON file as a source.

                       

                      Thanks again.

                       

                       

                      Regards,

                      Shivam Vats

                      1 of 1 people found this helpful
                      • 8. Re: Parsing JSON file in Power Center
                        Nico Heinze Guru

                        Basically you have two choices, but in the end they both perform the same action, only in different ways.

                         

                        First way is to feed the name and path of the JSON file to PowerCenter, that means you will probably build some Java code to read and interpret that file. That's what Gokul has suggested.

                         

                        Second way is to read the JSON file as a flat file source file, line by line. Then you use some Expression transformation to parse each line, you save the contents of each field in some variable port, and finally you forward all data you've collected so far as output ports.

                        After the EXP you could (for example) use an Aggregator to make sure that only the final record from the EXP is forwarded (i.e. the complete set of all data fields in the JSON file).

                         

                        There's one variant to the second way. If you want to read and process multiple JSON files in one session run, then you should change the mapping at two places and the session in one place.

                        First you need to change the flat file Source Definition such that it contains the CurrentlyProcessedFileName port; there's a check box on the Properties tab (if memory serves me right) which you have to set. This makes sure that the flat file source definition contains a port which tells the name of the JSON file from which the current contents originate. You will need this file name later on.

                        Second you have to set up a few more variable ports and one additional output port in the EXP; this output port ultimately tells the AGG whether a new JSON file is now being processed (i.e. the currently processed source file name differs from the file name of the previous source record) or whether the current output record of the EXP still belongs to the same JSON file as the previous output record of the EXP.

                        Finally you have to change the session set-up; normally you enter a source file name, and that's it. But if you have to process multiple JSON files at once, then you should create a file list before the session actually starts (for example, using a pre-session command) and change the Input File Type from Direct (telling PowerCenter that the Source File Name indicates the file you want to process) to Indirect (telling PowerCenter that the Source File Name is a list file naming all the files to be processed).

                         

                        It really depends on what exactly you need to do AND whether the JSON files contain any arrays or "only" singular data elements. As long as you're dealing with singular fields, this combination of an EXP and an AGG works fine. But as soon as you have to work with arrays in JSON files, you will have to take different approaches.

                         

                        Final note: should I have succeeded in deleting ANY clarity in your mind about this task, please accept my apologies. I know that I often tend to abbreviate things heavily, which means that less experienced people often cannot understand what I mean (and that's not their fault but mine).

                        So please be frank enough to tell me if - after this post - you just feel like "Huh? What does this guy want from me? Help!!!!"

                         

                        Regards,

                        Nico

                        1 of 1 people found this helpful
                        • 9. Re: Parsing JSON file in Power Center
                          Shivam I Vats New Member

                          lol yeah I do feel like you told.( as I am a beginner) . But you are awesome ,I mean the depth of knowledge that you have is commendable. Thanks a lot Nico, for taking out time and writing this down. It really is not easy thanks again.

                           

                          However  the approach you suggested down here I was not able to figure this out:

                          Create a flat file source definition with one long field (e.g. a String(1000) and set the field delimiter character to \037 (yes, I mean it, this is the ASCII control character "Unit Separator" which I haven't seen in use at least since the early 1980's). This is the actual file name, you can join it into other data streams e.g. via a Joiner transformation.

                          For this flat file source, set the Input File Type in the session to Command (instead of File) and feed the file name into it using an "echo" command, like this:

                              echo $PMSourceFileDir\customers.csv

                           

                          I have the JAVA Code which works independently fine and can do what is required but when you use this code in Powercenter using Gokuls approach I can't seem to pass the FILE as a source . I really did not understand how to do that. If you  have any idea how it can be done (the first approach on your latest answer) please let me know.

                           

                           

                          It was so nice reading your answers though.

                           

                          Thanks and Regards,

                          Shivam Vats

                          1 of 1 people found this helpful
                          • 10. Re: Parsing JSON file in Power Center
                            Nico Heinze Guru

                            Just to make sure we're talking about the same things here:

                            You have a file name (including a file path) which you want to forward to the Java Transformation. Right?

                             

                            If so, then you might use these details below:

                             

                            Create a flat file source definition FILE_NAMES with one single long field (String( 4000) should be more than sufficient).

                            Now go to the Source Analyzer tool (the first of the five tools in the Designer), drag the source definition FILE_NAMES into the workspace, and edit it.

                            In the lower right corner, you will find a small box entitled "Flat file information". Make sure that Delimited is chosen (and not Fixed width), then click Advanced.

                            The first entry in this screen form is named Column Delimiters. Here you can define the character(s) which separate consecutive fields in a line from one another.

                             

                            And here lies the trick: you can not only enter any ASCII character here (such as a comma or a semicolon), you can even define your own delimiter character by indicating the ASCII code of this delimiter character.

                            As we all know, file names and paths can contain many silly characters. So we have to make sure that the character used here is never used in real life.

                            For this purpose I'm working with a special control character here, namely \037 (yes, you have to enter these four characters in the Column Delimiters field as one single character string). This notation (backslash followed by zero) indicates to PowerCenter that not these four characters are to be taken as field delimiters; instead this character string \037 is treated as the octal(!) ASCII code of the character to be used.

                            Now the octal number 037 equals the decimal number 31. This represents a certain control character in the ASCII character set, namely the control character "Unit Separator". I haven't seen this in use at least since the early 1980s, so I dare to suppose that no one ever uses this character in text files.

                             

                            What does that mean?

                            If you now enter this character string in the Column Delimiters field and click OK, then this source definition will use the Unit Separator character as a field delimiter.

                            Now this character simply doesn't occur in file names, meaning that each complete line will be read by the respective source qualifier as one single field.

                             

                            Now the next problem arises: usually you indicate the source file name in the session properties. However, here you want to get the actual file name.

                            That can be done like this:

                            Edit the session, go to the Mapping tab, and select the file source in the tree to the left.

                            In the right panel, there are three different sections, namely Readers, Connections, and Properties.

                            In the Properties section, you will (probably) see that the Input Type is set to File. This means that the named source file is read and forwarded into the mapping line by line.

                            However, here we don't want the file contents, here we want only the name of the file.

                            You can achieve this by doing this:

                            - Change the Input Type to Command.

                            - Press the Tab key.

                            - Now the fourth line below the Input Type has changed to Command.

                            - Enter the correct command to get the file name and path, like in this example:

                                echo  $PMSourceFileDir\customers.csv

                            (Of course you have to enter the correct file path here)

                             

                            The "echo" command simply outputs its arguments, in this case the complete file path. And this file path is then read by the Source Qualifier and forwarded to the Java Transformation.

                             

                            Does that clarify this point?

                             

                            Thanks for your kind words and all the best,

                            Nico

                            1 of 1 people found this helpful
                            • 11. Re: Parsing JSON file in Power Center
                              Nico Heinze Guru

                              Oh, one final word: PowerCenter is in many respects a pretty good and easy-going tool. However, there are really many things to learn about it, and some things are simply difficult to build.

                              I've been working with PowerCenter for >18 years by now, and I'm still learning new things each and every day.

                               

                              So, don't be frustrated, that's quite normal. Not only with PowerCenter but in the end with each and every tool which is complex. Just take a deep breath and ask again until you get the gist of it.

                               

                              Cheers,

                              Nico

                              1 of 1 people found this helpful
                              • 12. Re: Parsing JSON file in Power Center
                                Shivam I Vats New Member

                                Thanks Nico. Really appreciate what you have been doing. Thanks for the advice and yea every tool is learning experience.

                                 

                                Thanks for everything.

                                 

                                Cheers

                                Shivam Vats

                                • 13. Re: Parsing JSON file in Power Center
                                  Anand Rajakrishnan Active Member

                                  You can use a simple python script to convert json to CSV. Then use the csv as input to powercenter.