3 Replies Latest reply on Mar 31, 2021 4:20 PM by Vasudevan TS

    How to handle comma and double quotes in data for a .csv output file

    Saurabh Shrivastava Guru

      Hello All,

       

      For my requirement, the source is a Hive database and the target is a flat file (in .csv format). I want to handle commas, double quotes, and newlines in the data.

       

      Below is the sample data that will be there in source Hive table

       

      and I want the output .csv file to look like this

       

      Can somebody please help me achieve this?

        • 1. Re: How to handle comma and double quotes in data for a .csv output file
          Syed Aziz Guru

          Hello Saurabh,

           

          When a flat file target (comma separated value - CSV) is opened in Excel, fields in the target file are not enclosed with double quotes.

          Example

          The source string is:

          "example"

          For this to be "example" when viewed in Excel it should be inserted into a CSV file with the following format:

          """example"""

          By default, the PowerCenter session writes this as:

          ""example""

          In Excel, which does not use double quotes to enclose contents, it is interpreted as:

          example""

          Is there a way to do this in the mapping?


           

          Solution

           

           

          To do this, use an expression to modify the value:

          1. Add an Expression transformation before the target and create an output port for each string input port.
          2. The expression for each string output port should be of the format:
            CONCAT(CONCAT('""',input_port_name),'""')

            Example

          3. In the Workflow Manager, click Edit on the corresponding session task.
          4. Click Mapping Tab and select the target file.
          5. Under Properties menu, click Set File Properties option.
          6. Click Advanced tab.
          7. Select Double under Optional Quotes.
          8. Click OK three times and save the workflow.

          After these steps, double quotes are visible around the string fields in the target CSV file.


           

          Additional Information

           

           

          According to the CSV specification, "If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote".
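To make that escaping rule concrete, here is a minimal sketch outside PowerCenter, using Python's standard `csv` module. The record and the `to_csv_line` helper are made up for illustration; they are not the sample data from the original question.

```python
import csv
import io


def to_csv_line(fields):
    """Serialize one record, enclosing only the fields that need quotes."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL encloses a field only if it contains a delimiter,
    # a quote character, or a line break; inner quotes are doubled.
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(fields)
    return buffer.getvalue()


# Hypothetical record: comma and quotes in the third field, newline in the fourth.
record = [
    "1999",
    "Chevy",
    'Venture "Extended Edition, Very Large"',
    "line one\nline two",
    "5000.0",
]
line = to_csv_line(record)
print(line)

# Reading the text back recovers the original values unchanged.
assert next(csv.reader(io.StringIO(line))) == record
```

The third field is written as `"Venture ""Extended Edition, Very Large"""`, which is the form Excel and most CSV readers expect.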


           

          Best regards,

          Syed

          • 2. Re: How to handle comma and double quotes in data for a .csv output file
            Nico Heinze Guru

            Double quotes can be used within text fields, but they have to be inserted differently than in your example (which cannot be processed by PowerCenter, it's that simple).

            In the flat file source definition, you can define a "quoting character" (often a backslash is used in similar cases); then each double quote within a text field (which must be quoted by single or double quotes, otherwise newline characters cannot be embedded in such fields) must be prepended by this "quoting character".

            In your example, assuming the quoting character is a backslash, the input file needs to look like this: (third line only)

             

            1999,Chevy,"Venture \"Extended Edition, Very Large\"",,5000.0

             

            If the creator of the file cannot change the file format, you will need some other method to read and parse the input file: for example, a shell script that performs the change described above, or a Java transformation that parses the input file on its own without using features of the PowerCenter file reader.
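As a rough illustration of such a pre-processing step, the sketch below uses Python rather than a shell script. It assumes the incoming file escapes quotes by doubling them (the usual CSV convention) and rewrites it with a backslash as the quoting character. The function name `doubled_to_backslash` and the sample line are assumptions for illustration only.

```python
import csv
import io


def doubled_to_backslash(text):
    """Re-encode CSV text from doubled-quote escaping to backslash escaping."""
    # The default reader dialect understands "" inside an enclosed field.
    reader = csv.reader(io.StringIO(text, newline=""))
    out = io.StringIO()
    # doublequote=False plus escapechar makes the writer emit \" instead of "".
    writer = csv.writer(
        out,
        quoting=csv.QUOTE_MINIMAL,
        doublequote=False,
        escapechar="\\",
        lineterminator="\n",
    )
    writer.writerows(reader)
    return out.getvalue()


# Assumed input line, with the inner quotes doubled.
source = '1999,Chevy,"Venture ""Extended Edition, Very Large""",,5000.0\n'
print(doubled_to_backslash(source))
```

Pass `quoting=csv.QUOTE_ALL` instead if every field should be enclosed in quotes. Note that a literal backslash in the data is itself escaped by the writer.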

             

            Regards,

            Nico

            • 3. Re: How to handle comma and double quotes in data for a .csv output file
              Vasudevan TS Guru

              Hi Saurabh,

               

              You can achieve this requirement using built-in functions in the mapping.

               

              For the 3rd field, check for "" in the field string using INSTR. When it returns 0, there are no quotes; otherwise, concatenate "" to this field.

               

              For the 4th field, concatenate the double quotes around the string.

               

              Then write directly to the target flat file as a comma-delimited file.
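One possible reading of this check-then-concatenate logic, sketched in Python rather than in PowerCenter expression language: the membership test plays the role of INSTR returning 0, and the string concatenation plays the role of CONCAT. The helper names `quote_if_needed` and `join_record` and the sample values are hypothetical, and this version also doubles any inner quotes, which the steps above do not spell out.

```python
def quote_if_needed(value):
    """Enclose a value in double quotes only when it contains a special character."""
    specials = (",", '"', "\n", "\r")
    # Comparable to INSTR returning 0: nothing special found, leave the value alone.
    if not any(ch in value for ch in specials):
        return value
    # Comparable to nested CONCAT calls: double inner quotes, then wrap in quotes.
    return '"' + value.replace('"', '""') + '"'


def join_record(fields):
    """Join already-prepared fields with commas, as a comma-delimited writer would."""
    return ",".join(quote_if_needed(field) for field in fields)


print(join_record(["1999", "Chevy", 'Venture "Extended Edition"', "a,b", "5000.0"]))
```

With this approach the target file should be written without an additional optional-quote setting, since the quotes are already part of the field values.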

               

               

              Thanks

              Vasu
