12 Replies Latest reply on Jun 16, 2021 10:20 AM by Nico Heinze

    How to handle double quotes (") and comma (,) in a field of CSV

    MOHD ADNANUDDIN New Member

      Hi All,

       

      My source is csv and it is comma (,) delimited with optional double quotes ("). Some times in one field data might have double quotes (") or comma(,). By checking optional double quoutes I could handle comma in a string but it field has double quotes its creating an issue. Can any one help in handling this situation.

        • 1. Re: How to handle double quotes (") and comma (,) in a field of CSV
          EC140656 Seasoned Veteran

          Hi,

          Based on my understanding of your issue, I would suggest the following options :

           

          1) You could specify both comma(') & double quotes(") as delimiters when you are creating the source in the Source Analyzer in the wizard that Informatica will throw up. This way Informatica will parse the .CSV file with both comma & double quotes as delimiters.

          2) If you are having double quotes(") as part of the data, then it has to be handled differently. Is this your exact issue?

           

          Please provide more clarity so that we can discuss further for possible solutions.

           

          Regards

          Krishna

          • 2. Re: How to handle double quotes (") and comma (,) in a field of CSV
            Nico Heinze Guru

            An example might help.

            Be aware that "help" can mean two things here:

            1. we can tell you a workaround.

            2. it can't work at all because input data are too "dirty".

             

            Quite often during the past 17 years I have experienced similar cases where business users created CSV files which simply could not be imported cleanly. If text fields are enclosed by single or double quotes, then it is almost completely impossible to have these quoting characters within the text. No way. In such a way the only remedy is to make sure that the source of the CSV file changes their processes such that they deliver "clean" CSV files (which can be imported). Usually that's not a technical issue but an issue of business people not understanding technical necessities.

            At least that's my experience.

             

            Regards,

            Nico

            • 3. Re: How to handle double quotes (") and comma (,) in a field of CSV
              user140848 Guru

              I don't have much experience dealing with this on the source side, but I know for CSV targets there's a flaw in Informatica's handling of double-quoted fields which include double quotes themselves. According to the CSV spec, in such cases the internal double-quotes should themselves be doubled: thus the field text ~hello"~ should be written as "~hello""~" to the file. But Informatica doesn't do this, instead writing it as "~hello"~" which causes other tools to read the output incorrectly.

               

              If Informatica has a similar problem on the input side, the difficulty depends on what exactly it does when it encounters a double-quoted string with internal double-quotes. It might only have "" in the string when it should be ", which is not too hard to correct (though a string with the sequence ", will give you problems). But it might also treat the first internal double-quote it sees as the end of the string and then corrupt the rest of the line, or possibly several lines if it winds up thinking the end of line is inside a field.

               

              In this case you are not in a good situation, and your only reasonable recourse is to request for the data to be supplied with a different delimiter -- something outré that will never appear in your data; nonprintable control characters are good for this, but the pipe character | is also a popular choice. The tab character can work well too depending on your source data, but I have seen it crop up unexpectedly and cause hassles in my own projects.

               

              We could probably advise you in more detail if you could explain exactly what symptoms you are seeing, for a particular piece of source data that is causing trouble.

               

              Regards,

              Sacha Roscoe

              • 4. Re: How to handle double quotes (") and comma (,) in a field of CSV
                Nico Heinze Guru

                Sacha's response made me recall an addition: maybe the people producing the CSV file can create it with a "complex delimiter" (such as the character sequence ~|& ) instead of a comma as the delimiter character. If they can do so, then you can set up the file properties in the PowerCenter session such that you use this "complex delimiter" instead of a single delimiter character; this way they can make sure that their data don't get corrupted (they just have to choose a character string which will never occur in the data to be transported).

                This way the problem with the double quotes within text fields will be remedied immediately as well; they should no longer use any delimiter characters for text fields, the "complex delimiter" is good enough for that.

                 

                Regards,

                Nico

                • 5. Re: How to handle double quotes (") and comma (,) in a field of CSV
                  MOHD ADNANUDDIN New Member

                  Hi Krishna,

                   

                  Thanks for Replying.

                   

                  Yes source may have double quotes as part of data.

                   

                  Example:

                     

                  Column_1Column_2Column_3
                  A020USD_Payment3338.2
                  A021USD_Payment2035
                  A022INR_Payme"nt6675
                  A023KRW , JPY887643

                   

                  Data in Column 2 is eneterd by the user. SO sometime he may eneter double quotes(") or comma(,) as part of the data. But this double quotes should be handled(removed) by ETL in Informatica before passing data to target file.  I set session setting  as comma delimiter with optional double quotes and I don't knwo how to handle this situation.

                   

                  Reagrds,

                  Adnan

                  • 6. Re: How to handle double quotes (") and comma (,) in a field of CSV
                    MOHD ADNANUDDIN New Member

                    Hi Nico,

                     

                    Thanks for your response.

                     

                    I checked on regards of changing delimiter and its not easy as this is one of the many files being downloaded from cloud by one script. they dont want to change the format of files.

                     

                    Regards,

                    Adnan

                    • 7. Re: How to handle double quotes (") and comma (,) in a field of CSV
                      Nico Heinze Guru

                      Frankly spoken, I don't care whether they are willing to change the file extract script or not. They have to deliver files which can be read by a computer, not by a human being. And if they are not able to do so, you cannot read them. It's that simple: garbage in, garbage out. They deliver garbage, they get garbage.

                       

                      I know, this sounds pretty harsh and rude. But it's the naked truth.

                       

                      Regards,

                      Nico

                      • 8. Re: How to handle double quotes (") and comma (,) in a field of CSV
                        user140848 Guru

                        Hi Adnan,

                         

                        When the data has these extra characters, how exactly is it coming in the source file? Can you open the source file with a text editor (NOT Excel or any other spreadsheet type program) and show us some samples of what is produced when there are commas or double-quotes in the data, as well as a normal row for comparison?

                         

                        It might seem fussy but this in fact makes a big difference to whether Informatica can (with a bit of help) interpret the file properly, or at least well enough.

                         

                        Regards, Sacha

                        • 9. Re: How to handle double quotes (") and comma (,) in a field of CSV
                          Thomas Kennedy Seasoned Veteran

                          Mohd,

                           

                          I have run into a very similar situation and wonder how did you finally resolve your issue of double quotes and commas?

                           

                          Thank you,

                          Tom

                          • 10. Re: How to handle double quotes (") and comma (,) in a field of CSV
                            Nico Heinze Guru

                            Hi Thomas,

                             

                            could you please post a short example? That makes more sense than to just guess how things may look like for you.

                             

                            Regards,

                            Nico

                            • 11. Re: How to handle double quotes (") and comma (,) in a field of CSV
                              Thomas Kennedy Seasoned Veteran

                              Hello Nico,

                               

                              Thank you very much for your response. Below is and example of how I receive the source file:

                              The header is row 1 column A and the data starts in row 2 column A.

                               

                               

                              Best,

                              Tom

                              • 12. Re: How to handle double quotes (") and comma (,) in a field of CSV
                                Nico Heinze Guru

                                Hi Thomas,

                                 

                                are we talking about PowerCenter or IICS (this might make a difference)?

                                In PowerCenter you have three radio buttons under the header "Optional Quotes"; here you can choose None, Single, or Double.

                                In your sample it seems that the text fields can be enclosed in double quotes, so you simply have to choose Double for this source definition. That does not mean that each field must be enclosed in double quotes; it means that each field MAY start with a double quote, and if so, PowerCenter will continue to read up to the next double quote character.

                                And the field delimiter character is the pipe character.

                                That should do fine.

                                 

                                What did I misunderstand?

                                 

                                Regards,

                                Nico