9 Replies Latest reply on May 18, 2011 2:05 AM by EC55882

    Code Page Unicode issue

    Seasoned Veteran

      Hi ,

       

      My repository code page is "UTF-8 encoding of Unicode".

      My source is a flat file, and its definition also uses "UTF-8 encoding of Unicode".

      The database is Oracle, and its character set is AL32UTF8.

      We are using the Integration Service IS_UNICODE.

       

      Code page validation is disabled for PowerCenter.

       

      The problem is that my source contains some multibyte data such as â/Â. For some mappings, Informatica reads the character and converts it to a junk value (? or ¿), and sometimes it skips the character and does not read it at all. AFAIK Unicode should be able to handle multibyte data.
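To make the symptom concrete, here is a minimal Python sketch of how a single-byte "â" (as written by a Windows Latin 1 editor) behaves when a reader assumes UTF-8. It only illustrates the general decoding behaviour; it is not a description of how PowerCenter works internally, and the sample word "pâte" is just an assumed test value.

```python
# "â" is one byte in Windows Latin 1 (cp1252) but two bytes in UTF-8.
latin1_bytes = "pâte".encode("cp1252")   # b'p\xe2te'
utf8_bytes = "pâte".encode("utf-8")      # b'p\xc3\xa2te'


def read_as(data: bytes, codec: str, errors: str = "strict") -> str:
    """Decode raw file bytes using the code page the reader assumes."""
    return data.decode(codec, errors=errors)


# A reader that assumes UTF-8 but receives Latin 1 bytes cannot decode 0xE2.
# Depending on its error policy it substitutes a junk character or drops it.
replaced = read_as(latin1_bytes, "utf-8", errors="replace")  # 'p\ufffdte'
skipped = read_as(latin1_bytes, "utf-8", errors="ignore")    # 'pte'

# The same bytes read with the matching code page come through intact.
correct = read_as(latin1_bytes, "cp1252")                    # 'pâte'

print(repr(replaced), repr(skipped), repr(correct))
```

Both outcomes described in the post (a junk character, or the character silently missing) are consistent with the declared code page not matching the bytes actually in the file.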

       

      We cannot change the encoding of the DB. What are the possible ways to handle this in Informatica itself?

      I also cannot remove the special characters from the file, as we need them.

       

      TIA

       

       

      Regards ,

       

      Digvijay Singh

       

        • 1. Code Page Unicode issue
          Active Member

          Can you try loading this data into a flat file first and converting the special/junk characters into a readable format? On Unix you should be able to use AWK to do this. If you need those special characters exactly as they are in the source, then sorry, this solution might not help you.

           

          Charitha.

          • 2. Code Page Unicode issue
            Seasoned Veteran

            Hi Charitha,

             

            Thanks for the answer, but the problem is that if I manually insert those characters using an INSERT statement, they are visible in Oracle. Oracle supports those characters and displays them, so they are readable in UTF-8 encoding.

             

            Can anyone explain why INFA is not handling these characters?

            • 3. Re: Code Page Unicode issue
              Active Member

              Hi,

               

              Oracle, like other database management systems, converts between the Oracle client code page and the Oracle server code page.

               

              PowerCenter communicates with the Oracle server through the Oracle client, using the source and target definitions. In the PowerCenter source/target definition you can, and in fact must, specify the code page of the Oracle client. The conversion between UTF-8 and AL32UTF8 is done by the Oracle client/server interface. So set both the Informatica target and the Oracle client to UTF-8.

               

              As far as Oracle National Language Support is concerned, see the link below:

              http://www.oracle.com/technetwork/database/features/globalization/nls-lang-099431.html#_Toc110410543
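For reference, the Oracle client setting discussed on that page, NLS_LANG, has the form LANGUAGE_TERRITORY.CHARSET. The Python sketch below splits such a value into its parts so the client character set can be compared with the code page chosen in PowerCenter. The helper name `parse_nls_lang` and the sample value are assumptions for illustration, and the sketch only handles the full three-part form.

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split a full LANGUAGE_TERRITORY.CHARSET string into its three parts."""
    locale_part, dot, charset = value.partition(".")
    language, underscore, territory = locale_part.partition("_")
    if not (dot and underscore and language and territory and charset):
        raise ValueError(f"expected LANGUAGE_TERRITORY.CHARSET, got {value!r}")
    return NlsLang(language, territory, charset)


# Assumed sample value; in practice read it from the environment of the
# user that runs the Integration Service, e.g. os.environ.get("NLS_LANG").
settings = parse_nls_lang("AMERICAN_AMERICA.AL32UTF8")
print(settings.charset)
```

The part after the dot is the one that matters here: it tells the Oracle client which encoding the application's bytes are in.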

               

              I presume the code page for the Integration Service and the source definition in Workflow Manager is set properly.

              If you are not sure, try reading data from your flat file and writing it to another flat file, with the source code page set to UTF-8 in Workflow Manager.

              Check whether the disappearing characters are present in your target flat file. If they are not in the target, the problem occurs before Oracle.
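A quick way to check the same thing outside PowerCenter is to test whether the source file's bytes are valid UTF-8 at all. The following Python sketch is one way to do that; the function name `first_invalid_utf8_offset` is hypothetical and the sample bytes stand in for the contents of a real file.

```python
from typing import Optional


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Stand-ins for file contents; in practice use open(path, "rb").read().
utf8_sample = "pâte".encode("utf-8")     # genuinely UTF-8
latin1_sample = "pâte".encode("cp1252")  # single-byte "â" (0xE2)

print(first_invalid_utf8_offset(utf8_sample))    # None
print(first_invalid_utf8_offset(latin1_sample))  # 1
```

If this reports an offset for the real source file, the file is not UTF-8 and the source definition's code page is the first thing to revisit.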

               

              Regards

               

              Cezary

              • 4. Code Page Unicode issue
                Seasoned Veteran

                Hi Cezary ,

                 

                Thanks, I will try this out and let you know.

                 

                But I have one question. AL32UTF8 is a superset of UTF8. So must the encoding for the DB and for Informatica be the same, or can the Informatica encoding be a superset of the DB's? I do not understand why Informatica is behaving this way.

                 

                Regards ,

                 

                Digvijay Singh

                • 5. Re: Code Page Unicode issue
                  Active Member

                  Hi,

                   

                  Thanks for the questions.

                  Oracle UTF8 and UTF-8 are different code pages. Please see the following URL:

                  http://oracleappstechnology.blogspot.com/2007/10/difference-between-utf8-and-al32utf8.html

                   

                  AL32UTF8 is the equivalent of UTF-8; Oracle UTF8 is not. Oracle uses confusing names for code pages.
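As a rough illustration of the difference: to my understanding, Oracle's UTF8 character set is CESU-8 style, meaning characters above U+FFFF are stored as a surrogate pair of two 3-byte sequences, whereas AL32UTF8 (standard UTF-8) stores them in 4 bytes. The Python sketch below assumes that description; `cesu8_encode` is a hand-written helper for illustration, not an Oracle or Python library function.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text CESU-8 style: supplementary characters become a
    surrogate pair, with each surrogate encoded as 3 bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            # "surrogatepass" lets Python emit the 3-byte form of a surrogate.
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


# For characters such as "â" the two encodings are byte-for-byte identical.
print(cesu8_encode("â") == "â".encode("utf-8"))

# They differ only for supplementary characters such as U+1F600.
print(len("\U0001F600".encode("utf-8")), len(cesu8_encode("\U0001F600")))
```

Since â/Â encode identically under both schemes, the UTF8 versus AL32UTF8 distinction is unlikely to explain the junk characters in this thread by itself.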

                   

                  Regards

                   

                  Cezary Opacki

                  • 6. Code Page Unicode issue
                    New Member

                    The special characters which you have mentioned (â/Â) can be read by using the following steps.

                    1) Open the flat file source definition in Source Analyzer in Informatica.

                    2) In the 'Table' tab, click 'Advanced' (in the bottom-right corner).

                    3) Under File Format -> Code Page, select "MS Windows Latin 1 (ANSI), superset of Latin 1".
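The steps above work when the file really is in Windows Latin 1 rather than UTF-8. The same idea can be sketched in Python, assuming that PowerCenter's "MS Windows Latin 1 (ANSI)" code page corresponds to Python's `cp1252` codec; the function name is hypothetical.

```python
def transcode_cp1252_to_utf8(data: bytes) -> bytes:
    """Decode bytes as Windows Latin 1 (cp1252) and re-encode them as UTF-8."""
    return data.decode("cp1252").encode("utf-8")


# b'\xe2' is "â" in cp1252; after transcoding it becomes the 2-byte UTF-8 form.
source_bytes = b"p\xe2te"
converted = transcode_cp1252_to_utf8(source_bytes)
print(converted)  # b'p\xc3\xa2te'
```

Declaring the correct code page on the source definition lets PowerCenter perform the equivalent conversion itself, so the file does not need to be rewritten.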

                     

                    Please let me know whether this works or whether there is still a problem.

                     

                    Thanks,

                    Ranjan Mahapatra.

                    • 7. Code Page Unicode issue
                      Seasoned Veteran

                      Hi Ranjan,

                       

                      Thanks for the reply. I am now using "MS Windows Latin 1 (ANSI), superset of Latin 1" and it is working fine.

                       

                      Regards ,

                      Digvijay Singh

                      • 8. Code Page Unicode issue
                        New Member

                        I have the same issue. Please help me!!!!

                         

                        I set the input file to MS Windows Latin 1. I can see the French characters in Preview Data. I turned on verbose data tracing and the French characters look fine there too. But as soon as I load to the target table (Oracle), everything becomes junk, as you described: '?' or an upside-down '?'. May I ask what the NLS language, territory, and character set of your database are? And what is the locale of your Informatica server (Unix box)?

                         

                        Thanks.

                        • 9. Code Page Unicode issue
                          Seasoned Veteran

                          My Informatica server has everything set to "UTF-8 encoding of Unicode".

                           

                          I was facing this problem because the ODBC connection used to load the DB had its code page set to MS Latin 1. I changed it to "UTF-8 encoding of Unicode" and the data loaded fine.

                           

                          You can check this property at:

                           

                          Workflow Manager --> Connections --> Relational --> Edit --> Code Page