Whenever you import source definitions from Oracle into Informatica, the datatypes remain the same as in the source.
If a column has datatype VARCHAR2(100) in Oracle, the source definition in Informatica will show the same datatype, i.e. VARCHAR2(100).
However, since Informatica has its own set of datatypes, it has to convert Oracle datatypes into compatible Informatica datatypes for further processing. That is why, whenever you drag a source definition into a mapping, a Source Qualifier is created automatically, which converts the Oracle datatypes into suitable Informatica datatypes.
In your case, VARCHAR2(100) will be converted into string(100) in the Source Qualifier, and there is no chance of data truncation.
Our sources and targets have datatype VARCHAR2(100 CHAR), but when we import them, they come in as VARCHAR2(100). We do have Unicode characters coming through these columns.
My question is whether Informatica treats these columns as byte-based VARCHAR2(100) or char-based, like the sources themselves. If it treats them as byte-based VARCHAR2(100), truncation is possible during the ETL.
Just for reference, here is the difference between VARCHAR2(100 CHAR) and the default, byte-based VARCHAR2(100).
For example, in a single-byte character set each character is encoded on one byte, so a VARCHAR2(10) field accepts any 10-character word, whereas in UTF-16, where each character is encoded on 2 bytes, the same field only accepts 5-character words.
In comparison, a VARCHAR2(10 CHAR) always accepts 10-character words, whatever the character set.
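The byte-versus-character distinction can be checked in a few lines of Python (a sketch; "Änderungen" is just a sample 10-character word containing one multi-byte character):

```python
# A 10-character word: with character semantics it always fits a
# VARCHAR2(10 CHAR); with byte semantics its fate depends on the encoding.
word = "Änderungen"                    # 10 characters, 'Ä' is multi-byte

print(len(word))                       # character count: 10
print(len(word.encode("utf-8")))       # byte count in UTF-8: 11
print(len(word.encode("utf-16-le")))   # byte count in UTF-16: 20

# Byte semantics: a VARCHAR2(10 BYTE) column keeps only the first
# 10 *bytes*, which here cuts off the last character.
truncated = word.encode("utf-8")[:10].decode("utf-8", errors="ignore")
print(truncated)                       # 'Änderunge'
```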
Hope that gives you an idea of the issue.
Kindly share if you have an answer.
I am facing the same issue, but my case is a little different:
Source: CHAR(10 BYTE)
Source data looks like 'A' followed by continuous small boxes (unprintable characters)
Target: VARCHAR2(10 BYTE)
Target data arrives as 'A' (the boxes got truncated)
I am using code page: UTF-8 encoding of Unicode.
Any answers for this one?
This has been answered: a VARCHAR2(200) will be imported into PowerCenter as a string(200), no matter whether it is a VARCHAR2(200 BYTE) or a VARCHAR2(200 CHAR).
This is an issue:
Source:varchar2(15) --> SQ: string(15) --> Target: varchar2(15)
Source and target are on Oracle code page EE8MSWIN1250.
Informatica is on UTF-8.
A string going from source to target is trimmed to 15 bytes (not 15 characters).
So you are fine when the value is 'jshdeksialeksie', but if it is 'W_ŁALAYENDSJĄAP' you only get part of the string, e.g. 'W_ŁALAYENDSJĄ' without the trailing 'AP'.
Every special character takes 2 bytes.
So what is wrong?
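The arithmetic behind this example can be verified directly (a Python sketch, assuming UTF-8 and the 15-byte buffer described above):

```python
s = "W_ŁALAYENDSJĄAP"        # 15 characters, 'Ł' and 'Ą' are multi-byte
encoded = s.encode("utf-8")

print(len(s))                # 15 characters
print(len(encoded))          # 17 bytes: 'Ł' and 'Ą' take 2 bytes each

# Trimming to 15 *bytes* (what a byte-semantics buffer does) drops the
# trailing 'AP':
print(encoded[:15].decode("utf-8"))   # 'W_ŁALAYENDSJĄ'
```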
It looks like the answer is here; the string is byte-based, not char-based:
"This is because Informatica will treat or process the data in BYTES and not as CHAR. Even the NLS_LENGTH_SEMANTICS is set to CHAR on both Source and Target databases. While reading the data, Informatica reads as BYTES and hence the extended ASCII characters in the source data will occupy multibyte and since the Source Qualifier contains the port length as 1000, it will read only 1000 BYTES and the rest will be truncated. Therefore, running the length on the target table column will show 998 and dump will show 1000."
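The numbers in that KB quote can be reproduced outside the database (a Python sketch, assuming UTF-8 and a value containing two 2-byte characters):

```python
# 998 characters, two of them 2-byte, is exactly 1000 bytes in UTF-8:
data = "a" * 996 + "éé"
encoded = data.encode("utf-8")

print(len(encoded))    # 1000 bytes -- fills the 1000-byte port exactly
print(len(data))       # 998 characters -- what LENGTH() on the column reports
```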
I do not understand the idea behind preserving column types like VARCHAR2 or CHAR if it really doesn't matter. It could be called cow or horse; it is always 1 byte per character.
Anyway, for my example the only option that does not mess with the precision of fields is:
- Change the code page of source and target in Connections to UTF-8
- Switch from VARCHAR2 to NVARCHAR2 and from CHAR to NCHAR
But this still stinks: the Oracle naming convention simply misleads you into thinking that everything is just fine.
Not 100% correct.
PowerCenter always imports VARCHAR2 as VARCHAR2(CHAR), even though the DBMS has a VARCHAR2(BYTE) definition.
However, client settings may cause significant trouble here. There have been cases where for a VARCHAR2(20) the DB client only allocated a buffer of 20 bytes, not 20 characters, and so cut off strings when reading from / writing to such a table.
And this issue occurred regardless of whether the Integration Service was running in Unicode or ASCII mode.
In particular when the DB code page is UTF-8, such problems can easily arise. Take, for example, the German diacritic 'Ä'. In ISO 8859-*, this is one single byte. In UTF-8, it's two bytes. So if, for example, your VARCHAR2(5 CHAR) contains a value of 'Ärger' (anger), PowerCenter allocates 10 bytes for this string (2 bytes per character when running in Unicode mode), but the DB client allocates only 5 bytes, meaning the bytes for Ä, r, g, and e are stored and transferred from the DBMS to PowerCenter, but the trailing "r" will not be transferred (because the buffer allocated by the DB client is too small).
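The mismatch can be sketched like this (simplified; the real allocation happens inside the DB client and pmdtm, not in Python):

```python
value = "Ärger"    # 5 characters, German for "anger"

# PowerCenter in Unicode mode allocates 2 bytes per character internally:
powercenter_bytes = 2 * len(value)     # 10 bytes

# A DB client that treats VARCHAR2(5 CHAR) as a 5-*byte* buffer can only
# hold the first 5 bytes of the UTF-8 encoding ('Ä' alone takes 2 bytes):
client_buffer = value.encode("utf-8")[:5]
print(client_buffer.decode("utf-8", errors="ignore"))   # 'Ärge', trailing 'r' lost
```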
I do recall that several months ago an EBF was requested for a German customer working with DB2. Honestly, I have no idea (and can't check) whether this EBF was ever produced, whether it worked only for DB2 or for Oracle as well, and whether it has, as is usual at Informatica, been merged into production code for later PowerCenter versions.
In such cases, please open a service request with Informatica Global Customer Support to have your particular case investigated. Maybe you have fallen victim to this "miscommunication" between PowerCenter and the underlying DB client.
I had this issue not long ago. The fix is not too difficult.
The main issue is with the Oracle client on your Informatica server. Whatever NLS settings are in use for that client will determine how the data is passed to Informatica. So your relational codepage settings need to be set according to your Oracle client settings, not according to the actual database code pages.
Second, if your Oracle client is using a multi-byte code page (e.g. UTF8), you will need to add the following to the Connection Environment SQL on all your Oracle relational connections:
ALTER SESSION SET NLS_LENGTH_SEMANTICS = 'CHAR'
The reason is that Oracle recommends databases be set up to use byte semantics by default (for compatibility with old scripts that assume it), so connections almost always default to byte semantics. When Informatica requests a size-20 buffer, the client then allocates a 20-byte buffer instead of a 20-character one. The statement above tells the session to use char semantics, and you will then be able to load data from a VARCHAR2(20 CHAR) column into a 20-character string port without difficulty.
In my case it was not an issue on the Oracle side only; I needed to switch both the source and the target connections.
Hi J G,
I'm not really sure what you are getting at. Everything I said in my post applies to the relational connection (to all your Oracle relational connections, in fact), which is used for both sources and targets.
Sorry to resurrect an old post like this. I am having a slightly different, but related problem.
Could you clarify one point? Here you state that PowerCenter always treats a source/target VARCHAR2(x) as x CHAR and not x BYTE, yet the KB article that JG posted states, "This is because Informatica will treat or process the data in BYTES and not as CHAR."
Given these two statements, should I assume that varchar2(x) in the PowerCenter source & target objects is always considered x char, while the Informatica datatype in the source qualifier string(x) is always treated as x bytes?
Edit: One more question -- If the PowerCenter datatype string(x) is always treated as x bytes, is there a way to change it to x char?
No, that statement is no longer 100% correct. In fact it depends on the Integration Service: if it is set up in MBCS mode (often also called "ASCII" mode, which is not entirely correct), it allocates one byte per character internally; in Unicode mode, each character occupies 2 bytes internally.
"Internally" means that is how the Integration Service and the pmdtm processes handle single characters.
The DB client part is a different story, as explained earlier (and clarified by Sacha).
What does that mean for you?
That your question does not apply to the transformation part of a session but only (as pointed out by Sacha) to the DB client connectivity.
Does that answer your question? What is left unclear?
Yes, thank you for clarifying that.
In my particular case, I am trying to purposefully truncate CLOBs down to VARCHAR2(4000), like so:
My problem is that I am getting rejected records with an Oracle error:
ORA-12899: value too large for column MEMO_PAD (actual: 4001, maximum: 4000)
I have confirmed the following:
- Integration service data movement mode is set to Unicode
- Both source and target NLS_CHARACTERSET is set to AL32UTF8
- Both source and target NLS_LENGTH_SEMANTICS is set to BYTE
- The target varchar2 is 4000 BYTE
I have found two workarounds:
- Set the size of the text field in the SQ to something less than 4000. However, it appears the "correct" number depends on how many multi-byte characters are in the data, so I don't consider this acceptable.
- Set the size of the target VARCHAR2 to 3999 in the PowerCenter object only (the physical table is still VARCHAR2(4000 BYTE)). This seems to work regardless of the data, i.e. I haven't yet found a case where I needed to set it lower than 3999. This is more acceptable, but it still carries a long-term maintenance cost that I'd prefer to avoid.
My overall question is: why is PowerCenter sending 4001 bytes of data to the target, when, to my understanding, it should be sending 4000?
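The off-by-one can be reproduced outside PowerCenter (a Python sketch; a single 2-byte character anywhere in the 4000-character value is enough):

```python
# 3999 single-byte characters plus one 2-byte character ('é'):
# exactly 4000 characters, but 4001 bytes in UTF-8.
data = "x" * 3999 + "é"

print(len(data))                    # 4000 characters: fits the string(4000) port
print(len(data.encode("utf-8")))    # 4001 bytes: ORA-12899 on VARCHAR2(4000 BYTE)
```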
Thanks for your time.