Hi Kiran, we have a similar situation. Did you find a solution? Could you please share the information? It would help us. Thank you in advance.
I am also struggling to find a solution for the same complex XML structure @scuser1441
Here are a few suggestions you can try for parsing XML and loading it to a CSV file on HDFS -
1) Using Data Processor -
Mapping Flow: Source Flat File object for XML -> Data Processor -> Target Flat File with the Hadoop Connection to load the file on HDFS
a) Use a Flat File Data Object for the XML source.
b) Create the Data Processor using the Wizard option.
i) Specify the input format as XML and feed a sample or XSD for the type of XML data you are expecting. This step is important: the sample or XSD must align exactly with the data that will arrive. If the structure differs, the Data Processor will not be able to parse the data and will fail. (A standalone Python sketch of this parse step follows step c below.)
ii) Select the output format as Relational (this is what gets written out as CSV).
c) Use a comma-separated Flat File Object as the CSV target.
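Outside of Informatica, the parse-and-flatten that the Data Processor performs here amounts to reading the XML hierarchy and writing one comma-delimited row per repeating element. A minimal Python sketch, assuming a hypothetical orders/order structure (align the element names with your own XSD or sample):

```python
import csv
import xml.etree.ElementTree as ET

# Made-up sample document; substitute the structure your XSD describes.
SAMPLE_XML = """\
<orders>
  <order id="1"><customer>Acme</customer><amount>120.50</amount></order>
  <order id="2"><customer>Zenith</customer><amount>75.00</amount></order>
</orders>"""

root = ET.fromstring(SAMPLE_XML)
with open("orders.csv", "w", newline="") as f:
    writer = csv.writer(f)                      # comma-delimited, like the Flat File target
    writer.writerow(["id", "customer", "amount"])
    for order in root.iter("order"):            # one CSV row per repeating <order> element
        writer.writerow([order.get("id"),
                         order.findtext("customer"),
                         order.findtext("amount")])
```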
To load this Flat File target to HDFS, do the following -
i) Open this target Flat File Object and go to Advanced Properties.
ii) Scroll down to the "Runtime: Write" properties. Make sure the Column Format is Delimited and the Delimiter is Comma (this is the default setting).
iii) Select the Connection as Hadoop File System.
iv) Another field will be displayed to select the Hadoop connection. Click on Browse and select the connection.
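If you want to sanity-check the load afterwards from outside the mapping, a WebHDFS client can list and peek at the target file. A sketch using the hdfs PyPI package; the namenode URL, user, and target path below are placeholders:

```python
from hdfs import InsecureClient

# WebHDFS endpoint of the namenode -- placeholder host/port/user.
client = InsecureClient("http://namenode-host:9870", user="infa")

# List the target directory and peek at the first bytes of the CSV.
print(client.list("/user/infa/output"))
with client.read("/user/infa/output/orders.csv") as reader:
    print(reader.read(500).decode("utf-8"))
```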
Refer to the following document -
2) Without using Data Processor: Using Complex File Data Object -
The Complex File Data Object has "Column Projection" capabilities for hierarchical data, which can then be loaded directly to targets, especially relational-style targets (a hand-written PySpark sketch of the same idea follows step j below).
Mapping Flow: Column Projected Complex File Data Object for XML -> Flat File target for CSV
a) Create a Complex File Data Object with Resource Format as XML.
b) Open the Complex File Object and click on "Data Object Operations".
c) Select the Read Operation.
d) Go to the Schema tab and check the "Enable Column Projection" checkbox.
e) Specify the Schema Format as XML.
f) Select the schema under the "Schema" field (click on Browse and select the schema).
NOTE: A column-projected Complex File is supported only in Spark pushdown mode.
Refer to the following document for more information -
g) Use this Complex File Object for the Read Operation in the mapping.
h) Right-click on this Complex File Object in the mapping workspace and click on "Create Target".
i) Select the Flat File object for the target.
The ports will be auto-connected.
j) Specify the Hadoop Connection under "Runtime: Write" in the Flat File Object. Make sure the Column Format is Delimited and the Delimiter is Comma (this is the default setting).
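For intuition, here is roughly what a column-projected XML read under Spark looks like when written by hand. This is not what Informatica runs internally, just a conceptual PySpark sketch; it assumes the spark-xml package (com.databricks:spark-xml) is on the classpath, and the rowTag, column names, and paths are placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("xml-to-csv").getOrCreate()

# Read the XML; "order" is a hypothetical repeating element (rowTag).
orders = (spark.read.format("xml")               # spark-xml short name
          .option("rowTag", "order")
          .load("hdfs:///user/infa/in/orders.xml"))

# "Column projection": keep only the hierarchical fields you need as flat columns.
flat = orders.select("id", "customer", "amount") # hypothetical columns

(flat.write.mode("overwrite")
     .option("header", "true")
     .csv("hdfs:///user/infa/out/orders_csv"))   # comma-delimited by default

spark.stop()
```

Selecting only the needed fields keeps the hierarchical-to-flat mapping explicit, which is the same idea the "Enable Column Projection" checkbox captures.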
The Hive pushdown mode is deprecated in the latest Informatica version, so I would not recommend using that execution mode.