
    Facing issues when writing date values into Hive table


      Hello Infa BDM Support Team,

       

      We are using Informatica BDM 10.2.2 with Hortonworks (Hadoop 2.7.3, Hive 1.2), and we are facing the issue described below.

       

      Could you please let us know what could be the reason for the following behavior of an Informatica dynamic mapping (running with full PDO to Blaze) and how it can be prevented:

       

      Setup:

      • Data comes in the source CSV file; columns with date type arrive in the format YYYY-MM-DD (example: “2009-02-26”)
      • In an expression we use the TO_DATE() function on a group of ports (identified in the control file, cntl) to convert the incoming character values to date values (see the sketch after this list)
      • In the Hive table the column is defined with the ‘date’ type (example: ‘fixing_date,date,’)
      • Source and target definitions are dynamic and are resolved during the run, based on the control file (for the source) and the Hive table definition (for the target)
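
      For illustration, here is a minimal PySpark sketch of the equivalent conversion. This is not the Informatica mapping itself; Spark 2.x, the paths, and the use of “fixing_date” as the converted port are assumptions based on the example above:

      ```python
      # Minimal sketch of what the mapping's conversion step does: parse the
      # quoted YYYY-MM-DD strings from the source CSV into date values and
      # write the result as parquet. Paths and the "fixing_date" column name
      # are illustrative; the real ports are resolved from the control file.
      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("date-load-repro").getOrCreate()

      src = spark.read.csv("/tmp/source.csv", header=True, quote='"')

      # Equivalent of the TO_DATE(port, 'YYYY-MM-DD') expression in the mapping.
      src = src.withColumn("fixing_date",
                           F.to_date(F.col("fixing_date"), "yyyy-MM-dd"))

      # Parquet represents DATE values physically as int32 days since 1970-01-01;
      # the file produced by the Blaze run appears to carry the plain integer
      # without the DATE annotation (see screenshots 2 and 3 below).
      src.write.mode("overwrite").parquet("/tmp/hive_external_table/")
      ```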

       

      Result:

      • The Informatica dynamic mapping runs and finishes successfully, loading the data into the Hive external table (it creates a parquet file)

       

      Issue:

      • When we attempt to read data from the date columns in the Hive table and the returned records contain data for those columns, we receive the following error:

           "Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.IntWritable cannot be cast to org.apache.hdoop.hive.serde2.io.DateWritable."

       

           Note: for the rest of the columns we don’t get this error

       

      Additionally, during our testing we noticed that:

      • The same behavior occurs when the date data in the CSV is in the format YYYYMMDD and the Hive columns have type ‘date’
      • We do not face this issue when the date data in the CSV is in the format YYYY-MM-DD HH24:MI:SS.MS and the target Hive columns have type ‘timestamp’ (see the sketch after this list)
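
      For comparison, the working timestamp variant would look like this in the same sketch form (the format pattern mapping and all names are again illustrative):

      ```python
      # The timestamp case that Hive 1.2 reads without errors: parse
      # 'YYYY-MM-DD HH24:MI:SS.MS' style strings into timestamps and write
      # parquet for a Hive 'timestamp' column. Paths and names are illustrative.
      from pyspark.sql import SparkSession
      from pyspark.sql import functions as F

      spark = SparkSession.builder.appName("ts-load-repro").getOrCreate()

      src = spark.read.csv("/tmp/source_ts.csv", header=True, quote='"')
      src = src.withColumn("fixing_ts",
                           F.to_timestamp(F.col("fixing_ts"),
                                          "yyyy-MM-dd HH:mm:ss.SSS"))
      src.write.mode("overwrite").parquet("/tmp/hive_external_table_ts/")
      ```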

       

      Additional technical details regarding the issue described above:

       

      1 – screenshot shows that the column (“fixing_date”) is described as “date” in the Hive table definition:

      [screenshot: HiveTableDef]

       

      2 – screenshot shows that when the parquet file (created by the run of the Informatica dynamic mapping) is opened using Spark, the column (“fixing_date”) is defined as “integer”:

      [screenshot: ParquetFileColumnDef]

      3 – screenshot shows that when the parquet file is opened using Spark, the data in the column (“fixing_date”) is displayed as an integer value representing the number of days since 1970-01-01. When those days are added to 1970-01-01, we get the same value as in the source file (see the check after the screenshot):

      [screenshot: ParquetFileData]
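
      A quick plain-Python check of that reading (the integer 14301 is illustrative; it is the value that corresponds to the example date 2009-02-26):

      ```python
      # Verify the days-since-epoch interpretation: adding the stored integer
      # to 1970-01-01 must reproduce the source date.
      from datetime import date, timedelta

      epoch = date(1970, 1, 1)
      print(epoch + timedelta(days=14301))  # 2009-02-26, the source value
      ```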

       

      4 – screenshot shows the data in the source file:

      [screenshot: SourceFileData]

       

      5 – screenshot shows that when the values in the parquet file (visualized using Spark) are added to the date ‘1970-01-01’, we get the same value as in the source file (reproduced in the sketch after the screenshot):

      [screenshot: DateConversion]
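
      The same conversion can be reproduced directly in Spark on the parquet file written by the run (the path is a placeholder for the mapping's output location):

      ```python
      # Recover the original dates from the integer column by adding it to
      # 1970-01-01, as in the screenshot. Path and column name are placeholders.
      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("date-check").getOrCreate()

      pq = spark.read.parquet("/path/to/blaze/output/")
      pq.selectExpr(
          "fixing_date",
          "date_add(to_date('1970-01-01'), fixing_date) AS recovered_date",
      ).show()
      ```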

       

      Thank you in advance for your help!

       

      Kind Regards,

      Svetlozar