4 Replies Latest reply on Jun 25, 2020 3:53 AM by krithika patali

    Mass Ingestion include audit columns

    Nick White Active Member

      Hi, if I am using Mass Ingestion to load data into a Hive target is it possible to create and load additional columns in the target that do not exist in the source?

      For example, I would normally want to include various audit columns in a target table (date loaded, sequence number, etc.)

      I'm wondering if the DDL Query statement can be used? Something like:

      CREATE TABLE {INFA_TABLE_NAME} ({INFA_COLUMN_LIST}, AUD_CREATED_DT TIMESTAMP DEFAULT CURRENT_TIMESTAMP())
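      To make my intent clearer: assuming the {INFA_TABLE_NAME} and {INFA_COLUMN_LIST} placeholders expand to the target table name and the source-derived column definitions, I'd expect the generated DDL to come out something like the sketch below (table name, column names, and types are purely illustrative; note also that DEFAULT constraints are only supported in newer Hive versions):

      -- Hypothetical expansion of the DDL template above;
      -- names and types are illustrative, not from a real run.
      CREATE TABLE department (
        DEPTID         INT,
        DEPTNAME       STRING,
        DESCRIPTION    STRING,
        AUD_CREATED_DT TIMESTAMP DEFAULT CURRENT_TIMESTAMP()
      );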

       

      Thanks,

        • 1. Re: Mass Ingestion include audit columns
          puneeth natesha Active Member

          Hi Nick,

           

          In the MIS target section, you can modify the DDL query to match your business logic and then run the spec.


          Thanks

          Puneeth

          • 2. Re: Mass Ingestion include audit columns
            krithika patali New Member

            Hi Nick,

             

            In the MIS target section, you can change the DDL query. However, if you add an extra column to the target DDL, the Mass Ingestion spec/mapping fails with the error below:

             

            2020-06-24 16:30:22.290 <LdtmWorkflowTask-pool-5-thread-4> INFO: The Integration Service started running the Hive query [TARGET_vh1s038w_query_3] with the query [INSERT INTO TABLE default.mas_hive_tgt2department SELECT w2390236100495356522_target_m_mapspec1.DEPTID as a0, w2390236100495356522_target_m_mapspec1.DEPTNAME as a1, w2390236100495356522_target_m_mapspec1.DESCRIPTION as a2 FROM default.w2390236100495356522_target_m_mapspec1] for the task [TARGET_vh1s038w].

              2020-06-24 16:30:23.220 <HadoopBatchDTM-pool-4-thread-1> SEVERE: [HIVE_1070] The Integration Service failed to run Hive query [TARGET_vh1s038w_query_3] for task [TARGET_vh1s038w] due to following error: Hive error code [40,000], Hive message [Error while compiling statement: FAILED: SemanticException Line 0:-1 Cannot insert into target table because column number/types are different 'mas_hive_tgt2department': Table insclause-0 has 4 columns, but query has 3 columns.], Hive SQL state [42000].

             

            This happens because the INSERT query is generated from the columns passed from the source, so the additional column present in the target is not included in the INSERT. Hence the above error is thrown.
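            The mismatch can be reproduced directly in Hive. Based on the table and column names in the log above (column types are assumed for illustration, and src_department is a hypothetical stand-in for the generated staging table):

            -- Target has 4 columns (including the audit column added via the DDL query),
            -- but the generated INSERT supplies only 3.
            CREATE TABLE mas_hive_tgt2department (
              DEPTID         INT,
              DEPTNAME       STRING,
              DESCRIPTION    STRING,
              AUD_CREATED_DT TIMESTAMP   -- extra audit column
            );

            -- Fails: "Table insclause-0 has 4 columns, but query has 3 columns."
            INSERT INTO TABLE mas_hive_tgt2department
            SELECT DEPTID, DEPTNAME, DESCRIPTION FROM src_department;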

            • 3. Re: Mass Ingestion include audit columns
              Nick White Active Member

              Hi - thanks for this, much appreciated. From your test it appears that you cannot add columns whose values are defined at runtime, but I wonder whether it is possible to add columns with default values?

              As I don't have access to a system, would it be possible for you to try the same test but with DDL that looks something like:

              CREATE TABLE {INFA_TABLE_NAME} ({INFA_COLUMN_LIST}, AUD_CREATED_DT TIMESTAMP DEFAULT CURRENT_TIMESTAMP())

              and see if that works?

               

              Regards,

              • 4. Re: Mass Ingestion include audit columns
                krithika patali New Member

                Hi Nick,

                 

                No, it is not possible to add columns with default values either. The reason is that the generated INSERT query does not include an explicit column list, so it fails with the column-count mismatch I mentioned above.

                For example:

                INSERT INTO TABLE default.mas_hive_tgt2department SELECT w2390236100495356522_target_m_mapspec1.DEPTID as a0, w2390236100495356522_target_m_mapspec1.DEPTNAME as a1, w2390236100495356522_target_m_mapspec1.DESCRIPTION as a2 FROM default.w2390236100495356522_target_m_mapspec1;

                 

                If the INSERT query were generated with an explicit column list, like this, it would have worked:

                INSERT INTO TABLE default.mas_hive_tgt2department(DEPTID,DEPTNAME,DESCRIPTION) SELECT w2390236100495356522_target_m_mapspec1.DEPTID as a0, w2390236100495356522_target_m_mapspec1.DEPTNAME as a1, w2390236100495356522_target_m_mapspec1.DESCRIPTION as a2 FROM default.w2390236100495356522_target_m_mapspec1;

                 

                But currently, in Spark mode, the column list is not included in the INSERT query.

                 

                Regards,

                Krithika