1 Reply Latest reply on Jul 9, 2021 3:39 AM by Nico Heinze

    Copy all contents of a file into Oracle table

    Brinjit Velu Active Member

      Hello,

       

           I have multiple files with CLOB data (organic molecular structure) to load into an Oracle table. All contents of these files (just 1 column), and the filename need to be loaded into Oracle table. We can make use of indirect file method to load the contents of these files. I would like to know which delimiter should we use in this case for both row and column.

       

      Regards,

       

      Brinjit

        • 1. Re: Copy all contents of a file into Oracle table
          Nico Heinze Guru

          A few things to say here.

           

          If you have only one single column in the source files, then you could define a field delimiter which doesn't exist in your data. Assuming these are indeed text files and nothing else, you could use the following delimiter:

              \037

          This is a so-called "octal notation" (a notation with the number base 8 instead of 10 for decimal numbers or 16 for hexadecimal numbers). It represents a special ASCII control character named Unit Separator (US). I haven't seen this in real use since at least the early 80's, so probably you should be safe to use this delimiter character.

           

          The next point is that your source files may contain more than one line of text. The approach above cannot make sure that those files are read in one single step. However, you can make sure that each file is read as one single string with the following "trick":

          Forward the current input file name plus the current input line to an Expression transformation.

          In this EXP, identify (using two variable ports) whether you're currently processing a new file (compared to the previous record) or whether the current line originates from the same input file as the previous input line.

          If it's from a new (resp. the very first) input file, set a variable port v_CompleteFile to the current input line and a variable port v_FileNo to "v_FileNo + 1" (that means: in case of a new file increase v_FileNo by 1).

          If it's from the same input file as the previous line, append it to the variable v_CompleteFile and don't change v_FileNo.

          Finally forward v_FileNo, the file name, and v_CompleteFile to an Aggregator transformation set to process Sorted Input.

          This AGG groups by v_FileNo and forwards the last input file name and the last CompleteFile port per input group (namely per input file) to the target.

           

          Unfortunately there's one caveat to keep in mind:

          PowerCenter has a maximum length for string ports of 100 MB. That is 52,428,799 characters if the Integration Service runs in Unicode mode and 104,857,599 characters if the Integration Service runs in ASCII mode.

          What does that mean?

          If any or your input files should be larger than 100 MB in size, then you have a problem because you cannot collect the complete contents of such files in PowerCenter into one single string value. No way.

          If that may be a real problem for you, then we need to think about a completely different strategy; PowerCenter is not the perfect tool for such extreme situations. It can be done, but it's not at all easy.

           

          Regards,

          Nico