3 Replies Latest reply on Apr 16, 2019 1:53 PM by Shailesh Khuperkar

    EDC - Scan for Operational Metadata

    aparnachaparala New Member

      Hi Team,

       

      I would like to know whether EDC can scan for the operational metadata of an asset, such as the size of the file, when it was last accessed/updated, the path of the file, when the job was last run, etc. Is it possible to get this operational metadata into EDC through scanners?

       

      Regards,

      Aparna

        • 1. Re: EDC - Scan for Operational Metadata
          Shailesh Khuperkar New Member

          Hi Aparna,

                     From your query description, it looks like you want to scan files that reside in a file system.

           

          You can scan a file system that contains files of the following types.

           

          Extended unstructured formats: Use this option to extract metadata from file types such as audio files, video files, image files, and ebooks.

          Structured file types: Use this option to extract metadata from file types such as JSON, XML, text, and delimited files.

          Unstructured file types: Use this option to extract metadata from file types such as Microsoft Excel, Microsoft PowerPoint, Microsoft Word, web pages, compressed files, emails, and PDF.

           

          The EDC file system scanner usually extracts the following information from these files:

          =>Name of the file

          =>Path of the file

          =>Size

          =>Encoding

          =>RowDelimiter

          =>Columns

          =>LastUpdatedOn
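
          To make the attributes above concrete, here is a minimal Python sketch that collects the name, path, size and last-updated time for files on a local file system. This is only an illustration of what these attributes mean; it is not how the EDC scanner works internally, and the function names and dictionary keys are made up for this example.

```python
import os
from datetime import datetime, timezone


def file_operational_metadata(path):
    """Return name, path, size and last-modified time for one file.

    The key names are hypothetical; they only mirror the attribute
    list in this reply and are not EDC attribute identifiers.
    """
    st = os.stat(path)
    return {
        "name": os.path.basename(path),
        "path": os.path.abspath(path),
        "size_bytes": st.st_size,
        # st_mtime is the last modification time, reported here in UTC
        "last_updated_on": datetime.fromtimestamp(
            st.st_mtime, tz=timezone.utc
        ).isoformat(),
    }


def scan_directory(root):
    """Walk root recursively and collect metadata for every file found."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full_path = os.path.join(dirpath, filename)
            results.append(file_operational_metadata(full_path))
    return results
```

          Calling scan_directory on a folder returns one dictionary per file. Encoding, row delimiter and columns are left out because they require reading the file content, not just its file system attributes.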

           

          Regarding "when was the job last run" mentioned in your query, can you please confirm if you are referring to EDC Scanner job here?

           

          Thanks,

          Shailesh

          • 2. Re: EDC - Scan for Operational Metadata
            aparnachaparala New Member

            Thank you Shailesh. We are looking to scan the files created/updated in HDFS through ingestion.

             

            Regarding "when was the job last run": yes, I was looking for the last run time of the EDC scanner job.

             

            Regards,

            Aparna

            • 3. Re: EDC - Scan for Operational Metadata
              Shailesh Khuperkar New Member

              Hi Aparna,

                        You can scan files on HDFS using EDC. However, EDC supports scanning only the following types of files from HDFS.

               

              Extended unstructured formats. Use this option to extract metadata from file types such as audio files, video files, image files, and ebooks.

              Structured file types. Use this option to extract metadata from file types such as JSON, Avro, Parquet, XML, text, and delimited files.

              Unstructured file types. Use this option to extract metadata from file types such as Microsoft Excel, Microsoft PowerPoint, Microsoft Word, web pages, compressed files, emails, and PDF.
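
              If it helps to check in advance which of your ingested files fall under which option, the grouping above can be sketched as a simple lookup by file extension. The extension list below is an assumption chosen for illustration (it is not taken from EDC documentation), and the function name is hypothetical.

```python
import os

# Assumed, non-exhaustive mapping from file extension to the three
# options described above. The extensions are illustrative guesses,
# not an official EDC list.
CATEGORY_BY_EXTENSION = {
    # Extended unstructured formats: audio, video, image, ebooks
    ".mp3": "extended unstructured",
    ".mp4": "extended unstructured",
    ".png": "extended unstructured",
    ".epub": "extended unstructured",
    # Structured file types: JSON, Avro, Parquet, XML, text, delimited
    ".json": "structured",
    ".avro": "structured",
    ".parquet": "structured",
    ".xml": "structured",
    ".txt": "structured",
    ".csv": "structured",
    # Unstructured file types: Office documents, web pages, archives, PDF
    ".xlsx": "unstructured",
    ".pptx": "unstructured",
    ".docx": "unstructured",
    ".html": "unstructured",
    ".zip": "unstructured",
    ".pdf": "unstructured",
}


def scanner_category(filename):
    """Return the assumed option for a file name, or "unknown"."""
    extension = os.path.splitext(filename)[1].lower()
    return CATEGORY_BY_EXTENSION.get(extension, "unknown")
```

              For example, scanner_category("events.parquet") falls under the structured option, while a file with an unlisted extension comes back as "unknown" and would need to be checked by hand.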

               

              If you want to scan the files that EDC itself creates on HDFS (after scanner jobs run), that may not be possible, because those files are not in a readable format. They can only be read through EDC tools such as REST API calls or the Catalog UI.

               

              Could you please describe your complete use case, so that we can understand it better and provide the best possible information?

               

              Thanks,

              Shailesh