1 Reply Latest reply on Oct 12, 2015 4:29 AM by Thirunavukkarasu Selvaraj

    VIBE data stream - use case for scheduled data collection

    New Member


      We are trying to read files on a schedule, but only pull a file if its content or its file name has changed. Can we achieve this with VIBE, and if we can, does it take a performance hit if we start doing this for 100s of sources?



        • 1. Re: VIBE data stream - use case for scheduled data collection
          Thirunavukkarasu Selvaraj Active Member

          Hello zihva,


          Yes, a VDS data flow with a file source reads only the unread events. The node does this by maintaining the file source's last read position in a hidden file created under the same directory as the source file. Every time the data flow is (re-)started, it continues from where it left off. To force the data flow to read from the beginning, delete its corresponding position file under the .VDSPos directory (which is under the source file directory).
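
To illustrate the idea of a persisted read position, here is a minimal Python sketch. It is not how VDS is implemented: the directory name `.sketch_pos`, the `.pos` file suffix, and the function `read_unread` are all made up for this example, and the real format of the files under .VDSPos is not something I am describing here.

```python
import os

# Hypothetical directory name for this sketch only; NOT the real VDS layout.
POS_DIR = ".sketch_pos"

def read_unread(path):
    """Return the bytes added to `path` since the last call, then save the new offset."""
    source_dir = os.path.dirname(os.path.abspath(path))
    pos_dir = os.path.join(source_dir, POS_DIR)
    os.makedirs(pos_dir, exist_ok=True)
    pos_file = os.path.join(pos_dir, os.path.basename(path) + ".pos")

    # Start from 0 when no position file exists (first run, or after deleting it).
    offset = 0
    if os.path.exists(pos_file):
        with open(pos_file) as f:
            offset = int(f.read().strip() or 0)

    # Read only what lies beyond the saved offset.
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
        new_offset = f.tell()

    # Persist the new offset so a restart continues from here.
    with open(pos_file, "w") as f:
        f.write(str(new_offset))
    return data
```

Deleting the `.pos` file in this sketch has the same effect as described above for the position file: the next read starts again from the beginning of the source file.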


          To match multiple files in a directory, you could specify the file names using a regular expression.
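
If you want to check which files a given expression would pick up before putting it into the data flow, a small Python sketch like the one below can help. The function name `matching_files` and the example pattern are only illustrations, and I am assuming the whole file name has to match; the regular expression dialect and matching rules of the file source itself may differ, so please verify the pattern against the product.

```python
import os
import re

def matching_files(directory, pattern):
    """List the file names in `directory` whose whole name matches `pattern`."""
    rx = re.compile(pattern)
    # fullmatch is an assumption of this sketch: the entire name must match.
    return sorted(name for name in os.listdir(directory) if rx.fullmatch(name))
```

For example, `matching_files("/var/log/app", r"app-\d+\.log")` would list names such as `app-1.log` and skip `other.txt` (the directory and names here are hypothetical).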


          As for performance, it depends on where the files are and which nodes run the data flow. Based on your question, I assume all the files are on the same host. Each node is a separate process, so it is better not to overload a single node process with reading many files. Instead, you could spread the files across multiple nodes. Keep in mind that each file read is disk I/O, which is a performance-intensive operation.
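
As a planning aid only, the spreading of files across nodes can be sketched as a simple round-robin assignment in Python. The node names and the function `assign_round_robin` are hypothetical; in practice the assignment is made through the data flow configuration, not through code like this, and an even file count per node does not guarantee an even I/O load.

```python
def assign_round_robin(files, nodes):
    """Distribute file names over node names so no node gets more than one extra file."""
    plan = {node: [] for node in nodes}
    for i, name in enumerate(sorted(files)):
        plan[nodes[i % len(nodes)]].append(name)
    return plan
```

With five files and two nodes, this gives one node three files and the other two, which you could then use as a starting point for testing.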


          I would recommend thorough testing before production deployment.