So you have two different domains, one on-prem and one on AWS, and now you need to compare files from one of the DISs?
Could you please elaborate on the exact requirement? For example, what is that file? When you say compare, do you mean one file is in the on-prem domain and the other is in the AWS Informatica domain? If so, that is not possible.
It is one domain with two MRSs: one MRS is on-prem and the other is in AWS. We have a hybrid deployment of DEI/DEQ. We also have a hybrid deployment of EDC. Axon is deployed in AWS. We tried to reduce network traffic by deploying a hybrid architecture.
We will be migrating a business area from DataFlux to DEQ for data quality checks, and they want to be able to compare a CSV file to a Parquet file in AWS (we know that we will have to leverage Athena or EMR).
What would be the best approach?
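To make the CSV-to-Parquet comparison concrete, here is a minimal Python sketch of the reconciliation logic only. It is not an Informatica DEQ, Athena, or EMR API; it assumes both files have already been read into lists of row dictionaries and share a key column. The `id`/`name` columns, the sample rows, and the `normalize`/`compare_rows` helpers are hypothetical. Because a CSV reader returns strings while Parquet values are typed, both sides are cast to strings before comparing. Numeric formatting can still differ (for example `1.0` vs `1`), so a real rule would need explicit type handling.

```python
import csv
import io
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def normalize(row: Row) -> Dict[str, str]:
    """Cast every value to str (None becomes "") so CSV and Parquet rows compare alike."""
    return {col: "" if val is None else str(val) for col, val in row.items()}


def compare_rows(left: Iterable[Row], right: Iterable[Row], key: str) -> Dict[str, list]:
    """Reconcile two row sets on a key column (if a key repeats, the last row wins)."""
    left_idx = {r[key]: r for r in map(normalize, left)}
    right_idx = {r[key]: r for r in map(normalize, right)}
    mismatched = []
    # Only keys present on both sides can have column-level differences.
    for k in sorted(left_idx.keys() & right_idx.keys()):
        l_row, r_row = left_idx[k], right_idx[k]
        diffs = {
            col: (l_row.get(col), r_row.get(col))
            for col in sorted(l_row.keys() | r_row.keys())
            if l_row.get(col) != r_row.get(col)
        }
        if diffs:
            mismatched.append((k, diffs))
    return {
        "only_left": sorted(left_idx.keys() - right_idx.keys()),
        "only_right": sorted(right_idx.keys() - left_idx.keys()),
        "mismatched": mismatched,
    }


# Hypothetical rows from the CSV file (inlined here instead of read from disk).
csv_rows = list(csv.DictReader(io.StringIO("id,name\n1,alice\n2,bob\n")))
# Hypothetical stand-in for rows from the Parquet file in AWS; in practice these
# might come from pyarrow.parquet.read_table(path).to_pylist().
parquet_rows = [{"id": 1, "name": "alicia"}, {"id": 3, "name": "carol"}]

result = compare_rows(csv_rows, parquet_rows, key="id")
print(result)
```

The result separates keys found only in the CSV, keys found only in the Parquet file, and keys present in both whose column values differ. These are the same three buckets a data quality rule would typically report.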
We are mostly profiling data and applying rules. The comparison of data between two sources is a future state capability that we are slowly working through.
The best approach would be to use the DIS running in AWS. The reason I say that is so you don't incur data egress charges from moving data out of AWS. Moving data into AWS does not cost you anything.
Now, comparing a file sitting on-prem with a file sitting in AWS can be done, but you have to do some configuration on the server side. You need to create a mount point between the two servers, which allows you to read the remote file system from either side. That way the AWS DIS can read the on-prem files just as if they were sitting on the AWS EC2 instance.
You will have to leverage AWS Direct Connect to establish the connection, but it can be done.
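Once the share is mounted, a process on the AWS DIS host reads on-prem files with ordinary local file I/O. The sketch below is a minimal Python illustration of that idea. It assumes the on-prem file system is already mounted at the OS level (for example over NFS or SMB/CIFS; the thread does not name the protocol). It refuses to read when the path is not actually a mount point, so a dropped mount does not silently fall back to an empty local directory. The `open_remote_file` helper and the `/mnt/onprem_share` path are hypothetical.

```python
import os
from pathlib import Path
from typing import TextIO

# Hypothetical mount point for the on-prem share on the AWS DIS host.
MOUNT_POINT = "/mnt/onprem_share"


def open_remote_file(mount_point: str, relative_path: str) -> TextIO:
    """Open a file on the mounted on-prem share as if it were local.

    Raises RuntimeError if mount_point is not an active mount, which guards
    against reading from an unmounted (empty) local directory by mistake.
    """
    if not os.path.ismount(mount_point):
        raise RuntimeError(f"{mount_point} is not a mount point; is the on-prem share mounted?")
    # newline="" is the recommended mode when the file will be handed to the csv module.
    return open(Path(mount_point) / relative_path, newline="")
```

For example, `open_remote_file(MOUNT_POINT, "exports/customers.csv")` (a hypothetical file name) would return a normal file handle when the share is mounted and fail fast otherwise.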
I had forgotten about AWS charging to take data out of AWS.
Thank you for the suggestion on creating a mount point between the EC2 instance and the on-prem file system. We used to have Direct Connect but moved to something else.
Are there any other queries related to this post? If not, can you please mark it as answered?