2 Replies Latest reply on Apr 24, 2017 5:06 PM by inuser463284

    IIR same physical server and mechanics of fuzzy matching?

    New Member

      There is an FAQ on the site saying that it is "recommended, [though not mandatory,] to install Informatica Identity Resolution (IIR) on the same physical server as the database being used".  Suppose you have the option of splitting two things across 2 separate instances: the source data that feeds the IDTs/IDXs, and the part of the system that sends queries to IIR and receives ranked matches back.  Which of these does IIR need to be co-located with, given this recommendation? I would've thought it's the former (to prevent spiking the CPU load on the main database), but then how much of a sinkhole is the traffic when querying the IIR data (esp. when running in the cloud)?

       

      In a similar vein, having read the design and developer guides, I am still struggling to understand where the bulk of the processing power is spent.  Let's say you have source data for grocery barcodes.  When building IDTs, we want to be able to indicate that barcode 1 and barcode 2 are both product X, but barcode 3 is product Z.  (I know this sounds a bit awkward, but I'm really just trying to simplify.)  Then through the magic of IIR, IDXs are built.

       

      Then the main system wants to do fuzzy searching by sending barcodes to IIR and getting back ranked matches to products.  In some cases we want to say "here's barcode 4, give me a match" and expect to get back "product X, 100%" (if barcode 4 = barcode 2) or "product X, 80%" (if barcode 4 is very similar, but not identical, to barcode 2).  In other cases we want to say, "here's barcodes 5, 6, 7; we know they're all the same product, we just don't know which one" and expect a similarly ranked match.
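
To make the two query shapes above concrete, here is a minimal Python sketch of the expected behaviour. It is not IIR's algorithm or API: it uses `difflib.SequenceMatcher` from the standard library as a stand-in similarity measure, and the barcode values, `KNOWN_BARCODES`, `similarity` and `rank_matches` are all hypothetical names invented for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical reference data (barcode -> product), standing in for
# what the IDT would hold. The values are made up for illustration.
KNOWN_BARCODES = {
    "5012345678900": "product X",  # "barcode 1"
    "5012345678917": "product X",  # "barcode 2"
    "4006381333931": "product Z",  # "barcode 3"
}


def similarity(a, b):
    """Return a 0-100 score; a stand-in for IIR's real match scoring."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def rank_matches(query_barcodes):
    """Rank products by the best score of any query barcode against any
    known barcode. Accepts one barcode or several that are known to
    belong to the same (unknown) product."""
    best = {}
    for query in query_barcodes:
        for known, product in KNOWN_BARCODES.items():
            score = similarity(query, known)
            best[product] = max(score, best.get(product, 0))
    # Highest score first.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)


# Single-barcode query: an exact hit on "barcode 2".
print(rank_matches(["5012345678917"]))
# Multi-barcode query: several barcodes assumed to be one product.
print(rank_matches(["5012345678910", "5012345678901"]))
```

Note that this sketch compares every query against every known barcode on each call, which is exactly the brute-force cost a pre-built index would presumably exist to avoid; whether and how IIR's IDXs do that is the open question here.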

       

      Where does the heavy lifting actually happen?  Is IDT->IDX fairly vanilla in that it just flattens out the data, with the matching and ranking then done in real time when the query is sent through?

       

      Thanks!

        • 1. Re: IIR same physical server and mechanics of fuzzy matching?
          purusottam nayak Guru

          Hi,

           

          This is a design question. I believe it needs to be discussed on a call so that the exact scenario you are covering can be understood. I would suggest opening a case with Informatica support to have a detailed discussion on this.

           

          Matching is the CPU-intensive process. Searching, matching and sorting are all handled in memory, and of these, matching is the most expensive.

           

          Hope this helps.

           

          Regards,

          Puru

          • 2. Re: IIR same physical server and mechanics of fuzzy matching?
            New Member

            I am not at all sure that opening a call with Informatica is applicable here. I am asking the question because I am trying to understand the recommendation from the FAQs here.

            Basically, you can have 2 scenarios: the source data for the IDTs is in the same physical database as the one the matching queries originate from (which I suspect is the less common pattern), or the source data for the IDTs is in a different database. When the FAQs talk about IIR being on the same server, that's ambiguous in the second scenario, so I'm trying to clarify it.

             

            Also, unless I am misunderstanding, given how CPU-intensive matching is, it would seem more advantageous to have IIR on a separate server from either database, so as not to keep spiking the DB CPU. So what am I missing?