2 Replies Latest reply on Sep 15, 2020 3:43 PM by Jeremy Jensen

    PostgreSQL as Metadata Repositories

    Jeremy Jensen Guru

      I wanted to ask the forums if they have had the opportunity to test running jobs on a PowerCenter/IDQ installation using PostgreSQL as their metadata repository database tier.

       

      For small scale implementations, running PowerCenter and PostgreSQL on the same server could conceivably translate into savings on database tier licensing as well as savings on hardware. However, running PostgreSQL on the same server will compete with resources when running ETL or IDQ jobs.

       

      Is anyone looking to migrate to PostgreSQL? What kinds of considerations need to be taken into account for this to be successful?

        • 1. Re: PostgreSQL as Metadata Repositories
          Nico Heinze Guru

          Hi Jeremy,

           

          I can't reply about PostGres as a DBMS, but what I can respond to is your implicit question about having the repository DBMS on the PowerCenter server.

          Sorry for the long mail, it's my goal to explain my thoughts.

           

          As a matter of fact many partner companies (and consultants / sales consultants at Informatica) work with virtual machines running a DBMS plus PowerCenter / IDQ. That's for sure not perfect in terms of performance, but it's quite common and works well for smaller data sets.

          What is a small data set? Not easy to say. As always it depends, so I won't go into more detail here. No use. Personally I think there are more important aspects to ponder.

           

          What is more important is where data are stored. A (virtual or physical) machine running a repository DB plus PowerCenter / IDQ doesn't run a whole DWH. This is important because data are processed on that machine but transported from/to somewhere else. As data are stored somewhere else, this means that data are transported (usually) via network connections, and that's it; the machine doesn't have to run the DBMS, it doesn't have to organise the files holding those data, there's no need to update and store indexes, and so on.

           

          Talking about "storing a record in a DBMS" sounds simple, but the actual process is pretty complex and complicated. No matter whether it's a relational or a columnar DBMS, some network DB, a hierarchical database, or "only" a complex file structure: storing data on that machine is a big deal. So, the more work is associated to store / retrieve particular records, the more CPU power and RAM will be needed to perform these processes.

           

          From this point of view it seems logical that heavy DBMS work is "bad" for performance. If a DWH is stored on the same machine as a PowerCenter / IDQ installation plus repositories, then this machine very likely needs notably more resources to store and handle data. At least when compared to a machine which doesn't house a DWH.

          Obvious, I know, but it fits my personal experience:

          From 2005 until 2013 I've been working for a German government agency. The AIX machine held a DB2 instance plus PowerCenter. The machine was neither particularly weak nor particularly strong. The DB2 installation was used for the repositories as well as for the "DWH". The PowerCenter processes were extremely variable in terms of sizing and of complexity. But all tests we executed didn't show heavy performance impact by the repositories except when using versioned repositories (this does have severe impact on performance).

          After this I've worked for a German insurance company. They switched from HP-UX to Windows at that time, and in addition they moved the DWH to a separate machine. And that made a HUGE impact. Not only that the Windows machine had notably more RAM than the HP-UX server, it didn't have to run the DWH DBMS. The new machines outperformed the old installation by several factors (in some cases the total runtime decreased to <10%).

           

          Long story short end: if you run repository databases on the PowerCenter / IDQ machine, I am confident that the performance impact will be rather low (but will be higher if you enable version control). Just make sure that data are not stored on that machine (except in "plain" text files).

           

          Regards,

          Nico

          1 of 1 people found this helpful
          • 2. Re: PostgreSQL as Metadata Repositories
            Jeremy Jensen Guru

            Nico – thanks for the great response. You always bring great perspective to the questions here on the forums.

             

             

            You note: “From this point of view it seems logical that heavy DBMS work is "bad" for performance. If a DWH is stored on the same machine as a PowerCenter / IDQ installation plus repositories, then this machine very likely needs notably more resources to store and handle data. At least when compared to a machine which doesn't house a DWH.”

             

             

            Which makes a lot of sense. Another consideration with this kind of installation involves licensing – if we find that the server needs additional cores to support the DB tier for the given use case, we can’t just add cores and necessarily stay compliant with the license. We might find ourselves having to move the DB application off of the server anyway to improve performance.