12 Replies Latest reply on Jul 8, 2021 9:56 AM by dan.smith@informatica.com

    PowerExchange Express CDC for Oracle Recovery and Restart

    Don Michie Active Member

      I've tried to research this, but I'm not sure I fully understand the restart/recovery process on the PWX side of things.

      We have an Oracle database, a Linux box with Listener and Logger, and a Linux pair (PWC grid) running workflows.

      This is Express CDC for Oracle.

      It is my understanding that the logger and the listener processes run independently.

      It is also my understanding that this is the (rough) process:

      PWXPC Log Reader reads redo / archive logs

      This is passed to the logger

      The logger creates logger files which are then condensed (index, restart points, CDC data, etc.)

      The PWC workflow will contact the listener (independent process) to request change data 'from the logger files'

      With commit on checkpoint, the restart tokens are saved, and the process knows where in the 'logger files' to resume

       

      Now, if for any reason the logger fails (cold start, the PWX node explodes, whatever), there are restart tokens available in the PWXCCL.CFG file. A cold start wipes out the logger files and, with no special restart, begins reading at the tip of the log.

       

      So, to recover from the failure above:

      We've lost the logger files and the restart points.

      The restart tokens for PWC point to locations in the logger files, not in the REDO log.

      If PWC requests new data based on its current restart tokens, that data is GONE.

       

      Can we rebuild the logger files?

      Can the restart tokens in the PWC somehow translate to missing CDC files and then to missing REDO/ARCHIVE positions?

      Should we periodically back up the logger files and point the restart in PWXCCL.CFG to an older (or the oldest) point in the CDCT file, so that we rebuild the logger files/condense data and have a restart position available for the PWC workflows?

       

      I know it is a long question. I guess I don't see how the restart tokens in the workflows relate to the restart positions in PWXCCL.

      That is what confuses me when I ask how to recover from a PWX failure.

       

      Thank you.

        • 1. Re: PowerExchange Express CDC for Oracle Recovery and Restart
          dan.smith@informatica.com Guru

          PWXCCL restart tokens are not involved in PC session restart/recovery.

          PC session restart tokens are not involved in PWXCCL restart/recovery.

           

          That said, the token values are determined by PWX Capture, and put into the DTL__CAPXRESTART1 and DTL__CAPXRESTART2 hidden columns that are passed on each row/record image.

          Please note that the format of those tokens varies by source, and by capture method, and by whether or not PWXCCL is in use.

           

          You should never Cold Start PWXCCL unless you have no other choice.

          (Because, yes, it is kind of the nuclear bomb solution, with you at ground zero.)

          Warm Start PWXCCL.

           

          You can code RESTART_TOKEN and SEQUENCE_TOKEN in PWXCCL.CFG.

          They are overrides for Cold Start and Special Start, and are ignored for Warm Start.

          Yes, PWXCCL's restart tokens are saved in the CDCT each time that PWXCCL closes a CND file.

          When you Warm Start PWXCCL, it uses the latest tokens that were saved in the CDCT.
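          To make the override rule concrete, here is a toy model of how PWXCCL picks its starting position. It is a hypothetical sketch, not PowerExchange code: `StartMode`, `TokenPair`, and `pwxccl_start_position` are invented names, and opaque strings stand in for the real (unpublished) token formats. It only encodes what is stated above: Warm Start uses the latest CDCT checkpoint and ignores the PWXCCL.CFG overrides, while Cold and Special Start honor them.

```python
from enum import Enum
from typing import NamedTuple, Optional


class StartMode(Enum):
    WARM = "warm"
    COLD = "cold"
    SPECIAL = "special"


class TokenPair(NamedTuple):
    # Real token formats are unpublished; opaque strings stand in for them here.
    restart_token: str
    sequence_token: str


def pwxccl_start_position(
    mode: StartMode,
    cfg_overrides: Optional[TokenPair],
    cdct_checkpoint: Optional[TokenPair],
) -> Optional[TokenPair]:
    """Return the token pair a (toy) PWXCCL would start from.

    None means "no explicit position": what capture then does depends on the
    start mode and the source, which is why this thread says to ask Support.
    """
    if mode is StartMode.WARM:
        # RESTART_TOKEN/SEQUENCE_TOKEN in PWXCCL.CFG are ignored on Warm Start;
        # the latest checkpoint saved in the CDCT (at the last CND close) wins.
        if cdct_checkpoint is None:
            raise RuntimeError("no CDCT checkpoint available: warm start is not possible")
        return cdct_checkpoint
    # Cold Start and Special Start honor the overrides when they are coded.
    return cfg_overrides
```

          One consequence of this rule: leaving overrides in PWXCCL.CFG is harmless for Warm Starts, but they will apply to any later Cold or Special Start.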

           

          PC sessions save restart tokens in two places.

          When a PC session "succeeds", it hardens the ending restart token values to its Restart Token file in its Restart Token folder. That file needs a unique name for each PC session object, so that sessions don't overwrite each other's tokens and cause problems.

          If a PC session uses Recovery Strategy == Resume from last checkpoint, then PC writes out tokens to PM_REC_STATE at each commit boundary it sends to the target.  Depending on the type of target, PM_REC_STATE takes different forms.

          - For truly Relational targets, it is a table named PM_REC_STATE in the target

          - For "Flat File writer" type targets, it is a file named PM_REC_STATE in the infa_shared/Storage directory (Please note that this includes things like Kafka and HDFS, that you might not think of as "flat files")

          - For micro-batch Pseudo-Relational targets (Teradata and Netezza are examples), it is a file in infa_shared/Storage

          - For MQ, it is a recovery queue

          - For JMS, it is a recovery topic

          For more details, please refer to chapters 6 and 9 of the PowerCenter Advanced Workflow Guide.
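          The two token locations can be summarized as a small lookup. This is only an illustrative sketch: the `TargetKind` categories and the `token_locations` helper are hypothetical names, and the strings restate the list above rather than any PowerCenter API.

```python
from enum import Enum
from typing import List


class TargetKind(Enum):
    RELATIONAL = "relational"
    FLAT_FILE_WRITER = "flat file writer"          # includes Kafka and HDFS targets
    MICRO_BATCH = "micro-batch pseudo-relational"  # e.g. Teradata, Netezza
    MQ = "mq"
    JMS = "jms"


# Where the PM_REC_STATE recovery information lives, per target type.
RECOVERY_STORE = {
    TargetKind.RELATIONAL: "PM_REC_STATE table in the target",
    TargetKind.FLAT_FILE_WRITER: "PM_REC_STATE file in infa_shared/Storage",
    TargetKind.MICRO_BATCH: "file in infa_shared/Storage",
    TargetKind.MQ: "recovery queue",
    TargetKind.JMS: "recovery topic",
}


def token_locations(kind: TargetKind, resume_from_last_checkpoint: bool) -> List[str]:
    """List where a PC CDC session keeps its restart tokens."""
    # Every session hardens its ending tokens to its own restart token file.
    locations = ["restart token file (hardened when the session succeeds)"]
    if resume_from_last_checkpoint:
        # Only "Resume from last checkpoint" adds per-commit recovery state.
        locations.append(RECOVERY_STORE[kind] + " (updated at each commit to the target)")
    return locations
```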

           

          If a PC session fails, warm start if possible.

          If you can't warm start, put token overrides in the relevant PC restart token file, and cold start the PC session.

          PC will send appropriate tokens to PWX Listener when requesting data.

           

          If PWXCCL fails, warm start if possible.

          If you can't warm start, you may or may not be able to Special Start.  If you can, do that.  Talk to Informatica PWX Support if you need help making that decision.

          If you can't Warm or Special Start, then you have no choice but Cold Start.

          Depending on the circumstances, you may be able to cold start with restart token overrides.  Talk with PWX Support to determine that, as each situation is different.  If you can, do that.

          If you can't cold start as of a specific token position, then you can cold start with no tokens, or with both token overrides set to 0. Each of those does slightly different things, and what they do can vary by source. Again, talk with PWX Support.

           

          Please note that any cold start or special start (PC session or PWXCCL) potentially loses change data, so you may also need to rematerialize the targets from the sources, or you may need some reconciliation process to correct any missing data. That really depends on how you are using PWX, and on what your business requirements are. Again, talk with PWX Support if you have questions.
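          The PWXCCL escalation order above can be written as a short decision function. This is a hypothetical sketch: `Plan` and `choose_pwxccl_restart` are invented names, and the three boolean inputs are decisions you reach with PWX Support, not things the code can determine. The `potential_data_loss` flag reflects the caveat that any cold or special start may lose change data.

```python
from typing import NamedTuple


class Plan(NamedTuple):
    action: str
    potential_data_loss: bool  # any cold or special start may lose change data


def choose_pwxccl_restart(can_warm: bool, can_special: bool, support_tokens: bool) -> Plan:
    """Walk the escalation ladder from least to most disruptive option."""
    if can_warm:
        return Plan("warm start", False)
    if can_special:
        return Plan("special start", True)
    if support_tokens:
        # Support has confirmed you can cold start as of a specific token position.
        return Plan("cold start with RESTART_TOKEN/SEQUENCE_TOKEN overrides", True)
    return Plan("cold start with no tokens, or both overrides set to 0", True)
```

          Whenever `potential_data_loss` is True, the plan should also cover rematerializing or reconciling the affected targets.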

           

          Depending on the source, you may be able to re-mine from the source to recreate CND files.

          For Oracle, that requires REDO/Archive log being available to re-mine.

          For DB2 LUW, that depends on whether or not you have done a DTLUCUDB SNAPSHOT or SQUISH since the point where you want to remine, and on whether or not the DB2 logs are available.

          For SQL Server, it depends on whether the change data is still in DISTDB.

          For data relogged by PWXCCL from mainframe sources, the mainframe PWX Archive Logs must still be available.

          For data relogged by PWXCCL from iSeries sources, the DB2i Journal Files must still be available.

          Etcetera.
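          The per-source prerequisites above can be captured as a table of checks. This is a sketch with hypothetical source keys and fact names (`redo_or_archive_available` and so on). Each predicate restates one line of the list, and any source not listed raises an error rather than guessing.

```python
from typing import Callable, Dict

# Hypothetical fact names; each predicate restates one line of the list above.
REMINE_PREREQS: Dict[str, Callable[[Dict[str, bool]], bool]] = {
    "oracle": lambda f: f["redo_or_archive_available"],
    "db2_luw": lambda f: f["db2_logs_available"] and not f["snapshot_or_squish_since_point"],
    "sql_server": lambda f: f["changes_still_in_distdb"],
    "mainframe_relogged": lambda f: f["pwx_archive_logs_available"],
    "iseries_relogged": lambda f: f["db2i_journals_available"],
}


def can_remine(source: str, facts: Dict[str, bool]) -> bool:
    """Return True if the source data needed to recreate CND files should still exist."""
    if source not in REMINE_PREREQS:
        # "Etcetera": other sources have their own rules; check with Support.
        raise ValueError(f"no rule recorded for source {source!r}")
    return REMINE_PREREQS[source](facts)
```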

          • 2. Re: PowerExchange Express CDC for Oracle Recovery and Restart
            Don Michie Active Member

            Thank you,

            In any production environment, we need at minimum a 'plan' for recovery from a complete fail of a node, in this case, PWX.

            So if we have a data loss of all logger/condense files, say hardware failure, and have to resume processing on an entirely new PWX standby node, I'll assume there is not a way to easily calculate a restart point for the logger (PWXCCL.CFG entries).

            We will most likely have the source change data in either redo or archive, but how to mine that out is to me a mystery.

            Since this is PWXPC/Express CDC for Oracle, I'll look for a separate 'sweep'-type background process that picks up the logger file backups made during file switches. Those might provide the data, or at least an idea of where to go back to in the redo/archive logs, so that we can rebuild CDC and resume processing in a 'reasonable' amount of time.

            Thanks Dan!

            • 3. Re: PowerExchange Express CDC for Oracle Recovery and Restart
              dan.smith@informatica.com Guru

              If you are looking at a hardware failure at the disk/server level, including a loss of PWX_HOME and everything below it, then I would suggest nightly backups (either fulls or a weekly full with incrementals), and then this as a basic action plan:

               

              1) Restore PWX_HOME as of the latest backup

              2) Create a backup of CDCT using the available CND files

              PWXUCDCT using DERIVE_CDCT_BACKUP

              3) Restore the new backup to the CDCT

              PWXUCDCT using RESTORE_CDCT

              4) I would suggest running PWXUCDCT with REPORT_CDCT_FILES, as it should fail if the CDCT is bad.

              Also, it will show you the beginning and ending restart tokens for each CND file, in case you need to check them when restarting PC sessions

              5) Warm Start PWXCCL with no RESTART_TOKEN or SEQUENCE_TOKEN override

              6) Warm Start PC sessions

              - If all warm start, you're good to go.

              - If some fail because token values are not available, then you can look at the REPORT_CDCT_FILES to see what the oldest tokens are, and determine your risk scope, and then decide what to do about those specific tables.
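              The "determine your risk scope" step amounts to comparing each session's restart position with the oldest data still covered by the CND files. The sketch below is a simplification under a loud assumption: real restart tokens are opaque and their format is not published, so toy integer positions stand in for them, and `classify_sessions` is a hypothetical helper, not a PWX utility.

```python
from typing import Dict, List, Tuple


def classify_sessions(
    cnd_ranges: List[Tuple[int, int]],  # (begin, end) per CND file, as read from REPORT_CDCT_FILES
    session_tokens: Dict[str, int],     # last committed position per PC session
) -> Tuple[List[str], List[str]]:
    """Split sessions into those that can warm start and those at risk.

    Toy integer positions stand in for the real, opaque restart tokens.
    """
    oldest = min(begin for begin, _ in cnd_ranges)
    can_warm, at_risk = [], []
    for name, position in sorted(session_tokens.items()):
        # A session whose position predates the oldest CND data cannot be served.
        (can_warm if position >= oldest else at_risk).append(name)
    return can_warm, at_risk
```

              The at-risk list is the set of tables for which you then decide between re-mining, rematerializing, or reconciling.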

               

              PWXUCDCT is documented in the PWX Utilities Guide.

              This KB may also help:

              Support

              • 4. Re: PowerExchange Express CDC for Oracle Recovery and Restart
                dan.smith@informatica.com Guru

                OK, some caveats:

                 

                Because CND and CDCT are highly active, any backup is likely to occur while they are in flight.

                The CDCT is the most at-risk, as it is basically a giant index-and-status structure, so it gets updated fairly often.

                Recreating the CDCT from CND files guarantees integrity.

                 

                Also, the attempt to recreate the CDCT will fail if the latest CND file was a partial image when the backup ran.

                In that case delete the latest CND file, and redo the DERIVE_CDCT_BACKUP and you should have a clean set of CND and CDCT as a restart point for PWXCCL.
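                The retry described above (derive, and if the newest CND file was caught mid-write, drop it and derive again) can be sketched as follows. Everything here is a hypothetical model: `derive_cdct_backup` merely stands in for running PWXUCDCT with DERIVE_CDCT_BACKUP, and the `complete` flag assumes you can tell whether a restored CND file is a partial image. Files are assumed to be ordered oldest to newest.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CndFile:
    name: str
    complete: bool  # False if the backup caught the file mid-write


class PartialCndError(Exception):
    pass


def derive_cdct_backup(files: List[CndFile]) -> List[str]:
    """Toy stand-in for DERIVE_CDCT_BACKUP: fails if the newest CND file is partial."""
    if files and not files[-1].complete:
        raise PartialCndError(files[-1].name)
    return [f.name for f in files]


def rebuild_cdct(files: List[CndFile]) -> List[str]:
    files = list(files)  # oldest to newest; do not mutate the caller's list
    try:
        return derive_cdct_backup(files)
    except PartialCndError:
        # Delete the latest (partial) CND file, then redo the derive once.
        files.pop()
        return derive_cdct_backup(files)
```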

                • 5. Re: PowerExchange Express CDC for Oracle Recovery and Restart
                  Brian Rinn New Member

                  Hi Dan,

                   

                  Firstly, I always appreciate your answers in this forum and learn a lot from them.

                   

                  I do have a follow-up question based on this statement you made above:

                     If a PC session fails, warm start if possible.

                     If you can't warm start, put token overrides in the relevant PC restart token file, and cold start the PC session.

                     PC will send appropriate tokens to PWX Listener when requesting data.

                   

                  We have a similar setup as the user in the original post (Express CDC for Oracle) and have the Recovery Strategy set to "Resume from Last Checkpoint."

                   

                  There is just one session task in our workflow. If the PC session fails for any reason, do we need to run the 'recovery' workflow? You mentioned to just warm start the PC session, if possible. Is running the workflow in recovery mode no longer needed? No longer suggested? Will warm starting the failed PC workflow re-sync the tokens between the recovery tables and PC token restart files?

                   

                  Thanks much...

                  • 6. Re: PowerExchange Express CDC for Oracle Recovery and Restart
                    Bibek Sahoo Active Member

                    Hi,

                     

                     

                    Warm start should work fine, but it is always best practice to run "Recover workflow".

                     

                     

                    When you recover, the Integration Service continues processing messages from the point of interruption and writes the correct tokens to the restart token files, which will have been left empty when the session failed.

                     

                    From Guide: If you enabled session recovery, you can recover a failed or aborted session. When you recover a session, the Integration Service continues to process messages from the point of interruption. The Integration Service recovers messages according to the real-time source.

                     

                    The Integration Service uses the following session recovery types:

                     

                    1. Automatic recovery. The Integration Service restarts the session if you configured the workflow to automatically recover terminated tasks. The Integration Service recovers any unprocessed data and restarts the session regardless of the real-time source.

                    2. Manual recovery. Use a Workflow Monitor or Workflow Manager menu command or pmcmd command to recover the session. For some real-time sources, you must recover the session before you restart it or the Integration Service will not process messages from the failed session.

                     

                     

                    Regards,

                    Bibek

                    • 7. Re: PowerExchange Express CDC for Oracle Recovery and Restart
                      dan.smith@informatica.com Guru

                      I just realized that I never responded to this comment/question:

                      "We will most likely have the source change data in either redo or archive, but how to mine that out is to me a mystery."

                       

                      This is where you pick an SCN, convert it to restart token values, set the SEQUENCE_TOKEN and RESTART_TOKEN overrides in PWXCCL.CFG, and then cold start PWXCCL - but warm start the PC CDC sessions.

                       

                      Pick an SCN

                      - you probably want to restart as of some point in time

                      - you can ask your DBA what SCN corresponds to a given point in time (easy for them to find out, if the Oracle DB has "Flashback" enabled, less easy if not - but still very possible by looking at v$archived_log).
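                      As an illustration of the v$archived_log route, here is a sketch that picks a conservative SCN for a point in time from rows your DBA has already queried. The FIRST_CHANGE#, FIRST_TIME, NEXT_CHANGE# and NEXT_TIME fields mirror real V$ARCHIVED_LOG columns; the `ArchivedLog` class and `scn_at_or_before` function are hypothetical. It returns the first SCN of the archived log that contains the target time, so it errs on the side of starting slightly earlier. If several logs cover that time (for example, more than one redo thread), it takes the smallest.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class ArchivedLog:
    # Mirrors V$ARCHIVED_LOG columns FIRST_CHANGE#, FIRST_TIME, NEXT_CHANGE#, NEXT_TIME.
    first_change: int
    first_time: datetime
    next_change: int
    next_time: datetime


def scn_at_or_before(logs: Iterable[ArchivedLog], when: datetime) -> Optional[int]:
    """Return an SCN at or before `when`, or None if no archived log covers it."""
    covering = [log.first_change for log in logs if log.first_time <= when < log.next_time]
    return min(covering) if covering else None
```

                      A None result means the archived logs for that time are gone, so re-mining from that point is not possible.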

                       

                      Convert it to restart token values

                      - Restart token formats are not static across all releases, and we don't publish the format

                      - Raise a case with Support and ask them to provide tokens

                      - They will need the SCN, the DB ID, and the RESETLOGS ID (For Oracle - this varies by source)

                       

                      Code the RESTART_TOKEN and SEQUENCE_TOKEN overrides in PWXCCL.CFG

                      - just like it says

                      RESTART_TOKEN=TheRestartTokenValue

                      SEQUENCE_TOKEN=TheSequenceTokenValue

                      - Save PWXCCL.CFG
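                      If you script this step, a helper might look like the sketch below. It assumes each statement is a simple one-line `KEY=value` entry, as in the two lines above. A real PWXCCL.CFG may contain other layouts, so review the result before saving. `set_token_overrides` and `update_cfg_file` are hypothetical names, and the token values come from Support.

```python
import re
from pathlib import Path

OVERRIDE_KEYS = ("RESTART_TOKEN", "SEQUENCE_TOKEN")
_OVERRIDE_LINE = re.compile(r"\s*(RESTART_TOKEN|SEQUENCE_TOKEN)\s*=", re.IGNORECASE)


def set_token_overrides(cfg_text: str, restart_token: str, sequence_token: str) -> str:
    """Replace (or append) the two override statements, leaving other lines untouched."""
    values = {"RESTART_TOKEN": restart_token, "SEQUENCE_TOKEN": sequence_token}
    out, seen = [], set()
    for line in cfg_text.splitlines():
        match = _OVERRIDE_LINE.match(line)
        if match:
            key = match.group(1).upper()
            out.append(f"{key}={values[key]}")
            seen.add(key)
        else:
            out.append(line)
    for key in OVERRIDE_KEYS:
        if key not in seen:
            out.append(f"{key}={values[key]}")
    return "\n".join(out) + "\n"


def update_cfg_file(path: Path, restart_token: str, sequence_token: str) -> None:
    """Keep a .bak copy of the original file, then write the updated statements."""
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(set_token_overrides(original, restart_token, sequence_token))
```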

                       

                      Cold Start PWXCCL.

                      - PWXCCL Cold Start will destroy the existing CND and CDCT

                      - PWXCCL will start reading Oracle Archive Logs as of the SCN that you provided for the token creation

                      - The restart tokens embedded in each row-image will be the same as when they were first captured, so you can use the same PowerCenter restart point

                       

                      *Wait for PWXCCL to close at least one CND file, as that is when it will create a new CDCT*

                       

                      Warm Start the PC CDC sessions

                      - PC passes PWX the tokens from the last Commit to the target (based on PM_REC_STATE)

                       

                      This allows you to "re-mine" the existing Archive Logs with PWXCCL, and allows PC sessions to resume "normally" - no cold start, no chance of lost or duplicate data, etc.

                      • 8. Re: PowerExchange Express CDC for Oracle Recovery and Restart
                        Don Michie Active Member

                        Sorry, just saw the reply, and thanks,

                        So the restart tokens for PC sessions 'will not' cause the logger to go 'back in time' in the redo log and pick up changes?  Correct?   Those tokens are pointing to condense files, and those would have to be rebuilt by setting tokens in the PWXCCL config file to produce that data, so the PC sessions would have the data their restart tokens would point to?   Sorry, a little confused on this topic of 'if the condense files are lost, where do PC sessions find the missing data - does the logger have to be configured to remine that data, or can the PC sessions tell PWXPC to go back in the redo logs to find it, or a combination of both.'.   This is of course a failover plan of a complete loss of PWX_HOME on the PWX node.

                        • 9. Re: PowerExchange Express CDC for Oracle Recovery and Restart
                          dan.smith@informatica.com Guru

                          PowerCenter session restart tokens have nothing to do with PWXCCL.

                          PWXCCL restart tokens have nothing to do with PC sessions.

                           

                          Basic process/data flow:

                           

                          PWXCCL starts as of some restart token value, based on checkpoint contained in CDCT.

                          PWXCCL instantiates capture.

                          PWX Capture captures data, collects change data for tables registered for capture, and sends committed UOWs containing such data back to PWXCCL.

                          PWXCCL writes out *CND* files, and updates the CDCT to show what sources had rows in each *CND* file.

                          PWXCCL updates checkpoint information in CDCT each time that it closes a *CND* file.

                           

                          When you restart PWXCCL, it restarts based on the restart tokens that it knows about, which are the ones from the checkpoints that it stored in the CDCT.

                           

                          Completely separately:

                           

                          PC sessions talk to PWX Listener and ask for data, passing a list of sources, with restart tokens for each source.

                          PWX Listener instantiates a PWX Listener subtask for each such session.

                          PWX Listener subtask looks in CDCT for each of those sources, and determines where to start reading, based on the tokens that PC session sent.

                          PWX Listener subtask reads *CND* files and sends relevant data to the PC session.

                          PC session updates restart tokens in the PM_REC_STATE table in the target.

                          PC session updates restart tokens in the PC restart token file when the PC session "succeeds".

                           

                          When you restart PC session, it restarts based on the restart tokens it knows about.

                          No impact to PWXCCL.
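                          The two independent flows can be shown in a toy model. This is an illustrative sketch only: `Condense`, `listener_read`, and the integer positions are hypothetical stand-ins, since real tokens are opaque. PWXCCL advances its own CDCT checkpoint when it closes a CND file. The Listener side reads CND data using only the tokens the PC session sends, and it never moves PWXCCL's checkpoint.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Row = Tuple[int, str, str]  # (toy capture position, source table, change data)


@dataclass
class Condense:
    """Toy PWXCCL: writes CND files and keeps its OWN checkpoint in the CDCT."""
    cnd_files: List[List[Row]] = field(default_factory=list)
    cdct_checkpoint: int = 0

    def close_cnd(self, rows: List[Row]) -> None:
        # Assumes a CND file is only closed once it holds at least one row.
        self.cnd_files.append(rows)
        # The checkpoint is saved each time a CND file is closed.
        self.cdct_checkpoint = max(pos for pos, _, _ in rows)


def listener_read(ccl: Condense, session_tokens: Dict[str, int]) -> List[Row]:
    """Toy Listener subtask: serves rows newer than the tokens the PC session sent."""
    out = []
    for cnd in ccl.cnd_files:
        for pos, table, data in cnd:
            if table in session_tokens and pos > session_tokens[table]:
                out.append((pos, table, data))
    return out
```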

                           

                          If *CND* files are lost, then you have to tell PWXCCL to go back and re-mine Oracle Archive logs and recreate the missing *CND* files.  You do that by creating RESTART_TOKEN and SEQUENCE_TOKEN overrides, putting them in PWXCCL.CFG, and then cold starting PWXCCL.

                          Of course, that only works if the Oracle Archive logs in question still exist.

                           

                          Alternatively, you can re-materialize the target (by whatever non-PWX method you choose), and then cold start PWXCCL without token overrides.

                           

                           

                          Please understand that you can implement PWX CDC multiple ways.  This covers the most common use case, but may or may not fit how PWX has been implemented in your shop.

                           

                          If you need someone to review your PowerExchange CDC environment and help construct a BCP/DR plan, please reach out to your Informatica Sales/Account Manager, or your Informatica Customer Success Manager, and ask about engaging Informatica Professional Services to do so.

                          • 10. Re: PowerExchange Express CDC for Oracle Recovery and Restart
                            dan.smith@informatica.com Guru

                            If you have not already read these KB articles, then I would suggest doing so.

                             

                            PowerExchange Express CDC for Oracle: Components and Data Flow

                            https://kb.informatica.com/whitepapers/2/Pages/141248.aspx

                             

                            FAQ: Where should PowerExchange Express CDC for Oracle components run?

                            https://kb.informatica.com/whitepapers/2/Pages/141251.aspx

                            • 11. Re: PowerExchange Express CDC for Oracle Recovery and Restart
                              Don Michie Active Member

                              Thanks, that's what I was searching for. I did not think that PowerCenter could send tokens that did not exist in the CDCT and thereby cause PWXCCL to 'dig back' into the Oracle logs to recover that data. You've confirmed that PWXCCL has to be 'told' to go back and recover that data before the PowerCenter tokens can find it in the CDCT and condense files.

                              We would need this method if there was a loss of the CND / CDCT, or the PWX node entirely.

                              Very much appreciated!

                              • 12. Re: PowerExchange Express CDC for Oracle Recovery and Restart
                                dan.smith@informatica.com Guru

                                You're welcome.

                                 

                                If PC sends restart tokens that are older than the oldest data from CND files, the default behaviour is that PWX will send PC an error response, and the PC session will fail.

                                 

                                It is possible to code a substatement on the "extract" (TYPE=CAPX) CAPI_CONNECTION in PWX Listener's DBMOVER.CFG to tell it to send the oldest data that it has, rather than failing.  However, that's normally a Really Bad Idea, as it allows for silent data loss.

                                There are specific business use cases where it might be appropriate, but we tend to discourage it in general.
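                                The difference between the default behaviour and the fallback can be sketched as follows. This is a hypothetical model: `send_oldest_instead` stands in for the DBMOVER.CFG substatement, whose name is deliberately not given here. `start_point` and `TokenTooOldError` are invented names, and toy integer positions replace the opaque tokens. It shows why the fallback is risky: the session keeps running, and the changes between its token and the oldest CND data are never delivered.

```python
from typing import List, Tuple


class TokenTooOldError(Exception):
    pass


def start_point(
    session_token: int,
    cnd_ranges: List[Tuple[int, int]],  # (begin, end) per CND file
    send_oldest_instead: bool = False,  # hypothetical stand-in for the substatement
) -> int:
    """Decide where a (toy) Listener subtask starts reading for a PC session."""
    oldest = min(begin for begin, _ in cnd_ranges)
    if session_token >= oldest:
        return session_token
    if send_oldest_instead:
        # Silent gap: changes between session_token and oldest are skipped.
        return oldest
    # Default: error response to PC, so the session fails visibly.
    raise TokenTooOldError(f"requested {session_token}, oldest available {oldest}")
```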