I know this is probably not the answer you'd like to hear, but I'll share the thought nonetheless, if only so that you can tell us why you can't go this route (probably because real-time processing is mandatory in your use case; that's the most common answer to this suggestion).
I would change the whole process. Instead of processing the messages when they come in, I would simply "stage" them in e.g. a relational table T1. The key values of each message are saved to another table T2.
Now you separate your current mapping into two separate mappings.
The first mapping reads the messages and "stages" them in T1 and T2.
The second mapping sources T1 in combination with T2 (inner join) and processes these records; whenever one record has been written to all targets (or moved to some error target in case the data in this particular message were unacceptable), the respective message key is "removed" from T2.
This second mapping needs to be repeated more or less constantly in order to shorten the delay between the two mappings.
One additional remark about the "removal" of message keys from T2.
In terms of transparency of processing, it makes sense to use one of the two following approaches.
First, you can set up a "flag" in T2; whenever a message key arrives from the message queue and is written to T1 and T2 (by the first mapping), this "flag" is set to "fresh" (e.g. value 1). Whenever the second mapping finishes one single message from T1 and T2, this flag is set to "no longer available" in T2. The last step is to change the source filter for the second mapping such that it sources only those records from T1 and T2 where the flag in T2 has the value "fresh" (in order to avoid duplicate processing of any messages).
The advantage of this approach is transparency; if you not only add this "flag" to T2 but also add two timestamps ("read_when" and "processed_when", for example), then you can safely tell any auditor when each particular record has been read from the message queue (by mapping 1) and when it has been processed (by mapping 2).
Second, you can indeed "physically" remove those message keys from T2; in addition, you may want to save them to a separate "full history table", but that's a different story.
Thanks for the reply, Nico. Yes, the requirement was to process the data in real time or near real time, hence we pursued the design that I explained. We just dump the data from MQ into STG tables after parsing it through an XML transformation. Resume from last save point works as intended for other failures, but not when we get bad data (invalid date etc.). In that case the error message and a few other unprocessed good messages are dequeued from the queue and are available only in the GMD file, which cannot be processed until I remove the error record. To mitigate this, I thought of writing the records read from the queue to another file in parallel and, in case of such a failure, reading them from that backup file after removing the error records, which avoids data loss. Since I haven't worked on failure recovery and GMD files before, I wanted to know whether there is any other straightforward approach to handle this scenario. Thanks again.
I'm not the one with the biggest experience in XML or message queue handling, but in general I fear there is no "easy" approach to what you need.
The point is that the processing of any record could go wrong. You can never tell in advance which records are "bad"; you cannot even tell for which reason they will be rejected. For example, hardware failures might occur which interrupt processing and kill all "complete" designs.
I've worked in one project for eight years where de-coupling the transfer and the processing of data was mandatory thanks to the unstable network infrastructure. So I do know quite a few of the pitfalls that await you in such circumstances, and you have just encountered the "easy" one so far ("easy" means: easy to encounter during the development phase, not only after three years in production use).
Basically you have several options here.
First, you can do your best to make sure that the data in the message queue are always correct. This way your mapping wouldn't encounter this problem any longer. That is the best solution.
Second you can do your best to "optimise" and stabilise your mapping as far as possible. This way you can eliminate all potential problems one by one: each time you encounter one new problem, you cater for it. That is what you're currently trying to do. You can safely assume that in two or three years some new error will show up which will make it impossible for your mapping to process those records. Also you can safely assume that another two to three years later a similar situation with a new error cause will arise. Just my experience.
Third, your organisation can accept that "staging" the data from the message queue and "processing" them must be de-coupled in order to get maximum stability and long-term maintainability. Not nice. And of course the near-real-time processing will no longer be available. But this is the only way (in my experience) to get rid, once and for all, of the general problem you're facing here: systems are not always stable, data are not always correct, and software upgrades are never guaranteed to be free of new errors (during the past three months we found three new bugs in PowerExchange 10.4.1 which we were not aware of beforehand).
That's the decision your organisation has to make: what is more important:
- Stability and maintainability?
- Or the "nice-weather" implementation of some near-real-time processing which will always be in danger of getting jeopardised thanks to inherent system errors? And which will cause enormous efforts each time something goes wrong?
There is no option to manually edit the GMD/storage files to clear the bad records.
However, the bad records should generally end up in the bad files.
This should allow the messages read from the queue (which are removed from the queue and stored in the storage files) to be processed when the session is run in recovery mode.