-
1. Re: Finding Missing Dates Between range of dates
JanLeendert Wijkhuijs Nov 24, 2020 4:13 AM (in response to Priya Seth)Hi,
Basically it is a matter of having the full list of dates from e.g. a calendar table and outer-joining it with the data you want to compare.
I know it is very basic but this is the best option.
Hope this helpsRegards
Jan Leendert -
2. Re: Finding Missing Dates Between range of dates
Nico Heinze Nov 24, 2020 9:17 AM (in response to JanLeendert Wijkhuijs)An alternative to using a DATE table might be to use a Java Transformation which is fed with the start and end date; the JTX would produce each and every single day between these two dates as an output record, then you can perform an Outer Join with the real input data and let pass only those records from the JTX which do not have a counterpart in the real input data.
So basically the same as Jan suggested, but it doesn't need an additional table.
Regards,
Nico
-
3. Re: Finding Missing Dates Between range of dates
Priya Seth Nov 24, 2020 7:42 PM (in response to Nico Heinze)Hi Nico,
Thanks for your reply , but since I have never used Java transformation I have no idea how it works and how I will define logic in it. It would be really helpful if you can elaborate how to implement the logic in Java transformation.
Thanks
-
4. Re: Finding Missing Dates Between range of dates
JanLeendert Wijkhuijs Nov 25, 2020 4:50 AM (in response to Priya Seth)Hi Priya,
There is one other option which is a dirty one and a tricky one.
First of all you need to have the highest range of dates which is missing available lets say 5 days in a row.
You read the dates from the source and in an expression you create 9 additional output fields so the total will be ten.
date1 is the date from the source, date2 = date1 + 1 day, date3 = date1 + 2 days, date4 = date1 + 3 etc.
Next you need a normalizer to create 10 rows out of 1 with dates then you need to deduplicate dates in an aggregator and then you will have a list of dates which you can use against the source data and find the missing dates.
It is tricky because if in the above case the range of missing dates is exceeding the 9 days you will not see that date in your output.
Regards,
JanLeendert -
5. Re: Finding Missing Dates Between range of dates
Nico Heinze Nov 25, 2020 5:21 AM (in response to JanLeendert Wijkhuijs)And this point (that the EXP and NRM are bound to a certain maximum of repetitions) is the reason why I suggested the JTX approach.
About the JTX, here is the explanation, I hope it's understandable; whatever I leave unclear, please ask for details.
Create an empty active(!) Java Transformation (JTX).
On the Ports tab, you will see that the JTX has two groups of ports, namely the Input group and the Output group:
Now click on the Output group bar, then click the New Port button four times.
The first new port should be renamed i_Date_From (type Date/Time, input/output).
The second one should be renamed as i_Date_To (also Date/Time, input/output).
The third port should be renamed as o_Date (also Date/Time, output-only).
The fourth port should be named as o_Running_Number (integer, output-only).
Finally the Ports tab should look like this:
Now to the the Java Code tab. The window will look like this:
As you can see, beneath the editor window there are seven additional tabs Import Packages, Helper Code, and so on until Java Expressions.
Right now we only need to enter code on one tab, namely On Input Row.
Click on the On Input Row tab, remove the existing code in the editor window, and enter the following code there:
if (isNull( "i_Date_From"))
{ if (isNull( "i_Date_To"))
{ logError( "NULL input dates encountered, all output ports will be NULL.");
setNull( "o_Date");
setNull( "o_Running_Number");
generateRow();
}
else
{ o_Date = i_Date_To;
o_Running_Number = 1;
generateRow();
}
} // if input value "i_Date_From" is NULL
else
{ if (isNull( "i_Date_To"))
{ o_Date = i_Date_From;
o_Running_Number = 1;
generateRow();
}
else
{ long dateFrom = i_Date_From;
long dateTo = i_Date_To;
// Check that they are in correct order:
if (dateFrom > dateTo)
{ long auxMillis = dateFrom;
dateFrom = dateTo;
dateTo = auxMillis;
}
// Now create one output record for every day:
int Index = 1;
while ( dateFrom <= dateTo)
{ o_Date = dateFrom;
o_Running_Number = Index;
generateRow();
// Advance the date by one day:
dateFrom += 24 * 60 * 60 * 1000;
Index++;
}
} // both input values are not NULL
} // input value "i_Date_From" is not NULL
Below the editor window to the right there are three blue links named Sttings, Compile, and Full Code. Click on the Compile button. If the message window at the lower end doesn't show "Java code compilation successful", ask someone for help. For example myself, here.
After having compiled the Java code, click on Apply and OK, then save. Now you can use the JTX in each mapping where you are going to need it.
The code is pretty safe in terms of error handling:
If both input ports are NULL, then an error message will be produced in the log file, and the JTX produces NULL for the output date and the "running number".
If one of the two input ports is NULL, then the other value will be output as the "o_Date" with a "running number" of 1.
If both input ports are not NULL, then the JTX will create date/time values beginning with the earlier of the two input dates (the JTX sorts the input dates on its own), add one day to this value, and repeat the whole process until the end date has been reached or exceeded.
Any questions? Please ask.
Regards,
Nico
-
6. Re: Finding Missing Dates Between range of dates
Nico Heinze Nov 25, 2020 10:17 AM (in response to Nico Heinze)Oh, sorry, forgot to mention what the output ports mean:
The output port o_Date is the date/time value you want to get.
The running number will be incremented from 1 to difference (in days) between i_Date_From and i_Date_To. So, for example, if you input Jan 22nd, 2019, and Mar 10th, 2019, for i_Date_From and i_Date_To, then you will get a total of 48 records, one for each day from Jan 22nd - Jan 31st, Feb 1st - Feb 28th, and Mar 1st - Mar 10th; the running number will go from 1 to 48 for these dates.
Regards,
Nico
-
7. Re: Finding Missing Dates Between range of dates
Nico Heinze Nov 26, 2020 2:25 AM (in response to Nico Heinze)Bah, stupid me, I forgot one more thing:
You can pass as many ports through the JTX (as simple I/O ports in the Output group of ports) as you want. Just make sure one thing: as soon as you have dragged those additional ports into the JTX, you have to re-compile the Java code (click on the link Compile as explained above).
Regards,
Nico
-
8. Re: Finding Missing Dates Between range of dates
Priya Seth Nov 28, 2020 6:39 PM (in response to Nico Heinze)Thanks Nico for such detailed step by step implementation of the logic, it really easy to understand now.
Thanks for your help.
-
9. Re: Finding Missing Dates Between range of dates
Nico Heinze Nov 30, 2020 1:56 AM (in response to Priya Seth)Glad to hear that I was able to help.
Does it work as you need it? Or do you need anything else at this point?
Regards,
Nico