According to Kimball's design tips on SCD type 2, the first approach is better. He says that you should create completely artificial primary keys that are simply sequentially assigned integers. You must make a new surrogate primary key whenever you process a Type 2 change in a dimension.
If we are using approach 1, then my fact table will have historical surroagte keys of dimension as well. When I pull records from fact along with dimension attrbutes, I will not be able to view latest records of dimension.. do I need to perform self join with SQL dimension again to get latest attributes of dimension?
That is the purpose of SCD2: to keep history information. If you need to see just the latest records of your dimension, then you should go with a SCD 1. This means that you can override the value of 5000 for EMPID 123 with the new value of 7000.
However, if you still need to see history, then you can use the column Status_flag which from my understanding tells you what is the current record.
I want latest record in report... Yes I can go ahead with SCD1.. but there are some adhoc dimension reporting where user wants to track history, so I need to mainting history as well...
So in Approach1:
when my fact record is loaded, I will surroaget key 1001 initially and later on updated dimension is loaded I will 1003.
When I perform fact join with dimension along with latest dim attributes, I wil not be getting correct result for surrogate key 1001 since it is deactivated.
if I use self join in reporting layer based on natural leys to get the latest dimension attributes, that might work.. is that solution available or something else? please advice.
It depends on how you set the keys in your fact table. I was in the same situation and this was the approach my team and I considered:
first we had the employee dimension (I'll call it EMP) of type 1. Since the keys were not chaning in time, our fact tables were ok. We had to change EMP to SCD2. We used the first approach per Kimball design. We added two columns StartDate and EndDate to help us determine at a given point of time what were the attributes of an employee.
- So, when we need to check the current records of employee we add to our report a fitler: EMP.STATUS_FLAG = 1 (STATUS_FLAG gives the current record of an employee)
- For history we do not add any filter and bascially we get all the records in the dimension.
- To determine the attributes of an employee for a specific date (for example 01-FEB-2013) we add a filter in our report: EMP.StartDate >= 01-FEB-2013 and EMP.EndDate < 01-FEB-2013
For joins with fact table, we changed the way we import the emp keys in fact tables. For a record, we took the key that was corresponding to that employee at that particular time. For example, I have a record in a fact table with an accounting date of 01-FEB-2013, the corresponding key of the employee we'll take it as:
EMP.StartDate >= 01-FEB-2013 and EMP.EndDate < 01-FEB-2013.
Why like this? For example, you have a fact table that calculates the taxes for an employee. For 2012 he had one salary, in 2013 he had another salary. So you might want to keep different keys in your fact table so you can determine for every transaction the corresponding attributes of the employee at that particular point in time.
In the end it's all about the way you want to keep your keys in the fact table.
I choose to implement two key columns... one key SK will be unique number.. another key will be XREF key which will equivalent to composite natural key.. and in fact I will be referring SK column always...XREF key will be used if at anypoint latest view of dimension is required, otherwise SK column will give in in-point i.e history dimension attributes..
thanks for your detailed explaination...
As per my understanding, because of thse type requirements only, the SCD types are enhanced to SCD Type 4, SCD Type 6 etc. Some of the projects are implementing these techniques.
Please find the details with examples in below link :
There are many other sites which are providing these type of details.
All the Best.