7 Replies Latest reply on Mar 7, 2013 11:32 AM by ashishprem

    Surrogate key in SCD

    Active Member

      Hi,

       

      My dimension table is SCD type2. I would like to know how do I create keys for it. Should it be unique number for each record? or should it be same number for same set of records.

       

      I have attached the examples in excel. Both ways are doable. There is no problem. But I would like to know what approach is best practise to use while creating surrogate key columns in SCD. in approach -2, the key column is not unique, do I need to maintain one new additional column which will just similar to unique autonumber to place the SK role. Please advice me...

       

      If I am going with approach 2 and having one additional column which will be unique, then in fact table, which column should I place, the logic grouping column or new column which is unique.

       

      let me know if you need more details or explain scenario in detail.

       

      Thanks,

      Ashish

        • 1. Surrogate key in SCD
          Active Member

          Hi Ashish,

           

          According to Kimball's design tips on SCD type 2, the first approach is better. He says that you should create completely artificial primary keys that are simply  sequentially assigned integers. You  must make a new surrogate primary key whenever you process a Type 2  change in a dimension.

           

          Regards,

          Ecaterina

          • 2. Surrogate key in SCD
            Active Member

            If we are using approach 1, then my fact table will have historical surroagte keys of dimension as well. When I pull records from fact along with dimension attrbutes, I will not be able to view latest records of dimension.. do I need to perform self join with SQL dimension again to get latest attributes of dimension?

            • 3. Surrogate key in SCD
              Active Member

              That is the purpose of SCD2: to keep history information. If you need to see just the latest records of your dimension, then you should go with a SCD 1. This means that you can override the value of 5000 for EMPID 123 with the new value of 7000.

               

              However, if you still need to see history, then you can use the column Status_flag which from my understanding tells you what is the current record.

              • 4. Surrogate key in SCD
                Active Member

                I want latest record in report... Yes I can go ahead with SCD1.. but there are some adhoc dimension reporting where user wants to track history, so I need to mainting history as well...

                 

                So in Approach1:

                when my fact record is loaded, I will surroaget key 1001 initially and later on updated dimension is loaded I will 1003.

                 

                When I perform fact join with dimension along with latest dim attributes, I wil not be getting correct result for surrogate key 1001 since it is deactivated.

                 

                if I use self join in reporting layer based on natural leys to get the latest dimension attributes, that might work.. is that solution available or something else? please advice.

                • 5. Surrogate key in SCD
                  Active Member

                  It depends on how you set the keys in your fact table. I was in the same situation and this was the approach my team and I considered:

                  first we had the employee dimension (I'll call it EMP) of type 1. Since the keys were not chaning in time, our fact tables were ok. We had to change EMP to SCD2. We used the first approach per Kimball design. We added two columns StartDate and EndDate to help us determine at a given point of time what were the attributes of an employee.

                  • So, when we need to check the current records of employee we add to our report a fitler: EMP.STATUS_FLAG = 1 (STATUS_FLAG gives the current record of an employee)
                  • For history we do not add any filter and bascially we get all the records in the dimension.
                  • To determine the attributes of an employee for a specific date (for example 01-FEB-2013) we add a filter in our report: EMP.StartDate >= 01-FEB-2013 and EMP.EndDate < 01-FEB-2013

                   

                  For joins with fact table, we changed the way we import the emp keys in fact tables. For a record, we took the key that was corresponding to that employee at that particular time. For example, I have a record in a fact table with an accounting date of 01-FEB-2013, the corresponding key of the employee we'll take it as:

                  EMP.StartDate >= 01-FEB-2013 and EMP.EndDate < 01-FEB-2013.

                  Why like this? For example, you have a fact table that calculates the taxes for an employee. For 2012 he had one salary, in 2013 he had another salary. So you might want to keep different keys in your fact table so you can determine for every transaction the corresponding attributes of the employee at that particular point in time.

                   

                  In the end it's all about the way you want to keep your keys in the fact table.

                  • 6. Surrogate key in SCD
                    Active Member

                    I choose to implement two key columns... one key SK will be unique number.. another key will be XREF key which will equivalent to composite natural key.. and in fact I will be referring SK column always...XREF key will be used if at anypoint latest view of dimension is required, otherwise SK column will give in in-point i.e history dimension attributes..

                     

                    thanks for your detailed explaination...

                    • 7. Surrogate key in SCD
                      Bhanu Prakash Putchakayalapalli New Member

                      Hi All,

                      As per my understanding, because of thse type requirements only, the SCD types are enhanced to SCD Type 4, SCD Type 6 etc. Some of the projects are implementing these techniques.

                      Please find the details with examples in below link :

                      http://en.wikipedia.org/wiki/Slowly_changing_dimension#Type_IV

                      There are many other sites which are providing these type of details.

                      All the Best.