SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

DI Studio - Holding Value Across Rows

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 89
Accepted Solution

DI Studio - Holding Value Across Rows

[ Edited ]

I need to do some clean-up on a few dim tables in our warehouse. It looks like our two columns that have the 'valid from date' and 'valide to date', which allow us to find the current record in a set of historical records, were not loaded correctly some time ago.

 

For example, we may have a record like follows:

 

ID      Valid from          Valid to

1        01JUN2007        01JAN5999

1        01JAN2004        01JAN5999

 

or

 

4        01JUN2012        01JAN5999

4        16AUG2014       01JAN5999

4        01JAN2010        30JUL2012

4        01DEC2013       0!JAN5999

 

Clearly in each of these cases, only one record should have a 'Valid To' value of 01JAN5999....know we've got to fix it. I hope I made it clear in the example that it varies the number of records an ID may have.

 

I was thinking if I could sort the the columns and retain the 'Valid From' record of the previous row, subtract 1 from the date, then I could have the correct 'Valid To' date.

 

Can this be done? Am I on the right path, or is there another approach?

 

Thanks!


Accepted Solutions
Solution
‎10-14-2016 10:05 AM
Super User
Posts: 5,424

Re: DI Studio - Holding Value Across Rows

Resort your data by Id descending valid_from_dt.
Use retain for bot from ant to dt.
And then assign a new to dt with "previous" from dt-1 when appropriate.
Another crazy idea is the set all high dates to missing and then process the data through the SCD Type 2 Loader.
Data never sleeps

View solution in original post


All Replies
Super User
Posts: 5,424

Re: DI Studio - Holding Value Across Rows

I think it sounds like a feasible approach.

Unless your data is huge. Then you might want o filter out the rows with high date, and possible whole id groups, fix the necessary records separately, and do a table update/modify by.

Just be aware that processing your data this way, indexes & constraints will vanish, so you need to apply them to your fixed data.

Data never sleeps
Frequent Contributor
Posts: 89

Re: DI Studio - Holding Value Across Rows

Thanks, but how do I retain the value from one row to then do a -1 day expression on the next to get the new close date?

Respected Advisor
Posts: 4,173

Re: DI Studio - Holding Value Across Rows

[ Edited ]

Assuming the valid from dates are correct and you don't have the same value more than once per ID then below code gives you the logic for re-alignment.

data have;
  infile datalines truncover;
  input id (Valid_from_dttm Valid_to_dttm) (:date9.);
  format Valid_from_dttm Valid_to_dttm date9.;
  datalines;
1 01JUN2007 01JAN5999
1 01JAN2004 01JAN5999
4 01JUN2012 01JAN5999
4 16AUG2014 01JAN5999
4 01JAN2010 30JUL2012
4 01DEC2013 01JAN5999
;
run;

proc sort data=have out=have_sorted;
  by id Valid_from_dttm;
run;

data updates;
  set have_sorted;
  by id;
  format Valid_to_dttm_REALIGNED date9.;
  if not last.id then
    do;
      _ind=_n_+1;
      set want 
        ( 
        keep=Valid_from_dttm 
        rename=(Valid_from_dttm=_Valid_from_dttm)
        )
        point=_ind
        ;

      if Valid_to_dttm>_Valid_from_dttm then 
        do;
          Valid_to_dttm_REALIGNED=_Valid_from_dttm-1;
          output;
        end;
    end;
run;

If your data is stored in a data base and you'r dealing with large data volumes then I would implement the logic directly in-database (eg. using analytics functions).

 

What you can't re-construct are cases where a record had been expired but then a month later got re-activated (if that's possible with your data). So something like below couldn't get reconstructed:

1 01JUN2004 01DEC2006
1 01JAN2007 01JAN5999

 

Solution
‎10-14-2016 10:05 AM
Super User
Posts: 5,424

Re: DI Studio - Holding Value Across Rows

Resort your data by Id descending valid_from_dt.
Use retain for bot from ant to dt.
And then assign a new to dt with "previous" from dt-1 when appropriate.
Another crazy idea is the set all high dates to missing and then process the data through the SCD Type 2 Loader.
Data never sleeps
Frequent Contributor
Posts: 89

Re: DI Studio - Holding Value Across Rows

Thanks @Patrick and @LinusH!

 

Actually blanking out all the values in that value_to_date column, then running through the SCD Type 2 loader did the trick.

 

Cheers!

Respected Advisor
Posts: 4,173

Re: DI Studio - Holding Value Across Rows

Sounds like a "hack". So using this approach you didn't create new records? And it really re-aligned all values also historical ones?

Frequent Contributor
Posts: 89

Re: DI Studio - Holding Value Across Rows

[ Edited ]

It did not create any new records, but did realign the open and close dates for a given ID.

 

We also had a few cases where there would only be a singular record (ID) but was closed. This should be open so that each ID has an open record. It opened those cases with the '01JAN5999' date.

 

I feared it may create new records as well, but was pleased to see it didn't.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 1377 views
  • 1 like
  • 3 in conversation