10-11-2016 01:14 PM - edited 10-11-2016 01:26 PM
I need to do some clean-up on a few dim tables in our warehouse. It looks like our two columns that have the 'valid from date' and 'valide to date', which allow us to find the current record in a set of historical records, were not loaded correctly some time ago.
For example, we may have a record like follows:
ID Valid from Valid to
1 01JUN2007 01JAN5999
1 01JAN2004 01JAN5999
4 01JUN2012 01JAN5999
4 16AUG2014 01JAN5999
4 01JAN2010 30JUL2012
4 01DEC2013 0!JAN5999
Clearly in each of these cases, only one record should have a 'Valid To' value of 01JAN5999....know we've got to fix it. I hope I made it clear in the example that it varies the number of records an ID may have.
I was thinking if I could sort the the columns and retain the 'Valid From' record of the previous row, subtract 1 from the date, then I could have the correct 'Valid To' date.
Can this be done? Am I on the right path, or is there another approach?
10-12-2016 04:19 AM
10-11-2016 03:55 PM
I think it sounds like a feasible approach.
Unless your data is huge. Then you might want o filter out the rows with high date, and possible whole id groups, fix the necessary records separately, and do a table update/modify by.
Just be aware that processing your data this way, indexes & constraints will vanish, so you need to apply them to your fixed data.
10-11-2016 09:51 PM - edited 10-11-2016 09:54 PM
Assuming the valid from dates are correct and you don't have the same value more than once per ID then below code gives you the logic for re-alignment.
data have; infile datalines truncover; input id (Valid_from_dttm Valid_to_dttm) (:date9.); format Valid_from_dttm Valid_to_dttm date9.; datalines; 1 01JUN2007 01JAN5999 1 01JAN2004 01JAN5999 4 01JUN2012 01JAN5999 4 16AUG2014 01JAN5999 4 01JAN2010 30JUL2012 4 01DEC2013 01JAN5999 ; run; proc sort data=have out=have_sorted; by id Valid_from_dttm; run; data updates; set have_sorted; by id; format Valid_to_dttm_REALIGNED date9.; if not last.id then do; _ind=_n_+1; set want ( keep=Valid_from_dttm rename=(Valid_from_dttm=_Valid_from_dttm) ) point=_ind ; if Valid_to_dttm>_Valid_from_dttm then do; Valid_to_dttm_REALIGNED=_Valid_from_dttm-1; output; end; end; run;
If your data is stored in a data base and you'r dealing with large data volumes then I would implement the logic directly in-database (eg. using analytics functions).
What you can't re-construct are cases where a record had been expired but then a month later got re-activated (if that's possible with your data). So something like below couldn't get reconstructed:
1 01JUN2004 01DEC2006 1 01JAN2007 01JAN5999
10-12-2016 04:19 AM
10-14-2016 10:18 AM - edited 10-14-2016 10:19 AM
It did not create any new records, but did realign the open and close dates for a given ID.
We also had a few cases where there would only be a singular record (ID) but was closed. This should be open so that each ID has an open record. It opened those cases with the '01JAN5999' date.
I feared it may create new records as well, but was pleased to see it didn't.
Need further help from the community? Please ask a new question.