Hi @jhh197
This is a common problem when working with slowly changing dimension tables. You have a table with some information that is valid in a given time period from a Start Date to an End Date. This information is split over several rows, and you want to compress the rows into one row with the total time period. But gaps between time periods should be preserved, so the compression should only be done on rows with consecutive time periods.
It is difficult, if not impossible, to handle with SQL steps and SAS procedures, so the good old Data Step is the Swiss Army Knife here. The following example data + code is taken from a lecture I gave recently, and my native tongue is Danish, but the code should be understandable anyway, Afdeling is Departmenti English. It is based on an SQL step to order data and a Data Step with look-ahead through two Set statements to perform the compression.
* Testdata;
data medarbejder_historik;
infile datalines;
length ID 8 PersonID 8 Personnavn $34 Afdeling $60 Dato_Start 8 Dato_Slut 8;
informat Dato_Start Dato_Slut ddmmyy10.;
format Dato_Start Dato_Slut ddmmyyd10.;
input ID @7 PersonID @14 Personnavn $char16. @30 Afdeling $char42. @73 Dato_Start Dato_Slut;
datalines;
227 3144 Max Goldman Data Warehouse Team 01-06-2010 14-09-2010
493 3144 Max Goldman Data Warehouse 01-01-2020 31-03-2021
495 3144 Max Goldman Data Warehouse 01-04-2021 31-12-9999
497 6731 Bruce Banner Team 2 Økonomisk Styring og Analyse 01-06-2012 30-06-2012
665 6921 Raymond Babbitt Data Warehouse 17-11-2008 31-05-2010
671 3144 Max Goldman Data Warehouse Team 01-07-2011 31-07-2012
1032 6731 Bruce Banner ØSA Baglandet 01-09-2013 31-12-2014
1245 6921 Raymond Babbitt Data og Analyse 01-04-2013 31-08-2013
1355 7693 John Gustafson Data Warehouse 01-04-2021 31-12-9999
1665 7693 John Gustafson Data og Analyse 01-09-2013 28-02-2014
1680 3144 Max Goldman Data og Analyse 01-09-2013 28-02-2014
1761 6921 Raymond Babbitt Data Warehouse 01-01-2016 30-09-2016
1849 6731 Bruce Banner Team Økonomistyring Øst 01-01-2015 31-12-2015
1995 3144 Max Goldman Data og Analyse 01-04-2013 31-07-2013
2265 7693 John Gustafson Data Warehouse 01-04-2018 31-12-2019
2593 6731 Bruce Banner OKÆ Økonomi 25-05-1972 04-05-2009
2726 6731 Bruce Banner Data Warehouse 01-04-2018 31-12-2019
2872 6921 Raymond Babbitt Data og Analyse 01-09-2014 31-12-2015
2971 3144 Max Goldman Data Warehouse Team 01-08-2012 31-03-2013
2972 6921 Raymond Babbitt Data Warehouse 05-05-2017 10-05-2017
2987 3144 Max Goldman Data Warehouse Team 15-09-2010 30-06-2011
3433 6921 Raymond Babbitt Data Warehouse Team 01-06-2010 14-09-2010
3770 7693 John Gustafson Data Warehouse 01-02-2017 04-05-2017
3910 3144 Max Goldman Data Warehouse 05-05-2017 10-05-2017
3998 6731 Bruce Banner OKÆ Økonomi 05-05-2009 19-05-2009
4048 7693 John Gustafson Data Warehouse 01-01-2016 31-01-2017
4127 3144 Max Goldman Data og Analyse 01-03-2014 31-08-2014
4332 6921 Raymond Babbitt Data Warehouse 30-11-2016 31-01-2017
4398 6921 Raymond Babbitt Data Warehouse 01-01-2020 31-03-2021
4459 7693 John Gustafson IT-udvikling og løsninger 01-01-1970 16-11-2008
4534 3144 Max Goldman Data og Analyse 01-08-2013 31-08-2013
4724 7693 John Gustafson Data Warehouse 05-05-2017 10-05-2017
4757 3144 Max Goldman Data og Analyse 01-09-2014 31-12-2015
5066 6921 Raymond Babbitt Data Warehouse 11-05-2017 31-03-2018
5113 6921 Raymond Babbitt Data Warehouse 28-11-2016 29-11-2016
5223 7693 John Gustafson Data Warehouse 01-01-2020 31-03-2021
5238 7693 John Gustafson Data Warehouse 17-11-2008 31-05-2010
5316 6731 Bruce Banner Økonomi, Team Data og Ledelsesinformation 01-01-2017 30-04-2017
5429 6731 Bruce Banner Team 2 01-01-2012 31-05-2012
5585 7693 John Gustafson Data og Analyse 01-03-2014 31-08-2014
5687 6921 Raymond Babbitt Data Warehouse 01-04-2021 31-12-9999
5748 7693 John Gustafson Data og Analyse 01-04-2013 31-08-2013
5825 3144 Max Goldman Data Warehouse 01-02-2017 04-05-2017
5896 6731 Bruce Banner Data Warehouse 05-05-2017 10-05-2017
5903 6921 Raymond Babbitt Data Warehouse 01-02-2017 04-05-2017
6345 6731 Bruce Banner Data Warehouse 01-05-2017 04-05-2017
6650 3144 Max Goldman Data Warehouse 01-01-2016 31-01-2017
6795 6921 Raymond Babbitt Data og Analyse 01-09-2013 28-02-2014
6882 6921 Raymond Babbitt Data og Analyse 01-03-2014 31-08-2014
6899 6921 Raymond Babbitt Data Warehouse 01-04-2018 31-12-2019
7128 6921 Raymond Babbitt IT-udvikling og løsninger 01-01-1970 16-11-2008
7128 7693 John Gustafson Data Warehouse Team 01-06-2010 14-09-2010
7283 6731 Bruce Banner Data Warehouse 11-05-2017 31-03-2018
7384 6731 Bruce Banner Data Warehouse 01-01-2020 31-03-2021
8193 3144 Max Goldman Data Warehouse 17-11-2008 31-05-2010
8449 7693 John Gustafson Data og Analyse 01-09-2014 31-12-2015
8533 6921 Raymond Babbitt Data Warehouse Team 01-07-2011 31-03-2013
8711 6921 Raymond Babbitt Data Warehouse Team 15-09-2010 30-06-2011
9004 6731 Bruce Banner Data Warehouse 01-04-2021 31-12-9999
9216 6731 Bruce Banner Team Effekt og Analyse 01-01-2016 31-12-2016
9291 7693 John Gustafson Data Warehouse 11-05-2017 31-03-2018
9312 3144 Max Goldman Data Warehouse 11-05-2017 31-03-2018
9346 7693 John Gustafson Data Warehouse Team 15-09-2010 30-06-2011
9370 6731 Bruce Banner Økonomiteam 01-12-2009 31-12-2011
9469 3144 Max Goldman IT-udvikling og løsninger 01-01-1970 16-11-2008
9570 7693 John Gustafson Data Warehouse Team 01-07-2011 31-03-2013
9692 6731 Bruce Banner Team 2 Økonomisk Styring og Analyse 01-07-2012 31-08-2013
9700 6731 Bruce Banner OKÆ Økonomi og Borger 20-05-2009 30-11-2009
9767 3144 Max Goldman Data Warehouse 01-04-2018 31-12-2019
9767 3144 Max Goldman Data Warehouse 01-04-2022 30-07-2022
;
run;
/******************************************************************************/
/* Komprimering af disse testdata til et velformet historikdatasæt */
/******************************************************************************/
proc sql;
create table w1 as
select PersonID, Personnavn, Afdeling, Dato_Start, Dato_Slut
from medarbejder_historik
order by PersonID, Personnavn, Afdeling, Dato_Start, Dato_Slut;
quit;
data person_afdeling_renset (drop = thisgroupstart thisgroupslut NextID NextAfd NextStart NextSlut);
set w1 end=end_of_data;
retain thisgroupstart thisgroupslut;
by PersonID Personnavn Afdeling;
if not end_of_data then
set w1 (
firstobs=2
drop= Personnavn
rename=(PersonID=NextID Afdeling=NextAfd Dato_Start=NextStart Dato_Slut=NextSlut)
);
if first.Afdeling then do;
thisgroupstart = Dato_Start;
thisgroupslut = Dato_Slut;
end;
if not last.Afdeling then do;
if Dato_Slut < NextStart - 1 then do;
Dato_Start = thisgroupstart;
output;
thisgroupstart = NextStart;
end;
thisgroupslut = max(Dato_Slut, thisgroupslut);
end;
if last.Afdeling then do;
dato_start = thisgroupstart;
dato_slut = max(Dato_Slut, thisgroupslut);
output;
end;
run;
proc sort data=person_afdeling_renset; by PersonID Dato_Start;
run;
... View more