rolling up dates with gaps

Accepted Solution Solved
Reply
SAS Employee
Posts: 7
Accepted Solution

rolling up dates with gaps

Hi SAS community

I am looking for some assistance with an interesting programming problem.  I am looking to create a data set that is unique by tin and ein and taking account the gaps with in the date variable. Below is an example of the problem.

data test;

input Tin $ year $ ein $;

datalines;

111111111 2000 222222222

111111111 2001 222222222

111111111 2002 222222222

111111111 2003 222222222

111111111 2004 222222222

111111111 2005 222222222

111111111 2006 222222222

333333333 2007 444444444

333333333 2010 444444444

555555555 1993 666666666

555555555 1994 666666666

555555555 1995 666666666

555555555 1999 666666666

555555555 2000 666666666

555555555 2001 666666666

555555555 2002 666666666

555555555 2005 666666666

555555555 2006 666666666

555555555 2007 666666666

555555555 2009 666666666

555555555 2008 666666666

555555555 2013 666666666

777777777  2008    888888888

777777777  2009    888888888

777777777  2010    888888888

777777777  2014    888888888

999999999  2000    101010101

999999999  2004    101010101

999999999  2005    101010101

999999999  2006    101010101

;

run;

The desire  output is to have vaild_from, valid_to ranges that take account the gaps in the data.

Tin                ein                valid_from                                                    valid_to

111111111  222222222    2000                                                            2006

333333333  444444444    2007-2007 2010-2010

555555555  666666666    1999-2002 2005-2007  2009-2013

777777777  888888888     2008-2010 2014-2014

999999999  101010101     2000-2000 2004-2006


Accepted Solutions
Solution
‎06-21-2014 05:32 AM
Super User
Posts: 10,023

Re: rolling up dates with gaps

Posted in reply to WoolieKyat

Your data is not look like what your purpose is .

data test;
input Tin $ year $ ein $;
datalines;
111111111 2000 222222222
111111111 2001 222222222
111111111 2002 222222222
111111111 2003 222222222
111111111 2004 222222222
111111111 2005 222222222
111111111 2006 222222222
333333333 2007 444444444
333333333 2010 444444444
555555555 1993 666666666
555555555 1994 666666666
555555555 1995 666666666
555555555 1999 666666666
555555555 2000 666666666
555555555 2001 666666666
555555555 2002 666666666
555555555 2005 666666666
555555555 2006 666666666
555555555 2007 666666666
555555555 2009 666666666
555555555 2008 666666666
555555555 2013 666666666
777777777  2008    888888888
777777777  2009    888888888
777777777  2010    888888888
777777777  2014    888888888
999999999  2000    101010101
999999999  2004    101010101
999999999  2005    101010101
999999999  2006    101010101
;
run;
proc sort data=test; by tin ein year;run;
data test;
 set test;
 by tin ein;
 if first.ein or dif(year) ne 1 then n+1;
run;
data temp(drop=n year first);
 set test;
 by n;
 length first $ 20;
 retain first;
 if first.n then first=year;
 if last.n then do; duration=catx('-',first,year);output;end;
run;
data want(drop=duration);
 set temp;
 by tin ein;
 length val $ 200;
 retain val;
 val=catx(' ',val,duration);
 if last.ein then do;output;call missing(val);end;
run;

Xia Keshan

View solution in original post


All Replies
Super Contributor
Posts: 644

Re: rolling up dates with gaps

Posted in reply to WoolieKyat

Your 'want' state is inconsistent.  Do you want separate valid_from and valid_to columns, as for the first row, or one extended value containing one or more date ranges as in the subsequent rows?

Richard

Solution
‎06-21-2014 05:32 AM
Super User
Posts: 10,023

Re: rolling up dates with gaps

Posted in reply to WoolieKyat

Your data is not look like what your purpose is .

data test;
input Tin $ year $ ein $;
datalines;
111111111 2000 222222222
111111111 2001 222222222
111111111 2002 222222222
111111111 2003 222222222
111111111 2004 222222222
111111111 2005 222222222
111111111 2006 222222222
333333333 2007 444444444
333333333 2010 444444444
555555555 1993 666666666
555555555 1994 666666666
555555555 1995 666666666
555555555 1999 666666666
555555555 2000 666666666
555555555 2001 666666666
555555555 2002 666666666
555555555 2005 666666666
555555555 2006 666666666
555555555 2007 666666666
555555555 2009 666666666
555555555 2008 666666666
555555555 2013 666666666
777777777  2008    888888888
777777777  2009    888888888
777777777  2010    888888888
777777777  2014    888888888
999999999  2000    101010101
999999999  2004    101010101
999999999  2005    101010101
999999999  2006    101010101
;
run;
proc sort data=test; by tin ein year;run;
data test;
 set test;
 by tin ein;
 if first.ein or dif(year) ne 1 then n+1;
run;
data temp(drop=n year first);
 set test;
 by n;
 length first $ 20;
 retain first;
 if first.n then first=year;
 if last.n then do; duration=catx('-',first,year);output;end;
run;
data want(drop=duration);
 set temp;
 by tin ein;
 length val $ 200;
 retain val;
 val=catx(' ',val,duration);
 if last.ein then do;output;call missing(val);end;
run;

Xia Keshan

SAS Employee
Posts: 7

Re: rolling up dates with gaps

Thank you very much for your response.   It gave me some ideas about some thing else I can try.  I was trying to create a solution that can be done in one data step because the tables I'm working with have over 250 million observations each.  I was trying to avoid the inefficiency of having to write out the tables multiple times.  The solution you provided gave me a thought I didn't consider before which is to output the ranges into one variable and not two variables valid_from and valid_to. 

Super Contributor
Posts: 644

Re: rolling up dates with gaps

Posted in reply to WoolieKyat

Here is a one step solution - apart from the sort, necessitated by one out of order record.  In a full solution you will want to ensure that data extracted from your RDMS is ordered properly.

Proc Sort Data = test ;

  by tin year ;

Run ;

Data want ;

  set test (rename = (tin=prev_tin year=prev_year ein=prev_ein)) ;

  length tin ein $ 10

            check_year valid_from valid_to $ 4

            valid_years $ 10

            ;

  by prev_tin prev_year ;

  tin = lag(prev_tin) ;

  year = lag(prev_year) ;

  ein = lag(prev_ein) ;

  retain valid_from valid_to ;

  if _N_ = 1 then

       do ;

            valid_from = prev_year ;

            return ;

       end ;

  else check_year = put(input(year, 4.) + 1, 4.) ;

  if (last.prev_tin OR prev_year NE check_year) then

       do ;

            valid_to = year ;

            valid_years = catx('-', valid_from, valid_to) ;

            output ;

            valid_from = prev_year ;

            return ;

       end ;

  keep tin ein valid_from valid_to valid_years ;

Run ;

Richard

Super User
Posts: 10,023

Re: rolling up dates with gaps

Posted in reply to WoolieKyat

one data step and multiple obs for a unique combination of them .

 
data test;
input Tin $ year  ein $;
datalines;
111111111 2000 222222222
111111111 2001 222222222
111111111 2002 222222222
111111111 2003 222222222
111111111 2004 222222222
111111111 2005 222222222
111111111 2006 222222222
333333333 2007 444444444
333333333 2010 444444444
555555555 1993 666666666
555555555 1994 666666666
555555555 1995 666666666
555555555 1999 666666666
555555555 2000 666666666
555555555 2001 666666666
555555555 2002 666666666
555555555 2005 666666666
555555555 2006 666666666
555555555 2007 666666666
555555555 2009 666666666
555555555 2008 666666666
555555555 2013 666666666
777777777  2008    888888888
777777777  2009    888888888
777777777  2010    888888888
777777777  2014    888888888
999999999  2000    101010101
999999999  2004    101010101
999999999  2005    101010101
999999999  2006    101010101
;
run;
proc sort data=test; by tin ein year;run;

data want(drop=lag_year dif_year);
 set test;
 by tin ein;
 retain from;
 lag_year=lag(year);
 dif_year=dif(year);
 if first.ein then from=year;
  else if dif_year ne 1 then do;
   to=lag_year;output; from=year;
 end;
 if last.ein then do;to=year;output;end;
run;

Xia Keshan

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 5 replies
  • 274 views
  • 3 likes
  • 3 in conversation