BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
manya92
Fluorite | Level 6

Hi all, 

 

I need to create  a continous enrollment dataset from a dataset which has enrollment for all months across 2011-2017. The dataset I have looks  like this :

enrolid  dtstart                dtend

420       07/02/2011       07/31/2011

420        08/01/2011     08/31/2011

420       09/01/2011        09/30/2011

500       07/02/2011         07/31/2011

 

I only want a dataset looking like this:

 

enrolid    period_start  period_stop

420        07/01/2011    09/30/2011

 

How can i do this using retain and lag functions ?

There should be no gaps in the dates.. they should be continous

 

 

I am using Batch SAS 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Thank you @PaigeMiller.  I wasn't paying sufficient attention.

 

Now lag helps in detecting start of a new time span.

 

 

data want;
  set have;
  by enrolid;

  merge have (           keep=dtend  rename=(dtend=dtend_last))
        have (firstobs=2 keep=dtstart rename=(dtstart=dtstart_next));

  if first.enrolid or dtstart>sum(lag(dtend),1) then dtstart_beg=dtstart;
  retain dtstart_beg;

  if last.enrolid or dtstart_next>dtend+1;
  drop dtstart dtend dtstart_next;

run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

17 REPLIES 17
novinosrin
Tourmaline | Level 20
data have;
input enrolid  dtstart   :mmddyy10.             dtend   :mmddyy10.  ;
format dtstart             dtend   mmddyy10.  ;
cards;
420       07/02/2011       07/31/2011
420        08/01/2011     08/31/2011
420       09/01/2011        09/30/2011
500       07/02/2011         07/31/2011
;
 

proc sql;
create table want as
select enrolid, min(dtstart) as period_start   format= mmddyy10. ,max(dtend) as period_stop  format= mmddyy10.
from have
group by enrolid;
quit;
manya92
Fluorite | Level 6

I am new to sql could you show how to do this using SAS retain and lag functions ?

novinosrin
Tourmaline | Level 20
data want;
set have;
by enrolid;
retain _t;
if first.enrolid then _t=dtstart;
if last.enrolid then do;period_start=_t;
period_stop=dtend;output;end;
format period_start period_stop mmddyy10.;
drop dt: _:;
run;

Note: This assumes your dataset is sorted  

Astounding
PROC Star

Are you guaranteeing that there are no gaps in the dates? 

 

If there are gaps in the dates would you want a second observation for the same ID?

manya92
Fluorite | Level 6

There are gaps in the data.

if the gap is more than a day we have another row for the same enrolid..

 

manya92
Fluorite | Level 6

There are gaps in the dates but whenever they are there we have another row for the enrollment id. gap means more than 1 day of no enrollment 

PaigeMiller
Diamond | Level 26

I don't see a need for RETAIN here.  

Assumes data is sorted by ENROLID and DTSTART.

*** UNTESTED CODE ***

 

data want;
    set have;
    by enrolid;
    prev_dtend=lag(dtend);
    if dtstart ^= prev_dtend+1 then continuous_dates+1;
run;
proc summary nway data=want;
    class enrolid continuous_dates;
    var dtstart dtend;
    output out=want2 min(dtstart)=period_start max(dtend)=period_stop;
run;
    

 

--
Paige Miller
mkeintz
PROC Star

You don't need a lag for this task.  But retain is useful:

 

 

data want;
  set have  (rename=(dtend=dtend_last));
  by enrolid;
  if first.enrolid then dtstart_beg=dtstart;
  retain dtstart_beg;
  if last.enrolid;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
PaigeMiller
Diamond | Level 26

This doesn't meet the user requirement

 

There should be no gaps in the dates.. they should be continous

--
Paige Miller
mkeintz
PROC Star

Thank you @PaigeMiller.  I wasn't paying sufficient attention.

 

Now lag helps in detecting start of a new time span.

 

 

data want;
  set have;
  by enrolid;

  merge have (           keep=dtend  rename=(dtend=dtend_last))
        have (firstobs=2 keep=dtstart rename=(dtstart=dtstart_next));

  if first.enrolid or dtstart>sum(lag(dtend),1) then dtstart_beg=dtstart;
  retain dtstart_beg;

  if last.enrolid or dtstart_next>dtend+1;
  drop dtstart dtend dtstart_next;

run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
manya92
Fluorite | Level 6

what does 

dtstart>sum(lag(dtend),1) 

mean ?

 and also i am not clear what this does ?

if last.enrolid or dtstart_next>dtend+1;
mkeintz
PROC Star

@manya92 wrote:

what does 

dtstart>sum(lag(dtend),1) 

mean ?

 and also i am not clear what this does ?

if last.enrolid or dtstart_next>dtend+1;

dtstart>sum*lag(dtend),1)

  is almost identical to

dtstart>lag(dtend)+1

  except the latter will generate an error note for the first observation - because lag of anything is missing for the first obs.

 

The IF statement above is a "subsetting if" - i.e. the data step only keeps cases that satisfy the  if expression.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
manya92
Fluorite | Level 6
how can you have the set and merge statements in the same data step ?
mkeintz
PROC Star

The SET statement starts one data stream, and the MERGE statement starts another data stream.   And they are synchronized streams.

 

Of course if you have two streams of data from data set HAVE, you'll have any common variable being read once from each stream.  But you can't have two values occupy one variable, so I use the KEEP= and RENAME= parameters to avoid such collisions.

 

The SET statement is used for nothing more than to generate FIRST.ENROLID and LAST.ENROLID variables for easy detection of boundaries for each ID.  The MERGE statement has one of the merged sub-streams start at FIRSTOBS=2 so that you can look ahead one observation.  This makes it easy to tell if the next observation starts more than one day after the current obs ends.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 2458 views
  • 0 likes
  • 5 in conversation