DATA Step, Macro, Functions and more

how to code for continous enrollment using LAG and RETAIN function

Accepted Solution Solved
Reply
Contributor
Posts: 29
Accepted Solution

how to code for continous enrollment using LAG and RETAIN function

Hi all, 

 

I need to create  a continous enrollment dataset from a dataset which has enrollment for all months across 2011-2017. The dataset I have looks  like this :

enrolid  dtstart                dtend

420       07/02/2011       07/31/2011

420        08/01/2011     08/31/2011

420       09/01/2011        09/30/2011

500       07/02/2011         07/31/2011

 

I only want a dataset looking like this:

 

enrolid    period_start  period_stop

420        07/01/2011    09/30/2011

 

How can i do this using retain and lag functions ?

There should be no gaps in the dates.. they should be continous

 

 

I am using Batch SAS 


Accepted Solutions
Solution
‎05-03-2018 03:50 PM
Trusted Advisor
Posts: 1,312

Re: how to code for continous enrollment using LAG and RETAIN function

Posted in reply to PaigeMiller

Thank you @PaigeMiller.  I wasn't paying sufficient attention.

 

Now lag helps in detecting start of a new time span.

 

 

data want;
  set have;
  by enrolid;

  merge have (           keep=dtend  rename=(dtend=dtend_last))
        have (firstobs=2 keep=dtstart rename=(dtstart=dtstart_next));

  if first.enrolid or dtstart>sum(lag(dtend),1) then dtstart_beg=dtstart;
  retain dtstart_beg;

  if last.enrolid or dtstart_next>dtend+1;
  drop dtstart dtend dtstart_next;

run;

View solution in original post


All Replies
PROC Star
Posts: 1,592

Re: how to code for continous enrollment using LAG and RETAIN function

[ Edited ]
data have;
input enrolid  dtstart   :mmddyy10.             dtend   :mmddyy10.  ;
format dtstart             dtend   mmddyy10.  ;
cards;
420       07/02/2011       07/31/2011
420        08/01/2011     08/31/2011
420       09/01/2011        09/30/2011
500       07/02/2011         07/31/2011
;
 

proc sql;
create table want as
select enrolid, min(dtstart) as period_start   format= mmddyy10. ,max(dtend) as period_stop  format= mmddyy10.
from have
group by enrolid;
quit;
Contributor
Posts: 29

Re: how to code for continous enrollment using LAG and RETAIN function

Posted in reply to novinosrin

I am new to sql could you show how to do this using SAS retain and lag functions ?

PROC Star
Posts: 1,592

Re: how to code for continous enrollment using LAG and RETAIN function

[ Edited ]
data want;
set have;
by enrolid;
retain _t;
if first.enrolid then _t=dtstart;
if last.enrolid then do;period_start=_t;
period_stop=dtend;output;end;
format period_start period_stop mmddyy10.;
drop dt: _:;
run;

Note: This assumes your dataset is sorted  

Super User
Posts: 6,637

Re: how to code for continous enrollment using LAG and RETAIN function

Are you guaranteeing that there are no gaps in the dates? 

 

If there are gaps in the dates would you want a second observation for the same ID?

Contributor
Posts: 29

Re: how to code for continous enrollment using LAG and RETAIN function

Posted in reply to Astounding

There are gaps in the data.

if the gap is more than a day we have another row for the same enrolid..

 

Contributor
Posts: 29

Re: how to code for continous enrollment using LAG and RETAIN function

Posted in reply to Astounding

There are gaps in the dates but whenever they are there we have another row for the enrollment id. gap means more than 1 day of no enrollment 

Respected Advisor
Posts: 2,833

Re: how to code for continous enrollment using LAG and RETAIN function

[ Edited ]

I don't see a need for RETAIN here.  

Assumes data is sorted by ENROLID and DTSTART.

*** UNTESTED CODE ***

 

data want;
    set have;
    by enrolid;
    prev_dtend=lag(dtend);
    if dtstart ^= prev_dtend+1 then continuous_dates+1;
run;
proc summary nway data=want;
    class enrolid continuous_dates;
    var dtstart dtend;
    output out=want2 min(dtstart)=period_start max(dtend)=period_stop;
run;
    

 

--
Paige Miller
Trusted Advisor
Posts: 1,312

Re: how to code for continous enrollment using LAG and RETAIN function

You don't need a lag for this task.  But retain is useful:

 

 

data want;
  set have  (rename=(dtend=dtend_last));
  by enrolid;
  if first.enrolid then dtstart_beg=dtstart;
  retain dtstart_beg;
  if last.enrolid;
run;
Respected Advisor
Posts: 2,833

Re: how to code for continous enrollment using LAG and RETAIN function

This doesn't meet the user requirement

 

There should be no gaps in the dates.. they should be continous

--
Paige Miller
Solution
‎05-03-2018 03:50 PM
Trusted Advisor
Posts: 1,312

Re: how to code for continous enrollment using LAG and RETAIN function

Posted in reply to PaigeMiller

Thank you @PaigeMiller.  I wasn't paying sufficient attention.

 

Now lag helps in detecting start of a new time span.

 

 

data want;
  set have;
  by enrolid;

  merge have (           keep=dtend  rename=(dtend=dtend_last))
        have (firstobs=2 keep=dtstart rename=(dtstart=dtstart_next));

  if first.enrolid or dtstart>sum(lag(dtend),1) then dtstart_beg=dtstart;
  retain dtstart_beg;

  if last.enrolid or dtstart_next>dtend+1;
  drop dtstart dtend dtstart_next;

run;
Contributor
Posts: 29

Re: how to code for continous enrollment using LAG and RETAIN function

what does 

dtstart>sum(lag(dtend),1) 

mean ?

 and also i am not clear what this does ?

if last.enrolid or dtstart_next>dtend+1;
Trusted Advisor
Posts: 1,312

Re: how to code for continous enrollment using LAG and RETAIN function


@manya92 wrote:

what does 

dtstart>sum(lag(dtend),1) 

mean ?

 and also i am not clear what this does ?

if last.enrolid or dtstart_next>dtend+1;

dtstart>sum*lag(dtend),1)

  is almost identical to

dtstart>lag(dtend)+1

  except the latter will generate an error note for the first observation - because lag of anything is missing for the first obs.

 

The IF statement above is a "subsetting if" - i.e. the data step only keeps cases that satisfy the  if expression.

Contributor
Posts: 29

Re: how to code for continous enrollment using LAG and RETAIN function

how can you have the set and merge statements in the same data step ?
Trusted Advisor
Posts: 1,312

Re: how to code for continous enrollment using LAG and RETAIN function

The SET statement starts one data stream, and the MERGE statement starts another data stream.   And they are synchronized streams.

 

Of course if you have two streams of data from data set HAVE, you'll have any common variable being read once from each stream.  But you can't have two values occupy one variable, so I use the KEEP= and RENAME= parameters to avoid such collisions.

 

The SET statement is used for nothing more than to generate FIRST.ENROLID and LAST.ENROLID variables for easy detection of boundaries for each ID.  The MERGE statement has one of the merged sub-streams start at FIRSTOBS=2 so that you can look ahead one observation.  This makes it easy to tell if the next observation starts more than one day after the current obs ends.

 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 17 replies
  • 167 views
  • 0 likes
  • 5 in conversation