BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
eabc0351
Quartz | Level 8

Hello SAS Friends, 

I am trying to create flags to identify if a person was continuously enrolled in a health plan. Each person has multiple beginning and end enrollment dates through the course of the 18 years of data. I would like to create 18 variables (0/1) with value=1 if the person was enrolled 11 of 12 months of that year. Here is a sample of the data:

 

 

data test;
input id$2. effdate  enddate ;
attrib start format =MMYYP. informat=MMYYP.;
attrib end format =MMYYP. informat=MMYYP.;
datalines;
1	09.1996	07.2000
1	10.2000	10.2010   
1	12.2014	12.2017      
2	11.2012	12.2014 
2	02.2015	08.2017
3	02.2000	07.2014
3	08.2014	05.2018 
;
run;

 

And here is the output that I want:

 

PID      effdate     enddate     flag2000     flag2001     flag2002 …  flag2010… flag2014... flag2017
1         09.1996    07.2000            0               0                  0                 0                0                   0
1         10.2000    10.2010            0               1                  1                 0                0                   0
1         12.2014    12.2017            0               0                  0                 0                0                   1
2         11.2012     12.2014          0                0                  0                 0                1                   0
2         02.2015     08.2017          0                 0                 0                  0               0                   0
3         02.2000     07.2014          0                 1                 1                 1               0                    0
3         08.2014     05.2018          0                 0                 0                 0               1                    1 


 

 

Thank you for your help!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data test;
input id $ effdate $  enddate $ ;
start=input(cats('01.',effdate),ddmmyy10.);
end=input(cats('01.',enddate),ddmmyy10.);
format start end ddmmyyp10.;
datalines;
1	09.1996	07.2000
1	10.2000	10.2010   
1	12.2014	12.2017      
2	11.2012	12.2014 
2	02.2015	08.2017
3	02.2000	07.2014
3	08.2014	05.2018 
;
run;
data temp;
 set test; 
 do i=0 to intck('month',start,end);
  date=intnx('month',start,i);output;
 end;
 format date ddmmyyp10.;
 drop i start end;
run;
proc freq data=temp noprint;
table id*effdate*enddate*date/out=temp1;
format date year4.;
run;
data temp2;
 set temp1;
 flag=(count>10);
 drop percent;
run;
proc transpose data=temp2 out=temp3 prefix=flag;
by  id effdate enddate;
var flag;
id date;
run;
proc stdize data=temp3 out=want missing=0 reponly;
var _numeric_;
run;

View solution in original post

11 REPLIES 11
ScottBass
Rhodochrosite | Level 12

Edit your post as a self-contained, syntactically correct data step using datalines, then perhaps I'll try harder.

 

I'm kind of over (the MANY) cut-and-paste posts, where I'm meant to convert the post into a working SAS data step.

 

And put your code in a SAS code block (the little "running man") so I can scroll the code rather than having it wrap.

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
eabc0351
Quartz | Level 8

sorry @ScottBass, I've always hand-typed it in. Working on it now...

ScottBass
Rhodochrosite | Level 12

While I stand by my message in my first post...I really do wish folks would post a self-contained data step so I don't have to do the conversion...perhaps I could have been less "terse" in my tone.  So sorry bout that 😉

 

Anyway, perhaps this will get you going.  And notice I didn't attempt to convert your data into a working data step 😉

 

It's not complete, and you'll need to fiddle with it.  But I think the heavy lifting is there?  You'll also need to test it.

 

 

data have;
   id=1;
   start="01APR2012"d;
   end="01SEP2018"d;
   format start end yymmddd10.;
run;

data want;
   set have;
   by id start;
   length flag2000-flag2020 8;
   array flags{2000:2020} flag:;
   if first.id then do;
      do i=lbound(flags) to hbound(flags);
         flags{i}=0;
      end;
      counter=0;
   end;
   date=start;
   do until (date > end);
      if month(date)=1 then counter=0;
      counter+1;
      if month(date)=12 then flags{year(date)}=(counter ge 11);
      date=intnx('month',date,1);
   end;
   * drop i counter date;
run;

 

In summary:

 

  • Use an array to process your flag variables
  • Use an explicit array index corresponding with the min and max year range of your data.  See http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000201956.htm, search on lower: and :upper
  • Use the LBOUND and HBOUND functions to iterate over the bounds of your array
  • Use the YEAR function on your date as the index into your array (and thus your flag variables)
  • Use the INTNX function to iterate by month over the range of your start and end dates
  • Increment a counter to count months within the loop
  • Reset the counter when month=1, set your flag variable when month=12
  • You may not need the check on first.id?  But reset your flags to 0 for each new record (instead of the default missing value)

 

Hope this helps...

 

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
eabc0351
Quartz | Level 8

Hi @ScottBass. Thank you for this solution. When I run the command, there is an error message that says: ERROR: Array subscript out of range at line 196 column 30. The line and column numbers keep changing when I rerun the command. For example, ERROR: Array subscript out of range at line 216 column 30. The flags and the other three variables --  i counter date -- are all in the output dataset, but there are no observations.

Could the issue be that my date is month/year (since day is not in the data)?

Thank you for your assistance!

ScottBass
Rhodochrosite | Level 12

Could the issue be that my date is month/year (since day is not in the data)?

You don't have a date, you have a character string.  Yes, you'll need to convert your character string to a SAS date.

 

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
Jagadishkatam
Amethyst | Level 16

Please try

 

data test;
input id$2. effdate :$ enddate:$ ;
start=input(scan(effdate,2,'.'),best.);
end=input(scan(enddate,2,'.'),best.);
datalines;
1 09.1996 07.2000
1 10.2000 10.2010   
1 12.2014 12.2017      
2 11.2012 12.2014 
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018 
;
run;


proc sql;
select min(start), max(end) into: min , : max from test;
quit;

data want;
set test;
array yrs(*) flag1996 flag1997 flag1998 flag1999 flag2000 flag2001 flag2002 flag2003 flag2004 flag2005 flag2006 flag2007 flag2008 flag2009 flag2011 flag2012 flag2013 flag2014 flag2015 flag2016 flag2017 flag2018;
array yrss(22) _temporary_ (1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2011 2012 2013 2014 2015 2016 2017 2018);
do i = 1 to dim(yrs);
if start<yrss(i)<end then yrs(i)=1;
end;
run;
Thanks,
Jag
eabc0351
Quartz | Level 8

Hi @Jagadishkatam . This worked except for observations #3 and #4:

1 12.2014 12.2017     
2 11.2012 12.2014

 

For these observations, flag2017 (for obs #3) and flag2014 (for obs #4) are missing, not 1. It's not registering enddate of month 12 as complete enrollment for that year. Any suggestions?

Thanks!

Jagadishkatam
Amethyst | Level 16

Please try

 

data test;
input id$2. effdate :$ enddate:$ ;
start=input(scan(effdate,2,'.'),best.);
startm=input(scan(effdate,1,'.'),best.);
end=input(scan(enddate,2,'.'),best.);
endm=input(scan(enddate,1,'.'),best.);
datalines;
1 09.1996 07.2000
1 10.2000 10.2010   
1 12.2014 12.2017      
2 11.2012 12.2014 
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018 
;
run;


proc sql;
select min(start), max(end) into: min , : max from test;
quit;

data want;
set test;
array yrs(*) flag1996 flag1997 flag1998 flag1999 flag2000 flag2001 flag2002 flag2003 flag2004 flag2005 flag2006 flag2007 flag2008 flag2009 flag2011 flag2012 flag2013 flag2014 flag2015 flag2016 flag2017 flag2018;
array yrss(22) _temporary_ (1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2011 2012 2013 2014 2015 2016 2017 2018);
do i = 1 to dim(yrs);
if start<=yrss(i)<=end and (startm>=12 or endm>=12) then yrs(i)=1;
else if start<yrss(i)<end and (startm<12 or endm<12) then yrs(i)=1;
end;
run;
Thanks,
Jag
eabc0351
Quartz | Level 8

Hi @Jagadishkatam. I'm sorry, just tried your solution on the entire dataset. There are two more issues. Here another sample dataset:

 

data test;
input id$2. effdate :$ enddate:$ ;
start=input(scan(effdate,2,'.'),best.);
startm=input(scan(effdate,1,'.'),best.);
end=input(scan(enddate,2,'.'),best.);
endm=input(scan(enddate,1,'.'),best.);
datalines;
1 01.1996 07.2000
1 08.2010 12.2012
1 12.2014 01.2017
2 12.2008 09.2013
2 07.2015 02.2018
;
run;

Issues:

1) The code isn't picking up full enrollment for the year if enrollment begins January.

    See observation #1 - flag1996 should equal 1.

 

2) When the start or end dates have month 12, then the program doesn't work correctly.

    See observation #2 - flag2010 should be missing

    Observation #3 - flags 2014 and 2017 should be missing

    Observation #4 - flags 2008 and 2013 should be missing

 

Everything works fine on observations (like obs #5) where start and/or end are not in month 12 and start not in month 1. Do you know how to solve this? Thank you so much!

 

Ksharp
Super User
data test;
input id $ effdate $  enddate $ ;
start=input(cats('01.',effdate),ddmmyy10.);
end=input(cats('01.',enddate),ddmmyy10.);
format start end ddmmyyp10.;
datalines;
1	09.1996	07.2000
1	10.2000	10.2010   
1	12.2014	12.2017      
2	11.2012	12.2014 
2	02.2015	08.2017
3	02.2000	07.2014
3	08.2014	05.2018 
;
run;
data temp;
 set test; 
 do i=0 to intck('month',start,end);
  date=intnx('month',start,i);output;
 end;
 format date ddmmyyp10.;
 drop i start end;
run;
proc freq data=temp noprint;
table id*effdate*enddate*date/out=temp1;
format date year4.;
run;
data temp2;
 set temp1;
 flag=(count>10);
 drop percent;
run;
proc transpose data=temp2 out=temp3 prefix=flag;
by  id effdate enddate;
var flag;
id date;
run;
proc stdize data=temp3 out=want missing=0 reponly;
var _numeric_;
run;
eabc0351
Quartz | Level 8

Thank you @Ksharp and all!

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3645 views
  • 0 likes
  • 4 in conversation