BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
eabc0351
Quartz | Level 8

Hello SAS Friends, 

I am trying to create flags to identify if a person was continuously enrolled in a health plan. Each person has multiple beginning and end enrollment dates through the course of the 18 years of data. I would like to create 18 variables (0/1) with value=1 if the person was enrolled 11 of 12 months of that year. Here is a sample of the data:

 

 

data test;
input id$2. effdate  enddate ;
attrib start format =MMYYP. informat=MMYYP.;
attrib end format =MMYYP. informat=MMYYP.;
datalines;
1	09.1996	07.2000
1	10.2000	10.2010   
1	12.2014	12.2017      
2	11.2012	12.2014 
2	02.2015	08.2017
3	02.2000	07.2014
3	08.2014	05.2018 
;
run;

 

And here is the output that I want:

 

PID      effdate     enddate     flag2000     flag2001     flag2002 …  flag2010… flag2014... flag2017
1         09.1996    07.2000            0               0                  0                 0                0                   0
1         10.2000    10.2010            0               1                  1                 0                0                   0
1         12.2014    12.2017            0               0                  0                 0                0                   1
2         11.2012     12.2014          0                0                  0                 0                1                   0
2         02.2015     08.2017          0                 0                 0                  0               0                   0
3         02.2000     07.2014          0                 1                 1                 1               0                    0
3         08.2014     05.2018          0                 0                 0                 0               1                    1 


 

 

Thank you for your help!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data test;
input id $ effdate $  enddate $ ;
start=input(cats('01.',effdate),ddmmyy10.);
end=input(cats('01.',enddate),ddmmyy10.);
format start end ddmmyyp10.;
datalines;
1	09.1996	07.2000
1	10.2000	10.2010   
1	12.2014	12.2017      
2	11.2012	12.2014 
2	02.2015	08.2017
3	02.2000	07.2014
3	08.2014	05.2018 
;
run;
data temp;
 set test; 
 do i=0 to intck('month',start,end);
  date=intnx('month',start,i);output;
 end;
 format date ddmmyyp10.;
 drop i start end;
run;
proc freq data=temp noprint;
table id*effdate*enddate*date/out=temp1;
format date year4.;
run;
data temp2;
 set temp1;
 flag=(count>10);
 drop percent;
run;
proc transpose data=temp2 out=temp3 prefix=flag;
by  id effdate enddate;
var flag;
id date;
run;
proc stdize data=temp3 out=want missing=0 reponly;
var _numeric_;
run;

View solution in original post

11 REPLIES 11
ScottBass
Rhodochrosite | Level 12

Edit your post as a self-contained, syntactically correct data step using datalines, then perhaps I'll try harder.

 

I'm kind of over (the MANY) cut-and-paste posts, where I'm meant to convert the post into a working SAS data step.

 

And put your code in a SAS code block (the little "running man") so I can scroll the code rather than having it wrap.

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
eabc0351
Quartz | Level 8

sorry @ScottBass, I've always hand-typed it in. Working on it now...

ScottBass
Rhodochrosite | Level 12

While I stand by my message in my first post...I really do wish folks would post a self-contained data step so I don't have to do the conversion...perhaps I could have been less "terse" in my tone.  So sorry bout that 😉

 

Anyway, perhaps this will get you going.  And notice I didn't attempt to convert your data into a working data step 😉

 

It's not complete, and you'll need to fiddle with it.  But I think the heavy lifting is there?  You'll also need to test it.

 

 

data have;
   id=1;
   start="01APR2012"d;
   end="01SEP2018"d;
   format start end yymmddd10.;
run;

data want;
   set have;
   by id start;
   length flag2000-flag2020 8;
   array flags{2000:2020} flag:;
   if first.id then do;
      do i=lbound(flags) to hbound(flags);
         flags{i}=0;
      end;
      counter=0;
   end;
   date=start;
   do until (date > end);
      if month(date)=1 then counter=0;
      counter+1;
      if month(date)=12 then flags{year(date)}=(counter ge 11);
      date=intnx('month',date,1);
   end;
   * drop i counter date;
run;

 

In summary:

 

  • Use an array to process your flag variables
  • Use an explicit array index corresponding with the min and max year range of your data.  See http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000201956.htm, search on lower: and :upper
  • Use the LBOUND and HBOUND functions to iterate over the bounds of your array
  • Use the YEAR function on your date as the index into your array (and thus your flag variables)
  • Use the INTNX function to iterate by month over the range of your start and end dates
  • Increment a counter to count months within the loop
  • Reset the counter when month=1, set your flag variable when month=12
  • You may not need the check on first.id?  But reset your flags to 0 for each new record (instead of the default missing value)

 

Hope this helps...

 

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
eabc0351
Quartz | Level 8

Hi @ScottBass. Thank you for this solution. When I run the command, there is an error message that says: ERROR: Array subscript out of range at line 196 column 30. The line and column numbers keep changing when I rerun the command. For example, ERROR: Array subscript out of range at line 216 column 30. The flags and the other three variables --  i counter date -- are all in the output dataset, but there are no observations.

Could the issue be that my date is month/year (since day is not in the data)?

Thank you for your assistance!

ScottBass
Rhodochrosite | Level 12

Could the issue be that my date is month/year (since day is not in the data)?

You don't have a date, you have a character string.  Yes, you'll need to convert your character string to a SAS date.

 

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
Jagadishkatam
Amethyst | Level 16

Please try

 

data test;
input id$2. effdate :$ enddate:$ ;
start=input(scan(effdate,2,'.'),best.);
end=input(scan(enddate,2,'.'),best.);
datalines;
1 09.1996 07.2000
1 10.2000 10.2010   
1 12.2014 12.2017      
2 11.2012 12.2014 
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018 
;
run;


proc sql;
select min(start), max(end) into: min , : max from test;
quit;

data want;
set test;
array yrs(*) flag1996 flag1997 flag1998 flag1999 flag2000 flag2001 flag2002 flag2003 flag2004 flag2005 flag2006 flag2007 flag2008 flag2009 flag2011 flag2012 flag2013 flag2014 flag2015 flag2016 flag2017 flag2018;
array yrss(22) _temporary_ (1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2011 2012 2013 2014 2015 2016 2017 2018);
do i = 1 to dim(yrs);
if start<yrss(i)<end then yrs(i)=1;
end;
run;
Thanks,
Jag
eabc0351
Quartz | Level 8

Hi @Jagadishkatam . This worked except for observations #3 and #4:

1 12.2014 12.2017     
2 11.2012 12.2014

 

For these observations, flag2017 (for obs #3) and flag2014 (for obs #4) are missing, not 1. It's not registering enddate of month 12 as complete enrollment for that year. Any suggestions?

Thanks!

Jagadishkatam
Amethyst | Level 16

Please try

 

data test;
input id$2. effdate :$ enddate:$ ;
start=input(scan(effdate,2,'.'),best.);
startm=input(scan(effdate,1,'.'),best.);
end=input(scan(enddate,2,'.'),best.);
endm=input(scan(enddate,1,'.'),best.);
datalines;
1 09.1996 07.2000
1 10.2000 10.2010   
1 12.2014 12.2017      
2 11.2012 12.2014 
2 02.2015 08.2017
3 02.2000 07.2014
3 08.2014 05.2018 
;
run;


proc sql;
select min(start), max(end) into: min , : max from test;
quit;

data want;
set test;
array yrs(*) flag1996 flag1997 flag1998 flag1999 flag2000 flag2001 flag2002 flag2003 flag2004 flag2005 flag2006 flag2007 flag2008 flag2009 flag2011 flag2012 flag2013 flag2014 flag2015 flag2016 flag2017 flag2018;
array yrss(22) _temporary_ (1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2011 2012 2013 2014 2015 2016 2017 2018);
do i = 1 to dim(yrs);
if start<=yrss(i)<=end and (startm>=12 or endm>=12) then yrs(i)=1;
else if start<yrss(i)<end and (startm<12 or endm<12) then yrs(i)=1;
end;
run;
Thanks,
Jag
eabc0351
Quartz | Level 8

Hi @Jagadishkatam. I'm sorry, just tried your solution on the entire dataset. There are two more issues. Here another sample dataset:

 

data test;
input id$2. effdate :$ enddate:$ ;
start=input(scan(effdate,2,'.'),best.);
startm=input(scan(effdate,1,'.'),best.);
end=input(scan(enddate,2,'.'),best.);
endm=input(scan(enddate,1,'.'),best.);
datalines;
1 01.1996 07.2000
1 08.2010 12.2012
1 12.2014 01.2017
2 12.2008 09.2013
2 07.2015 02.2018
;
run;

Issues:

1) The code isn't picking up full enrollment for the year if enrollment begins January.

    See observation #1 - flag1996 should equal 1.

 

2) When the start or end dates have month 12, then the program doesn't work correctly.

    See observation #2 - flag2010 should be missing

    Observation #3 - flags 2014 and 2017 should be missing

    Observation #4 - flags 2008 and 2013 should be missing

 

Everything works fine on observations (like obs #5) where start and/or end are not in month 12 and start not in month 1. Do you know how to solve this? Thank you so much!

 

Ksharp
Super User
data test;
input id $ effdate $  enddate $ ;
start=input(cats('01.',effdate),ddmmyy10.);
end=input(cats('01.',enddate),ddmmyy10.);
format start end ddmmyyp10.;
datalines;
1	09.1996	07.2000
1	10.2000	10.2010   
1	12.2014	12.2017      
2	11.2012	12.2014 
2	02.2015	08.2017
3	02.2000	07.2014
3	08.2014	05.2018 
;
run;
data temp;
 set test; 
 do i=0 to intck('month',start,end);
  date=intnx('month',start,i);output;
 end;
 format date ddmmyyp10.;
 drop i start end;
run;
proc freq data=temp noprint;
table id*effdate*enddate*date/out=temp1;
format date year4.;
run;
data temp2;
 set temp1;
 flag=(count>10);
 drop percent;
run;
proc transpose data=temp2 out=temp3 prefix=flag;
by  id effdate enddate;
var flag;
id date;
run;
proc stdize data=temp3 out=want missing=0 reponly;
var _numeric_;
run;
eabc0351
Quartz | Level 8

Thank you @Ksharp and all!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 3501 views
  • 0 likes
  • 4 in conversation