Hello,
So I am working with insurance data and I am trying to figure out how to count consecutive months. So in order to be included in my sample, the person needs to have 6 consecutive months of coverage. I am unsure how to approach my coding for this.
So for example:
data fakedata;
input id month1 month2 month3 month4 month5 month6 month7 month8 month9 month10 month11 month12;
datalines;
1 1 1 1 1 1 1 0 0 0 0 0 0
2 0 1 1 1 0 0 0 1 1 1 0 0
3 0 0 0 1 1 1 1 1 1 1 0 0
4 0 1 1 1 1 1 0 0 0 0 0 0
5 0 1 1 0 0 0 0 0 0 0 0 0
;
run;
So in the above example, we have 12 variables for the 12 months. So we see ID#1 3 and 4 meets the criteria, since they have 6 consecutive months of coverage. While ID 2 only have 2 3 consecutive months coverage and 5 only have 2 months coverage, so they will not be included in the sample.
My first thought is just sum all the months, but that will only tell me if they have atleast 6 months of coverage. So I am a bit lost on seeing how can I code it for it to be consecutive. Does anyone have any experience doing this and can provide me with any input?
Thank you so much for your help
Tan
Tom's code should be OK. it is also suited for your new situation.
But mine need change a little bit .
data fakedata; input id month1 month2 month3 month4 month5 month6 month7 month8 month9 month10 month11 month12 year; datalines; 1 1 1 1 1 1 1 0 0 0 0 0 0 2001 1 1 1 1 0 1 1 0 0 0 0 0 0 2002 2 0 1 1 1 0 0 0 1 1 1 0 0 2002 3 0 0 0 1 1 1 1 1 1 1 0 0 2004 4 0 1 1 1 1 1 0 0 0 0 0 0 2001 5 0 1 1 0 0 0 0 0 0 0 0 0 2004 6 0 1 1 0 0 1 1 1 0 0 0 0 2010 7 0 1 1 0 0 0 0 0 0 0 0 1 2001 8 0 1 1 0 0 0 0 0 1 1 1 1 2004 8 0 1 1 0 0 0 1 1 1 1 1 1 2005 ; run; data temp(keep=id year name value); set fakedata ; length name $ 10; array m{*} month1-month12; do i=1 to dim(m); name=vname(m{i});value=m{i};output; end; run; data temp; set temp; if value ne lag(value) or id ne lag(id) then count+1; run; proc sql; create table x as select *,count(*) as n from temp where value=1 group by count having calculated n ge 6 order by id,year,input(compress(name, ,'kd'),best8.); quit; data xx(keep=id year start end n); set x; by id year; length start end $ 10; retain start end; if first.year then start=name; if last.year then do;end=name;output;end; run; data want; merge fakedata xx; by id year; run;
Ksharp
Try this:
data want;
set fakedata;
array m(12) month1-month12;
count=m(1);
max=0;
do i=2 to 12;
if m(i) then count+1;
else do; if count>max then max=count; count= 0; end;
end;
if count>max then max=count;
if max ge 6;
drop count i;
run;
Regards,
Its better to exit the loop as soon as you find the 6 months, if your criteria is at least 6 months. If you want the max amount of coverage then CTorres solution will work.
data fakedata;
input id month1 month2 month3 month4 month5 month6 month7 month8 month9 month10 month11 month12;
datalines;
1 1 1 1 1 1 1 0 0 0 0 0 0
2 0 1 1 1 0 0 0 1 1 1 0 0
3 0 0 0 1 1 1 1 1 1 1 0 0
4 0 1 1 1 1 1 0 0 0 0 0 0
5 0 1 1 0 0 0 0 0 0 0 0 0
;
run;
data want;
set fakedata;
array month(12) month1-month12;
flag=0;count=0;
do i=1 to 12 while (flag ne 1);
if month(i)=1 then count=count+1;
else count=0;
if count eq 6 then flag=1;
end;
run;
So I don't know anything about do loops. Would the do loop you written, work for consecutive months? Just reading the code, I think I need to read up on do loops and have a better undertanding on what the code is doing itself. I'm out of the office today and would play with that tomorrow
Maybe treat months as a string. This finds max coverage.
An interesting way to do this is to use a character string to record the coverage.
data fakedata;
input id month1-month12;
cards;
1 1 1 1 1 1 1 0 0 0 0 0 0
2 0 1 1 1 0 0 0 1 1 1 0 0
3 0 0 0 1 1 1 1 1 1 1 0 0
4 0 1 1 1 1 1 0 0 0 0 0 0
5 0 1 1 0 0 0 0 0 0 0 0 0
;
run;
data want;
set fakedata;
length coverstring $12;
coverstring = cats(of month1-month12);
has6months = 0 ^= index(coverstring,repeat('1',6-1));
put (id has6months coverstring) (=);
run;
id=1 has6months=1 coverstring=111111000000
id=2 has6months=0 coverstring=011100011100
id=3 has6months=1 coverstring=000111111100
id=4 has6months=0 coverstring=011111000000
id=5 has6months=0 coverstring=011000000000
if care about the start month and end month which has more than 6 series month.
data fakedata; input id month1 month2 month3 month4 month5 month6 month7 month8 month9 month10 month11 month12; datalines; 1 1 1 1 1 1 1 0 0 0 0 0 0 2 0 1 1 1 0 0 0 1 1 1 0 0 3 0 0 0 1 1 1 1 1 1 1 0 0 4 0 1 1 1 1 1 0 0 0 0 0 0 5 0 1 1 0 0 0 0 0 0 0 0 0 6 0 1 1 0 0 1 1 1 0 0 0 0 7 0 1 1 0 0 0 0 0 0 0 0 1 8 0 1 1 0 0 0 0 0 1 1 1 1 ; run; data temp(keep=id name value); set fakedata ; length name $ 10; array m{*} month1-month12; do i=1 to dim(m); name=vname(m{i});value=m{i};output; end; run; data temp; set temp; if value ne lag(value) or id ne lag(id) then count+1; run; proc sql; create table x as select *,sum(value=1) as n from temp group by count having calculated n ge 6 order by id,input(compress(name, ,'kd'),best8.); quit; data xx(keep=id start end n); set x; by id; length start end $ 10; retain start end; if first.id then start=name; if last.id then do;end=name;output;end; run; data want; merge fakedata xx; by id; run;
Ksharp
Hi,
How about the following one...
proc transpose data = have
out = have(rename = (_NAME_ = Month COL1 = VAL));
by id;
var month1-month12;
run;
data want;
set have;
by id;
_temp = dif(val);
if _temp = . then _temp = 0;
if first.id or _temp = 1 then con_mnth = 1;
else con_mnth + 1;
if _temp < 0 then con_mnth = .;
run;
proc sql;
create table want as
select id,max(con_mnth) as con_mnth
from want
where val = 1
group by id;
quit;
-Urvish
Wow thanks for everyone's response.. I am going to take all of your suggestions and play around with it.
So things just got a bit more complex. So some patients might have 2 records, so I am assuming the best option is combine them into 1.
data fakedata;
input id month1 month2 month3 month4 month5 month6 month7 month8 month9 month10 month11 month12 year;
datalines;
1 1 1 1 1 1 1 0 0 0 0 0 0 2001
1 0 1 1 1 1 1 0 0 0 0 0 0 2002;
run;
as an example. I assume I have to create variables like Yr1_month1..etc...
I am unsure if converting the values to strings is an option (method that Tom suggested). I only use the dataline to illustrate what the dataset looks like. But I can give it a try others string suggestion also.
Tan
Tom's code should be OK. it is also suited for your new situation.
But mine need change a little bit .
data fakedata; input id month1 month2 month3 month4 month5 month6 month7 month8 month9 month10 month11 month12 year; datalines; 1 1 1 1 1 1 1 0 0 0 0 0 0 2001 1 1 1 1 0 1 1 0 0 0 0 0 0 2002 2 0 1 1 1 0 0 0 1 1 1 0 0 2002 3 0 0 0 1 1 1 1 1 1 1 0 0 2004 4 0 1 1 1 1 1 0 0 0 0 0 0 2001 5 0 1 1 0 0 0 0 0 0 0 0 0 2004 6 0 1 1 0 0 1 1 1 0 0 0 0 2010 7 0 1 1 0 0 0 0 0 0 0 0 1 2001 8 0 1 1 0 0 0 0 0 1 1 1 1 2004 8 0 1 1 0 0 0 1 1 1 1 1 1 2005 ; run; data temp(keep=id year name value); set fakedata ; length name $ 10; array m{*} month1-month12; do i=1 to dim(m); name=vname(m{i});value=m{i};output; end; run; data temp; set temp; if value ne lag(value) or id ne lag(id) then count+1; run; proc sql; create table x as select *,count(*) as n from temp where value=1 group by count having calculated n ge 6 order by id,year,input(compress(name, ,'kd'),best8.); quit; data xx(keep=id year start end n); set x; by id year; length start end $ 10; retain start end; if first.year then start=name; if last.year then do;end=name;output;end; run; data want; merge fakedata xx; by id year; run;
Ksharp
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.