Dear all,
I am new to SAS but I have some programming experience. I would like some help finding the longest consecutive coverage for A and B. As an alternative method I used to include only rows that have yes in all months which made me miss some entries. Appreciate your help.
ID | year | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
A | 2010 | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
A | 2011 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No |
A | 2012 | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
A | 2013 | Yes | Yes | No | No | No | No | No | No | No | No | No | No |
B | 2011 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
B | 2012 | Yes | Yes | Yes | No | No | No | No | No | No | No | No | No |
As @mkeintz stated, you will find that a tall skinny data set is the best way to handle this type of problem in SAS. So we create a tall skinny data set.
Also, as @mkeintz stated, you will get faster and more accurate help if you provide your data as a SAS data step. Fortunately, @novinosrin has done that, but in the future, you should do that yourself.
My solution:
data tall_skinny;
informat ID $1. year 4.;
array col(12) $3;
input ID year col(*);
do i=1 to 12;
month=i;
result=col(i);
output;
end;
keep id year result month;
cards;
A 2010 No No Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
A 2011 Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes No
A 2012 No No Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
A 2013 Yes Yes No No No No No No No No No No
B 2011 Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
B 2012 Yes Yes Yes No No No No No No No No No
;
data tall_skinny1;
set tall_skinny;
by id;
if first.id or upcase(result)='NO' then consecutive=0;
if upcase(result)="YES" then consecutive+1;
run;
proc summary data=tall_skinny1 nway;
class id;
var consecutive;
output out=stats(drop=_:) max= maxid(consecutive(month))=maxmonth
maxid(consecutive(year))=maxyear;
run;
/* Select months starting at maxmonth maxyear and go backwards to first month of sequence with coverage */
data final;
merge tall_skinny1 stats;
by id;
if mdy(month,1,year)>intnx('month',mdy(maxmonth,1,maxyear),-consecutive,'b')
and mdy(month,1,year)<=mdy(maxmonth,1,maxyear);
run;
What should the output be?
For entry A will only include data of 03/2010-11/2011 (21 months coverage)
For entry B will only include data of 01/2011-03/2012 (15 months coverage).
ID | year | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
A | 2010 | No | No | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
A | 2011 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No |
A | 2012 | No | No | No | No | No | No | No | No | No | No | No | No |
A | 2013 | No | No | No | No | No | No | No | No | No | No | No | No |
B | 2011 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes |
B | 2012 | Yes | Yes | Yes | No | No | No | No | No | No | No | No | No |
No or blank will be OK
Thanks in advance.
Is your sas data already in the layout you show? You'd be better of with a tall-skinny dataset, with one row per month.
Could you provide a DATA step generate the sas data you already have? Then we can provide more useful answers.
Editted addition: as @novinosrin asks, what should the output look like? One row per ID?
The data is an upload of csv file in the provided format. The year in one column and the coverage months (Jan-Dec) each has a separate column.
I just want to identify the longest consecutive coverage year and months. Later I will be using it to clean other tables where the date of service is not in the longest consecutive coverage
Best
Good morning Mark @mkeintz, I hope your day is going well. I am afraid I am late to college and off to Professor appointment et al priorities. I didn't expect you to pop in early hours but if you have time and wanna take a stab, here is the data have. I will catch up on the thread later. Have a good day!
data have;
informat ID $1. year 4.;
array month(12) $3;
input ID year month(*);
cards;
A 2010 No No Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
A 2011 Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes No
A 2012 No No Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
A 2013 Yes Yes No No No No No No No No No No
B 2011 Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
B 2012 Yes Yes Yes No No No No No No No No No
;
As @mkeintz stated, you will find that a tall skinny data set is the best way to handle this type of problem in SAS. So we create a tall skinny data set.
Also, as @mkeintz stated, you will get faster and more accurate help if you provide your data as a SAS data step. Fortunately, @novinosrin has done that, but in the future, you should do that yourself.
My solution:
data tall_skinny;
informat ID $1. year 4.;
array col(12) $3;
input ID year col(*);
do i=1 to 12;
month=i;
result=col(i);
output;
end;
keep id year result month;
cards;
A 2010 No No Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
A 2011 Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes No
A 2012 No No Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
A 2013 Yes Yes No No No No No No No No No No
B 2011 Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes
B 2012 Yes Yes Yes No No No No No No No No No
;
data tall_skinny1;
set tall_skinny;
by id;
if first.id or upcase(result)='NO' then consecutive=0;
if upcase(result)="YES" then consecutive+1;
run;
proc summary data=tall_skinny1 nway;
class id;
var consecutive;
output out=stats(drop=_:) max= maxid(consecutive(month))=maxmonth
maxid(consecutive(year))=maxyear;
run;
/* Select months starting at maxmonth maxyear and go backwards to first month of sequence with coverage */
data final;
merge tall_skinny1 stats;
by id;
if mdy(month,1,year)>intnx('month',mdy(maxmonth,1,maxyear),-consecutive,'b')
and mdy(month,1,year)<=mdy(maxmonth,1,maxyear);
run;
Thank you, Paige, for your answer. I briefly used SAS in the past (6 years ago) and now I am back to it. I am trying to learn it from books and the web. I am trying my best.
This worked:
merge
tall_skinny
stats;
Thanks again Paige
Oh yeah, good catch!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.