Hi SAS Experts,
Sorry to bother but really need a hand. I was working on the enrollment of Medicare beneficiaries.
However, I really got stuck. My quick question is how to set up start date and end date for enrollment status by id like below.
data enrollment;
input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ;
cards;
a 2019 0 0 0 0 0 0 1 1 1 1 1 1
a 2020 1 1 1 1 1 1 1 1 1 1 1 1
b 2018 1 1 1 1 1 1 1 1 1 1 1 1
b 2019 1 1 1 1 1 1 1 1 1 1 1 1
b 2020 1 1 1 1 1 1 1 1 1 1 0 0
c 2017 0 1 1 1 1 1 1 1 1 1 1 1
c 2018 1 1 1 1 1 1 1 1 1 1 1 1
c 2019 1 1 1 1 1 1 1 1 1 1 1 1
c 2020 1 1 1 1 1 0 0 0 0 0 0 0
run;
How can I transfer the data to below?
data want;
id start end
a July2019 Dec2020
b Jan2018 Oct2020
c Feb2017 May2020
Thank you so much for the advice!!
data enrollment;
input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ;
cards;
a 2019 0 0 0 0 0 0 1 1 1 1 1 1
a 2020 1 1 1 1 1 1 1 1 1 1 1 1
b 2018 1 1 1 1 1 1 1 1 1 1 1 1
b 2019 1 1 1 1 1 1 1 1 1 1 1 1
b 2020 1 1 1 1 1 1 1 1 1 1 0 0
c 2017 0 1 1 1 1 1 1 1 1 1 1 1
c 2018 1 1 1 1 1 1 1 1 1 1 1 1
c 2019 1 1 1 1 1 1 1 1 1 1 1 1
c 2020 1 1 1 1 1 0 0 0 0 0 0 0
;
run;
data temp;
set enrollment;
call scan(cats(of Jan--Dec),1,p,l,'0');
start=mdy(p,1,year);
end=mdy(p+l-1,1,year);
format start end monyy7.;
run;
proc summary data=temp nway;
class id;
var start end;
output out=want min(start)=start max(end)=end ;
run;
One thing is to avoid "wide" data sets and to use actual date values.
One way:
data enrollment; input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ; cards; a 2019 0 0 0 0 0 0 1 1 1 1 1 1 a 2020 1 1 1 1 1 1 1 1 1 1 1 1 b 2018 1 1 1 1 1 1 1 1 1 1 1 1 b 2019 1 1 1 1 1 1 1 1 1 1 1 1 b 2020 1 1 1 1 1 1 1 1 1 1 0 0 c 2017 0 1 1 1 1 1 1 1 1 1 1 1 c 2018 1 1 1 1 1 1 1 1 1 1 1 1 c 2019 1 1 1 1 1 1 1 1 1 1 1 1 c 2020 1 1 1 1 1 0 0 0 0 0 0 0 run; /* make a LONG set with an actual date*/ data need; set enrollment; array m (*) Jan Feb March April May June July August Sep Oct Nov Dec; do i=1 to dim(m); if m[i] then do; date =mdy(i,1,year); output; end; end; keep id date; format date date9.; run; /*report*/ proc tabulate data=need; class id; var date; table id, date=''*(min='Start'*f=monyy7. max='End'*f=monyy7.); run; /*data set*/ proc summary data=need nway; class id; var date; output out=want (drop=_:) min=Start max=end; run; proc print data=want; format start end monyy7.; run;
Date values, such as created by the MDY function (or others) allows use of formats to display dates in a nice fashion. Also there are functions to work with the date values to determine intervals and such.
The Long data set of one record per person per date allows summary statistics to be built easily.
If START is always considered to be the first of the month and END is always end of the month, then:
data enrollment;
input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ;
cards;
a 2019 0 0 0 0 0 0 1 1 1 1 1 1
a 2020 1 1 1 1 1 1 1 1 1 1 1 1
b 2018 1 1 1 1 1 1 1 1 1 1 1 1
b 2019 1 1 1 1 1 1 1 1 1 1 1 1
b 2020 1 1 1 1 1 1 1 1 1 1 0 0
c 2017 0 1 1 1 1 1 1 1 1 1 1 1
c 2018 1 1 1 1 1 1 1 1 1 1 1 1
c 2019 1 1 1 1 1 1 1 1 1 1 1 1
c 2020 1 1 1 1 1 0 0 0 0 0 0 0
run;
data want (drop=_:);
do until (last.id);
set enrollment;
by id;
array dummies {12} jan -- dec;
if start=. then start=mdy(whichn(1,of dummies{*}),1,year);
_nmonths=sum(_nmonths,of dummies{*});
end;
end=intnx('month',start,_nmonths)-1;
format start end date9. ;
run;
This program assumes there is only one sequences of 1's, with no internal 0's.
data enrollment;
input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ;
cards;
a 2019 0 0 0 0 0 0 1 1 1 1 1 1
a 2020 1 1 1 1 1 1 1 1 1 1 1 1
b 2018 1 1 1 1 1 1 1 1 1 1 1 1
b 2019 1 1 1 1 1 1 1 1 1 1 1 1
b 2020 1 1 1 1 1 1 1 1 1 1 0 0
c 2017 0 1 1 1 1 1 1 1 1 1 1 1
c 2018 1 1 1 1 1 1 1 1 1 1 1 1
c 2019 1 1 1 1 1 1 1 1 1 1 1 1
c 2020 1 1 1 1 1 0 0 0 0 0 0 0
;
run;
data temp;
set enrollment;
call scan(cats(of Jan--Dec),1,p,l,'0');
start=mdy(p,1,year);
end=mdy(p+l-1,1,year);
format start end monyy7.;
run;
proc summary data=temp nway;
class id;
var start end;
output out=want min(start)=start max(end)=end ;
run;
Hi Ksharp, I have a follow-up question on your code. When a person have a break in their enrollment, for example, I revised the original dataset to the following and please note that person a now was enrolled from July 2019 to June 2020, had a break in July 2020, but then was re-enrolled in August 2020 then all the way to December 2020.
data enrollment;
input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ;
cards;
a 2019 0 0 0 0 0 0 1 1 1 1 1 1
a 2020 1 1 1 1 1 1 0 1 1 1 1 1
b 2018 1 1 1 1 1 1 1 1 1 1 1 1
b 2019 1 1 1 1 1 1 1 1 1 1 1 1
b 2020 1 1 1 1 1 1 1 1 1 1 0 0
c 2017 0 1 1 1 1 1 1 1 1 1 1 1
c 2018 1 1 1 1 1 1 1 1 1 1 1 1
c 2019 1 1 1 1 1 1 1 1 1 1 1 1
c 2020 1 1 1 1 1 0 0 0 0 0 0 0
run;
In this case, the desired output would be as follow. My question is how do you revise the code that you provided to accommodate such situation? Thank you for your help!
data want;
id start end
a July2019 Jun2020
a Aug2020 Dec2020
b Jan2018 Oct2020
c Feb2017 May2020.
It would be better if you could start a brand-new session.
data enrollment;
input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ;
cards;
a 2019 0 0 0 0 0 0 1 1 1 1 1 1
a 2020 1 1 1 1 1 1 0 1 1 1 1 1
b 2018 1 1 1 1 1 1 1 1 1 1 1 1
b 2019 1 1 1 1 1 1 1 1 1 1 1 1
b 2020 1 1 1 1 1 1 1 1 1 1 0 0
c 2017 0 1 1 1 1 1 1 1 1 1 1 1
c 2018 1 1 1 1 1 1 1 1 1 1 1 1
c 2019 1 1 1 1 1 1 1 1 1 1 1 1
c 2020 1 1 1 1 1 0 0 0 0 0 0 0
;
run;
proc transpose data=enrollment out=temp;
by id year;
var jan--dec;
run;
data temp2;
set temp;
by id col1 notsorted;
if first.col1 then group+1;
monyy=input(cats('01',substr(_name_,1,3),year),date9.);
format monyy monyy7.;
run;
proc summary data=temp2(where=(col1=1)) ;
by group id ;
var monyy;
output out=want min=min max=max;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.