BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
chimei0403
Obsidian | Level 7

Hi SAS Experts,

Sorry to bother but really need a hand. I was working on the enrollment of  Medicare beneficiaries.

However, I really got stuck.  My quick question is how to set up start date and end date for enrollment status by id like below. 

 

data enrollment;
input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ;
cards;
a    2019   0  0  0 0 0 0 1 1 1 1 1 1  
a    2020   1  1  1 1 1 1 1 1 1 1 1 1 
b    2018   1  1  1 1 1 1 1 1 1 1 1 1
b    2019   1  1  1 1 1 1 1 1 1 1 1 1
b    2020   1  1  1 1 1 1 1 1 1 1 0 0   
c    2017   0  1  1 1 1 1 1 1 1 1 1 1 
c    2018   1  1  1 1 1 1 1 1 1 1 1 1 
c    2019   1  1  1 1 1 1 1 1 1 1 1 1  
c    2020   1  1  1 1 1 0 0 0 0 0 0 0  
run;

How can I transfer the data to below?

data want;
id    start             end
a      July2019     Dec2020
b      Jan2018     Oct2020
c      Feb2017    May2020

Thank you so much for the advice!!

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data enrollment;
input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ;
cards;
a    2019   0  0  0 0 0 0 1 1 1 1 1 1  
a    2020   1  1  1 1 1 1 1 1 1 1 1 1 
b    2018   1  1  1 1 1 1 1 1 1 1 1 1
b    2019   1  1  1 1 1 1 1 1 1 1 1 1
b    2020   1  1  1 1 1 1 1 1 1 1 0 0   
c    2017   0  1  1 1 1 1 1 1 1 1 1 1 
c    2018   1  1  1 1 1 1 1 1 1 1 1 1 
c    2019   1  1  1 1 1 1 1 1 1 1 1 1  
c    2020   1  1  1 1 1 0 0 0 0 0 0 0  
;
run;
data temp;
 set enrollment;
 call scan(cats(of Jan--Dec),1,p,l,'0');
 start=mdy(p,1,year);
 end=mdy(p+l-1,1,year);
format start end monyy7.;
run;
proc summary data=temp nway;
class id;
var start end;
output out=want min(start)=start max(end)=end ;
run;

View solution in original post

5 REPLIES 5
ballardw
Super User

One thing is to avoid "wide" data sets and to use actual date values.

One way:

data enrollment;
input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ;
cards;
a    2019   0  0  0 0 0 0 1 1 1 1 1 1  
a    2020   1  1  1 1 1 1 1 1 1 1 1 1 
b    2018   1  1  1 1 1 1 1 1 1 1 1 1
b    2019   1  1  1 1 1 1 1 1 1 1 1 1
b    2020   1  1  1 1 1 1 1 1 1 1 0 0   
c    2017   0  1  1 1 1 1 1 1 1 1 1 1 
c    2018   1  1  1 1 1 1 1 1 1 1 1 1 
c    2019   1  1  1 1 1 1 1 1 1 1 1 1  
c    2020   1  1  1 1 1 0 0 0 0 0 0 0  
run;

/* make a LONG set with an actual date*/

data need;
   set enrollment;
   array m (*) Jan Feb March April May June July August Sep Oct Nov Dec;
   do i=1 to dim(m);
      if m[i] then do;
         date =mdy(i,1,year);
         output;
      end;
   end;
   keep id date;
   format date date9.;
run;

/*report*/
proc tabulate data=need;
   class id;
   var date;
   table id,
         date=''*(min='Start'*f=monyy7. max='End'*f=monyy7.);
run;
/*data set*/
proc summary data=need nway;
   class id;
   var date;
   output out=want (drop=_:) min=Start max=end;
run;

proc print data=want;
  format start end monyy7.;
run;

Date values, such as created by the MDY function (or others) allows use of formats to display dates in a nice fashion. Also there are functions to work with the date values to determine intervals and such.

The Long data set of one record per person per date allows summary statistics to be built easily.

 

mkeintz
PROC Star

If START is always considered to be the first of the month and END is always end of the month, then:

 

data enrollment;
input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ;
cards;
a    2019   0  0  0 0 0 0 1 1 1 1 1 1  
a    2020   1  1  1 1 1 1 1 1 1 1 1 1 
b    2018   1  1  1 1 1 1 1 1 1 1 1 1
b    2019   1  1  1 1 1 1 1 1 1 1 1 1
b    2020   1  1  1 1 1 1 1 1 1 1 0 0   
c    2017   0  1  1 1 1 1 1 1 1 1 1 1 
c    2018   1  1  1 1 1 1 1 1 1 1 1 1 
c    2019   1  1  1 1 1 1 1 1 1 1 1 1  
c    2020   1  1  1 1 1 0 0 0 0 0 0 0  
run;

data want (drop=_:);
  do until (last.id);
    set enrollment;
    by id;
    array dummies {12} jan -- dec;
    if start=. then start=mdy(whichn(1,of dummies{*}),1,year);
    _nmonths=sum(_nmonths,of dummies{*});
  end;
  end=intnx('month',start,_nmonths)-1;
  format start end date9. ;
run;

This program assumes there is only one sequences of 1's, with no internal 0's.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data enrollment;
input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ;
cards;
a    2019   0  0  0 0 0 0 1 1 1 1 1 1  
a    2020   1  1  1 1 1 1 1 1 1 1 1 1 
b    2018   1  1  1 1 1 1 1 1 1 1 1 1
b    2019   1  1  1 1 1 1 1 1 1 1 1 1
b    2020   1  1  1 1 1 1 1 1 1 1 0 0   
c    2017   0  1  1 1 1 1 1 1 1 1 1 1 
c    2018   1  1  1 1 1 1 1 1 1 1 1 1 
c    2019   1  1  1 1 1 1 1 1 1 1 1 1  
c    2020   1  1  1 1 1 0 0 0 0 0 0 0  
;
run;
data temp;
 set enrollment;
 call scan(cats(of Jan--Dec),1,p,l,'0');
 start=mdy(p,1,year);
 end=mdy(p+l-1,1,year);
format start end monyy7.;
run;
proc summary data=temp nway;
class id;
var start end;
output out=want min(start)=start max(end)=end ;
run;
roroholic123
Calcite | Level 5

Hi Ksharp, I have a follow-up question on your code. When a person have a break in their enrollment, for example, I revised the original dataset to the following and please note that person a now was enrolled from July 2019 to June 2020, had a break in July 2020, but then was re-enrolled in August 2020 then all the way to December 2020.

data enrollment;
input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ;
cards;
a    2019   0  0  0 0 0 0 1 1 1 1 1 1  
a    2020   1  1  1 1 1 1 0 1 1 1 1 1 
b    2018   1  1  1 1 1 1 1 1 1 1 1 1
b    2019   1  1  1 1 1 1 1 1 1 1 1 1
b    2020   1  1  1 1 1 1 1 1 1 1 0 0   
c    2017   0  1  1 1 1 1 1 1 1 1 1 1 
c    2018   1  1  1 1 1 1 1 1 1 1 1 1 
c    2019   1  1  1 1 1 1 1 1 1 1 1 1  
c    2020   1  1  1 1 1 0 0 0 0 0 0 0  
run;

In this case, the desired output would be as follow. My question is how do you revise the code that you provided to accommodate such situation? Thank you for your help!

data want;
id    start             end
a      July2019     Jun2020
a Aug2020 Dec2020 b      Jan2018     Oct2020 c      Feb2017    May2020. 
 

 

Ksharp
Super User

It would be better if you could start a brand-new session.

 

data enrollment;
input ID$ Year Jan Feb March April May June July August Sep Oct Nov Dec ;
cards;
a    2019   0  0  0 0 0 0 1 1 1 1 1 1  
a    2020   1  1  1 1 1 1 0 1 1 1 1 1 
b    2018   1  1  1 1 1 1 1 1 1 1 1 1
b    2019   1  1  1 1 1 1 1 1 1 1 1 1
b    2020   1  1  1 1 1 1 1 1 1 1 0 0   
c    2017   0  1  1 1 1 1 1 1 1 1 1 1 
c    2018   1  1  1 1 1 1 1 1 1 1 1 1 
c    2019   1  1  1 1 1 1 1 1 1 1 1 1  
c    2020   1  1  1 1 1 0 0 0 0 0 0 0  
;
run;
proc transpose data=enrollment out=temp;
by id year;
var jan--dec;
run;
data temp2;
 set temp;
 by id col1 notsorted;
 if first.col1 then group+1;
 monyy=input(cats('01',substr(_name_,1,3),year),date9.);
 format monyy monyy7.;
run;
proc summary data=temp2(where=(col1=1)) ;
by group id ;
var monyy;
output out=want min=min max=max;
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 1204 views
  • 6 likes
  • 5 in conversation