Hi Everyone!
I have data showing every month a member is enrolled in a program. A person can be continuously enrolled, but most often a member is in and out of the program. The first observation for every member indicates the first date the member is enrolled in the program. Here is an example of the data:
ID | SEX | Eligible_DT |
1A | F | 10/01/2010 |
1A | F | 11/20/2010 |
1A | F | 12/05/2010 |
1B | M | 01/01/2011 |
1B | M | 03/06/2011 |
1B | M | 05/12/2011 |
1B | M | 06/14/2011 |
1B | M | 10/15/2011 |
1B | M | 11/27/2011 |
1C | F | 09/16/2011 |
1C | F | 11/30/2011 |
1D | M | 10/01/2010 |
Let's call this dataset A
What I want to do is create dummy variables from the start date of my analysis to the last date of my analysis (Oct2009 - Sep2015). However, for the dummy variable dates, I only want MonthYear dummy variables (72 dummy variables in total: FORMAT 200910 - 201509). I did an ARRAY but the code I used created dummy variables for every single day between Oct2009 - Sep2015 instead of every single month. This resulted in thousands of dummy variables for the number of days between Oct2009 - Sep2015.
Here is the code I used:
PROC SQL;
SELECT MIN(Eligible_DT) FORMAT YYMMN6., MAX(Eligible_DT) FORMAT YYMMN6. INTO :FIRST_DATE, :LAST_DATE
FROM A;
QUIT;
DATA B1;
SET A;
BY ID SEX;
ARRAY DATE(&FIRST_DATE : &LAST_DATE));
RUN;
My second try used the TRANSPOSE; here is the code:
PROC SORT DATA=A;
BY ID SEX ELIGIBLE_DT;
RUN;
PROC TRANSPOSE DATA=A OUT=B2 PREFIX=DATE;
BY ID SEX;
VAR ID;
ID ELIGIBLE_DT;
FORMAT ELIGIBLE_DT YYMMN6.;
RUN;
This gets me really close to what I want; however, the MonthYear dummy variables do not follow chronological order for the whole dataset (instead, it follows chronological order per member). I would like for the MonthYear dummy variables to go in chronological order for the whole dataset. Also, I would like SAS to input certain values for each dummy MonthYear depending on a certain condition of the member.
For every member, I want to assign "." for people not yet enrolled in the program (my analysis starts before some people enroll), "1" for people enrolled in the program at that particular month in time, and "0" if the person has been incorporated into the analysis but not enrolled in the program at that particular month.
Taking dataset A, here's how I want the final dataset to look like (note, it does not include all the dummy MonthYear variables of my study because I can't fit them all in here):
ID | SEX | DATE 2010 10 | DATE 2010 11 | DATE 2010 12 | DATE 2011 01 | DATE 2011 02 | DATE 2011 03 | DATE 2011 04 | DATE 2011 05 | DATE 2011 06 | DATE 2011 07 | DATE 2011 08 | DATE 2011 09 | DATE 2011 10 | DATE 2011 11 |
1A | F | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1B | M | . | . | . | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 1 |
1C | F | . | . | . | . | . | . | . | . | . | . | . | 1 | 0 | 1 |
1D | M | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
I've been trying to figure this out all day and just can't seem to overcome the hurdle on my own.
I am using SAS 9.4
Muchas Gracias
I don't know if my datast NEED is exactly the dummy set Peter had in mind, but I think it is transpose-friendly.
The idea is to make dataset need as 1 record for each month from OCT2009 through SEP2015 with the dummy var set to one for months in which eligible_dt occurs. The "trick" is to use standard dates - always the first of the month for managing the calendar.
The other trick is to loop date variable d from the lagged value of eligible_dt to the month prior to the current eligible_dt, outputting a record each time.
data have;
input ID :$2. SEX :$1. Eligible_DT :mmddyy10.;
format eligible_dt date9.;
datalines;
1A F 10/01/2010
1A F 11/20/2010
1A F 12/05/2010
1B M 01/01/2011
1B M 03/06/2011
1B M 05/12/2011
1B M 06/14/2011
1B M 10/15/2011
1B M 11/27/2011
1C F 09/16/2011
1C F 11/30/2011
1D M 10/01/2010
run;
data need (keep=id sex monthname dummy d);
set have;
by id;
if eod2=0 then set have (firstobs=2 keep=eligible_dt rename=(eligible_dt=nxtdt)) end=eod2;
d1=intnx('month',eligible_dt,0,'beg');
dummy=0;
do d= ifn(first.id,'01oct2009'd,intnx('month',lag(d1),1,'beg')) by 0 while (d<d1);
monthname=put(d,yymon7.);
output;
d=intnx('month',d,1,'beg');
end;
dummy=1; d=d1; monthname=put(d,yymon7.); output;
dummy=0;
if last.id then do d= intnx('month',d1,1,'beg') by 0 while (d<='01sep2015'd);
monthname=put(d,yymon7.);
output;
d=intnx('month',d,1,'beg');
end;
format d yymmddn8.;
run;
proc transpose data=need out=want (drop=_name_);
by id sex ;
var dummy;
id monthname;
run;
Note the expressions like
do d=xxx by 0 while (d<yyyy);
are identical to
d=xxx;
do while (d<yyyy);
Thanks for the response! Unfortunately I have no clue what you just said, and definitely no idea how to execute what you're suggesting.
I think I understand what you mean but can't seem to execute it correctly. What exactly goes in the prefix? And what do i put in the where statement?
I don't know if my datast NEED is exactly the dummy set Peter had in mind, but I think it is transpose-friendly.
The idea is to make dataset need as 1 record for each month from OCT2009 through SEP2015 with the dummy var set to one for months in which eligible_dt occurs. The "trick" is to use standard dates - always the first of the month for managing the calendar.
The other trick is to loop date variable d from the lagged value of eligible_dt to the month prior to the current eligible_dt, outputting a record each time.
data have;
input ID :$2. SEX :$1. Eligible_DT :mmddyy10.;
format eligible_dt date9.;
datalines;
1A F 10/01/2010
1A F 11/20/2010
1A F 12/05/2010
1B M 01/01/2011
1B M 03/06/2011
1B M 05/12/2011
1B M 06/14/2011
1B M 10/15/2011
1B M 11/27/2011
1C F 09/16/2011
1C F 11/30/2011
1D M 10/01/2010
run;
data need (keep=id sex monthname dummy d);
set have;
by id;
if eod2=0 then set have (firstobs=2 keep=eligible_dt rename=(eligible_dt=nxtdt)) end=eod2;
d1=intnx('month',eligible_dt,0,'beg');
dummy=0;
do d= ifn(first.id,'01oct2009'd,intnx('month',lag(d1),1,'beg')) by 0 while (d<d1);
monthname=put(d,yymon7.);
output;
d=intnx('month',d,1,'beg');
end;
dummy=1; d=d1; monthname=put(d,yymon7.); output;
dummy=0;
if last.id then do d= intnx('month',d1,1,'beg') by 0 while (d<='01sep2015'd);
monthname=put(d,yymon7.);
output;
d=intnx('month',d,1,'beg');
end;
format d yymmddn8.;
run;
proc transpose data=need out=want (drop=_name_);
by id sex ;
var dummy;
id monthname;
run;
Note the expressions like
do d=xxx by 0 while (d<yyyy);
are identical to
d=xxx;
do while (d<yyyy);
Wow this is incredible! For the most part I see what you did (need a bit more time digesting it to fully understand it), but it does most of what I need!!
One quick thing, is there a way to have "." (missing value) for the dummy MonthYear variables before the first enrolled date? Let me explain my rationale for this. For the analysis I want to do, members can be categorized into three groups: A) Member not in the study (which by default means they are not in the program) which I want to code as "." (missing); B) Member is in the study but not enrolled in the program, which I want to code as zero; C) Member is in the study and enrolled in the program, which I want to code as one. Your current code captures groups B and C perfectly! However, it also codes zero in places I want missing values. Like this, when SAS runs the analysis, it can correctly calculate the results.
Sorry for the complexity with all this and do appreciate your help!
Actually, nevermind! I thought my analysis some more and realized the way you set up the dataset is exactly how I need it! Ignore my message about needing missing values.
Thanks for the help!
Hey Peter_C, I really appreciate you taking time off your day to help me with this. I did try your code but it didn't run smoothly for me. I suspect there was something I needed to adapt on my end but failed to do so. Since MKeintz's solution works for me, I decided to go ahead and use that solution. You don't have to spend anymore time on this problem unless you really just want to challenge yourself.
Thanks for the assistance!
Bascially take the min/max month for each id and use that to blow up to all possible months.
First let's start with your sample data and convert the dates you have to the first of the month.
data have ;
input id :$2. sex $1. eligdt :yymmdd10. @@ ;
month = intx('month',eligdt,0,'b');
format eligdt month yymmdd10.;
cards;
1A F 2010-10-01 1A F 2010-11-20 1A F 2010-12-05
1B M 2011-01-01 1B M 2011-03-06 1B M 2011-05-12
1B M 2011-06-14 1B M 2011-10-15 1B M 2011-11-27
1C F 2011-09-16 1C F 2011-11-30
1D M 2010-10-01
;;;;
Now a quick PROC SUMMARY step will create a table with min/max month for each ID and add an extra row for the overall min/max.
proc summary data=have ;
class id sex ;
types () id*sex ;
var month ;
output out=ranges min=first_month max=last_month ;
run;
Now add a data step to blow that out into one record for each month in the intervals.
data all_dates ;
set ranges ;
do i=0 to intck('month',first_month,last_month) ;
month = intnx('month',first_month,i,'b');
format month yymmdd10. ;
output;
end;
run;
Then merge it back with the source table so that we can create the flag variable with the .,0,1 values. We can also convert the MONTH into a value that makes a nice SAS variable name, like YyyyyMmm .
data all ;
merge all_dates (in=in1) have (in=in2);
by id month ;
flag = in2 ;
length _name_ $8 ;
_name_ = cats('Y',put(month,yymm7.));
run;
And finally use PROC TRANSPOSE to create the dummy variables, use a WHERE= dataset option to eliminate the dummy row that is setting the overall list of months.
proc transpose data=all out=want(drop=_: where=(not missing(id))) ;
by id sex ;
var flag ;
run;
Y Y Y Y Y Y Y Y Y Y Y Y Y Y 2 2 2 2 2 2 2 2 2 2 2 2 2 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1 1 1 O s M M M M M M M M M M M M M M b i e 1 1 1 0 0 0 0 0 0 0 0 0 1 1 s d x 0 1 2 1 2 3 4 5 6 7 8 9 0 1 1 1A F 1 1 1 . . . . . . . . . . . 2 1B M . . . 1 0 1 0 1 1 0 0 0 1 1 3 1C F . . . . . . . . . . . 1 0 1 4 1D M 1 . . . . . . . . . . . . .
Thanks for the response! I'll try it on Monday when I get back to work to see how it goes!!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.