I'm trying to select cases for a cohort. To be selected, a person must have a status code of 10 for every month of the year that they were alive. If someone lived the entire year or died in December, then all 12 STATUS_CODE variables must equal 10. But if they died in February, then only STATUS_CODE_01 and STATUS_CODE_02 must equal 10. I'm trying to use a DO loop over the STATUS_CODE variables, with a counter that runs up to the month the person died (or 12 if they didn't die).
I also need to format the DO loop counter so that single-digit months get a leading zero but double-digit months don't (so January should be 01 instead of 1, but October stays 10, not 010).
But I'm getting an error that my DO loop counter "i" is an invalid argument for %SYSFUNC because "i" isn't a number.
Another issue I'm concerned about is whether later months will overwrite the selection flag variable. For example, case 2345 should not be selected because status code = 6 near the middle of the year. But since the last status code is 10, will that overwrite the selection flag variable?
I tried to provide sample data, but I can't get that to work either 😡 SAS reports every BENE_DEATH_DT value as invalid, even though they are all in DATE9. format. It also reports every value of STATUS_CODE_10 as invalid, even though those values look exactly like the values of the other STATUS_CODE variables.
data have;
infile datalines dsd dlm=',' truncover;
input BENE_ID BENE_DEATH_DT date9. VALID_DEATH_DT_SW $ STATUS_CODE_01
STATUS_CODE_02 STATUS_CODE_03 STATUS_CODE_04 STATUS_CODE_05
STATUS_CODE_06 STATUS_CODE_07 STATUS_CODE_08 STATUS_CODE_09
STATUS_CODE_10 STATUS_CODE_11 STATUS_CODE_12;
datalines;
1234,"",.,10,10,10,10,10,10,10,10,10,10,10,10 /*should_be_selected = 1*/
2345,"",.,10,10,10,10,10,10,6,10,10,10,10,10 /*should_be_selected = 0*/
3456,"V",04JUN2018,10,10,10,10,10,10,0,0,0,0,0,0 /*should_be_selected = 1*/
4567,"V",15DEC2018,10,10,10,10,10,10,10,10,10,10,10,10 /*should_be_selected = 1*/
5678,"V",08FEB2018,10,6,0,0,0,0,0,0,0,0,0,0 /*should_be_selected = 0*/
;RUN;
DATA want; SET have;
IF VALID_DEATH_DT_SW = "V" THEN final_month = month(BENE_DEATH_DT);
ELSE IF VALID_DEATH_DT_SW ^= "V" THEN final_month=12;
should_be_selected = 1;
do i=1 to final_month;
%let m=%sysfunc(putn(i,z2.)); /*This should add a padding 0 to single digit numbers*/
IF STATUS_CODE_&m. ^= 10 THEN should_be_selected = 0;
end;
RUN;
proc print data=want;run;
Remember that the macro processor is a PRE-processor. It takes the text of your program, transforms it, and then passes the result on to the actual SAS language processor to evaluate and run.
So placing that %LET statement in the middle of a DATA step makes no sense. You might as well move it to BEFORE the DATA statement, since that is effectively when it executes. At that point there is no DATA step variable i yet: to %SYSFUNC, i is just the letter i, which is why PUTN complains that it is not a number.
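To see the timing for yourself, here is a minimal sketch (not from the original thread; the variable name month_txt is made up for illustration). The %LET runs once, while the program text is being read, so the macro variable never changes inside the loop. The DATA step PUT function, by contrast, runs on every iteration and pads the current value of i, which is the behavior the question was after.

```sas
/* Minimal timing demo: the %LET line is handled by the macro processor
   once, while the program text is read, before the DATA step runs. */
data _null_;
  do i = 1 to 3;
    /* Macro code: runs only once. PUTN is given the constant 5 here,
       because the text "i" would not be a number to %SYSFUNC. */
    %let m = %sysfunc(putn(5, z2.));
    /* DATA step code: runs on every iteration, using the value of i.
       The Z2. format adds a leading zero only to single-digit values. */
    month_txt = put(i, z2.);
    put i= month_txt= "macro m=&m";
  end;
run;
```

In the log, month_txt should change on each iteration (01, 02, 03) while the macro value stays at 05.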
Use an ARRAY instead. No code generation is needed, so no macro code is needed. You probably also want to decide how to treat missing values (the code below treats a missing status code as acceptable).
Perhaps something like this:
I cleaned up your example data step. Note that you do NOT want to read the date variable using FORMATTED mode. Use LIST mode, since you have a delimited input file, which means adding the colon modifier before the DATE informat. Also, you cannot put those comments at the end of the data lines: they are read as part of the last field. Why not just add the expected values from the comments as another variable? That will make testing easier.
data have;
infile datalines dsd dlm=',' truncover;
input BENE_ID VALID_DEATH_DT_SW $ BENE_DEATH_DT :date.
STATUS_CODE_01 - STATUS_CODE_12
expected
;
format BENE_DEATH_DT date9.;
datalines;
1234,"",,10,10,10,10,10,10,10,10,10,10,10,10,1
2345,"",,10,10,10,10,10,10,6,10,10,10,10,10,0
3456,"V",04JUN2018,10,10,10,10,10,10,0,0,0,0,0,0,1
4567,"V",15DEC2018,10,10,10,10,10,10,10,10,10,10,10,10,1
5678,"V",08FEB2018,10,6,0,0,0,0,0,0,0,0,0,0,0
;
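DATA want;
SET have;
/* Last month to check: the death month if the death date is valid, else 12 */
IF VALID_DEATH_DT_SW = "V" THEN final_month = month(BENE_DEATH_DT);
ELSE IF VALID_DEATH_DT_SW ^= "V" THEN final_month=12;
/* The array lets STATUS_CODE[i] refer to STATUS_CODE_01 ... STATUS_CODE_12 */
array status_code STATUS_CODE_01 - STATUS_CODE_12;
/* Assume selected; any month that is neither 10 nor missing rules the case out */
should_be_selected = 1;
do i=1 to final_month;
IF STATUS_CODE[i] not in (. 10) THEN should_be_selected = 0;
end;
drop i;
RUN;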
proc print data=want;
var bene_id expected should_be_selected;
run;
                           should_be_
Obs    BENE_ID    expected    selected

 1      1234          1           1
 2      2345          0           0
 3      3456          1           1
 4      4567          1           1
 5      5678          0           0
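On the question of whether later months can overwrite the flag: they cannot, because the loop only ever sets it to 0, never back to 1. The sketch below checks this on case 2345, whose status code is 6 in month 7 followed by more 10s. It uses hypothetical, simplified names (data set CHECK, variables CODE1-CODE12, FINAL_MONTH supplied directly, flag SELECTED) rather than the thread's variables, and the WHILE clause is an optional addition that simply stops checking after the first failure.

```sas
/* Hypothetical, simplified data: CODE1-CODE12 stand in for
   STATUS_CODE_01-STATUS_CODE_12, and FINAL_MONTH is given directly. */
data check;
  infile datalines dsd truncover;
  input id final_month code1-code12 expected;
datalines;
2345,12,10,10,10,10,10,10,6,10,10,10,10,10,0
3456,6,10,10,10,10,10,10,0,0,0,0,0,0,1
;

data check;
  set check;
  array code code1-code12;
  /* Start at 1. The loop can only set the flag to 0, never back to 1,
     so the 10s after month 7 for id 2345 cannot undo the failure. */
  selected = 1;
  /* WHILE(selected) stops checking after the first failure; the result
     is the same without it, it just skips the remaining months. */
  do i = 1 to final_month while (selected);
    if code[i] not in (. 10) then selected = 0;
  end;
  drop i;
run;
```

Both rows should end up with SELECTED equal to EXPECTED.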
Here is the code again with a slightly better version of the datalines; one of the variables was in the wrong order. But most of the problems remain.
data have;
infile datalines dsd dlm=',' truncover;
input BENE_ID VALID_DEATH_DT_SW $ BENE_DEATH_DT date9. STATUS_CODE_01
STATUS_CODE_02 STATUS_CODE_03 STATUS_CODE_04 STATUS_CODE_05
STATUS_CODE_06 STATUS_CODE_07 STATUS_CODE_08 STATUS_CODE_09
STATUS_CODE_10 STATUS_CODE_11 STATUS_CODE_12;
datalines;
1234,"",.,10,10,10,10,10,10,10,10,10,10,10,10 /*should_be_selected = 1*/
2345,"",.,10,10,10,10,10,10,6,10,10,10,10,10 /*should_be_selected = 0*/
3456,"V",04JUN2018,10,10,10,10,10,10,0,0,0,0,0,0 /*should_be_selected = 1*/
4567,"V",15DEC2018,10,10,10,10,10,10,10,10,10,10,10,10 /*should_be_selected = 1*/
5678,"V",08FEB2018,10,6,0,0,0,0,0,0,0,0,0,0 /*should_be_selected = 0*/
;RUN;
DATA want; SET have;
IF VALID_DEATH_DT_SW = "V" THEN final_month = month(BENE_DEATH_DT);
ELSE IF VALID_DEATH_DT_SW ^= "V" THEN final_month=12;
should_be_selected = 1;
do i=1 to final_month;
%let m=%sysfunc(putn(i,z2.)); /*This should add a padding 0 to single digit numbers*/
IF STATUS_CODE_&m. ^= 10 THEN should_be_selected = 0;
end;
RUN;
proc print data=want;run;
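The revised INPUT statement above still reads BENE_DEATH_DT with a plain DATE9. informat, which is formatted input: SAS reads nine columns, commas included, instead of stopping at the delimiter, so the value it sees is not a date and the pointer ends up in the wrong place for the variables that follow. Here is a minimal sketch of the colon-modifier fix on a hypothetical two-row data set (DATES, with made-up variables ID, DEATH_DT and CODE), leaving the date field empty for a person with no death date, as in the cleaned-up data step.

```sas
/* Hypothetical two-row example: the colon modifier makes SAS read the
   date in list mode (up to the next comma) and then apply DATE9. */
data dates;
  infile datalines dsd truncover;
  input id death_dt :date9. code;
  format death_dt date9.;
datalines;
3456,04JUN2018,10
1234,,10
;
```

With DSD, the empty field for 1234 is read as a missing date and CODE is still read correctly as 10.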