Hi guys,
suppose to have the following dataset:
data DB;
input ID :$20. Admission :date09. Discharge :date09. Index age age_class;
format Admission date9. Discharge date9.;
cards;
0001 13JAN2017 25JAN2017 1 59 6
0001 22FEB2018 03MAR2018 0 60 6
0001 30JAN2019 04MAR2019 0 61 7
0002 01DEC2016 14DEC2016 0 67 7
0002 01DEC2016 14DEC2016 0 67 7
0002 25DEC2017 02JAN2018 1 68 7
0002 25FEB2018 27FEB2018 0 69 7
0003 09JAN2016 25JAN2016 0 34 4
0003 29JAN2018 12FEB2018 1 36 4
0004 02FEB2014 12MAR2014 1 76 8
0004 04APR2018 11APR2019 0 80 8
0004 02DEC2018 11SEP2019 0 80 8
;run;
Is there a way to get the following?
data DB1;
input ID :$20. Admission :date09. Discharge :date09. Index age age_class;
format Admission date9. Discharge date9.;
cards;
0001 13JAN2017 25JAN2017 1 59 6
0001 22FEB2018 03MAR2018 0 59 6
0001 30JAN2019 04MAR2019 0 59 6
0002 01DEC2016 14DEC2016 0 0 0
0002 01DEC2016 14DEC2016 0 0 0
0002 25DEC2017 02JAN2018 1 68 7
0002 25FEB2018 27FEB2018 0 68 7
0003 09JAN2016 25JAN2016 0 0 0
0003 29JAN2018 12FEB2018 1 36 4
0004 02FEB2014 12MAR2014 1 76 8
0004 02FEB2014 12MAR2014 0 0 0
0004 04APR2018 11APR2019 0 76 8
0004 02DEC2018 11SEP2019 0 76 8
;run;
Rules:
1) For all cases like 0001, format age and age_class variables based on what is reported at Index = 1.
2) For all cases like 0002 do the same as for 0001 with the following exception: set to 0 the two variables (age and age_class ) if dates < dates at Index = 1. This should be done in the case dates are identical like in this case (01DEC2016 14DEC2016
) or not.
3) For all cases like 0003 do the same as for 0002.
4) For all cases like 0004: do the same as for 0001 but in this case dates are replicated but in one case Index = 1 and in the other Index = 0. When Index = 0 and dates are replicated, then set the two variables, age and age_class, to 0.
Can anyone help me please?
Or is the goal to just make a subject level variables that indicates the index date and index age?
data want;
merge db db(keep=id admission index age age_class
rename=(admission=idxdate index=any_index age=idx_age age_class=idx_age_class)
where=(any_index))
;
by id;
run;
If for some reason you want to clear the variables for admissions that happened before the index date then add a second step (cannot use the same step because the nature of 1 to many merges).
data want;
set want;
if admission < idxdate then call missing(idx_age,idx_age_class);
run;
Result
I am not sure what you mean by the verb "format" in your question. In SAS FORMAT has a very specific meaning. A FORMAT is instructions for how to convert values into TEXT so they can be displayed or printed. For example in your code you are attaching the format specification DATE9. to the two variables that have date values so they will print in a style that humans can recognize as dates.
I really don't think that is the meaning of "format" that you are asking about.
Perhaps you meant you want to RECODE your variables? Or CALCULATE new variables?
I do not follow your rules.
What do you mean by "all cases like 0001"? What is it about that ID that you are talking about? How is ID=0001 different from ID=0004?
So you want to compare the admission date to the INDEX DATE?
You could just re-merge the index observation back with the dataset.
data want;
merge db db(keep=id admission index rename=(admission=idxdate index=any_index)
where=(any_index))
;
by id;
new_age = age;
if admission < idxdate then new_age=0;
else new_age = age - intck('year',idxdate,admission,'c') ;
run;
Result:
Obs ID Admission Discharge Index age class idxdate index new_age 1 0001 13JAN2017 25JAN2017 1 59 6 13JAN2017 1 59 2 0001 22FEB2018 03MAR2018 0 60 6 13JAN2017 1 59 3 0001 30JAN2019 04MAR2019 0 61 7 13JAN2017 1 59 4 0002 01DEC2016 14DEC2016 0 67 7 25DEC2017 1 0 5 0002 01DEC2016 14DEC2016 0 67 7 25DEC2017 1 0 6 0002 25DEC2017 02JAN2018 1 68 7 25DEC2017 1 68 7 0002 25FEB2018 27FEB2018 0 69 7 25DEC2017 1 69 8 0003 09JAN2016 25JAN2016 0 34 4 29JAN2018 1 0 9 0003 29JAN2018 12FEB2018 1 36 4 29JAN2018 1 36 10 0004 02FEB2014 12MAR2014 1 76 8 02FEB2014 1 76 11 0004 04APR2018 11APR2019 0 80 8 02FEB2014 1 76 12 0004 02DEC2018 11SEP2019 0 80 8 02FEB2014 1 76
Do you really want the zeros? Shouldn't the new age be MISSING for the admissions before the index date?
Or is the goal to just make a subject level variables that indicates the index date and index age?
data want;
merge db db(keep=id admission index age age_class
rename=(admission=idxdate index=any_index age=idx_age age_class=idx_age_class)
where=(any_index))
;
by id;
run;
If for some reason you want to clear the variables for admissions that happened before the index date then add a second step (cannot use the same step because the nature of 1 to many merges).
data want;
set want;
if admission < idxdate then call missing(idx_age,idx_age_class);
run;
Result
If by "format" you really mean "carry the values of the age and age_class variables forward to the next observations for the same ID" This may be one way:
data want; set have; by id; /* assumes data is sorted by ID*/ retain t_age t_ageclass; if first.id then call missing(t_age,t_ageclass); if index=1 then do; t_age=age; t_ageclass=age_class; end; age= coalesce(t_age, age); age_class=coalesce(t_ageclass,age_class); drop t_age t_ageclass; run;
Retain keeps values across the data step boundary for those variables on a Retain statement. So we create a couple variables to hold the values for later use when appropriate.
The BY id allows us to reset the retained values to missing so we don't get them from the previous id. Conditionally assign the retained variable values when Index is 1.
The Coalesce function returns the first non-missing value in the list. So if the t_age is missing it returns age, other wise the t_age variable.
If your data is not actually sorted by ID but is grouped you can add the NOTSORTED option to the BY statement so SAS will execute the code. Otherwise there would be an error created by not sorted data.
Note: Your age_class variable is likely not needed as you could use a an actual format based on the value of AGE to display those categories or group values for analysis or reporting.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.