Hi team - hoping to get some help with the below issue. Essentially, I want to create a new variable that keeps the most recent value based on another variable, based on date. See below:
This is the data I have:
PATIENT_ID LAB_RESULT DATE
1 A 1/1/22
1 B 2/1/22
2 C 1/1/22
2 D 2/1/22
2 E 3/1/22
3 F 1/1/22
This is the data I want:
PATIENT_ID LAB_RESULT DATE LAB_RECENT
1 A 1/1/22
1 B 2/1/22 B
2 C 1/1/22
2 D 2/1/22
2 E 3/1/22 E
3 F 1/1/22 F
Try this
data have;
input PATIENT_ID LAB_RESULT $ DATE:ddmmyy10.;
format date ddmmyy10.;
datalines;
1 A 1/1/22
1 B 2/1/22
2 C 1/1/22
2 D 2/1/22
2 E 3/1/22
3 F 1/1/22
;
data want;
set have;
by PATIENT_ID;
if last.PATIENT_ID then LAB_RECENT = LAB_RESULT;
run;
Is this data set always sorted by PATIENT_ID and DATE? If yes, then:
data want;
set have;
by patient_id;
if last.patient_id then lab_recent=lab_result;
run;
Thank you. I have a wrench to add, though. What if the whole thing is dependent on a flag variable being checked. For example, below, the 3/1/22 lab result for Patient_ID 2 would not be considered in the new variable.
This is the new have/want situation:
Have:
Patient_ID | Lab_Result | Lab_Flag | Date |
1 | A | Y | 1/1/2022 |
1 | B | Y | 2/1/2022 |
2 | C | Y | 1/1/2022 |
2 | D | Y | 2/1/2022 |
2 | E | 3/1/2022 | |
3 | F | Y | 1/1/2022 |
Want:
Patient_ID | Lab_Result | Lab_Flag | Date | Lab_Recent |
1 | A | Y | 1/1/2022 | |
1 | B | Y | 2/1/2022 | B |
2 | C | Y | 1/1/2022 | |
2 | D | Y | 2/1/2022 | D |
2 | E | 3/1/2022 | ||
3 | F | Y | 1/1/2022 | F |
Makes it a bit more complicated, but try this
data have;
input PATIENT_ID LAB_RESULT $ Lab_Flag $ DATE:ddmmyy10.;
format DATE ddmmyy10.;
infile datalines dlm = '|' dsd;
datalines;
1|A|Y|1/1/22
1|B|Y|2/1/22
2|C|Y|1/1/22
2|D|Y|2/1/22
2|E| |3/1/22
3|F|Y|1/1/22
;
data want(drop = i n);
do i = 1 by 1 until (last.PATIENT_ID);
set have;
by PATIENT_ID;
if Lab_Flag = 'Y' then n = i;
end;
do i = 1 to i;
set have;
LAB_RECENT = ifc(i = n, LAB_RESULT, '');
output;
end;
run;
Result
PATIENT_ID LAB_RESULT Lab_Flag DATE LAB_RECENT 1 A Y 01/01/2022 1 B Y 02/01/2022 B 2 C Y 01/01/2022 2 D Y 02/01/2022 D 2 E 03/01/2022 3 F Y 01/01/2022 F
Simple, don't include rows in the analysis if they don't have Y
data want;
set have(where=(lab_flag='Y'));
by patient_id;
if last.patient_id then lab_recent=lab_result;
run;
Thanks for the quick responses!
Wouldn't the new table in this solution include only rows where Lab_flag = "Y"? I want all the original rows in the resulting table.
@v2murthy wrote:
Thanks for the quick responses!
Wouldn't the new table in this solution include only rows where Lab_flag = "Y"? I want all the original rows in the resulting table.
Yes, that's correct, but one more step gives you what you want if you merge the original data with the result from my code.
Thank you. This has helped immensely.
What about if I have two conditions in the where clause? what would be the correct syntax?
For example, instead of
data want ;
set have (where = (Flag = "Y")) ;
run ;
I have two conditions.. I am trying to do this:
data want ;
set have (where = (Flag = "Y" and days >=30)) ;
run ;
This doesn't seem to work.
Try this
data have;
input PATIENT_ID LAB_RESULT $ DATE:ddmmyy10.;
format date ddmmyy10.;
datalines;
1 A 1/1/22
1 B 2/1/22
2 C 1/1/22
2 D 2/1/22
2 E 3/1/22
3 F 1/1/22
;
data want;
set have;
by PATIENT_ID;
if last.PATIENT_ID then LAB_RECENT = LAB_RESULT;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.