Hello all,
Novice user here. I am given the following data (patient ID, Adm_Date formatted mmddyy10, and Age) and must fill in the missing values for the variable age. I have only been able to calculate the first missing age for each patient ID (dates10/15/2006 and 10/01/2011), and I believe this is at least partially due to my use of the LAG function. Does LAG only work on the first value it references? Would using RETAIN statement or an array be more appropriate? I have not used an array previously. Any hints that would point me in the right direction would be greatly appreciated. Please avoid posting your complete solution, I'd like to figure out as much on my own as possible. Thanks!
A11 12/10/2003 65.00
A11 10/15/2006 .
A11 07/20/2010 .
C02 05/11/2006 71.00
C02 10/01/2011 .
C02 07/03/2012 .
C02 05/08/2015 .
PROC SORT DATA=AdmissionAge;
BY ID;
RUN;
DATA AGEFILL;
SET AdmissionAge;
BY ID;
prev_ADM_DATE=lag(ADM_DATE);
prev_AGE=lag(AGE);
IF NOT FIRST.ID AND AGE=. THEN
AGE = prev_AGE+((ADM_DATE-prev_ADM_DATE)/365.25);
FORMAT prev_ADM_DATE ADM_DATE MMDDYY10. AGE 5.2 ;
DROP prev_AGE prev_ADM_DATE ;
RUN;
My apologies for any formatting errors.
@ratamacue wrote:
Please avoid posting your complete solution, I'd like to figure out as much on my own as possible. Thanks!
Well, that's refreshing and different. I am impressed.
LAG won't work here (without some additional effort because when you are on the third record, the LAG of the AGE is missing (because the value of AGE in the second record is missing).
RETAIN ought to work. I'll say no more, unless you ask.
And ARRAYs don't even apply in this situation.
Thank you! I've tried using a RETAIN statement instead and now the program calculates the difference between each of the previous dates, but I can't figure out how to sum the differences and add them to the original age values for each ID. I'm having trouble figuring out how to reference the original age value properly. Where am I going wrong here? I know I'm misusing FIRST.AGE since the data isn't sorted by age.
Here is my current code:
DATA AGEFILL;
SET AdmissionAge;
BY ID ADM_DATE;
RETAIN AgeDiff;
AgeDiff= dif(ADM_DATE)/365.25;
IF FIRST.ID THEN AgeDiff=.;
ELSE IF NOT FIRST.ID AND AGE=. THEN AGE=FIRST.AGE+AgeDiff;
FORMAT ADM_DATE MMDDYY10. AGE 5.2 ;
DROP AgeDiff;
RUN;
PROC PRINT DATA=AGEFILL;
RUN;
Output (sorry for the slop)
Obs ID ADM_DATE AGE
1 A11 12/10/2003 65.0000
2 A11 10/15/2006 2.8474
3 A11 07/20/2010 3.7618
4 C02 05/11/2006 71.0000
5 C02 10/01/2011 5.3908
6 C02 07/03/2012 0.7556
7 C02 05/08/2015 2.8446
You are asking "how to sum the differences and add them to the original age values for each ID" ?
You can add a summing variable to the retain statement and use the last summed value.
Just initialize the summed variable on first.ID.
If this were my problem I would calculate a likely "date of birth" variable from the first value, retain that, and use other age calculation with the YRDIF function, the date of the visit and the retained DOB.
Or possibly determine why you don't have an actual DOB to begin with and see if it may have been available and has been "lost" in processing.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Early bird rate extended! Save $200 when you sign up by March 31.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.