Hi Everyone!
Here is my data setup
ID visit visit_date age
1 470 2247 55
1 480 2500 .
2 500 2800 57
2 510 2984 58
3 510 3050 35
3 520 3175 .
4 470 2278 42
4 480 2585 43
I have a long dataset with multiple visits per person. I am trying to fill in the missing ages. I'd like to do this by taking the time between the visit dates added to the age at the prior visit to find the age for the next visit. I would like to "assume" that each person is at the halfway mark of their age (ie age=55 is actually age=55.5, but my dataset only has whole numbers). Ie for ID #1 I'd like it to fill in the blank with 2500-2247= 253 + (365/2 to account for the mid year assumption). So then it would be 55.5+ (253/365)= 56.19 and then just round down to the closest whole number so the blank would be filled with 56.
I'd like to do it this way to account for any participants where there may be a large gap between visits so I don't just assume they are 1 year older at the subsequent visit.
Thanks in advance for any help with this - I haven't been able to figure out any decent approach to it so any guidance is appreciated!
Can we generalize from the data you show, and assume that the blanks are always the second item for each ID? If so, then this approach ought to work
data want;
set have;
by id;
prev_visit_date=lag(visit_date);
prev_age=lag(age);
if last.id and not first.id and missing(age) then
age = round(prev_age+0.5+(visit_date-prev_visit_date)/365);
run;
If we can't make that assumption, then you need to clearly state the full set of assumptions needed regarding the setup of your data.
Should we extend the logic?
For example, if two AGE values in a row are missing, should both be filled in based on the most recent nonmissing value? If the very first AGE is missing for an ID, should we work backwards from the next available AGE value?
In this case, I have the dataset cut down to two visits for isolating the age at a specific time (the second visit) and no one has both ages missing, so basing it off of carrying forward + the difference between the visit times should be sufficient for me!
Can we generalize from the data you show, and assume that the blanks are always the second item for each ID? If so, then this approach ought to work
data want;
set have;
by id;
prev_visit_date=lag(visit_date);
prev_age=lag(age);
if last.id and not first.id and missing(age) then
age = round(prev_age+0.5+(visit_date-prev_visit_date)/365);
run;
If we can't make that assumption, then you need to clearly state the full set of assumptions needed regarding the setup of your data.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.