BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bgosiker
Obsidian | Level 7

Hi Everyone! 

 

Here is my data setup 

 

ID       visit   visit_date   age 

1         470      2247        55

1         480      2500          .

2         500      2800        57

2         510      2984        58

3         510      3050        35

3         520      3175          .

4         470      2278        42

4         480      2585        43 

 

I have a long dataset with multiple visits per person. I am trying to fill in the missing ages. I'd like to do this by taking the time between the visit dates added to the age at the prior visit to find the age for the next visit. I would like to "assume" that each person is at the halfway mark of their age (ie age=55 is actually age=55.5, but my dataset only has whole numbers).  Ie for ID #1 I'd like it to fill in the blank with 2500-2247= 253  + (365/2 to account for the mid year assumption). So then it would be 55.5+ (253/365)= 56.19 and then just round down to the closest whole number so the blank would be filled with 56. 

 

I'd like to do it this way to account for any participants where there may be a large gap between visits so I don't just assume they are 1 year older at the subsequent visit. 

 

Thanks in advance for any help with this - I haven't been able to figure out any decent approach to it so any guidance is appreciated! 

1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

Can we generalize from the data you show, and assume that the blanks are always the second item for each ID? If so, then this approach ought to work

 

data want;
    set have;
    by id;
    prev_visit_date=lag(visit_date);
    prev_age=lag(age);
    if last.id and not first.id and missing(age) then
        age = round(prev_age+0.5+(visit_date-prev_visit_date)/365);
run;

If we can't make that assumption, then you need to clearly state the full set of assumptions needed regarding the setup of your data.

--
Paige Miller

View solution in original post

4 REPLIES 4
Astounding
PROC Star

Should we extend the logic? 

 

For example, if two AGE values in a row are missing, should both be filled in based on the most recent nonmissing value?  If the very first AGE is missing for an ID, should we work backwards from the next available AGE value?

bgosiker
Obsidian | Level 7

In this case, I have the dataset cut down to two visits for isolating the age at a specific time (the second visit) and no one has both ages missing, so basing it off of carrying forward + the difference between the visit times should be sufficient for me! 

PaigeMiller
Diamond | Level 26

Can we generalize from the data you show, and assume that the blanks are always the second item for each ID? If so, then this approach ought to work

 

data want;
    set have;
    by id;
    prev_visit_date=lag(visit_date);
    prev_age=lag(age);
    if last.id and not first.id and missing(age) then
        age = round(prev_age+0.5+(visit_date-prev_visit_date)/365);
run;

If we can't make that assumption, then you need to clearly state the full set of assumptions needed regarding the setup of your data.

--
Paige Miller
Astounding
PROC Star
Similar but different:

data want;
set have;
by id;
prior_age = lag(age);
N_days = dif(visit_date) + 365/2;
if first.id=0 and age=. then age =
prior_age + int(n_days / 365) ;
run;

Also, consider using a denominator of 365.25 instead of 365.

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1157 views
  • 0 likes
  • 3 in conversation