Hi Everyone, I am working with a longitudinal data and have age at every visit. I'd like to use this variable to generate age as a "time metric" . By that I mean if someone's first visit is at age 31 and their second visit is at age 32, then their second visit is accounting for anything that happened since the last visit (ie age 31-32). Here is the data setup I HAVE: ID visit visit_date age 1 1 200 32 1 2 300 32 1 3 400 32 1 4 600 33 1 5 950 34 1 6 1100 34 1 7 1300 35 1 8 1400 35 1 9 1475 35 1 10 1700 36 I'd like to generate variables that indicate the observation window over which a certain visit covers (ie since the previous visit) so that I can use this as a time metric for an analysis. "start_period" would be the age at the beginning of that specific follow-up period. "end_period" would be the age at the beginning of that follow-up period (ie the age at the current visit). My main struggle is that sometimes people have multiple visits in a given year - ie they could have 3 visits in a row that have the SAME age (as I have in the example above). So this makes generalizing code a bit more difficult to incorporate the possibility of multiple visits at the same age (glancing through the most I see is 3 in a row, but could potentially be people with more than that). For the first observation of a given individual, it seems easiest to assume that is their birthday to set the baseline age. So, visit 1 would be as shown below. For subsequent visits, I want to use the “visit_date” variable to calculate the exact age at the visit (seen as “new_age” below). The ”start_period” would then be the “end_period” of the previous visit and the “end_period” would be the age at the current visit. The start and end period for the first visit are essentially ignored since we have no observation of the person before the first visit. Below is an example of how I imagine the output looking: Example using visit 2: new_age = lag(new_age) + ((300-200)/365.25) Here is the data setup I WANT: ID visit visit_date age new_age start_period end_period 1 1 200 32 32 32 32 1 2 300 32 32.27 32 32.27 1 3 400 32 32.55 32.27 32.55 1 4 600 33 33.1 32.55 33.1 1 5 950 34 34.06 33.1 34.06 1 6 1100 34 34.47 34.06 34.47 1 7 1300 35 35.02 34.47 35.02 1 8 1400 35 35.29 35.02 35.29 1 9 1475 35 35.50 35.29 35.50 1 10 1700 36 36.12 35.50 36.12 Cod e that I have thus far that only calculates for the first 2 visits but then stops for some reason proc sort data=Aim2.visit_gt320; by id vdate;run;
data Aim2.agemetric; set Aim2.visit_gt320;
by id vdate;
if first.id then n=1;else n+1;
lag_vdate=lag(vdate);
lag_age=lag(age);
if first.id then do;
age_new=age;
age_end=age;
age_start=age;
end;
lag_age_new=lag(age_new);
if n^=1 then do;
age_new=lag_age_new+((vdate-lag_vdate)/365.25);
age_end=age_new;
age_start=lag_age_new;
end;
keep id vdate lag_vdate age lag_age age_new lag_age_new age_start age_end n;
run;
... View more