Hi Everyone,
I am working with a longitudinal data and have age at every visit. I'd like to use this variable to generate age as a "time metric" . By that I mean if someone's first visit is at age 31 and their second visit is at age 32, then their second visit is accounting for anything that happened since the last visit (ie age 31-32).
Here is the data setup I HAVE:
ID visit visit_date age
1 1 200 32
1 2 300 32
1 3 400 32
1 4 600 33
1 5 950 34
1 6 1100 34
1 7 1300 35
1 8 1400 35
1 9 1475 35
1 10 1700 36
I'd like to generate variables that indicate the observation window over which a certain visit covers (ie since the previous visit) so that I can use this as a time metric for an analysis. "start_period" would be the age at the beginning of that specific follow-up period. "end_period" would be the age at the beginning of that follow-up period (ie the age at the current visit).
My main struggle is that sometimes people have multiple visits in a given year - ie they could have 3 visits in a row that have the SAME age (as I have in the example above). So this makes generalizing code a bit more difficult to incorporate the possibility of multiple visits at the same age (glancing through the most I see is 3 in a row, but could potentially be people with more than that). For the first observation of a given individual, it seems easiest to assume that is their birthday to set the baseline age. So, visit 1 would be as shown below. For subsequent visits, I want to use the “visit_date” variable to calculate the exact age at the visit (seen as “new_age” below).
The ”start_period” would then be the “end_period” of the previous visit and the “end_period” would be the age at the current visit. The start and end period for the first visit are essentially ignored since we have no observation of the person before the first visit. Below is an example of how I imagine the output looking:
Example using visit 2: new_age = lag(new_age) + ((300-200)/365.25)
Here is the data setup I WANT:
ID visit visit_date age new_age start_period end_period
1 1 200 32 32 32 32
1 2 300 32 32.27 32 32.27
1 3 400 32 32.55 32.27 32.55
1 4 600 33 33.1 32.55 33.1
1 5 950 34 34.06 33.1 34.06
1 6 1100 34 34.47 34.06 34.47
1 7 1300 35 35.02 34.47 35.02
1 8 1400 35 35.29 35.02 35.29
1 9 1475 35 35.50 35.29 35.50
1 10 1700 36 36.12 35.50 36.12
Cod e that I have thus far that only calculates for the first 2 visits but then stops for some reason
proc sort data=Aim2.visit_gt320; by id vdate;run;
data Aim2.agemetric; set Aim2.visit_gt320;
by id vdate;
if first.id then n=1;else n+1;
lag_vdate=lag(vdate);
lag_age=lag(age);
if first.id then do;
age_new=age;
age_end=age;
age_start=age;
end;
lag_age_new=lag(age_new);
if n^=1 then do;
age_new=lag_age_new+((vdate-lag_vdate)/365.25);
age_end=age_new;
age_start=lag_age_new;
end;
keep id vdate lag_vdate age lag_age age_new lag_age_new age_start age_end n;
run;
I suppose I get the correct result.
See the picture below and here is my method.
Here test is the data you provided and just read them through a txt file into sas dataset.
My logic is to sum the difs between visits and then add the age from the first id.
data test;
infile test;
input id visit vdate age @@;
run;
proc sort data=test; by id vdate;run;
data want; set test;
by id vdate;
retain temp_age;
if first.id then n=1;else n+1;
lag_vdate=lag(vdate);
lag_age=lag(age);
***sum the diffs among visits;
dif=((vdate-lag_vdate)/365.25);
sum+dif;
***replace the first id;
if first.id then do;
age_new=age;
age_end=age;
age_start=age;
temp_age=age;
end;
***calculate new age;
age_new=temp_age+sum;
lag_age_new=lag(age_new);
***start and end;
if n >1 then do;
age_start=lag_age_new;
age_end=age_new;
end;
keep id vdate lag_vdate age lag_age age_new lag_age_new age_start age_end n dif sum temp_age;
run;
Why not use visit_date as your time metric?
Something like below returns the result you're after.
data have;
infile datalines dlm=' ' truncover;
input ID visit visit_date age _new_age _start_period _end_period ;
datalines;
1 1 200 32 32 32 32
1 2 300 32 32.27 32 32.27
1 3 400 32 32.55 32.27 32.55
1 4 600 33 33.1 32.55 33.1
1 5 950 34 34.06 33.1 34.06
1 6 1100 34 34.47 34.06 34.47
1 7 1300 35 35.02 34.47 35.02
1 8 1400 35 35.29 35.02 35.29
1 9 1475 35 35.50 35.29 35.50
1 10 1700 36 36.12 35.50 36.12
;
run;
data want;
set have;
by id visit_date;
if first.id then
do;
start_age=age;
start_date=visit_date;
retain start_age start_date;
new_age=age;
retain new_age;
end;
start_period=new_age;
new_age=round(start_age +(visit_date-start_date)/365.25,0.01);
end_period=new_age;
run;
I suppose I get the correct result.
See the picture below and here is my method.
Here test is the data you provided and just read them through a txt file into sas dataset.
My logic is to sum the difs between visits and then add the age from the first id.
data test;
infile test;
input id visit vdate age @@;
run;
proc sort data=test; by id vdate;run;
data want; set test;
by id vdate;
retain temp_age;
if first.id then n=1;else n+1;
lag_vdate=lag(vdate);
lag_age=lag(age);
***sum the diffs among visits;
dif=((vdate-lag_vdate)/365.25);
sum+dif;
***replace the first id;
if first.id then do;
age_new=age;
age_end=age;
age_start=age;
temp_age=age;
end;
***calculate new age;
age_new=temp_age+sum;
lag_age_new=lag(age_new);
***start and end;
if n >1 then do;
age_start=lag_age_new;
age_end=age_new;
end;
keep id vdate lag_vdate age lag_age age_new lag_age_new age_start age_end n dif sum temp_age;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.