BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bgosiker
Obsidian | Level 7

Hi Everyone, 

 

I am working with a longitudinal data and have age at every visit. I'd like to use this variable to generate age as a "time metric" . By that I mean if someone's first visit is at age 31 and their second visit is at age 32, then their second visit is accounting for anything that happened since the last visit (ie age 31-32). 

 

Here is the data setup I HAVE: 

 

ID     visit     visit_date     age 

1        1           200            32

1        2           300            32

1        3           400            32

1        4           600            33

1        5           950            34

1        6         1100            34

1        7         1300            35

1        8         1400            35

1        9         1475            35

1       10        1700            36

 

 

I'd like to generate variables that indicate the observation window over which a certain visit covers (ie since the previous visit) so that I can use this as a time metric for an analysis. "start_period" would be the age at the beginning of that specific follow-up period. "end_period" would be the age at the beginning of that follow-up period (ie the age at the current visit). 

 

My main struggle is that sometimes people have multiple visits in a given year - ie they could have 3 visits in a row that have the SAME age (as I have in the example above). So this makes generalizing code a bit more difficult to incorporate the possibility of multiple visits at the same age (glancing through the most I see is 3 in a row, but could potentially be people with more than that). For the first observation of a given individual, it seems easiest to assume that is their birthday to set the baseline age. So, visit 1 would be as shown below. For subsequent visits, I want to use the “visit_date” variable to calculate the exact age at the visit (seen as “new_age” below).

 

The ”start_period” would then be the “end_period” of the previous visit and the “end_period” would be the age at the current visit. The start and end period for the first visit are essentially ignored since we have no observation of the person before the first visit. Below is an example of how I imagine the output looking:

Example using visit 2: new_age = lag(new_age) + ((300-200)/365.25)

 

Here is the data setup I WANT:

ID     visit     visit_date     age   new_age        start_period        end_period 

 1        1           200            32          32                     32                      32               

 1        2           300            32          32.27                32                      32.27

 1        3           400            32          32.55                32.27                 32.55   

 1        4           600            33         33.1                   32.55                 33.1  

 1        5           950            34         34.06                 33.1                   34.06

 1        6         1100            34         34.47                 34.06                 34.47  

 1        7         1300            35         35.02                 34.47                 35.02   

 1        8         1400            35         35.29                 35.02                 35.29   

 1        9         1475            35         35.50                 35.29                 35.50   

 1       10        1700            36          36.12                35.50                 36.12   

 

Cod e that I have thus far that only calculates for the first 2 visits but then stops for some reason

proc sort data=Aim2.visit_gt320; by id vdate;run;
data Aim2.agemetric; set Aim2.visit_gt320;
	by id vdate;
	if first.id then n=1;else n+1; 
	lag_vdate=lag(vdate);
	lag_age=lag(age);
	if first.id then do; 
		age_new=age;
		age_end=age; 
		age_start=age;
	end; 
	lag_age_new=lag(age_new); 
	if n^=1 then do; 
		age_new=lag_age_new+((vdate-lag_vdate)/365.25);
		age_end=age_new;
 		age_start=lag_age_new; 
	end; 

	keep id vdate lag_vdate age lag_age age_new lag_age_new age_start age_end n;
run; 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
James_Jinbei
Calcite | Level 5

I suppose I get the correct result.

See the picture below and here is my method.

Here test is the data you provided and just read them through a txt file into sas dataset.

My logic is to sum the difs between visits and then add the age from the first id. 

data test;

infile test;

input id visit vdate age @@;

run;

proc sort data=test; by id vdate;run;

data want; set test;

by id vdate;

retain temp_age;

if first.id then n=1;else n+1; 

 

lag_vdate=lag(vdate);

lag_age=lag(age);

***sum the diffs among visits;

dif=((vdate-lag_vdate)/365.25);

 

sum+dif;

***replace the first id;

if first.id then do; 

 

age_new=age;

age_end=age;

age_start=age;

temp_age=age;

end; 

***calculate new age;

 

age_new=temp_age+sum;

lag_age_new=lag(age_new);

***start and end;

if n >1 then do; 

 

age_start=lag_age_new;

age_end=age_new;

 

end; 

keep id vdate lag_vdate age lag_age age_new lag_age_new age_start age_end n dif sum temp_age;

run; 

 

Capture.PNG

View solution in original post

4 REPLIES 4
PGStats
Opal | Level 21

Why not use visit_date as your time metric?

PG
bgosiker
Obsidian | Level 7
Should have clarified that in my question -- I am wanting to control for age as tightly as possible in my eventual model as my outcome is highly correlated with age. Using age as the time metric accomplishes this for me in a more direct way for my analysis.
Patrick
Opal | Level 21

@bgosiker 

Something like below returns the result you're after.

data have;
  infile datalines dlm=' ' truncover;
  input ID visit visit_date age _new_age _start_period _end_period ;
  datalines;
1 1 200 32 32 32 32 
1 2 300 32 32.27 32 32.27
1 3 400 32 32.55 32.27 32.55 
1 4 600 33 33.1 32.55 33.1 
1 5 950 34 34.06 33.1 34.06
1 6 1100 34 34.47 34.06 34.47 
1 7 1300 35 35.02 34.47 35.02 
1 8 1400 35 35.29 35.02 35.29 
1 9 1475 35 35.50 35.29 35.50 
1 10 1700 36 36.12 35.50 36.12 
;
run;

data want;
  set have;
  by id visit_date;
  if first.id then 
    do;
      start_age=age;
      start_date=visit_date;
      retain start_age start_date;
      new_age=age;
      retain new_age;
    end;

  start_period=new_age;
  new_age=round(start_age +(visit_date-start_date)/365.25,0.01);
  end_period=new_age;
run;
James_Jinbei
Calcite | Level 5

I suppose I get the correct result.

See the picture below and here is my method.

Here test is the data you provided and just read them through a txt file into sas dataset.

My logic is to sum the difs between visits and then add the age from the first id. 

data test;

infile test;

input id visit vdate age @@;

run;

proc sort data=test; by id vdate;run;

data want; set test;

by id vdate;

retain temp_age;

if first.id then n=1;else n+1; 

 

lag_vdate=lag(vdate);

lag_age=lag(age);

***sum the diffs among visits;

dif=((vdate-lag_vdate)/365.25);

 

sum+dif;

***replace the first id;

if first.id then do; 

 

age_new=age;

age_end=age;

age_start=age;

temp_age=age;

end; 

***calculate new age;

 

age_new=temp_age+sum;

lag_age_new=lag(age_new);

***start and end;

if n >1 then do; 

 

age_start=lag_age_new;

age_end=age_new;

 

end; 

keep id vdate lag_vdate age lag_age age_new lag_age_new age_start age_end n dif sum temp_age;

run; 

 

Capture.PNG

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 1049 views
  • 0 likes
  • 4 in conversation