How to calculate lagged exposure time

Contributor
Posts: 44

Dear SAS Community,

I recently submitted a question about how to calculate time intervals between observations for a survey data set on smoking and cancer. This week, I'm trying to calculate exposure estimates for these data. I have two aims: to include only exposure time that precedes the cancer diagnosis, and to lag exposure by five years (that is, to discount exposure time within the five years before diagnosis). I've thought hard about how to do this but haven't figured it out yet. Here are fake data that show what mine look like and what I want my new variable, LagTime, to look like:

PersonID  BirthDate  SurveyDate  DiagnosisDate  Age  Time  CigsPerDay  LagTime  Comment
1         1/1/1960   7/1/2012    1/1/2008       18   27    20          25       Started smoking
1         1/1/1960   7/2/2012    1/1/2008       45    3    15           0       Changed smoking habit
1         1/1/1960   7/1/2012    1/1/2008       48    2    10           0       Diagnosed at 48
1         1/1/1960   7/1/2012    1/1/2008       50    2     0           0       Quit smoking
1         1/1/1960   7/1/2012    1/1/2008       52    0     0           0       Current age at survey

Exposure histories differ between individuals, but every person's history begins with a line for when he or she started smoking and ends with a line for his or her current smoking habit at the time of the interview.

Can anyone please comment on how I can calculate LagTime for all observations in my dataset? Thank you very much.
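Reading the table, one way to state the rule (an interpretation of the example, not something given in the post): record i covers ages a_i to a_i + t_i, where a_i is Age and t_i is Time, and D is the age at diagnosis. Only the part of that interval lying before age D - 5 counts:

```latex
\mathrm{LagTime}_i = \max\bigl(0,\ \min(a_i + t_i,\ D - 5) - a_i\bigr)
```

For the first record (a = 18, t = 27, D = 48) this gives max(0, min(45, 43) - 18) = 25. For the second (a = 45, t = 3), min(48, 43) - 45 is negative, so LagTime = 0, and the remaining records all start after age 43.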


Trusted Advisor
Posts: 1,510

Re: How to calculate lagged exposure time

Would you mind adding how each LagTime is computed?

I think I understand that 25 = 48 (diagnosis) - 18 (age on the current record) - 5 (lag). Is that right? Are negative values set to zero?

What about the other four lines?

Contributor
Posts: 44

Re: How to calculate lagged exposure time

Hi Chris,

Here is how I computed lagged exposure time for this observation: 34 years of total reported time (age 18 to age 52 at interview) - 4 years from diagnosis to interview - 5 years of lag before diagnosis = 25 years. Thanks.
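As a check on the two readings, with 52 the age at interview and 48 the age at diagnosis in the table, the subtraction above reduces to Chris's formula:

```latex
(52 - 18) - (52 - 48) - 5 = 48 - 18 - 5 = 25
```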

Occasional Contributor
Posts: 18

Re: How to calculate lagged exposure time

Well, you have two main options.

1. Run a DATA step or PROC SQL to extract each patient's age at diagnosis. This can then be merged back onto the main table and the lag calculated:

if diagAge - age > 5 then
   lagTime = (diagAge - age) - 5;
else
   lagTime = 0;

2. Reverse-sort the main file, step through the data, and pick up DiagAge as you go. The same calculation applies; you would just have to initialise DiagAge as follows (FIRST.personID requires a BY personID statement in the DATA step):

retain diagAge;
if first.personID then
  diagAge = 0;

Contributor
Posts: 44

Re: How to calculate lagged exposure time

Hi Michael,

I tried your first suggestion and it's definitely helpful: the code subtracts the five-year lag from the years before diagnosis. However, there seems to be an issue. The expression (DxAge - age) - 5 distorts exposure time for the earlier records. Here is an example:

Age  DxAge  Time  LagTime
18   48      4    25
22   48      3    21
25   48     23    18
48   48      2     0
50   48      2     0
52   48      0     0

For ages 25 and later, the exposure time is calculated correctly. However, LagTime for ages 18 and 22 (25 and 21) differs from the desired values of 4 and 3 years, respectively. Is there a way to fix this? Thank you.
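Since these data already carry Time, each record covers ages Age to Age + Time, and clipping that interval at dxAge - 5 handles the early records without any sorting. A minimal sketch, assuming variable names age, dxAge and time and hypothetical data set names have and want:

```sas
/* Example records from the table above (data set name is hypothetical) */
data have;
  input age dxAge time;
  datalines;
18 48 4
22 48 3
25 48 23
48 48 2
50 48 2
52 48 0
;
run;

data want;
  set have;
  /* Each record covers the interval [age, age + time).             */
  /* Keep only the part that ends before the cutoff age dxAge - 5,  */
  /* and floor the result at zero for records that start after it.  */
  lagTime = max(0, min(age + time, dxAge - 5) - age);
run;
```

The early records come out as 4 and 3 and the age-25 record as 18, matching the desired values. If dxAge is missing, the MIN function ignores the missing argument, so the full Time is kept; whether that suits people without a diagnosis is a choice this sketch does not make.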

Occasional Contributor
Posts: 18

Re: How to calculate lagged exposure time

Ah yes, this will need the reverse-sorting option then, so you can pick up the diagnosis age and apply the lag to the first record (working backwards) that needs it, taking the time as-is for earlier periods.

So this will do it. It calculates both time and lagTime, which allows for easier testing. I also tried records at ages 45 and 40, and both seemed good.

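data bob;
  format pid age 3. diagYn 1.;
  input pid age diagYn;
  cards;
1  18  0
1  25  0
1  48  1
1  50  0
1  52  0
;
run;

/* Process each person's history from the latest record backwards */
proc sort data=bob;
  by pid descending age;
run;

data newBob;
  set bob;
  by pid;
  retain dxAge found lastAge;

  /* Reset the retained values at the start of each person */
  if first.pid then do;
    dxAge = 0;
    found = 0;
    lastAge = 0;
  end;

  /* Time at this record = age at the following (later) record - age */
  if lastAge gt 0 then
     time = lastAge - age;
  else
     time = 0;

  /* Once dxAge is known, the first record (working backwards) that    */
  /* starts more than 5 years before diagnosis gets only the years up  */
  /* to dxAge - 5; every earlier record keeps its full time            */
  if dxAge gt 0 then do;
    if not(found) then do;
      if dxAge - age gt 5 then do;
        found = 1;
        lagTime = (dxAge - age) - 5;
      end;
      else
         lagTime = 0;
    end;
    else
       lagTime = time;
  end;
  else
     lagTime = 0;

  /* Pick up the diagnosis age from the flagged record */
  if dxAge eq 0 and diagYn eq 1 then
     dxAge = age;

  lastAge = age;
run;

/* Restore chronological order */
proc sort data=newBob;
  by pid age;
run;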

Contributor
Posts: 44

Re: How to calculate lagged exposure time

Wow, thank you Michael! Can I ask a follow-up question? This is how I calculated diagYn to follow your program:

if age = DxAge then diagYn = 1;
else diagYn = 0;

However, not everyone in the dataset has a new line for smoking habit at the age when they were diagnosed. For example, some people quit years before they were diagnosed, so the lines might look like this:

Age  DxAge  Time  diagYn  Comment
18   47     22    0       Started
40   47      8    0       Quit
48   47      0    0       Interview date

Do you suppose that's an issue for calculating LagTime? I've reviewed the first few individuals and the values look correct. Thank you very much for your help!      

Solution
‎08-08-2016 04:51 PM
Occasional Contributor
Posts: 18

Re: How to calculate lagged exposure time

Ah, so you already have DxAge in the source data; that makes things easier and the code a little shorter.

I only used DiagYn as a flag to pick up DxAge. With DxAge already on the file, you just need this:

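data bob;
  format pid age dxAge 3.;
  input pid age dxAge;
  cards;
1  18  48
1  25  48
1  48  48
1  50  48
1  52  48
;
run;

/* Process each person's history from the latest record backwards */
proc sort data=bob;
  by pid descending age;
run;

data newBob;
  set bob;
  by pid;
  retain found lastAge;

  /* Reset the retained values at the start of each person */
  if first.pid then do;
    found = 0;
    lastAge = 0;
  end;

  /* Time at this record = age at the following (later) record - age */
  if lastAge gt 0 then
     time = lastAge - age;
  else
     time = 0;

  /* The first record (working backwards) that starts more than 5 years */
  /* before diagnosis gets only the years up to dxAge - 5; every earlier */
  /* record keeps its full time. A missing or zero dxAge gives 0.        */
  if dxAge gt 0 then do;
    if not(found) then do;
      if dxAge - age gt 5 then do;
        found = 1;
        lagTime = (dxAge - age) - 5;
      end;
      else
         lagTime = 0;
    end;
    else
       lagTime = time;
  end;
  else
     lagTime = 0;

  lastAge = age;
run;

/* Restore chronological order */
proc sort data=newBob;
  by pid age;
run;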

And given this example:

Age  DxAge  Time  Comment
18   47     22    Started
40   47      8    Quit
48   47      0    Interview date

you get:

AGE  DXAGE  TIME  LAGTIME
18   47     22    22
40   47      8     2
48   47      0     0
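If the reverse sort is inconvenient, a look-ahead merge (merging a data set with a copy of itself that starts at record 2) is a common SAS idiom for reading the next record's age. A minimal sketch, assuming the pid/age/dxAge layout above with one record per age per person; the data set names example and withLag and the variables nextPid and nextAge are hypothetical:

```sas
/* The three-record example above (data set name is hypothetical) */
data example;
  input pid age dxAge;
  datalines;
1 18 47
1 40 47
1 48 47
;
run;

proc sort data=example;
  by pid age;
run;

data withLag;
  /* One-to-one merge (no BY) with a copy that starts at record 2, so  */
  /* nextPid and nextAge hold the values from the following record.    */
  /* On the final record the shorter copy is exhausted and they are    */
  /* missing.                                                          */
  merge example
        example(firstobs=2 keep=pid age rename=(pid=nextPid age=nextAge));
  /* Last record of a person: zero-length interval ending at its own age */
  if pid ne nextPid then nextAge = age;
  time = nextAge - age;
  /* Clip each interval at the cutoff age dxAge - 5 */
  if dxAge gt 0 then lagTime = max(0, min(nextAge, dxAge - 5) - age);
  else lagTime = 0;
  drop nextPid;
run;
```

This reproduces the 22, 2 and 0 shown above. With repeated ages, every copy except the last would pick up the same age as its next age and get time 0.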

Contributor
Posts: 44

Re: How to calculate lagged exposure time

Fantastic, your program works great! Thank you very much for your help this week.

Contributor
Posts: 44

Re: How to calculate lagged exposure time

Hi Michael,

You helped me with this issue a while ago, and I appreciate it. I have a follow-up question. I have multiple lines with the same value of Age, like this:

data have;
  format id age dxAge 3.;
  input id age dxAge;
  cards;
1  18  48
1  18  48
1  18  48
1  25  48
1  48  48
1  50  48
1  52  48
;
run;

I would like to measure the cumulative time for each age, so the cumulative time from age 18 to 25 would be 7*3 = 21 years. However, the program you suggested calculates the time for only one record and sets the remaining time to 0 (cumulative time = 7). I know it's been a while, but could you please suggest a way to address this? Thank you again.
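A minimal sketch of one way to handle repeated ages, assuming (as in the 7*3 = 21 example) that every record at a given age should receive the full interval up to the next higher age: sort backwards, and use FIRST.age so that a new age takes its end point from the previous distinct age while repeats of the same age keep it. The names want, endAge and grpAge are hypothetical:

```sas
/* Example from the post above: three records at age 18 */
data have;
  input id age dxAge;
  datalines;
1 18 48
1 18 48
1 18 48
1 25 48
1 48 48
1 50 48
1 52 48
;
run;

/* Work backwards from the interview record */
proc sort data=have;
  by id descending age;
run;

data want;
  set have;
  by id descending age;
  /* endAge: end of the current interval; grpAge: last distinct age seen */
  retain endAge grpAge;
  if first.id then do;
    /* Latest record (interview): zero-length interval */
    endAge = age;
    grpAge = age;
  end;
  else if first.age then do;
    /* New, lower age: its interval ends at the previous distinct age */
    endAge = grpAge;
    grpAge = age;
  end;
  /* Repeats of the same age keep the retained endAge */
  time = endAge - age;
  /* Clip each interval at the cutoff age dxAge - 5 */
  if dxAge gt 0 then lagTime = max(0, min(endAge, dxAge - 5) - age);
  else lagTime = 0;
  drop grpAge;
run;

/* Restore chronological order */
proc sort data=want;
  by id age;
run;
```

Each of the three age-18 records gets time = 7 and lagTime = 7, so they sum to 21; the age-25 record gets 18 as before.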
