BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
zombiegirl
Calcite | Level 5

Hi, I'm working with longitudinal data on patients that may have two types of infections (X & Y). I want to create a new variable for patients that have any "Infection Y" at any time after an "Infection X" event. The interval variable is in months. I have tried a lot of different things, but can't seem to get any to work. Please, if you have any advice, please let me know as I am very frustrated 🙂 Thank you! 

 

Here is what my dataset looks like:

 

ID    INTERVAL   INFECT_X   INFECT_Y       

1     1.1               1                  0

1     1.4               0                  0 

1     1.5               0                  1

2     2.4               0                  1

2     2.6               1                  0 

3     1.6               0                  1

3     1.9               1                  0

3     2.0               0                  1

4     2.2               0                  0 

4     2.4               0                  0

 

Here is what I want it to look like:

ID    INTERVAL   INFECT_X   INFECT_Y   NEWVAR

1     1.1               1                  0                 1  

1     1.4               0                  0                 1

1     1.5               0                  1                 1

2     2.4               0                  1                 0

2     2.6               1                  0                 0 

3     1.6               0                  1                 1

3     1.9               1                  0                 1

3     2.0               0                  1                 1

4     2.2               0                  0                 0

4     2.4               0                  0                 0

 

Also, note that an Infect_Y can occur before the Infect_X (ID #3), as long as another infect_Y occurs again after infect_X, it counts. Also, the infections are not always consecutive, there can be a long time between them. I appreciate any advice, thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

@Kurt_Bremser's code will run better if you change the line:

;

to read:

end;

Alternatively, you can produce the same result using a DOW loop:

data want (drop=inx);
  do until (last.ID);
    set have;
    by id;
    if infect_x then inx=1;
    else if inx*infect_y then newvar=1;
  end;
  do until (last.ID);
    set have;
    by id;
    if newvar ne 1 then newvar=0;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post

8 REPLIES 8
Kurt_Bremser
Super User

One method with an intermediate table:

data int (keep=id newvar);
set have;
by id;
retain newvar inf_x;
if first.id
then do;
  newvar = 0;
  inf_x = 0;
;
if infect_x then inf_x = 1;
if inf_x and infect_y then newvar = 1;
if last.id then output;
run;

data want;
merge
  have
  int
;
by id;
run;

Untested. For tested code, provide your example data in a data step with datalines for easy recreation.

zombiegirl
Calcite | Level 5
Thanks Kurt! I ended up having to use Art's iteration, somehow yours gave me zero observations for newvar (I'm 100% sure my own user error, as I'm a newbie), but thank you for your kind suggestions! 🙂
art297
Opal | Level 21

@Kurt_Bremser's code will run better if you change the line:

;

to read:

end;

Alternatively, you can produce the same result using a DOW loop:

data want (drop=inx);
  do until (last.ID);
    set have;
    by id;
    if infect_x then inx=1;
    else if inx*infect_y then newvar=1;
  end;
  do until (last.ID);
    set have;
    by id;
    if newvar ne 1 then newvar=0;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

 

zombiegirl
Calcite | Level 5

Thank you SOO much, Art! I tried all the suggested answers, and yours was the one that worked with how my dataset was written. (The others probably would've worked too, I probably messed them up somehow Smiley Tongue). 

ballardw
Super User

Just for curiosity sake what does 0.1 actually mean in the interval? Since "month" does not have a standard length in common usage I wonder what decimal values actually represent.

zombiegirl
Calcite | Level 5

Hi Ballard, I think its a result of timestamping done by hospital electronic record systems. Fortunately, most of my patients are followed for a few years, so I dont think the little bits of time difference have a huge impact on the results. It was a little misleading the way I wrote the sample dataset, the infections are typically much further apart, I was typing fast. Sorry I don't have a more definitive answer for you! Smiley LOL

JohnSAScom
Quartz | Level 8

A possible solution:

 

data disease;

input ID INTERVAL INFECT_X INFECT_Y;

datalines;

1 1.1 1 0

1 1.4 0 0

1 1.5 0 1

2 2.4 0 1

2 2.6 1 0

3 1.6 0 1

3 1.9 1 0

3 2.0 0 1

4 2.2 0 0

4 2.4 0 0

;

 

data disease1 (drop=INT);

      set disease;

      by ID;

      retain INT 8;

      length NEWVAR 8;

      if first.ID then do;

            INT = INTERVAL;

            NEWVAR = 0;

      end;

      if INFECT_X = 1 then INT=INTERVAL;

      if INFECT_Y = 1 and INTERVAL > INT then NEWVAR = 1; else NEWVAR = 0;

run;

 

proc sql;

      create table NEWVARS as

      select distinct(ID), NEWVAR from disease1 where NEWVAR = 1;

      create table disease2 as

      select a.ID, a.INTERVAL, a.INFECT_X, a.INFECT_Y, b.NEWVAR

      from disease1 a left join NEWVARS b on a.ID = b.ID

      order by ID, INTERVAL;

      update disease2

      set NEWVAR = 0 where NEWVAR = .;

quit;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2395 views
  • 2 likes
  • 5 in conversation