Solved: Sequential variable problem, please help!

zombiegirl · Posted 08-13-2017 05:28 PM

Hi, I'm working with longitudinal data on patients that may have two types of infections (X & Y). I want to create a new variable for patients that have any "Infection Y" at any time after an "Infection X" event. The interval variable is in months. I have tried a lot of different things, but can't seem to get any to work. Please, if you have any advice, please let me know as I am very frustrated 🙂 Thank you!

Here is what my dataset looks like:

ID INTERVAL INFECT_X INFECT_Y

1 1.1 1 0

1 1.4 0 0

1 1.5 0 1

2 2.4 0 1

2 2.6 1 0

3 1.6 0 1

3 1.9 1 0

3 2.0 0 1

4 2.2 0 0

4 2.4 0 0

Here is what I want it to look like:

ID INTERVAL INFECT_X INFECT_Y NEWVAR

1 1.1 1 0 1

1 1.4 0 0 1

1 1.5 0 1 1

2 2.4 0 1 0

2 2.6 1 0 0

3 1.6 0 1 1

3 1.9 1 0 1

3 2.0 0 1 1

4 2.2 0 0 0

4 2.4 0 0 0

Also, note that an Infect_Y can occur before the Infect_X (ID #3), as long as another infect_Y occurs again after infect_X, it counts. Also, the infections are not always consecutive, there can be a long time between them. I appreciate any advice, thanks!

art297 · Posted 08-14-2017 10:08 AM

@Kurt_Bremser's code will run better if you change the line:

to read:

end;

Alternatively, you can produce the same result using a DOW loop:

data want (drop=inx);
  do until (last.ID);
    set have;
    by id;
    if infect_x then inx=1;
    else if inx*infect_y then newvar=1;
  end;
  do until (last.ID);
    set have;
    by id;
    if newvar ne 1 then newvar=0;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

View solution in original post

Kurt_Bremser · Posted 08-14-2017 09:41 AM

One method with an intermediate table:

data int (keep=id newvar);
set have;
by id;
retain newvar inf_x;
if first.id
then do;
  newvar = 0;
  inf_x = 0;
;
if infect_x then inf_x = 1;
if inf_x and infect_y then newvar = 1;
if last.id then output;
run;

data want;
merge
  have
  int
;
by id;
run;

Untested. For tested code, provide your example data in a data step with datalines for easy recreation.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

zombiegirl · Posted 08-14-2017 10:00 PM

Thanks Kurt! I ended up having to use Art's iteration, somehow yours gave me zero observations for newvar (I'm 100% sure my own user error, as I'm a newbie), but thank you for your kind suggestions! 🙂

art297 · Posted 08-14-2017 10:08 AM

@Kurt_Bremser's code will run better if you change the line:

to read:

end;

Alternatively, you can produce the same result using a DOW loop:

data want (drop=inx);
  do until (last.ID);
    set have;
    by id;
    if infect_x then inx=1;
    else if inx*infect_y then newvar=1;
  end;
  do until (last.ID);
    set have;
    by id;
    if newvar ne 1 then newvar=0;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

Kurt_Bremser · Posted 08-14-2017 10:10 AM

As I said, untested 😉

just a quick hack from the hip.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

zombiegirl · Posted 08-14-2017 09:57 PM

Thank you SOO much, Art! I tried all the suggested answers, and yours was the one that worked with how my dataset was written. (The others probably would've worked too, I probably messed them up somehow ).

ballardw · Posted 08-14-2017 12:28 PM

Just for curiosity sake what does 0.1 actually mean in the interval? Since "month" does not have a standard length in common usage I wonder what decimal values actually represent.

zombiegirl · Posted 08-14-2017 10:11 PM

Hi Ballard, I think its a result of timestamping done by hospital electronic record systems. Fortunately, most of my patients are followed for a few years, so I dont think the little bits of time difference have a huge impact on the results. It was a little misleading the way I wrote the sample dataset, the infections are typically much further apart, I was typing fast. Sorry I don't have a more definitive answer for you!

JohnSAScom · Posted 08-14-2017 02:40 PM

A possible solution:

data disease;

input ID INTERVAL INFECT_X INFECT_Y;

datalines;

1 1.1 1 0

1 1.4 0 0

1 1.5 0 1

2 2.4 0 1

2 2.6 1 0

3 1.6 0 1

3 1.9 1 0

3 2.0 0 1

4 2.2 0 0

4 2.4 0 0

;

data disease1 (drop=INT);

set disease;

by ID;

retain INT 8;

length NEWVAR 8;

if first.ID then do;

INT = INTERVAL;

NEWVAR = 0;

end;

if INFECT_X = 1 then INT=INTERVAL;

if INFECT_Y = 1 and INTERVAL > INT then NEWVAR = 1; else NEWVAR = 0;

run;

proc sql;

create table NEWVARS as

select distinct(ID), NEWVAR from disease1 where NEWVAR = 1;

create table disease2 as

select a.ID, a.INTERVAL, a.INFECT_X, a.INFECT_Y, b.NEWVAR

from disease1 a left join NEWVARS b on a.ID = b.ID

order by ID, INTERVAL;

update disease2

set NEWVAR = 0 where NEWVAR = .;

quit;

Sequential variable problem, please help!

Re: Sequential variable problem, please help!

Re: Sequential variable problem, please help!

Re: Sequential variable problem, please help!

Re: Sequential variable problem, please help!

Re: Sequential variable problem, please help!

Re: Sequential variable problem, please help!

Re: Sequential variable problem, please help!

Re: Sequential variable problem, please help!

Re: Sequential variable problem, please help!

Registration is open

Call for Content EXTENDED

Registration is open

Call for Content EXTENDED

SAS Training: Just a Click Away