DATA Step, Macro, Functions and more

Sequential variable problem, please help!

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 8
Accepted Solution

Sequential variable problem, please help!

Hi, I'm working with longitudinal data on patients that may have two types of infections (X & Y). I want to create a new variable for patients that have any "Infection Y" at any time after an "Infection X" event. The interval variable is in months. I have tried a lot of different things, but can't seem to get any to work. Please, if you have any advice, please let me know as I am very frustrated :-) Thank you! 

 

Here is what my dataset looks like:

 

ID    INTERVAL   INFECT_X   INFECT_Y       

1     1.1               1                  0

1     1.4               0                  0 

1     1.5               0                  1

2     2.4               0                  1

2     2.6               1                  0 

3     1.6               0                  1

3     1.9               1                  0

3     2.0               0                  1

4     2.2               0                  0 

4     2.4               0                  0

 

Here is what I want it to look like:

ID    INTERVAL   INFECT_X   INFECT_Y   NEWVAR

1     1.1               1                  0                 1  

1     1.4               0                  0                 1

1     1.5               0                  1                 1

2     2.4               0                  1                 0

2     2.6               1                  0                 0 

3     1.6               0                  1                 1

3     1.9               1                  0                 1

3     2.0               0                  1                 1

4     2.2               0                  0                 0

4     2.4               0                  0                 0

 

Also, note that an Infect_Y can occur before the Infect_X (ID #3), as long as another infect_Y occurs again after infect_X, it counts. Also, the infections are not always consecutive, there can be a long time between them. I appreciate any advice, thanks!


Accepted Solutions
Solution
‎08-14-2017 09:58 PM
Super User
Posts: 8,220

Re: Sequential variable problem, please help!

Posted in reply to zombiegirl

@KurtBremser's code will run better if you change the line:

;

to read:

end;

Alternatively, you can produce the same result using a DOW loop:

data want (drop=inx);
  do until (last.ID);
    set have;
    by id;
    if infect_x then inx=1;
    else if inx*infect_y then newvar=1;
  end;
  do until (last.ID);
    set have;
    by id;
    if newvar ne 1 then newvar=0;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
Super User
Posts: 10,623

Re: Sequential variable problem, please help!

Posted in reply to zombiegirl

One method with an intermediate table:

data int (keep=id newvar);
set have;
by id;
retain newvar inf_x;
if first.id
then do;
  newvar = 0;
  inf_x = 0;
;
if infect_x then inf_x = 1;
if inf_x and infect_y then newvar = 1;
if last.id then output;
run;

data want;
merge
  have
  int
;
by id;
run;

Untested. For tested code, provide your example data in a data step with datalines for easy recreation.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 8

Re: Sequential variable problem, please help!

Posted in reply to KurtBremser
Thanks Kurt! I ended up having to use Art's iteration, somehow yours gave me zero observations for newvar (I'm 100% sure my own user error, as I'm a newbie), but thank you for your kind suggestions! :-)
Solution
‎08-14-2017 09:58 PM
Super User
Posts: 8,220

Re: Sequential variable problem, please help!

Posted in reply to zombiegirl

@KurtBremser's code will run better if you change the line:

;

to read:

end;

Alternatively, you can produce the same result using a DOW loop:

data want (drop=inx);
  do until (last.ID);
    set have;
    by id;
    if infect_x then inx=1;
    else if inx*infect_y then newvar=1;
  end;
  do until (last.ID);
    set have;
    by id;
    if newvar ne 1 then newvar=0;
    output;
  end;
run;

Art, CEO, AnalystFinder.com

 

Super User
Posts: 10,623

Re: Sequential variable problem, please help!

[ Edited ]

As I said, untested Smiley Wink

just a quick hack from the hip.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Occasional Contributor
Posts: 8

Re: Sequential variable problem, please help!

Thank you SOO much, Art! I tried all the suggested answers, and yours was the one that worked with how my dataset was written. (The others probably would've worked too, I probably messed them up somehow Smiley Tongue). 

Super User
Posts: 13,950

Re: Sequential variable problem, please help!

Posted in reply to zombiegirl

Just for curiosity sake what does 0.1 actually mean in the interval? Since "month" does not have a standard length in common usage I wonder what decimal values actually represent.

Occasional Contributor
Posts: 8

Re: Sequential variable problem, please help!

Hi Ballard, I think its a result of timestamping done by hospital electronic record systems. Fortunately, most of my patients are followed for a few years, so I dont think the little bits of time difference have a huge impact on the results. It was a little misleading the way I wrote the sample dataset, the infections are typically much further apart, I was typing fast. Sorry I don't have a more definitive answer for you! Smiley LOL

Contributor
Posts: 54

Re: Sequential variable problem, please help!

Posted in reply to zombiegirl

A possible solution:

 

data disease;

input ID INTERVAL INFECT_X INFECT_Y;

datalines;

1 1.1 1 0

1 1.4 0 0

1 1.5 0 1

2 2.4 0 1

2 2.6 1 0

3 1.6 0 1

3 1.9 1 0

3 2.0 0 1

4 2.2 0 0

4 2.4 0 0

;

 

data disease1 (drop=INT);

      set disease;

      by ID;

      retain INT 8;

      length NEWVAR 8;

      if first.ID then do;

            INT = INTERVAL;

            NEWVAR = 0;

      end;

      if INFECT_X = 1 then INT=INTERVAL;

      if INFECT_Y = 1 and INTERVAL > INT then NEWVAR = 1; else NEWVAR = 0;

run;

 

proc sql;

      create table NEWVARS as

      select distinct(ID), NEWVAR from disease1 where NEWVAR = 1;

      create table disease2 as

      select a.ID, a.INTERVAL, a.INFECT_X, a.INFECT_Y, b.NEWVAR

      from disease1 a left join NEWVARS b on a.ID = b.ID

      order by ID, INTERVAL;

      update disease2

      set NEWVAR = 0 where NEWVAR = .;

quit;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 181 views
  • 2 likes
  • 5 in conversation