Hi, I'm working with longitudinal data on patients that may have two types of infections (X & Y). I want to create a new variable for patients that have any "Infection Y" at any time after an "Infection X" event. The interval variable is in months. I have tried a lot of different things, but can't seem to get any to work. Please, if you have any advice, please let me know as I am very frustrated 🙂 Thank you!
Here is what my dataset looks like:
ID INTERVAL INFECT_X INFECT_Y
1 1.1 1 0
1 1.4 0 0
1 1.5 0 1
2 2.4 0 1
2 2.6 1 0
3 1.6 0 1
3 1.9 1 0
3 2.0 0 1
4 2.2 0 0
4 2.4 0 0
Here is what I want it to look like:
ID INTERVAL INFECT_X INFECT_Y NEWVAR
1 1.1 1 0 1
1 1.4 0 0 1
1 1.5 0 1 1
2 2.4 0 1 0
2 2.6 1 0 0
3 1.6 0 1 1
3 1.9 1 0 1
3 2.0 0 1 1
4 2.2 0 0 0
4 2.4 0 0 0
Also, note that an Infect_Y can occur before the Infect_X (ID #3), as long as another infect_Y occurs again after infect_X, it counts. Also, the infections are not always consecutive, there can be a long time between them. I appreciate any advice, thanks!
@Kurt_Bremser's code will run better if you change the line:
;
to read:
end;
Alternatively, you can produce the same result using a DOW loop:
data want (drop=inx); do until (last.ID); set have; by id; if infect_x then inx=1; else if inx*infect_y then newvar=1; end; do until (last.ID); set have; by id; if newvar ne 1 then newvar=0; output; end; run;
Art, CEO, AnalystFinder.com
One method with an intermediate table:
data int (keep=id newvar);
set have;
by id;
retain newvar inf_x;
if first.id
then do;
newvar = 0;
inf_x = 0;
;
if infect_x then inf_x = 1;
if inf_x and infect_y then newvar = 1;
if last.id then output;
run;
data want;
merge
have
int
;
by id;
run;
Untested. For tested code, provide your example data in a data step with datalines for easy recreation.
@Kurt_Bremser's code will run better if you change the line:
;
to read:
end;
Alternatively, you can produce the same result using a DOW loop:
data want (drop=inx); do until (last.ID); set have; by id; if infect_x then inx=1; else if inx*infect_y then newvar=1; end; do until (last.ID); set have; by id; if newvar ne 1 then newvar=0; output; end; run;
Art, CEO, AnalystFinder.com
As I said, untested 😉
just a quick hack from the hip.
Thank you SOO much, Art! I tried all the suggested answers, and yours was the one that worked with how my dataset was written. (The others probably would've worked too, I probably messed them up somehow ).
Just for curiosity sake what does 0.1 actually mean in the interval? Since "month" does not have a standard length in common usage I wonder what decimal values actually represent.
Hi Ballard, I think its a result of timestamping done by hospital electronic record systems. Fortunately, most of my patients are followed for a few years, so I dont think the little bits of time difference have a huge impact on the results. It was a little misleading the way I wrote the sample dataset, the infections are typically much further apart, I was typing fast. Sorry I don't have a more definitive answer for you!
A possible solution:
data disease;
input ID INTERVAL INFECT_X INFECT_Y;
datalines;
1 1.1 1 0
1 1.4 0 0
1 1.5 0 1
2 2.4 0 1
2 2.6 1 0
3 1.6 0 1
3 1.9 1 0
3 2.0 0 1
4 2.2 0 0
4 2.4 0 0
;
data disease1 (drop=INT);
set disease;
by ID;
retain INT 8;
length NEWVAR 8;
if first.ID then do;
INT = INTERVAL;
NEWVAR = 0;
end;
if INFECT_X = 1 then INT=INTERVAL;
if INFECT_Y = 1 and INTERVAL > INT then NEWVAR = 1; else NEWVAR = 0;
run;
proc sql;
create table NEWVARS as
select distinct(ID), NEWVAR from disease1 where NEWVAR = 1;
create table disease2 as
select a.ID, a.INTERVAL, a.INFECT_X, a.INFECT_Y, b.NEWVAR
from disease1 a left join NEWVARS b on a.ID = b.ID
order by ID, INTERVAL;
update disease2
set NEWVAR = 0 where NEWVAR = .;
quit;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.