Hi, I'm working with longitudinal data on patients that may have two types of infections (X & Y). I want to create a new variable for patients that have any "Infection Y" at any time after an "Infection X" event. The interval variable is in months. I have tried a lot of different things, but can't seem to get any to work. Please, if you have any advice, please let me know as I am very frustrated 🙂 Thank you!
Here is what my dataset looks like:
ID INTERVAL INFECT_X INFECT_Y
1 1.1 1 0
1 1.4 0 0
1 1.5 0 1
2 2.4 0 1
2 2.6 1 0
3 1.6 0 1
3 1.9 1 0
3 2.0 0 1
4 2.2 0 0
4 2.4 0 0
Here is what I want it to look like:
ID INTERVAL INFECT_X INFECT_Y NEWVAR
1 1.1 1 0 1
1 1.4 0 0 1
1 1.5 0 1 1
2 2.4 0 1 0
2 2.6 1 0 0
3 1.6 0 1 1
3 1.9 1 0 1
3 2.0 0 1 1
4 2.2 0 0 0
4 2.4 0 0 0
Also, note that an Infect_Y can occur before the Infect_X (ID #3), as long as another infect_Y occurs again after infect_X, it counts. Also, the infections are not always consecutive, there can be a long time between them. I appreciate any advice, thanks!
@Kurt_Bremser's code will run better if you change the line:
;
to read:
end;
Alternatively, you can produce the same result using a DOW loop:
data want (drop=inx); do until (last.ID); set have; by id; if infect_x then inx=1; else if inx*infect_y then newvar=1; end; do until (last.ID); set have; by id; if newvar ne 1 then newvar=0; output; end; run;
Art, CEO, AnalystFinder.com
One method with an intermediate table:
data int (keep=id newvar);
set have;
by id;
retain newvar inf_x;
if first.id
then do;
newvar = 0;
inf_x = 0;
;
if infect_x then inf_x = 1;
if inf_x and infect_y then newvar = 1;
if last.id then output;
run;
data want;
merge
have
int
;
by id;
run;
Untested. For tested code, provide your example data in a data step with datalines for easy recreation.
@Kurt_Bremser's code will run better if you change the line:
;
to read:
end;
Alternatively, you can produce the same result using a DOW loop:
data want (drop=inx); do until (last.ID); set have; by id; if infect_x then inx=1; else if inx*infect_y then newvar=1; end; do until (last.ID); set have; by id; if newvar ne 1 then newvar=0; output; end; run;
Art, CEO, AnalystFinder.com
As I said, untested 😉
just a quick hack from the hip.
Thank you SOO much, Art! I tried all the suggested answers, and yours was the one that worked with how my dataset was written. (The others probably would've worked too, I probably messed them up somehow ).
Just for curiosity sake what does 0.1 actually mean in the interval? Since "month" does not have a standard length in common usage I wonder what decimal values actually represent.
Hi Ballard, I think its a result of timestamping done by hospital electronic record systems. Fortunately, most of my patients are followed for a few years, so I dont think the little bits of time difference have a huge impact on the results. It was a little misleading the way I wrote the sample dataset, the infections are typically much further apart, I was typing fast. Sorry I don't have a more definitive answer for you!
A possible solution:
data disease;
input ID INTERVAL INFECT_X INFECT_Y;
datalines;
1 1.1 1 0
1 1.4 0 0
1 1.5 0 1
2 2.4 0 1
2 2.6 1 0
3 1.6 0 1
3 1.9 1 0
3 2.0 0 1
4 2.2 0 0
4 2.4 0 0
;
data disease1 (drop=INT);
set disease;
by ID;
retain INT 8;
length NEWVAR 8;
if first.ID then do;
INT = INTERVAL;
NEWVAR = 0;
end;
if INFECT_X = 1 then INT=INTERVAL;
if INFECT_Y = 1 and INTERVAL > INT then NEWVAR = 1; else NEWVAR = 0;
run;
proc sql;
create table NEWVARS as
select distinct(ID), NEWVAR from disease1 where NEWVAR = 1;
create table disease2 as
select a.ID, a.INTERVAL, a.INFECT_X, a.INFECT_Y, b.NEWVAR
from disease1 a left join NEWVARS b on a.ID = b.ID
order by ID, INTERVAL;
update disease2
set NEWVAR = 0 where NEWVAR = .;
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.