Hi, I'm working with longitudinal data on patients that may have two types of infections (X & Y). I want to create a new variable for patients that have any "Infection Y" at any time after an "Infection X" event. The interval variable is in months. I have tried a lot of different things, but can't seem to get any to work. Please, if you have any advice, please let me know as I am very frustrated 🙂 Thank you!
Here is what my dataset looks like:
ID INTERVAL INFECT_X INFECT_Y
1 1.1 1 0
1 1.4 0 0
1 1.5 0 1
2 2.4 0 1
2 2.6 1 0
3 1.6 0 1
3 1.9 1 0
3 2.0 0 1
4 2.2 0 0
4 2.4 0 0
Here is what I want it to look like:
ID INTERVAL INFECT_X INFECT_Y NEWVAR
1 1.1 1 0 1
1 1.4 0 0 1
1 1.5 0 1 1
2 2.4 0 1 0
2 2.6 1 0 0
3 1.6 0 1 1
3 1.9 1 0 1
3 2.0 0 1 1
4 2.2 0 0 0
4 2.4 0 0 0
Also, note that an Infect_Y can occur before the Infect_X (ID #3), as long as another infect_Y occurs again after infect_X, it counts. Also, the infections are not always consecutive, there can be a long time between them. I appreciate any advice, thanks!
@Kurt_Bremser's code will run better if you change the line:
;
to read:
end;
Alternatively, you can produce the same result using a DOW loop:
data want (drop=inx);
do until (last.ID);
set have;
by id;
if infect_x then inx=1;
else if inx*infect_y then newvar=1;
end;
do until (last.ID);
set have;
by id;
if newvar ne 1 then newvar=0;
output;
end;
run;
Art, CEO, AnalystFinder.com
One method with an intermediate table:
data int (keep=id newvar);
set have;
by id;
retain newvar inf_x;
if first.id
then do;
newvar = 0;
inf_x = 0;
;
if infect_x then inf_x = 1;
if inf_x and infect_y then newvar = 1;
if last.id then output;
run;
data want;
merge
have
int
;
by id;
run;
Untested. For tested code, provide your example data in a data step with datalines for easy recreation.
@Kurt_Bremser's code will run better if you change the line:
;
to read:
end;
Alternatively, you can produce the same result using a DOW loop:
data want (drop=inx);
do until (last.ID);
set have;
by id;
if infect_x then inx=1;
else if inx*infect_y then newvar=1;
end;
do until (last.ID);
set have;
by id;
if newvar ne 1 then newvar=0;
output;
end;
run;
Art, CEO, AnalystFinder.com
As I said, untested 😉
just a quick hack from the hip.
Thank you SOO much, Art! I tried all the suggested answers, and yours was the one that worked with how my dataset was written. (The others probably would've worked too, I probably messed them up somehow
).
Just for curiosity sake what does 0.1 actually mean in the interval? Since "month" does not have a standard length in common usage I wonder what decimal values actually represent.
Hi Ballard, I think its a result of timestamping done by hospital electronic record systems. Fortunately, most of my patients are followed for a few years, so I dont think the little bits of time difference have a huge impact on the results. It was a little misleading the way I wrote the sample dataset, the infections are typically much further apart, I was typing fast. Sorry I don't have a more definitive answer for you! ![]()
A possible solution:
data disease;
input ID INTERVAL INFECT_X INFECT_Y;
datalines;
1 1.1 1 0
1 1.4 0 0
1 1.5 0 1
2 2.4 0 1
2 2.6 1 0
3 1.6 0 1
3 1.9 1 0
3 2.0 0 1
4 2.2 0 0
4 2.4 0 0
;
data disease1 (drop=INT);
set disease;
by ID;
retain INT 8;
length NEWVAR 8;
if first.ID then do;
INT = INTERVAL;
NEWVAR = 0;
end;
if INFECT_X = 1 then INT=INTERVAL;
if INFECT_Y = 1 and INTERVAL > INT then NEWVAR = 1; else NEWVAR = 0;
run;
proc sql;
create table NEWVARS as
select distinct(ID), NEWVAR from disease1 where NEWVAR = 1;
create table disease2 as
select a.ID, a.INTERVAL, a.INFECT_X, a.INFECT_Y, b.NEWVAR
from disease1 a left join NEWVARS b on a.ID = b.ID
order by ID, INTERVAL;
update disease2
set NEWVAR = 0 where NEWVAR = .;
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Still thinking about your presentation idea? The submission deadline has been extended to Friday, Nov. 14, at 11:59 p.m. ET.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.