I have a data set with the variables: participant ID (UUID), day, and event (infx_status). I would like to create a variable (DFI) that signifies when the first infection occurred for each participant. I'm having trouble especially when there is no infection present.
UUID | day | infx_status | DFI (want) |
UU0333 | 0 | 0 | 2 |
UU0333 | 1 | 0 | 2 |
UU0333 | 2 | 1 | 2 |
UU0333 | 3 | 0 | 2 |
UU0333 | 4 | 1 | 2 |
UU0333 | 5 | 0 | 2 |
UU0334 | 1 | 0 | 5 |
UU0334 | 2 | 0 | 5 |
UU0334 | 3 | 0 | 5 |
UU0334 | 4 | 0 | 5 |
UU0334 | 5 | 1 | 5 |
UU0338 | 1 | 0 | . |
UU0338 | 2 | 0 | . |
UU0338 | 3 | 0 | . |
UU0338 | 4 | 0 | . |
UU0338 | 5 | 0 | . |
Don't know why I am having so much trouble with this but would appreciate some help. Thanks!
I assumed you knew SQL basics, apologies if you don't.
The full code needed:
proc sql;
create table WANT as
select UUID, DAY, INFX_STATUS, min(DAY*ifn(INFX_STATUS, 1, .)) as DIF
from HAVE
group by UUID
order by UUID, DAY;
Like this?
select UUID, DAY, INFX_STATUS, min(DAY*ifn(INFX_STATUS, 1, .))
from HAVE
group by UUID
order by UUID, DAY;
UUID | day | INFX_STATUS | |
---|---|---|---|
UU0333 | 0 | 0 | 2 |
UU0333 | 1 | 0 | 2 |
UU0333 | 2 | 1 | 2 |
UU0333 | 3 | 0 | 2 |
UU0333 | 4 | 1 | 2 |
UU0333 | 5 | 0 | 2 |
UU0334 | 1 | 0 | 5 |
UU0334 | 2 | 0 | 5 |
UU0334 | 3 | 0 | 5 |
UU0334 | 4 | 0 | 5 |
UU0334 | 5 | 1 | 5 |
UU0338 | 1 | 0 | . |
UU0338 | 2 | 0 | . |
UU0338 | 3 | 0 | . |
UU0338 | 4 | 0 | . |
UU0338 | 5 | 0 |
. |
1. Run this in proc sql.
2. The color parser in EG is unreliable. Like the rest of EG. 😉
3. A new column is created, give it a name if you want: min(..) as VARNAMEOFYOURCHOICE
I assumed you knew SQL basics, apologies if you don't.
The full code needed:
proc sql;
create table WANT as
select UUID, DAY, INFX_STATUS, min(DAY*ifn(INFX_STATUS, 1, .)) as DIF
from HAVE
group by UUID
order by UUID, DAY;
The data step approach, if you dataset is sorted by UUID/DAY would be:
data want;
set have (where=(infx_status=1) in=in1)
have (in=in2);
by id;
retain dfi;
if first.id then dfi=.;
if in1=1 and dfi=. then dfi=day;
if in2;
run;
Or like this:
data WANT;
set HAVE (where=(INFX_STATUS=1))
HAVE (in=IN2);
by UUID;
retain DFI;
if first.UUID then DFI=ifn(INFX_STATUS, DAY, .);
if IN2;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.