I have a dataset with over 700 records, similar in format to the sample dataset below.
Each participant (ID) can have up to a maximum of 5 visits (visit=00 through 04); some might have only one visit, others might have more than one.
Status is a binary variable that denotes their disease status (0=negative, 1=positive for disease) and this status can change from visit to visit. For example, a participant might be positive at baseline (Status=1 at visit=00) and then negative at the next visit(s), another participant might have the same status across all visits and so on.
I would like to know how many times participants had change in Status from visit to visit but not sure how to do this programmatically. I am thinking to present counts for all the different combinations of disease status across the different visits that the participants had. For example, with a total of 5 visits, we’d have a total of 2^5=32 possible combinations, like so:
0-0-0-0-0, N=N1 (where N1 is number of patients who had Status=0 at all 5 visits)
1-0-0-0-0, N=N2 (N2 is number of patients who had Status=1 at Visit=00 but 0 at all subsequent visits)
0-1-0-0-0, N=N3 (N3 is number of patients who had Status=1 at Visit=01 but 0 at all other visits)
ETC.
My only idea for how to approach this is to transpose the data and then manually list all the 32 different combinations with if-then statements and flag them so I can then add them up and see how many of each I have but this seems really cumbersome, inefficient and probably not the best solution since it doesn't address cases where participant had just two or three visits with a change in status somewhere along the way (For example, case of ID=002 or 008, 009 etc.). Does anybody have any suggestions or sample code for how they would approach this problem? Code of what I’ve done so far is shown below.
Thanks in advance!
data dat1;
input ID$ Visit$ Status$;
datalines;
001 00 0
001 01 0
001 02 0
001 03 0
001 04 0
002 00 1
002 01 0
002 02 0
002 03 1
003 01 1
003 02 1
004 00 1
004 02 1
004 03 0
005 00 1
006 02 0
007 02 0
007 04 0
008 00 1
008 02 1
008 03 0
009 00 0
009 01 1
009 02 0
009 03 1
010 01 1
011 00 1
011 01 0
011 02 0
011 03 1
011 04 0
;
run;
proc print data=dat1; run;
*Convert data from long to wide with transpose;
proc transpose data=dat1 out=dat_wide prefix=Status;
by ID;
id visit;
var status;
run;
proc print data=dat_wide; run;
data dat_wide2;
set dat_wide;
if Status00=0 and Status01=0 and Status02=0 and Status03=0 and Status04=0 then flag1=1;
if Status00=0 and Status01=0 and Status02=0 and Status03=0 and Status04=1 then flag2=1;
if Status00=0 and Status01=0 and Status02=0 and Status03=1 and Status04=0 then flag3=1;
/*etc for the remaining 29 combinations*/
run;
proc print data=dat_wide2; run;
Your data is in perfect shape to solve the problem, no need to transpose it.
data want;
set dat1;
by Id;
length num_changes 8;
retain num_changes;
last_status = lag(status);
if first.id then do;
num_changes = 0;
end;
else do;
num_changes = num_changes + (last_status ^= status);
end;
if last.id then do;
output;
end;
drop last_status status;
run;
Your data is in perfect shape to solve the problem, no need to transpose it.
data want;
set dat1;
by Id;
length num_changes 8;
retain num_changes;
last_status = lag(status);
if first.id then do;
num_changes = 0;
end;
else do;
num_changes = num_changes + (last_status ^= status);
end;
if last.id then do;
output;
end;
drop last_status status;
run;
Do this:
data changes;
set dat1;
by id;
change = ifn(first.id,0,status ne lag(status));
run;
You can now perform summary statistics on change as needed (overall, by id, by visit).
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.