I'm parsing down a dataset for final output. Here are the datalines:
data have;
input PatientID Stage Var1;
datalines;
1001 1 0
1001 2 0
1001 2 1
1002 2 1
1005 3 3
1006 2 1
1010 3 0
1010 3 2
;
What I'm trying to do is if there are records of the same PatientID (1001) where the Stages don't match, but Var1 matches, delete both records. If the there are records of the same PatientID where Stage matches and Var1 doesn't match, I want to keep both records. Here is some code that I've tried:
data want;
set have;
if first.PatientID=last.PatientID and first.Stage NE last.Stage and first.Var1=last.Var1 then delete;
run;
And here is what the log read.
NOTE: Variable first.PatientID is uninitialized. NOTE: Variable last.PatientID is uninitialized. NOTE: Variable first.Stage is uninitialized. NOTE: Variable last.Stage is uninitialized. NOTE: Variable first.Var1 is uninitialized. NOTE: Variable last.Var1 is uninitialized. NOTE: There were 8 observations read from the data set WORK.HAVE. NOTE: The data set WORK.WANT has 8 observations and 3 variables. NOTE: DATA statement used (Total process time): real time 0.01 seconds cpu time 0.00 seconds
Is this even possible to do? I tried using a proc sql as well but ran into the same issue of trying to compare values within the same column.
You want to detect and delete all; cases (within a patientid) with unchanging var1, but changing stage:
data have;
input PatientID Stage Var1;
seq+1;
datalines;
1001 1 0
1001 2 0
1001 2 1
1002 2 1
1005 3 3
1006 2 1
1010 3 0
1010 3 2
;
data want;
set have;
by patientid var1 stage notsorted;
if ???? then delete;
run;
Using the by pateindid var1 stage notsorted; has patientid as the major grouping key, stage as the minor key, and var1 as the middle key. SAS honors this hierarchy, such that when a given key changes (setting its first. indicator, it also sets the indicators for all keys to its right). BTW, notsorted means that, for a given patientid/var1 the stage values can go up or down - SAS need not expect either ascending or descending order in identifying groups, for any of the "by" keys.
This means that if you encounter a record that starts a new stage value (first.stage=1), but does not at the same time start a new var1 value, then it should be deleted. Similarly if it ends the current value of stage (last.stage=1) but does not end the current value of var1, then it should also be deleted. Build the expression to test for those conditions, and put it in as a replacement for the ???? above.
You cannot use FIRST. and LAST. without first listing those variables in a BY statement.
To get unique combinations of three variables use this:
data want;
set have;
by PatientID Stage Value;
if first.value;
run;
You want to detect and delete all; cases (within a patientid) with unchanging var1, but changing stage:
data have;
input PatientID Stage Var1;
seq+1;
datalines;
1001 1 0
1001 2 0
1001 2 1
1002 2 1
1005 3 3
1006 2 1
1010 3 0
1010 3 2
;
data want;
set have;
by patientid var1 stage notsorted;
if ???? then delete;
run;
Using the by pateindid var1 stage notsorted; has patientid as the major grouping key, stage as the minor key, and var1 as the middle key. SAS honors this hierarchy, such that when a given key changes (setting its first. indicator, it also sets the indicators for all keys to its right). BTW, notsorted means that, for a given patientid/var1 the stage values can go up or down - SAS need not expect either ascending or descending order in identifying groups, for any of the "by" keys.
This means that if you encounter a record that starts a new stage value (first.stage=1), but does not at the same time start a new var1 value, then it should be deleted. Similarly if it ends the current value of stage (last.stage=1) but does not end the current value of var1, then it should also be deleted. Build the expression to test for those conditions, and put it in as a replacement for the ???? above.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.