Solved: Conditionally compare values within one variable

analyticalepi · Posted 12-11-2020 12:32 PM

I'm parsing down a dataset for final output. Here are the datalines:

data have;
input PatientID Stage Var1;
datalines;
1001 1 0
1001 2 0
1001 2 1
1002 2 1
1005 3 3
1006 2 1
1010 3 0
1010 3 2
;

What I'm trying to do is if there are records of the same PatientID (1001) where the Stages don't match, but Var1 matches, delete both records. If the there are records of the same PatientID where Stage matches and Var1 doesn't match, I want to keep both records. Here is some code that I've tried:

data want;
set have;
if first.PatientID=last.PatientID and first.Stage NE last.Stage and first.Var1=last.Var1 then delete;
run;

And here is what the log read.

NOTE: Variable first.PatientID is uninitialized.
NOTE: Variable last.PatientID is uninitialized.
NOTE: Variable first.Stage is uninitialized.
NOTE: Variable last.Stage is uninitialized.
NOTE: Variable first.Var1 is uninitialized.
NOTE: Variable last.Var1 is uninitialized.
NOTE: There were 8 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 8 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

Is this even possible to do? I tried using a proc sql as well but ran into the same issue of trying to compare values within the same column.

mkeintz · Posted 12-11-2020 03:54 PM

You want to detect and delete all; cases (within a patientid) with unchanging var1, but changing stage:

data have;
input PatientID Stage Var1;
seq+1;
datalines;
1001 1 0
1001 2 0
1001 2 1
1002 2 1
1005 3 3
1006 2 1
1010 3 0
1010 3 2
;

data want;
  set have;
  by patientid var1 stage notsorted;
  if ???? then delete;
run;

Using the by pateindid var1 stage notsorted; has patientid as the major grouping key, stage as the minor key, and var1 as the middle key. SAS honors this hierarchy, such that when a given key changes (setting its first. indicator, it also sets the indicators for all keys to its right). BTW, notsorted means that, for a given patientid/var1 the stage values can go up or down - SAS need not expect either ascending or descending order in identifying groups, for any of the "by" keys.

This means that if you encounter a record that starts a new stage value (first.stage=1), but does not at the same time start a new var1 value, then it should be deleted. Similarly if it ends the current value of stage (last.stage=1) but does not end the current value of var1, then it should also be deleted. Build the expression to test for those conditions, and put it in as a replacement for the ???? above.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

Tom · Posted 12-11-2020 12:57 PM

You cannot use FIRST. and LAST. without first listing those variables in a BY statement.

To get unique combinations of three variables use this:

data want;
  set have;
  by PatientID Stage Value;
  if first.value;
run;

mkeintz · Posted 12-11-2020 03:54 PM

You want to detect and delete all; cases (within a patientid) with unchanging var1, but changing stage:

data have;
input PatientID Stage Var1;
seq+1;
datalines;
1001 1 0
1001 2 0
1001 2 1
1002 2 1
1005 3 3
1006 2 1
1010 3 0
1010 3 2
;

data want;
  set have;
  by patientid var1 stage notsorted;
  if ???? then delete;
run;

Using the by pateindid var1 stage notsorted; has patientid as the major grouping key, stage as the minor key, and var1 as the middle key. SAS honors this hierarchy, such that when a given key changes (setting its first. indicator, it also sets the indicators for all keys to its right). BTW, notsorted means that, for a given patientid/var1 the stage values can go up or down - SAS need not expect either ascending or descending order in identifying groups, for any of the "by" keys.

This means that if you encounter a record that starts a new stage value (first.stage=1), but does not at the same time start a new var1 value, then it should be deleted. Similarly if it ends the current value of stage (last.stage=1) but does not end the current value of var1, then it should also be deleted. Build the expression to test for those conditions, and put it in as a replacement for the ???? above.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Conditionally compare values within one variable

Re: Conditionally compare values within one variable

Re: Conditionally compare values within one variable

Re: Conditionally compare values within one variable

Conditionally compare values within one variable

Re: Conditionally compare values within one variable

Re: Conditionally compare values within one variable

Re: Conditionally compare values within one variable

Click image to register for webinar

Classroom Training Available!