BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
analyticalepi
Calcite | Level 5

I'm parsing down a dataset for final output. Here are the datalines:

 

data have;
input PatientID Stage Var1;
datalines;
1001 1 0
1001 2 0
1001 2 1
1002 2 1
1005 3 3
1006 2 1
1010 3 0
1010 3 2
;

What I'm trying to do is if there are records of the same PatientID (1001) where the Stages don't match, but Var1 matches, delete both records. If the there are records of the same PatientID where Stage matches and Var1 doesn't match, I want to keep both records. Here is some code that I've tried:

 

data want;
set have;
if first.PatientID=last.PatientID and first.Stage NE last.Stage and first.Var1=last.Var1 then delete;
run;

And here is what the log read. 

NOTE: Variable first.PatientID is uninitialized.
NOTE: Variable last.PatientID is uninitialized.
NOTE: Variable first.Stage is uninitialized.
NOTE: Variable last.Stage is uninitialized.
NOTE: Variable first.Var1 is uninitialized.
NOTE: Variable last.Var1 is uninitialized.
NOTE: There were 8 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.WANT has 8 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

Is this even possible to do? I tried using a proc sql as well but ran into the same issue of trying to compare values within the same column.

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

You want to detect and delete all; cases (within a patientid) with unchanging var1, but changing stage:  

 

data have;
input PatientID Stage Var1;
seq+1;
datalines;
1001 1 0
1001 2 0
1001 2 1
1002 2 1
1005 3 3
1006 2 1
1010 3 0
1010 3 2
;

data want;
  set have;
  by patientid var1 stage notsorted;
  if ???? then delete;
run;

Using the by pateindid var1 stage notsorted; has patientid as the major grouping key, stage as the minor key, and var1 as the middle key.  SAS honors this hierarchy, such that when a given key changes (setting its first. indicator, it also sets the indicators for all keys to its right).  BTW, notsorted means that, for a given patientid/var1 the stage values can go up or down - SAS need not expect either ascending or descending order in identifying groups, for any of the "by" keys.

 

This means that if you encounter a record that starts a new stage value (first.stage=1), but does not at the same time start a new var1 value, then it should be deleted.  Similarly if it ends the current value of stage (last.stage=1) but does not end the current value of var1, then it should also be deleted.  Build the expression to test for those conditions, and put it in as a replacement for the ???? above.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

2 REPLIES 2
Tom
Super User Tom
Super User

You cannot use FIRST. and LAST. without first listing those variables in a BY statement.

To get unique combinations of three variables use this:

data want;
  set have;
  by PatientID Stage Value;
  if first.value;
run;

 

mkeintz
PROC Star

You want to detect and delete all; cases (within a patientid) with unchanging var1, but changing stage:  

 

data have;
input PatientID Stage Var1;
seq+1;
datalines;
1001 1 0
1001 2 0
1001 2 1
1002 2 1
1005 3 3
1006 2 1
1010 3 0
1010 3 2
;

data want;
  set have;
  by patientid var1 stage notsorted;
  if ???? then delete;
run;

Using the by pateindid var1 stage notsorted; has patientid as the major grouping key, stage as the minor key, and var1 as the middle key.  SAS honors this hierarchy, such that when a given key changes (setting its first. indicator, it also sets the indicators for all keys to its right).  BTW, notsorted means that, for a given patientid/var1 the stage values can go up or down - SAS need not expect either ascending or descending order in identifying groups, for any of the "by" keys.

 

This means that if you encounter a record that starts a new stage value (first.stage=1), but does not at the same time start a new var1 value, then it should be deleted.  Similarly if it ends the current value of stage (last.stage=1) but does not end the current value of var1, then it should also be deleted.  Build the expression to test for those conditions, and put it in as a replacement for the ???? above.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 517 views
  • 0 likes
  • 3 in conversation