Re: Delete where group by

mancel3 · Posted 03-21-2022 12:45 PM

Simplified version of the dataset I have is:

DATA HAVE;
INPUT ID match1 $ match2 $ not_relevant;
DATALINES;
1 "ABC" "ABC" 4
1 "XYZ" "XYZ" 29
2 "QQQ" "AAA" 5
2 "ABC" "ABC" 9
3 "EFG" "EFG" 7
3 "DEF" "DEF" 12
3 "LMK" LMK" 16

;RUN;

I am looking to compare match1 and match2, and if anywhere in the ID column match1 does not equal match2, I would like to remove all of the rows with that ID. So for this example dataset I want to remove all of ID 2 (rows 3 and 4) since row 3 does not have a match between match1 and match2. All I can figure out how to do so far is to delete the rows where they dont match, which isnt terribly helpful for this application. I assume it would be easier to make it a new data set with some wheres but I am unsure how to begin there. Any ideas / advice?

Thanks

mancel3 · Posted 03-21-2022 01:02 PM

Apologies, I dumbed down my dataset too much and forgot about an important exception. New example is as follows:

DATA HAVE;
INPUT ID match1 $ match2 $ not_relevant;
DATALINES;
1 "ABC" "ABC" 4
1 "XYZ" "XYZ" 29
2 "QQQ" "AAA" 5
2 "ABC" "ABC" 9
3 "EFG" "EFG" 7
3 "DEF" "DEF" 12
3 "LMK" LMK" 16
3 "KEF" . 38

;RUN;

note that I do NOT want to delete ID 3's here, since the match2 column is blank. I only want to delete the whole group when match2 is not blank, and it does not match with match1. Thanks!

Kurt_Bremser · Posted 03-21-2022 01:47 PM

Alternative:

data want;
merge
  have
  have (
    in=remove
    rename=(match1=_m1 match2=_m2)
    where=(_m1 ne _m2 and _m2 ne "")
  )
;
by id;
if not remove;
drop _m1 _m2;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Amir · Posted 03-21-2022 01:32 PM

Hi,

I redrafted the have data step, and used a blank to represent the missing value.

Does the following work for you?

data have;
   infile datalines dsd;
   
   input
      id           :  8.
      match1       : $8.
      match2       : $8.
      not_relevant :  8.
   ;
   
   datalines;
1,"ABC","ABC",4
1,"XYZ","XYZ",29
2,"QQQ","AAA",5
2,"ABC","ABC",9
3,"EFG","EFG",7
3,"DEF","DEF",12
3,"LMK","LMK",16
3,"KEF",,38
;

data want(drop = remove);
   do until (last.id);
      set have;
      by id;
      
      if match1 ne match2 and not missing(match2) then
         remove = 1;
   end;
   
   do until (last.id);
      set have;
      by id;
      
      if not remove then
         output;
   end;
run;

Kind regards,

Amir.

Amir · Posted 03-21-2022 01:54 PM

Hi,

An alternative is:

data want;
   merge have
         have (in = remove where = (match1 ne match2 and not missing(match2)));
   by id;
   if not remove;
run;

Kind regards,

Amir.

Ksharp · Posted 03-22-2022 08:11 AM

data have;
   infile datalines dsd;
   
   input
      id           :  8.
      match1       : $8.
      match2       : $8.
      not_relevant :  8.
   ;
   
   datalines;
1,"ABC","ABC",4
1,"XYZ","XYZ",29
2,"QQQ","AAA",5
2,"ABC","ABC",9
3,"EFG","EFG",7
3,"DEF","DEF",12
3,"LMK","LMK",16
3,"KEF",,38
;

proc sql;
create table want as
select * from have
 where id not in 
(select id from have where match1 ne match2 and
 match1 is not missing and match2 is not missing);
quit;

Delete where group by