BookmarkSubscribeRSS Feed
mancel3
Calcite | Level 5

Simplified version of the dataset I have is:

 

DATA HAVE;
INPUT ID match1 $ match2 $ not_relevant;
DATALINES;
1 "ABC" "ABC" 4
1 "XYZ" "XYZ" 29
2 "QQQ" "AAA" 5
2 "ABC" "ABC" 9
3 "EFG" "EFG" 7
3 "DEF" "DEF" 12
3 "LMK" LMK" 16

;RUN;

 

I am looking to compare match1 and match2, and if anywhere in the ID column match1 does not equal match2, I would like to remove all of the rows with that ID. So for this example dataset I want to remove all of ID 2 (rows 3 and 4) since row 3 does not have a match between match1 and match2. All I can figure out how to do so far is to delete the rows where they dont match, which isnt terribly helpful for this application. I assume it would be easier to make it a new data set with some wheres but I am unsure how to begin there. Any ideas / advice? 

 

Thanks

5 REPLIES 5
mancel3
Calcite | Level 5

Apologies, I dumbed down my dataset too much and forgot about an important exception. New example is as follows:

 

DATA HAVE;
INPUT ID match1 $ match2 $ not_relevant;
DATALINES;
1 "ABC" "ABC" 4
1 "XYZ" "XYZ" 29
2 "QQQ" "AAA" 5
2 "ABC" "ABC" 9
3 "EFG" "EFG" 7
3 "DEF" "DEF" 12
3 "LMK" LMK" 16
3 "KEF" .        38

;RUN;

 

note that I do NOT want to delete ID 3's here, since the match2 column is blank. I only want to delete the whole group when match2 is not blank, and it does not match with match1. Thanks!

Kurt_Bremser
Super User

Alternative:

data want;
merge
  have
  have (
    in=remove
    rename=(match1=_m1 match2=_m2)
    where=(_m1 ne _m2 and _m2 ne "")
  )
;
by id;
if not remove;
drop _m1 _m2;
run;
Amir
PROC Star

Hi,

 

I redrafted the have data step, and used a blank to represent the missing value.

 

Does the following work for you?

 

data have;
   infile datalines dsd;
   
   input
      id           :  8.
      match1       : $8.
      match2       : $8.
      not_relevant :  8.
   ;
   
   datalines;
1,"ABC","ABC",4
1,"XYZ","XYZ",29
2,"QQQ","AAA",5
2,"ABC","ABC",9
3,"EFG","EFG",7
3,"DEF","DEF",12
3,"LMK","LMK",16
3,"KEF",,38
;

data want(drop = remove);
   do until (last.id);
      set have;
      by id;
      
      if match1 ne match2 and not missing(match2) then
         remove = 1;
   end;
   
   do until (last.id);
      set have;
      by id;
      
      if not remove then
         output;
   end;
run;   

 

 

 

Kind regards,

Amir.

Amir
PROC Star

Hi,

 

An alternative is:

 

data want;
   merge have
         have (in = remove where = (match1 ne match2 and not missing(match2)));
   by id;
   if not remove;
run;

 

 

Kind regards,

Amir.

Ksharp
Super User
data have;
   infile datalines dsd;
   
   input
      id           :  8.
      match1       : $8.
      match2       : $8.
      not_relevant :  8.
   ;
   
   datalines;
1,"ABC","ABC",4
1,"XYZ","XYZ",29
2,"QQQ","AAA",5
2,"ABC","ABC",9
3,"EFG","EFG",7
3,"DEF","DEF",12
3,"LMK","LMK",16
3,"KEF",,38
;

proc sql;
create table want as
select * from have
 where id not in 
(select id from have where match1 ne match2 and
 match1 is not missing and match2 is not missing);
quit;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 741 views
  • 1 like
  • 4 in conversation