BookmarkSubscribeRSS Feed
mancel3
Calcite | Level 5

Simplified version of the dataset I have is:

 

DATA HAVE;
INPUT ID match1 $ match2 $ not_relevant;
DATALINES;
1 "ABC" "ABC" 4
1 "XYZ" "XYZ" 29
2 "QQQ" "AAA" 5
2 "ABC" "ABC" 9
3 "EFG" "EFG" 7
3 "DEF" "DEF" 12
3 "LMK" LMK" 16

;RUN;

 

I am looking to compare match1 and match2, and if anywhere in the ID column match1 does not equal match2, I would like to remove all of the rows with that ID. So for this example dataset I want to remove all of ID 2 (rows 3 and 4) since row 3 does not have a match between match1 and match2. All I can figure out how to do so far is to delete the rows where they dont match, which isnt terribly helpful for this application. I assume it would be easier to make it a new data set with some wheres but I am unsure how to begin there. Any ideas / advice? 

 

Thanks

5 REPLIES 5
mancel3
Calcite | Level 5

Apologies, I dumbed down my dataset too much and forgot about an important exception. New example is as follows:

 

DATA HAVE;
INPUT ID match1 $ match2 $ not_relevant;
DATALINES;
1 "ABC" "ABC" 4
1 "XYZ" "XYZ" 29
2 "QQQ" "AAA" 5
2 "ABC" "ABC" 9
3 "EFG" "EFG" 7
3 "DEF" "DEF" 12
3 "LMK" LMK" 16
3 "KEF" .        38

;RUN;

 

note that I do NOT want to delete ID 3's here, since the match2 column is blank. I only want to delete the whole group when match2 is not blank, and it does not match with match1. Thanks!

Kurt_Bremser
Super User

Alternative:

data want;
merge
  have
  have (
    in=remove
    rename=(match1=_m1 match2=_m2)
    where=(_m1 ne _m2 and _m2 ne "")
  )
;
by id;
if not remove;
drop _m1 _m2;
run;
Amir
PROC Star

Hi,

 

I redrafted the have data step, and used a blank to represent the missing value.

 

Does the following work for you?

 

data have;
   infile datalines dsd;
   
   input
      id           :  8.
      match1       : $8.
      match2       : $8.
      not_relevant :  8.
   ;
   
   datalines;
1,"ABC","ABC",4
1,"XYZ","XYZ",29
2,"QQQ","AAA",5
2,"ABC","ABC",9
3,"EFG","EFG",7
3,"DEF","DEF",12
3,"LMK","LMK",16
3,"KEF",,38
;

data want(drop = remove);
   do until (last.id);
      set have;
      by id;
      
      if match1 ne match2 and not missing(match2) then
         remove = 1;
   end;
   
   do until (last.id);
      set have;
      by id;
      
      if not remove then
         output;
   end;
run;   

 

 

 

Kind regards,

Amir.

Amir
PROC Star

Hi,

 

An alternative is:

 

data want;
   merge have
         have (in = remove where = (match1 ne match2 and not missing(match2)));
   by id;
   if not remove;
run;

 

 

Kind regards,

Amir.

Ksharp
Super User
data have;
   infile datalines dsd;
   
   input
      id           :  8.
      match1       : $8.
      match2       : $8.
      not_relevant :  8.
   ;
   
   datalines;
1,"ABC","ABC",4
1,"XYZ","XYZ",29
2,"QQQ","AAA",5
2,"ABC","ABC",9
3,"EFG","EFG",7
3,"DEF","DEF",12
3,"LMK","LMK",16
3,"KEF",,38
;

proc sql;
create table want as
select * from have
 where id not in 
(select id from have where match1 ne match2 and
 match1 is not missing and match2 is not missing);
quit;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1143 views
  • 1 like
  • 4 in conversation