BookmarkSubscribeRSS Feed
mancel3
Calcite | Level 5

Simplified version of the dataset I have is:

 

DATA HAVE;
INPUT ID match1 $ match2 $ not_relevant;
DATALINES;
1 "ABC" "ABC" 4
1 "XYZ" "XYZ" 29
2 "QQQ" "AAA" 5
2 "ABC" "ABC" 9
3 "EFG" "EFG" 7
3 "DEF" "DEF" 12
3 "LMK" LMK" 16

;RUN;

 

I am looking to compare match1 and match2, and if anywhere in the ID column match1 does not equal match2, I would like to remove all of the rows with that ID. So for this example dataset I want to remove all of ID 2 (rows 3 and 4) since row 3 does not have a match between match1 and match2. All I can figure out how to do so far is to delete the rows where they dont match, which isnt terribly helpful for this application. I assume it would be easier to make it a new data set with some wheres but I am unsure how to begin there. Any ideas / advice? 

 

Thanks

5 REPLIES 5
mancel3
Calcite | Level 5

Apologies, I dumbed down my dataset too much and forgot about an important exception. New example is as follows:

 

DATA HAVE;
INPUT ID match1 $ match2 $ not_relevant;
DATALINES;
1 "ABC" "ABC" 4
1 "XYZ" "XYZ" 29
2 "QQQ" "AAA" 5
2 "ABC" "ABC" 9
3 "EFG" "EFG" 7
3 "DEF" "DEF" 12
3 "LMK" LMK" 16
3 "KEF" .        38

;RUN;

 

note that I do NOT want to delete ID 3's here, since the match2 column is blank. I only want to delete the whole group when match2 is not blank, and it does not match with match1. Thanks!

Kurt_Bremser
Super User

Alternative:

data want;
merge
  have
  have (
    in=remove
    rename=(match1=_m1 match2=_m2)
    where=(_m1 ne _m2 and _m2 ne "")
  )
;
by id;
if not remove;
drop _m1 _m2;
run;
Amir
PROC Star

Hi,

 

I redrafted the have data step, and used a blank to represent the missing value.

 

Does the following work for you?

 

data have;
   infile datalines dsd;
   
   input
      id           :  8.
      match1       : $8.
      match2       : $8.
      not_relevant :  8.
   ;
   
   datalines;
1,"ABC","ABC",4
1,"XYZ","XYZ",29
2,"QQQ","AAA",5
2,"ABC","ABC",9
3,"EFG","EFG",7
3,"DEF","DEF",12
3,"LMK","LMK",16
3,"KEF",,38
;

data want(drop = remove);
   do until (last.id);
      set have;
      by id;
      
      if match1 ne match2 and not missing(match2) then
         remove = 1;
   end;
   
   do until (last.id);
      set have;
      by id;
      
      if not remove then
         output;
   end;
run;   

 

 

 

Kind regards,

Amir.

Amir
PROC Star

Hi,

 

An alternative is:

 

data want;
   merge have
         have (in = remove where = (match1 ne match2 and not missing(match2)));
   by id;
   if not remove;
run;

 

 

Kind regards,

Amir.

Ksharp
Super User
data have;
   infile datalines dsd;
   
   input
      id           :  8.
      match1       : $8.
      match2       : $8.
      not_relevant :  8.
   ;
   
   datalines;
1,"ABC","ABC",4
1,"XYZ","XYZ",29
2,"QQQ","AAA",5
2,"ABC","ABC",9
3,"EFG","EFG",7
3,"DEF","DEF",12
3,"LMK","LMK",16
3,"KEF",,38
;

proc sql;
create table want as
select * from have
 where id not in 
(select id from have where match1 ne match2 and
 match1 is not missing and match2 is not missing);
quit;

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 713 views
  • 1 like
  • 4 in conversation