Simplified version of the dataset I have is:
DATA HAVE;
INPUT ID match1 $ match2 $ not_relevant;
DATALINES;
1 "ABC" "ABC" 4
1 "XYZ" "XYZ" 29
2 "QQQ" "AAA" 5
2 "ABC" "ABC" 9
3 "EFG" "EFG" 7
3 "DEF" "DEF" 12
3 "LMK" LMK" 16
;RUN;
I am looking to compare match1 and match2, and if anywhere in the ID column match1 does not equal match2, I would like to remove all of the rows with that ID. So for this example dataset I want to remove all of ID 2 (rows 3 and 4) since row 3 does not have a match between match1 and match2. All I can figure out how to do so far is to delete the rows where they dont match, which isnt terribly helpful for this application. I assume it would be easier to make it a new data set with some wheres but I am unsure how to begin there. Any ideas / advice?
Thanks
Apologies, I dumbed down my dataset too much and forgot about an important exception. New example is as follows:
DATA HAVE;
INPUT ID match1 $ match2 $ not_relevant;
DATALINES;
1 "ABC" "ABC" 4
1 "XYZ" "XYZ" 29
2 "QQQ" "AAA" 5
2 "ABC" "ABC" 9
3 "EFG" "EFG" 7
3 "DEF" "DEF" 12
3 "LMK" LMK" 16
3 "KEF" . 38
;RUN;
note that I do NOT want to delete ID 3's here, since the match2 column is blank. I only want to delete the whole group when match2 is not blank, and it does not match with match1. Thanks!
Alternative:
data want;
merge
have
have (
in=remove
rename=(match1=_m1 match2=_m2)
where=(_m1 ne _m2 and _m2 ne "")
)
;
by id;
if not remove;
drop _m1 _m2;
run;
Hi,
I redrafted the have data step, and used a blank to represent the missing value.
Does the following work for you?
data have;
infile datalines dsd;
input
id : 8.
match1 : $8.
match2 : $8.
not_relevant : 8.
;
datalines;
1,"ABC","ABC",4
1,"XYZ","XYZ",29
2,"QQQ","AAA",5
2,"ABC","ABC",9
3,"EFG","EFG",7
3,"DEF","DEF",12
3,"LMK","LMK",16
3,"KEF",,38
;
data want(drop = remove);
do until (last.id);
set have;
by id;
if match1 ne match2 and not missing(match2) then
remove = 1;
end;
do until (last.id);
set have;
by id;
if not remove then
output;
end;
run;
Kind regards,
Amir.
Hi,
An alternative is:
data want;
merge have
have (in = remove where = (match1 ne match2 and not missing(match2)));
by id;
if not remove;
run;
Kind regards,
Amir.
data have; infile datalines dsd; input id : 8. match1 : $8. match2 : $8. not_relevant : 8. ; datalines; 1,"ABC","ABC",4 1,"XYZ","XYZ",29 2,"QQQ","AAA",5 2,"ABC","ABC",9 3,"EFG","EFG",7 3,"DEF","DEF",12 3,"LMK","LMK",16 3,"KEF",,38 ; proc sql; create table want as select * from have where id not in (select id from have where match1 ne match2 and match1 is not missing and match2 is not missing); quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.