Simplified version of the dataset I have is:
DATA HAVE;
INPUT ID match1 $ match2 $ not_relevant;
DATALINES;
1 "ABC" "ABC" 4
1 "XYZ" "XYZ" 29
2 "QQQ" "AAA" 5
2 "ABC" "ABC" 9
3 "EFG" "EFG" 7
3 "DEF" "DEF" 12
3 "LMK" LMK" 16
;RUN;
I am looking to compare match1 and match2, and if anywhere in the ID column match1 does not equal match2, I would like to remove all of the rows with that ID. So for this example dataset I want to remove all of ID 2 (rows 3 and 4) since row 3 does not have a match between match1 and match2. All I can figure out how to do so far is to delete the rows where they dont match, which isnt terribly helpful for this application. I assume it would be easier to make it a new data set with some wheres but I am unsure how to begin there. Any ideas / advice?
Thanks
Apologies, I dumbed down my dataset too much and forgot about an important exception. New example is as follows:
DATA HAVE;
INPUT ID match1 $ match2 $ not_relevant;
DATALINES;
1 "ABC" "ABC" 4
1 "XYZ" "XYZ" 29
2 "QQQ" "AAA" 5
2 "ABC" "ABC" 9
3 "EFG" "EFG" 7
3 "DEF" "DEF" 12
3 "LMK" LMK" 16
3 "KEF" . 38
;RUN;
note that I do NOT want to delete ID 3's here, since the match2 column is blank. I only want to delete the whole group when match2 is not blank, and it does not match with match1. Thanks!
Alternative:
data want;
merge
have
have (
in=remove
rename=(match1=_m1 match2=_m2)
where=(_m1 ne _m2 and _m2 ne "")
)
;
by id;
if not remove;
drop _m1 _m2;
run;
Hi,
I redrafted the have data step, and used a blank to represent the missing value.
Does the following work for you?
data have;
infile datalines dsd;
input
id : 8.
match1 : $8.
match2 : $8.
not_relevant : 8.
;
datalines;
1,"ABC","ABC",4
1,"XYZ","XYZ",29
2,"QQQ","AAA",5
2,"ABC","ABC",9
3,"EFG","EFG",7
3,"DEF","DEF",12
3,"LMK","LMK",16
3,"KEF",,38
;
data want(drop = remove);
do until (last.id);
set have;
by id;
if match1 ne match2 and not missing(match2) then
remove = 1;
end;
do until (last.id);
set have;
by id;
if not remove then
output;
end;
run;
Kind regards,
Amir.
Hi,
An alternative is:
data want;
merge have
have (in = remove where = (match1 ne match2 and not missing(match2)));
by id;
if not remove;
run;
Kind regards,
Amir.
data have; infile datalines dsd; input id : 8. match1 : $8. match2 : $8. not_relevant : 8. ; datalines; 1,"ABC","ABC",4 1,"XYZ","XYZ",29 2,"QQQ","AAA",5 2,"ABC","ABC",9 3,"EFG","EFG",7 3,"DEF","DEF",12 3,"LMK","LMK",16 3,"KEF",,38 ; proc sql; create table want as select * from have where id not in (select id from have where match1 ne match2 and match1 is not missing and match2 is not missing); quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.