Esteemed Advisers
I need some help with selectively deleting observations by groups. Below are exemplar “have”and “want” datasets. The goal is to retain only one Member that contains “A” in both the Alpha and Beta Groups (which one is retained is irrelevant as they are redundant).
Thanks for taking a look and any advice you have to offer,.
Gene
data have;
input Group $ Member $;
Alpha A1
Alpha A2
Alpha B1
Alpha B2
Alpha A3
Alpha C1
Alpha C2
Beta A1
Beta A2
Beta D1
Beta D2
Beta E1
Beta E2
Beta A3
;
data want;
input Group $ Member $;
Alpha A1
Alpha B1
Alpha B2
Alpha C1
Alpha C2
Beta A1
Beta D1
Beta D2
Beta E1
Beta E2
;
Assuming the data is sorted by GROUP you just to count the number of names MEMBER starts with A per group.
data want;
set have;
by group ;
if first.group then n_a=0;
if member=:'A' then do;
n_a+1;
if n_a>1 then delete;
end;
run;
data have;
input Group $ Member $;
cards;
Alpha A1
Alpha A2
Alpha B1
Alpha B2
Alpha A3
Alpha C1
Alpha C2
Beta A1
Beta A2
Beta D1
Beta D2
Beta E1
Beta E2
Beta A3
;
proc sql;
create table temp as
select distinct member from have group by member having count(distinct group)>1;
create table want as
select * from have where member in (select member from temp(obs=1))
union
select * from have where member not in (select member from temp);
quit;
Here another option. Not as smart as what @Ksharp proposes but should perform better (less sorting and lookup) in case you've got a lot of data and performance is relevant to you.
data have;
input Group $ Member $;
cards;
Alpha A1
Alpha A2
Alpha B1
Alpha B2
Alpha A3
Alpha C1
Alpha C2
Beta A1
Beta A2
Beta D1
Beta D2
Beta E1
Beta E2
Beta A3
;
proc sort
data=have(keep=group member where=(group in ('Alpha','Beta') and member =:'A'))
out=inter
presorted;
by group member;
run;
%let memA_sel=;
data _null_;
merge
inter(in=ina where=(group='Alpha'))
inter(in=inb where=(group='Beta' ))
;
by member;
if ina and inb then
do;
call symputx('memA_sel',member);
stop;
end;
run;
data want;
set have;
if group in ('Alpha','Beta') and member=:'A' and member ne "&memA_sel" then delete;
run;
proc print data=want;
run;
If you can be sure that your data have is already sorted by group and member (or by member as first variable) then use below code version.
%let memA_sel=;
data _null_;
merge
have(in=ina where=(group='Alpha' and member =:'A'))
have(in=inb where=(group='Beta' and member =:'A'))
;
by member;
if ina and inb then
do;
call symputx('memA_sel',member);
stop;
end;
run;
data want;
set have;
if group in ('Alpha','Beta') and member=:'A' and member ne "&memA_sel" then delete;
run;
Assuming the data is sorted by GROUP you just to count the number of names MEMBER starts with A per group.
data want;
set have;
by group ;
if first.group then n_a=0;
if member=:'A' then do;
n_a+1;
if n_a>1 then delete;
end;
run;
Thanks to all who offered solutions. All were valid and I chose to accept Tom's only because of its parsimonious use of code.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.