Solved: Re: Selectively deleting observations within a group

genemroz · Posted 05-20-2024 07:55 PM

Esteemed Advisers

I need some help with selectively deleting observations by groups. Below are exemplar “have”and “want” datasets. The goal is to retain only one Member that contains “A” in both the Alpha and Beta Groups (which one is retained is irrelevant as they are redundant).

Thanks for taking a look and any advice you have to offer,.

Gene

data have;
input Group $ Member $;
Alpha A1
Alpha A2
Alpha B1
Alpha B2
Alpha A3
Alpha C1
Alpha C2
Beta A1
Beta A2
Beta D1
Beta D2
Beta E1
Beta E2
Beta A3
;

data want;
input Group $ Member $;
Alpha A1
Alpha B1
Alpha B2
Alpha C1
Alpha C2
Beta A1
Beta D1
Beta D2
Beta E1
Beta E2
;

Tom · Posted 05-20-2024 11:44 PM

Assuming the data is sorted by GROUP you just to count the number of names MEMBER starts with A per group.

data want;
  set have;
  by group ;
  if first.group then n_a=0;
  if member=:'A' then do;
    n_a+1;
    if n_a>1 then delete;
  end;
run;

View solution in original post

Ksharp · Posted 05-20-2024 08:34 PM

data have;
input Group $ Member $;
cards;
Alpha A1
Alpha A2
Alpha B1
Alpha B2
Alpha A3
Alpha C1
Alpha C2
Beta A1
Beta A2
Beta D1
Beta D2
Beta E1
Beta E2
Beta A3
;
proc sql;
create table temp as
select distinct member from have group by member having count(distinct group)>1;

create table want as
select * from have where member in (select member from temp(obs=1))
union
select * from have where member not in (select member from temp);
quit;

Patrick · Posted 05-20-2024 10:57 PM

Here another option. Not as smart as what @Ksharp proposes but should perform better (less sorting and lookup) in case you've got a lot of data and performance is relevant to you.

data have;
  input Group $ Member $;
cards;
Alpha A1
Alpha A2
Alpha B1
Alpha B2
Alpha A3
Alpha C1
Alpha C2
Beta A1
Beta A2
Beta D1
Beta D2
Beta E1
Beta E2
Beta A3
;

proc sort 
  data=have(keep=group member where=(group in ('Alpha','Beta') and member =:'A')) 
  out=inter 
  presorted;
  by group member;
run;

%let memA_sel=;
data _null_;
  merge 
    inter(in=ina where=(group='Alpha'))
    inter(in=inb where=(group='Beta' ))
    ;
  by member;
  if ina and inb then
    do;
      call symputx('memA_sel',member);
      stop;
    end;
run;

data want;
  set have;
  if group in ('Alpha','Beta') and member=:'A' and member ne "&memA_sel" then delete;
run;

proc print data=want;
run;

If you can be sure that your data have is already sorted by group and member (or by member as first variable) then use below code version.

%let memA_sel=;
data _null_;
  merge 
    have(in=ina where=(group='Alpha' and member =:'A'))
    have(in=inb where=(group='Beta'  and member =:'A'))
    ;
  by member;
  if ina and inb then
    do;
      call symputx('memA_sel',member);
      stop;
    end;
run;

data want;
  set have;
  if group in ('Alpha','Beta') and member=:'A' and member ne "&memA_sel" then delete;
run;

Tom · Posted 05-20-2024 11:44 PM

Assuming the data is sorted by GROUP you just to count the number of names MEMBER starts with A per group.

data want;
  set have;
  by group ;
  if first.group then n_a=0;
  if member=:'A' then do;
    n_a+1;
    if n_a>1 then delete;
  end;
run;

genemroz · Posted 05-21-2024 10:26 AM

Thanks to all who offered solutions. All were valid and I chose to accept Tom's only because of its parsimonious use of code.

Selectively deleting observations within a group

Re: Selectively deleting observations within a group

Re: Selectively deleting observations within a group

Re: Selectively deleting observations within a group

Re: Selectively deleting observations within a group

Re: Selectively deleting observations within a group

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away