Help using Base SAS procedures

Eliminate duplicates if a condition is satisfied

Reply
Contributor
Posts: 24

Eliminate duplicates if a condition is satisfied

I want to eliminate duplicates from a database, based on an identifier, an order and a condition.

More precisely, if a condition is met (I can create a variable that equals 1 if it is and 0 otherwise), I would like to select one unique observation per identifier, based by its order (the last one). If this condition is not met then I want to keep all the observations related to this identifier.

Without the condition I can do that (or use a proc sql but that is not point)

proc sort data=have;

by identifier descending order;

run;

proc sort nudopkey data=have;

by identifier;

run;

But how to incorporate my condition in this ?

For instance, with this dataset

data Test;

   input identifier $ order condition;

   datalines;

1023 1 0

1023 2 0

1098 1 0

1098 1 1

;            

I would like to keep the lines :

-1023 2 0

-1098 1 0

-1098 1 1

Super User
Super User
Posts: 7,392

Re: Eliminate duplicates if a condition is satisfied

Do you mean you have a flag 1 or 0 in your data?  If so then you could just add a where flag=1.  You will probably have to do two steps, one for those with the condition true -> find max(), and those without.

proc sql;

     create table WANT as

     select     *

     from       HAVE

     where     *condition is false*

     union all

     select     *

     from       HAVE

     where     *condition is true*

     having    XYZ=(select max(XYZ) ...);

quit;  

You could also do it in datastep, sort by condition max() value, then if first row is true the set output flag to 1 else 0.

Some test data might help.

Contributor
Posts: 24

Re: Eliminate duplicates if a condition is satisfied

added a minimal example.

If I use a where statement in my proc nodupkey, it removes my observations I want to keep

Super User
Super User
Posts: 7,392

Re: Eliminate duplicates if a condition is satisfied

Well simplest I can think of right now is:

data Test;
   input identifier $ order condition;
   datalines;
1023 1 0
1023 2 0
1098 1 0
1098 1 1
;            
run;

proc sql undo_policy=none;
  delete from TEST A
  where not exists(select distinct THIS.IDENTIFIER from TEST THIS where THIS.IDENTIFIER=A.IDENTIFIER and THIS.CONDITION=1)
    and ORDER ne (select max(THIS.ORDER) from TEST THIS where THIS.IDENTIFIER=A.IDENTIFIER);
quit;

Super User
Posts: 9,671

Re: Eliminate duplicates if a condition is satisfied

data Test; 
   input identifier $ order condition; 
   datalines;
1023 1 0
1023 2 0
1098 1 0
1098 1 1
;             
run;
data want(drop=found);
 do until(last.identifier);
  set test;
  by identifier;
  if  condition then found=1;
 end; 
 do until(last.identifier);
  set test;
  by identifier;
  if not found then do; if last.identifier then output;end;
   else output;
 end;
run;

Xia Keshan

Super Contributor
Posts: 305

Re: Eliminate duplicates if a condition is satisfied

Hello,

data have (index=(pkey=(identifier  condition order)));
   input identifier $ order condition;
   datalines;
2093 1 0
2093 2 0
1098 1 0
1098 1 1
1065 2 0
1065 1 0
1065 1 3
;   

data want;
set have;
by identifier condition;
if last.condition;
run;

Ask a Question
Discussion stats
  • 5 replies
  • 740 views
  • 0 likes
  • 4 in conversation