Help using Base SAS procedures

Keeping the Strongest Classification Observation

Reply
Occasional Contributor
Posts: 11

Keeping the Strongest Classification Observation

Hi everyone!

I'm trying to solve the problem below...

  • Every customer has a classification (High, Medium or Low) but they have duplicates on HAVE data set, unfortunately the customers duplicates in this data set can have different classification. For example: Customer A appears two times on data set with two different classification like the example below.
  • What I need to do is to remove the duplicates but keeping the strongest classification, for example Customer A has Medium and Low classifications on HAVE dataset, I need to remove the Low obs and Keep the Medium so Medium is stronger than Low. For Customer B I need to keep the HIGH observation instead of LOW.
  • I was using proc frec nodupkey, like below but unfortunately, in with this code obs are removed without any rule:

proc sort data = HAVE out = WANT nodupkey;

        by CLASSIFICATION;

run;

CUSTOMERCLASSIFICATION
CUSTOMER AMEDIUM
CUSTOMER ALOW
CUSTOMER BHIGH
CUSTOMER BLOW
CUSTOMER CMEDIUM


Thanks and regards from Brazil!

Super User
Posts: 10,035

Re: Keeping the Strongest Classification Observation

You need encode these classification firstly.

data have;
input c $ v $;
cards;
a medium
a low
b high
b low
c medium
;
run;
data have;
 set have;
  select (v);
   when ("high") r=1;
   when ("medium") r=2;
   when  ("low") r=3;
   otherwise;
  end;
run;
proc sort data=have;by c r;run;
/* one way */
proc sort data=have out=want1 nodupkey;by c;run;
/*another way*/
data want2;
 set have;
 by c;
 if first.c;
run;

Xia Keshan

Respected Advisor
Posts: 4,925

Re: Keeping the Strongest Classification Observation

Use a small auxiliary table containing the required classification order:

data have;

length customer $12 classification $8;

input customer & classification;

datalines;

CUSTOMER A  MEDIUM

CUSTOMER A  LOW

CUSTOMER B  HIGH

CUSTOMER B  LOW

CUSTOMER C  MEDIUM

;

data order;

length classification $8;

do classification = "LOW", "MEDIUM", "HIGH"; order+1; output; end;

run;

proc sql;

create table want as

select unique h.*

from have as h inner join order as o on h.classification=o.classification

group by customer

having order = max(order);

select * from want;

quit;

PG

PG
Respected Advisor
Posts: 4,925

Re: Keeping the Strongest Classification Observation

Alternatively, you could sort by a sql-expression and keep only the last record :

proc sql;

create view have2 as

select * from have

order by customer,

       case classification

       when "LOW" then 1

       when "MEDIUM" then 2

       else 3

       end;

quit;

data want;

set have2; by customer;

if last.customer;

run;

PG

PG
Super User
Posts: 10,035

Re: Keeping the Strongest Classification Observation

PG,

You can do it in a whole SQL statement.

data have;
length customer $12 classification $8;
input customer & classification;
datalines;
CUSTOMER A  MEDIUM
CUSTOMER A  LOW
CUSTOMER B  HIGH
CUSTOMER B  LOW
CUSTOMER C  MEDIUM
;

 
proc sql;
create table want as
select unique h.*, case 
       when classification="LOW" then 1
       when classification="MEDIUM" then 2
       else 3
       end as r
from have as h 
group by customer
having r = max(r);
quit;

Xia Keshan

Respected Advisor
Posts: 4,925

Re: Keeping the Strongest Classification Observation

But for the utmost efficiency, assuming dataset have is already sorted by customer:

data want;

array c{3} $ _temporary_ ("LOW", "MEDIUM", "HIGH");

do until(last.customer);

       set have; by customer;

       i = max(i, whichc(classification, of c{*}));

       end;

classification = c{max(i,1)};

drop i;

run;

PG

PG
Super User
Posts: 10,035

Re: Keeping the Strongest Classification Observation

PG,

I would not use your code, if there are some other variables out there, and it usually does.

Xia Keshan

Occasional Contributor
Posts: 11

Re: Keeping the Strongest Classification Observation

I would like to thank you guys, the solutions worked fine and I could achive the goal keeping the strongest one.

Ask a Question
Discussion stats
  • 7 replies
  • 301 views
  • 2 likes
  • 3 in conversation