anonymous_user
Posts: 0

# Keeping the Strongest Classification Observation

Hi everyone!

I'm trying to solve the problem below...

• Every customer has a classification (High, Medium or Low) but they have duplicates on HAVE data set, unfortunately the customers duplicates in this data set can have different classification. For example: Customer A appears two times on data set with two different classification like the example below.
• What I need to do is to remove the duplicates but keeping the strongest classification, for example Customer A has Medium and Low classifications on HAVE dataset, I need to remove the Low obs and Keep the Medium so Medium is stronger than Low. For Customer B I need to keep the HIGH observation instead of LOW.
• I was using proc frec nodupkey, like below but unfortunately, in with this code obs are removed without any rule:

proc sort data = HAVE out = WANT nodupkey;

by CLASSIFICATION;

run;

CUSTOMERCLASSIFICATION
CUSTOMER AMEDIUM
CUSTOMER ALOW
CUSTOMER BHIGH
CUSTOMER BLOW
CUSTOMER CMEDIUM

Thanks and regards from Brazil!

Super User
Posts: 10,787

## Re: Keeping the Strongest Classification Observation

You need encode these classification firstly.

```data have;
input c \$ v \$;
cards;
a medium
a low
b high
b low
c medium
;
run;
data have;
set have;
select (v);
when ("high") r=1;
when ("medium") r=2;
when  ("low") r=3;
otherwise;
end;
run;
proc sort data=have;by c r;run;
/* one way */
proc sort data=have out=want1 nodupkey;by c;run;
/*another way*/
data want2;
set have;
by c;
if first.c;
run;
```

Xia Keshan

Posts: 5,540

## Re: Keeping the Strongest Classification Observation

Use a small auxiliary table containing the required classification order:

data have;

length customer \$12 classification \$8;

input customer & classification;

datalines;

CUSTOMER A  MEDIUM

CUSTOMER A  LOW

CUSTOMER B  HIGH

CUSTOMER B  LOW

CUSTOMER C  MEDIUM

;

data order;

length classification \$8;

do classification = "LOW", "MEDIUM", "HIGH"; order+1; output; end;

run;

proc sql;

create table want as

select unique h.*

from have as h inner join order as o on h.classification=o.classification

group by customer

having order = max(order);

select * from want;

quit;

PG

PG
Posts: 5,540

## Re: Keeping the Strongest Classification Observation

Alternatively, you could sort by a sql-expression and keep only the last record :

proc sql;

create view have2 as

select * from have

order by customer,

case classification

when "LOW" then 1

when "MEDIUM" then 2

else 3

end;

quit;

data want;

set have2; by customer;

if last.customer;

run;

PG

PG
Super User
Posts: 10,787

## Re: Keeping the Strongest Classification Observation

PG,

You can do it in a whole SQL statement.

```data have;
length customer \$12 classification \$8;
input customer & classification;
datalines;
CUSTOMER A  MEDIUM
CUSTOMER A  LOW
CUSTOMER B  HIGH
CUSTOMER B  LOW
CUSTOMER C  MEDIUM
;

proc sql;
create table want as
select unique h.*, case
when classification="LOW" then 1
when classification="MEDIUM" then 2
else 3
end as r
from have as h
group by customer
having r = max(r);
quit;

```

Xia Keshan

Posts: 5,540

## Re: Keeping the Strongest Classification Observation

But for the utmost efficiency, assuming dataset have is already sorted by customer:

data want;

array c{3} \$ _temporary_ ("LOW", "MEDIUM", "HIGH");

do until(last.customer);

set have; by customer;

i = max(i, whichc(classification, of c{*}));

end;

classification = c{max(i,1)};

drop i;

run;

PG

PG
Super User
Posts: 10,787

## Re: Keeping the Strongest Classification Observation

PG,

I would not use your code, if there are some other variables out there, and it usually does.

Xia Keshan

anonymous_user
Posts: 0