BookmarkSubscribeRSS Feed

Hi everyone!

I'm trying to solve the problem below...

  • Every customer has a classification (High, Medium or Low) but they have duplicates on HAVE data set, unfortunately the customers duplicates in this data set can have different classification. For example: Customer A appears two times on data set with two different classification like the example below.
  • What I need to do is to remove the duplicates but keeping the strongest classification, for example Customer A has Medium and Low classifications on HAVE dataset, I need to remove the Low obs and Keep the Medium so Medium is stronger than Low. For Customer B I need to keep the HIGH observation instead of LOW.
  • I was using proc frec nodupkey, like below but unfortunately, in with this code obs are removed without any rule:

proc sort data = HAVE out = WANT nodupkey;

        by CLASSIFICATION;

run;

CUSTOMERCLASSIFICATION
CUSTOMER AMEDIUM
CUSTOMER ALOW
CUSTOMER BHIGH
CUSTOMER BLOW
CUSTOMER CMEDIUM


Thanks and regards from Brazil!

7 REPLIES 7
Ksharp
Super User

You need encode these classification firstly.

data have;
input c $ v $;
cards;
a medium
a low
b high
b low
c medium
;
run;
data have;
 set have;
  select (v);
   when ("high") r=1;
   when ("medium") r=2;
   when  ("low") r=3;
   otherwise;
  end;
run;
proc sort data=have;by c r;run;
/* one way */
proc sort data=have out=want1 nodupkey;by c;run;
/*another way*/
data want2;
 set have;
 by c;
 if first.c;
run;

Xia Keshan

PGStats
Opal | Level 21

Use a small auxiliary table containing the required classification order:

data have;

length customer $12 classification $8;

input customer & classification;

datalines;

CUSTOMER A  MEDIUM

CUSTOMER A  LOW

CUSTOMER B  HIGH

CUSTOMER B  LOW

CUSTOMER C  MEDIUM

;

data order;

length classification $8;

do classification = "LOW", "MEDIUM", "HIGH"; order+1; output; end;

run;

proc sql;

create table want as

select unique h.*

from have as h inner join order as o on h.classification=o.classification

group by customer

having order = max(order);

select * from want;

quit;

PG

PG
PGStats
Opal | Level 21

Alternatively, you could sort by a sql-expression and keep only the last record :

proc sql;

create view have2 as

select * from have

order by customer,

       case classification

       when "LOW" then 1

       when "MEDIUM" then 2

       else 3

       end;

quit;

data want;

set have2; by customer;

if last.customer;

run;

PG

PG
Ksharp
Super User

PG,

You can do it in a whole SQL statement.

data have;
length customer $12 classification $8;
input customer & classification;
datalines;
CUSTOMER A  MEDIUM
CUSTOMER A  LOW
CUSTOMER B  HIGH
CUSTOMER B  LOW
CUSTOMER C  MEDIUM
;

 
proc sql;
create table want as
select unique h.*, case 
       when classification="LOW" then 1
       when classification="MEDIUM" then 2
       else 3
       end as r
from have as h 
group by customer
having r = max(r);
quit;

Xia Keshan

PGStats
Opal | Level 21

But for the utmost efficiency, assuming dataset have is already sorted by customer:

data want;

array c{3} $ _temporary_ ("LOW", "MEDIUM", "HIGH");

do until(last.customer);

       set have; by customer;

       i = max(i, whichc(classification, of c{*}));

       end;

classification = c{max(i,1)};

drop i;

run;

PG

PG
Ksharp
Super User

PG,

I would not use your code, if there are some other variables out there, and it usually does.

Xia Keshan

I would like to thank you guys, the solutions worked fine and I could achive the goal keeping the strongest one.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 940 views
  • 2 likes
  • 3 in conversation