Hi everyone,
I have a pretty similar problem as in https://communities.sas.com/t5/SAS-Procedures/Deleting-Duplicate-Records-But-Keep-the-one-with-Speci... but I my specific variable is not binary but can have diffrent expressions:
Customer Name Customer ID Address Customer Type
Joe Doe 123 123 Way Online (retain)
Joe Doe 123 123 Way In-Store (delete)
Ken Moore 456 456 Way Online (retain)
Ken Moore 456 456 Way Offline (delete)
Lisa Mae 789 789 Way In-Store (delete)
Lisa Mae 123 123 Way Marketplace (retain)
Paul Hum 456 456 Way Online2 (retain)
Paul Hum 456 456 Way Market (delete)
So I would need the SAS procedure where I can restrict which specific expressions should be kept and which ones can be deleted.
Thank you for your help.
Hi @Jay_Aguilar,
Similar to @ed_sas_member's approach, you could create an informat to define the hierarchy, e.g.:
proc format;
invalue type
'Online' = 1
'Online2' = 2
'Marketplace' = 3
'Market' = 4
'In-Store' = 5
'Offline' = 6
other = 99;
run;
Then, using a PROC SQL view, you don't need an intermediate dataset or a new variable:
proc sql;
create view _tmp as
select * from have
order by id, /* add more key variables if needed */
input(Customer_type, type.);
quit;
data want;
set _tmp;
by id; /* add more key variables if needed */
if first.id; /* then replace ID with last key variable */
run;
Alternatively, you could define the hierarchy with the WHICHC function and use whichc(Customer_type, ...) instead of input(...) in the ORDER BY clause. The "other" case would then require special consideration (e.g. switching to "if last.id;" with reversed order in the WHICHC arguments) because WHICHC returns 0 for unexpected Customer_type values.
Hi @Jay_Aguilar
I am thinking about something like this -> you need to establish the rules to define "priority"
data have;
infile datalines dlm=",";
input Customer_Name:$50. Customer_ID Address:$50. Customer_Type:$50.;
datalines;
Joe Doe,123,123 Way,Online
Joe Doe,123,123 Way,In-Store
Ken Moore,456,456 Way,Online
Ken Moore,456,456 Way,Offline
Lisa Mae,789,789 Way,In-Store
Lisa Mae,123,123 Way,Marketplace
Paul Hum,456,456 Way,Online2
Paul Hum,456,456 Way,Market
;
run;
data have_priority; /*Check priorities !*/
set have;
if Customer_Type = "Online" then Priority = 1;
else if Customer_Type = "Online2" then Priority = 2;
else if Customer_Type = "In-Store" then Priority = 3;
else if Customer_Type = "Marketplace" then Priority = 4;
else if Customer_Type = "Market" then Priority = 5;
else if Customer_Type = "Offline" then Priority = 6;
run;
proc sort data=have_priority;
by Customer_Name Customer_ID priority;
run;
data want;
set have_priority;
by Customer_Name Customer_ID priority;
if first.Customer_Name then output;
drop priority;
run;
Best,
Try this:
data want;
set have (where=(customer_type not in ("Market","Offline","In-Store")));
by customer_name notsorted;
if first.customer_name;
run;
Hi @Jay_Aguilar,
Similar to @ed_sas_member's approach, you could create an informat to define the hierarchy, e.g.:
proc format;
invalue type
'Online' = 1
'Online2' = 2
'Marketplace' = 3
'Market' = 4
'In-Store' = 5
'Offline' = 6
other = 99;
run;
Then, using a PROC SQL view, you don't need an intermediate dataset or a new variable:
proc sql;
create view _tmp as
select * from have
order by id, /* add more key variables if needed */
input(Customer_type, type.);
quit;
data want;
set _tmp;
by id; /* add more key variables if needed */
if first.id; /* then replace ID with last key variable */
run;
Alternatively, you could define the hierarchy with the WHICHC function and use whichc(Customer_type, ...) instead of input(...) in the ORDER BY clause. The "other" case would then require special consideration (e.g. switching to "if last.id;" with reversed order in the WHICHC arguments) because WHICHC returns 0 for unexpected Customer_type values.
Thank you very much all @Kurt_Bremser @ed_sas_member @FreelanceReinh, actually all 3 solutions work for me.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.