Hi everyone,
I have a pretty similar problem as in https://communities.sas.com/t5/SAS-Procedures/Deleting-Duplicate-Records-But-Keep-the-one-with-Speci... but I my specific variable is not binary but can have diffrent expressions:
Customer Name Customer ID Address Customer Type
Joe Doe 123 123 Way Online (retain)
Joe Doe 123 123 Way In-Store (delete)
Ken Moore 456 456 Way Online (retain)
Ken Moore 456 456 Way Offline (delete)
Lisa Mae 789 789 Way In-Store (delete)
Lisa Mae 123 123 Way Marketplace (retain)
Paul Hum 456 456 Way Online2 (retain)
Paul Hum 456 456 Way Market (delete)
So I would need the SAS procedure where I can restrict which specific expressions should be kept and which ones can be deleted.
Thank you for your help.
Hi @Jay_Aguilar,
Similar to @ed_sas_member's approach, you could create an informat to define the hierarchy, e.g.:
proc format;
invalue type
'Online' = 1
'Online2' = 2
'Marketplace' = 3
'Market' = 4
'In-Store' = 5
'Offline' = 6
other = 99;
run;
Then, using a PROC SQL view, you don't need an intermediate dataset or a new variable:
proc sql;
create view _tmp as
select * from have
order by id, /* add more key variables if needed */
input(Customer_type, type.);
quit;
data want;
set _tmp;
by id; /* add more key variables if needed */
if first.id; /* then replace ID with last key variable */
run;
Alternatively, you could define the hierarchy with the WHICHC function and use whichc(Customer_type, ...) instead of input(...) in the ORDER BY clause. The "other" case would then require special consideration (e.g. switching to "if last.id;" with reversed order in the WHICHC arguments) because WHICHC returns 0 for unexpected Customer_type values.
Hi @Jay_Aguilar
I am thinking about something like this -> you need to establish the rules to define "priority"
data have;
infile datalines dlm=",";
input Customer_Name:$50. Customer_ID Address:$50. Customer_Type:$50.;
datalines;
Joe Doe,123,123 Way,Online
Joe Doe,123,123 Way,In-Store
Ken Moore,456,456 Way,Online
Ken Moore,456,456 Way,Offline
Lisa Mae,789,789 Way,In-Store
Lisa Mae,123,123 Way,Marketplace
Paul Hum,456,456 Way,Online2
Paul Hum,456,456 Way,Market
;
run;
data have_priority; /*Check priorities !*/
set have;
if Customer_Type = "Online" then Priority = 1;
else if Customer_Type = "Online2" then Priority = 2;
else if Customer_Type = "In-Store" then Priority = 3;
else if Customer_Type = "Marketplace" then Priority = 4;
else if Customer_Type = "Market" then Priority = 5;
else if Customer_Type = "Offline" then Priority = 6;
run;
proc sort data=have_priority;
by Customer_Name Customer_ID priority;
run;
data want;
set have_priority;
by Customer_Name Customer_ID priority;
if first.Customer_Name then output;
drop priority;
run;
Best,
Try this:
data want;
set have (where=(customer_type not in ("Market","Offline","In-Store")));
by customer_name notsorted;
if first.customer_name;
run;
Hi @Jay_Aguilar,
Similar to @ed_sas_member's approach, you could create an informat to define the hierarchy, e.g.:
proc format;
invalue type
'Online' = 1
'Online2' = 2
'Marketplace' = 3
'Market' = 4
'In-Store' = 5
'Offline' = 6
other = 99;
run;
Then, using a PROC SQL view, you don't need an intermediate dataset or a new variable:
proc sql;
create view _tmp as
select * from have
order by id, /* add more key variables if needed */
input(Customer_type, type.);
quit;
data want;
set _tmp;
by id; /* add more key variables if needed */
if first.id; /* then replace ID with last key variable */
run;
Alternatively, you could define the hierarchy with the WHICHC function and use whichc(Customer_type, ...) instead of input(...) in the ORDER BY clause. The "other" case would then require special consideration (e.g. switching to "if last.id;" with reversed order in the WHICHC arguments) because WHICHC returns 0 for unexpected Customer_type values.
Thank you very much all @Kurt_Bremser @ed_sas_member @FreelanceReinh, actually all 3 solutions work for me.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.