BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jay_Aguilar
Calcite | Level 5

Hi everyone,

 

I have a pretty similar problem as in https://communities.sas.com/t5/SAS-Procedures/Deleting-Duplicate-Records-But-Keep-the-one-with-Speci... but I my specific variable is not binary but can have diffrent expressions:

 

Customer Name    Customer ID  Address      Customer Type

       Joe Doe                  123              123 Way       Online            (retain)

       Joe Doe                  123              123 Way       In-Store         (delete)

       Ken Moore              456              456 Way       Online           (retain)

       Ken Moore              456              456 Way       Offline         (delete)

       Lisa Mae                 789              789 Way       In-Store         (delete)

       Lisa Mae                  123              123 Way       Marketplace         (retain)

       Paul Hum                 456              456 Way       Online2         (retain)

       Paul Hum                  456              456 Way       Market         (delete)

 

So I would need the SAS procedure where I can restrict which specific expressions should be kept and which ones can be deleted.

 

Thank you for your help.

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @Jay_Aguilar,

 

Similar to @ed_sas_member's approach, you could create an informat to define the hierarchy, e.g.:

proc format;
invalue type
'Online'      = 1
'Online2'     = 2
'Marketplace' = 3
'Market'      = 4
'In-Store'    = 5
'Offline'     = 6
other         = 99;
run;

 

Then, using a PROC SQL view, you don't need an intermediate dataset or a new variable:

proc sql;
create view _tmp as
select * from have
order by id, /* add more key variables if needed */
         input(Customer_type, type.);
quit;

data want;
set _tmp;
by id; /* add more key variables if needed */
if first.id; /* then replace ID with last key variable */
run;

 

Alternatively, you could define the hierarchy with the WHICHC function and use whichc(Customer_type, ...) instead of input(...) in the ORDER BY clause. The "other" case would then require special consideration (e.g. switching to "if last.id;" with reversed order in the WHICHC arguments) because WHICHC returns 0 for unexpected Customer_type values.

 

View solution in original post

4 REPLIES 4
ed_sas_member
Meteorite | Level 14

Hi @Jay_Aguilar 

 

I am thinking about something like this -> you need to establish the rules to define "priority"

data have;
	infile datalines dlm=",";
	input Customer_Name:$50. Customer_ID Address:$50. Customer_Type:$50.;
	datalines;
Joe Doe,123,123 Way,Online
Joe Doe,123,123 Way,In-Store
Ken Moore,456,456 Way,Online
Ken Moore,456,456 Way,Offline
Lisa Mae,789,789 Way,In-Store
Lisa Mae,123,123 Way,Marketplace
Paul Hum,456,456 Way,Online2
Paul Hum,456,456 Way,Market
;
run;

data have_priority; /*Check priorities !*/
	set have;
	if 	    Customer_Type = "Online" 	  then Priority = 1;
	else if Customer_Type = "Online2" 	  then Priority = 2;
	else if Customer_Type = "In-Store" 	  then Priority = 3;
	else if Customer_Type = "Marketplace" then Priority = 4;
	else if Customer_Type = "Market"	  then Priority = 5;
	else if Customer_Type = "Offline"     then Priority = 6;
run;

proc sort data=have_priority;
	by Customer_Name Customer_ID priority;
run;

data want;
	set have_priority;
	by Customer_Name Customer_ID priority;
	if first.Customer_Name then output;
	drop priority;
run;

Best,

FreelanceReinh
Jade | Level 19

Hi @Jay_Aguilar,

 

Similar to @ed_sas_member's approach, you could create an informat to define the hierarchy, e.g.:

proc format;
invalue type
'Online'      = 1
'Online2'     = 2
'Marketplace' = 3
'Market'      = 4
'In-Store'    = 5
'Offline'     = 6
other         = 99;
run;

 

Then, using a PROC SQL view, you don't need an intermediate dataset or a new variable:

proc sql;
create view _tmp as
select * from have
order by id, /* add more key variables if needed */
         input(Customer_type, type.);
quit;

data want;
set _tmp;
by id; /* add more key variables if needed */
if first.id; /* then replace ID with last key variable */
run;

 

Alternatively, you could define the hierarchy with the WHICHC function and use whichc(Customer_type, ...) instead of input(...) in the ORDER BY clause. The "other" case would then require special consideration (e.g. switching to "if last.id;" with reversed order in the WHICHC arguments) because WHICHC returns 0 for unexpected Customer_type values.

 

Jay_Aguilar
Calcite | Level 5

Thank you very much all @Kurt_Bremser @ed_sas_member @FreelanceReinh, actually all 3 solutions work for me. 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!

Submit your idea!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 700 views
  • 1 like
  • 4 in conversation