BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Jay_Aguilar
Calcite | Level 5

Hi everyone,

 

I have a pretty similar problem as in https://communities.sas.com/t5/SAS-Procedures/Deleting-Duplicate-Records-But-Keep-the-one-with-Speci... but I my specific variable is not binary but can have diffrent expressions:

 

Customer Name    Customer ID  Address      Customer Type

       Joe Doe                  123              123 Way       Online            (retain)

       Joe Doe                  123              123 Way       In-Store         (delete)

       Ken Moore              456              456 Way       Online           (retain)

       Ken Moore              456              456 Way       Offline         (delete)

       Lisa Mae                 789              789 Way       In-Store         (delete)

       Lisa Mae                  123              123 Way       Marketplace         (retain)

       Paul Hum                 456              456 Way       Online2         (retain)

       Paul Hum                  456              456 Way       Market         (delete)

 

So I would need the SAS procedure where I can restrict which specific expressions should be kept and which ones can be deleted.

 

Thank you for your help.

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @Jay_Aguilar,

 

Similar to @ed_sas_member's approach, you could create an informat to define the hierarchy, e.g.:

proc format;
invalue type
'Online'      = 1
'Online2'     = 2
'Marketplace' = 3
'Market'      = 4
'In-Store'    = 5
'Offline'     = 6
other         = 99;
run;

 

Then, using a PROC SQL view, you don't need an intermediate dataset or a new variable:

proc sql;
create view _tmp as
select * from have
order by id, /* add more key variables if needed */
         input(Customer_type, type.);
quit;

data want;
set _tmp;
by id; /* add more key variables if needed */
if first.id; /* then replace ID with last key variable */
run;

 

Alternatively, you could define the hierarchy with the WHICHC function and use whichc(Customer_type, ...) instead of input(...) in the ORDER BY clause. The "other" case would then require special consideration (e.g. switching to "if last.id;" with reversed order in the WHICHC arguments) because WHICHC returns 0 for unexpected Customer_type values.

 

View solution in original post

4 REPLIES 4
ed_sas_member
Meteorite | Level 14

Hi @Jay_Aguilar 

 

I am thinking about something like this -> you need to establish the rules to define "priority"

data have;
	infile datalines dlm=",";
	input Customer_Name:$50. Customer_ID Address:$50. Customer_Type:$50.;
	datalines;
Joe Doe,123,123 Way,Online
Joe Doe,123,123 Way,In-Store
Ken Moore,456,456 Way,Online
Ken Moore,456,456 Way,Offline
Lisa Mae,789,789 Way,In-Store
Lisa Mae,123,123 Way,Marketplace
Paul Hum,456,456 Way,Online2
Paul Hum,456,456 Way,Market
;
run;

data have_priority; /*Check priorities !*/
	set have;
	if 	    Customer_Type = "Online" 	  then Priority = 1;
	else if Customer_Type = "Online2" 	  then Priority = 2;
	else if Customer_Type = "In-Store" 	  then Priority = 3;
	else if Customer_Type = "Marketplace" then Priority = 4;
	else if Customer_Type = "Market"	  then Priority = 5;
	else if Customer_Type = "Offline"     then Priority = 6;
run;

proc sort data=have_priority;
	by Customer_Name Customer_ID priority;
run;

data want;
	set have_priority;
	by Customer_Name Customer_ID priority;
	if first.Customer_Name then output;
	drop priority;
run;

Best,

FreelanceReinh
Jade | Level 19

Hi @Jay_Aguilar,

 

Similar to @ed_sas_member's approach, you could create an informat to define the hierarchy, e.g.:

proc format;
invalue type
'Online'      = 1
'Online2'     = 2
'Marketplace' = 3
'Market'      = 4
'In-Store'    = 5
'Offline'     = 6
other         = 99;
run;

 

Then, using a PROC SQL view, you don't need an intermediate dataset or a new variable:

proc sql;
create view _tmp as
select * from have
order by id, /* add more key variables if needed */
         input(Customer_type, type.);
quit;

data want;
set _tmp;
by id; /* add more key variables if needed */
if first.id; /* then replace ID with last key variable */
run;

 

Alternatively, you could define the hierarchy with the WHICHC function and use whichc(Customer_type, ...) instead of input(...) in the ORDER BY clause. The "other" case would then require special consideration (e.g. switching to "if last.id;" with reversed order in the WHICHC arguments) because WHICHC returns 0 for unexpected Customer_type values.

 

Jay_Aguilar
Calcite | Level 5

Thank you very much all @Kurt_Bremser @ed_sas_member @FreelanceReinh, actually all 3 solutions work for me. 

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 890 views
  • 1 like
  • 4 in conversation