Hello Everyone,
So I am stuck on a problem on how to use a hiter object to eliminate duplicates from a data set in order to create a new one.
My example data set is orion.order_fact:
Order_
Obs Customer_ID Type
1 4 3
2 4 1
3 4 3
4 4 3
5 4 3
6 4 1
7 4 1
8 4 3
9 4 1
10 5 2
11 5 1
12 5 2
13 5 2
14 5 2
15 5 1
16 5 2
17 5 2
18 5 2
19 5 1
20 5 1
What I need to do is to eliminate the duplicate values of Customer_ID and Order_Type so only one record would appear like
Customer_ID Order_Type
4 1
4 3
5 1
5 2
So far my code attempt is pretty pathetic since I really don't know what I am doing:
data different1;
length Customer_ID 8 Order_Type 8.;
if _N_=1 then do;
if 0 then set orion.order_fact(keep=Customer_ID Order_Type);/*Using this method instead of the call missing to initialize the variables*/
declare hash test(dataset:'orion.order_fact',ordered='A',duplicate:'r');
test.definekey('Customer_ID','Order_Type');
test.definedata('Customer_ID','Order_Type');
test.definedone();
declare hiter t('test');
/*call missing('Customer_ID','Order_Type');*/
end;
t.removeDup();
if rc=0 then output different1;
stop;
run;
proc print data=different1 (obs=10);
run;
If anyone out there has any suggestions on what I should do to remedy this problem, please let me know.
I know there are other ways I could go about this without using a hiter object, but this is for a homework assignment and I need to use a hiter object.
Thanks,
Alisa
You don't need Hiter Object. Since you didn't use Hash Table parameter multidata:'Y' .
data order_fact; input Obs Customer_ID Order_Type ; cards; 1 4 3 2 4 1 3 4 3 15 5 1 16 5 2 17 5 2 ; run; data different1; if _N_=1 then do; if 0 then set order_fact(keep=Customer_ID Order_Type);/*Using this method instead of the call missing to initialize the variables*/ declare hash test(dataset:'order_fact',ordered:'A',duplicate:'r'); test.definekey('Customer_ID','Order_Type'); test.definedata('Customer_ID','Order_Type'); test.definedone(); end; test.output(dataset:'want'); stop; run;
Ksharp
You don't need Hiter Object. Since you didn't use Hash Table parameter multidata:'Y' .
data order_fact; input Obs Customer_ID Order_Type ; cards; 1 4 3 2 4 1 3 4 3 15 5 1 16 5 2 17 5 2 ; run; data different1; if _N_=1 then do; if 0 then set order_fact(keep=Customer_ID Order_Type);/*Using this method instead of the call missing to initialize the variables*/ declare hash test(dataset:'order_fact',ordered:'A',duplicate:'r'); test.definekey('Customer_ID','Order_Type'); test.definedata('Customer_ID','Order_Type'); test.definedone(); end; test.output(dataset:'want'); stop; run;
Ksharp
Hi Ksharp,
This worked out perfectly!
Thanks for your help!
Alisa
Hi Alisa,
As far as I know, there is NO way that you can use hash hiter to achieve this task, as eliminating duplicates implies needed keys, while hiter does not use keys ( not that type of keys anyway). As you can see from Ksharp's post, hash() has been built in this capacity to do the job. And there are also ways to do it more strenuously using removedup(), but I can't hardly see any roles that hiter can play.
Since you mentioned using hiter was part of your assignment requirement, I am more than interested to know the answers after they are released by your instructor. Please report it back. Thanks in advance!
Haikuo
Hi Hai.kuo,
Here is the answer that my professor gave:
data different;
drop rc;
if _N_=1 then do;
if 0 then set orion.order_fact(keep=Customer_ID
Order_Type);
declare hash orders(dataset: 'orion.order_fact',
ordered: 'yes');
declare hiter O_F('orders');
orders.defineKey('Customer_ID', 'Order_Type');
orders.defineData('Customer_ID', 'Order_Type');
orders.defineDone();
end;
rc=O_F.first();
do while (rc=0);
output;
rc=O_F.next();
end;
stop;
run;
proc print data=different(obs=10);
title "No Duplicates";
run;
Alisa,
So as you can see, hiter does not remove duplicates. Hiter here just simply reads in every record in the hash object that is already been processed in term of 'eliminating duplicates'. Hash will load unique records only when not specifying 'multidata' as option.
BTW, IMHO, comparing to Ksharp's approach of output(), using hiter here is kinda clumsy.
Thanks,
Haikuo
Hai:
May I ask a question?
I have many tables with hundreds of columns and thousands of rows.
I need to eliminate the dupplicated rows in each of these tables.
Do you have a generic script that I can run on each ot the tables that delete the dupplicated records and leave the records that do not have dupplicates?
Thanks a lot
Jorge sanchez.
data order_fact;
input customer_id order_type amt;
cards;
4 3 4.99
4 1 1.00
4 3 0.99
5 1 0.02
5 2 9.99
5 2 5.47
;
run;
data some_other_table;
input foo bar;
cards;
1 1
1 2
1 3
1 1
4 1
5 1
4 1
5 2
5 3
5 1
;
run;
%macro hash_noduprec(dataset);
data _null_;
if 0 then set &dataset;
dcl hash i(dataset:"&dataset",ordered:'a'); *order will be based on variable order in dataset;
i.definekey(all:'y');
i.definedata(all:'y');
i.definedone();
i.output(dataset:"&dataset._ndr");
stop;
run;
%mend;
%hash_noduprec(order_fact);
%hash_noduprec(some_other_table);
*I don't see why not to just use proc sort for this task;
proc sort data=order_fact nodupkey; by _all_; run;
Message was edited by: Matthew Kastin
Hi FriedEgg
When using noduprec you want to use all variables for sorting. What can happen else is that you get multiple obs in a by group and as Proc Sort Noduprec only compares consecutive observations you still might end up with duplicates (Noduprec is kind of deceptive - may be not one of the best things SAS invented).
proc sort data=order_fact noduprec; by _all_; run
Patrick.
Agree! option noduprec is almost useless .
Your code is exactly the same with FriedEgg .
Ksharp
Hi,
for dataset have:
data have;
input id age;
cards;
1 20
1 20
2 30
3 40
;
do you want
1 20
2 30
3 40
or
2 30
3 40
?
Alisa,
Agree with Haikuo. Your teacher should amend the code. It's most definitely clumsy.
Thanks
Patrick
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.