Help using Base SAS procedures

Using a Hiter Object to Eliminate Duplicates

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 90
Accepted Solution

Using a Hiter Object to Eliminate Duplicates

Hello Everyone,

So I am stuck on a problem on how to use a hiter object to eliminate duplicates from a data set in order to create a new one.

My example data set is orion.order_fact:

                                                                      Order_

                                 Obs     Customer_ID     Type

                                   1               4                 3

                                   2               4                 1

                                   3               4                 3

                                   4               4                 3

                                   5               4                 3

                                   6               4                 1

                                   7               4                 1

                                   8               4                 3

                                   9               4                 1

                                  10               5                 2

                                  11               5                 1

                                  12               5                 2

                                  13               5                 2

                                  14               5       2

                                  15               5       1

                                  16               5       2

                                  17               5       2

                                  18               5       2

                                  19               5       1

                                  20               5       1

What I need to do is to eliminate the duplicate values of Customer_ID and Order_Type so only one record would appear like

Customer_ID     Order_Type

4                         1

4                         3

5                         1

5                         2

So far my code attempt is pretty pathetic since I really don't know what I am doing:

data different1;

   length Customer_ID 8 Order_Type 8.;

   if _N_=1 then do;

   if 0 then set orion.order_fact(keep=Customer_ID Order_Type);/*Using this method instead of the call missing to initialize the variables*/

   declare hash test(dataset:'orion.order_fact',ordered='A',duplicate:'r');

   test.definekey('Customer_ID','Order_Type');

   test.definedata('Customer_ID','Order_Type');

   test.definedone();

   declare hiter t('test');

   /*call missing('Customer_ID','Order_Type');*/

   end;

   t.removeDup();

   if rc=0 then output different1;

   stop;

run;

proc print data=different1 (obs=10);

run;

If anyone out there has any suggestions on what I should do to remedy this problem, please let me know.

I know there are other ways I could go about this without using a hiter object, but this is for a homework assignment and I need to use a hiter object.

Thanks,

Alisa


Accepted Solutions
Solution
‎04-24-2012 12:14 AM
Super User
Posts: 10,023

Re: Using a Hiter Object to Eliminate Duplicates

Posted in reply to InfoAlisaA

You don't need Hiter Object. Since you didn't use Hash Table parameter  multidata:'Y'  .

  
data order_fact;
input Obs     Customer_ID     Order_Type ;
cards;
1               4                 3
2               4                 1
3               4                 3
15               5       1
16               5       2
17               5       2
;
run;
data different1;
   if _N_=1 then do;
   if 0 then set order_fact(keep=Customer_ID Order_Type);/*Using this method instead of the call missing to initialize the variables*/
   declare hash test(dataset:'order_fact',ordered:'A',duplicate:'r');
   test.definekey('Customer_ID','Order_Type');
   test.definedata('Customer_ID','Order_Type');
   test.definedone();
end;
test.output(dataset:'want');
stop;
run;


Ksharp

View solution in original post


All Replies
Solution
‎04-24-2012 12:14 AM
Super User
Posts: 10,023

Re: Using a Hiter Object to Eliminate Duplicates

Posted in reply to InfoAlisaA

You don't need Hiter Object. Since you didn't use Hash Table parameter  multidata:'Y'  .

  
data order_fact;
input Obs     Customer_ID     Order_Type ;
cards;
1               4                 3
2               4                 1
3               4                 3
15               5       1
16               5       2
17               5       2
;
run;
data different1;
   if _N_=1 then do;
   if 0 then set order_fact(keep=Customer_ID Order_Type);/*Using this method instead of the call missing to initialize the variables*/
   declare hash test(dataset:'order_fact',ordered:'A',duplicate:'r');
   test.definekey('Customer_ID','Order_Type');
   test.definedata('Customer_ID','Order_Type');
   test.definedone();
end;
test.output(dataset:'want');
stop;
run;


Ksharp

Frequent Contributor
Posts: 90

Re: Using a Hiter Object to Eliminate Duplicates

Hi Ksharp,

This worked out perfectly!

Thanks for your help!

Alisa

Respected Advisor
Posts: 3,156

Re: Using a Hiter Object to Eliminate Duplicates

Posted in reply to InfoAlisaA

Hi Alisa,

As far as I know, there is NO way that you can use hash hiter to achieve this task, as eliminating duplicates implies needed keys, while hiter does not use keys ( not that type of keys anyway). As you can see from Ksharp's post, hash() has been built in this capacity to do the job. And there are also ways to do it more strenuously using removedup(), but I can't hardly see any roles that hiter can play. 

Since you mentioned using hiter  was part of your assignment requirement,  I am more than interested to know the answers after they are released by your instructor. Please report it back. Thanks in advance!

Haikuo

Frequent Contributor
Posts: 90

Re: Using a Hiter Object to Eliminate Duplicates

Hi Hai.kuo,

Here is the answer that my professor gave:

data different;

   drop rc;

   if _N_=1 then do;

      if 0 then set orion.order_fact(keep=Customer_ID

                                          Order_Type);

      declare hash orders(dataset: 'orion.order_fact',

                          ordered: 'yes');

      declare hiter O_F('orders');

      orders.defineKey('Customer_ID', 'Order_Type');

      orders.defineData('Customer_ID', 'Order_Type');

      orders.defineDone();

   end;

rc=O_F.first();

   do while (rc=0);

      output;

      rc=O_F.next();

   end;

   stop;

run;

proc print data=different(obs=10);

   title "No Duplicates";

run;

Respected Advisor
Posts: 3,156

Re: Using a Hiter Object to Eliminate Duplicates

Posted in reply to InfoAlisaA

Alisa,

So as you can see, hiter does not remove duplicates. Hiter here just simply reads in every record in the hash object that is already been processed in term of 'eliminating duplicates'. Hash will load  unique records only when not specifying 'multidata' as option.

BTW, IMHO, comparing to Ksharp's approach of output(), using hiter here is kinda clumsy.

Thanks,

Haikuo

N/A
Posts: 1

Re: Using a Hiter Object to Eliminate Duplicates

Hai:

May I ask a question?

I have many tables with hundreds of columns and thousands of rows.

I need to eliminate the dupplicated rows in each of these tables.

Do you have a generic script that I can run on each ot the tables that delete the dupplicated records and leave the records that do not have dupplicates?

Thanks a lot

Jorge sanchez.

Trusted Advisor
Posts: 1,301

Re: Using a Hiter Object to Eliminate Duplicates

Posted in reply to jorge_sanchez

data order_fact;

input customer_id order_type amt;

cards;

4 3 4.99

4 1 1.00

4 3 0.99

5 1 0.02

5 2 9.99

5 2 5.47

;

run;

data some_other_table;

input foo bar;

cards;

1 1

1 2

1 3

1 1

4 1

5 1

4 1

5 2

5 3

5 1

;

run;

%macro hash_noduprec(dataset);

data _null_;

  if 0 then set &dataset;

  dcl hash i(dataset:"&dataset",ordered:'a'); *order will be based on variable order in dataset;

   i.definekey(all:'y');

   i.definedata(all:'y');

   i.definedone();

  i.output(dataset:"&dataset._ndr");

  stop;

run;

%mend;

%hash_noduprec(order_fact);

%hash_noduprec(some_other_table);

*I don't see why not to just use proc sort for this task;

proc sort data=order_fact nodupkey; by _all_; run;

Message was edited by: Matthew Kastin

Respected Advisor
Posts: 4,173

Re: Using a Hiter Object to Eliminate Duplicates

Hi FriedEgg

When using noduprec you want to use all variables for sorting. What can happen else is that you get multiple obs in a by group and as Proc Sort Noduprec only compares consecutive observations you still might end up with duplicates (Noduprec is kind of deceptive - may be not one of the best things SAS invented).

proc sort data=order_fact noduprec; by _all_; run

Super User
Posts: 10,023

Re: Using a Hiter Object to Eliminate Duplicates

Patrick.

Agree!  option noduprec is almost useless .

Your code is exactly the same with FriedEgg .

Ksharp

Super Contributor
Posts: 1,636

Re: Using a Hiter Object to Eliminate Duplicates

Posted in reply to jorge_sanchez

Hi,

for dataset have:

data have;

input id age;

cards;

1 20

1 20

2 30

3 40

;

do you want

1 20

2 30

3 40

or

2 30

3 40

?

Respected Advisor
Posts: 4,173

Re: Using a Hiter Object to Eliminate Duplicates

Posted in reply to InfoAlisaA

Alisa,

Agree with Haikuo. Your teacher should amend the code. It's most definitely clumsy.

Thanks

Patrick

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 492 views
  • 0 likes
  • 7 in conversation