BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
InfoAlisaA
Calcite | Level 5

Hello Everyone,

So I am stuck on a problem on how to use a hiter object to eliminate duplicates from a data set in order to create a new one.

My example data set is orion.order_fact:

                                                                      Order_

                                 Obs     Customer_ID     Type

                                   1               4                 3

                                   2               4                 1

                                   3               4                 3

                                   4               4                 3

                                   5               4                 3

                                   6               4                 1

                                   7               4                 1

                                   8               4                 3

                                   9               4                 1

                                  10               5                 2

                                  11               5                 1

                                  12               5                 2

                                  13               5                 2

                                  14               5       2

                                  15               5       1

                                  16               5       2

                                  17               5       2

                                  18               5       2

                                  19               5       1

                                  20               5       1

What I need to do is to eliminate the duplicate values of Customer_ID and Order_Type so only one record would appear like

Customer_ID     Order_Type

4                         1

4                         3

5                         1

5                         2

So far my code attempt is pretty pathetic since I really don't know what I am doing:

data different1;

   length Customer_ID 8 Order_Type 8.;

   if _N_=1 then do;

   if 0 then set orion.order_fact(keep=Customer_ID Order_Type);/*Using this method instead of the call missing to initialize the variables*/

   declare hash test(dataset:'orion.order_fact',ordered='A',duplicate:'r');

   test.definekey('Customer_ID','Order_Type');

   test.definedata('Customer_ID','Order_Type');

   test.definedone();

   declare hiter t('test');

   /*call missing('Customer_ID','Order_Type');*/

   end;

   t.removeDup();

   if rc=0 then output different1;

   stop;

run;

proc print data=different1 (obs=10);

run;

If anyone out there has any suggestions on what I should do to remedy this problem, please let me know.

I know there are other ways I could go about this without using a hiter object, but this is for a homework assignment and I need to use a hiter object.

Thanks,

Alisa

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

You don't need Hiter Object. Since you didn't use Hash Table parameter  multidata:'Y'  .

  
data order_fact;
input Obs     Customer_ID     Order_Type ;
cards;
1               4                 3
2               4                 1
3               4                 3
15               5       1
16               5       2
17               5       2
;
run;
data different1;
   if _N_=1 then do;
   if 0 then set order_fact(keep=Customer_ID Order_Type);/*Using this method instead of the call missing to initialize the variables*/
   declare hash test(dataset:'order_fact',ordered:'A',duplicate:'r');
   test.definekey('Customer_ID','Order_Type');
   test.definedata('Customer_ID','Order_Type');
   test.definedone();
end;
test.output(dataset:'want');
stop;
run;


Ksharp

View solution in original post

11 REPLIES 11
Ksharp
Super User

You don't need Hiter Object. Since you didn't use Hash Table parameter  multidata:'Y'  .

  
data order_fact;
input Obs     Customer_ID     Order_Type ;
cards;
1               4                 3
2               4                 1
3               4                 3
15               5       1
16               5       2
17               5       2
;
run;
data different1;
   if _N_=1 then do;
   if 0 then set order_fact(keep=Customer_ID Order_Type);/*Using this method instead of the call missing to initialize the variables*/
   declare hash test(dataset:'order_fact',ordered:'A',duplicate:'r');
   test.definekey('Customer_ID','Order_Type');
   test.definedata('Customer_ID','Order_Type');
   test.definedone();
end;
test.output(dataset:'want');
stop;
run;


Ksharp

InfoAlisaA
Calcite | Level 5

Hi Ksharp,

This worked out perfectly!

Thanks for your help!

Alisa

Haikuo
Onyx | Level 15

Hi Alisa,

As far as I know, there is NO way that you can use hash hiter to achieve this task, as eliminating duplicates implies needed keys, while hiter does not use keys ( not that type of keys anyway). As you can see from Ksharp's post, hash() has been built in this capacity to do the job. And there are also ways to do it more strenuously using removedup(), but I can't hardly see any roles that hiter can play. 

Since you mentioned using hiter  was part of your assignment requirement,  I am more than interested to know the answers after they are released by your instructor. Please report it back. Thanks in advance!

Haikuo

InfoAlisaA
Calcite | Level 5

Hi Hai.kuo,

Here is the answer that my professor gave:

data different;

   drop rc;

   if _N_=1 then do;

      if 0 then set orion.order_fact(keep=Customer_ID

                                          Order_Type);

      declare hash orders(dataset: 'orion.order_fact',

                          ordered: 'yes');

      declare hiter O_F('orders');

      orders.defineKey('Customer_ID', 'Order_Type');

      orders.defineData('Customer_ID', 'Order_Type');

      orders.defineDone();

   end;

rc=O_F.first();

   do while (rc=0);

      output;

      rc=O_F.next();

   end;

   stop;

run;

proc print data=different(obs=10);

   title "No Duplicates";

run;

Haikuo
Onyx | Level 15

Alisa,

So as you can see, hiter does not remove duplicates. Hiter here just simply reads in every record in the hash object that is already been processed in term of 'eliminating duplicates'. Hash will load  unique records only when not specifying 'multidata' as option.

BTW, IMHO, comparing to Ksharp's approach of output(), using hiter here is kinda clumsy.

Thanks,

Haikuo

jorge_sanchez
Calcite | Level 5

Hai:

May I ask a question?

I have many tables with hundreds of columns and thousands of rows.

I need to eliminate the dupplicated rows in each of these tables.

Do you have a generic script that I can run on each ot the tables that delete the dupplicated records and leave the records that do not have dupplicates?

Thanks a lot

Jorge sanchez.

FriedEgg
SAS Employee

data order_fact;

input customer_id order_type amt;

cards;

4 3 4.99

4 1 1.00

4 3 0.99

5 1 0.02

5 2 9.99

5 2 5.47

;

run;

data some_other_table;

input foo bar;

cards;

1 1

1 2

1 3

1 1

4 1

5 1

4 1

5 2

5 3

5 1

;

run;

%macro hash_noduprec(dataset);

data _null_;

  if 0 then set &dataset;

  dcl hash i(dataset:"&dataset",ordered:'a'); *order will be based on variable order in dataset;

   i.definekey(all:'y');

   i.definedata(all:'y');

   i.definedone();

  i.output(dataset:"&dataset._ndr");

  stop;

run;

%mend;

%hash_noduprec(order_fact);

%hash_noduprec(some_other_table);

*I don't see why not to just use proc sort for this task;

proc sort data=order_fact nodupkey; by _all_; run;

Message was edited by: Matthew Kastin

Patrick
Opal | Level 21

Hi FriedEgg

When using noduprec you want to use all variables for sorting. What can happen else is that you get multiple obs in a by group and as Proc Sort Noduprec only compares consecutive observations you still might end up with duplicates (Noduprec is kind of deceptive - may be not one of the best things SAS invented).

proc sort data=order_fact noduprec; by _all_; run

Ksharp
Super User

Patrick.

Agree!  option noduprec is almost useless .

Your code is exactly the same with FriedEgg .

Ksharp

Linlin
Lapis Lazuli | Level 10

Hi,

for dataset have:

data have;

input id age;

cards;

1 20

1 20

2 30

3 40

;

do you want

1 20

2 30

3 40

or

2 30

3 40

?

Patrick
Opal | Level 21

Alisa,

Agree with Haikuo. Your teacher should amend the code. It's most definitely clumsy.

Thanks

Patrick

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1967 views
  • 2 likes
  • 7 in conversation