DATA Step, Macro, Functions and more

Hash Table

Reply
Occasional Contributor
Posts: 16

Hash Table

Hi guys,

 

I have 2 Tables with a One-To-Many Relationship.

The first Table, lets say Table a, is a big Table which contains the columns Date, customerID and Revenue.

The second Table b contains Date, customerID, wealth and adress. but only of a few (premium) rich customers.

It´s possible, that we have more than one Observations with the same Date and Customer ID in Table 1 (If he buys more than one products in one month),

in Table b, only one Observation for each Date/CustomerID-Combination is possible.

 

So, the Tables could look like this:

 

data a;
input date $ customerId $ Revenue $ ;
datalines;
01JAN01 1 10

01JAN01 1 20

01JAN01 2 30
01FEB01 1 400

01FEB01 2 50

01FEB01 3 60

01FEB01 4 30

01MAR01 1 60

01MAR01 2 30

01APR01 1 60

01APR01 2 60

01MAY01 1 800

01MAY01 2 608

01MAY01 2 67

01MAY01 4 65

01MAY01 5 644
;
run;

 

And Table b could look like this:

 

data b;
input date $ customerId $ wealth $ Adress $;
datalines;
01JAN01 1 1000 Stockholm

01JAN01 2 2000 Paris
01FEB01 1 4000 Munich

01MAR01 1 6000 New York

01MAR01 2 3000 Cologne

01MAR01 3 3000 London
;
run;

 

Now, I want to make an Inner Join to join both Tables when a.Date = b.Date and a.CustomerID = b.CustomerID.

The final table should have the following columns: date, customerID, Revenue and wealth.

In case that the criteria is met for more combinations of table a, the desired table should contain all rows.

So in this Scenario, the final table should look like this:

 

DateCustomerIDRevenueWealth
Jan 01110100000
Jan 01120100000
Jan 01230300000
Feb 01130120000
Mrz 011510130000
Mrz 01210200000
Mrz 01310500000

 

This would not be a Problem with SQL, but my Professor wants that I use a Hash-Table, because for large datasets, this would be more efficient.

Could someone kindly provide me a Hash Solution for this?

 

Note: This is just a simplified example that I´ve made. Pls don´t care about the sense of the data, let´s say rich customers can Switch their adress and wealth every month or can be banned from the premium list in every month Smiley Happy

 

With the kindest regards,

Alex

 

 

Super User
Super User
Posts: 9,227

Re: Hash Table

Posted in reply to mrzlatan91

We are not really here to answer your course questions.  Investigate and learn it and come back with any specific questions:

https://www.sas.com/content/dam/SAS/en_ca/User%20Group%20Presentations/Regina-User-Group/ShaunKaufma...

 

Occasional Contributor
Posts: 16

Re: Hash Table

Well,this is not really a course, It´s a part of my Master Thesis.

I´ve tried to solve my Problem but unfortunately I´m not very familiar with Hash Tables,

I´m only able to solve the Problem with SQL.

 

So, If anyone else could help me I´d really appreciate it.

Super Contributor
Posts: 500

Re: Hash Table

Posted in reply to mrzlatan91

Have you read the documentation? Using hash tables is well explained in numerous papers, e.g. Introduction to SAS® Hash Objects - SAS Support

Super User
Posts: 6,543

Re: Hash Table

Posted in reply to mrzlatan91

Just to add a little bit of a framework, here are the things you will need to learn.

 

How to load table B into a hash table, where the combination of CustomerID and Date are the keys, and wealth and address are the data elements.

 

In the same DATA step, read in observations from table A.  For each, see if the hash table contains a match.  If so, retrieve the matching data from the hash table and output the observation.  (If no match, do nothing.)

Occasional Contributor
Posts: 16

Re: Hash Table

Posted in reply to Astounding

Thanks for your help. Ive tried this, but I´m not really sure if this is the right solution for my Problem.

How can I define that I want a one-to-many relationship?

 

data desired;

if 0 then set work.b;   /* loading in my small table b*/

declare hash VS(dataset:”work.b”);

rc=VS.defineKey(“date”,”customerid”); /*key = date & customer id */

rc=VS.defineData(“date”,”customerid”, “wealth”); /*without address, cause I don’t want it*/

rc=VS.definedone();

do until(eof);

set work.a end=eof; /*loading my big table */

call missing(date, customerid, revenue); /*Is This right? I don’t really get what call missing does to be honest */

rc=VS.find();

if rc=0 then output; /*because I want an inner join */

end;

stop;

run;

 

 

Super Contributor
Posts: 500

Re: Hash Table

[ Edited ]
Posted in reply to mrzlatan91

Did the result match your expectation?

 

Some minor fixes, but untested:

data desired;
   if 0 then set work.b(keep=date customerid wealth);   /* loading in my small table b*/

   if _n_ = 1 then do;
      declare hash VS(dataset:"work.b");
      rc=VS.defineKey("date","customerid"); /*key = date & customer id */
      rc=VS.defineData("wealth"); /*without address, cause I don’t want it */
      rc=VS.definedone();

      call missing(wealth);
   end;
   
   set work.a;

   if vs.find() = 0;
run;

 

Edit: Added keep-option to  if 0 then set work.b

PROC Star
Posts: 1,351

Re: Hash Table

Posted in reply to mrzlatan91

@mrzlatan91wrote:

 

So in this Scenario, the final table should look like this:

 

DateCustomerIDRevenueWealth
Jan 01110100000
Jan 01120100000
Jan 01230300000
Feb 01130120000
Mrz 011510130000
Mrz 01210200000
Mrz 01310500000

 

This would not be a Problem with SQL, but my Professor wants that I use a Hash-Table, because for large datasets, this would be more efficient.

Could someone kindly provide me a Hash Solution for this?

 

Note: This is just a simplified example that I´ve made. Pls don´t care about the sense of the data, let´s say rich customers can Switch their adress and wealth every month or can be banned from the premium list in every month Smiley Happy

 

With the kindest regards,

Alex

 

 

 


Good morning, I fail to understand why your wealth values in your final table are way off than what it is in data b?

Occasional Contributor
Posts: 16

Re: Hash Table

Posted in reply to novinosrin

Dear novinosrin,

sorry, my mistake. I tried to make an other example before, but switched to the current one, but I may have copied the wrong table.

Table b Looks like this:

 

data b;
input date $ customerId $ wealth $ Adress $;
datalines;
01JAN01 1 100000 Stockholm

01JAN01 2 300000 Paris
01FEB01 1 120000 Munich

01MAR01 1 130000 New York

01MAR01 2 200000 Cologne

01MAR01 3 500000 London
;
run;

PROC Star
Posts: 1,351

Re: Hash Table

Posted in reply to mrzlatan91

@mrzlatan91 No worries mate. It appears @andreas_lds has given you the solution. Have you tested it?

Occasional Contributor
Posts: 16

Re: Hash Table

Posted in reply to novinosrin

Unfortunately, I can only test it on friday.

I´ll tell you then if it works.

 

But thanks mate Smiley Happy

Occasional Contributor _s_
Occasional Contributor
Posts: 5

Re: Hash Table

Posted in reply to mrzlatan91

Alex:

Learning to link data tables via hash indexes offers you a chance to learn more about how database systems optimize inner joins. I suspect that your professor knows that in SAS SQL an ordinary inner join of two tables, if a hash index of one will fit into memory, will optimize as a scan of a dynamic hash index. Even so, once you learn how to program the inner join using a hash index, it will be easier to learn how to program a left join using a hash index. The SAS SQL compiler does not, last time I checked, optimize a left join. So except for academic purposes, a SAS SQL SELECT ... FROM r1 INNER JOIN r2 ON r1.key = r2.key will usually execute as fast or faster than a Data step that generates a hash index. Over on the SAS-L side, that discussion has continued off and on for decades.

S

Ask a Question
Discussion stats
  • 11 replies
  • 146 views
  • 0 likes
  • 6 in conversation