Re: Hash Table

mrzlatan91 · Posted 04-10-2018 08:10 AM

Hi guys,

I have 2 Tables with a One-To-Many Relationship.

The first Table, lets say Table a, is a big Table which contains the columns Date, customerID and Revenue.

The second Table b contains Date, customerID, wealth and adress. but only of a few (premium) rich customers.

It´s possible, that we have more than one Observations with the same Date and Customer ID in Table 1 (If he buys more than one products in one month),

in Table b, only one Observation for each Date/CustomerID-Combination is possible.

So, the Tables could look like this:

data a;
input date $ customerId $ Revenue $ ;
datalines;
01JAN01 1 10

01JAN01 1 20

01JAN01 2 30
01FEB01 1 400

01FEB01 2 50

01FEB01 3 60

01FEB01 4 30

01MAR01 1 60

01MAR01 2 30

01APR01 1 60

01APR01 2 60

01MAY01 1 800

01MAY01 2 608

01MAY01 2 67

01MAY01 4 65

01MAY01 5 644
;
run;

And Table b could look like this:

data b;
input date $ customerId $ wealth $ Adress $;
datalines;
01JAN01 1 1000 Stockholm

01JAN01 2 2000 Paris
01FEB01 1 4000 Munich

01MAR01 1 6000 New York

01MAR01 2 3000 Cologne

01MAR01 3 3000 London
;
run;

Now, I want to make an Inner Join to join both Tables when a.Date = b.Date and a.CustomerID = b.CustomerID.

The final table should have the following columns: date, customerID, Revenue and wealth.

In case that the criteria is met for more combinations of table a, the desired table should contain all rows.

So in this Scenario, the final table should look like this:

Date	CustomerID	Revenue	Wealth
Jan 01	1	10	100000
Jan 01	1	20	100000
Jan 01	2	30	300000
Feb 01	1	30	120000
Mrz 01	1	510	130000
Mrz 01	2	10	200000
Mrz 01	3	10	500000

This would not be a Problem with SQL, but my Professor wants that I use a Hash-Table, because for large datasets, this would be more efficient.

Could someone kindly provide me a Hash Solution for this?

Note: This is just a simplified example that I´ve made. Pls don´t care about the sense of the data, let´s say rich customers can Switch their adress and wealth every month or can be banned from the premium list in every month 🙂

With the kindest regards,

Alex

RW9 · Posted 04-10-2018 08:32 AM

We are not really here to answer your course questions. Investigate and learn it and come back with any specific questions:

https://www.sas.com/content/dam/SAS/en_ca/User%20Group%20Presentations/Regina-User-Group/ShaunKaufma...

mrzlatan91 · Posted 04-10-2018 08:44 AM

Well,this is not really a course, It´s a part of my Master Thesis.

I´ve tried to solve my Problem but unfortunately I´m not very familiar with Hash Tables,

I´m only able to solve the Problem with SQL.

So, If anyone else could help me I´d really appreciate it.

andreas_lds · Posted 04-10-2018 08:55 AM

Have you read the documentation? Using hash tables is well explained in numerous papers, e.g. Introduction to SAS® Hash Objects - SAS Support

Astounding · Posted 04-10-2018 09:07 AM

Just to add a little bit of a framework, here are the things you will need to learn.

How to load table B into a hash table, where the combination of CustomerID and Date are the keys, and wealth and address are the data elements.

In the same DATA step, read in observations from table A. For each, see if the hash table contains a match. If so, retrieve the matching data from the hash table and output the observation. (If no match, do nothing.)

mrzlatan91 · Posted 04-10-2018 09:27 AM

Thanks for your help. Ive tried this, but I´m not really sure if this is the right solution for my Problem.

How can I define that I want a one-to-many relationship?

data desired;

if 0 then set work.b; /* loading in my small table b*/

declare hash VS(dataset:”work.b”);

rc=VS.defineKey(“date”,”customerid”); /*key = date & customer id */

rc=VS.defineData(“date”,”customerid”, “wealth”); /*without address, cause I don’t want it*/

rc=VS.definedone();

do until(eof);

set work.a end=eof; /*loading my big table */

call missing(date, customerid, revenue); /*Is This right? I don’t really get what call missing does to be honest */

rc=VS.find();

if rc=0 then output; /*because I want an inner join */

end;

stop;

run;

andreas_lds · Posted 04-10-2018 09:35 AM

Did the result match your expectation?

Some minor fixes, but untested:

data desired;
   if 0 then set work.b(keep=date customerid wealth);   /* loading in my small table b*/

   if _n_ = 1 then do;
      declare hash VS(dataset:"work.b");
      rc=VS.defineKey("date","customerid"); /*key = date & customer id */
      rc=VS.defineData("wealth"); /*without address, cause I don’t want it */
      rc=VS.definedone();

      call missing(wealth);
   end;
   
   set work.a;

   if vs.find() = 0;
run;

Edit: Added keep-option to if 0 then set work.b

novinosrin · Posted 04-10-2018 10:37 AM

@mrzlatan91wrote:

So in this Scenario, the final table should look like this:

Date CustomerID Revenue Wealth
Jan 01 1 10 100000
Jan 01 1 20 100000
Jan 01 2 30 300000
Feb 01 1 30 120000
Mrz 01 1 510 130000
Mrz 01 2 10 200000
Mrz 01 3 10 500000

This would not be a Problem with SQL, but my Professor wants that I use a Hash-Table, because for large datasets, this would be more efficient.
Could someone kindly provide me a Hash Solution for this?

Note: This is just a simplified example that I´ve made. Pls don´t care about the sense of the data, let´s say rich customers can Switch their adress and wealth every month or can be banned from the premium list in every month 🙂

With the kindest regards,
Alex

Good morning, I fail to understand why your wealth values in your final table are way off than what it is in data b?

mrzlatan91 · Posted 04-10-2018 10:52 AM

Dear novinosrin,

sorry, my mistake. I tried to make an other example before, but switched to the current one, but I may have copied the wrong table.

Table b Looks like this:

data b;
input date $ customerId $ wealth $ Adress $;
datalines;
01JAN01 1 100000 Stockholm

01JAN01 2 300000 Paris
01FEB01 1 120000 Munich

01MAR01 1 130000 New York

01MAR01 2 200000 Cologne

01MAR01 3 500000 London
;
run;

novinosrin · Posted 04-10-2018 11:13 AM

@mrzlatan91 No worries mate. It appears @andreas_lds has given you the solution. Have you tested it?

mrzlatan91 · Posted 04-10-2018 11:18 AM

Unfortunately, I can only test it on friday.

I´ll tell you then if it works.

But thanks mate 🙂

_s_ · Posted 04-10-2018 05:59 PM

Alex:

Learning to link data tables via hash indexes offers you a chance to learn more about how database systems optimize inner joins. I suspect that your professor knows that in SAS SQL an ordinary inner join of two tables, if a hash index of one will fit into memory, will optimize as a scan of a dynamic hash index. Even so, once you learn how to program the inner join using a hash index, it will be easier to learn how to program a left join using a hash index. The SAS SQL compiler does not, last time I checked, optimize a left join. So except for academic purposes, a SAS SQL SELECT ... FROM r1 INNER JOIN r2 ON r1.key = r2.key will usually execute as fast or faster than a Data step that generates a hash index. Over on the SAS-L side, that discussion has continued off and on for decades.

S

Registration is open

SAS Training: Just a Click Away