BookmarkSubscribeRSS Feed
mrzlatan91
Obsidian | Level 7

Hi guys,

 

I have 2 Tables with a One-To-Many Relationship.

The first Table, lets say Table a, is a big Table which contains the columns Date, customerID and Revenue.

The second Table b contains Date, customerID, wealth and adress. but only of a few (premium) rich customers.

It´s possible, that we have more than one Observations with the same Date and Customer ID in Table 1 (If he buys more than one products in one month),

in Table b, only one Observation for each Date/CustomerID-Combination is possible.

 

So, the Tables could look like this:

 

data a;
input date $ customerId $ Revenue $ ;
datalines;
01JAN01 1 10

01JAN01 1 20

01JAN01 2 30
01FEB01 1 400

01FEB01 2 50

01FEB01 3 60

01FEB01 4 30

01MAR01 1 60

01MAR01 2 30

01APR01 1 60

01APR01 2 60

01MAY01 1 800

01MAY01 2 608

01MAY01 2 67

01MAY01 4 65

01MAY01 5 644
;
run;

 

And Table b could look like this:

 

data b;
input date $ customerId $ wealth $ Adress $;
datalines;
01JAN01 1 1000 Stockholm

01JAN01 2 2000 Paris
01FEB01 1 4000 Munich

01MAR01 1 6000 New York

01MAR01 2 3000 Cologne

01MAR01 3 3000 London
;
run;

 

Now, I want to make an Inner Join to join both Tables when a.Date = b.Date and a.CustomerID = b.CustomerID.

The final table should have the following columns: date, customerID, Revenue and wealth.

In case that the criteria is met for more combinations of table a, the desired table should contain all rows.

So in this Scenario, the final table should look like this:

 

DateCustomerIDRevenueWealth
Jan 01110100000
Jan 01120100000
Jan 01230300000
Feb 01130120000
Mrz 011510130000
Mrz 01210200000
Mrz 01310500000

 

This would not be a Problem with SQL, but my Professor wants that I use a Hash-Table, because for large datasets, this would be more efficient.

Could someone kindly provide me a Hash Solution for this?

 

Note: This is just a simplified example that I´ve made. Pls don´t care about the sense of the data, let´s say rich customers can Switch their adress and wealth every month or can be banned from the premium list in every month 🙂

 

With the kindest regards,

Alex

 

 

11 REPLIES 11
RW9
Diamond | Level 26 RW9
Diamond | Level 26

We are not really here to answer your course questions.  Investigate and learn it and come back with any specific questions:

https://www.sas.com/content/dam/SAS/en_ca/User%20Group%20Presentations/Regina-User-Group/ShaunKaufma...

 

mrzlatan91
Obsidian | Level 7

Well,this is not really a course, It´s a part of my Master Thesis.

I´ve tried to solve my Problem but unfortunately I´m not very familiar with Hash Tables,

I´m only able to solve the Problem with SQL.

 

So, If anyone else could help me I´d really appreciate it.

andreas_lds
Jade | Level 19

Have you read the documentation? Using hash tables is well explained in numerous papers, e.g. Introduction to SAS® Hash Objects - SAS Support

Astounding
PROC Star

Just to add a little bit of a framework, here are the things you will need to learn.

 

How to load table B into a hash table, where the combination of CustomerID and Date are the keys, and wealth and address are the data elements.

 

In the same DATA step, read in observations from table A.  For each, see if the hash table contains a match.  If so, retrieve the matching data from the hash table and output the observation.  (If no match, do nothing.)

mrzlatan91
Obsidian | Level 7

Thanks for your help. Ive tried this, but I´m not really sure if this is the right solution for my Problem.

How can I define that I want a one-to-many relationship?

 

data desired;

if 0 then set work.b;   /* loading in my small table b*/

declare hash VS(dataset:”work.b”);

rc=VS.defineKey(“date”,”customerid”); /*key = date & customer id */

rc=VS.defineData(“date”,”customerid”, “wealth”); /*without address, cause I don’t want it*/

rc=VS.definedone();

do until(eof);

set work.a end=eof; /*loading my big table */

call missing(date, customerid, revenue); /*Is This right? I don’t really get what call missing does to be honest */

rc=VS.find();

if rc=0 then output; /*because I want an inner join */

end;

stop;

run;

 

 

andreas_lds
Jade | Level 19

Did the result match your expectation?

 

Some minor fixes, but untested:

data desired;
   if 0 then set work.b(keep=date customerid wealth);   /* loading in my small table b*/

   if _n_ = 1 then do;
      declare hash VS(dataset:"work.b");
      rc=VS.defineKey("date","customerid"); /*key = date & customer id */
      rc=VS.defineData("wealth"); /*without address, cause I don’t want it */
      rc=VS.definedone();

      call missing(wealth);
   end;
   
   set work.a;

   if vs.find() = 0;
run;

 

Edit: Added keep-option to  if 0 then set work.b

novinosrin
Tourmaline | Level 20

@mrzlatan91wrote:

 

So in this Scenario, the final table should look like this:

 

DateCustomerIDRevenueWealth
Jan 01110100000
Jan 01120100000
Jan 01230300000
Feb 01130120000
Mrz 011510130000
Mrz 01210200000
Mrz 01310500000

 

This would not be a Problem with SQL, but my Professor wants that I use a Hash-Table, because for large datasets, this would be more efficient.

Could someone kindly provide me a Hash Solution for this?

 

Note: This is just a simplified example that I´ve made. Pls don´t care about the sense of the data, let´s say rich customers can Switch their adress and wealth every month or can be banned from the premium list in every month 🙂

 

With the kindest regards,

Alex

 

 

 


Good morning, I fail to understand why your wealth values in your final table are way off than what it is in data b?

mrzlatan91
Obsidian | Level 7

Dear novinosrin,

sorry, my mistake. I tried to make an other example before, but switched to the current one, but I may have copied the wrong table.

Table b Looks like this:

 

data b;
input date $ customerId $ wealth $ Adress $;
datalines;
01JAN01 1 100000 Stockholm

01JAN01 2 300000 Paris
01FEB01 1 120000 Munich

01MAR01 1 130000 New York

01MAR01 2 200000 Cologne

01MAR01 3 500000 London
;
run;

novinosrin
Tourmaline | Level 20

@mrzlatan91 No worries mate. It appears @andreas_lds has given you the solution. Have you tested it?

mrzlatan91
Obsidian | Level 7

Unfortunately, I can only test it on friday.

I´ll tell you then if it works.

 

But thanks mate 🙂

_s_
Fluorite | Level 6 _s_
Fluorite | Level 6

Alex:

Learning to link data tables via hash indexes offers you a chance to learn more about how database systems optimize inner joins. I suspect that your professor knows that in SAS SQL an ordinary inner join of two tables, if a hash index of one will fit into memory, will optimize as a scan of a dynamic hash index. Even so, once you learn how to program the inner join using a hash index, it will be easier to learn how to program a left join using a hash index. The SAS SQL compiler does not, last time I checked, optimize a left join. So except for academic purposes, a SAS SQL SELECT ... FROM r1 INNER JOIN r2 ON r1.key = r2.key will usually execute as fast or faster than a Data step that generates a hash index. Over on the SAS-L side, that discussion has continued off and on for decades.

S

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1612 views
  • 0 likes
  • 6 in conversation