BookmarkSubscribeRSS Feed
CharlotteCain
Quartz | Level 8

Hey SAS Folks,

I am in need of rather efficient logic perhaps using HASH Obj rather than typical joins for my problem. I have Dataset A and Dataset B with 100 and 80 variables respectively. I need to combine A and B with a left join with the following conditions:

1. variable1 of A= variable1 of B

2.variable2 of A=variable4 of B

3.variable5 of A= variable5 of B

4. pick only the last 20 variables from Dataset B

Lets call the resultant datatset as AB

Now, I need to left join AB dataset with Datatset C on conditions similar like the above and create the joined Dataset ABC. This process repeats and follows with ABC dataSET combining with Dataset D again on a left join to get the result ABCD with new join and filter  conditions as mentioned above. So, Basically the dataset keeps growing if I have explained well enough!!

If you notice, it's pretty easy to keep writing too many join queries that is highly I/O intensive and less efficient. I would appreciate a solution that I believe can be done using Hash Obj or perhaps in other methods that will be give me the most effective solutions coz each of my datatsets are like 20 million large record.

Any help would be most appreciated,

Thanks,

Charlotte from England 

2 REPLIES 2
jakarman
Barite | Level 11

The problem is stille not well defined in:

- The  limitations you have. Hardware, machine, logical limitations the Operating system you are using.

  Some hardware with multiple processors (you have that) can be usefull exploited by SAS versions. Wich do you use?

  How much internal memory do you have. Using hash-objects implies building a copy of the dataset into internal memory.

- Why it is not possible to do the join first for all datasets ABCD and limit on variables while joining?

- are all variables of A kept and you are limiting them in B?

- As you are mentioning I/O I would guess the datasets with 20M records are about 2Gb of size?

  You are looking voor turn-around (elaps) time optimization?     

Coding Joining  with SQL is easy coding. The processes can be multithreading effectively using multiple cores.

It will possible do random IO (slow-much overhead).


Coding Joining a SAS datastep (eventuly hash) is more work but gives more control.

Having all datasets ordered a merge statement with some additional logic can effectively process the sequential data.

If you have all data on different dasd or being virtualized having a great read ahead cache, this will perform well.    

What is what you have done until now?     

---->-- ja karman --<-----
Patrick
Opal | Level 21

In addition to what Jaap already said:

Hash tables get loaded into memory. Considering the size of your tables do you have enough memory for this?

"ABCD with new join and filter conditions..."

If you can't use hash tables then it's either Merge or SQL - eventually using indexes. What's the most efficient approach will largely depend on your actual data and we would need a good understanding of it to come up with a valid proposition.

You say that the result table will be growing. Is this simply because you're adding variables or will there also be additional rows (so 1:many relationships between the tables)?

May be you give us some sample data - eg. by posting 3 data steps creating such data representative for what you're actually dealing with.

As I understand you the join condition varies (different variables and different logic) between the tables.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 397 views
  • 0 likes
  • 3 in conversation