04-25-2013 08:11 AM
Is there any way to get Cartesian product between two datasets in Datasetp???
I was doing some learning process on difference between Merge and Joins.
I can able to get left join and right joins in the datastep by using IN options.but i am not sure how to get Cartesian product in the datastep.
It it not my requirement but i just want know how to achieve this.
Could any one help me on this to get clarify..
04-25-2013 08:23 AM
You may get someone providing you with a working solution to obtaining a Cartesian product using a data step but why would you bother? Proc SQL does this without any fuss. Anything else is just creating a roundabout way to solve a simple problem. Do you have a business case for what you are asking?
04-25-2013 09:04 AM
As i already mentioned in my post,"it is not my Actual requirement",but i just want to know. how can we approach the datastep to do this.
In a learning process,i tried all the SQL joins in datastep like Left,Right,Inner,Outer Etc.But i was struck with Cartesian product.I too always prefer SQL join to get Cartesian product because its straight forward and simple and it wont required much coding as well.
I am using This Communities forum as a learning opportunity to get clarify many Hidden this in SAS ,even Those requirements not comes into the picture of real time.
Don't mind if i am posting few silly requirements.
04-26-2013 10:01 PM
Well, as I predicted a number of people responded to your challenge, proving
I would not regard this exercise as a "learning opportunity to get clarity". Your best learning opportunities are taking the SAS training; reading the SAS documentation (I would recommend bookmarking functions, formats, and commonly used procs like SQL, MEANS, FREQ), and asking your more experienced colleagues to explain something you do not understand.
04-25-2013 08:26 AM
It isnt part of the merge on a data step but here is a reference and code for catresian in a datastep
/* Set one of your data sets, usually the larger data set */
do i=1 to n;
/* For every observation in the first data set, */
/* read in each observation in the second data set */
set two point=i nobs=n;
Hope that helps!
UPDATE -- Richard is right -- learning aside -- I did cartesians in SQL only
04-25-2013 09:16 AM
FWIW, Eric's code is good when you do Cartesian Product over two tables from top to toe. However, if doing it using BY variables (Doing Cartesian Products within groups, such as many to many join), then Hash() seems to be the only data-step way to go.
On another note, I would generally agree with Richard's comments regarding SQL, which is built natively to do Cartesian product. But from some of my own experiences, sometimes Hash() does hold a performance edge over Proc SQL.
Just my 2 cents,
04-25-2013 11:18 AM
If you need the Cartesian product within BY variables, there is another way that a DATA step can accomplish this. It's not necessarily the best way, but here's the approach FWIW.
For the smaller data set, sort it. Then construct a format that translates the BY variable's value into two pieces of information: the first and last observation number in the sorted data that matches that BY value.
Then perform the "join". Read in an observation from the larger data set, use the format to retrieve which observation numbers match the value of the "BY" variable, and retrieve them using point=. Altogether, it's probably 20 lines of code.
04-25-2013 10:35 AM
If you want all the observations from A and all the observations from B, just "stack the datasets".
set fileA fileB;
If you want all the data from A and B with a common identifier (ID).
merge fileA(in=a) fileB(in=b);
if a and b;
01-17-2014 10:35 AM
Need to understand why SAS system generate merging in data step as SQL is already done all those work .. its all about the way we think ...I think we need more enhance processing we need to do new function and then they created this algorithms .. because any other system can't do what merge is doing here in terms of technical processing .
if they only need to build a Cartesian product they never find the this merge process ..
and if there is possible to generate the Cartesian product .. i am also interested to know .
But my think is always towards find the efficient and more simple function of processing.
01-19-2014 03:18 AM
As you are in learning approach.
The cartesian product is mathematical simple but can be for real execution a no go having bigger datasets.
For real big data problems there is move to NOSQL NoSQL - Wikipedia, the free encyclopedia
Do we not have something there that is nosql?.... SAS datastep
01-19-2014 10:28 AM
Hashing in SAS was added with SAS V9. SQL was already present in SAS V6.
NoSql is hyping today but with a closer look at that the same kind of technical approaches as in the time of the PC-XT.
The basic question is also the same:
"what to do when a simple logical approach is going beyond comfort-zone when trying to implement it".