kuridisanjeev
Quartz | Level 8

Hello All.

Is there any way to get a Cartesian product of two datasets in a DATA step?

I have been studying the difference between MERGE and joins.

I am able to get left and right joins in the DATA step by using the IN= option, but I am not sure how to get a Cartesian product in the DATA step.

It is not a requirement of mine; I just want to know how to achieve this.

Could anyone help me get clarity on this?




Thanks & Regards.


Sanjeev.K


19 REPLIES
RichardinOz
Quartz | Level 8

Sanjeev

You may get someone providing you with a working solution to obtaining a Cartesian product using a data step but why would you bother?  Proc SQL does this without any fuss.  Anything else is just creating a roundabout way to solve a simple problem.  Do you have a business case for what you are asking?

Richard
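For reference, the PROC SQL route Richard mentions can be sketched as below. The tables ONE and TWO and their variables `a` and `b` are hypothetical sample data, not something from the original post. Listing two tables with no join condition pairs every row of the first with every row of the second.

```sas
/* Hypothetical sample data: ONE has 3 rows, TWO has 2 rows */
data one;
  input a $;
  datalines;
x
y
z
;

data two;
  input b;
  datalines;
1
2
;

/* No WHERE or ON clause, so every row of ONE is paired with every row of TWO */
proc sql;
  create table every_combination as
  select one.a, two.b
  from one, two;   /* FROM one CROSS JOIN two is an equivalent spelling */
quit;
```

With this sample data the result has 3 x 2 = 6 rows. SAS also writes a note to the log that the query involves a Cartesian product join.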

kuridisanjeev
Quartz | Level 8

Hi Richard.

As I already mentioned in my post, "it is not my actual requirement"; I just want to know how to approach this in the DATA step.

As a learning exercise, I tried all the SQL joins in the DATA step (left, right, inner, outer, etc.), but I got stuck on the Cartesian product. I too always prefer an SQL join to get a Cartesian product because it is straightforward and simple, and it does not require much coding either.

I am using this Communities forum as a learning opportunity to get clarity on many hidden things in SAS, even for requirements that do not come up in real-world work.

Please don't mind if I post a few silly requirements.



Regards.

Sanjeev.K

RichardinOz
Quartz | Level 8

Well, as I predicted a number of people responded to your challenge, proving

  1. Some people will enjoy devising ingenious solutions to problems even if they have no practical implementation.  "Art for Art's sake."  Well enough.
  2. The SAS data step is a very powerful beast in the right hands.

I would not regard this exercise as a "learning opportunity to get clarity".  Your best learning opportunities are taking the SAS training; reading the SAS documentation (I would recommend bookmarking functions, formats, and commonly used procs like SQL, MEANS, FREQ), and asking your more experienced colleagues to explain something you do not understand.

Richard

esjackso
Quartz | Level 8

It isn't part of MERGE in a DATA step, but here is a reference and code for a Cartesian product in a DATA step:

24652 - Generate every combination of observations between data sets

data every_combination;
  /* Set one of your data sets, usually the larger data set */
  set one;
  do i=1 to n;
    /* For every observation in the first data set,    */
    /* read in each observation in the second data set */
    set two point=i nobs=n;
    output;
  end;
run;

Hope that helps!

EJ

UPDATE: Richard is right. Learning aside, I have only ever done Cartesian products in SQL.

Ricardo_Neves
Obsidian | Level 7

Hi,

I have a process where I do a Cartesian product of 2 tables (one with data and one with parameters) using PROC SQL, and then I use a DATA step to apply the needed filters so that at the end only the rows where the data matches the parameter remain.

As you can imagine, this is a very inefficient process, so I'm trying to use this approach and/or a merge to apply the filters during the cross join so that the process becomes more efficient. Unfortunately it doesn't seem to be working, since I don't get the same results in both cases.

Here is a sample of what I'm doing:

 

    data campanha_000_testes;
        set base;
        do i = 1 to n;
            set parameter point=i nobs=n;

            if id = 000 and SYSTEM NE 'BBBB' then delete;

            if not missing('oferta Origem'n) then
                do;
                    if compress('oferta Origem'n) = compress(pack_dsc);
                end;

            if find(RP_ID_EXCLUDE,compress(put(pack_cod,10.))) then delete;

            if not missing(Desconto_mens_max) then
                do;
                    if Desconto_mens_min <= Beneficio <= Desconto_mens_max;
                end;

            if not missing(mensalidade_maxima) then
                do;
                    * distinguish by approach;
                    if abordagem in ('Client','Conta') and id ne 303 then
                        do;
                            if mensalidade_minima < mens_liq_conta <= mensalidade_maxima;
                        end;
                    else if compress(abordagem) in ('Client','Conta') and id = 303 then
                        do;
                            if mensalidade_minima < mens_liq_im_conta <= mensalidade_maxima;
                        end;
                    else if abordagem = 'NIF' then
                        do;
                            if mensalidade_minima < mens_liq_NIF <= mensalidade_maxima;
                        end;
                    else
                        do;
                            if Mensalidade_Minima < Mensalidade_Liq_Siva <= Mensalidade_maxima;
                        end;
                end;

            output;
        end;
    run;


Any ideas on how to perform this?


Thanks in advance
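One possible reason for the mismatch, offered as an assumption rather than a confirmed diagnosis: a subsetting IF or a DELETE does not just skip the current pass of the DO loop. It stops processing the current observation and returns to the top of the DATA step, so the remaining parameter rows are never tried for that base row. A minimal sketch with hypothetical tables (BASE with `id` and `value`, PARAMETER with a `lo`/`hi` range) uses CONTINUE instead, which skips only the current loop pass:

```sas
/* Hypothetical data: BASE rows carry a value, PARAMETER rows carry a range */
data base;
  input id value;
  datalines;
1 5
2 15
;

data parameter;
  input lo hi;
  datalines;
10 20
0 10
;

data matched;
  set base;
  do i = 1 to n;
    set parameter point=i nobs=n;
    /* CONTINUE skips only this parameter row. A subsetting IF or DELETE  */
    /* here would abandon the remaining parameter rows for this base row. */
    if not (lo < value <= hi) then continue;
    output;
  end;
  drop i;
run;
```

With this sample data MATCHED gets two rows, one per base row. If the CONTINUE line were written as the subsetting IF `if lo < value <= hi;`, base row 1 would be lost, because its first parameter row fails and the second is never read.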

 

UmaSingh
Calcite | Level 5

Thanks. The code really helped me get a Cartesian-type product.

Ksharp
Super User

And of course, don't forget that a hash table can also do it.

Brian_C_Brown
Calcite | Level 5

I think a hash table would look up at most one row.  So the resulting dataset would not be a Cartesian product.

Patrick
Opal | Level 21

@Brian_C_Brown

You can iterate over the full hash table either by using an ITER object or by loading the hash with a single value key for all rows and then use the do_over() method.
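A minimal sketch of the iterator variant, assuming hypothetical tables ONE and TWO where `b` uniquely identifies each row of TWO (without MULTIDATA, rows with a duplicate key would be dropped when the hash is loaded):

```sas
/* Hypothetical sample data */
data one;
  input a $;
  datalines;
x
y
z
;

data two;
  input b;
  datalines;
1
2
;

data every_combination;
  /* Make TWO's variables known to the compiler without reading any rows */
  if 0 then set two;
  if _n_ = 1 then do;
    /* Load all of TWO into memory once; assumes b is unique in TWO */
    declare hash h(dataset: 'two', ordered: 'a');
    h.defineKey('b');
    h.defineData(all: 'yes');
    h.defineDone();
    declare hiter hi('h');
  end;
  set one;
  /* Walk the whole hash for each row of ONE */
  rc = hi.first();
  do while (rc = 0);
    output;
    rc = hi.next();
  end;
  drop rc;
run;
```

Each row of ONE is written once per hash item, so this sample produces 3 x 2 = 6 rows. If ONE and TWO shared variable names, the values from the hash would overwrite those read from ONE.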

Haikuo
Onyx | Level 15

FWIW, Eric's code is good when you do a Cartesian product over two tables from top to toe. However, if you are doing it using BY variables (Cartesian products within groups, such as a many-to-many join), then Hash() seems to be the only data-step way to go.

On another note, I would generally agree with Richard's comments regarding SQL, which is built natively to do Cartesian product. But from some of my own experiences, sometimes Hash() does hold a performance edge over Proc SQL.

Just my 2 cents,

Haikuo
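The within-group case could be sketched as follows, assuming hypothetical tables LEFT and RIGHT that share a key `id`. The hash is loaded with MULTIDATA so that it keeps every RIGHT row per key; FIND retrieves the first match and FIND_NEXT the remaining ones.

```sas
/* Hypothetical sample data with repeated keys on both sides */
data left;
  input id x $;
  datalines;
1 a
1 b
2 c
;

data right;
  input id y $;
  datalines;
1 p
1 q
3 r
;

data many_to_many;
  /* Make RIGHT's variables known to the compiler without reading any rows */
  if 0 then set right;
  if _n_ = 1 then do;
    declare hash h(dataset: 'right', multidata: 'yes');
    h.defineKey('id');
    h.defineData('y');
    h.defineDone();
  end;
  set left;
  /* One output row per RIGHT row that shares this id */
  rc = h.find();
  do while (rc = 0);
    output;
    rc = h.find_next();
  end;
  drop rc;
run;
```

With this sample data the two LEFT rows for id 1 each pair with the two RIGHT rows for id 1, giving four rows. Ids 2 and 3 have no partner and produce nothing, so the result behaves like an inner join.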

Astounding
PROC Star

If you need the Cartesian product within BY variables, there is another way that a DATA step can accomplish this.  It's not necessarily the best way, but here's the approach FWIW.

For the smaller data set, sort it.  Then construct a format that translates the BY variable's value into two pieces of information:  the first and last observation number in the sorted data that matches that BY value.

Then perform the "join".  Read in an observation from the larger data set, use the format to retrieve which observation numbers match the value of the "BY" variable, and retrieve them using point=.  Altogether, it's probably 20 lines of code.
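One way the format-based approach could look, as a rough sketch rather than Astounding's actual code. The tables LEFT and RIGHT, the key `id`, and the format name `obsrange` are all hypothetical. The format label holds the first and last observation numbers separated by a blank, and unmatched keys map to the range `0 -1`, which makes the DO loop run zero times.

```sas
/* Hypothetical sample data */
data left;
  input id x $;
  datalines;
1 a
1 b
2 c
;

data right;
  input id y $;
  datalines;
1 p
1 q
3 r
;

proc sort data=right out=right_sorted;
  by id;
run;

/* Control data set: one format entry per id, label = "first last" */
data cntlin;
  set right_sorted end=done;
  by id;
  retain fmtname 'obsrange' type 'N' first_obs;
  if first.id then first_obs = _n_;
  if last.id then do;
    start = id;
    label = catx(' ', first_obs, _n_);
    output;
  end;
  if done then do;
    hlo = 'O';          /* OTHER: ids not present in RIGHT */
    label = '0 -1';     /* empty range, so the loop below is skipped */
    output;
  end;
  keep fmtname type start label hlo;
run;

proc format cntlin=cntlin;
run;

data many_to_many;
  set left;
  range = put(id, obsrange.);
  /* Read only the RIGHT rows whose observation numbers match this id */
  do p = input(scan(range, 1, ' '), 12.) to input(scan(range, 2, ' '), 12.);
    set right_sorted point=p;
    output;
  end;
  drop range p;
run;
```

With this sample data the result is the same four rows for id 1 that an inner many-to-many join would give.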

Haikuo
Onyx | Level 15

Wow, Astounding! You are squeezing the last drop of juice from conventional data step, very impressed!

Haikuo

donevans
Calcite | Level 5

If you want all the observations from A and all the observations from B, just "stack the datasets".

data fileAB;
     set fileA fileB;
run;

***********************

If you want all the data from A and B with a common identifier (ID).

data fileAB;
     merge fileA(in=a) fileB(in=b);
     by ID;
     if a and b;
run;

AjayKant
Calcite | Level 5

I would like to understand why SAS provides merging in the DATA step when SQL already does all that work. It is all about the way we think. My guess is that more advanced processing and new functionality were needed, and so this algorithm was created, because no other system can do what MERGE does here in terms of technical processing.

If they had only needed to build a Cartesian product, they would never have come up with this merge process.

And if it is possible to generate a Cartesian product, I am also interested to know how.

But my thinking always leans toward finding the most efficient and simplest way of processing.


Discussion stats
  • 19 replies
  • 25383 views
  • 7 likes
  • 14 in conversation