Re: creating combinatorial output

deleted_user · Posted 08-27-2010 10:50 AM

I'm new to this forum. Sorry if I'm missusing it.

I have a dataset with five columns.
The first column identifies objects A, the second identifies objects B related to each of the objects in the first column, the third gives the number (n) of a certain characteristic (X) each object in the second column has, the fourth the number (m) of objects related to each object in the first column, while the fifth column contains the characterstic X.

235456 10456 3 2 PO
235456 10456 3 2 UI
235456 10456 3 2 UR
235456 14283 3 2 PO
235456 14283 3 2 AL
235456 14283 3 2 KO
.
.
.

I would like to produce a table with all combinations of the characteristic X contained in the material. For each object A there are n to the power of m possible combinations if I'm not misstaken. From the example above
PO-PO
PO-AL
PO-KO
UI-PO
UI-AL
UI-KO
UR-PO
UR-AL
UR-KO

I would appreciate any suggestion how to solv this.

Johan

ArtC · Posted 08-29-2010 07:37 PM

I believe that you are not getting an answer because we do not understand the question - I don't in any case. I don't believe that I understand all the constraints. Are there necessarily only two groups (more than two values in the second column)? Do you always match first to second? The fourth column is always 2, but UI and UR are not related to PO in the first group. Can you expand your example to show what the fourth column means?

Cynthia_sas · Posted 08-29-2010 08:41 PM

Hi:
I agree with Art. I don't understand how your posted results correspond to what you said about wanting "to produce a table with all combinations of the characteristic X contained in the material".

I'm confused about a few things:
1) Right now, in the data, you only show the same value for object A (first col) -- does your data have more than one object A?? I would normally expect that your data was more than 6 rows or observations. What is the logical connection between your object A and your object B and characteristic X. For example, all the values of object A are the same, but you have 2 unique values for object B. Considering ONLY object A, you have 6 values for characteristic X -- but only 5 unique values.

2) You say you have to produce this table with all combinations of X (last column) for every unique value of A -- will you ever have to produce this set of combinations across the whole table or will it always be across unique values of A?? Where do the other columns come into play???

You have 6 values for the last column (characteristic X) for unique values of object A=235456 :
[pre]
PO
UI
UR
PO
AL
KO
[/pre]

(of course PO is repeated twice...you'll have to decide how to deal with that)
ALL the possible combinations would be something like this:
[pre]
PO combined with all:
PO PO
UI PO
UR PO
PO PO
AL PO
KO PO

UI combined with all:
PO UI
UI UI
UR UI
PO UI
AL UI
KO UI

UR Combined with all:
PO UR
UI UR
UR UR
PO UR
AL UR
KO UR

The second value of PO combined with all:
PO PO
UI PO
UR PO
PO PO
AL PO
KO PO

AL combined with all:
PO AL
UI AL
UR AL
PO AL
AL AL
KO AL

KO combined with all:
PO KO
UI KO
UR KO
PO KO
AL KO
KO KO
[/pre]

Is there some other criteria for the combinations you envision?? Is KO-AL the same as AL-KO as a combination, for example??? What about the fact that PO occurs twice for object A -- should it only get considered 1 time???

If you only considered the unique combinations, then you might get combinations like this (PO only counted one time):
[pre]
PO combined with all:
PO PO
UI PO
UR PO
AL PO
KO PO

UI combined with everything but PO:
UI UI
UR UI
AL UI
KO UI

UR combined with everything but PO and UI:
UR UR
AL UR
KO UR

AL combined with everything but PO, UI and UR:
AL AL
KO AL

KO combined with itself:
KO KO
[/pre]

Or is there some other combination that you envision??? The first combination example sounds like a Cartesian product to me. http://support.sas.com/kb/25/270.html
And, this paper has some other examples of SQL joins: http://support.sas.com/resources/papers/proceedings09/062-2009.pdf

The second combination sounds like a specialized join or a Cartesian product with some post-processing to remove duplicates. Right now, I don't understand where your second column (object B) or your third column (X) or your 4th column (M) come into play in the combinations you want.

cynthia

deleted_user · Posted 08-30-2010 05:06 AM

Thanks so much for input!
And you´re right it wasn't clear when I now read it, sorry.

My data has 11 million, or so, rows. There are more than one input A let's say around 1 million. For each A there are from 1 up to m entries B, m is specified in the fourth column. Each entry B has a number, n:1-6 specified in the third column, of the characteristic (classification) X (the two letter combination in the fifth column).

The all combinations statement was wrong, I agree. What I want to do is to list the combinations of characteristics of the different B's belonging to each A.

In the example I gave the entry A has two different B each with 3 classifications. Each classification of B1 can be combined with the 3 classifications of B2, a total of 9 classification-combinations.

Still unclear? Let me know before spending time on proposing solutions and I'll try to rephrase.

Again, many thanks for time spent!
Johan

darrylovia · Posted 08-30-2010 07:15 AM

It sounds like you are look for a cartesian join on a list of some values. I put together a quick example using the SHOES table in SASHELP. Is this the direction that you are look for?

proc sql;
select *
from sashelp.shoes;

proc sql;
create table Region as
select distinct Region
from sashelp.shoes;

create table Product as
select distinct Product
from sashelp.shoes;

select region, product
from Region cross join Product;

deleted_user · Posted 08-30-2010 08:25 AM

Thanks for reply!

Using sashelp.shoes I think I might be able to state my question more clearly. I want to create a table with the possible combinations of two Subsidiaries for each combination of Region and Product.

For Africa and Boot I want the result

Addis Ababa Algiers
Addis Ababa Cairo
Addis Ababa Johannesburg
Addis Ababa Khartoum

etc.

And then similar for Africa and Men's Casual and so on.

Johan

Florent · Posted 08-31-2010 11:18 AM

Hello JFroberg,

Could you please execute the following code and let me know whether it fits your needs or not? Watch out: I did not check if all cases are taken into account but I think it could be a good starting point.

proc sort data=sashelp.shoes out=shoes_1;
by region product subsidiary;
run;

data shoes_2;
set shoes_1;
by region product subsidiary;

id = _N_;
run;

proc sql;
create table shoes_3 as
select distinct
t1.region, t1.product,
trim(left(t1.subsidiary)) || ' ' || trim(left(t2.subsidiary)) as Combinations
from (select id, region, product, subsidiary from shoes_2) t1
, (select id, region, product, subsidiary from shoes_2) t2
where t1.region = t2.region
and t1.product = t2.product
and t1.subsidiary >= t2.subsidiary
and t1.id > t2.id;
quit;

I hope it helps.

Regards,
Florent.

creating combinatorial output