BookmarkSubscribeRSS Feed
deleted_user
Not applicable
I'm new to this forum. Sorry if I'm missusing it.

I have a dataset with five columns.
The first column identifies objects A, the second identifies objects B related to each of the objects in the first column, the third gives the number (n) of a certain characteristic (X) each object in the second column has, the fourth the number (m) of objects related to each object in the first column, while the fifth column contains the characterstic X.

235456 10456 3 2 PO
235456 10456 3 2 UI
235456 10456 3 2 UR
235456 14283 3 2 PO
235456 14283 3 2 AL
235456 14283 3 2 KO
.
.
.

I would like to produce a table with all combinations of the characteristic X contained in the material. For each object A there are n to the power of m possible combinations if I'm not misstaken. From the example above
PO-PO
PO-AL
PO-KO
UI-PO
UI-AL
UI-KO
UR-PO
UR-AL
UR-KO

I would appreciate any suggestion how to solv this.

Johan
6 REPLIES 6
ArtC
Rhodochrosite | Level 12
I believe that you are not getting an answer because we do not understand the question - I don't in any case. I don't believe that I understand all the constraints. Are there necessarily only two groups (more than two values in the second column)? Do you always match first to second? The fourth column is always 2, but UI and UR are not related to PO in the first group. Can you expand your example to show what the fourth column means?
Cynthia_sas
SAS Super FREQ
Hi:
I agree with Art. I don't understand how your posted results correspond to what you said about wanting "to produce a table with all combinations of the characteristic X contained in the material".

I'm confused about a few things:
1) Right now, in the data, you only show the same value for object A (first col) -- does your data have more than one object A?? I would normally expect that your data was more than 6 rows or observations. What is the logical connection between your object A and your object B and characteristic X. For example, all the values of object A are the same, but you have 2 unique values for object B. Considering ONLY object A, you have 6 values for characteristic X -- but only 5 unique values.

2) You say you have to produce this table with all combinations of X (last column) for every unique value of A -- will you ever have to produce this set of combinations across the whole table or will it always be across unique values of A?? Where do the other columns come into play???

You have 6 values for the last column (characteristic X) for unique values of object A=235456 :
[pre]
PO
UI
UR
PO
AL
KO
[/pre]

(of course PO is repeated twice...you'll have to decide how to deal with that)
ALL the possible combinations would be something like this:
[pre]
PO combined with all:
PO PO
UI PO
UR PO
PO PO
AL PO
KO PO

UI combined with all:
PO UI
UI UI
UR UI
PO UI
AL UI
KO UI

UR Combined with all:
PO UR
UI UR
UR UR
PO UR
AL UR
KO UR

The second value of PO combined with all:
PO PO
UI PO
UR PO
PO PO
AL PO
KO PO

AL combined with all:
PO AL
UI AL
UR AL
PO AL
AL AL
KO AL

KO combined with all:
PO KO
UI KO
UR KO
PO KO
AL KO
KO KO
[/pre]

Is there some other criteria for the combinations you envision?? Is KO-AL the same as AL-KO as a combination, for example??? What about the fact that PO occurs twice for object A -- should it only get considered 1 time???

If you only considered the unique combinations, then you might get combinations like this (PO only counted one time):
[pre]
PO combined with all:
PO PO
UI PO
UR PO
AL PO
KO PO

UI combined with everything but PO:
UI UI
UR UI
AL UI
KO UI

UR combined with everything but PO and UI:
UR UR
AL UR
KO UR

AL combined with everything but PO, UI and UR:
AL AL
KO AL

KO combined with itself:
KO KO
[/pre]

Or is there some other combination that you envision??? The first combination example sounds like a Cartesian product to me. http://support.sas.com/kb/25/270.html
And, this paper has some other examples of SQL joins: http://support.sas.com/resources/papers/proceedings09/062-2009.pdf

The second combination sounds like a specialized join or a Cartesian product with some post-processing to remove duplicates. Right now, I don't understand where your second column (object B) or your third column (X) or your 4th column (M) come into play in the combinations you want.

cynthia
deleted_user
Not applicable
Thanks so much for input!
And you´re right it wasn't clear when I now read it, sorry.

My data has 11 million, or so, rows. There are more than one input A let's say around 1 million. For each A there are from 1 up to m entries B, m is specified in the fourth column. Each entry B has a number, n:1-6 specified in the third column, of the characteristic (classification) X (the two letter combination in the fifth column).

The all combinations statement was wrong, I agree. What I want to do is to list the combinations of characteristics of the different B's belonging to each A.

In the example I gave the entry A has two different B each with 3 classifications. Each classification of B1 can be combined with the 3 classifications of B2, a total of 9 classification-combinations.

Still unclear? Let me know before spending time on proposing solutions and I'll try to rephrase.

Again, many thanks for time spent!
Johan
darrylovia
Quartz | Level 8
It sounds like you are look for a cartesian join on a list of some values. I put together a quick example using the SHOES table in SASHELP. Is this the direction that you are look for?

proc sql;
select *
from sashelp.shoes;


proc sql;
create table Region as
select distinct Region
from sashelp.shoes;

create table Product as
select distinct Product
from sashelp.shoes;

select region, product
from Region cross join Product;
deleted_user
Not applicable
Thanks for reply!

Using sashelp.shoes I think I might be able to state my question more clearly. I want to create a table with the possible combinations of two Subsidiaries for each combination of Region and Product.

For Africa and Boot I want the result

Addis Ababa Algiers
Addis Ababa Cairo
Addis Ababa Johannesburg
Addis Ababa Khartoum

etc.

And then similar for Africa and Men's Casual and so on.

Johan
Florent
Quartz | Level 8
Hello JFroberg,

Could you please execute the following code and let me know whether it fits your needs or not? Watch out: I did not check if all cases are taken into account but I think it could be a good starting point.

proc sort data=sashelp.shoes out=shoes_1;
by region product subsidiary;
run;

data shoes_2;
set shoes_1;
by region product subsidiary;

id = _N_;
run;

proc sql;
create table shoes_3 as
select distinct
t1.region, t1.product,
trim(left(t1.subsidiary)) || ' ' || trim(left(t2.subsidiary)) as Combinations
from (select id, region, product, subsidiary from shoes_2) t1
, (select id, region, product, subsidiary from shoes_2) t2
where t1.region = t2.region
and t1.product = t2.product
and t1.subsidiary >= t2.subsidiary
and t1.id > t2.id;
quit;

I hope it helps.

Regards,
Florent.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 975 views
  • 0 likes
  • 5 in conversation