BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
bnorth66
Calcite | Level 5

is there a way to take a data set that contains 2 categorical variables and create an output data set with a single line for all possible combinations of those two variables (maybe I'd have to create data sets with duplicates first ? not sure)

if it makes it clearer in R this would be done as follows using the mtcars data (all combinatios returned) - many thanks

expand.grid(unique(mtcars$cyl),unique(mtcars$gear))
Var1 Var2
1 6 4
2 4 4
3 8 4
4 6 3
5 4 3
6 8 3
7 6 5
8 4 5
9 8 5
1 ACCEPTED SOLUTION

Accepted Solutions
PaigeMiller
Diamond | Level 26

I think you need to use a Cartesian join in PROC SQL.

 

proc sql;
    create table want as select a.name,b.amount
         from have(keep=name) a,have(keep=amount) b;
quit;
--
Paige Miller

View solution in original post

10 REPLIES 10
Kurt_Bremser
Super User

Do you mean to do a frequency count for the distinct combinations? Most easily done in SQL:

proc sql;
create table want as
select var1, var2, count(*) as count
from have
group by var1, var2;
quit;
bnorth66
Calcite | Level 5

thanks, KurtBremser,

I must admit though that doesn't seem to be working for me - I suspect my question wasn't clear.

So I adapted some code/data online and run the code - is this what was intended ? I do just get the counts rather than all combinations - there is no entry in want for name=Raj and amount=20 for example

data have;
  input NAME $ AMOUNT ;
  datalines;
RAJ     90  
RAJ 40 RAVI 20 JOHN 30 JOSEPH 40 run; proc sql; create table want as select name, amount, count(*) as count from have group by name, amount; quit;

 many thanks again for your help

bnorth66
Calcite | Level 5

thanks, Paige, good point the first column in the "R" example is jus the row number that R inserts - it is confusing apologies

for the data set I sent earlier the output I'd want would be

RAJ 20  
RAJ 30  
RAJ 40
RAJ 90
RAVI 20  
RAVI 30  
RAVI 40
RAVi 90 JOHN 20  
JOHN 30  
JOHN 40
JOHN 90 JOSEPH 20  
JOSEPH 30  
JOSEPH 40
JOSEPH 90 run;

thanks again for all your advice

PaigeMiller
Diamond | Level 26

I think you need to use a Cartesian join in PROC SQL.

 

proc sql;
    create table want as select a.name,b.amount
         from have(keep=name) a,have(keep=amount) b;
quit;
--
Paige Miller
PaigeMiller
Diamond | Level 26

Some of us don't understand R code. However, you talk about 2 categorical variables, but you show an example with three variables, which is confusing. Could you show us the input SAS data set, and the desired output?

--
Paige Miller
Tom
Super User Tom
Super User

So you only want the combinations based on the values that actually appear in the data? 

Not ALL of the possible combinations based on some external definition of the population of possible values for those two variables?

For the latter you would need some other source of the list of possible values.

bnorth66
Calcite | Level 5

many thanks, Paige and Kurt, both methods work brilliantly so I'm spoilt for choice 🙂

yes, thats right Tom - just the values in the actual data set would be OK at the moment though I can see maybe your more general problem might be useful sometimes too

 

many thanks again for all your help - sorry I'm a bit of a novice SAS user

Kurt_Bremser
Super User

If you need to treat the two RAJ's in your example data as separate persons, @PaigeMiller's code is the way to go. If it's one person, then mine with the distinct prevents the doubling of the RAJ entries.

bnorth66
Calcite | Level 5

Hi KurtBremser

ah yes ... you're right of course - I think its your code I need - many many thanks for all your help and patience !

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 11303 views
  • 7 likes
  • 4 in conversation