is there a way to take a data set that contains 2 categorical variables and create an output data set with a single line for all possible combinations of those two variables (maybe I'd have to create data sets with duplicates first ? not sure)
if it makes it clearer in R this would be done as follows using the mtcars data (all combinatios returned) - many thanks
expand.grid(unique(mtcars$cyl),unique(mtcars$gear)) Var1 Var2 1 6 4 2 4 4 3 8 4 4 6 3 5 4 3 6 8 3 7 6 5 8 4 5 9 8 5
I think you need to use a Cartesian join in PROC SQL.
proc sql;
create table want as select a.name,b.amount
from have(keep=name) a,have(keep=amount) b;
quit;
Do you mean to do a frequency count for the distinct combinations? Most easily done in SQL:
proc sql;
create table want as
select var1, var2, count(*) as count
from have
group by var1, var2;
quit;
thanks, KurtBremser,
I must admit though that doesn't seem to be working for me - I suspect my question wasn't clear.
So I adapted some code/data online and run the code - is this what was intended ? I do just get the counts rather than all combinations - there is no entry in want for name=Raj and amount=20 for example
data have; input NAME $ AMOUNT ; datalines; RAJ 90
RAJ 40 RAVI 20 JOHN 30 JOSEPH 40 run; proc sql; create table want as select name, amount, count(*) as count from have group by name, amount; quit;
many thanks again for your help
thanks, Paige, good point the first column in the "R" example is jus the row number that R inserts - it is confusing apologies
for the data set I sent earlier the output I'd want would be
RAJ 20
RAJ 30
RAJ 40
RAJ 90
RAVI 20
RAVI 30
RAVI 40
RAVi 90 JOHN 20
JOHN 30
JOHN 40
JOHN 90 JOSEPH 20
JOSEPH 30
JOSEPH 40
JOSEPH 90 run;
thanks again for all your advice
You need to do a cartesian join with distinct, then:
proc sql;
create table want as
select distinct
a.name as name,
b.amount as amount
from have a, have b
;
quit;
I think you need to use a Cartesian join in PROC SQL.
proc sql;
create table want as select a.name,b.amount
from have(keep=name) a,have(keep=amount) b;
quit;
Some of us don't understand R code. However, you talk about 2 categorical variables, but you show an example with three variables, which is confusing. Could you show us the input SAS data set, and the desired output?
So you only want the combinations based on the values that actually appear in the data?
Not ALL of the possible combinations based on some external definition of the population of possible values for those two variables?
For the latter you would need some other source of the list of possible values.
many thanks, Paige and Kurt, both methods work brilliantly so I'm spoilt for choice 🙂
yes, thats right Tom - just the values in the actual data set would be OK at the moment though I can see maybe your more general problem might be useful sometimes too
many thanks again for all your help - sorry I'm a bit of a novice SAS user
If you need to treat the two RAJ's in your example data as separate persons, @PaigeMiller's code is the way to go. If it's one person, then mine with the distinct prevents the doubling of the RAJ entries.
Hi KurtBremser
ah yes ... you're right of course - I think its your code I need - many many thanks for all your help and patience !
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.