- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
is there a way to take a data set that contains 2 categorical variables and create an output data set with a single line for all possible combinations of those two variables (maybe I'd have to create data sets with duplicates first ? not sure)
if it makes it clearer in R this would be done as follows using the mtcars data (all combinatios returned) - many thanks
expand.grid(unique(mtcars$cyl),unique(mtcars$gear)) Var1 Var2 1 6 4 2 4 4 3 8 4 4 6 3 5 4 3 6 8 3 7 6 5 8 4 5 9 8 5
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think you need to use a Cartesian join in PROC SQL.
proc sql;
create table want as select a.name,b.amount
from have(keep=name) a,have(keep=amount) b;
quit;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Do you mean to do a frequency count for the distinct combinations? Most easily done in SQL:
proc sql;
create table want as
select var1, var2, count(*) as count
from have
group by var1, var2;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thanks, KurtBremser,
I must admit though that doesn't seem to be working for me - I suspect my question wasn't clear.
So I adapted some code/data online and run the code - is this what was intended ? I do just get the counts rather than all combinations - there is no entry in want for name=Raj and amount=20 for example
data have; input NAME $ AMOUNT ; datalines; RAJ 90
RAJ 40 RAVI 20 JOHN 30 JOSEPH 40 run; proc sql; create table want as select name, amount, count(*) as count from have group by name, amount; quit;
many thanks again for your help
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thanks, Paige, good point the first column in the "R" example is jus the row number that R inserts - it is confusing apologies
for the data set I sent earlier the output I'd want would be
RAJ 20
RAJ 30
RAJ 40
RAJ 90
RAVI 20
RAVI 30
RAVI 40
RAVi 90 JOHN 20
JOHN 30
JOHN 40
JOHN 90 JOSEPH 20
JOSEPH 30
JOSEPH 40
JOSEPH 90 run;
thanks again for all your advice
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You need to do a cartesian join with distinct, then:
proc sql;
create table want as
select distinct
a.name as name,
b.amount as amount
from have a, have b
;
quit;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I think you need to use a Cartesian join in PROC SQL.
proc sql;
create table want as select a.name,b.amount
from have(keep=name) a,have(keep=amount) b;
quit;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Some of us don't understand R code. However, you talk about 2 categorical variables, but you show an example with three variables, which is confusing. Could you show us the input SAS data set, and the desired output?
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
So you only want the combinations based on the values that actually appear in the data?
Not ALL of the possible combinations based on some external definition of the population of possible values for those two variables?
For the latter you would need some other source of the list of possible values.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
many thanks, Paige and Kurt, both methods work brilliantly so I'm spoilt for choice 🙂
yes, thats right Tom - just the values in the actual data set would be OK at the moment though I can see maybe your more general problem might be useful sometimes too
many thanks again for all your help - sorry I'm a bit of a novice SAS user
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
If you need to treat the two RAJ's in your example data as separate persons, @PaigeMiller's code is the way to go. If it's one person, then mine with the distinct prevents the doubling of the RAJ entries.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi KurtBremser
ah yes ... you're right of course - I think its your code I need - many many thanks for all your help and patience !