I have data that looks like this (actually 7000 records, multiple entities, 3 sources):
data gender_by_source_and_entity;
input entity source gender;
datalines;
1 1 1
1 1 1
1 1 2
1 2 1
1 2 1
1 2 2
2 1 1
2 1 1
2 1 1
2 1 2
2 2 2
2 2 2
3 1 1
3 1 1
3 2 1
3 2 2
3 2 2
3 2 2
;
run;
I want to create a table that shows counts of gender by source by entity, with subtotals, like so:
gender count 1 2 Entity source 1 1 2 1 2 2 1 Sub 4 2 2 1 3 1 2 0 2 Sub 3 3 3 1 2 0 2 1 3 Sub 3 3
I really actually want percents in there in addition to raw counts.
I THINK Proc TABULATE is the way to do this, but it is smarter than I am.
proc tabulate data = gender_by_source_and_entity;
class entity gender;
var source;
table entity * source, gender * (N rowpctn);
run;
gets me close but not there
gender 1 2 N RowPctN N RowPctN entity 4 66.67 2 33.33 1 source 2 source 3 50 3 50 3 source 3 50 3 50
specifically it doesn't break the entities down into their multiple sources.
Can I get a suggestion, some guidance, or a bone?
Thanks,
HB
PROC TABULATE is smarter that all of us. But here's code that does what I think you want:
data gender_by_source_and_entity;
input entity source gender;
datalines;
1 1 1
1 1 1
1 1 2
1 2 1
1 2 1
1 2 2
2 1 1
2 1 1
2 1 1
2 1 2
2 2 2
2 2 2
3 1 1
3 1 1
3 2 1
3 2 2
3 2 2
3 2 2
;
run;
proc tabulate data=_last_ noseps;
class entity source gender;
tables entity=' '*(source=' ' all='Sub'), gender*(N='count'*f=comma6. rowpctn='Pct'*f=6.2)
/box='Entity Source ';
run;
After the comma in the tables statement is the description of the columns, which are counts and percentages nested within gender: gender*(N rowpctn). The rest of the after-comma clause is just controlling labels and formats.
Before the comma is the row descriptino, in which you can (source and total-over-sources) nested within entity: entity*(source all).
PROC TABULATE is smarter that all of us. But here's code that does what I think you want:
data gender_by_source_and_entity;
input entity source gender;
datalines;
1 1 1
1 1 1
1 1 2
1 2 1
1 2 1
1 2 2
2 1 1
2 1 1
2 1 1
2 1 2
2 2 2
2 2 2
3 1 1
3 1 1
3 2 1
3 2 2
3 2 2
3 2 2
;
run;
proc tabulate data=_last_ noseps;
class entity source gender;
tables entity=' '*(source=' ' all='Sub'), gender*(N='count'*f=comma6. rowpctn='Pct'*f=6.2)
/box='Entity Source ';
run;
After the comma in the tables statement is the description of the columns, which are counts and percentages nested within gender: gender*(N rowpctn). The rest of the after-comma clause is just controlling labels and formats.
Before the comma is the row descriptino, in which you can (source and total-over-sources) nested within entity: entity*(source all).
Thank you, thank you, thank you. I wasn't getting the "all three variables in the class statement" part.
Props to BallardW for the /misstext = '0' thing. Useful.
If you want a variable to be a grouping variable it goes on a CLASS statement, not VAR.
So maybe
proc tabulate data = gender_by_source_and_entity; class entity gender; class source; table entity * source, gender * (N rowpctn)
/ misstext='0'; run;
To get 0 for things not appearing in the data you can use the table misstext option as shown
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.