BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
HB
Barite | Level 11 HB
Barite | Level 11

 

 

I have data that looks like this (actually 7000 records, multiple entities, 3 sources):

data gender_by_source_and_entity;
   input entity source gender;
datalines;
1 1 1
1 1 1
1 1 2
1 2 1
1 2 1
1 2 2
2 1 1
2 1 1
2 1 1
2 1 2
2 2 2
2 2 2
3 1 1
3 1 1
3 2 1
3 2 2
3 2 2
3 2 2
;
run;

I want to create a table that shows counts of gender by source by entity, with subtotals, like so:

		gender count	
		1	2
Entity	source		
			
1	1	2	1
	2	2	1
	Sub	4	2
2	1	3	1
	2	0	2
	Sub	3	3
3	1	2	0
	2	1	3
	Sub	3	3

I really actually want percents in there in addition to raw counts.

 

I THINK Proc TABULATE is the way to do this, but it is smarter than I am.

proc tabulate data = gender_by_source_and_entity;
	class entity gender;
	var source;
	table entity * source, gender * (N rowpctn);
run;

gets me close but not there

 

		gender			
		1		2	
		N	RowPctN	N	RowPctN
entity		4	66.67	2	33.33
1	source				
2	source	3	50	3	50
3	source	3	50	3	50

specifically it doesn't break the entities down into their multiple sources. 

 

Can I get a suggestion, some guidance, or a bone? 

 

Thanks,

HB

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

PROC TABULATE is smarter that all of us.  But here's code that does what I think you want:

 

data gender_by_source_and_entity;
   input entity source gender;
datalines;
1 1 1
1 1 1
1 1 2
1 2 1
1 2 1
1 2 2
2 1 1
2 1 1
2 1 1
2 1 2
2 2 2
2 2 2
3 1 1
3 1 1
3 2 1
3 2 2
3 2 2
3 2 2
;
run;
proc tabulate data=_last_ noseps;
  class entity source gender;

  tables entity=' '*(source=' ' all='Sub'), gender*(N='count'*f=comma6. rowpctn='Pct'*f=6.2)
   /box='Entity   Source ';
run;

After the comma in the tables statement is the description of the columns, which are counts and percentages nested within gender: gender*(N rowpctn).   The rest of the after-comma clause is just controlling labels and formats.

 

Before the comma is the row descriptino, in which you can    (source and total-over-sources) nested within entity:  entity*(source all).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

3 REPLIES 3
mkeintz
PROC Star

PROC TABULATE is smarter that all of us.  But here's code that does what I think you want:

 

data gender_by_source_and_entity;
   input entity source gender;
datalines;
1 1 1
1 1 1
1 1 2
1 2 1
1 2 1
1 2 2
2 1 1
2 1 1
2 1 1
2 1 2
2 2 2
2 2 2
3 1 1
3 1 1
3 2 1
3 2 2
3 2 2
3 2 2
;
run;
proc tabulate data=_last_ noseps;
  class entity source gender;

  tables entity=' '*(source=' ' all='Sub'), gender*(N='count'*f=comma6. rowpctn='Pct'*f=6.2)
   /box='Entity   Source ';
run;

After the comma in the tables statement is the description of the columns, which are counts and percentages nested within gender: gender*(N rowpctn).   The rest of the after-comma clause is just controlling labels and formats.

 

Before the comma is the row descriptino, in which you can    (source and total-over-sources) nested within entity:  entity*(source all).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
HB
Barite | Level 11 HB
Barite | Level 11

Thank you, thank you, thank you.  I wasn't getting the "all three variables in the class statement" part. 

 

Props to BallardW for the /misstext = '0' thing.  Useful.

 

ballardw
Super User

If you want a variable to be a grouping variable it goes on a CLASS statement, not VAR.

 

So maybe

proc tabulate data = gender_by_source_and_entity;
	class entity gender;
	class source;
	table entity * source, gender * (N rowpctn)
/ misstext='0'; run;

To get 0 for things not appearing in the data you can use the table misstext option as shown

sas-innovate-white.png

🚨 Early Bird Rate Extended!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.

 

Lock in the best rate now before the price increases on April 1.

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 3 replies
  • 825 views
  • 2 likes
  • 3 in conversation