Help using Base SAS procedures

Calculate percent distribution from grouped data

Reply
Frequent Contributor
Posts: 107

Calculate percent distribution from grouped data

I wanted to calculate percent distribution of students by school for each geographic area. I first used proc means to sum up the student counts by area and school, then got the cumulative total. But how do I assign the total student numbers by area so as to calculate the percentage (out of total cases, not out of cumulative total)? I need to merge this percent distribution (by area, school) into other dataset for further data manipulation. Thanks in advance.


data aa;
input id area$ school$ gender $ studentNum;
datalines;
1 area1 school1 M 30
1 area1 school1 F 70
2 area1 school2 M 60
2 area1 school2 F 40
3 area1 school3 M 80
3 area1 school3 F 20
4 area2 school4 M 40
4 area2 school4 F 60
5 area2 school5 M 100
6 area2 school6 F 200
;

proc means data=aa noprint nway;
class area school;
var studentNum;
output out=bb (drop= _freq_ _type_) sum=;
run;
data bb; set bb;
by area school;
if first.area then CUMtotal=0;
CUMtotal+studentNum;
format percent 8.1;
percent=studentNum*100/CUMtotal;
run;
proc print data=aa noobs; run;
proc print data=bb noobs; run;


*** ACTUAL OUTPUT ****************************************;

area school StudentNum CUMtotal Percent
--------------------------------------------------------------------
area1 school1 100 100 100.0
area1 school2 100 200 50.0
area1 school3 100 300 33.3

area2 school4 100 100 100.0
area2 school5 100 200 50.0
area2 school6 200 400 50.0


***DESIRABLE OUTPUT ****************************;
area school StudentNum TOTAL Percent
--------------------------------------------------------------------
area1 school1 100 300 33.3
area1 school2 100 300 33.3
area1 school3 100 300 33.3

area2 school4 100 400 25.0
area2 school5 100 400 25.0
area2 school6 200 400 50.0 Message was edited by: Solph
PROC Star
Posts: 7,467

Re: Calculate percent distribution from grouped data

As usual with SAS, you have multiple options one of which would be to do everything using proc sql. E.g.:
[pre]
proc sql noprint;
create table want as
select distinct
area,
school,
studentnum,
sum(studentnum) as total,
100*studentnum/calculated total as percent
from (select distinct
area,
school,
sum(studentNum) as studentnum
from aa
group by area,school)
group by area
;
quit;
[/pre]
HTH,
Art
Frequent Contributor
Posts: 107

Re: Calculate percent distribution from grouped data

Thanks so much Art. It is one very clever way to do it using SQL.
Ask a Question
Discussion stats
  • 2 replies
  • 2786 views
  • 0 likes
  • 2 in conversation