How can I normalize percentages based off number of records?

Posted 10-17-2019 08:50 PM
(562 views)

Hi, my friend asked me the question below, but I do not know the answer. Could you help me?

"

I'm doing a basic graph to show the percent of students who graduated within each major. Something like:

Biology - 50% graduate, 50% don't graduate

Art - 30% graduate, 70% don't graduate

etc.

I want to show the majors that have the highest graduation percentages at the top of my graph. However, I want to figure out if there is a way to normalize the percents. What I mean is, some majors are so small they only have 3 students total. So, some majors may have a graduation rate of 66.6%, but really that just means 2 of the 3 students graduated. I want to somehow standardize the percents based off the number of students that make up each percent so that these smaller majors are weighted less. "

6 REPLIES

In your example, what is the math that you want to use to do this standardizing?

--

Paige Miller

Hi Paige, to be honest, I do not know what method to use for this standardization.

And to be honest, I don't know either.

--

Paige Miller

Maybe you want something like this:

https://blogs.sas.com/content/iml/2013/11/04/create-mosaic-plots-in-sas-by-using-proc-freq.html

--

Paige Miller

As you and your friend have observed, the uncertainty in an observed proportion depends on the sample size. Majors with few students have greater uncertainty around their point estimates than majors with many students.

There are two ways to proceed.

The simplest is to add confidence intervals (CIs) to the empirical proportions. You would still sort the majors by the empirical proportion of graduates, but the CIs show how much uncertainty surrounds each estimate: a wide interval flags a major with few students. A DATA step approach would look like this:

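```
/* Example data: graduates and total students for each major */
data Graduates;
   input Major $ Grads Total;
   NotGrads = Total - Grads;
   datalines;
A 10 22
B 10 32
C 17 25
D 4 7
E 8 14
F 16 28
G 16 19
;

/* Compute the empirical proportion and Wald 95% confidence limits */
data Binom;
   set Graduates;
   p = Grads / Total;                 /* empirical proportion */
   StdErr = sqrt(p*(1-p)/Total);      /* standard error of the proportion */
   /* use Wald 95% CIs */
   z = quantile("normal", 1-0.05/2);  /* approximately 1.96 */
   Lower = max(0, p - z*StdErr);      /* clip the limits to [0, 1] */
   Upper = min(1, p + z*StdErr);
   label p = "Proportion" Lower="Lower 95% CL" Upper="Upper 95% CL";
run;

/* Sort by proportion (ties broken by the lower limit) */
proc sort data=Binom;
   by p Lower;
run;

/* Plot each proportion with its confidence interval */
proc sgplot data=Binom;
   scatter y=Major x=p /
           xerrorlower=Lower xerrorupper=Upper;
   yaxis discreteorder=data;          /* keep the sorted order on the Y axis */
   xaxis grid;
run;
```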
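
For reference, the Wald interval that the DATA step computes can be written out as below. The symbols $\hat{p}$, $n$, and $z_{0.975}$ are notation introduced here for illustration (they correspond to `p`, `Total`, and `z` in the code), and the worked numbers for major D are rounded.

$$
\hat{p} = \frac{\text{Grads}}{\text{Total}}, \qquad
\text{CI}_{95\%} = \hat{p} \pm z_{0.975}\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}, \qquad z_{0.975} \approx 1.96
$$

$$
\text{Major D: } \hat{p} = \tfrac{4}{7} \approx 0.571, \quad
\sqrt{\frac{0.571 \times 0.429}{7}} \approx 0.187, \quad
\text{CI}_{95\%} \approx (0.205,\ 0.938)
$$

With only 7 students, the interval for D spans most of the unit range, which is the small-sample uncertainty the question is concerned about.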

A more sophisticated method is to use funnel plots for proportions or to incorporate the uncertainty into the ranking.
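
One possible way to incorporate the uncertainty into the ranking is to sort by a lower confidence limit instead of the point estimate, so that a small major needs stronger evidence to rank highly. The following is only a sketch under stated assumptions, not necessarily the method the reply has in mind: it assumes the Wilson score interval, repeats the example counts so that it runs on its own, and introduces the names `Wilson`, `Denom`, `Center`, `HalfWidth`, `WLower`, and `WUpper` purely for illustration.

```
/* Example counts repeated from the earlier step so this runs on its own */
data Graduates;
   input Major $ Grads Total;
   datalines;
A 10 22
B 10 32
C 17 25
D 4 7
E 8 14
F 16 28
G 16 19
;

/* Assumption: use the Wilson score interval for a binomial proportion */
data Wilson;
   set Graduates;
   p = Grads / Total;                       /* empirical proportion */
   z = quantile("normal", 1 - 0.05/2);      /* two-sided 95% quantile */
   Denom     = 1 + z**2 / Total;
   Center    = (p + z**2 / (2*Total)) / Denom;
   HalfWidth = z * sqrt(p*(1-p)/Total + z**2 / (4*Total**2)) / Denom;
   WLower = Center - HalfWidth;             /* Wilson lower 95% limit */
   WUpper = Center + HalfWidth;             /* Wilson upper 95% limit */
   label p = "Proportion" WLower = "Wilson Lower 95% CL"
         WUpper = "Wilson Upper 95% CL";
run;

/* Rank by the lower limit instead of the point estimate */
proc sort data=Wilson;
   by WLower;
run;

/* Same plot layout as before, but ordered by the lower limit */
proc sgplot data=Wilson;
   scatter y=Major x=p /
           xerrorlower=WLower xerrorupper=WUpper;
   yaxis discreteorder=data;
   xaxis grid;
run;
```

Sorting by `WLower` penalizes majors whose intervals are wide, which addresses the wish to weight small majors less. The choice of the lower limit (rather than, say, a shrunken estimate) is a design decision you may want to revisit for your data.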

I wrote an article about this topic: "Compute and visualize binomial proportions in SAS."
