How do I write a SAS function to calculate the hypergeometric probability that more than nine females are surveyed in a sample of twenty?
In a large university, 40% of the students are female.
If a random sample of twenty students is selected, what is the probability that the sample will contain more than nine female students? (Round your answer to four decimal places.)
Number of females in sample | Probability |
---|---|
0 | 3.6561584400629733e-05 |
1 | 0.000487487792008398 |
2 | 0.0030874226827198492 |
3 | 0.012349690730879413 |
4 | 0.034990790404158215 |
5 | 0.0746470195288711 |
6 | 0.12441169921478513 |
7 | 0.1658822656197136 |
8 | 0.17970578775468962 |
9 | 0.1597384780041684 |
10 | 0.11714155053639005 |
11 | 0.07099487911296365 |
12 | 0.03549743955648174 |
13 | 0.01456305212573616 |
14 | 0.004854350708578719 |
15 | 0.0012944935222876583 |
16 | 0.00026968615047659553 |
17 | 4.2303709878681673e-05 |
18 | 4.70041220874241e-06 |
19 | 3.2985348833280015e-07 |
20 | 1.0995116277760013e-08 |
I generated the table above using Python, so I imagine the data may be inconsistent. However, it gave me the correct probability for exactly four females, which was 0.0350.
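For reference, the values in the table appear to match binomial probabilities with n = 20 and p = 0.4 (for example, the entry for 0 females equals 0.6^20, about 3.656e-05). Below is a minimal SAS sketch that should reproduce the table under that binomial assumption. The variable names k and p are only illustrative.

```sas
/* Sketch: list P(X = k) for k = 0..20, assuming X ~ Binomial(n=20, p=0.4) */
data _null_;
   do k = 0 to 20;
      p = pdf('BINOMIAL', k, 0.4, 20);   /* PDF('BINOMIAL', m, p, n) */
      put k= p= best12.;                 /* write each row to the SAS log */
   end;
run;
```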
I am currently trying to write a function for the excerpt below:
If a random sample of twenty students is selected, what is the probability that the sample will contain more than nine female students?
Hello,
Start here :
SAS® 9.4 and SAS® Viya® 3.5 Programming Documentation | SAS 9.4 / Viya 3.5
Functions and CALL Routines
CDF Hypergeometric Distribution https://go.documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lefunctionsref/p1x9o3ozc5ft8yn1kcn0p6yg4aw...
Maybe you need the QUANTILE function.
The QUANTILE function returns the quantile from a distribution that you specify.
The QUANTILE function is the inverse of the CDF function.
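As a small illustration of that inverse relationship, the sketch below feeds a probability into QUANTILE and then passes the result back to CDF. The population values (10000 students, 4000 of them female) and the probability 0.5 are assumed example values, not given in the question. For a discrete distribution the round trip is generally not exact: QUANTILE returns a count, and as I understand it the CDF at that count is at least the probability you started with.

```sas
/* Sketch: QUANTILE followed by CDF for a hypergeometric distribution */
data _null_;
   /* assumed example values: population 10000, 4000 females, sample of 20 */
   q = quantile('HYPER', 0.5, 10000, 4000, 20);  /* count q for cumulative probability 0.5 */
   c = cdf('HYPER', q, 10000, 4000, 20);         /* P(X <= q) */
   put q= c=;
run;
```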
Other functions that SAS has for this distribution (and MANY other distributions):
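/* P(X <= 9) with PROBHYPR(N, K, n, x): example population of 10000 with 4000 females, sample of 20 */
data _NULL_;
y=probhypr(10000,4000,20,9);
put y= percent7.2;
run;
/* The same probability with CDF('HYPER', x, N, R, n) */
data _NULL_;
y=cdf('HYPER', 9, 10000, 4000, 20);
put y= percent7.2;
run;
/* end of program */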
Koen
Clearly, this is an assignment, so let me provide some hints rather than the solution:
1. The hypergeometric distribution is used when you want the probabilities for a small finite population where the size of the population is known. Assuming that the university is large, I suspect the instructor intends you to use a binomial distribution. To use the hypergeometric distribution, you must know the total number of students at the university and the number of females. For large populations (like 500 or more students), the hypergeometric and the binomial distributions are similar.
2. There are two related concepts here. If you want to know the probability that EXACTLY k females appear in a sample of size 20, you can use the PDF for the binomial distribution:
/* prob of exactly 4 females in a sample of size 20 */
data _null_;
   p4 = pdf("Binomial",
            4,    /* number of females in the sample */
            0.4,  /* Population has 40% female */
            20);  /* sample size */
   put p4=;
run;
This appears to be what you did in Python. If you want to know the probability that there will be 4 OR FEWER females in the sample, you use the CDF function:
/* prob of 4 or fewer females in a sample of size 20 */
data _null_;
   pLE4 = cdf("Binomial",
              4,    /* upper bound on the number of females in the sample */
              0.4,  /* Population has 40% female */
              20);  /* sample size */
   put pLE4=;
run;
3. Use the example above to compute the probability that 9 or fewer females are in the sample.
4. The problem asked for the probability of more than 9 females. How do you get that probability from the answer in 3? (Hint: maybe use subtraction....)
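To illustrate point 1, here is a minimal sketch that compares the two distributions for exactly 4 females. The population size (10000) and the number of females (4000) are assumed values chosen only for illustration; the problem does not state them. The variable names are hypothetical as well.

```sas
/* Sketch: hypergeometric vs. binomial probability of exactly 4 females */
data _null_;
   PopSize    = 10000;  /* assumed population size (hypothetical) */
   NumFemale  = 4000;   /* assumed number of females, 40% of PopSize */
   SampleSize = 20;     /* sample size from the problem */
   pHyper = pdf('HYPER', 4, PopSize, NumFemale, SampleSize);      /* sampling without replacement */
   pBinom = pdf('BINOMIAL', 4, NumFemale / PopSize, SampleSize);  /* binomial approximation */
   diff   = pHyper - pBinom;
   put pHyper= pBinom= diff=;
run;
```

If the two probabilities agree to several decimal places, the binomial model is a reasonable stand-in for the hypergeometric one at that population size.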
Thank you!