jsjoden
Obsidian | Level 7

 Generate 625 samples of size 961 random numbers from U(1, 9). For each of these 625 samples calculate the mean.
a) Find the simulated probability that the mean is between 3 and 4.
b) Find the mean of the means.
c) Find the standard deviation of the means.
d) Draw the histogram of the means.

 

I believe the code below is a good template for what I need to do above. However, right now it shows the simulated probability that the mean is between 11 and 12. Can someone break down the code below for me? I have a general idea of what it's doing but am still a little lost. I understand the PROC FREQ and PROC UNIVARIATE steps, but I am having trouble with everything above them. Some explanation would help. Also, how would I change the simulated probability to show between 2 and 4? Thank you!

 

data a;
meanx=0;
do j=1 to 225;
sumx=0;
do i = 1 to 625;
u=rand ("Uniform");
x=10+(22-10)*u;
sumx=sumx+x;
end;
meanx=sumx/625;
output;
end;
run;

 

proc freq;
tables meanx;
run;

 

proc univariate;
var meanx;
histogram meanx;
run;

Reeza
Super User

You're creating your random variables incorrectly. Review how to create a random variable from a Uniform Distribution. 

 

If you want to understand your code, add comments as you code. 

Rick_SAS
SAS Super FREQ

See Tip 6 on pp 6-8 of Wicklin (2015) "Ten Tips for Simulating Data with SAS."  The example in the paper is the same as your example except that the paper uses random uniform variates in (0,1).  You can use x = 1 + 8*rand("uniform") to get random variates in the range (1,9).

 

This method is called Monte Carlo simulation of the sampling distribution of the sample mean. It is important that you use BY-group processing for efficiency, as explained in the article "Simulation in SAS: The slow way or the BY way."

PaigeMiller
Diamond | Level 26

The simulated probability that a value is between 3 and 4 is simply the count of values between 3 and 4, divided by the total number of values in the entire simulation. Here the values are the sample means.

 

To generate random numbers between 1 and 9 that are uniform, generate a uniform RV (which is between 0 and 1, the default) and then expand the range to 1 to 9 by multiplying by a constant and then adding an offset.
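The rescaling described above can be sketched as follows. This is only an illustration: the macro variables a and b, the data set name Check, and the seed are hypothetical choices, not part of the original question.

```sas
/* Sketch: rescale U(0,1) draws to U(a,b). Names a, b, and Check are illustrative. */
%let a = 1;   /* lower bound */
%let b = 9;   /* upper bound */
data Check;
call streaminit(123);            /* arbitrary seed, for reproducibility */
do i = 1 to 1000;
   u = rand("Uniform");          /* uniform on (0,1) */
   x = &a + (&b - &a)*u;         /* multiply by the range, then add the offset */
   output;
end;
run;

/* quick sanity check: min and max should fall inside (1,9), mean near 5 */
proc means data=Check min max mean;
var x;
run;
```

The multiplier is the width of the target interval (b - a = 8) and the offset is the lower bound (a = 1).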

 

Also, your loops seem to be incorrect. I don't see the number 961 anywhere, and it seems to me that the number 625 is used in the wrong loop.

--
Paige Miller
Ksharp
Super User

The following code could generate random numbers between 1 and 9.

 

data x;
call streaminit(12345678);
do i = 1 to 625;
u=ceil(rand ("Uniform")*9);
output;
end;
run;
proc freq data=x;
table u;
run;
PaigeMiller
Diamond | Level 26

@Ksharp this generates random INTEGERS between 1 and 9; it does not generate continuous uniform random numbers between 1 and 9.

--
Paige Miller
Ksharp
Super User

Oops. My bad. I should clear my eyes before posting.

Rick_SAS
SAS Super FREQ

@Ksharp : I guess we need the OP to clarify whether the random numbers are from the continuous or the discrete uniform distribution. The OP said "random numbers from U(1, 9)," which usually means the continuous uniform distribution. If integers, then the correct phrase is "random uniform integers in the range 1-9."

jsjoden
Obsidian | Level 7

Rick,

 

Thank you for pointing me in the right direction. It was right on point. My last question is: how would I go about finding the probability that the mean is between, let's say, 3 and 4? Also, how do I output the standard deviation in PROC MEANS? I know I have to set it equal to something, but I am confused about what. I am also confused about mean=SampleMean in PROC MEANS, as it seemed to appear out of nowhere. Here is my code.

 

%let N = 961; /* sample size */
%let NumSamples = 625; /* number of samples */
data Sim;
call streaminit(123);
do SampleID = 1 to &NumSamples; /* ID variable for each sample */
do i = 1 to &N;
x = 1+8*rand("Uniform"); /* 1 to 9 */
output;
end;
end;
*output;
run;

 


proc means ;
by SampleID;
var x;
output out=OutStats3 mean=SampleMean;      /* I need to find standard deviation as well here. What do I put std = ? */
run;

 


ods select Moments Histogram;
proc univariate data=OutStats3;
label SampleMean = "Sample Mean of U(1,9) Data";
var SampleMean;
histogram SampleMean / normal ; /* overlay normal fit */
run;

Rick_SAS
SAS Super FREQ

First, remember that you should use the NOPRINT option on the PROC MEANS statement, as explained in the article "Turn off ODS when running simulations in SAS." That is Tip 7 in my "Ten Tips" paper, so you might want to re-read that paper.

 

> How do I find the probability that the mean lies between 3 and 4?

You would use a DATA step to create an indicator variable for the event "mean is between 3 and 4," then use PROC FREQ to count. An example is in Tip 10 of my paper. However, you are only using 625 Monte Carlo samples and a large sample size, so all of the sample means are greater than 4 for this simulation. Therefore all you can conclude is that the probability is less than 1/625.
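A back-of-the-envelope check, using standard formulas for the uniform distribution and the standard error of a mean, suggests why no sample mean lands in [3,4]. Here n = 961 is the sample size from the question, and the last line is only a rough normal-approximation argument:

```latex
\begin{align*}
\mu &= \frac{1+9}{2} = 5, \qquad
\sigma^2 = \frac{(9-1)^2}{12} = \frac{16}{3} \\
\mathrm{SD}(\bar{X}) &= \frac{\sigma}{\sqrt{n}}
  = \frac{\sqrt{16/3}}{\sqrt{961}}
  \approx \frac{2.3094}{31} \approx 0.0745 \\
\frac{4 - \mu}{\mathrm{SD}(\bar{X})} &\approx \frac{-1}{0.0745} \approx -13.4
\end{align*}
```

The upper end of the interval, 4, sits roughly 13 standard errors below the expected sample mean of 5, so seeing a sample mean in [3,4] among 625 samples is essentially impossible.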

 

> how do I output the standard deviation in proc means? 

On the OUTPUT statement use STD=SampleStd;

 

>  I am confused as of how means= Samplemean  in proc means

The keyword MEAN= specifies the statistic that you want to output. The value to the right of the equal sign (Samplemean) specifies the name of the variable in the output data set that will contain that statistic. 

 

proc means data=Sim noprint;
by SampleID;
var x;
output out=OutStats3 mean=SampleMean std=SampleStd;
run;
 
/* P( Sample mean in [3,4] ) = 0  (less than 1/&NumSamples) */
data PValue34;
set OutStats3;
mean34 = (3 <= SampleMean <= 4);   /* 1 if the sample mean is in [3,4], else 0 */
run;
proc freq data=PValue34;
tables mean34;
run;

ods select Moments Histogram;
proc univariate data=OutStats3;
label SampleMean = "Sample Mean of U(1,9) Data";
var SampleMean SampleStd;
histogram SampleMean SampleStd / normal ; /* overlay normal fit */
run;
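For parts (b) and (c) of the original question, one possible sketch is a second PROC MEANS call on the data set of sample means. It assumes OutStats3 was created as above; the Moments table from PROC UNIVARIATE should report the same two numbers.

```sas
/* Sketch: mean of the sample means and standard deviation of the sample means */
proc means data=OutStats3 n mean std;
var SampleMean;     /* one row per Monte Carlo sample */
run;
```

The reported mean should be close to 5, the expected value of a U(1,9) variable.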

If you intend to do many more simulations, you might want to invest in the book Simulating Data with SAS.

jsjoden
Obsidian | Level 7
Thanks Rick! Everything is clarified and straightforward!

Just a quick question: if I wanted to use the exponential distribution instead of the uniform, would this be correct?

x = 1+8*rand("exponential")/lambda;



Thanks again
Rick_SAS
SAS Super FREQ

Sometimes the exponential family is parameterized by using a scale parameter, and sometimes by using a rate parameter.

If E ~ Exp(1), then 

- The random variable sigma*E is exponential with scale parameter sigma.

- The random variable E/lambda is exponential with rate parameter lambda.

The lower bound of an exponential r.v. is 0, so adding 1 would translate the threshold to 1.

 

I think you do not need the 8. If you want a truncated exponential distribution, you would use an IF-THEN statement to accept/reject the random values in [1,8], such as

x = 1 + rand("expo")/lambda;

if x <= 8;
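Inside a simulation loop that uses an explicit OUTPUT statement, the accept/reject step could be sketched with the IF-THEN form shown below. The rate value, the data set name TruncExp, and the loop length are hypothetical. As far as I understand, a subsetting IF (if x <= 8;) placed inside the DO loop would end the DATA step at the first rejected value, which is why this sketch uses IF-THEN OUTPUT instead.

```sas
/* Sketch: shifted exponential, keeping only values in [1,8] */
%let lambda = 0.5;                          /* hypothetical rate parameter */
data TruncExp;
call streaminit(123);                       /* arbitrary seed */
do i = 1 to 1000;
   x = 1 + rand("Exponential")/&lambda;     /* threshold 1, rate lambda */
   if x <= 8 then output;                   /* accept values in [1,8]; reject the rest */
end;
run;
```

Because rejected draws are discarded, TruncExp will usually contain fewer than 1000 observations. To get a fixed sample size, the loop would need to keep drawing until enough values are accepted.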

 

 
