eubankm
Calcite | Level 5

I am new to SAS programming and am having difficulty performing a particular task. I want to generate 4 random numbers from a given range, without replacement, based on given probabilities that each number will be chosen, and I want to repeat this 2000 times. Everything I have found while searching just says to use rand("normal") or rand("uniform"), but that will not do what I need. I also know the theta and sigma for each of the 4 numbers, based on 1800 actual past observations. Any help on how I might do this would be great.

 

Thanks

 

P.S. I am using SAS University Edition.

ballardw
Super User

What kind of range do you need? Please provide an example.

What is your theta supposed to represent, and from which distribution? If it is from a normal distribution, you can specify the parameters:

 

x=rand('NORMAL',theta,sigma); where theta is the mean and sigma is the standard deviation of the distribution you want your sample drawn from.
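
As a rough illustration of the call above, here is a minimal DATA step sketch. The values of theta and sigma, the seed, and the data set name normal_sample are placeholders I made up, not values from this thread.

```sas
/* Sketch only: theta and sigma are placeholder values, not from the thread */
data normal_sample;
   call streaminit(12345);          /* fix the seed so the run is reproducible */
   theta = 27;                      /* assumed mean */
   sigma = 6;                       /* assumed standard deviation */
   do i = 1 to 2000;                /* 2000 repetitions, as in the question */
      x = rand('NORMAL', theta, sigma);
      output;
   end;
run;

/* Check that the sample mean and std dev are close to theta and sigma */
proc means data=normal_sample n mean std;
   var x;
run;
```

Note that this produces continuous values, so it would still need rounding or binning before it could stand in for discrete numbers.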

 

There is also rand('TABLE'). You provide a list of probabilities.

 

x=rand('TABLE', 0.1,0.2,0.5,0.2); would return a 1 with probability 0.1, a 2 with probability 0.2, a 3 with probability 0.5, or a 4 with probability 0.2.

 

p= 1/6;

x=rand('TABLE',p,p,p,p,p,p); does a good job of simulating a 6-sided die.

There are multiple ways to map the 1, 2, 3, ... to other values if needed.
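
One possible way to do that mapping, sketched under assumptions: rand('TABLE') returns an index, and a lookup array turns the index into the value you actually want. The values 15, 20, 30, 40 and the data set name table_sample are hypothetical.

```sas
/* Sketch only: the lookup values are hypothetical */
data table_sample;
   call streaminit(12345);
   array val[4] _temporary_ (15 20 30 40);    /* values to return for index 1..4 */
   do i = 1 to 2000;
      k = rand('TABLE', 0.1, 0.2, 0.5, 0.2);  /* k is 1, 2, 3, or 4 */
      x = val[k];                             /* look up the value for that index */
      output;
   end;
run;

/* The observed frequencies of x should be near 0.1, 0.2, 0.5, 0.2 */
proc freq data=table_sample;
   tables x;
run;
```

If the values are consecutive integers starting at 15, a simple shift such as x = k + 14 would work in place of the array.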

 

If you need results in a particular range, sometimes you have to recalculate them, but the specific approach depends on what you are attempting.

eubankm
Calcite | Level 5
Thank you, that is helpful. The data set is normally distributed; however, the distribution is skewed, so the probability of each number being chosen is not equal. Therefore the 6-sided die example would not work.

I have created a table with the range of numbers available for the first number listed in one column and the probability of each of those numbers being chosen in a 2nd column. I have done this for each of the 4 drawings. If I used rand('table', 0.1,0.2,0.5,0.2), for example, but the available numbers start at 15 and go to 40, would this return 15 with a probability of 0.1, etc.?

I feel like rand('Normal',theta,sigma) would not work for this since the distribution is skewed and not perfectly normal, but please correct me if I'm wrong.

Thanks
Rick_SAS
SAS Super FREQ

Are you talking about a mixture of normal distributions? In a mixture, the parameters are chosen with a specified probability, then a random value is drawn from the appropriate normal distribution.  See "Generate a random sample from a mixture distribution" for a discussion and SAS code.
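
A minimal sketch of the two-step idea described above (pick a component, then draw from it). This is not the code from the referenced article; the mixing probabilities, means, and standard deviations are made-up placeholders.

```sas
/* Sketch only: all parameter values are assumed for illustration */
data mixture;
   call streaminit(12345);
   array prob[2] _temporary_ (0.7 0.3);   /* assumed mixing probabilities */
   array mu[2]   _temporary_ (20 35);     /* assumed component means */
   array sd[2]   _temporary_ (4 6);       /* assumed component std devs */
   do i = 1 to 2000;
      k = rand('TABLE', of prob[*]);      /* step 1: choose a component */
      x = rand('NORMAL', mu[k], sd[k]);   /* step 2: draw from that component */
      output;
   end;
   keep k x;
run;

/* A histogram shows the combined, possibly skewed, shape */
proc sgplot data=mixture;
   histogram x;
run;
```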

eubankm
Calcite | Level 5
I am doing research for one of my statistics classes regarding the lottery and underlying patterns among the random drawing of the numbers. I have analyzed the winning numbers of the last 1800 drawings and have found the mean number drawn for each ball as well as the stdev. I then found the probability that each number is drawn for each ball (odds of a 1 being drawn for ball 1 and so on). Now I am trying to randomly generate 5 numbers based on those probabilities to see if a more clear pattern emerges.
Rick_SAS
SAS Super FREQ

If the i_th ball was drawn k_i times, then the empirical probability for that ball is p_i = k_i / (5*1800).

You should create a SAS data set that has two columns: the ball number and the empirical probability.
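
One way this data set might be built, as a sketch: it assumes a hypothetical input data set named draws with one row per ball drawn (so 5 rows per drawing) and a variable named ball. The two toy drawings below are invented; real data would have 5*1800 rows.

```sas
/* Hypothetical input: one row per ball drawn, 5 rows per drawing */
data draws;
   input drawing ball @@;
   datalines;
1 7  1 23  1 18  1 2   1 53
2 7  2 41  2 2   2 30  2 12
;

/* COUNT is k_i; PERCENT is 100 * k_i / (total number of rows) */
proc freq data=draws noprint;
   tables ball / out=counts;
run;

/* Keep the two columns: ball number and empirical probability */
data probs;
   set counts;
   p = percent / 100;
   keep ball p;
run;

proc print data=probs;
run;
```

Balls that were never drawn do not appear in the output, so they would have to be added with p = 0 if the full list is needed.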

You then want to draw a sample (without replacement) of size 5 with those (unequal) probabilities.

See the article "Four essential sampling methods in SAS" which gives the syntax for using PROC SURVEYSELECT or PROC IML to sample according to this scheme. See the upper right corner of the table in the article for the syntax.
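
A sketch of what the PROC SURVEYSELECT call could look like. I am assuming METHOD=PPS (probability proportional to size, without replacement) is the option meant here; the 10-ball probability table is made up. Note that METHOD=PPS makes each ball's chance of being in the sample proportional to p, and it requires n times each relative size to be no greater than 1.

```sas
/* Toy probabilities for 10 balls; values are invented for illustration */
data probs;
   input ball p;
   datalines;
1 0.05
2 0.05
3 0.05
4 0.10
5 0.10
6 0.10
7 0.10
8 0.15
9 0.15
10 0.15
;

/* 2000 replicates, each a sample of 5 distinct balls */
proc surveyselect data=probs out=samples method=pps
                  n=5 reps=2000 seed=12345 noprint;
   size p;                 /* selection is proportional to p */
run;

/* How often each ball was selected across all replicates */
proc freq data=samples;
   tables ball;
run;
```

The output data set has a Replicate variable that identifies which of the 2000 samples each row belongs to.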

 

 

 

eubankm
Calcite | Level 5
That seems more like what I am trying to do. However, the probability that each number is drawn differs depending on which ball is being drawn. Ex. Ball 1 has a 9 percent chance of being a 1 and a 0 percent chance of being a 45, and ball 5 has a 0 percent chance of being a 1 and a 12 percent chance of being a 45. This is the pattern that I am studying and want to replicate in a simulation.

This is what is giving me a hard time. I have to break my original data set into subsamples based on which ball is being drawn to get the probabilities correct. Then I want to draw a sample of 5 balls, the first being based on the probabilities I have found for ball 1, the second being based on the probabilities of ball 2 etc.

I am sorry I feel like an idiot for having this hard of a time understanding this.
Rick_SAS
SAS Super FREQ
  1. Are you an undergraduate or graduate student?
  2. How experienced are you with SAS DATA step programming?
  3. How experienced are you at SAS/IML (PROC IML) programming?
eubankm
Calcite | Level 5
1. undergraduate
2. Beginner
3. I have only seen SAS/IML in your blog
Rick_SAS
SAS Super FREQ

4. When is the project due?

eubankm
Calcite | Level 5
At the end of the Semester, late April. I have purchased the SAS Programming by Example book to help me learn more but I am afraid I will not learn what I need to complete this before it is due.
Rick_SAS
SAS Super FREQ

If I may offer some friendly advice, I suggest

1. Talk to your advisor/professor.  He/She wants you to succeed.

2. Consider changing to the model I proposed. In that model, you would count how many times each ball has EVER appeared (regardless of whether it was the first, second, ... or fifth ball).  That is a standard probability model in which the probability of drawing each ball is constant and the draws are independent.

 

If you attempt the simpler project, it will still be challenging and you will still learn a lot about SAS programming and simulation. However, the simpler problem will be more tractable for someone with your level of experience.

 

Good luck!

eubankm
Calcite | Level 5
Thanks, I think what you suggested along with what ballardw has stated below will probably fix my problem. I was viewing summaries of winning numbers from the lottery website, which appears to list the numbers in order from smallest to largest, which is why I was getting so confused. I will try what you told me about PROC SURVEYSELECT and PROC IML from your blog.

Thank you a lot for the help.
ballardw
Super User

How many records do you have from the lottery where the rules did not change? I am thinking of the PowerBall, where they have increased the number of balls in both the main set and the Power ball set. Since the total number of recorded results is so much smaller than the current 69 Choose 5 possibilities, I would be very surprised if many of the individual numbers have selection rates near 1/69.

 

This is likely to be an interesting exercise.

I think the rule you state here, "1 has a 9 percent chance of being a 1 and a 0 percent chance of being a 45", arises because you are examining the ordered result reported in summaries.

If the balls in a lottery like the PowerBall are drawn on TV in the order 23, 7, 18, 2, 53, the summary reported in the data I have would be 2, 7, 18, 23, 53. So the 45 ball has a very small opportunity to be reported in the first position of the ordered tuple.
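
To see this effect, here is a small simulation sketch. It assumes 69 equally likely balls and 5 picks per drawing (taken from the PowerBall remark above, not from the poster's data), draws without replacement using a partial shuffle, and then sorts each drawing the way the summaries do.

```sas
/* Sketch only: 69 balls, 5 picks, and equal probabilities are assumptions */
data ordered;
   call streaminit(12345);
   array pool[69] _temporary_;
   array b[5] b1-b5;
   do rep = 1 to 2000;
      do i = 1 to 69;
         pool[i] = i;                 /* reset the urn */
      end;
      do i = 1 to 5;                  /* partial Fisher-Yates shuffle */
         j = i + floor((69 - i + 1) * rand('UNIFORM'));
         tmp = pool[i]; pool[i] = pool[j]; pool[j] = tmp;
         b[i] = pool[i];              /* i-th ball in draw order */
      end;
      call sortn(of b[*]);            /* reorder smallest to largest */
      output;
   end;
   keep rep b1-b5;
run;

/* After sorting, each position has its own mean and spread */
proc means data=ordered mean std min max;
   var b1-b5;
run;
```

Even though every ball is equally likely here, the sorted positions b1 through b5 have very different distributions, which is the pattern that shows up in the ordered summaries.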

 

So the question is: are you concerned with combinations or permutations (without or with order)? The process you describe seems to be partly a permutation process, but one that uses the probability of appearance in a combination.

 

I would suggest starting with a smaller problem, such as 10 balls and picking 2, where you can look at all of the possibilities and see the results more easily.
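
A sketch of that smaller problem: the DATA step below lists every ordered-summary pair (smaller number first) for 10 balls picking 2, which gives all 45 combinations. The data set and variable names are just for illustration.

```sas
/* Enumerate all 10-choose-2 combinations, reported smallest first */
data combos;
   do b1 = 1 to 9;
      do b2 = b1 + 1 to 10;
         output;                /* 45 rows in total */
      end;
   end;
run;

/* Tabulate which numbers can appear in each reported position */
proc freq data=combos;
   tables b1 b2;
run;
```

In this listing ball 1 appears in the first position in 9 of the 45 combinations and ball 10 never does, even though every combination is equally likely.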

 

eubankm
Calcite | Level 5
I believe this fixes my problem. I was going by the summary which orders the numbers from smallest to largest which is what was causing my confusion.

Thanks
