SAS Programming

eubankm · Posted 02-02-2017 12:44 PM

I am new to SAS programming and am having difficulty performing a certain task. I want to generate 4 random numbers from a given range, without replacement, based on given probabilities that each number will be chosen, and I want to perform 2000 repetitions of this. I have been searching for a way to do this, but everything I find just says to use rand("normal") or rand("uniform"), which will not perform my intended task. I also know the theta and sigma for each of the 4 numbers, based on 1800 actual past observations. Any help on how I might do this would be great.

Thanks

P.S. I am using SAS University Edition.

ballardw · Posted 02-02-2017 01:33 PM

What kind of range do you need? Please provide an example.

What is your theta supposed to represent, and from which distribution? If it is from a normal distribution, you can specify the parameters:

x=rand('NORMAL',theta,sigma); where theta is the mean and sigma is the standard deviation of the distribution that you want your sample drawn from.

There is also rand('TABLE'). You provide a list of probabilities.

x=rand('TABLE', 0.1,0.2,0.5,0.2); would return a 1 with probability 0.1, a 2 with probability 0.2, a 3 with probability 0.5, or a 4 with probability 0.2.

p= 1/6;

x=rand('TABLE',p,p,p,p,p,p); does a good job of simulating a 6-sided die.

There are multiple ways to map the 1, 2, 3, ... to other values if needed.
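One way to do that mapping, as a minimal sketch: keep the values in one array and the probabilities in a second array, then use the result of rand('TABLE') as an index into the values. The values 15, 20, 30, 40, the seed, and the names Sample, value, p, and idx are made up for illustration.

```sas
/* Sketch only: the values, probabilities, and seed are hypothetical. */
data Sample;
   call streaminit(12345);                          /* make the run reproducible   */
   array value[4] _temporary_ (15 20 30 40);        /* values to return            */
   array p[4]     _temporary_ (0.1 0.2 0.5 0.2);    /* matching probabilities      */
   do rep = 1 to 2000;
      idx = rand('Table', of p[*]);   /* returns an index 1..4, not the value */
      x = value[idx];                 /* map the index to the real value      */
      output;
   end;
   keep rep x;
run;

/* The observed frequencies should be close to 10%, 20%, 50%, 20%. */
proc freq data=Sample;
   tables x;
run;
```

Note that rand('TABLE') on its own always returns 1, 2, 3, ..., so a range that starts at 15 needs a lookup like this (or an offset if the values are consecutive).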

If you need the results to fall in a particular range, you sometimes have to recalculate them, but the specific approach depends on what you are attempting.

eubankm · Posted 02-02-2017 02:09 PM

Thank you, that is helpful. The data set is normally distributed; however, the distribution is skewed, so the probability of each number being chosen is not equal. Therefore, the 6-sided die example would not work.

I have created a table with the range of numbers available for the first number listed in one column and the probability of each of those numbers being chosen in a second column. I have done this for each of the 4 drawings. If I used rand('table', 0.1,0.2,0.5,0.2), for example, but the available numbers start at 15 and go to 40, would this return 15 with a probability of 0.1, etc.?

I feel like rand('Normal',theta,sigma) would not work for this since the distribution is skewed and not perfectly normal, but please correct me if I'm wrong.

Thanks

Rick_SAS · Posted 02-02-2017 03:23 PM

Are you talking about a mixture of normal distributions? In a mixture, the parameters are chosen with a specified probability, then a random value is drawn from the appropriate normal distribution. See "Generate a random sample from a mixture distribution" for a discussion and SAS code.
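For readers who have not seen a mixture before, here is a minimal sketch of the two-step idea described above. It is not the code from the linked article; the three components, their probabilities, means, and standard deviations are all hypothetical.

```sas
/* Sketch only: all parameters below are made up for illustration. */
data Mixture;
   call streaminit(2017);
   array prob[3]  _temporary_ (0.5 0.3 0.2);   /* mixing probabilities       */
   array mu[3]    _temporary_ (10 25 40);      /* component means            */
   array sigma[3] _temporary_ (2 3 5);         /* component std deviations   */
   do i = 1 to 2000;
      k = rand('Table', of prob[*]);           /* step 1: choose a component */
      x = rand('Normal', mu[k], sigma[k]);     /* step 2: draw from it       */
      output;
   end;
   drop i;
run;
```

Each observation records which component was chosen (k) and the value drawn from it (x).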

eubankm · Posted 02-02-2017 03:30 PM

I am doing research for one of my statistics classes regarding the lottery and underlying patterns in the random drawing of the numbers. I have analyzed the winning numbers of the last 1800 drawings and have found the mean number drawn for each ball as well as the standard deviation. I then found the probability that each number is drawn for each ball (the odds of a 1 being drawn for ball 1, and so on). Now I am trying to randomly generate 5 numbers based on those probabilities to see if a clearer pattern emerges.

Rick_SAS · Posted 02-02-2017 03:43 PM

If the i_th ball was drawn k_i times, then the empirical probability for ball i is p_i = k_i / (5*1800).

You should create a SAS data set that has two columns: the ball number and the empirical probability.
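A minimal sketch of building that two-column data set, assuming the drawings are stored one per row in variables b1-b5. The data set names Drawings, Long, BallCount, and BallProb and the three example rows are hypothetical; a real analysis would read all 1800 drawings.

```sas
/* Hypothetical input: one row per drawing, five balls per row. */
data Drawings;
   input b1-b5;
   datalines;
2 7 18 23 53
7 11 23 40 61
2 18 33 40 66
;

/* Reshape to one row per ball drawn, ignoring the position. */
data Long;
   set Drawings;
   array b[5] b1-b5;
   do j = 1 to 5;
      Ball = b[j];
      output;
   end;
   keep Ball;
run;

/* COUNT is k_i; PERCENT is 100 * k_i / (5 * number of drawings). */
proc freq data=Long noprint;
   tables Ball / out=BallCount;
run;

data BallProb;
   set BallCount;
   p = Percent / 100;      /* empirical probability p_i */
   keep Ball p;
run;
```

Balls that never appear in the data do not get a row, so they would have to be added with p = 0 if they are needed later.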

You then want to draw a sample (without replacement) of size 5 with those (unequal) probabilities.

See the article "Four essential sampling methods in SAS" which gives the syntax for using PROC SURVEYSELECT or PROC IML to sample according to this scheme. See the upper right corner of the table in the article for the syntax.
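As a rough sketch of the PROC SURVEYSELECT route (not copied from the article), the following draws 2000 samples of 5 balls without replacement, with unequal probabilities. The counts in Balls are randomly generated placeholders, not lottery data, and the data set names and seeds are assumptions.

```sas
/* Placeholder counts for 69 balls; replace with the real k_i. */
data Balls;
   call streaminit(12345);
   do Ball = 1 to 69;
      k = 100 + floor(61 * rand('Uniform'));   /* hypothetical count */
      output;
   end;
run;

/* Convert counts to empirical probabilities that sum to 1. */
proc sql;
   create table BallProb as
   select Ball, k, k / sum(k) as p
   from Balls;
quit;

/* METHOD=PPS samples without replacement with probability    */
/* proportional to the SIZE variable; REPS= repeats the draw. */
proc surveyselect data=BallProb out=Draws noprint
                  method=PPS n=5 reps=2000 seed=54321;
   size p;
run;
```

The output data set Draws has a Replicate variable that identifies each of the 2000 simulated drawings. Two caveats: as far as I know, METHOD=PPS requires that no ball's share of the total size exceed 1/n, and it makes the inclusion probabilities proportional to p, which is not exactly the same as drawing one ball at a time and renormalizing after each draw.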

eubankm · Posted 02-02-2017 04:03 PM

That seems more like what I am trying to do. However, the probabilities that each ball number is drawn are unequal, depending on which ball is being drawn. For example, ball 1 has a 9 percent chance of being a 1 and a 0 percent chance of being a 45, and ball 5 has a 0 percent chance of being a 1 and a 12 percent chance of being a 45. This is the pattern that I am studying and want to replicate in a simulation.

This is what is giving me a hard time. I have to break my original data set into subsamples based on which ball is being drawn to get the probabilities correct. Then I want to draw a sample of 5 balls, the first based on the probabilities I have found for ball 1, the second based on the probabilities for ball 2, etc.

I am sorry; I feel like an idiot for having this hard a time understanding this.
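A naive sketch of that position-by-position scheme, scaled down to 3 positions and 6 balls: each position has its own probability table, and a drawing is redrawn whenever two positions repeat a number. All probabilities here are invented, and the redraw step is only one simple way to enforce "without replacement"; it changes the per-position probabilities somewhat, so treat this as an illustration rather than a validated method.

```sas
/* Sketch only: three positions, six balls, made-up probabilities. */
data PosDraws;
   call streaminit(777);
   array p1[6] _temporary_ (0.5 0.3 0.2 0   0   0  );  /* position 1 */
   array p2[6] _temporary_ (0   0.2 0.3 0.3 0.2 0  );  /* position 2 */
   array p3[6] _temporary_ (0   0   0   0.2 0.3 0.5);  /* position 3 */
   do rep = 1 to 2000;
      /* Redraw the whole triple until all three numbers differ. */
      do until (b1 ne b2 and b1 ne b3 and b2 ne b3);
         b1 = rand('Table', of p1[*]);
         b2 = rand('Table', of p2[*]);
         b3 = rand('Table', of p3[*]);
      end;
      output;
   end;
run;

/* Compare the simulated distribution for each position. */
proc freq data=PosDraws;
   tables b1 b2 b3;
run;
```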

Rick_SAS · Posted 02-02-2017 04:08 PM

Are you an undergraduate or graduate student?
How experienced are you with SAS DATA step programming?
How experienced are you at SAS/IML (PROC IML) programming?

eubankm · Posted 02-02-2017 04:10 PM

1. undergraduate
2. Beginner
3. I have only seen SAS/IML in your blog

Rick_SAS · Posted 02-02-2017 04:22 PM

4. When is the project due?

eubankm · Posted 02-02-2017 04:28 PM

At the end of the semester, in late April. I have purchased the book SAS Programming by Example to help me learn more, but I am afraid I will not learn what I need to complete this before it is due.

Rick_SAS · Posted 02-02-2017 04:34 PM

If I may offer some friendly advice, I suggest

1. Talk to your advisor/professor. He/She wants you to succeed.

2. Consider changing to the model I proposed. In that model, you would count how many times each ball has EVER appeared (regardless of whether it was the first, second, ... or fifth ball). That is a standard probability model in which the probability of drawing each ball is constant and the draws are independent.

If you attempt the simpler project, it will still be challenging and you will still learn a lot about SAS programming and simulation. However, the simpler problem will be more tractable for someone with your level of experience.

Good luck!

eubankm · Posted 02-02-2017 04:38 PM

Thanks, I think what you suggested, along with what ballardw has stated below, will probably fix my problem. I was viewing summaries of winning numbers from the lottery website, which appears to list the numbers in order from smallest to largest, which is why I was getting so confused. I will try what you told me about PROC SURVEYSELECT and PROC IML from your blog.

Thank you a lot for the help.

ballardw · Posted 02-02-2017 04:32 PM

How many records do you have from a period in which the lottery rules did not change? I am thinking of PowerBall, where they have increased the number of balls in both the main set and the Power Ball set. Since the total number of observed results is so much smaller than the current 69-choose-5 possibilities, I would be very surprised if many of the individual numbers have selection rates near 1/69.

This is likely to be an interesting exercise.

I think the rule you state here, "Ball 1 has a 9 percent chance of being a 1 and a 0 percent chance of being a 45", arises because you are examining the ordered result reported in summaries.

If the balls in a lottery like PowerBall are drawn in the order seen on TV, say 23, 7, 18, 2, 53, the summary reported in the data I have would be 2, 7, 18, 23, 53. So the 45 ball has very little opportunity to be reported in the first position of the ordered tuple.

So the question is: are you concerned with combinations or permutations (without and with order, respectively)? The process you describe seems to be somewhat of a permutation process, but one that uses the probability of appearance in a combination.

I would suggest starting with a smaller problem, such as 10 balls and picking 2, where you can look at all of the possibilities and see the results more easily.
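To make the 10-balls-pick-2 suggestion concrete, this small sketch lists every unordered pair in sorted form and tabulates which numbers land in the first and second reported positions. The data set name Pairs and the variable names Low and High are just labels chosen for the illustration.

```sas
/* Enumerate all 45 combinations of 2 balls from 10, reported */
/* the way a summary would: smaller number first.             */
data Pairs;
   do Low = 1 to 9;
      do High = Low + 1 to 10;
         output;
      end;
   end;
run;

/* Distribution of each reported position across all pairs. */
proc freq data=Pairs;
   tables Low High / nocum;
run;
```

Even though every pair is equally likely here, ball 1 fills the first position in 9 of the 45 pairs and ball 10 never does, while the second position shows the reverse. Sorting alone produces position-dependent probabilities like the ones described earlier in the thread.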

eubankm · Posted 02-02-2017 04:39 PM

I believe this fixes my problem. I was going by the summary which orders the numbers from smallest to largest which is what was causing my confusion.

Thanks

using rand function to generate random numbers based on probabilities
