- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm trying to generate two sets of 5,000 random numbers. I have successfully generated the first set, which is a uniform distribution of integers from 0 to 120. For the second set, I would like to sample from a function with a linear (monotonic) increase in probability over that interval. So, the probability of randomly generating the number 0 is near 0 and the probability of randomly generating the number 120 is greater than all other numbers.
I apologize if this seems basic but I'm having a hard time phrasing my issue and so have been unable to find any technical support on this issue. If anyone has any references for me to review, I'd greatly appreciate it. My current thinking is that I somehow need to transform the rand("uniform") distribution to increase linearly, but I can't seem to figure out. I'd prefer if I could adjust the slope of the line.
I've pasted some code below that shows how I've been trying to code this.
data Random_numbers;
call streaminit(123); /* set random number seed */
do i = 1 to 5000;
uniform = int(rand("uniform") *120);
increasing = int((rand("uniform") * SOME TRANSFORMATION HERE? )*120);
output;
end;
PROC FREQ DATA = Random_numbers;
TABLES uniform increasing ;
RUN;
Thanks for your help!
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
A proposed solution.
In the following,
yRnd1 = random variable with a uniform probability in the, say, 1-100 range.
yRnd2 = random variable with a probability distribution linearly increasing from 1 to, say, 100.
yRnd2 = random variable with a probability distribution quadratically increasing from 1 to, say, 100.
/******************************/
/**** sample distributions ****/
/******************************/
data t_a;
do _N_ = 1 to 100000;
xRnd = ranuni(3);
yRd1 = ceil(100*xRnd);
yRd2 = ceil(100*(xRnd**(1/2)));
yRd3 = ceil(100*(xRnd**(1/3)));
output;
end;
run;
/********************************************************/
/*** wanting to set 3 graphs with the same maximum wt ***/
/********************************************************/
proc sql;
create table t_1 as
select a.yRd1, round((a.cnts/b.zMax),0.001) as wt
from (select yRd1, sum(1) as cnts from t_a group by yRd1) a,
(select max(cnts) as zMax
from
(select yRd1, sum(1) as cnts from t_a group by yRd1)) b
order by a.yRd1;
create table t_2 as
select a.yRd2, round((a.cnts/b.zMax),0.001) as wt
from (select yRd2, sum(1) as cnts from t_a group by yRd2) a,
(select max(cnts) as zMax
from
(select yRd2, sum(1) as cnts from t_a group by yRd2)) b
order by a.yRd2;
create table t_3 as
select a.yRd3, round((a.cnts/b.zMax),0.001) as wt
from (select yRd3, sum(1) as cnts from t_a group by yRd3) a,
(select max(cnts) as zMax
from
(select yRd3, sum(1) as cnts from t_a group by yRd3)) b
order by a.yRd3;
quit;
/******************************/
/*** plotting for, say, t_2 ***/
/******************************/
proc sgplot data=t_2;
scatter x=yRd2 y=wt;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You may be looking for the Rand('TABLE',p1,p2,...) function where p1=probability of selecting 1, p2= probability of selecting 2 and so forth. Admittedly that's a somewhat long statement to code but without more description of what kind of shape you might want...
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'd prefer something that was more easily manipulated and changed (i.e. in an equation rather than a tabled form) if anyone has any other ideas.
However, this will work for now and I appreciate the quick help!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The transformation you need here is the inverse CDF.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for the response. I looked at the link you recommended. I also reviewed Paper 236-25 (Pseudo-Random Numbers: Out of Uniform http://www2.sas.com/proceedings/sugi25/25/po/25p236.pdf), which discusses this method.
I'm just having trouble figuring out how to code it. Do I use the CDF function?
So if I generate a uniform distribution of numbers from 0 to 1 my probability density function is (pdf) is 1. Does this mean my cumulative density function (CDF) is X?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
No: your PDF is a function with the following properties: f(0)=0, f(120)=A, it is linear between 0 and 120 and 0 otherwise (outside of [0,120] interval).
You will get the CDF by intergarting this function. A is a fixed parameter, you need to calculate is so, that the area under the curve of PDF (=the area of the triangle) is 1.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
A proposed solution.
In the following,
yRnd1 = random variable with a uniform probability in the, say, 1-100 range.
yRnd2 = random variable with a probability distribution linearly increasing from 1 to, say, 100.
yRnd2 = random variable with a probability distribution quadratically increasing from 1 to, say, 100.
/******************************/
/**** sample distributions ****/
/******************************/
data t_a;
do _N_ = 1 to 100000;
xRnd = ranuni(3);
yRd1 = ceil(100*xRnd);
yRd2 = ceil(100*(xRnd**(1/2)));
yRd3 = ceil(100*(xRnd**(1/3)));
output;
end;
run;
/********************************************************/
/*** wanting to set 3 graphs with the same maximum wt ***/
/********************************************************/
proc sql;
create table t_1 as
select a.yRd1, round((a.cnts/b.zMax),0.001) as wt
from (select yRd1, sum(1) as cnts from t_a group by yRd1) a,
(select max(cnts) as zMax
from
(select yRd1, sum(1) as cnts from t_a group by yRd1)) b
order by a.yRd1;
create table t_2 as
select a.yRd2, round((a.cnts/b.zMax),0.001) as wt
from (select yRd2, sum(1) as cnts from t_a group by yRd2) a,
(select max(cnts) as zMax
from
(select yRd2, sum(1) as cnts from t_a group by yRd2)) b
order by a.yRd2;
create table t_3 as
select a.yRd3, round((a.cnts/b.zMax),0.001) as wt
from (select yRd3, sum(1) as cnts from t_a group by yRd3) a,
(select max(cnts) as zMax
from
(select yRd3, sum(1) as cnts from t_a group by yRd3)) b
order by a.yRd3;
quit;
/******************************/
/*** plotting for, say, t_2 ***/
/******************************/
proc sgplot data=t_2;
scatter x=yRd2 y=wt;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You want to use the inverse CDF method for generating random numbers. Sometimes you can use the QUANTILE function to help solve the inverse problem (see the example of the folded normal distribution), but other times you need to solve for the root of the cumulative distribution: F(x) = u, where u ~U(0,1). In SAS/IML, you can use the FROOT function to solve for numerical roots.
If your CDF is given explicitly by a formula or by empirical quantiles, you can use linear interpolation. See this blog post.
http://blogs.sas.com/content/iml/2014/06/18/distribution-from-quantiles.html