BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mconover
Quartz | Level 8

I'm trying to generate two sets of 5,000 random numbers.  I have successfully generated the first set, which is a uniform distribution of integers from 0 to 120. For the second set, I would like to sample from a function with a linear (monotonic) increase in probability over that interval.  So, the probability of randomly generating the number 0 is near 0 and the probability of randomly generating the number 120 is greater than all other numbers.


I apologize if this seems basic but I'm having a hard time phrasing my issue and so have been unable to find any technical support on this issue.  If anyone has any references for me to review, I'd greatly appreciate it. My current thinking is that I somehow need to transform the rand("uniform") distribution to increase linearly, but I can't seem to figure out. I'd prefer if I could adjust the slope of the line.


I've pasted some code below that shows how I've been trying to code this.


data Random_numbers;

call streaminit(123); /* set random number seed */

do i = 1 to 5000;

   uniform = int(rand("uniform") *120);

   increasing = int((rand("uniform") * SOME TRANSFORMATION HERE? )*120);

   output;

end;


PROC FREQ DATA = Random_numbers;

  TABLES uniform increasing ;

RUN;



Thanks for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
billfish
Quartz | Level 8

A proposed solution.

In the following,
  yRnd1 = random variable with a uniform probability in the, say, 1-100 range.
  yRnd2 = random variable with a probability distribution linearly increasing from 1 to, say, 100.
  yRnd2 = random variable with a probability distribution quadratically increasing from 1 to, say, 100.

/******************************/
/**** sample distributions ****/
/******************************/
data t_a;
  do _N_ = 1 to 100000;
     xRnd  = ranuni(3);
     yRd1 = ceil(100*xRnd);
     yRd2 = ceil(100*(xRnd**(1/2)));
     yRd3 = ceil(100*(xRnd**(1/3)));
     output;
  end;
run;

/********************************************************/
/*** wanting to set 3 graphs with the same maximum wt ***/
/********************************************************/
proc sql;
  create table t_1 as
  select a.yRd1, round((a.cnts/b.zMax),0.001) as wt
  from   (select yRd1, sum(1) as cnts from t_a group by yRd1) a,
         (select max(cnts) as zMax
          from
           (select yRd1, sum(1) as cnts from t_a group by yRd1)) b
  order by a.yRd1;

  create table t_2 as
  select a.yRd2, round((a.cnts/b.zMax),0.001) as wt
  from   (select yRd2, sum(1) as cnts from t_a group by yRd2) a,
         (select max(cnts) as zMax
          from
           (select yRd2, sum(1) as cnts from t_a group by yRd2)) b
  order by a.yRd2;

  create table t_3 as
  select a.yRd3, round((a.cnts/b.zMax),0.001) as wt
  from   (select yRd3, sum(1) as cnts from t_a group by yRd3) a,
         (select max(cnts) as zMax
          from
          (select yRd3, sum(1) as cnts from t_a group by yRd3)) b
  order by a.yRd3;

quit;

/******************************/
/*** plotting for, say, t_2 ***/
/******************************/
proc sgplot data=t_2;
   scatter x=yRd2 y=wt;
run;

View solution in original post

7 REPLIES 7
ballardw
Super User

You may be looking for the Rand('TABLE',p1,p2,...) function where p1=probability of selecting 1, p2= probability of selecting 2 and so forth. Admittedly that's a somewhat long statement to code but without more description of what kind of shape you might want...

mconover
Quartz | Level 8

I'd prefer something that was more easily manipulated and changed (i.e. in an equation rather than a tabled form) if anyone has any other ideas.

However, this will work for now and I appreciate the quick help! 

mconover
Quartz | Level 8

Thanks for the response.  I looked at the link you recommended.  I also reviewed Paper 236-25 (Pseudo-Random Numbers: Out of Uniform http://www2.sas.com/proceedings/sugi25/25/po/25p236.pdf), which discusses this method.

I'm just having trouble figuring out how to code it.  Do I use the CDF function?

So if I generate a uniform distribution of numbers from 0 to 1 my probability density function is (pdf) is 1.  Does this mean my cumulative density function (CDF) is X? 

gergely_batho
SAS Employee

No: your PDF is a function with the following properties: f(0)=0, f(120)=A, it is linear between 0 and 120 and 0 otherwise (outside of [0,120] interval).

You will get the CDF by intergarting this function.  A is a fixed parameter, you need to calculate is so, that the area under the curve of PDF (=the area of the triangle) is 1.

billfish
Quartz | Level 8

A proposed solution.

In the following,
  yRnd1 = random variable with a uniform probability in the, say, 1-100 range.
  yRnd2 = random variable with a probability distribution linearly increasing from 1 to, say, 100.
  yRnd2 = random variable with a probability distribution quadratically increasing from 1 to, say, 100.

/******************************/
/**** sample distributions ****/
/******************************/
data t_a;
  do _N_ = 1 to 100000;
     xRnd  = ranuni(3);
     yRd1 = ceil(100*xRnd);
     yRd2 = ceil(100*(xRnd**(1/2)));
     yRd3 = ceil(100*(xRnd**(1/3)));
     output;
  end;
run;

/********************************************************/
/*** wanting to set 3 graphs with the same maximum wt ***/
/********************************************************/
proc sql;
  create table t_1 as
  select a.yRd1, round((a.cnts/b.zMax),0.001) as wt
  from   (select yRd1, sum(1) as cnts from t_a group by yRd1) a,
         (select max(cnts) as zMax
          from
           (select yRd1, sum(1) as cnts from t_a group by yRd1)) b
  order by a.yRd1;

  create table t_2 as
  select a.yRd2, round((a.cnts/b.zMax),0.001) as wt
  from   (select yRd2, sum(1) as cnts from t_a group by yRd2) a,
         (select max(cnts) as zMax
          from
           (select yRd2, sum(1) as cnts from t_a group by yRd2)) b
  order by a.yRd2;

  create table t_3 as
  select a.yRd3, round((a.cnts/b.zMax),0.001) as wt
  from   (select yRd3, sum(1) as cnts from t_a group by yRd3) a,
         (select max(cnts) as zMax
          from
          (select yRd3, sum(1) as cnts from t_a group by yRd3)) b
  order by a.yRd3;

quit;

/******************************/
/*** plotting for, say, t_2 ***/
/******************************/
proc sgplot data=t_2;
   scatter x=yRd2 y=wt;
run;

Rick_SAS
SAS Super FREQ

You want to use the inverse CDF method for generating random numbers.  Sometimes you can use the QUANTILE function to help solve the inverse problem (see the example of the folded normal distribution), but other times you need to solve for the root of the cumulative distribution: F(x) = u, where u ~U(0,1).  In SAS/IML, you can use the FROOT function to solve for numerical roots.

If your CDF is given explicitly by a formula or by empirical quantiles, you can use linear interpolation.  See this blog post.

http://blogs.sas.com/content/iml/2014/06/18/distribution-from-quantiles.html

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 3028 views
  • 6 likes
  • 5 in conversation