BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mconover
Quartz | Level 8

I'm trying to generate two sets of 5,000 random numbers.  I have successfully generated the first set, which is a uniform distribution of integers from 0 to 120. For the second set, I would like to sample from a function with a linear (monotonic) increase in probability over that interval.  So, the probability of randomly generating the number 0 is near 0 and the probability of randomly generating the number 120 is greater than all other numbers.


I apologize if this seems basic but I'm having a hard time phrasing my issue and so have been unable to find any technical support on this issue.  If anyone has any references for me to review, I'd greatly appreciate it. My current thinking is that I somehow need to transform the rand("uniform") distribution to increase linearly, but I can't seem to figure out. I'd prefer if I could adjust the slope of the line.


I've pasted some code below that shows how I've been trying to code this.


data Random_numbers;

call streaminit(123); /* set random number seed */

do i = 1 to 5000;

   uniform = int(rand("uniform") *120);

   increasing = int((rand("uniform") * SOME TRANSFORMATION HERE? )*120);

   output;

end;


PROC FREQ DATA = Random_numbers;

  TABLES uniform increasing ;

RUN;



Thanks for your help!

1 ACCEPTED SOLUTION

Accepted Solutions
billfish
Quartz | Level 8

A proposed solution.

In the following,
  yRnd1 = random variable with a uniform probability in the, say, 1-100 range.
  yRnd2 = random variable with a probability distribution linearly increasing from 1 to, say, 100.
  yRnd2 = random variable with a probability distribution quadratically increasing from 1 to, say, 100.

/******************************/
/**** sample distributions ****/
/******************************/
data t_a;
  do _N_ = 1 to 100000;
     xRnd  = ranuni(3);
     yRd1 = ceil(100*xRnd);
     yRd2 = ceil(100*(xRnd**(1/2)));
     yRd3 = ceil(100*(xRnd**(1/3)));
     output;
  end;
run;

/********************************************************/
/*** wanting to set 3 graphs with the same maximum wt ***/
/********************************************************/
proc sql;
  create table t_1 as
  select a.yRd1, round((a.cnts/b.zMax),0.001) as wt
  from   (select yRd1, sum(1) as cnts from t_a group by yRd1) a,
         (select max(cnts) as zMax
          from
           (select yRd1, sum(1) as cnts from t_a group by yRd1)) b
  order by a.yRd1;

  create table t_2 as
  select a.yRd2, round((a.cnts/b.zMax),0.001) as wt
  from   (select yRd2, sum(1) as cnts from t_a group by yRd2) a,
         (select max(cnts) as zMax
          from
           (select yRd2, sum(1) as cnts from t_a group by yRd2)) b
  order by a.yRd2;

  create table t_3 as
  select a.yRd3, round((a.cnts/b.zMax),0.001) as wt
  from   (select yRd3, sum(1) as cnts from t_a group by yRd3) a,
         (select max(cnts) as zMax
          from
          (select yRd3, sum(1) as cnts from t_a group by yRd3)) b
  order by a.yRd3;

quit;

/******************************/
/*** plotting for, say, t_2 ***/
/******************************/
proc sgplot data=t_2;
   scatter x=yRd2 y=wt;
run;

View solution in original post

7 REPLIES 7
ballardw
Super User

You may be looking for the Rand('TABLE',p1,p2,...) function where p1=probability of selecting 1, p2= probability of selecting 2 and so forth. Admittedly that's a somewhat long statement to code but without more description of what kind of shape you might want...

mconover
Quartz | Level 8

I'd prefer something that was more easily manipulated and changed (i.e. in an equation rather than a tabled form) if anyone has any other ideas.

However, this will work for now and I appreciate the quick help! 

mconover
Quartz | Level 8

Thanks for the response.  I looked at the link you recommended.  I also reviewed Paper 236-25 (Pseudo-Random Numbers: Out of Uniform http://www2.sas.com/proceedings/sugi25/25/po/25p236.pdf), which discusses this method.

I'm just having trouble figuring out how to code it.  Do I use the CDF function?

So if I generate a uniform distribution of numbers from 0 to 1 my probability density function is (pdf) is 1.  Does this mean my cumulative density function (CDF) is X? 

gergely_batho
SAS Employee

No: your PDF is a function with the following properties: f(0)=0, f(120)=A, it is linear between 0 and 120 and 0 otherwise (outside of [0,120] interval).

You will get the CDF by intergarting this function.  A is a fixed parameter, you need to calculate is so, that the area under the curve of PDF (=the area of the triangle) is 1.

billfish
Quartz | Level 8

A proposed solution.

In the following,
  yRnd1 = random variable with a uniform probability in the, say, 1-100 range.
  yRnd2 = random variable with a probability distribution linearly increasing from 1 to, say, 100.
  yRnd2 = random variable with a probability distribution quadratically increasing from 1 to, say, 100.

/******************************/
/**** sample distributions ****/
/******************************/
data t_a;
  do _N_ = 1 to 100000;
     xRnd  = ranuni(3);
     yRd1 = ceil(100*xRnd);
     yRd2 = ceil(100*(xRnd**(1/2)));
     yRd3 = ceil(100*(xRnd**(1/3)));
     output;
  end;
run;

/********************************************************/
/*** wanting to set 3 graphs with the same maximum wt ***/
/********************************************************/
proc sql;
  create table t_1 as
  select a.yRd1, round((a.cnts/b.zMax),0.001) as wt
  from   (select yRd1, sum(1) as cnts from t_a group by yRd1) a,
         (select max(cnts) as zMax
          from
           (select yRd1, sum(1) as cnts from t_a group by yRd1)) b
  order by a.yRd1;

  create table t_2 as
  select a.yRd2, round((a.cnts/b.zMax),0.001) as wt
  from   (select yRd2, sum(1) as cnts from t_a group by yRd2) a,
         (select max(cnts) as zMax
          from
           (select yRd2, sum(1) as cnts from t_a group by yRd2)) b
  order by a.yRd2;

  create table t_3 as
  select a.yRd3, round((a.cnts/b.zMax),0.001) as wt
  from   (select yRd3, sum(1) as cnts from t_a group by yRd3) a,
         (select max(cnts) as zMax
          from
          (select yRd3, sum(1) as cnts from t_a group by yRd3)) b
  order by a.yRd3;

quit;

/******************************/
/*** plotting for, say, t_2 ***/
/******************************/
proc sgplot data=t_2;
   scatter x=yRd2 y=wt;
run;

Rick_SAS
SAS Super FREQ

You want to use the inverse CDF method for generating random numbers.  Sometimes you can use the QUANTILE function to help solve the inverse problem (see the example of the folded normal distribution), but other times you need to solve for the root of the cumulative distribution: F(x) = u, where u ~U(0,1).  In SAS/IML, you can use the FROOT function to solve for numerical roots.

If your CDF is given explicitly by a formula or by empirical quantiles, you can use linear interpolation.  See this blog post.

http://blogs.sas.com/content/iml/2014/06/18/distribution-from-quantiles.html

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 4891 views
  • 6 likes
  • 5 in conversation