BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
babaJB
Calcite | Level 5

Dear All,

I would like to know whether it is possible using SAS/IML to  generate random numbers from zero-modified Negative Binomial.
for example :

The zero-modi ed version is de fine as follows

P(X = k) = p if k = 0 and  K Gamma(r+k)/Gamma(r)k! ( 1+beta  )^-r (  beta/(1+ beta) )^k , otherwise

where K is de ned as 1-p,

r;   usual parameters and p the new parameter


Thanks in advance,
Hermawan Budyanto

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

You've got the right idea, but there are a few mistakes in your program. (1) You shouldn't be using U any more, (2) The R function c() is concatenation, not addition, so you want to use the IML vertical concatentation operator (//)
(3) You want Y0 = Y[ loc(Y>0) ]

There is also the matter that IML will issue an error if you use an empty subscript, so it is best to make sure that LOC returns a nonempty matrix.

Here's one approach. To make it easier to follow, I encapsulated the random variate generation into it's own function:

proc iml;
start rpois(N, T); /* helper function */
   x = j(N,1);
   call randgen ( x, 'POISSON', T);
   return ( x );
finish;

start RandZeroTruncatedPoisson( N, /* number of observation */
                               lambda); /* parameter of Poisson Distribution */
  r = N;
  do while ( r > 0);
     Y = rpois(r,lambda);
     idx = loc(Y>0);
     if ncol(idx)>0 then do;
        Y0 = Y0 // Y[idx];
        r = N-nrow(Y0);
     end;
  end;
return(y0);
finish;

/* test the function */
call randseed(1);
x = RandZeroTruncatedPoisson(10,1);
print x;

View solution in original post

9 REPLIES 9
IanWakeling
Barite | Level 11

If I understand correctly, then I think you can get what you want using the standard RAND fuction.  Something like:

c=rand( 'BERNOULLI' , 1-ip) * rand( 'NEGBINOMIAL' , p, k);

where ip is the zero inflation probability.

Ian.

babaJB
Calcite | Level 5

In such other program ( I. E Stata) there is  a package/procedure to run Zero modified or Zero Truncated. I just wondering how to generate a sample from negative binomial which reject 0 values in SAS.  There must be a way to do using RAND function, I just don't know how.

Rick_SAS
SAS Super FREQ

Since you posted this to the IML forum, I assume you want an IML solution.
See this article for an overview of how to simulate efficiently in IML by using the RANDGEN routine:   Efficient Sampling - The DO Loop

See this article for how to simulate from a mixture distribution:   Generate a random sample from a mixture distribution - The DO Loop

(Notice that your distribution is a mixture of a Bernoulli and a Negative Binomial distrib.)

It sounds like you want to wrap Ian's idea into an IML module. If so, try this (untested):

proc iml;
start RandZeroModNegBin( N,   /* number of observations to simulate */
                         ip,  /* inflated probability of zero */
                         p,   /* success probability for Neg Binom */
                         k ); /* number of trials for Neg Binom */
   x = j(N,1);     /* allocate vector */
   call randgen(x, "Bernoulli", 1-ip);
   b = x;          /* vector of 0/1 */
   call randgen(x, "NegBinomial", p, k);
   return ( b#x ); /*  0 with prob=ip, NegBin(p,k) with prob=(1-ip) */
finish;

/* test the function */

x = RandZeroModNegBin(1000, 0.2, 0.5, 5);

/* to see how we did, plot the empirical distribution */
create Plot var{x}; append; close;
quit;

proc freq data=Plot;
tables x / plot=freqplot;
run;

babaJB
Calcite | Level 5

Right now I am working on SAS base using SAS IML. Actually, I am using "Loss Model Data and Decision" written by Klugman et all as reference for the distribution. The author's isn't mention that the distribution is mixture between Bernoulli and Negative Binomial. I am just thinking using Inversion Method, but is quite difficult to counter the problem comes from inversion function of Gamma function.

What about to generate data from Zero truncated Negative Binomial or Extended Truncated Negative Binomial? Generating count data for which the value zero cannot occur and for which the conditional means are not equal to the conditional variances. The data exhibit over dispersion.

In my case, I  want to generate the count data of the length of hospital stay. Absolutely,The Length of hospital stay is recorded as a minimum of at least one day. When I am trying to assign zero in ip " variable of inflated probabilty of zero" in the code, the 0 data is still existed  because the simulation from negative binomial distribution still produce the 0's data.

Rick_SAS
SAS Super FREQ

As you say, using the inversion method would be hard because there isn't a simple inverse CDF.

The other distributions are similar. They are all built by combining elementary distributions, perhaps with some IF/THEN logic. For each distribution, write down the formula and then translate it into vecotr notation, using the previous links and examples. If you get stuck, post the IML statements that show your attempt (and a formula or link to the PDF or CDF) and someone can help you get unstuck.

babaJB
Calcite | Level 5

I found the algorithm to create 1000 sample from a Poisson distribution with mean T, after rejecting all zeros ( link :R help archive: RE: simulate zero-truncated Poisson distribu)  which written in R program :

n<-1000; T<-3.5

  Y<-rpois(n,T); Y0<-Y[Y>0]; r<-(n - length(Y0))   while(r>0){

    Y<-rpois(r,T); Y0<-c(Y0,Y[Y>0]); r<-(n - length(Y0))   }

Then I'm trying to write in SAS IML Languange. The code is below :

ods graphics on;

proc iml;

start RandZeroTruncatedPoisson ( N, /* number of observation */

  lambda); /* parameter of Poisson Distribution */

Y = j(N,1,.);

call randgen ( U, 'POISSON', lambda);

Y0= loc(Y>0);

r = nrow(U)-nrow(Y0);

Y=j(r,1);

  do while ( r > 0);

  call randgen ( Y,'POISSON',lambda);

  Yx=loc(Y>0);

  Y0=Y0+Yx;

  r= N-nrow(Y0);

  end;

return(y);

finish;

/* test the function */

x = RandZeroTruncatedPoisson(1000,1007.38);

create Plot var{x}; append; close;

quit;

proc freq data=Plot;

tables x / plot=freqplot;

run;

This is the log report for the result :

IML Ready

302  start RandZeroTruncatedPoisson ( N, /* number of observation */

303                                  lambda);

303!                                          /* parameter of Poisson Distribution */

304  Y = j(N,1,.);

305  call randgen ( U, 'POISSON', lambda);

306  Y0= loc(Y>0);

307  r = nrow(U)-nrow(Y0);

308  Y=j(r,1);

309      do while ( r > 0);

310          call randgen ( Y,'POISSON',lambda);

311          Yx=loc(Y>0);

312          Y0=Y0+Yx;

313          r= N-nrow(Y0);

314      end;

315  return(y);

316  finish;

NOTE: Module RANDZEROTRUNCATEDPOISSON defined.

317  /* test the function */

318  x = RandZeroTruncatedPoisson(1000,1007.38);

ERROR: (execution) Matrix has not been set to a value.

operation : + at line 312 column 14

operands  : Y0, Yx

Y0      0 row       0 col     (type ?, size 0)

Yx      1 row       1 col     (numeric)

         1

statement : ASSIGN at line 312 column 9

traceback : module RANDZEROTRUNCATEDPOISSON at line 312 column 9

NOTE: Paused in module RANDZEROTRUNCATEDPOISSON.

319  /* to see how we did, plot the empirical distribution */

320  create Plot var{x};

I believe that Yx=loc(Y>0); Y0=j[Y0,Yx]; isn't the right way to express  R code for Y0<-c(Y0,Y[Y>0]). That R code is assignment statement to set up vector Y0 consisting prior Y0 and the new Y which the values in Y is greater than zero.  Does Anybody know how to fix it ?

Rick_SAS
SAS Super FREQ

You've got the right idea, but there are a few mistakes in your program. (1) You shouldn't be using U any more, (2) The R function c() is concatenation, not addition, so you want to use the IML vertical concatentation operator (//)
(3) You want Y0 = Y[ loc(Y>0) ]

There is also the matter that IML will issue an error if you use an empty subscript, so it is best to make sure that LOC returns a nonempty matrix.

Here's one approach. To make it easier to follow, I encapsulated the random variate generation into it's own function:

proc iml;
start rpois(N, T); /* helper function */
   x = j(N,1);
   call randgen ( x, 'POISSON', T);
   return ( x );
finish;

start RandZeroTruncatedPoisson( N, /* number of observation */
                               lambda); /* parameter of Poisson Distribution */
  r = N;
  do while ( r > 0);
     Y = rpois(r,lambda);
     idx = loc(Y>0);
     if ncol(idx)>0 then do;
        Y0 = Y0 // Y[idx];
        r = N-nrow(Y0);
     end;
  end;
return(y0);
finish;

/* test the function */
call randseed(1);
x = RandZeroTruncatedPoisson(10,1);
print x;

babaJB
Calcite | Level 5

Many Thanks ianWickellin especially Rick Wicklin. Simulation of the above can also be applied in the negative binomial distribution. I tried to generate 10 samples of data by the number of 1000 observations of each sample, statistics descriptive of the results of 10 samples are close to the descriptive statistics of the sample exists. Even there are not literature about this algorithm, I believe this is the one way to generate data from zero truncated distribution.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Multiple Linear Regression in SAS

Learn how to run multiple linear regression models with and without interactions, presented by SAS user Alex Chaplin.

Find more tutorials on the SAS Users YouTube channel.

From The DO Loop
Want more? Visit our blog for more articles like these.
Discussion stats
  • 9 replies
  • 2891 views
  • 3 likes
  • 3 in conversation