
My data has 4 categories with counts of 50, 100, 150, and 200 (10%, 20%, 30%, and 40% of the total).

I want to simulate these 4 percentages with the RandDirichlet function, but with the code below I only get 3 percentages and have to calculate the 4th one myself. Maybe I did not specify the shape parameters correctly. Please advise.


/* want to simulate 4 percentages with randdirichlet*/
proc iml;
call randseed(1);
n = 1000;
Shape = {50, 100, 150, 200};	/* counts of each category */
x = RandDirichlet(n,Shape);  	/* x is 1000 x 3 matrix*/
samplemean=mean(x);  		/* check mean */
print samplemean;

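varnames = {"p1" "p2" "p3"};   /* assumed column names; varnames was not defined in the original post */
create MyData from x[colname=varnames];
append from x;
close MyData;       /* only the first 3 percentage columns */
quit;

/* to get the 4th percentage */
data mydata2;
set mydata;
p4 = 1 - sum(p1, p2, p3);   /* the 4 proportions sum to 1 */
run;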

According to the SAS/IML User's Guide:

Shape is a $1 \times (p+1)$ vector of shape parameters for the distribution, $\mbox{Shape}[i]>0$


In the guide's example, Shape has 3 parameters for a two-dimensional Dirichlet distribution:

call randseed(1);
n = 1000;
Shape = {2, 1, 1};
x = RandDirichlet(n,Shape);




Because you mention counts and percentages, I think you are looking for the multinomial distribution, not the Dirichlet distribution. The multinomial distribution, which you can simulate by using the RANDMULTINOMIAL function in SAS/IML, generates random frequencies for k categories when the probabilities of the categories in the population are known. For example:


proc iml;
call randseed(1);
n = 1000;
counts = {50, 100, 150, 200};	/* expected count for each category */
total = sum(counts);
prob = counts / total;
x = RandMultinomial(n, total, prob); /* x is 1000 x 4 matrix of counts */
print (x[1:5,]);
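
If percentages are wanted rather than counts, each simulated row can be divided by the number of trials. The following is only a sketch that repeats the setup above so it runs on its own; the names pct and meanPct are illustrative and do not come from the original post.

```sas
proc iml;
call randseed(1);
n = 1000;
counts = {50, 100, 150, 200};        /* expected count for each category */
total = sum(counts);
prob = counts / total;
x = RandMultinomial(n, total, prob); /* 1000 x 4 matrix of counts */
pct = x / total;                     /* convert counts to proportions; each row sums to 1 */
meanPct = mean(pct);                 /* column means, expected to be near prob */
print meanPct;
quit;
```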


The Dirichlet distribution is a multivariate generalization of the beta distribution, so I do not immediately see how it is related to counts.
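
That said, if random proportions (rather than counts) are what is needed, the behavior in the question matches the documentation quoted there: a Shape vector with p+1 elements produces p columns, because the last proportion is determined by the others. Writing $\alpha_i$ for the shape parameters and $X_i$ for the proportions (notation introduced here, not taken from the guide), $X_{p+1} = 1 - \sum_{i=1}^{p} X_i$, and each proportion has mean $\alpha_i / \sum_j \alpha_j$. A minimal sketch that appends the missing column inside PROC IML, assuming the counts are used directly as shape parameters as in the question (x4 and p are illustrative names):

```sas
proc iml;
call randseed(1);
n = 1000;
Shape = {50, 100, 150, 200};     /* shape parameters, set equal to the category counts */
x = RandDirichlet(n, Shape);     /* 1000 x 3 matrix; the last component is not returned */
x4 = 1 - x[,+];                  /* row sums subtracted from 1 give the 4th proportion */
p = x || x4;                     /* 1000 x 4 matrix of proportions */
samplemean = mean(p);            /* column means, expected to be near Shape / sum(Shape) */
print samplemean;
quit;
```

Larger shape parameters with the same ratios concentrate the simulated proportions more tightly around 10%, 20%, 30%, and 40%, so the choice of scale is a modeling decision rather than something the function dictates.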



