Programming the statistical procedures from SAS

Coverage probability

Reply
Contributor
Posts: 21

Coverage probability

I built confidence intervals for an unknown parameter (coefficient of variation) using different combinations of true parameters (mu and sigma) for simulation purpose and I calulated coverage probabilities of these confidence intervals and interval widths. The resulted confidence intervals have the same coverage probability and different interval widths in case that the sample sample is fixed and different combinatrions of mu and sigma values. Is it correct? What is the explantion? 
Thank you

Grand Advisor
Posts: 16,925

Re: Coverage probability

This is more of a stats than SAS question. Since you don't seem to be getting a response here, and keep posting the same question multiple time (please don't) I Suggest posting it at stats.stackexchange.com

SAS Super FREQ
Posts: 3,317

Re: Coverage probability

Please review the suggestions for how to write a good question. Your question needs an example or sample code so that we have a better idea what you've tried and what you want to know..  Is this part of a simulation? If so, please post the code.

Contributor
Posts: 21

Calculate the coverage probability

Hello,

I used SAS to build confidence intervals using simulation work for an unknown parameter (coefficient of variation) for different combinations of true parameters (mu and sigma) and I got the same coverage probability for these confidence intervals. Is it correct? What is the explantion? 

Thank you

Grand Advisor
Posts: 10,062

Re: Calculate the coverage probability

Show some code and some data and/or results.

Contributor
Posts: 21

Re: Calculate the coverage probability

Hello,

I used SAS to build confidence intervals of coefficient of variation for lognormal distribution using simulation work for different combinations of true parameters of lognormal distribution (mu and sigma). I tried sample size =15, mu=0 and different values of sigma such that sigma=0.04996, 0.09975, 0.19804, 0.38525, 0.47238.  I got the same coverage probability=0.9496 and interval widths =0.0349, 0.0702, 0.1436, 0.3125, 0.4141 respectively. Is it correct? What is the explanation? The following is the code.

Thank you.

Code of the program:

/*Prog to construct CI for COV of lognormal dist. using ranked set sampling technique*/  

proc iml;

sigma=0.47238;

sigmasq=sigma**2;/*the value of sigma in normal dist.*/

mu=1;

toi=sqrt(exp(sigmasq)-1);

*print toi;

mun=exp(mu+(0.5*sigmasq));/*the value of mean in lognormal dist.*/

print mun;

sigmasqn=exp((2*mu)+(2*sigmasq))-exp((2*mu)+(sigmasq));/*the value of sigmasquare in lognormal dist.*/

sigman=sqrt(sigmasqn);/*the value of sigma in lognormal dist.*/

na=5000;

nb=5000;

* Total sample size is 15 that is expressed as 3 cycles and each cycle contains 5 observations;

cycle=3;

n = 5; m = 5;

numobs=cycle*m;/*number of observations in the matrix*/

print numobs;

x1 = j(n,m);

x2 = j(n,m);

y1=j(cycle,m);

y2=j(cycle,m);

r=j(5000,1,0);

s=j(5000,1,0);

count=0;

*Generate 5000 data sets from normal dist. to calculate mean1, std1, estcov1;

do t=1 to na;

do l=1 to cycle;

call randseed(12345);

*x1 = j(n,m); /* allocate (n x m) matrix*/

call randgen(x1, "Normal", mu, sigma); /* fill it*/

do i = 1 to nrow(x1);

   v1 = x1[i, ]`;  /* get i_th row; transpose to column vector */

   call sort (v1, 1); /* sort it */

   x1[i, ] = v1`;  /* replace with sorted row */

end;/*end of i loop*/

c1=j(n,1);

c1=vecdiag(x1);/*to get diag elements of a matrix in a column*/

*print c1;

d1=c1`;

y1[l, ]=d1;

end;/*end of l loop*/

mean1=y1[:];/* calculate the mean of all elements of the matrix */

sumsq1=y1[##];/* calculate the sum of squares of all elements of the matrix */

var1=(1/(numobs-1))*(sumsq1-numobs*(mean1##2));

stdev1=sqrt(var1);

*Generate 5000 data sets from normal dist. to calculate indepent copy of the statistics mean2, std2, estcov2;

do k=1 to nb;

do h=1 to cycle;

call randseed(67891);

*x2 = j(n,m); /* allocate (n x m) matrix*/

call randgen(x2, "Normal", mu, sigma); /* fill it*/

do g = 1 to nrow(x2);

   v2 = x2[g, ]`;  /* get i_th row; transpose to column vector */

   call sort (v2, 1); /* sort it */

   x2[g, ] = v2`;  /* replace with sorted row */

end;/*end of g loop*/

c2=j(n,1);

c2=vecdiag(x2);/*to get diag elements of a matrix in a column*/

d2=c2`;

y2[h, ]=d2;

end;/*end of h loop*/

mean2=y2[:];/* calculate the mean of all elements of the matrix */

sumsq2=y2[##];/* calculate the sum of squares of all elements of the matrix */

var2=(1/(numobs-1))*(sumsq2-numobs*(mean2##2));

stdev2=sqrt(var2);

FGlognor=sqrt(exp((var1/var2)*(sigmasq))-1);

r[k,1]=FGlognor;

end;/*end of k loop*/

e=r;

r[rank(r),]=e;

low=r[125,];

up=r[4875,];

wid=up-low;

s[t,]=wid;

*Check if toi in the constructed FGCIs;

if low<=toi then if toi<=up then count=count+1;

end;/*end of t loop*/

p=count/5000;

print p;

* q is the coverage proability;

q=1-p;

print q;

*meanwid is the mean width of the constructed confidence intervals;

meanwid=s[:,];

print meanwid;

quit;

run;

SAS Super FREQ
Posts: 3,317

Re: Calculate the coverage probability

I don't understand your program well enough to be able to tell you if it is correct, nut I do want to make you aware of some SAS/IML functions that might simplify your program.

  1. SAS/IML supports the CV function for computing the sample coeffiient of variation.
  2. You seem to be sorting every row of random variates, maybe because you later compute quantiles? SAS/IML supports the QNTL function for computing sample quantiles.

 

Two other comments:

 

It looks like you are doing two simulations, one to generate the confidence intervals and another to assess the coverage probability. Because you are using the same algorithm to construct the intervals and to evaluate the coverage probability, the second simulation will always give you a number close to 95%.  Thus the second simulation is unnecessary.

 

I'm not sure what you are trying to accomplish at the end of the program. For each set of parameters, you have 5000 intervals of the form [L, U].  What question are you trying to answer about these intervals?  It looks like you are computing the mean width of these intervals, but I'm not sure why this is relevant.  I might want to look at the distribution of the widths, or the distribution of the L's and the U's.

Contributor
Posts: 21

Re: Calculate the coverage probability

Thank you for your valuable comments. The second simulation for 5000 data sets from normal dist. is important for the approach I used to calculate independent copy of the statistics (means and variances) for these simulated data. I used these variances to calculate “FGlognor” that is used to calculate L and U. It is a part of my work to calculate widths of these CIs. There is no problem that the coverage probability to be close to 95%. My question is: why for different values of sigma (sigma=0.04996, 0.09975, 0.19804, 0.38525, 0.47238), I got exactly the same coverage probability that equals 0.9496 without any change in the four decimals with different widths (width=0.0349, 0.0702, 0.1436, 0.3125, 0.4141 respectively)? Is this considered to be normal? Thank you
SAS Super FREQ
Posts: 3,317

Re: Calculate the coverage probability

No, that doesn't sound correct. I would generate 5 samples instead of 5000 and see if it still happens.  If so, you can print out some of the intermediate computations to determine what might be happening.

 

While you're at it, you might try to vectorize some of the program so that it runs much faster.  One thing that might help is using the QNTL function for each row instead of sorting and selecting particualr indices. If you put each sample in a column rather than row, then a single call to the QNTL function can compute the 0.025 and 0.975 quantiles of all the columns. No need to loop, sort, and extract.

Contributor
Posts: 21

Re: Calculate the coverage probability

I generated 10 samples instead of 5000 and still have the same problem.  I printed out the intermediate computations and they are correct. So, I could not determine what is wrong. Attached the purpose of the program, the steps I followed for creating the program, and the program.

Thank you

SAS Super FREQ
Posts: 3,317

Re: Calculate the coverage probability

I'm sorry, but I cannot figure out what you are trying to accomplish.  Maybe part of my problem is that I don't know what parameter scale you are working on, normal or lognormal?  With respect to the Wikipedia article on the lognormal distribution (and this article, too), are you computing statistics in terms of (mu, sigma) or in terms of (m, v)?

 

Here are a few things I noticed:

1) In the first section, the definition of the lognormal CV is CV=sqrt(exp(sigma##2)-1);  This is independent of mu.

2) You partially vectorized your code, but you can still do better.  The following SAS/IML code sections are equivalent, but the second is much faster and easier to read/understand:

 

n = 15;
numSamples = 5000;
call randseed(12345);

/* the long way: Loop over number of samples */
v=j(numSamples,1,0);
do t=1 to numSamples;
   x1=j(n,1,0);/* allocate (n x 1) matrix*/
   call randgen(x1, "Normal", mu, sigma); /* fill it*/
   mean1=x1[:];/* calculate the mean of all elements of the matrix */
   sumsq1=x1[##];/* calculate the sum of squares of all elements of the matrix */
   var1=(1/(n-1))*(sumsq1-n*(mean1**2));
   v[t,1]=var1;
end;

/* the short way */
x = j(n, numSamples);   /* each column is sample */
call randgen(x, "normal", mu, sigma);  /* generate all samples */
v2 = var(x);            /* compute variance of each column */
Contributor
Posts: 21

Re: Calculate the coverage probability

Thanks for your advice about the code that helps getting faster runs.

I computed coefficient of variation of lognormal distribution in terms of mu and sigma. Since the coefficient of variation of lognormal distribution depends on sigma-square only, mu is assumed to be one (without loss of generality). I simulated data from normal distribution since the statistics (FGlognor in the program) I used for constructing the CI for the coefficient of variation based on independent copies of sample variances that are calculating from the data generated from independent normal distribution.

The problem is that:

I generated data from normal distribution and used simple random sample and ranked set sampling methods to build CIs for the lognormal coefficient of variation. The coverage probability is the same for each sample size and different values assumed for sigma (as I mentioned in the previous messages). I checked the procedure I used and check SAS calculations for 10 simulations but I could not discover mistakes.

Are the results of coverage probabilities are correct in this case and in terms of the idea of the procedure I followed (which I explained in the previous message)?

Thank you.

SAS Super FREQ
Posts: 3,317

Re: Calculate the coverage probability

 

You are computing FGlognor as sqrt(exp(Q) -1)  where

Q = (variance from a sample)/(variance from another sample) * (population variance)

Why is this the sample CV? Where did the formula come from?

 

I would have thought that the correct formula is 

CVEst = sqrt(exp( varEst ) - 1);

where varEst is the estimated variance of the simulated sample.

 

What am I missing?

Contributor
Posts: 21

Re: Calculate the coverage probability

I used this estimate in previous work for unknown variance and it proved its correction. Assuming that this formula is correct, is there any mistakes in the constructed program that I sent in previous messages?  If it is O.K, can I accept the results of same coverage probabilities for each sample size if interval widths increase as the values of sigma increase? On the other hand, these coverages probabilities are different for different sample sizes (n=15, 50).

SAS Super FREQ
Posts: 3,317

Re: Calculate the coverage probability

[ Edited ]

Notice that your estimate contains the population parameter. In practice, the population CV is unknown. Thus I am trying to find out where this formula came from and why you are not using something like sigma_hat / x_bar. If the formula is some sort of standardization, I'd like to see a reference so that I can understand what you are attempting.  What formula are you using for the CI?

 

It might be helpful to think about how you would do a simulation study for the coverage probability of the standard CI for the mean:

x_bar +/- t(1-alpha/2, n-1) * sigma_hat / sqrt(n)

1) You would choose a parameter mu, a distribution F, and a sample size. You would generate 5000 samples from that distribution.

2) For each sample, you compute the CI by applying the formula using the value of x_bar, sigma_hat, and n

3) The coverage probability is p = (number of samples for which the true parameter mu is inside the CI) / (total number of samples).

 

4) You would conclude that "when the true distribution is F, then the formula has coverage probability p."

 

Notice that the formula should give wide intervals for small samples and shrink towards zero width as n --> infinity.

 

Ask a Question
Discussion stats
  • 17 replies
  • 764 views
  • 1 like
  • 4 in conversation