08-05-2016 03:41 AM
I built confidence intervals for an unknown parameter (coefficient of variation) using different combinations of true parameters (mu and sigma) for simulation purpose and I calulated coverage probabilities of these confidence intervals and interval widths. The resulted confidence intervals have the same coverage probability and different interval widths in case that the sample sample is fixed and different combinatrions of mu and sigma values. Is it correct? What is the explantion?
08-05-2016 03:51 AM
This is more of a stats than SAS question. Since you don't seem to be getting a response here, and keep posting the same question multiple time (please don't) I Suggest posting it at stats.stackexchange.com
08-05-2016 05:02 AM
Please review the suggestions for how to write a good question. Your question needs an example or sample code so that we have a better idea what you've tried and what you want to know.. Is this part of a simulation? If so, please post the code.
08-05-2016 05:36 PM
I used SAS to build confidence intervals using simulation work for an unknown parameter (coefficient of variation) for different combinations of true parameters (mu and sigma) and I got the same coverage probability for these confidence intervals. Is it correct? What is the explantion?
08-06-2016 06:49 AM
I used SAS to build confidence intervals of coefficient of variation for lognormal distribution using simulation work for different combinations of true parameters of lognormal distribution (mu and sigma). I tried sample size =15, mu=0 and different values of sigma such that sigma=0.04996, 0.09975, 0.19804, 0.38525, 0.47238. I got the same coverage probability=0.9496 and interval widths =0.0349, 0.0702, 0.1436, 0.3125, 0.4141 respectively. Is it correct? What is the explanation? The following is the code.
Code of the program:
/*Prog to construct CI for COV of lognormal dist. using ranked set sampling technique*/
sigmasq=sigma**2;/*the value of sigma in normal dist.*/
mun=exp(mu+(0.5*sigmasq));/*the value of mean in lognormal dist.*/
sigmasqn=exp((2*mu)+(2*sigmasq))-exp((2*mu)+(sigmasq));/*the value of sigmasquare in lognormal dist.*/
sigman=sqrt(sigmasqn);/*the value of sigma in lognormal dist.*/
* Total sample size is 15 that is expressed as 3 cycles and each cycle contains 5 observations;
n = 5; m = 5;
numobs=cycle*m;/*number of observations in the matrix*/
x1 = j(n,m);
x2 = j(n,m);
*Generate 5000 data sets from normal dist. to calculate mean1, std1, estcov1;
do t=1 to na;
do l=1 to cycle;
*x1 = j(n,m); /* allocate (n x m) matrix*/
call randgen(x1, "Normal", mu, sigma); /* fill it*/
do i = 1 to nrow(x1);
v1 = x1[i, ]`; /* get i_th row; transpose to column vector */
call sort (v1, 1); /* sort it */
x1[i, ] = v1`; /* replace with sorted row */
end;/*end of i loop*/
c1=vecdiag(x1);/*to get diag elements of a matrix in a column*/
end;/*end of l loop*/
mean1=y1[:];/* calculate the mean of all elements of the matrix */
sumsq1=y1[##];/* calculate the sum of squares of all elements of the matrix */
*Generate 5000 data sets from normal dist. to calculate indepent copy of the statistics mean2, std2, estcov2;
do k=1 to nb;
do h=1 to cycle;
*x2 = j(n,m); /* allocate (n x m) matrix*/
call randgen(x2, "Normal", mu, sigma); /* fill it*/
do g = 1 to nrow(x2);
v2 = x2[g, ]`; /* get i_th row; transpose to column vector */
call sort (v2, 1); /* sort it */
x2[g, ] = v2`; /* replace with sorted row */
end;/*end of g loop*/
c2=vecdiag(x2);/*to get diag elements of a matrix in a column*/
end;/*end of h loop*/
mean2=y2[:];/* calculate the mean of all elements of the matrix */
sumsq2=y2[##];/* calculate the sum of squares of all elements of the matrix */
end;/*end of k loop*/
*Check if toi in the constructed FGCIs;
if low<=toi then if toi<=up then count=count+1;
end;/*end of t loop*/
* q is the coverage proability;
*meanwid is the mean width of the constructed confidence intervals;
08-06-2016 05:44 PM
I don't understand your program well enough to be able to tell you if it is correct, nut I do want to make you aware of some SAS/IML functions that might simplify your program.
Two other comments:
It looks like you are doing two simulations, one to generate the confidence intervals and another to assess the coverage probability. Because you are using the same algorithm to construct the intervals and to evaluate the coverage probability, the second simulation will always give you a number close to 95%. Thus the second simulation is unnecessary.
I'm not sure what you are trying to accomplish at the end of the program. For each set of parameters, you have 5000 intervals of the form [L, U]. What question are you trying to answer about these intervals? It looks like you are computing the mean width of these intervals, but I'm not sure why this is relevant. I might want to look at the distribution of the widths, or the distribution of the L's and the U's.
08-07-2016 02:00 PM
08-07-2016 02:22 PM
No, that doesn't sound correct. I would generate 5 samples instead of 5000 and see if it still happens. If so, you can print out some of the intermediate computations to determine what might be happening.
While you're at it, you might try to vectorize some of the program so that it runs much faster. One thing that might help is using the QNTL function for each row instead of sorting and selecting particualr indices. If you put each sample in a column rather than row, then a single call to the QNTL function can compute the 0.025 and 0.975 quantiles of all the columns. No need to loop, sort, and extract.
08-12-2016 02:06 PM
08-15-2016 02:57 PM
I'm sorry, but I cannot figure out what you are trying to accomplish. Maybe part of my problem is that I don't know what parameter scale you are working on, normal or lognormal? With respect to the Wikipedia article on the lognormal distribution (and this article, too), are you computing statistics in terms of (mu, sigma) or in terms of (m, v)?
Here are a few things I noticed:
1) In the first section, the definition of the lognormal CV is CV=sqrt(exp(sigma##2)-1); This is independent of mu.
2) You partially vectorized your code, but you can still do better. The following SAS/IML code sections are equivalent, but the second is much faster and easier to read/understand:
n = 15; numSamples = 5000; call randseed(12345); /* the long way: Loop over number of samples */ v=j(numSamples,1,0); do t=1 to numSamples; x1=j(n,1,0);/* allocate (n x 1) matrix*/ call randgen(x1, "Normal", mu, sigma); /* fill it*/ mean1=x1[:];/* calculate the mean of all elements of the matrix */ sumsq1=x1[##];/* calculate the sum of squares of all elements of the matrix */ var1=(1/(n-1))*(sumsq1-n*(mean1**2)); v[t,1]=var1; end; /* the short way */ x = j(n, numSamples); /* each column is sample */ call randgen(x, "normal", mu, sigma); /* generate all samples */ v2 = var(x); /* compute variance of each column */
08-15-2016 03:42 PM
Thanks for your advice about the code that helps getting faster runs.
I computed coefficient of variation of lognormal distribution in terms of mu and sigma. Since the coefficient of variation of lognormal distribution depends on sigma-square only, mu is assumed to be one (without loss of generality). I simulated data from normal distribution since the statistics (FGlognor in the program) I used for constructing the CI for the coefficient of variation based on independent copies of sample variances that are calculating from the data generated from independent normal distribution.
The problem is that:
I generated data from normal distribution and used simple random sample and ranked set sampling methods to build CIs for the lognormal coefficient of variation. The coverage probability is the same for each sample size and different values assumed for sigma (as I mentioned in the previous messages). I checked the procedure I used and check SAS calculations for 10 simulations but I could not discover mistakes.
Are the results of coverage probabilities are correct in this case and in terms of the idea of the procedure I followed (which I explained in the previous message)?
08-15-2016 04:30 PM
You are computing FGlognor as sqrt(exp(Q) -1) where
Q = (variance from a sample)/(variance from another sample) * (population variance)
Why is this the sample CV? Where did the formula come from?
I would have thought that the correct formula is
CVEst = sqrt(exp( varEst ) - 1);
where varEst is the estimated variance of the simulated sample.
What am I missing?
08-16-2016 04:10 PM
I used this estimate in previous work for unknown variance and it proved its correction. Assuming that this formula is correct, is there any mistakes in the constructed program that I sent in previous messages? If it is O.K, can I accept the results of same coverage probabilities for each sample size if interval widths increase as the values of sigma increase? On the other hand, these coverages probabilities are different for different sample sizes (n=15, 50).
08-16-2016 05:36 PM - edited 08-17-2016 08:08 AM
Notice that your estimate contains the population parameter. In practice, the population CV is unknown. Thus I am trying to find out where this formula came from and why you are not using something like sigma_hat / x_bar. If the formula is some sort of standardization, I'd like to see a reference so that I can understand what you are attempting. What formula are you using for the CI?
It might be helpful to think about how you would do a simulation study for the coverage probability of the standard CI for the mean:
x_bar +/- t(1-alpha/2, n-1) * sigma_hat / sqrt(n)
1) You would choose a parameter mu, a distribution F, and a sample size. You would generate 5000 samples from that distribution.
2) For each sample, you compute the CI by applying the formula using the value of x_bar, sigma_hat, and n
3) The coverage probability is p = (number of samples for which the true parameter mu is inside the CI) / (total number of samples).
4) You would conclude that "when the true distribution is F, then the formula has coverage probability p."
Notice that the formula should give wide intervals for small samples and shrink towards zero width as n --> infinity.