Statistical Procedures

What CV should be entered for lognormal data in Power Procedure, PairedMeans?

Kai123 · Posted 09-09-2024 02:22 PM

Hi,

I have a question about which CV should be entered in the CV= or PAIREDCVS= option of the PAIREDMEANS statement in PROC POWER for lognormal data. The SAS documentation for CV= is quoted below.

CV=number-list

specifies the coefficient of variation that is assumed to be common to both members of a pair. The coefficient of variation is defined as the ratio of the standard deviation to the mean on the original data scale. You can use this option only with DIST=LOGNORMAL. For information about specifying the number-list, see the section Specifying Value Lists in Analysis Statements.

This suggests that, when the data are assumed to be lognormally distributed, the CV entered should be SD/Mean, with SD and Mean calculated from the data on the original scale.

However, the SAS/STAT® 13.1 User’s Guide, The POWER Procedure (pages 6445-6446), gives the log-scale standard deviation as σ* = sqrt(log(CV² + 1)).

If CV = SD/Mean, where SD and Mean are calculated from the data on the original scale, how can σ* (the SD of the log-transformed data) be derived as sqrt(log(CV² + 1))?

If, instead, CV is calculated as CV = sqrt(exp(SDlog²) - 1), where SDlog is the SD of the natural-log-scale data, then it makes sense that σ* (SDlog) can be recovered by back-calculation. However, the documentation clearly states that the CV entered is based on data on the original scale.

This confuses me. I also have a similar question about the correlation, since the documentation says to enter the "correlation of the original untransformed pairs (Y1; Y2)".

Thanks,

jiltao · Posted 09-09-2024 04:19 PM

You might want to check equation (44.36) on page 27 of Kotz, Balakrishnan, and Johnson (2000), and Jones and Miller (1966), to see how the equation is derived.
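For orientation, here is the standard bivariate lognormal result of the kind those references derive (stated from the textbook moment formulas, not copied from the cited pages). If (log Y1, log Y2) is bivariate normal with log-scale SDs σ1*, σ2* and correlation ρ*, then the correlation ρ of the original pairs, and its inverse, are:

$$
\rho = \frac{\exp(\rho^*\sigma_1^*\sigma_2^*) - 1}{\sqrt{\left(e^{\sigma_1^{*2}} - 1\right)\left(e^{\sigma_2^{*2}} - 1\right)}}
\quad\Longleftrightarrow\quad
\rho^* = \frac{\log\left(1 + \rho\,\mathrm{CV}_1\mathrm{CV}_2\right)}{\sigma_1^*\sigma_2^*},
\qquad \mathrm{CV}_i = \sqrt{e^{\sigma_i^{*2}} - 1}.
$$

As with the CV, ρ here is a population parameter of the untransformed pairs; the conversion maps it to the log-scale correlation used in the normal-theory calculation.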

Thanks,

Jill

Kai123 · Posted 09-09-2024 04:41 PM

Thanks Jill for your reply.

That reference covers the correlation conversion. I don't have access to it right now; I would appreciate it if you could share the relevant part of the derivation.

Regarding my original question about the CV, it makes more sense if CV = sqrt(exp(SDlog²) - 1), where SDlog is the SD of the natural-log-scale data. I suspect the SAS documentation is mistaken in describing it as the ratio of the standard deviation to the mean on the original data scale, but I would like confirmation from the SAS experts.

jiltao · Posted 09-10-2024 04:14 PM

Why do you think your equation makes more sense? Can you show me how that was derived?

Thanks,

Jill

Kai123 · Posted 09-10-2024 04:33 PM

$$
\begin{aligned}
\mathrm{CV} &= \sqrt{\exp(\mathrm{SD}_{\log}^2) - 1}\\
\mathrm{CV}^2 &= \exp(\mathrm{SD}_{\log}^2) - 1\\
\mathrm{CV}^2 + 1 &= \exp(\mathrm{SD}_{\log}^2)\\
\log(\mathrm{CV}^2 + 1) &= \mathrm{SD}_{\log}^2\\
\mathrm{SD}_{\log} &= \sqrt{\log(\mathrm{CV}^2 + 1)}
\end{aligned}
$$

This is the same formula as in the SAS documentation.

jiltao · Posted 09-10-2024 07:24 PM

You are correct, and so is the SAS documentation:

σ1* is the standard deviation "of the bivariate normal distribution of the log-transformed data," so it is the same as your SDlog.

Kai123 · Posted 09-11-2024 09:16 AM

Yes. That's why I think CV = sqrt(exp(SDlog²) - 1) should be entered in the CV= option, instead of the ratio of the SD to the mean of the original-scale data, as the SAS documentation says.

jiltao · Posted 09-11-2024 09:20 AM

They are both valid.
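For reference, the standard lognormal moment formulas connect the two expressions (a sketch of the textbook result, not quoted from the SAS documentation). If log Y ~ N(μ*, σ*²), then:

$$
\begin{aligned}
E[Y] &= \exp\!\left(\mu^* + \tfrac{1}{2}\sigma^{*2}\right),\\
\operatorname{Var}[Y] &= \left(e^{\sigma^{*2}} - 1\right)\exp\!\left(2\mu^* + \sigma^{*2}\right),\\
\mathrm{CV} = \frac{\sqrt{\operatorname{Var}[Y]}}{E[Y]} &= \sqrt{e^{\sigma^{*2}} - 1}
\quad\Longleftrightarrow\quad
\sigma^* = \sqrt{\log\left(\mathrm{CV}^2 + 1\right)}.
\end{aligned}
$$

So, as population quantities, SD/Mean on the original scale and sqrt(exp(σ*²) - 1) are identical; the documentation's σ* = sqrt(log(CV² + 1)) simply inverts this. Sample estimates computed with the two formulas will generally differ.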

Kai123 · Posted 09-11-2024 10:04 AM

I don't understand what you meant by "valid".

The CVs calculated from these two formulas are different; both can be entered in PROC POWER, and they give different results. As a SAS user, I want to know which one is right.

jiltao · Posted 09-11-2024 12:18 PM

You can have two different formulas for the same thing.

Just like if you have a series of positive values, x1, x2, ... xn.

You can use the formula x1+x2+...+xn to compute the sum.

You can also use the formula exp(log(x1)*log(x2)*....*log(xn)) to compute the sum.

Mathematically they are equivalent.

Same thing with the CV formulas you referred to under the log-normal assumption.

I wrote the following program to show empirically that the two CV formulas give almost the same value:

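data one;
   call streaminit(2366);            /* seed once, before the first RAND call */
   do sample=1 to 2000;              /* 2000 simulated samples ...            */
      do i=1 to 2000;                /* ... of 2000 observations each         */
         logx=rand('normal');        /* log-scale value ~ N(0, 1)             */
         x=exp(logx);                /* original-scale value, lognormal       */
         output;
      end;
   end;
run;

/* Per-sample mean and SD of x, and variance of log(x) */
proc means data=one noprint;
   by sample;
   var x logx;
   output out=out(drop=_freq_ _type_) mean(x)=meanx std(x)=stdx var(logx)=var_logx;
run;

/* Two CV estimates for each sample */
data out;
   set out;
   cv_x2 = sqrt(exp(var_logx)-1);    /* from the log-scale variance      */
   cv_x  = stdx/meanx;               /* SD/mean on the original scale    */
run;

/* Average of each CV estimate over the 2000 samples */
proc means data=out mean;
   var cv_x cv_x2;
run;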
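As a cross-check on what the program above should show (the program has not been rerun here): logx is drawn from N(0, 1), so σ* = 1 and the population CV of x is

$$
\mathrm{CV} = \sqrt{e^{1} - 1} = \sqrt{1.71828\ldots} \approx 1.311.
$$

Both cv_x (SD/mean per sample) and cv_x2 (sqrt(exp(var_logx) - 1) per sample) estimate this same value, so their averages over the 2000 samples should both be near 1.31, though not necessarily identical, because they are different estimators of one parameter.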

Hope this helps,

Jill

Kai123 · Posted 09-11-2024 01:48 PM

Thanks for taking the time to create an example.

1. x1+x2+...+xn = exp(log(x1+x2+...+xn)), which is not equal to exp(log(x1)*log(x2)*...*log(xn)), because log(x1+x2+...+xn) is not equal to log(x1)*log(x2)*...*log(xn).

2. As you mentioned, in your example the two CVs are "almost the same value," meaning they are different, no matter how similar they are in this particular example. For statistical software like SAS, I believe we should aim for accurate results rather than merely similar ones. Even a small difference can matter a lot in some industries or contexts.

3. The key is to understand how SAS uses this CV value internally to calculate the sample size or power. I'm hoping the SAS team that wrote the documentation can answer my question.

jiltao · Posted 09-11-2024 04:19 PM

Sorry, I got the law of logs wrong 😞. But you get the idea: under certain assumptions, there can be two different ways of computing the same statistic.

The results are not exactly the same because of random error in the generated data.

How the CVs are used in the power and sample size computations is described in the documentation:

https://go.documentation.sas.com/doc/en/statcdc/14.3/statug/statug_power_details68.htm
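A sketch of how the pieces could fit together, assuming (this should be checked against the linked details page) that the lognormal paired analysis is carried out as a paired t test on the log scale, and taking the common-CV case:

$$
\begin{aligned}
\sigma^* &= \sqrt{\log(\mathrm{CV}^2 + 1)},\\
\rho^* &= \frac{\log(1 + \rho\,\mathrm{CV}^2)}{\log(1 + \mathrm{CV}^2)},\\
\sigma^*_{\mathrm{diff}} &= \sigma^*\sqrt{2(1 - \rho^*)}.
\end{aligned}
$$

The test is then on log Y2 - log Y1, with SD σ*_diff and mean μ2* - μ1*. With a common CV the σ*²/2 terms cancel, so exp(μ2* - μ1*) equals the ratio of the original-scale means. In this form the CV enters only through σ* and ρ*, so what matters for planning is the population CV you assume, not which formula was used to estimate it from pilot data.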

Kai123 · Posted 09-11-2024 04:33 PM

This is not due to random error; the two CVs are two different things. In the dataset below, the %CVs are 80 and 113, which are very different. There is no proof that the two values should be similar.

I don't think you understand my question. If you are on the SAS staff, could you please refer me to the right team? Thanks.

Summary statistics:

Statistic	Original Scale (Assuming Normal)	Original Scale (LogNormal)	Ln scale
n	47	47	
GM		0.921088052	
Mean	1.300955496		-0.0822
GSD		2.482316779	
SD	1.039364637		0.909192
%CV	80	113	

Data (n = 47):

Original Scale	Ln scale
0.182934103	-1.698629283
0.335884261	-1.090988639
1.768862198	0.570336514
0.322330226	-1.132178713
1.869962942	0.625918614
4.092048456	1.40904569
1.615234314	0.479480032
0.586349193	-0.533839775
0.854526206	-0.157208108
0.368046246	-0.999546681
1.92127702	0.65299008
0.694299929	-0.364851237
0.062064248	-2.779585165
0.522980221	-0.648211634
1.603548144	0.472218764
0.495035732	-0.703125334
3.263487488	1.182796405
0.991658216	-0.008376771
0.807604862	-0.213682372
2.166518473	0.773121489
0.367439171	-1.001197496
0.796960379	-0.226950314
0.437185347	-0.827398039
1.649246645	0.500318605
0.28163224	-1.267153174
3.533515975	1.262293402
1.850595694	0.615507584
0.50108275	-0.690984021
0.578666907	-0.547028257
2.031517886	0.708783241
1.004733644	0.004722476
1.748168892	0.558568893
0.727650154	-0.317934903
1.325797122	0.28201388
0.242380432	-1.417246756
0.843038824	-0.170742268
2.131842779	0.75698676
1.011822918	0.011753574
4.396706363	1.480855707
2.611930193	0.960089486
0.961948844	-0.038794006
2.065023208	0.725141465
0.251275204	-1.38120651
0.792867525	-0.232099127
0.426753947	-0.851547668
2.196685175	0.786949486
1.853787599	0.617230897
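Plugging the summary statistics from the table above into the two formulas reproduces both %CV values (arithmetic on the posted numbers only):

$$
\begin{aligned}
\frac{s_Y}{\bar{Y}} &= \frac{1.039364637}{1.300955496} \approx 0.799 \quad (80\%),\\
\sqrt{\exp(s_{\log}^2) - 1} &= \sqrt{\exp(0.909192^2) - 1} = \sqrt{\exp(0.82663) - 1} \approx 1.134 \quad (113\%).
\end{aligned}
$$

Under a lognormal model both are estimates of the same population CV, and with n = 47 right-skewed observations they can plausibly differ by this much; if the data are not lognormal, the second formula no longer estimates the original-scale CV at all.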

ballardw · Posted 09-11-2024 04:50 PM

@Kai123 wrote:

Yes. That's why I think CV=sqrt(exp(SDlog²)-1) should be entered in the CV option, instead of the ratio of SD and Mean of original scale data, as in SAS documentation.

Why are you worrying about this at all? Are you calculating a CV from collected data? If so, why are you running PROC POWER at all?

If you are estimating power after the data have been collected, it is somewhat too late. The PROC POWER documentation overview starts with:

Power and sample size analysis optimizes the resource usage and design of a study, improving chances of conclusive results with maximum efficiency. The POWER procedure performs prospective power and sample size analyses for a variety of goals, such as the following:

I added the bold to the text to highlight key points of the description.

Typically, value lists for parameters like CV come from a literature search or from the results of previous similar studies whose CVs you expect, at the planning stage, to be similar to those of the planned analysis.
