Kai123
Fluorite | Level 6

Hi, 

 

I have a question about which CV should be entered in the CV= or PAIREDCVS= option of the PAIREDMEANS statement in PROC POWER for lognormal data. See the SAS documentation for CV below.

 

CV=number-list

specifies the coefficient of variation that is assumed to be common to both members of a pair. The coefficient of variation is defined as the ratio of the standard deviation to the mean on the original data scale. You can use this option only with DIST=LOGNORMAL. For information about specifying the number-list, see the section Specifying Value Lists in Analysis Statements.

 

This suggests that, when the data are assumed to be lognormally distributed, the CV entered should be SD/mean, with SD and mean calculated from the data on the original scale.

 

However, in SAS/STAT® 13.1 User’s Guide The POWER Procedure,  Page 6445-6446

Kai123_0-1725905365043.png

Kai123_1-1725905379628.png

If CV=SD/Mean, where SD and Mean are calculated from data on the original scale, how can σ* (the SD of the log-transformed data) be derived as sqrt(log(CV^2+1))?

 

Instead, if CV is calculated as CV=sqrt(exp(SDlog^2)-1), where SDlog is the SD of the natural-log-scale data, then it makes sense that σ* (SDlog) can be obtained by back-calculation. However, the documentation clearly indicates that the CV entered is from data on the original scale.

 

This confuses me. Also have a similar question about the correlation, as the documentation states to enter "correlation of the original untransformed pairs (Y1; Y2)".

 

Thanks,

jiltao
SAS Super FREQ

You might want to check out equation (44.36) on page 27 of Kotz, Balakrishnan, and Johnson (2000), and Jones and Miller (1966), to find out how the equation is derived.

Thanks,

Jill
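
For readers without the cited references to hand, the correlation conversion can be sketched from standard lognormal moment formulas. This is a reconstruction, not a copy of the cited texts. It assumes (log Y1, log Y2) is bivariate normal with means μ1*, μ2*, standard deviations σ1*, σ2*, and correlation ρ*; the μ notation is introduced here for illustration only.

```latex
\begin{aligned}
E[Y_i] &= \exp\!\left(\mu_i^* + \tfrac{1}{2}\sigma_i^{*2}\right),
\qquad CV_i^2 = \exp\!\left(\sigma_i^{*2}\right) - 1 \\
\operatorname{Cov}(Y_1, Y_2) &= E[Y_1]\,E[Y_2]\,\bigl\{\exp\!\left(\rho^*\sigma_1^*\sigma_2^*\right) - 1\bigr\} \\
\rho &= \frac{\exp\!\left(\rho^*\sigma_1^*\sigma_2^*\right) - 1}{CV_1\,CV_2}
\;\Longrightarrow\;
\rho^* = \frac{\log\!\left(\rho\,CV_1\,CV_2 + 1\right)}{\sigma_1^*\,\sigma_2^*}
\end{aligned}
```

Here ρ is the correlation of the untransformed pair (Y1, Y2), and ρ* is the correlation on the log scale.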

Kai123
Fluorite | Level 6

Thanks Jill for your reply.

This reference is for the correlation conversion. I don't have access to it at the moment, so I would appreciate it if you could share the part covering the derivation.

 

Regarding my original question about CV, it makes more sense if CV=sqrt(exp(SDlog^2)-1), where SDlog is the SD of the natural-log-scale data. I suspect it is a mistake in the SAS documentation to describe it as the ratio of the standard deviation to the mean on the original data scale, but I want to get confirmation from the experts at SAS.

 

jiltao
SAS Super FREQ

Why do you think your equation makes more sense? Can you show me how that was derived?

Thanks,

Jill

Kai123
Fluorite | Level 6

 

CV = sqrt(exp(SDlog^2) - 1)

CV^2 = exp(SDlog^2) - 1

CV^2 + 1 = exp(SDlog^2)

log(CV^2 + 1) = SDlog^2

SDlog = sqrt(log(CV^2 + 1))

Note: "^2" denotes squaring and sqrt() denotes the square root.

 

which is the same as in SAS documentation

Kai123_0-1726000195399.png

 

 

 

jiltao
SAS Super FREQ

You are correct, and so is the SAS documentation.

sigma1* is the standard deviation "of the bivariate normal distribution of the log-transformed data". So it is the same as your SDLog.
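
One way to see why both statements can hold is a standard property of the lognormal distribution: the population ratio SD/mean on the original scale depends only on the log-scale SD. A short derivation, assuming log Y is normal with mean μ* and standard deviation σ* (μ* is notation introduced here for illustration):

```latex
\begin{aligned}
E[Y] &= \exp\!\left(\mu^* + \tfrac{1}{2}\sigma^{*2}\right) \\
\operatorname{Var}(Y) &= \bigl\{\exp\!\left(\sigma^{*2}\right) - 1\bigr\}\,\exp\!\left(2\mu^* + \sigma^{*2}\right) \\
CV = \frac{\sqrt{\operatorname{Var}(Y)}}{E[Y]} &= \sqrt{\exp\!\left(\sigma^{*2}\right) - 1}
\;\Longleftrightarrow\;
\sigma^* = \sqrt{\log\!\left(CV^2 + 1\right)}
\end{aligned}
```

This identity is about population quantities under the lognormal assumption. Sample estimates of SD/mean and of sqrt(exp(SDlog^2)-1) will generally differ in a finite sample.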

 
Kai123
Fluorite | Level 6

Yes. That's why I think CV=sqrt(exp(SDlog^2)-1) should be entered in the CV option, instead of the ratio of SD to mean of the original-scale data, as in the SAS documentation.

Kai123
Fluorite | Level 6

I don't understand what you meant by "valid".

The CVs calculated from these two formulas are different; both can be entered in PROC POWER, and they give different results. As a SAS user, I want to know which is the right one.

jiltao
SAS Super FREQ

You can have two different formulas for the same thing.

Just like if you have a series of positive values, x1, x2, ... xn.

You can use the formula x1+x2+...+xn to compute the sum.

You can also use the formula exp(log(x1)*log(x2)*....*log(xn)) to compute the sum.

Mathematically they are equivalent. 

Same thing with the CV formulas you referred to under the log-normal assumption.

I wrote the following program to show you how empirically you can get almost the same value between using these two formulas for CV --

 

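/* Simulate 2000 samples of 2000 lognormal observations each */
data one;
   call streaminit(2366);        /* set the seed once, before the first RAND call */
   do sample=1 to 2000;
      do i=1 to 2000;
         logx=rand('normal');    /* log scale: standard normal */
         x=exp(logx);            /* original scale: lognormal */
         output;
      end;
   end;
run;

/* Per-sample mean and SD of x, and variance of log(x) */
proc means data=one noprint;
   by sample;
   var x logx;
   output out=out(drop=_freq_ _type_) mean(x)=meanx std(x)=stdx var(logx)=var_logx;
run;

/* Compute both CV estimates for each sample */
data out;
   set out;
   cv_x2 = sqrt(exp(var_logx)-1);   /* from the log-scale variance */
   cv_x = stdx/meanx;               /* SD/mean on the original scale */
run;

/* Average each CV estimate across the 2000 samples */
proc means data=out mean;
   var cv_x cv_x2;
run;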

 

Hope this helps,

Jill

 

Kai123
Fluorite | Level 6

Thanks for taking the time to create an example.

 

1.  x1+x2+...+xn = exp(log (x1+x2+...+xn)) , which are not equal to exp(log(x1)*log(x2)*....*log(xn)), as log (x1+x2+...+xn) is not equal to log(x1)*log(x2)*....*log(xn)

 

2. As you mentioned, in your example the two CVs are "almost the same value", meaning they are different, no matter how similar they are in this particular example. For statistical software like SAS, I believe we should aim for an accurate result rather than an approximately similar one. Even a small difference could matter a lot in some industries and contexts.

 

3. The key is to understand how SAS uses this CV value at the back end to calculate the sample size/power.  I'm hoping the SAS team that wrote the SAS documentation can answer my question. 

 

 
jiltao
SAS Super FREQ

Sorry I got the law of logs wrong 😞  But you got the idea: there could be two different ways of computing a statistic under certain assumptions.

The results are not exactly the same due to random errors in the generated data.

How CV's are used in the computation of the power and sample size is provided in the documentation --

https://go.documentation.sas.com/doc/en/statcdc/14.3/statug/statug_power_details68.htm
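
To see how the CV= value feeds into the calculation in practice, a minimal PROC POWER sketch for the paired lognormal case is shown here. All numeric values (MEANRATIO=, CV=, CORR=, NPAIRS=) are hypothetical planning inputs chosen for illustration and are not taken from this thread.

```sas
/* Sketch only: every numeric value below is a hypothetical planning input. */
proc power;
   pairedmeans test=ratio      /* ratio test; lognormal is the default distribution here */
      meanratio = 1.5          /* assumed ratio of the two means */
      cv        = 0.4 0.6      /* candidate common CVs, compared side by side */
      corr      = 0.3          /* assumed correlation of the untransformed pair */
      npairs    = 20           /* assumed number of pairs */
      power     = .;           /* missing value: solve for power */
run;
```

Listing more than one value in CV= gives one result per scenario, which is a quick way to check how sensitive the computed power is to the CV that is entered.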

 

Kai123
Fluorite | Level 6

This is not due to random error; the two CVs are two different things. Look at the dataset below: the %CVs are 80 and 113, which are very different. There is no proof that the two values should be similar.

 

I don't think you understand my question. If you are on staff at SAS, could you please refer me to the right team? Thanks.

 

Summary statistics:

        Original Scale       Original Scale
        (Assuming Normal)    (LogNormal)       Ln scale
n       47                   47
GM                           0.921088052
Mean    1.300955496                            -0.0822
GSD                          2.482316779
SD      1.039364637                            0.909192
%CV     80                   113

Data:

Original Scale    Ln scale
0.182934103       -1.698629283
0.335884261       -1.090988639
1.768862198       0.570336514
0.322330226       -1.132178713
1.869962942       0.625918614
4.092048456       1.40904569
1.615234314       0.479480032
0.586349193       -0.533839775
0.854526206       -0.157208108
0.368046246       -0.999546681
1.92127702        0.65299008
0.694299929       -0.364851237
0.062064248       -2.779585165
0.522980221       -0.648211634
1.603548144       0.472218764
0.495035732       -0.703125334
3.263487488       1.182796405
0.991658216       -0.008376771
0.807604862       -0.213682372
2.166518473       0.773121489
0.367439171       -1.001197496
0.796960379       -0.226950314
0.437185347       -0.827398039
1.649246645       0.500318605
0.28163224        -1.267153174
3.533515975       1.262293402
1.850595694       0.615507584
0.50108275        -0.690984021
0.578666907       -0.547028257
2.031517886       0.708783241
1.004733644       0.004722476
1.748168892       0.558568893
0.727650154       -0.317934903
1.325797122       0.28201388
0.242380432       -1.417246756
0.843038824       -0.170742268
2.131842779       0.75698676
1.011822918       0.011753574
4.396706363       1.480855707
2.611930193       0.960089486
0.961948844       -0.038794006
2.065023208       0.725141465
0.251275204       -1.38120651
0.792867525       -0.232099127
0.426753947       -0.851547668
2.196685175       0.786949486
1.853787599       0.617230897
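
The two %CV figures quoted for this dataset can be reproduced from the reported Mean and SD values. This is only a check of the arithmetic, using the summary numbers as posted:

```latex
\begin{aligned}
\text{SD/Mean on the original scale:}\quad
& \frac{1.039364637}{1.300955496} \approx 0.799 \;\;(\approx 80\%) \\
\text{From the log-scale SD:}\quad
& \sqrt{\exp\!\left(0.909192^2\right) - 1} \approx 1.13 \;\;(\approx 113\%)
\end{aligned}
```

If the data are truly lognormal, both expressions estimate the same population CV. With n = 47 and a right-skewed distribution, the two sample estimates can still differ noticeably, as they do here.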
ballardw
Super User

@Kai123 wrote:

Yes. That's why I think CV=sqrt(exp(SDlog^2)-1) should be entered in the CV option, instead of the ratio of SD to mean of the original-scale data, as in the SAS documentation.


Why are you worrying about this at all? Are you calculating a CV from collected data? If so, why are you running Proc Power at all?

 

If you are estimating the Power it is sort of too late. Proc power documentation overview starts with:

Power and sample size analysis optimizes the resource usage and design of a study, improving chances of conclusive results with maximum efficiency. The POWER procedure performs prospective power and sample size analyses for a variety of goals, such as the following:

I added the bold to the text to highlight key points of the description.

Typically value lists for parameters like CV would come from literature search or the results of previous similar studies where you think in the planning stage that the CVs for this planned analysis would be similar.

 

 

 
