Bailey
Calcite | Level 5

I have the data below and know that the values were recorded on a logarithmic scale. How do I incorporate that into my t-test? Would I have to use a log statement somewhere? If so, where? If not, why not?


Thank you


data work.compare;
length group $1;
input group value;
cards;
A 2.7
A 0.0
A 3.1
A 4.6
B 6.2
B 5.2
B 3.5
B 3.6
B 5.9
;
run;

ods graphics on;

proc ttest data=work.compare plots=box;
class group;
var value;
run;

ods graphics off;

13 REPLIES
PGStats
Opal | Level 21

Hi,

For some data, it is reasonable to assume that the logarithms are normally distributed. Taking logarithms often stabilizes the group variances. Compare the group variances on the original scale (assumed logarithmic) and on the back-transformed scale (assumed linear):

data compare;
length group $1;
input group Value;
expValue = exp(value);
label Value="Log Value" expValue="Linear Value";
cards;
A 2.7
A 0.0
A 3.1
A 4.6
B 6.2
B 5.2
B 3.5
B 3.6
B 5.9
;

ods listing exclude ConfLimits;

proc ttest data=compare plots=none;
class group;
var value expValue;
run;

                               The TTEST Procedure

                           Variable:  Value  (Log Value)

    group          N        Mean     Std Dev     Std Err     Minimum     Maximum
    A              4      2.6000      1.9166      0.9583           0      4.6000
    B              5      4.8800      1.2677      0.5669      3.5000      6.2000
    Diff (1-2)           -2.2800      1.5788      1.0591

            Method           Variances        DF    t Value    Pr > |t|
            Pooled           Equal             7      -2.15      0.0683
            Satterthwaite    Unequal      5.0074      -2.05      0.0958

                               Equality of Variances
                 Method      Num DF    Den DF    F Value    Pr > F
                 Folded F         3         4       2.29    0.4414

                        Variable:  expValue  (Linear Value)

    group          N        Mean     Std Dev     Std Err     Minimum     Maximum
    A              4     34.3905     44.2774     22.1387      1.0000     99.4843
    B              5       221.8       203.4     90.9600     33.1155       492.7
    Diff (1-2)            -187.4       156.5       105.0

            Method           Variances        DF    t Value    Pr > |t|
            Pooled           Equal             7      -1.79      0.1174
            Satterthwaite    Unequal       4.467      -2.00      0.1085

                               Equality of Variances
                 Method      Num DF    Den DF    F Value    Pr > F
                 Folded F         4         3      21.10    0.0311

Variances are indeed more stable on the (assumed) log scale. It is thus better justified to run parametric tests (such as the t-test) on that scale. No special option required.
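To connect this back to the original question: since the posted values are already on the log scale, no LOG() call is needed before PROC TTEST. If the measurements had instead been recorded on the original linear scale, the transformation would go in a DATA step before the test. A minimal sketch, assuming a hypothetical raw-scale data set work.compare_raw with a variable rawValue:

/* Hypothetical sketch: only needed if the measurements were on the original (linear) scale */
data work.compare_log;
set work.compare_raw; /* assumed raw-scale input data set */
logValue = log(rawValue); /* natural log; use log10() if base 10 is intended */
/* zero or negative raw values would need to be handled before logging */
run;

proc ttest data=work.compare_log plots=box;
class group;
var logValue;
run;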

PG
stat_sas
Ammonite | Level 13

Not sure why a logarithmic scale was used. Log transformation is usually done on skewed data. With this small sample size it would be hard to validate the results statistically. Also, the interpretation of a t-test based on log data will not be the same as one based on the original data.

Doc_Duke
Rhodochrosite | Level 12

stat_sas:  Although the log transform is used empirically on skewed data (the variance stabilization that PGStats mentioned), in some fields it is the standard approach for working with certain types of variables.  It may be that the data are known from other work to follow a lognormal distribution.

Bailey:  If the data are indeed log-scaled, stat_sas's comment on interpretation needs attention.  The test statistics are valid, and the means/SDs are too (on the log scale).  Where you run into challenges is that the difference of the means may not be directly interpretable.

stat_sas
Ammonite | Level 13

Agreed - I was concerned about the interpretation of results based on original vs log transformed data.

slg
Obsidian | Level 7

Exponentiate the results for interpretation...

stat_sas
Ammonite | Level 13

What do you mean by "Exponentiate the results for interpretation" please?

Doc_Duke
Rhodochrosite | Level 12

slg, that's the problem: if you take the mean of logged data (as Bailey did in the t-test) and exponentiate that mean, it is NOT an estimate of the mean of the non-logged data. You end up with the geometric mean, which is much more difficult to interpret.

slg
Obsidian | Level 7

I don't believe I stated that exponentiation produces the mean of the non-logged data. Assuming the log transformation is warranted due to extreme skew or outliers, the mean of the non-logged data will not give useful information (i.e., it is not a typical value of the distribution). The log transformation "normalizes" the distribution, but the log mean is not a meaningful value in terms of the original unit of measurement, hence it makes sense to take the antilog to bring the mean back to the original unit. The new value (which, as you correctly point out, is the geometric mean) is a "normalized mean" and is, of course, different from the original mean; but assuming the outliers in the original distribution are the result of anomalies, it is a better representation of the central location. Why is that difficult?
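As a concrete illustration of that point, here is a minimal sketch using the compare data set and the Value/expValue variables from PGStats's step above (the output data set names are arbitrary): exponentiating the mean of the logs yields the geometric mean of the linear-scale values, not their arithmetic mean.

proc means data=compare noprint;
class group;
var value expValue;
output out=stats mean(value)=meanLog mean(expValue)=meanLinear;
run;

data stats;
set stats;
geoMean = exp(meanLog); /* back-transformed log mean = geometric mean */
run;

proc print data=stats;
where _type_ = 1; /* keep the per-group rows only */
var group meanLog geoMean meanLinear;
run;

For group A, exp(2.6) is about 13.5, well below the arithmetic mean of 34.4 shown in the earlier listing.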

PGStats
Opal | Level 21

In some fields, such as solution chemistry, it is quite natural to take the logarithms of measurements (e.g., concentrations) and build linear models with them. But at some point, predictions must be back-transformed to the original scale, and true means are required. Many estimators of the true mean have been proposed for the back-transformation problem. My favorite is Duan's smearing estimator, which is fairly simple and robust.

Log transformation is, however, not a good method for dealing with the skewing effect of outliers. You will be much better off with more robust location estimators, such as the median or the Winsorized mean, and scale estimators such as the MAD, Qn, or Sn.
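A hedged sketch of how such robust estimates can be requested in SAS with PROC UNIVARIATE, applied here to expValue for illustration (the TRIMMED=/WINSORIZED= counts are arbitrary):

proc univariate data=compare trimmed=1 winsorized=1 robustscale;
class group;
var expValue;
run;

The ROBUSTSCALE option reports robust scale estimates such as the MAD, Qn, and Sn alongside the interquartile range, and the median appears in the default output.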

PG
SteveDenham
Jade | Level 19

What looks like a smearing estimator is presented in the PROC GLIMMIX documentation where DIST=LOGNORMAL is discussed (although it is NOT the nonparametric bootstrapped value in the original paper).  A big tip of the hat to you, PG, for giving me a name for what we have been doing, according to the stub definition on Wikipedia.

Steve Denham

PGStats
Opal | Level 21

Hi, I don't think smearing estimates are available in SAS procedures. It is, however, easy to compute them from the residuals.

[Attachment: Logarithmic Scale.PNG]
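The attached image is not reproduced here; a minimal sketch of computing the smearing estimate from residuals, assuming the compare data set above with Value on the log scale, could look like this:

/* Fit the model on the log scale and keep predictions and residuals */
proc glm data=compare noprint;
class group;
model value = group;
output out=fitted predicted=pred residual=resid;
run;
quit;

/* Duan's smearing estimate: exp(prediction) times the mean of exp(residual) */
proc sql;
create table smeared as
select *, exp(pred) * (select mean(exp(resid)) from fitted) as smearEst
from fitted;
quit;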

Reference: Duan, N., 1983. Smearing estimate: a nonparametric retransformation method. J. Am. Stat. Assoc. 78, 605–610.

PG
slg
Obsidian | Level 7

Raise the log base to the power of the result. For example, in your data the first case (A) is 2.7. Assuming 2.7 is a log to the base 10, the back-transformation is 10^2.7, or about 501.2. If it is a log to the base e, then e^2.7 is about 14.88, etc. Hope this helps.
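A quick check of those two back-transformations in a throwaway DATA step (the variable names are arbitrary):

data _null_;
log10_back = 10**2.7; /* base-10 antilog, about 501.2 */
ln_back = exp(2.7);   /* natural antilog, about 14.88 */
put log10_back= ln_back=;
run;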

stat_sas
Ammonite | Level 13

Thanks -  What is the interpretation based on this statistic?
