Re: Geometric mean

Posted 04-27-2012 03:15 AM
(39131 views)

Hi, please tell me how to calculate the geometric mean in SAS. Is it possible with PROC MEANS?

1 ACCEPTED SOLUTION


From @Ksharp:

SAS has already offered such a function.

x1=geomean(1,2,2,4);
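
As a quick check of what the function computes, here is a minimal sketch (the variable names `gm` and `gm_check` are only illustrative). GEOMEAN works across its arguments within a single observation, and for positive values it should agree with exponentiating the mean of the logs:

```sas
data _null_;
    /* geometric mean of the four arguments: (1*2*2*4)**(1/4) */
    gm = geomean(1, 2, 2, 4);
    /* same quantity via logs; valid only when every value is positive */
    gm_check = exp(mean(log(1), log(2), log(2), log(4)));
    put gm= gm_check=;
run;
```

Both values should equal 2, up to floating-point rounding.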

More info: assuming you want the geometric mean because your data has a lognormal distribution, you could do:

```
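/* Log-transform the variable (V1 must be strictly positive) */
data test;
    set myData;
    LogV1 = log(V1);
run;

/* Mean of the logs with 90% confidence limits (alpha=0.1) */
proc means data=test alpha=0.1;
    var LogV1;
    output out=myStats
        mean=meanLogV1
        lclm=lclmLogV1
        uclm=uclmLogV1;
run;

/* Back-transform to get the geometric mean and its confidence limits */
proc sql;
    select exp(meanLogV1) as geometricMean,
           exp(lclmLogV1) as lclmGeoMean,
           exp(uclmLogV1) as uclmGeoMean
    from myStats;
quit;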
```

*Editor's note:* see also:

- this deeper discussion in this thread about uses of the geometric mean.

- blog post from @Rick_SAS about the arithmetic-geometric mean, which includes methods for calculation.

- comment from the OP: "I found that we can directly calculate geometric mean and its confidence interval by proc ttest specifying distribution=lognormal (TTEST doc here)."
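
The PROC TTEST route mentioned in the last comment can be sketched as follows. The small data set is made up for illustration and only stands in for `myData`; ALPHA=0.1 is chosen here just to mirror the 90% limits used in the code above. With DIST=LOGNORMAL the procedure analyzes the log of the variable and reports the geometric mean with its confidence limits, so `V1` must be strictly positive.

```sas
/* Hypothetical positive-valued sample standing in for myData */
data myData;
    input V1 @@;
    datalines;
1.2 0.8 2.5 3.1 0.9 1.7 4.2 1.1
;
run;

/* Lognormal analysis: geometric mean and its 90% confidence limits */
proc ttest data=myData dist=lognormal alpha=0.1;
    var V1;
run;
```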

PG

11 REPLIES


SAS has already offered such a function.

x1=geomean(1,2,2,4);

Ksharp

Not sure if this link can help:

interesting terminology,

"geometric mean and 90% confidence interval"

Is the confidence interval calculated differently when the appropriate mean is geometric?

(a question from a non-statistician)


Peter, the usual reason for choosing the geometric mean as a measure of location is to discount the influence of large observations. For positive data, the geoMean is never greater than the arithmetic mean (and is strictly smaller unless all values are equal), and that is sometimes an untold motivation for choosing it. Whether it is appropriate or not, the estimate is a random quantity and it can be characterized by a confidence interval. The little example below shows two methods for obtaining confidence intervals for both the arithmetic and geometric means. Here, the parametric method wrongly assumes that the data come from a lognormal distribution (the sample is actually drawn from a uniform distribution).

```
/* Generate random sample from uniform distribution */
%let sampleSize=30;
data test;
    call streaminit(8567845);
    do i = 1 to &sampleSize.;
        x = rand("UNIFORM");
        output;
    end;
run;

/* Get parametric estimates of the mean, geometric mean and confidence intervals */
data logTest;
    set test;
    logX = log(x);
run;

/* ALPHA=0.1 requests 90% limits, to match the labels and the p5/p95 bootstrap
   percentiles below (the PROC MEANS default, ALPHA=0.05, gives 95% limits).
   GeoMeanX holds the mean of logX here; it is back-transformed with exp() later. */
proc means data=logTest noprint alpha=0.1;
    var x logX;
    output out=parmTest mean=MeanX GeoMeanX
        lclm=lclmMeanX lclmGeoMeanX
        uclm=uclmMeanX uclmGeoMeanX;
run;

/* Get bootstrap estimates of the mean, geometric mean and confidence intervals */
/* OUTHITS writes one row per hit; without it, METHOD=URS keeps one row per
   distinct unit plus a NumberHits count, which mean(x) below would ignore. */
proc surveyselect data=test method=urs sampsize=&sampleSize. outhits
    seed=8634235 reps=10000 out=repTest noprint;
run;

proc sql;
    create table statTest as
    select replicate, mean(x) as meanX, exp(mean(log(x))) as geoMeanX
    from repTest
    group by replicate;
quit;

/* 5th and 95th percentiles of the replicates give a 90% percentile interval */
proc univariate data=statTest noprint;
    var meanX geoMeanX;
    output out=bootTest mean=MeanX GeoMeanX
        p5=lclmMeanX lclmGeoMeanX
        p95=uclmMeanX uclmGeoMeanX;
run;

/* Assemble result table */
proc sql;
    select "Parametric" as estimationMethod, "Arithmetic Mean" as statistic,
        meanX label="Estimate",
        lclmMeanX label="90% Lower confidence interval",
        uclmMeanX label="90% Upper confidence interval"
    from parmTest
    union all
    select "Parametric", "Geometric Mean",
        exp(geoMeanX), exp(lclmGeoMeanX), exp(uclmGeoMeanX) from parmTest
    union all
    select "Bootstrap", "Arithmetic Mean" as statistic,
        meanX, lclmMeanX, uclmMeanX from bootTest
    union all
    select "Bootstrap", "Geometric Mean",
        geoMeanX, lclmGeoMeanX, uclmGeoMeanX from bootTest;
quit;
```

| Estimation method | Statistic | Estimate | 90% lower confidence interval | 90% upper confidence interval |
|---|---|---|---|---|
| Parametric | Arithmetic Mean | 0.507751 | 0.394374 | 0.621128 |
| Parametric | geometric mean | 0.35044 | 0.229529 | 0.535045 |
| Bootstrap | Arithmetic Mean | 0.507732 | 0.439312 | 0.576648 |
| Bootstrap | geometric mean | 0.354656 | 0.274441 | 0.456925 |

PG


PG

Thank you for the most effective demonstration. I hope you might tolerate an old man's tardy reply.

You have not only confirmed my expectations but demonstrated how much a geo-mean might differ from the arithmetic.

I like the simplicity of the suggestion "discount the influence of large observations". Previously, for me, the geo-mean appeared to be of practical use solely among the tools of the actuary (and perhaps technical investment analysts), providing the only theoretical way to derive an "average" rate of return:

(n-th root of) product[( over n periods) of (1+rate_i)] -1

(where i goes from 1 to n)

Such calculations, averaging rates, run into problems if a periodic rate_i ever reaches -100% or worse, as in a bankruptcy, for example.

Have I misinterpreted the situation?

Do you see a practical work-around?

PeterC
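
The average-rate formula in the post above can be sketched with the GEOMEAN function. The period rates below are made-up numbers for illustration. The second call shows the degenerate case raised in the question: a period rate of exactly -100% makes one growth factor zero, GEOMEAN then returns zero, and the average rate collapses to -100% regardless of the other periods. A rate below -100% would give a negative growth factor, for which the geometric mean is not defined.

```sas
data _null_;
    /* hypothetical period rates: +10%, -5%, +20% */
    avgRate = geomean(1 + 0.10, 1 - 0.05, 1 + 0.20) - 1;
    /* same first two periods, but the last one is a total loss (-100%) */
    avgRateBust = geomean(1 + 0.10, 1 - 0.05, 1 - 1.00) - 1;
    put avgRate= avgRateBust=;
run;
```

Here `avgRate` comes out at about 0.078 (7.8% per period) and `avgRateBust` at -1.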


Hi Peter,

Well, I don't know much about economics, but some errors are common to many fields. The error I see lurking in your argument is one I've made more than once, I'm afraid. When dealing with rates of change, we tend to estimate the change by multiplying the rate by time. But that is an approximation. Things that change at a constant relative rate *r* obey the equation:

Y(t) = Y(0) * Exp(r*t)    (Eq 1)

which we approximate by

Y(t) = Y(0) * (1 + r*t)    (Eq 2)

That's fine as long as Abs(r*t) is much smaller than one. And that is usually how r is estimated: by looking at change over a short period. Inflation, for example, can be estimated by looking at the change in prices over a month, and the rate estimated by Equation 2 will be very close to the real value as long as it is small, say 2 or 3% per year. The approximation breaks down when you face something that lost 90% of its value over a month. The rate estimated with Equation 2 will be -10.8 per year when the true value, from Equation 1, is -27.6. Worse, either of those values, when plugged back into Equation 2, would mean that after two months the value would become negative.
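
The two rates quoted above can be reproduced by solving each equation for r, as in this small sketch (variable names are illustrative; time is measured in years, so one month is 1/12):

```sas
data _null_;
    ratio = 0.1;        /* Y(t)/Y(0) after losing 90% of the value */
    months = 1/12;      /* one month, expressed in years */

    /* Equation 2 solved for r: ratio = 1 + r*t */
    r_linear = (ratio - 1) / months;
    /* Equation 1 solved for r: ratio = exp(r*t) */
    r_exp = log(ratio) / months;

    /* value after two months, relative to Y(0), under each equation */
    y2_linear = 1 + r_linear * 2 * months;
    y2_exp = exp(r_exp * 2 * months);

    put r_linear= r_exp= y2_linear= y2_exp=;
run;
```

This gives `r_linear` = -10.8 and `r_exp` of about -27.6. Equation 2 then projects a negative value after two months (`y2_linear` = -0.8), whereas Equation 1 leaves 0.01 of the starting value.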

Now, let's say you have many ratios Ri = Yi(T)/Yi(0) that you want to summarize with a single value. It makes a lot of sense to use the geometric mean of the Ri to do that because GeoMean(R) = Exp(Mean(r)*T), which is the ratio you would get from the average rate of change. That is the logic, I believe, behind the use of the geometric mean in that context.
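
A quick numerical check of that identity, as a sketch with made-up rates and a made-up common period (all names and values here are hypothetical):

```sas
data _null_;
    period = 0.5;                              /* common time period T */
    rate1 = 0.02; rate2 = -0.10; rate3 = 0.30; /* rates of change r_i */

    /* ratios Ri = Yi(T)/Yi(0) implied by Equation 1 */
    ratio1 = exp(rate1 * period);
    ratio2 = exp(rate2 * period);
    ratio3 = exp(rate3 * period);

    /* geometric mean of the ratios vs. ratio from the average rate */
    geoMeanOfRatios = geomean(ratio1, ratio2, ratio3);
    ratioFromMeanRate = exp(mean(rate1, rate2, rate3) * period);

    put geoMeanOfRatios= ratioFromMeanRate=;
run;
```

The two printed values should agree up to floating-point rounding.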

To summarize: calculate the geometric mean on ratios approximated with Equation 2 only when the rate of change is small and the time period is short. Otherwise, calculate it on the actual observed ratios or on ratios estimated with Equation 1.

PG


Easy and quick to solve with the geomean function.

Further explanations are also didactic and great.


OK. Try it.

```
data test;
    input v1 @@;
    cards;
1 2 3 4 5 6 7 8 9
;
run;

%let dsid=%sysfunc(open(test,i));
%let nobs=%sysfunc(attrn(&dsid,nobs));
%let dsid=%sysfunc(close(&dsid));

data _null_;
    set test end=last;
    array _g{&nobs} _temporary_;
    _g{_n_}=v1;
    if last then do;
        geomean=geomean(of _g{*});
        put geomean=;
    end;
run;
```
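
For a whole column, a shorter alternative sketch is to exponentiate the mean of the logs in PROC SQL, the same expression used for the bootstrap replicates earlier in this thread. It assumes every value is strictly positive; missing values are skipped by the MEAN summary function.

```sas
data test;
    input v1 @@;
    datalines;
1 2 3 4 5 6 7 8 9
;
run;

/* geometric mean of the column v1, computed as exp(mean(log(v1))) */
proc sql;
    select exp(mean(log(v1))) as geoMean
    from test;
quit;
```

For the values 1 to 9 this gives about 4.147.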

Ksharp
