
# 95% CLM for N=2 MIN=MAX STDERR=0

What is the CI for 2 values when they are equal?

SAS Super FREQ

## 95% CLM for N=2 MIN=MAX STDERR=0

A surprisingly subtle question. For constant data, I think the official answer is undefined (or missing).  I'll try to find a reference when I get into the office.

On the other hand, the LIMIT of this situation is zero, which is probably why you are writing. In other words, if your data are {0, delta}, then the width of the CLM approaches zero as delta approaches zero. You can see this graphically by running the following SAS code:

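/* Two-point samples {0, y} for y = 1, 0.95, ..., plus one exactly constant sample {0, 0} */
data a;
   keep sample x;
   do y = 1 to 0 by -0.05;
      sample + 1;            /* sum statement: new sample ID for each y */
      x = 0; output;
      x = y; output;
   end;
   sample + 1;
   x = 0; output; x = 0; output;   /* exact zero */
run;

/* Range, mean, and 95% CLM (default ALPHA=0.05) for each sample */
proc means data=a noprint;
   by sample;
   var x;
   output out=out range=range lclm=lclm mean=mean uclm=uclm;
run;

/* Plot the confidence band against the sample range */
proc sgplot data=out;
   band x=range upper=uclm lower=lclm;
run;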
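For the two-point sample {0, delta} the width can also be worked out by hand. A short derivation, assuming the standard two-sided 95% interval mean +/- t(0.975, N-1) * s/sqrt(N) with N = 2 (so 1 degree of freedom) and delta > 0:

```latex
\bar{x} = \frac{0 + \delta}{2} = \frac{\delta}{2}, \qquad
s^2 = \frac{(0 - \delta/2)^2 + (\delta - \delta/2)^2}{2 - 1} = \frac{\delta^2}{2}, \qquad
\frac{s}{\sqrt{2}} = \frac{\delta}{2}

\text{width} = 2\, t_{0.975,\,1} \cdot \frac{\delta}{2}
             = t_{0.975,\,1}\,\delta \approx 12.706\,\delta
             \;\longrightarrow\; 0 \quad \text{as } \delta \to 0
```

Both limits, delta/2 +/- 6.353 delta, are straight lines through the origin, so the band in the plot is a wedge that closes to zero width at range 0.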

SAS Super FREQ

## 95% CLM for N=2 MIN=MAX STDERR=0

I think the answer comes out of the derivation of the CLM formula. You derive the formula by looking at the expression

t = (sampleAverage - populationMean)/ (sampleStdDev/sqrt(N))

You then argue that if the x are normally distributed (or N is large, yada, yada, yada), then the statistic has a t distribution with N-1 degrees of freedom.

When you have constant data, the sample std dev is 0, so the denominator is 0 and the expression is undefined.
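Written out for constant data x_1 = ... = x_N = c, the statistic from that derivation becomes:

```latex
\bar{x} = c, \qquad s = 0
\quad\Longrightarrow\quad
t = \frac{\bar{x} - \mu}{s/\sqrt{N}} = \frac{c - \mu}{0}
```

This is 0/0 when mu = c and unbounded otherwise, so there is no well-defined t value to invert into confidence limits.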


## 95% CLM for N=2 MIN=MAX STDERR=0

Thanks Rick that's what I thought too.  So what is being estimated by PROC UNIVARIATE CIBASIC?

SAS Super FREQ

## Re: 95% CLM for N=2 MIN=MAX STDERR=0

I suspect they are just plugging into the formula

avg +/- t(1-alpha/2, N-1) * s/sqrt(N)

where t(p, df) is the p-th quantile of the t distribution with df degrees of freedom. Since s=0, the CI is assigned zero width.
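Plugging s = 0 into the two-sided 95% interval with N = 2 (alpha = 0.05, N - 1 = 1 degree of freedom) shows where the zero width comes from:

```latex
\bar{x} \pm t_{1-\alpha/2,\,N-1}\,\frac{s}{\sqrt{N}}
= \bar{x} \pm t_{0.975,\,1}\cdot\frac{0}{\sqrt{2}}
= [\,\bar{x},\ \bar{x}\,]
```

The t quantile is large but finite, so multiplying it by s = 0 collapses the interval to the single point at the mean.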

It's really not clear to me what the correct answer should be for these degenerate data. Both answers (missing and zero width) can be justified.

Let's see how other statisticians weigh in.


## 95% CLM for N=2 MIN=MAX STDERR=0

If the sample standard deviation (and hence the standard error) is zero, then by implication the population standard deviation is zero (why? because that is the only inference you can make about the population parameter), and thus there is no variability in the population. So any sample will give a CI of zero width.

This is one of the hazards of small sample sizes, and why, in our shop, there was a rule that there would be no analysis unless N>2 for every group. (Note that this fails to account for internal replication when there are multiple groups.)
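The step from a zero standard deviation to "no variability" rests on the fact that a sum of squares is zero only when every term is zero:

```latex
s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2 = 0
\quad\Longleftrightarrow\quad
x_i = \bar{x}\ \text{for every } i
```

With N = 2 this just says the two observations are equal, which is the MIN=MAX case in the title.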

Steve Denham
