Help with statistics please

Contributor
Posts: 29

Help with statistics please

Hi,

I would really appreciate it if anyone could help me solve a problem.

I have a dataset X containing 1000 observations that are normally distributed: X ~ N(0.05, 0.02).

What I need to find out is the minimum number of such observations I need in order to estimate the mean and standard deviation with confidence.

Why do I need it? This dataset was collected over two years, and by now I can tell that it really is enough observations to claim that the mean is 0.05. I cannot wait another two years to collect data for another type of observation, so I need to find out how many observations I have to collect before I can state the mean of the dataset with confidence.

Super User
Posts: 17,737

Re: Help with statistics please

Sounds like you need a sample size calculation. Look at proc power, or in this case a Google search will bring up a lot of online calculators.
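
For readers without SAS at hand, the kind of sample size calculation mentioned above can be sketched in a few lines of Python. This is only a rough sketch and not what proc power does internally: it uses the normal approximation n = (z * sigma / E)^2, takes sigma = 0.02 from the question, and assumes a target half-width E = 0.005 for a 95% confidence interval, which is an illustrative value that does not appear in the thread. The function name n_for_halfwidth is made up for this example.

```python
import math
from statistics import NormalDist


def n_for_halfwidth(sigma, halfwidth, confidence=0.95):
    """Smallest n for which the normal-approximation confidence interval
    for the mean is no wider than +/- halfwidth."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return math.ceil((z * sigma / halfwidth) ** 2)


# sigma = 0.02 comes from the question; halfwidth = 0.005 is an assumption.
print(n_for_halfwidth(sigma=0.02, halfwidth=0.005))  # 62
```

Halving the half-width roughly quadruples the required n, so the answer is driven far more by the precision you ask for than by the mean itself.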

Contributor
Posts: 29

Re: Help with statistics please

Thanks for the help. I have a problem with proc power:

proc power;
  onesamplemeans
    mean   = 0.05
    ntotal = 10
    stddev = 0.02
    power  = .;
run;

It gives me power >.999 all the time, no matter what ntotal value I enter. How can that be?

Super User
Posts: 17,737

Re: Help with statistics please

Because SAS is saying you only need 4, so anything over 4 will give you a lot of power. I thought you were looking for n, though, so you'd set your power to, say, 0.8 and see what you get for n instead.

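/* mean and stddev are scaled by 100 here; the ratio mean/stddev (2.5) */
/* is unchanged, so the result is the same as with 0.05 and 0.02       */
proc power;
  onesamplemeans
    mean   = 5
    ntotal = .
    stddev = 2
    power  = .8;
run;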

Trusted Advisor
Posts: 1,606

Re: Help with statistics please

To tell you the truth, I don't have a problem with getting a power of > 0.999 in this case.

I do have a problem with your statement "no matter what ntotal value i enter". When I enter an ntotal value of 5 or less, the power decreases.

The power of a test is the probability that the test will reject the null hypothesis when the null hypothesis is false. So, with only 2 observations, the probability of rejecting H0: mean=0 when in fact H0 is false is 0.219. That seems like the right answer to me. With only 2 observations, and mean ne 0, you could easily wind up failing to reject the null hypothesis. When ntotal increases to 10, it seems very likely that you will reject the null hypothesis when it is false.
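
The definition above can be checked by brute force. The following is a minimal simulation sketch in Python, not a reimplementation of proc power: it draws many samples from N(0.05, 0.02), runs a two-sided one-sample t test of H0: mean=0 at alpha=0.05 on each, and reports the fraction of rejections. The critical values are standard t-table entries hardcoded for n = 2, 4 and 10 only; the function name, seed and number of repetitions are arbitrary choices for the example.

```python
import math
import random

# Two-sided 5% critical values of Student's t, keyed by degrees of freedom (n - 1).
# Only the sample sizes used below are included.
T_CRIT = {1: 12.706, 3: 3.182, 9: 2.262}


def simulated_power(n, mean=0.05, sd=0.02, reps=20000, seed=1):
    """Fraction of simulated samples in which the t test rejects H0: mean = 0."""
    rng = random.Random(seed)
    crit = T_CRIT[n - 1]
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mean, sd) for _ in range(n)]
        m = sum(sample) / n
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        t = m / (s / math.sqrt(n))
        if abs(t) > crit:
            rejections += 1
    return rejections / reps


for n in (2, 4, 10):
    print(n, round(simulated_power(n), 3))
```

With these settings the estimate for n=2 should land near the 0.219 quoted above, and the estimate for n=10 should be essentially 1, because the true mean sits 2.5 standard deviations away from 0.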

Contributor
Posts: 29

Re: Help with statistics please

Thanks for the help, but I still can't get the answer I need. Let me explain the problem more clearly. I have a dataset that looks like this: (-1, 0.8, -1, -1, 0.9, 0, 1.1, 0, -1, ..., 0.7, 0, 0.9), with N=1000 and mean ~0.04. I am using bootstrapping to find the distribution, and I get X~N(0.05, 0.02). Everything is clear to me up to here. If I put the mean, std dev, and lower/upper bounds of the mean into proc power to solve for N, I get ~300. That looks reasonable, as I had guessed that 300 would be enough before trying to calculate it. But if I choose 300 random observations from the dataset of 1000 and calculate the mean, it is never even close to 0.05. Even when I bootstrap 300 sample means with N=300 each, I get totally different results. So how do I find the N at which the mean is similar in both cases, with all the data and with N=300?

Problem: for example, I want to collect another dataset of similar observations (it takes too long to get another 1000), and I want to decide when the number of observations is enough to conclude that the mean equals some number and will stay the same in the long term, so that I can invest real money based on these observations.
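
The comparison described above (resampling 1000 versus 300 observations) can be mimicked on made-up data. The sketch below is in Python and rests on loud assumptions: the real dataset is not available, so the stand-in data are drawn from the outcomes -1, 0 and 0.9 with arbitrary weights, and the helper name spread_of_resampled_means is invented for the example. It only illustrates the general point that means of smaller resamples scatter more widely, by a factor of about sqrt(1000/300).

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical stand-in for the real data: outcomes of -1, 0 or +0.9.
# The weights are an assumption made purely for illustration.
data = rng.choices([-1.0, 0.0, 0.9], weights=[0.30, 0.35, 0.35], k=1000)


def spread_of_resampled_means(values, size, reps=1000):
    """Std deviation of the means of `reps` resamples (with replacement) of `size`."""
    means = [statistics.fmean(rng.choices(values, k=size)) for _ in range(reps)]
    return statistics.stdev(means)


sd_1000 = spread_of_resampled_means(data, 1000)
sd_300 = spread_of_resampled_means(data, 300)
theory = (1000 / 300) ** 0.5

print(f"spread of means, resamples of 1000: {sd_1000:.4f}")
print(f"spread of means, resamples of 300:  {sd_300:.4f}")
print(f"ratio: {sd_300 / sd_1000:.2f} (theory: {theory:.2f})")
```

If the real data behave similarly, a mean that looks stable to within a couple of hundredths at N=1000 will wander almost twice as far at N=300, which may be enough to look like "totally different results" when the mean itself is only about 0.05.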

Super User
Posts: 17,737

Re: Help with statistics please

Something is wrong in your math somewhere.

If you're bootstrapping your results and generating 300 different samples that don't have a mean similar to your 'true' mean, then I question the calculation of the original mean. Unless you have a few extreme outliers, but then I'd expect the std to account for that.

Contributor
Posts: 29

Re: Help with statistics please

It's not exactly "not even close", but if I take the mean of the original data, it's ~0.04. If I use bootstrapping for distribution analysis and generate 1000 datasets, each with 1000 randomly selected observations from the original (which contains 1000 obs), I get mean 0.05 +-0.005, std=0.025. If I put the mean, std, and upper/lower limits of the mean into proc power to get N for a one-sample mean with power 0.9 and alpha=0.05, I get N= ~300. If I repeat the bootstrapping and generate 300 datasets, each with 300 randomly selected obs from the original dataset, I get mean ~0.08, which is twice the real mean of the original dataset. If I simply take obs 1-300, 300-600, 600-900 from the original dataset, I get means like ~0.08, ~0.05, ~-0.02. I'm stuck...
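
One possible reading of these numbers, which nobody in the thread confirms, is that std=0.025 is the spread of the bootstrap means (a standard error) rather than the spread of individual observations; values such as -1, 0.8 and 1.1 cannot have a standard deviation of 0.025. Under that assumption, the per-observation standard deviation and the standard error at other sample sizes follow from SE = sigma / sqrt(n). The Python arithmetic below is a sketch of that reasoning only; the target half-width of 0.005 is borrowed from the "+-0.005" above and is likewise an assumption.

```python
import math

boot_se = 0.025  # spread of the bootstrap means reported in the post
n_full = 1000    # size of each bootstrap resample

# Assumption: boot_se estimates sigma / sqrt(n_full), so sigma can be backed out.
sigma = boot_se * math.sqrt(n_full)


def standard_error(n):
    """Standard error of the mean for a random sample of size n."""
    return sigma / math.sqrt(n)


se_300 = standard_error(300)
low, high = 0.04 - 1.96 * se_300, 0.04 + 1.96 * se_300

# Sample size for a 95% interval of +/- 0.005 (half-width is an assumption).
n_needed = (1.96 * sigma / 0.005) ** 2

print(f"implied per-observation std: {sigma:.2f}")
print(f"standard error at n=300:     {se_300:.3f}")
print(f"95% range around 0.04:       {low:.2f} to {high:.2f}")
print(f"n for +/-0.005:              about {round(n_needed, -3):.0f}")
```

If this reading is right, subset means of ~0.08, ~0.05 and ~-0.02 are all within ordinary sampling variation for 300 observations, and entering 0.025 as stddev in proc power would understate the required N considerably.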

Super User
Posts: 17,737

Re: Help with statistics please

EdvinasS wrote:

It's not exactly "not even close", but if I take the mean of the original data, it's ~0.04. If I use bootstrapping for distribution analysis and generate 1000 datasets, each with 1000 randomly selected observations from the original (which contains 1000 obs), I get mean 0.05 +-0.005, std=0.025.

Usually you use fewer than the number of observations, but I still think something's wrong, possibly in your code somewhere, but I can't tell.

Trusted Advisor
Posts: 2,113

Re: Help with statistics please

This is very suspicious: "If I simply take obs 1-300, 300-600, 600-900 from the original dataset, I get means like ~0.08, ~0.05, ~-0.02." Are your observations in time order? Maybe there is a process going on that means this is not really a random sample from a univariate distribution. Try box plots by quarter to see if there is some sort of regression involved.
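
A numeric companion to the box plots suggested above is to split the time-ordered data into consecutive blocks and ask how far each block mean sits from the overall mean, measured in standard errors. The Python sketch below is a rough screen under stated assumptions, not a formal test: block_report is a made-up helper, the z values ignore the fact that each block is part of the overall sample, and the drifting data are synthetic because the real series is not available.

```python
import math
import random
import statistics


def block_report(values, n_blocks=4):
    """Split time-ordered values into consecutive blocks; return (mean, z) per block.

    z is the block mean's distance from the overall mean, in units of the
    standard error expected for a random sample of the block's size.
    """
    overall = statistics.fmean(values)
    sd = statistics.stdev(values)
    size = len(values) // n_blocks
    report = []
    for i in range(n_blocks):
        block = values[i * size:(i + 1) * size]
        block_mean = statistics.fmean(block)
        z = (block_mean - overall) / (sd / math.sqrt(size))
        report.append((block_mean, z))
    return report


rng = random.Random(0)
# Hypothetical series whose mean drifts from +0.1 down to -0.1 over time.
drifting = [rng.gauss(0.1 - 0.2 * i / 999, 0.8) for i in range(1000)]

for block_mean, z in block_report(drifting):
    print(f"block mean {block_mean:+.3f}   z = {z:+.2f}")
```

Block means that march steadily in one direction, or z values well beyond about 2 in absolute value, would point toward a time trend; z values scattered within that range are what plain sampling noise tends to produce.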
