EdvinasS
Calcite | Level 5

Hi,

I would really appreciate it if anyone could help me solve a problem:

I have a dataset X containing 1000 observations that are normally distributed: X~N(0.05, 0.02).

What I need to find out is the minimum number of such observations I need to get a mean and std I can be confident in.

Why do I need it? This dataset was collected over two years, and now I can tell that it really is enough observations to claim that mean=0.05. I cannot wait another two years to collect data for another type of observation, so I need to find out how many observations I need before I can claim the mean of the data set with confidence.

Reeza
Super User

Sounds like you need a sample size calculation. Look at proc power, or in this case a google search will bring up a lot of online calculators.

EdvinasS
Calcite | Level 5

Thanks for the help. I have a problem with proc power:

    proc power;
       onesamplemeans
          mean = 0.05
          ntotal = 10
          stddev = 0.02
          power = .;
    run;

It gives me power >.999 all the time, no matter what ntotal value I enter. How can that be?

Reeza
Super User

Because SAS is saying you only need 4 so anything over 4 will give you a lot of power...I thought you were looking for n though, so you'd set your power to 0.8 say and see what you get for n instead.

    proc power;
       onesamplemeans
          mean = 5
          ntotal = .
          stddev = 2
          power = .8;
    run;

PaigeMiller
Diamond | Level 26

To tell you the truth, I don't have a problem with getting a power of > 0.999 in this case.

I do have a problem with your statement "no matter what ntotal value i enter". When I enter an ntotal value of 5 or less, the power decreases.

The power of a test is the probability that the test will reject the null hypothesis when the null hypothesis is false. So, with only 2 observations, the probability of rejecting H0: mean=0 when in fact H0 is false is 0.219. That seems like the right answer to me. With only 2 observations, and mean ne 0, you could easily wind up failing to reject the null hypothesis. When ntotal increases to 10, it seems very likely that you will reject the null hypothesis when it is false.
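One way to see how power changes with sample size is to give proc power a list of ntotal values in a single run. The sketch below reuses the mean and stddev from the original post and relies on what I understand to be the procedure's defaults (null mean of 0, alpha of 0.05, two-sided t test); the particular list of sample sizes is only an illustration.

```sas
/* Sketch: power of the one-sample t test for several sample sizes.   */
/* Assumes the defaults: null mean 0, alpha 0.05, two-sided test.     */
proc power;
   onesamplemeans
      mean   = 0.05          /* mean from the original post           */
      stddev = 0.02          /* std dev from the original post        */
      ntotal = 2 3 4 5 10    /* illustrative list of sample sizes     */
      power  = .;            /* solve for power at each ntotal        */
run;
```

The output table should list one power value per ntotal, which makes the drop-off at very small sample sizes easy to spot.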

--
Paige Miller
EdvinasS
Calcite | Level 5

Thanks for the help, but I still can't get the answer I need. Let me explain the problem more clearly. I have a data set which looks like this: (-1, 0.8, -1, -1, 0.9, 0, 1.1, 0, -1,...........,0.7, 0, 0.9), N=1000, mean ~0.04. I am using a bootstrapping technique to find the distribution, and I get X~N(0.05, 0.02). Everything is clear to me up to here. If I put the mean, std dev, and lower/upper bounds of the mean into proc power to get N, I get ~300. That looks totally normal, as I had guessed that 300 would be enough before trying to calculate it. But if I choose 300 random observations from the data set of 1000 observations and calculate the mean, it is never even close to 0.05. Even when I run the bootstrap generating 300 sample means with N=300, I get totally different results. So how do I get an N for which the mean would be similar in both cases, with all the data from the set and with N=300?

Problem: for example, I want to collect another data set with similar observations (it takes too long to get another 1000 observations), and I want to decide when the number of observations is enough to conclude that the mean is equal to some number and will stay the same in the long term, so that I can invest real money in these observations.

Reeza
Super User

Something is wrong in your math somewhere.

If you're bootstrapping your results and generating 300 different samples that don't have a mean similar to your 'true' mean then I question the calculation of the original mean.  Unless you have a few extreme outliers, but then I'd expect the std to account for that. 

EdvinasS
Calcite | Level 5

It's not exactly "not even close", but if I take the mean of the original data it's ~0.04. If I use bootstrapping for distribution analysis and generate 1000 datasets with 1000 randomly selected observations from the original (which contains 1000 obs), I get mean 0.05 +-0.005, std=0.025. If I put the mean, std and upper/lower limits of the mean into proc power to get N for a one-sample mean with power 0.9 and alpha=0.05, I get N= ~300. If I repeat the bootstrapping and generate 300 datasets with 300 randomly selected obs from the original dataset, I get mean ~0.08, which is twice the real mean of the original dataset. If I simply take obs 1-300, 300-600, 600-900 from the original data set, I get means like ~0.08, ~0.05, ~ -0.02. I'm stuck...
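A possible reading of these numbers, offered as an assumption rather than a diagnosis: the std of 0.025 reported by the bootstrap may be the standard error of the mean for n=1000, not the standard deviation of a single observation. The raw values quoted earlier (-1, 0.8, 1.1, ...) are far more spread out than 0.025, which fits that reading. Under the usual relation SE = sigma / sqrt(n), the short data step below works out what that would imply for subsets of 300.

```sas
/* Sketch under the ASSUMPTION that 0.025 is the standard error of   */
/* the mean at n = 1000, not the std dev of one observation.         */
data _null_;
   se_1000 = 0.025;                  /* bootstrap std from the post   */
   sigma   = se_1000 * sqrt(1000);   /* implied per-observation std   */
   se_300  = sigma / sqrt(300);      /* implied std error at n = 300  */
   put sigma= se_300=;               /* values are written to the log */
run;
```

This gives sigma of roughly 0.79 and a standard error at n=300 of roughly 0.046, so subset means such as 0.08, 0.05 and -0.02 would all sit within about two standard errors of 0.04. If the assumption holds, the stddev passed to proc power would need to be the per-observation value rather than 0.02 or 0.025, and the required N would be much larger than 300.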

Reeza
Super User

EdvinasS wrote:

It's not exactly "not even close", but if I take the mean of the original data it's ~0.04. If I use bootstrapping for distribution analysis and generate 1000 datasets with 1000 randomly selected observations from the original (which contains 1000 obs), I get mean 0.05 +-0.005, std=0.025.

Usually you use fewer than the number of observations, but I still think something's wrong, possibly in your code somewhere, but I can't tell.
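For comparison against your own code, here is a minimal sketch of one common way to bootstrap a mean in SAS with proc surveyselect. The dataset name work.mydata and the variable name x are hypothetical placeholders, and the seed is arbitrary; the resample size matches the 1000-of-1000 setup described in the quoted post.

```sas
/* Sketch: bootstrap the mean of x. WORK.MYDATA and X are            */
/* hypothetical names; replace them with the real ones.              */
proc surveyselect data=work.mydata out=work.boot
      seed=12345          /* arbitrary seed for reproducibility      */
      method=urs          /* sampling with replacement               */
      samprate=1          /* each resample as large as the input     */
      outhits             /* one output row per selection            */
      reps=1000;          /* number of bootstrap resamples           */
run;

/* Mean of x within each resample */
proc means data=work.boot noprint;
   by replicate;
   var x;
   output out=work.bootmeans mean=xbar;
run;

/* Centre and spread of the bootstrap means */
proc means data=work.bootmeans mean std;
   var xbar;
run;
```

The std printed by the last step describes how much the mean varies from resample to resample, which is a different quantity from the std of the individual observations.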

Doc_Duke
Rhodochrosite | Level 12

This is very suspicious: "If I simply take obs 1-300, 300-600, 600-900 from the original data set, I get means like ~0.08, ~0.05, ~ -0.02." Are your observations in time order? Maybe there is a process going on that means this is not really a random sample from a univariate distribution. Try box plots by quarter to see if there is some sort of regression involved.
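A minimal sketch of that check, assuming the observations are stored in collection order: the dataset work.mydata and the variable x are hypothetical names, and splitting the 1000 rows into four blocks of 250 is only a stand-in for real calendar quarters if no date variable is available.

```sas
/* Sketch: box plots of x by block of consecutive observations.      */
/* WORK.MYDATA and X are hypothetical names; blocks of 250 rows      */
/* stand in for quarters and assume the rows are in time order.      */
data work.by_quarter;
   set work.mydata;
   quarter = ceil(_n_ / 250);   /* rows 1-250 -> 1, 251-500 -> 2 ... */
run;

proc sgplot data=work.by_quarter;
   vbox x / category=quarter;   /* one box per block                 */
run;
```

If the boxes drift steadily up or down from one block to the next, that would point to a trend over time rather than a stable mean.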
