When are there too many observations?

Posted 11-18-2017 09:07 AM
(2231 views)

Dear community,

I am running a five-year study with survey data, and the weighted number of patients included is about 30 million in total.

I will assess the relationship between two variables. When does the number of observations become so large that everything starts to show significance (as I have heard)? I am subsetting the observations by a classification variable with about 20 subsets, so the numbers will ultimately be smaller, but I would still like to know how I should interpret results with data in the millions. Or does this issue only arise when we get into the billions?

Thanks

4 replies


I'm not a statistician, so I can't provide a definitive answer to your question. In fact, I'm only responding for two reasons: (1) no one has responded yet after two hours, and (2) this will ensure that I get to see the other responses you receive.

A nice, easy-to-read blog post provides part of your answer: https://www.google.ca/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=0ahUKEwiW4qfbwMjXA...

However, when is big too big? I think it depends on a number of factors, including the type of analysis, practical significance, and the power of the test. As for your question about millions vs. billions, I think the threshold is typically much, much smaller (e.g., hundreds vs. thousands), but it depends on the combination of the variability in your data and how large a difference would have to be for you to consider it practically (not just statistically) significant.
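To make the point about variability and practical significance concrete, here is a minimal sketch of the textbook normal-approximation sample-size formula for comparing two group means, n per group = 2((z_{1-alpha/2} + z_{1-beta}) * sigma / d)^2. It assumes a two-sided test, equal group sizes, a known common standard deviation sigma, and a difference d that you would consider practically significant. The function name `n_per_group` is hypothetical; for a real study, a dedicated power tool (for example SAS PROC POWER) is the better choice.

```python
import math
from statistics import NormalDist

def n_per_group(d: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n for a two-sided, two-sample z-test.

    Hypothetical helper: assumes equal group sizes and a known common SD.
    """
    std_normal = NormalDist()                      # mean 0, SD 1
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_beta = std_normal.inv_cdf(power)             # quantile matching the desired power
    n = 2 * ((z_alpha + z_beta) * sigma / d) ** 2
    return math.ceil(n)                            # round up to whole observations

# Shrinking the practically important difference d drives n up as 1/d^2.
for d in (0.5, 0.25, 0.1):
    print(d, n_per_group(d, sigma=1.0))
```

With these assumptions, detecting a half-SD difference needs only about 63 observations per group, which is consistent with the "hundreds, not millions" intuition above.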

Regardless, I completed my statistical studies too, too many years ago, so I am more interested in current thinking (my second reason for responding).

Art, CEO, AnalystFinder.com


As Art said, the answer depends on many things, and it's impossible to give a definitive answer. However, with 30 million records, there's a good chance you are in that situation.

Just as a point of reference, people who run national political surveys in the United States poll about 1500 people to get the precision they need.
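As a rough check on why about 1500 respondents is enough, here is a sketch of the usual 95% margin of error for a proportion under simple random sampling, z * sqrt(p(1 - p) / n), using the worst case p = 0.5. It ignores the design effects from weighting and clustering that real polls and complex surveys (like the one in the question) have, so treat it as a lower bound on the actual uncertainty. The function name `margin_of_error` is illustrative.

```python
import math
from statistics import NormalDist

def margin_of_error(n: int, p: float = 0.5, conf: float = 0.95) -> float:
    """Half-width of a normal-approximation CI for a proportion (simple random sample)."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)   # about 1.96 for 95%
    return z * math.sqrt(p * (1 - p) / n)

for n in (1_500, 30_000_000):
    print(f"n={n:>12,}: +/- {100 * margin_of_error(n):.3f} percentage points")
```

Under these assumptions, 1500 respondents give roughly a 2.5-point margin, while 30 million would give a margin of about two hundredths of a point.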

You probably want to come up with some estimate of practical significance if possible (for example, a difference of 5 units is of no importance to anyone). If your confidence intervals are an order of magnitude narrower than that practically significant difference, then you have reached the point where your sample is too large.
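This rule of thumb can be checked directly. The sketch below computes the 95% confidence-interval half-width for a difference of two group means and compares it with the smallest practically important difference. The 5-unit threshold comes from the example above; the outcome standard deviation of 10 and the group sizes are assumptions chosen for illustration, not figures from the question, and survey weights and design effects are ignored.

```python
import math
from statistics import NormalDist

def ci_half_width(sigma: float, n_per_group: int, conf: float = 0.95) -> float:
    """Half-width of a normal-approximation CI for a difference of two means."""
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    return z * sigma * math.sqrt(2 / n_per_group)

def sample_is_larger_than_needed(half_width: float, practical_diff: float) -> bool:
    """Rule of thumb: the CI is an order of magnitude narrower than the practical difference."""
    return half_width < practical_diff / 10

practical_diff = 5.0   # differences below 5 units matter to no one (example from the reply)
sigma = 10.0           # assumed outcome standard deviation
for n in (100, 10_000, 15_000_000):
    hw = ci_half_width(sigma, n)
    print(n, round(hw, 4), sample_is_larger_than_needed(hw, practical_diff))
```

Even after splitting into the roughly 20 subsets mentioned in the question (about 750,000 per group if each subset were divided evenly into two groups), the half-width under these assumptions is only about 0.03 units.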

--

Paige Miller



Having lots of data is a blessing.

I can think of only a few reasons for subsampling your data AFTER it has been collected:

1) To reduce computing cost

2) To assess model performance (data is divided into training and test subsets)

3) To avoid pseudoreplication by avoiding units that might be correlated (in time, space, or otherwise)

4) To evaluate the small sample performance of an estimator
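Reasons 1 and 2 are the most mechanical. Here is a minimal sketch, using only Python's standard library, of drawing a reproducible simple random subsample to cut computing cost and then splitting it into training and test sets. The record IDs, subsample size, 70/30 split, and helper name are placeholders. Simple random sampling of records like this ignores survey weights and does not address reason 3, because correlated units (for example, repeated records from the same patient) can still land in both sets.

```python
import random

def subsample_and_split(ids, sample_size, test_fraction=0.3, seed=2017):
    """Draw a reproducible simple random subsample, then split it into (train, test)."""
    rng = random.Random(seed)                  # seeded so the split can be reproduced
    sample = rng.sample(ids, sample_size)      # reason 1: smaller data, cheaper to fit
    n_test = int(round(test_fraction * sample_size))
    return sample[n_test:], sample[:n_test]    # reason 2: sample is already shuffled

record_ids = list(range(1_000_000))            # hypothetical record identifiers
train, test = subsample_and_split(record_ids, sample_size=10_000)
print(len(train), len(test))
```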

Otherwise, having more data simply shrinks your confidence intervals to the point where zero is not included.
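To see this numerically, the sketch below fixes a tiny true difference between two group means (0.01 standard deviations, an assumed value chosen to be practically negligible) and shows the expected 95% confidence interval at several sample sizes, using the normal approximation and centring the interval on the true difference (that is, ignoring sampling noise in the estimate). It also solves for the per-group n at which the interval stops including zero, n > 2(z * sigma / d)^2. The function names are illustrative.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96

def expected_ci(diff: float, sigma: float, n_per_group: int) -> tuple[float, float]:
    """95% CI for a difference of two means, centred on the assumed true difference."""
    half = Z95 * sigma * math.sqrt(2 / n_per_group)
    return diff - half, diff + half

def n_where_zero_drops_out(diff: float, sigma: float) -> int:
    """Smallest per-group n for which the expected CI no longer covers zero."""
    return math.floor(2 * (Z95 * sigma / diff) ** 2) + 1

diff, sigma = 0.01, 1.0  # tiny, assumed effect size in SD units
for n in (1_000, 100_000, 10_000_000):
    lo, hi = expected_ci(diff, sigma, n)
    print(f"n={n:>10,}  CI=({lo:+.4f}, {hi:+.4f})  excludes 0: {lo > 0}")
print(n_where_zero_drops_out(diff, sigma))
```

Under these assumptions, a 0.01-SD difference becomes "significant" once each group has roughly 77,000 observations, far fewer than the 30 million in the question, which is why the practical-significance check matters more than the p-value here.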

PG

