I'm looking at case counts for various diseases from 2001-2013. I can plot the data (cases by year) but I'm having a hard time figuring out how to measure the increase or decrease (or no change) over that time span. I considered using regression (proc reg) but I'm not sure year meets the continuous assumption. I'm also not sure I can use proc reg for what is basically summary data. This seems like a pretty standard thing to do but my google search keywords aren't giving me what I need. Maybe I just don't know the right language. Any help is appreciated. Below is an example of the data for one of the diseases:
Year | COUNT |
2001 | 1,489 |
2002 | 1,612 |
2003 | 2,039 |
2004 | 2,537 |
2005 | 2,837 |
2006 | 3,031 |
2007 | 2,943 |
2008 | 2,384 |
2009 | 2,394 |
2010 | 4,430 |
2011 | 5,214 |
2012 | 4,142 |
2013 | 3,298 |
You have no covariates?
This is time series data, so you can google time series analysis, but a basic plot and PROC REG are good places to start. A basic test is whether the slope differs from zero.
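To make the "is the slope different from zero" test concrete, here is a minimal sketch of the arithmetic behind it, written in plain Python rather than SAS so it can be run anywhere. It uses the 13 yearly counts posted above. The function name `fit_line` and the variable names are illustrative assumptions, and the critical value 2.201 is the two-sided 5% point of the t distribution with 11 degrees of freedom (13 observations minus 2 estimated parameters).

```python
import math

# Yearly case counts copied from the example table in the question.
years = list(range(2001, 2014))
counts = [1489, 1612, 2039, 2537, 2837, 3031, 2943,
          2384, 2394, 4430, 5214, 4142, 3298]

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns the slope, the intercept, and the standard error of the slope.
    (Hypothetical helper written for this illustration.)
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual sum of squares, then the usual n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)
    return slope, intercept, se_slope

slope, intercept, se_slope = fit_line(years, counts)
t_stat = slope / se_slope

# Two-sided 5% critical value of Student's t with 11 degrees of freedom.
T_CRIT_11_DF = 2.201

print(f"slope = {slope:.1f} cases per year, t = {t_stat:.2f}")
print("slope differs from zero at the 5% level:", abs(t_stat) > T_CRIT_11_DF)
```

For this disease the slope works out to roughly 222 additional cases per year. The interpretation is the same as in the age and SBP classroom example: for every one-year increase, the fitted count rises by the slope. This sketch ignores autocorrelation, so treat it as a first screen only.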
Go with Reeza's suggestion of PROC REG as a first approximation. Be sure to look at the Durbin-Watson statistic to check for autocorrelation.
Just eyeballing the data, there appears to be both a trend and a cycle (period of approximately 7 years). To get at that, you will probably have to use some of the time series procedures in SAS/ETS.
Steve Denham.
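For reference, the Durbin-Watson statistic mentioned above is the sum of squared differences between successive residuals divided by the sum of squared residuals. Below is a minimal sketch in plain Python (not SAS) using the counts from the question; the helper name `durbin_watson` is an assumption made for this illustration. Values near 2 suggest little first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation.

```python
# Yearly case counts copied from the example table in the question.
years = list(range(2001, 2014))
counts = [1489, 1612, 2039, 2537, 2837, 3031, 2943,
          2384, 2394, 4430, 5214, 4142, 3298]

def ols_residuals(x, y):
    """Residuals from an ordinary least squares line fitted to (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

def durbin_watson(residuals):
    """Durbin-Watson statistic; always falls between 0 and 4.

    (Hypothetical helper written for this illustration.)
    """
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

residuals = ols_residuals(years, counts)
dw = durbin_watson(residuals)
print(f"Durbin-Watson statistic = {dw:.2f}")
```

With only 13 observations per disease the statistic is a rough guide at best, which is one more reason to treat the regression as a screening step before a proper time series analysis.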
Thanks Reeza and Steve. I'm just doing some preliminary analyses on 60+ diseases. From the results of the preliminary analyses we'll pick some to look at more closely (i.e. time series, possibly multivariate). From your responses it sounds like I can use proc reg and look at the slope as a basic test.
I'm getting a warning "WARNING: The range of variable year is so small relative to its mean that there may be loss of accuracy in the computations." I was worried that the format of the data (summary data), one row for each year, turns year into a categorical variable of sorts. When I look at an example of proc reg work I've done in school, we used a dataset with cases listing age and SBP, and the interpretation of the results (e.g. For every year increase in age we see an increase in SBP of 0.73) is more obvious to me than it is for my current project.
Do you have any recommendations on time-series resources? I've read a little but it seems complex enough that I might need a class to gain enough understanding to use comfortably.
Thanks again.
The warning arises because year has a range of only 12 around a mean of about 2007. To get around it, pick a baseline year (2000, for example) and subtract it from all the year values in a data step. Then use the elapsed time since baseline (1, 2, 3, ... for these data) as the right-hand-side variable. The slope and its test do not change; only the intercept does, and it becomes the fitted count at the baseline year.
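A quick way to convince yourself that shifting the year is harmless is to fit the line both ways and compare. The sketch below is plain Python (not SAS) and uses the counts from the question; `BASELINE` and `fit_line` are names chosen for this illustration, with 2000 as the baseline suggested above.

```python
# Yearly case counts copied from the example table in the question.
years = list(range(2001, 2014))
counts = [1489, 1612, 2039, 2537, 2837, 3031, 2943,
          2384, 2394, 4430, 5214, 4142, 3298]

BASELINE = 2000  # assumed baseline year, as suggested in the reply above
elapsed = [year - BASELINE for year in years]  # 1, 2, ..., 13

def fit_line(x, y):
    """Return (slope, intercept) of the ordinary least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope_raw, intercept_raw = fit_line(years, counts)
slope_elapsed, intercept_elapsed = fit_line(elapsed, counts)

# The slope is unchanged by the shift; only the intercept moves.
print(f"slope using calendar year:  {slope_raw:.3f}")
print(f"slope using elapsed years:  {slope_elapsed:.3f}")
# With calendar year the intercept is an extrapolation back to year 0;
# with elapsed years it is the fitted count at the baseline year.
print(f"intercept using calendar year: {intercept_raw:.1f}")
print(f"intercept using elapsed years: {intercept_elapsed:.1f}")
```

The shifted version also gives an intercept that can be read directly, since it refers to the year 2000 rather than to year 0.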
As far as time series, a course would be useful. Check out the offerings by SAS under Forecasting and Econometrics on their training pages. It is a field where hands on learning is essential in the early stages. The later, more theoretical, parts are not as data driven, but do require thinking differently than you ordinarily would think about designed experiments or surveys.
Steve Denham
Thanks Steve. Much appreciated.