If you have been to one of my courses where I touch on forecasting, you have heard my rant on this before. Please! Do not use ordinary least squares (OLS) regression to forecast into the future! This article will explain why not.
Why Can’t I Use Ordinary Regression?
Sadly, in my checkered past, I have seen innocent, well-meaning folks use time as the independent variable in ordinary least squares regression to accomplish forecasting. Do not do this! One good reason is assumption violations.
Ordinary regression models have a number of assumptions.
Data with a time component violate these assumptions, causing the following issues:
(Information above largely from Forecasting Using SAS Software: A Programming Approach by Dickey & Woodfield)
Recall from my earlier article on goal-seeking and scenario analysis that homoscedasticity (also called homogeneity of variance) means that the variance of the errors is constant. Heteroscedasticity is the opposite: the variance of the errors changes.
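To make the distinction concrete, here is a minimal Python sketch using simulated errors only. The seed, the sample size, and the names `constant_errors`, `growing_errors`, and `late_to_early_variance_ratio` are all assumptions made up for this illustration. It compares the variance in the second half of each series with the variance in the first half.

```python
import random
from statistics import pvariance

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n = 400                  # hypothetical number of time periods

# Homoscedastic: every error is drawn with the same spread.
constant_errors = [rng.gauss(0, 1.0) for _ in range(n)]

# Heteroscedastic: the spread grows as time goes on.
growing_errors = [rng.gauss(0, 0.01 * (t + 1)) for t in range(n)]

def late_to_early_variance_ratio(errors):
    """Variance of the second half divided by variance of the first half."""
    half = len(errors) // 2
    return pvariance(errors[half:]) / pvariance(errors[:half])

constant_ratio = late_to_early_variance_ratio(constant_errors)
growing_ratio = late_to_early_variance_ratio(growing_errors)

print(f"constant spread: late/early variance ratio = {constant_ratio:.2f}")
print(f"growing spread:  late/early variance ratio = {growing_ratio:.2f}")
```

A ratio near 1 is what constant variance looks like; a ratio far above 1 signals that the error variance is changing over time.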
Traditional regression methods are performed on data at a single point in time, or where time is not even considered. Cross-sectional data would be appropriate for this.
Cross-sectional data – a collection of observations for multiple individuals at a single point in time. The following table shows an example.
Aside: For perspective, what is 7,500 calories? Well, it could be eight 12-ounce steaks.
Or…four Valentine’s boxes of chocolates.
Let’s just say that it would be much easier to eat four boxes of chocolates in a day. Don’t ask me how I know this.
Recall that time series analysis requires a historical set of data that includes repeated measures.
Longitudinal and panel data – observations of an individual (or other entity) measured repeatedly over time. If more than one individual is measured over time, these are called panel data and may also be called cross-sectional time series data. These data may be transactional, that is, measured at various times by individuals. The following table provides an example.
Longitudinal (panel) data are useful for distinguishing cohort effects from aging effects. Let’s say an effective reading program was begun three years ago in a new public kindergarten. In a cross-sectional study comparing reading level by age, we may find 9-year-olds to be poorer readers than 8-year-olds. This can be a cohort effect (the 9-year-old cohort did not get the early reading training that the 8-year-old cohort got) as opposed to an aging effect. If the reading level of students is measured over time for the SAME individuals, as in a longitudinal study, the cohort effect is removed and the aging effect can be accurately measured. (Adapted from Diggle et al. 1994.)
Time series – an indexed set of data over equally spaced time periods. Note that this is an example for illustration purposes, and you would absolutely never conduct a forecast from only three periods!
Many time series analyses require that you create a time series from the transactional data. This is commonly done by taking the average, minimum, or maximum for given time periods. In the fictional example above, I have taken the average for each month.
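As a rough sketch of that aggregation step in Python (not the SAS workflow itself), the snippet below collapses irregular weigh-ins into one average per month. The dates, weights, and the names `weigh_ins` and `monthly_average` are hypothetical, invented only for this illustration.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical transactional weigh-ins (date, pounds); values are made up.
weigh_ins = [
    (date(2023, 1, 3), 150.0),
    (date(2023, 1, 17), 152.0),
    (date(2023, 2, 5), 151.0),
    (date(2023, 2, 20), 153.0),
    (date(2023, 2, 27), 152.0),
    (date(2023, 3, 11), 149.0),
]

def monthly_average(records):
    """Collapse irregular (date, value) records into one mean per calendar month."""
    buckets = defaultdict(list)
    for day, value in records:
        buckets[(day.year, day.month)].append(value)
    # Sort by (year, month) so the resulting series is in time order.
    return {period: mean(values) for period, values in sorted(buckets.items())}

series = monthly_average(weigh_ins)
for (year, month), avg in series.items():
    print(f"{year}-{month:02d}: {avg:.1f}")
```

Swapping `mean` for `min` or `max` gives the other two summaries mentioned above. Whichever summary you choose, the result has exactly one value per equally spaced period, which is what a time series requires.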
Our weight example illustrates the point that successive measures are not independent. How much I weigh this month is not independent of how much I weighed last month.
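One way to see this dependence in numbers is to fit an OLS line with time as the independent variable and then check whether the residuals are related to their own previous values. The Python sketch below does that on a made-up monthly weight series (an upward trend plus a slow wave). The data and all function names are assumptions for illustration. The Durbin-Watson statistic is a standard diagnostic for this check, and it is brought in here by me rather than taken from the article.

```python
import math

# Hypothetical monthly weights: a slow upward trend plus a slow wave.
months = list(range(36))
weights = [150 + 0.2 * t + 3 * math.sin(t / 3) for t in months]

def ols_fit(x, y):
    """Simple least-squares line y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def lag1_autocorrelation(values):
    """Correlation between each value and the one just before it."""
    n = len(values)
    m = sum(values) / n
    num = sum((values[i] - m) * (values[i - 1] - m) for i in range(1, n))
    den = sum((v - m) ** 2 for v in values)
    return num / den

def durbin_watson(residuals):
    """Near 2 suggests uncorrelated errors; near 0, positive autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

intercept, slope = ols_fit(months, weights)
residuals = [w - (intercept + slope * t) for t, w in zip(months, weights)]

r1 = lag1_autocorrelation(residuals)
dw = durbin_watson(residuals)
print(f"lag-1 autocorrelation of residuals: {r1:.2f}")
print(f"Durbin-Watson statistic: {dw:.2f}")
```

A lag-1 autocorrelation well above zero, together with a Durbin-Watson value well below 2, points to positively correlated errors, which is exactly the independence violation that makes OLS on time unsuitable for forecasting.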
Doing it the Right Way
If OLS regression is the wrong way, then what is the right way? You must:
I will discuss this further in future posts.
Proper forecasting methods are available in SAS forecasting tools, making it easy for you to use them. For more specifics on forecasting with SAS tools, visit the forecasting courses listed at the end of this article and my summary of this in Forecasting Concepts 4.
Sources and Additional Information
Unsolicited Advice for Valentine’s Day:
DON’T: Eat half of the chocolates in a heart-shaped box and then give your true love a half-empty box.
DON’T: Re-gift a box of chocolates that your ex gave you last year to your new love.
DON’T: Decide at 6 pm that you will go out to your favorite restaurant on Valentine’s Day.
DO: Make a reservation or plan to go out on a different night.
DON’T: Take your true love to McDonald’s on Valentine’s Day if you are over the age of 11.
DO: Write a love poem.
DON’T: Print it out in
DO: Print it out in
And finally, what you’ve always wondered…the correct answer to “Does this outfit make me look fat?” is “Of course not; you look amazing, honey!” followed by, “Try this chocolate truffle.”