
How Do I Perform Simple Linear Regression in SAS®? Q&A, Slides, and On-Demand Recording

Started 10-17-2022
Modified 10-19-2022

Watch this Ask the Expert session to learn the mechanics behind simple linear regression and how to perform it using both SAS®9 and SAS® Viya®. 

 

Watch the Webinar

 

You will learn:

  • How easy it is to perform a simple linear regression on the SAS platform.
  • How to interpret the results from a simple linear regression.
  • The basic mechanics behind the scenes of a simple linear regression.

 

The questions from the Q&A segment held at the end of the webinar are listed below and the slides from the webinar are attached.

 

Q&A

What would be a believably good R-squared value? 

It depends on the field. In areas like psychology, R-squared values from studies tend to be very low. If, on the other hand, you work for NASA building spaceships and you're fitting regressions for quality control, you won't be happy with an R-squared of about 0.65; you'll want a very high one. A good R-squared value is often industry dependent. As a data scientist, one thing you can do is ask other people in your industry what a good R-squared is for your type of data. Try to find examples of other people using similar data and see what a good R-squared value is for them. 
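To make the quantity concrete, here is a minimal sketch of how R-squared is computed for a simple linear regression. It is written in plain Python rather than SAS, purely to show the arithmetic; the function name `r_squared` and the five data points are made up for illustration.

```python
def r_squared(x, y):
    """Return R-squared for the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Error sum of squares: variation left over after fitting the line.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Total sum of squares: variation around the mean of y.
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - sse / sst


x = [1, 2, 3, 4, 5]  # made-up inputs
y = [2, 4, 5, 4, 5]  # made-up responses
r2 = r_squared(x, y)
print(round(r2, 4))  # 0.6
```

Here the line explains 60% of the variation in the response. Whether that counts as good still depends on the field, as described above.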

 

Can this analysis also be performed on SAS Viya? 

Yes, absolutely. SAS Viya is well suited to big data. It is built on in-memory analytics: it loads your data into memory and performs the analytics there, which makes it very fast. Just as there are multiple ways of performing linear regression in traditional versions of SAS, there are also multiple ways to do it in Viya. 

 

Can you go over the difference between the gray area and the dotted lines for the 95% confidence intervals? 

The gray area, which is much narrower than the dashed lines, shows the 95% confidence limits for the mean. It is narrower because you can be more certain about an average value. We would say, for example, I'm 95% confident that the average weight for students of this height is between these two values. The dashed lines are the 95% limits for an individual, usually called prediction limits. They are wider because an individual value varies around the mean, so there is more uncertainty to cover; the confidence level is still 95%. Here we would say, I'm 95% confident that an individual student's weight at this height is between these two values. 
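The difference in width can be sketched numerically. The example below is plain Python, not SAS output, and it assumes the standard textbook formulas for the two intervals. The function name, the made-up data, and the constant `T_CRIT_DF3` (the 97.5th percentile of a t distribution with 3 degrees of freedom, taken from a t table) are illustrative assumptions.

```python
import math


def interval_half_widths(x, y, x0, t_crit):
    """Return (fitted value, half-width for the mean, half-width for an individual) at x0."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    root_mse = math.sqrt(sse / (n - 2))
    # Uncertainty in the estimated mean response at x0.
    h0 = 1 / n + (x0 - mean_x) ** 2 / sxx
    mean_half = t_crit * root_mse * math.sqrt(h0)
    # An individual adds its own scatter around the mean: note the extra "1 +".
    pred_half = t_crit * root_mse * math.sqrt(1 + h0)
    return intercept + slope * x0, mean_half, pred_half


x = [1, 2, 3, 4, 5]  # made-up inputs
y = [2, 4, 5, 4, 5]  # made-up responses
T_CRIT_DF3 = 3.182   # t table value for 95% two-sided limits, n - 2 = 3 df
fit, mean_half, pred_half = interval_half_widths(x, y, 3, T_CRIT_DF3)
print(f"mean: {fit:.2f} +/- {mean_half:.2f}, individual: {fit:.2f} +/- {pred_half:.2f}")
```

The only difference between the two half-widths is the extra 1 under the square root, which is why the limits for an individual are always wider than the limits for the mean.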

 

What if your data is a population rather than a sample? Does that change anything? 

If your data is the entire population, then “statistical significance” becomes irrelevant. Remember, all statistical inference is based on using a sample to draw conclusions about a population. If you perform regression on the entire population instead of a sample, the fitted coefficients describe that population directly rather than estimate it, so interpret p-values and confidence intervals with care. 

 

What if you have more than one input, can you have multiple inputs or features? Will the equation look like y = b0 + b1x1 + b2x2 + b3x3 +...? Is this still considered "linear"? 

Yes, you can perform multiple linear regression by having multiple features or inputs. The model is still considered “linear” as long as it is linear in the parameters, that is, the mean of the target is a sum of coefficients multiplied by inputs. The inputs themselves can even be higher-order terms (like quadratic or cubic), because a squared or cubed input is simply another column that gets its own coefficient. 
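As a small illustration of why a higher-order term keeps the model linear, the sketch below (plain Python, made-up data, hypothetical helper `fit_line`) builds a response that is quadratic in x and then regresses it on the new input z = x squared. The fit is an ordinary straight line in z.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


x = [0, 1, 2, 3]                   # made-up inputs
y = [1 + 2 * xi ** 2 for xi in x]  # response is quadratic in x
z = [xi ** 2 for xi in x]          # squared input treated as its own column

b0, b1 = fit_line(z, y)
print(b0, b1)  # recovers y = 1 + 2 * z
```

The relationship between y and x is curved, but the model y = b0 + b1 * z is linear in the coefficients b0 and b1, which is all that linear regression requires.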

 

Why square rather than absolute value? 

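Squaring the errors (as opposed to taking absolute values) gives a smooth function that is differentiable everywhere. The absolute value is continuous but has a corner at zero, whereas with squares you can set the derivatives to zero and solve for the slope and intercept in closed form. 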

 

Why do I get this error message? ERROR: All observations have the same response. No statistics are computed. 

It sounds like there may be some sort of data quality issue. Is it possible that your target variable is either all missing or all the same value? Could you have a filter in effect that is affecting your sample? I would look at your target column values first. 

 

Could you also show us how to split validation & test sets? 

I would suggest using the Partition Data task in SAS Studio, although there are several ways of doing it in SAS. 
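For readers who want to see what a partition does conceptually, here is a minimal sketch in plain Python. It is not the code the SAS Studio task generates; the function name `partition`, the 60/20/20 split, and the seed are all assumptions chosen for illustration.

```python
import random


def partition(rows, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle rows and split them into training, validation, and test lists."""
    shuffled = list(rows)
    # A fixed seed makes the split reproducible from run to run.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains
    return train, valid, test


rows = list(range(100))  # stand-in for 100 observations
train, valid, test = partition(rows)
print(len(train), len(valid), len(test))  # 60 20 20
```

Every observation lands in exactly one of the three sets, so the validation and test sets stay untouched while the model is fit on the training set.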

 

How does the standard error relate to parameter estimate? 

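There are two standard errors in the output. The standard error of the regression, also known as the standard error of the estimate (SAS labels it Root MSE), represents the typical distance between the observed values and the regression line. Another way to think about it is that it tells you how “wrong” your model is on average, so smaller values are desirable. The standard error of a parameter estimate, shown next to each estimate in the Parameter Estimates table, measures how much that estimate would vary from sample to sample. Dividing the estimate by its standard error gives the t-value. 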
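The quantities involved can be sketched in a few lines. The example below is plain Python with made-up data and a hypothetical function name, assuming the standard simple linear regression formulas. It computes the standard error of the regression, the standard error of the slope, and the t-value that connects the slope estimate to its standard error.

```python
import math


def slope_inference(x, y):
    """Return (slope, root MSE, standard error of the slope, t-value)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    # Standard error of the regression: typical size of a residual.
    root_mse = math.sqrt(sse / (n - 2))
    # Standard error of the slope: shrinks as the inputs spread out (larger sxx).
    se_slope = root_mse / math.sqrt(sxx)
    t_value = slope / se_slope
    return slope, root_mse, se_slope, t_value


x = [1, 2, 3, 4, 5]  # made-up inputs
y = [2, 4, 5, 4, 5]  # made-up responses
slope, root_mse, se_slope, t_value = slope_inference(x, y)
print(round(slope, 4), round(root_mse, 4), round(se_slope, 4), round(t_value, 4))
```

The standard error of the slope is the standard error of the regression scaled by how spread out the inputs are, so a noisier fit or a narrower range of x both make the slope estimate less precise.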

 

If you use the task list to select on linear regression, can you modify that template code (add options, etc.)? It doesn't look like it, at least in my version of SAS Studio. 

Yes, you can save the code produced by a task and make modifications to the code. Once you edit that code, it is no longer attached to the original task. It becomes an independent program. 

 

One of the assumptions is independent errors. How do we determine this? 

You can check this assumption by examining residual plots. Plot the residuals against the predicted values: there should be no patterns, only random scatter. If your observations were collected in sequence (over time, for example), also plot the residuals against that order, since that is where dependence between errors tends to show up. 

 

Do you have plans to show other types of model analysis such as Logistic regression or SVM? 

Today we only covered simple linear regression. Perhaps we can schedule a future “expert” seminar on Logistic Regression or Support Vector Machines. 

 

Will a follow-up webinar be scheduled showing real world data or other kinds of regression models? 

It is possible. 

 

What is a good t-value? 

A good rule of thumb is that, with an adequate sample size, a t-value is significant when its absolute value is greater than 2, which corresponds roughly to the 0.05 significance level. 

 

Do you have suggestions for good books or resources for Statistical Analysis/Concepts? 

https://support.sas.com/edu/schedules.html?crs=STAT0&ctry=US 

https://support.sas.com/edu/schedules.html?crs=STAT1&ctry=US 

https://video.sas.com/category/videos/basic-statistics 

 

In one of the tables, why is DF 1, which is the same for both intercept and indep variable? 

For a simple linear regression with only one input, the Model degrees of freedom will always be 1. It is defined as the number of parameters (2, the slope and the y-intercept) minus 1. 

In the Parameter Estimates table both the intercept and independent variable have 1 degree of freedom each by definition. 
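To show where the degrees of freedom sit in the analysis of variance, here is a minimal sketch in plain Python. The function name `anova_summary` and the data are made up, and the layout is a simplification of what SAS prints.

```python
def anova_summary(x, y):
    """Return degrees of freedom, sums of squares, and F for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_error = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_model = ss_total - ss_error
    df_model = 2 - 1   # two parameters (intercept, slope) minus 1
    df_error = n - 2   # observations minus the two estimated parameters
    df_total = n - 1   # model and error degrees of freedom add up to this
    f_value = (ss_model / df_model) / (ss_error / df_error)
    return {
        "df_model": df_model, "df_error": df_error, "df_total": df_total,
        "ss_model": ss_model, "ss_error": ss_error, "f_value": f_value,
    }


x = [1, 2, 3, 4, 5]  # made-up inputs
y = [2, 4, 5, 4, 5]  # made-up responses
table = anova_summary(x, y)
print(table["df_model"], table["df_error"], table["df_total"], round(table["f_value"], 4))
```

With five observations the split is 1 degree of freedom for the model, 3 for error, and 4 in total. Adding observations changes only the error and total rows.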

 

Are the step-by-step instructions available to perform SLR in SAS Viya? 

SAS Studio is available in SAS Viya, so you can perform a simple linear regression in SAS Viya the same way that I showed you today. 

 

Do you have advice on transforming a different relationship, say quadratic, to linear in order to run linear regression? 

The only caution I have is that when you add higher order terms into your model (like quadratic and cubic) you start to lose interpretability of the model. In other words, when trying to explain the model to your peers, will they understand the context in which it makes sense to square or cube your input? 

 

 

Recommended Resources

SAS Tutorial: Simple Linear Regression in SAS

Introduction to Statistical Concepts (Free E-Learning)

Statistics 1: Introduction to ANOVA, Regression, and Logistic Regression (Free E-Learning)

 

Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow up Q&A, slides and recordings from other SAS Ask the Expert webinars.  

Comments

At 21:21, how do you get the Beta1 equation of the sums of various combinations?

There are many references available for the derivation of (simple) linear regression. Here is a good one: Linear Regression Complete Derivation from Towards AI 
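For reference, the usual closed-form result of that derivation can be checked numerically. The sketch below is plain Python with made-up data and a hypothetical function name. It assumes the standard "sums" form of the estimates, which may be arranged differently from the slide shown at 21:21.

```python
def fit_from_sums(x, y):
    """Return (intercept, slope) computed from the raw sums of x, y, xy, and x squared."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: (n * sum(xy) - sum(x) * sum(y)) / (n * sum(x^2) - sum(x)^2)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: the fitted line passes through the point of means.
    intercept = (sum_y - slope * sum_x) / n
    return intercept, slope


x = [1, 2, 3, 4, 5]  # made-up inputs
y = [2, 4, 5, 4, 5]  # made-up responses
b0, b1 = fit_from_sums(x, y)
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```

These expressions come from setting the derivatives of the sum of squared errors with respect to the intercept and the slope to zero and solving the two resulting equations.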
