Is this your first time using statistical procedures in SAS software? Are you new to statistics in general? Has it been a while since your last statistics course? Do you need a review of the many statistical procedures found in SAS? If you answered yes to any of these questions, then this series is for you. In Part 1, we explored and described continuous variables using PROC SGPLOT, MEANS, UNIVARIATE, and CORR. In Part 2, we turn to modeling continuous variables, focusing on PROC REG, GLM, GLMSELECT, and PLM.
Assuming you have plotted your data with PROC SGPLOT and explored it with PROC UNIVARIATE and PROC CORR, we now move to the modeling stage of statistical analysis. For most of you, the goal will be to predict future observations. However, the procedures discussed here can support both explanation and prediction.
Let’s start with the procedure that many SAS users reach for first when modeling continuous data: PROC REG. REG stands for regression. The procedure requires a continuous response (target) variable. In its simplest form, your predictor (explanatory) variables will also be continuous, because PROC REG has no CLASS statement. Recall that the CLASS statement is how SAS knows to treat the listed variables as categorical in the model.
This does not mean that PROC REG cannot fit models with categorical predictors. It is up to you as the analyst to create the design (dummy) variables that the procedure will use. Essentially, you do the work of the CLASS statement yourself. (Sidenote: SAS can help you with this through the OUTDESIGN= option available in several other procedures, such as PROC GLMSELECT.)
Do not be surprised when an Analysis of Variance table appears in the PROC REG output. This confuses some analysts, but the mathematics behind linear regression and analysis of variance (ANOVA) are the same. We still use this table to determine whether the overall model is significant.
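As a minimal sketch of building a design variable by hand, the DATA step below codes League as a 0/1 indicator before calling PROC REG. The data set name work.baseball_dummy and the variable name league_national are hypothetical, chosen only for illustration, and the code assumes League takes the values 'American' and 'National' in sashelp.baseball.

```sas
/* Hypothetical names: work.baseball_dummy, league_national */
data work.baseball_dummy;
   set sashelp.baseball;
   /* 1 for National League players, 0 otherwise (American is the reference level) */
   league_national = (League = 'National');
run;

/* The indicator enters PROC REG like any other numeric predictor */
proc reg data=work.baseball_dummy;
   model salary = nRBI nRuns league_national;
run;
quit;
```

In a model with an intercept, a categorical predictor with k levels needs k-1 indicators of this kind; the omitted level serves as the reference.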
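/* Regress salary on two continuous predictors */
proc reg data=sashelp.baseball;
model salary = nRBI nRuns;
run;
quit; /* PROC REG is interactive; QUIT ends the procedure */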
When the model has multiple predictors, look at the individual p-values in the Parameter Estimates table to decide whether each variable stays in the model or is removed.
SAS also generates ODS graphics automatically, depending on which statements, options, and suboptions appear in the code. A useful habit is to add PLOTS=ALL to the PROC statement, which asks SAS to produce every plot it can for the code provided. PROC REG provides a fit diagnostics panel that includes leverage, residual plots, Cook’s D, RStudent, and more.
PROC GLM (general linear models) has a CLASS statement, which makes it easy to include categorical predictors. The output of PROC GLM is oriented toward comparing group means, that is, analysis of variance (ANOVA). The primary output is the ANOVA table, which shows the overall significance of the model (whether there is a group whose mean is significantly different from the rest).
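As a sketch of the PLOTS=ALL request described here, applied to the same two-predictor model used earlier:

```sas
ods graphics on;  /* make sure ODS Graphics is enabled for this session */

/* PLOTS=ALL goes in the PROC statement, not the MODEL statement */
proc reg data=sashelp.baseball plots=all;
   model salary = nRBI nRuns;
run;
quit;
```

If the full set is more than you need, PLOTS(ONLY)=DIAGNOSTICS should limit the graphics to the fit diagnostics panel.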
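proc glm data=sashelp.baseball;
class league; /* treat league as a categorical predictor */
model salary = nRBI nRuns league;
run;
quit; /* PROC GLM is interactive; QUIT ends the procedure */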
Rather than showing parameter estimates for the fitted model, the default output displays the Type I and Type III sums of squares. These tables show which predictors contribute significantly to the model; for a categorical predictor, that means at least one group has a significantly different average. In most cases, we use the Type III results to judge the significance of the predictor variables. Type I (sequential) sums of squares are used when the order in which variables enter the model matters, as with polynomial terms.
But what if I want a mathematical equation for my model, as in PROC REG? Can PROC GLM provide one? Certainly! Adding the SOLUTION option to the MODEL statement produces the parameter estimates table for the fitted model.
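If only the Type III table is of interest, the SS3 option in the MODEL statement restricts the sums-of-squares output to that type. A minimal sketch, reusing the model from the PROC GLM example above:

```sas
proc glm data=sashelp.baseball;
   class league;
   /* SS3 prints only the Type III sums of squares; SS1 would do the same for Type I */
   model salary = nRBI nRuns league / ss3;
run;
quit;
```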
proc glm data=sashelp.baseball;
class league;
/* SOLUTION adds the parameter estimates table to the output */
model salary = nRBI nRuns league / solution;
run;
quit;
If you want the fit diagnostics panel mentioned earlier, note that PROC GLM does not produce it by default. You can request it with the PLOTS=DIAGNOSTICS option in the PROC GLM statement, or you can fit the model in PROC REG. Remember that if categorical variables are used as predictors, you will need to create the design variables before calling PROC REG.
Have you ever been given a data set containing your target variable and a collection of predictors but no guidance on which variables to include in your model? PROC GLMSELECT will be your friend in this situation.
PROC GLMSELECT combines aspects of PROC GLM with the variable selection methods of PROC REG, and more. Yes, you read that correctly! PROC REG lets an analyst use methods such as forward selection, backward elimination, and stepwise selection. However, for these methods PROC REG uses only the significance level to decide whether to add or remove a predictor. PROC GLMSELECT offers more criteria for this decision. You can choose information criteria such as AIC, BIC, and SBC, and adjusted R-square is also available, going beyond significance level alone.
proc glmselect data=sashelp.baseball;
class league;
/* SELECT= is a suboption of SELECTION=, so it goes inside the parentheses */
model salary = nRBI nRuns league / selection=forward(select=AIC);
run;
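For comparison, selection driven by significance level in PROC REG might look like the sketch below. The extra candidate predictors nHits and nHome and the 0.05 entry and stay thresholds are choices made here for illustration only.

```sas
proc reg data=sashelp.baseball;
   /* SLENTRY= and SLSTAY= set the significance levels to enter and to stay */
   model salary = nRBI nRuns nHits nHome
         / selection=stepwise slentry=0.05 slstay=0.05;
run;
quit;
```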
In most cases in PROC REG, the final model of the step process is the one that is used. In PROC GLMSELECT, the step process can produce a sequence of candidate models, and the CHOOSE= option then picks the selected model from that sequence using a different criterion.
PROC GLMSELECT essentially brings aspects of PROC GLM and PROC REG together in one very powerful and useful procedure.
Let’s end this discussion with the PLM procedure, which performs post-fitting analyses for linear models. PROC PLM does not fit your model; it is useful after the model has already been created. Within several modeling procedures, you can include a STORE statement. In this statement, you provide a location and a name for something called an item store. This is a binary file that contains a vast amount of information about your model and the data that generated it.
The item store does not contain the original data that you analyzed. It saves the sufficient statistics from that data, which lets PROC PLM generate post-model analyses without the original data. From the SCORE statement to EFFECTPLOT to LSMESTIMATE, PROC PLM provides a quick way to answer follow-up questions that someone may ask.
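A sketch of how SELECT= and CHOOSE= can work together: here each step adds or removes a predictor based on significance level, and the final model is the step with the best SBC. The candidate predictors and the particular pairing of criteria are assumptions made for illustration.

```sas
proc glmselect data=sashelp.baseball;
   class league division;
   /* SELECT= drives each step; CHOOSE= picks the final model from the steps */
   model salary = nRBI nRuns nHits nHome nBB league division
         / selection=stepwise(select=sl choose=sbc);
run;
```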
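/* Fit the full model (no selection) and save it to an item store */
proc glmselect data=sashelp.baseball;
class league division;
model salary = nRBI nRuns league division league*division / selection=none showpvalues;
store out=baseballitem;
run;
/* SLICE needs the league*division interaction to be an effect in the model */
proc plm restore=baseballitem plots=all;
slice league*division / sliceby=league;
run;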
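Scoring from an item store follows the same pattern. The sketch below fits a model, stores it, and then uses the SCORE statement in PROC PLM to add predicted values to a data set. The names work.baseball_score_item, work.baseball_scored, and pred_salary are hypothetical, and the training data is rescored only to keep the example self-contained.

```sas
/* Fit once and save the model; the item store name is hypothetical */
proc glmselect data=sashelp.baseball;
   class league division;
   model salary = nRBI nRuns league division / selection=none;
   store out=work.baseball_score_item;
run;

/* Score a data set from the item store without refitting the model */
proc plm restore=work.baseball_score_item;
   score data=sashelp.baseball out=work.baseball_scored predicted=pred_salary;
run;
```

In practice, the DATA= data set would hold new observations with the same predictor variables as the fitted model.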
You may have noticed that all the procedures mentioned above come from the SAS 9 platform. If you use SAS Workbench, each of these procedures is available to you. If you use SAS Viya, there is no need to worry, because all SAS 9 procedures can run in SAS Viya on the Compute Server. But what if you want to use the power of Cloud Analytic Services (CAS)? Are there CAS-enabled versions of these statistical procedures? Yes, there are. Visit this link to find a list of SAS 9 procedures and their comparable CAS-enabled procedures.
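As one hedged illustration, PROC REGSELECT is a CAS-enabled procedure for linear regression with model selection. The sketch below assumes a SAS Viya environment where you can start a CAS session; the session name mysess and the libref mycas are hypothetical, and you should check the linked list for the procedure mapping that applies to your release.

```sas
/* Assumes SAS Viya with CAS available; mysess and mycas are hypothetical names */
cas mysess;
libname mycas cas;

/* Load the example data into CAS */
data mycas.baseball;
   set sashelp.baseball;
run;

/* Selection is requested in a SELECTION statement rather than a MODEL option */
proc regselect data=mycas.baseball;
   class league;
   model salary = nRBI nRuns league;
   selection method=forward(select=aic);
run;

cas mysess terminate;
```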
Regardless of your use of the SAS 9 PROCs or the CAS-enabled PROCs, in SAS Viya or SAS Workbench, you will have the tools you need to model your continuous variables and be prepared to proceed with scoring or post-analysis. Give some of these procedures a try and let me know which is your favorite. See you in the next installment of this series.
Find more articles from SAS Global Enablement and Learning here.