admendez03
Fluorite | Level 6

I know that when building a model with PROC REG or PROC LOGISTIC, the procedure either outputs the assumption checks automatically or lets you request them with options in the MODEL statement. However, with PROC SURVEYREG and PROC SURVEYLOGISTIC, it has been hard to test the following assumptions:

Linear: linearity, normality, homoscedasticity

Logistic: linearity between the independent variables and the logit

Both: multicollinearity, independence of errors, and influential outliers

I know how to obtain these diagnostics with the ordinary PROC REG/PROC LOGISTIC, but since I am accounting for the complex survey design, how does one test these assumptions? It has been extremely difficult to find this information.

For example, in:

proc reg data=x;
   model y = z / vif collin tol;
run;

I can use this kind of specification to obtain the VIF, tolerance, and so on.

Thank you!

SteveDenham
Jade | Level 19

Linear: Homogeneity of variance can be tested with PROC GLM through the HOVTEST option of the MEANS statement (available for one-way models only). Keep in mind Box's comment: "To make preliminary tests on variances is rather like putting to sea in a row boat to find out whether conditions are sufficiently calm for an ocean liner to leave port."
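A minimal sketch of that HOVTEST check, assuming a hypothetical data set MYDATA with response Y and a single classification variable GROUP. This is the ordinary, model-based PROC GLM, so it does not account for the survey design.

```sas
/* Hypothetical names: data set MYDATA, response Y, one-way factor GROUP */
proc glm data=mydata;
   class group;
   model y = group;
   /* Levene's test of equal group variances; HOVTEST requires a one-way model */
   means group / hovtest=levene;
run;
quit;
```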

            Normality of residuals: visual plots from PROC GLM or PROC REG are best. PROC UNIVARIATE (NORMAL option) provides several tests for departure from normality, but all of them take normality as the null hypothesis, so failing to reject does not establish normality and the tests can only be considered approximate. PROC NPAR1WAY offers Kolmogorov-Smirnov testing through its EDF option.
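A sketch of the residual checks described above, with hypothetical names (MYDATA, Y, X1, X2): PROC REG saves the residuals, and PROC UNIVARIATE's NORMAL option prints its tests of normality alongside a normal Q-Q plot. As with the other checks in this reply, the survey design is ignored here.

```sas
ods graphics on;

/* Fit the model and save the raw residuals to data set RESIDS */
proc reg data=mydata;
   model y = x1 x2;
   output out=resids r=resid;
run;
quit;

/* NORMAL prints tests of normality; QQPLOT draws the visual check */
proc univariate data=resids normal;
   var resid;
   qqplot resid / normal(mu=est sigma=est);
run;
```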

            Linearity: check the F test for deviation from linearity (lack of fit) if you have replicate observations at the x values. If the fit is to a set of non-replicated values, try polynomial terms. Non-linearity in the residual plots from any of the PROCs is another clue.
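For the non-replicated case, a minimal sketch of the polynomial check, assuming a hypothetical response Y and predictor X in MYDATA. PROC GLM accepts the X*X term directly in the MODEL statement; a significant quadratic term is evidence of curvature.

```sas
/* Hypothetical names. The Type III test for X*X assesses curvature in X */
proc glm data=mydata;
   model y = x x*x / solution;
run;
quit;
```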

 

Logistic: Linearity between the IVs and the logit-transformed response: again, I would look for tests of deviation from linearity, or graphical presentations.
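One specific test of deviation from linearity in the logit, not named in the reply, is the Box-Tidwell check: add the term x*log(x) and see whether it is significant. This is a sketch under stated assumptions, using PROC SURVEYLOGISTIC so the design is reflected in the standard errors; the data set MYDATA, the 0/1 response Y, the positive continuous predictor X, and the design variables STRATUM, PSU, and WT are all hypothetical.

```sas
/* Box-Tidwell term for predictor X (hypothetical names).
   X*LOG(X) is defined only for X > 0; other rows get a missing value
   and are excluded from the fit below. */
data bt;
   set mydata;
   if x > 0 then x_logx = x * log(x);
run;

/* Design-based fit: a significant X_LOGX coefficient suggests the
   logit is not linear in X */
proc surveylogistic data=bt;
   strata stratum;
   cluster psu;
   weight wt;
   model y(event='1') = x x_logx;
run;
```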

 

Both: Multicollinearity does not depend on the response variable, so PROC REG is a good tool.

         Independence of errors - residual plots

         Influential outliers - leverage plots

The latter two are harder to come by, but PROC REG and PROC LOGISTIC have plots that fall in this category.
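A sketch of those residual and influence plots, assuming hypothetical names (MYDATA, continuous response Y, 0/1 response YBIN, predictors X1 and X2). These are the ordinary procedures, so the displays describe the fitted model rather than a design-based analysis.

```sas
ods graphics on;

/* Linear model: diagnostics panel, studentized residuals by leverage, Cook's D */
proc reg data=mydata plots(only)=(diagnostics rstudentbyleverage cooksd);
   model y = x1 x2;
   output out=infl rstudent=rstud h=lev cookd=cooksd;
run;
quit;

/* Logistic model: INFLUENCE prints regression diagnostics;
   PLOTS requests the influence and leverage displays */
proc logistic data=mydata plots(only)=(influence leverage);
   model ybin(event='1') = x1 x2 / influence;
run;
```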

 

SteveDenham

Zard
SAS Employee

The PROC GLM (and PROC REG) approach to inference is model-based: you rely on the appropriateness of the model when evaluating your tests. That reliance covers the model components, the variance specification, and the distributional assumptions of the model. GLM assumes independent random sampling from a normal distribution, constant residual variance, and an infinite population. The concept of iid errors is inherent to ordinary least squares regression.

The SURVEYREG approach to inference is design-based. Expectations and variances are derived with respect to the probability sampling mechanism of the design. The design underlying the SURVEYREG code is probability-based random sampling from a finite population. There is no assumption that the errors are normal or iid.
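As an illustration of this point, a sketch of a design-based fit, assuming hypothetical design variables STRATUM, PSU (cluster), and WT (sampling weight). Because the inference does not rest on normal or iid errors, a residual plot from the OUTPUT statement serves mainly as a check on the mean structure of the model (for example, missed curvature) rather than on distributional assumptions.

```sas
/* Hypothetical design variables: STRATUM, PSU (cluster), WT (sampling weight) */
proc surveyreg data=mydata;
   strata stratum;
   cluster psu;
   weight wt;
   model y = x1 x2;
   /* Save predicted values and residuals for plotting */
   output out=svyout predicted=pred residual=resid;
run;

/* Residuals against predicted values; a pattern suggests a misspecified mean model */
proc sgplot data=svyout;
   scatter x=pred y=resid;
   refline 0 / axis=y;
run;
```

The scatter plot treats every sampled record equally, so it is a visual aid rather than a design-based test.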

 

SteveDenham
Jade | Level 19

@Zard, that is great information. Thank you for bringing it to the front of the discussion.

SteveDenham
