Re: Testing model assumptions using PROC SURVEYLOGISTIC and SURVEYREG

admendez03 · Posted 10-17-2021 04:05 PM

I know that when building models with PROC REG or PROC LOGISTIC, diagnostics for the model assumptions are either output automatically or available through options on the MODEL statement. With PROC SURVEYREG and PROC SURVEYLOGISTIC, however, it has been hard to test the following assumptions:

Linear: linearity, normality, homoscedasticity

logistic: linearity between independent variables and logit

both: multicollinearity, independence of errors, and influential outliers

I know how to check these assumptions with ordinary PROC REG/PROC LOGISTIC, but since I am taking the complex survey design into account, how does one test them? It has been extremely difficult to find this information.

Like in:

proc reg data=x;

model y=z / vif collin tol;

run;

I can use this kind of specification in order to test for VIF, tolerance etc.

Thank you!

SteveDenham · Posted 10-18-2021 01:03 PM

Linear: Homogeneity of variance can be tested with PROC GLM, through the HOVTEST option on the MEANS statement. Keep in mind Box's comment: "To make preliminary tests on variances is rather like putting to sea in a row boat to find out whether conditions are sufficiently calm for an ocean liner to leave port."
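A minimal sketch of the HOVTEST approach, assuming a one-way layout (HOVTEST is only available for one-way models). The data set x and response y come from the question above; the classification variable group is a hypothetical name. Note that this is a model-based check and does not use the survey design.

```sas
/* Sketch only: "group" is a hypothetical classification variable. */
proc glm data=x;
  class group;
  model y = group;                 /* HOVTEST requires a one-way model   */
  means group / hovtest=levene;    /* Levene's test for equal variances  */
run;
quit;
```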

Normality of residuals - Visual plots from PROC GLM or REG are best. PROC UNIVARIATE does provide various tests for departure from normality, but all of them take normality as the null hypothesis, so a non-significant result does not establish normality and the tests can only be considered approximate. PROC NPAR1WAY has Kolmogorov-Smirnov goodness of fit testing.
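One way to get at the residuals is sketched below, assuming the simple model y = z from the question. The output data set name resid and the variable names res and pred are made up for illustration.

```sas
/* Sketch only: resid, res and pred are hypothetical names. */
proc reg data=x;
  model y = z;
  output out=resid r=res p=pred;   /* save residuals and predicted values */
run;
quit;

proc univariate data=resid normal; /* NORMAL requests the normality tests */
  var res;
  qqplot res / normal(mu=est sigma=est);  /* Q-Q plot against a fitted normal */
run;
```

The Q-Q plot is usually more informative than the test p-values, especially with large survey samples where tiny departures become significant.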

Linearity - check the F test for deviation from linearity, if you have replication at points. If it is a fit to a set of non-replicated values, try polynomials. Non-linearity in residual plots from any of the PROCs is another clue.

Logistic: Linearity between IVs and logit transformed response - Again I would look for tests for deviation from linearity, or graphic presentations.
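As a rough illustration of a test for deviation from linearity in the logit, one option is to add a quadratic term and look at its test. This is a sketch, not the only way to do it. The design variables stratum, psu and wt are hypothetical names, and y is assumed to be coded 0/1.

```sas
/* Sketch only: stratum, psu and wt are hypothetical design variables; */
/* y is assumed to be coded 0/1 with 1 as the event of interest.       */
proc surveylogistic data=x;
  strata stratum;
  cluster psu;
  weight wt;
  model y(event='1') = z z*z;      /* z*z adds a quadratic term in z */
run;
```

A clearly significant z*z term would suggest the logit is not linear in z.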

Both: Multicollinearity does not depend on the response variable, so PROC REG is a good tool.
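A sketch of that idea, assuming three hypothetical predictors z1, z2 and z3 and a hypothetical sampling weight wt. Whether to include the WEIGHT statement is a judgment call; it is shown here only as one possibility.

```sas
/* Sketch only: z1-z3 and wt are hypothetical variable names. */
proc reg data=x;
  weight wt;                       /* optional: use the sampling weight */
  model y = z1 z2 z3 / vif tol collin;
run;
quit;
```

Because the diagnostics depend only on the predictors, the same step can be used ahead of a logistic model by putting any numeric variable on the left side of the MODEL statement.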

Independence of errors - residual plots

Influential outliers - leverage plots

The latter two are harder to come by, but PROC REG and PROC LOGISTIC have plots that fall in this category.

SteveDenham

Zard · Posted 10-18-2021 04:03 PM

The PROC GLM (and PROC REG) approach to inference is model-based. You rely on the appropriateness of the model when evaluating your tests. That reliance includes the model components, the variance specification, and the distributional assumptions of the model. GLM assumes independent random sampling from a normal distribution, constant residual variance, and an infinite population. The concept of iid errors is inherent to ordinary least squares regression.

The SURVEYREG approach to inference is design-based. Expectations and variances are derived with respect to the probability sampling mechanism of the design. The design underlying the SURVEYREG code is probability-based random sampling from a finite population. There is no assumption that the errors are normal or iid.
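For reference, a design-based fit might look like the sketch below. The names stratum, psu and wt are hypothetical stand-ins for whatever design variables the survey provides, and svy_out, pred and res are made-up output names. The residual plot is only a descriptive look at the fit, not a test of a model assumption.

```sas
/* Sketch only: stratum, psu, wt, svy_out, pred and res are hypothetical names. */
proc surveyreg data=x;
  strata stratum;                  /* stratification variable       */
  cluster psu;                     /* primary sampling unit         */
  weight wt;                       /* sampling weight               */
  model y = z / solution;
  output out=svy_out predicted=pred residual=res;
run;

proc sgplot data=svy_out;
  scatter x=pred y=res;            /* residuals against fitted values */
  refline 0 / axis=y;
run;
```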

SteveDenham · Posted 10-19-2021 10:55 AM

@Zard that is great information. Thank you for bringing it to the front of the discussion.

SteveDenham
