RyanD
Fluorite | Level 6
I've been reading about multivariate analysis and PROC LOGISTIC. There are some online descriptions of multivariate analysis, but few describe how to do it in SAS. I need something that takes me step by step through the output so I can determine what adjustments to make (i.e., when to exclude a given independent variable).

From what I've read and been told, my interpretation is that if the p-value of any independent variable is above .25, I should exclude the variable with the highest p-value, refit, and repeat until all p-values are below .25. Is that a standard and accepted approach?

Any help is greatly appreciated.

Thanks.
Paige
Quartz | Level 8
Hard to answer any of this without a more detailed description of what your predictors are and what your dependent variables are, and what you hope to learn from this analysis.

Also, based on my understanding of the word "multivariate", PROC LOGISTIC does not do multivariate analyses. To me, multivariate means multiple response variables, analyzed with respect to their joint (correlated) distributions. Maybe you are using this word to mean something other than what I think it means?
RyanD
Fluorite | Level 6
I'm probably using the word multivariate incorrectly.

This is the code I wrote to test the relationship between some binary (1=Yes, 2=No) independent variables and the dependent variable BreastFeeding (also binary).

proc logistic data=nbscrBirthVars;
  class NoCollege (ref="1") cesarean (ref="1") PreTerm (ref="1")
        LBW (ref="1") NICU (ref="1") TenStep (ref="1") / param=ref;
  model BreastFeeding (event="2") = NoCollege cesarean PreTerm LBW NICU TenStep;
run;


The output is below. So, my understanding is that I would remove Macrosomia from the model because the Pr > Chisq in the Type 3 analysis is greater than 0.25 (0.6956). Is that the standard way of determining what to remove?

Thanks.



The LOGISTIC Procedure

Model Information

Data Set                      WORK.NBSCRBIRTHVARS
Response Variable             FormulaSupp
Number of Response Levels     2
Model                         binary logit
Optimization Technique        Fisher's scoring


Number of Observations Read      106701
Number of Observations Used       99826


Response Profile

Ordered     Formula         Total
  Value     Supp        Frequency

      1     1               18503
      2     2               81323

Probability modeled is FormulaSupp=2.

NOTE: 6875 observations were deleted due to missing values for the response or explanatory variables.


Class Level Information

                         Design
Class          Value     Variables

NoCollege      1         0
               2         1

cesarean       1         0
               2         1

PreTerm        1         0
               2         1

LBW            1         0
               2         1

NICU           1         0
               2         1

Macrosomia     1         0
               2         1

TenStep        1         0
               2         1

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.


Model Fit Statistics

                               Intercept
               Intercept          and
Criterion        Only          Covariates

AIC            95717.853       93430.154
SC             95727.364       93506.243
-2 Log L       95715.853       93414.154


Testing Global Null Hypothesis: BETA=0

Test                 Chi-Square     DF     Pr > ChiSq

Likelihood Ratio      2301.6993      7         <.0001
Score                 2338.5007      7         <.0001
Wald                  2265.3540      7         <.0001


Type 3 Analysis of Effects

                          Wald
Effect         DF     Chi-Square     Pr > ChiSq

NoCollege       1       462.7169         <.0001
cesarean        1        47.0002         <.0001
PreTerm         1        13.8791         0.0002
LBW             1         3.6452         0.0562
NICU            1       229.8353         <.0001
Macrosomia      1         0.1531         0.6956
TenStep         1      1166.5014         <.0001


Analysis of Maximum Likelihood Estimates

                                       Standard         Wald
Parameter          DF     Estimate        Error   Chi-Square     Pr > ChiSq

Intercept           1       0.7874       0.0964      66.7686         <.0001
NoCollege    2      1       0.3688       0.0171     462.7169         <.0001
cesarean     2      1       0.1185       0.0173      47.0002         <.0001
PreTerm      2      1       0.1106       0.0297      13.8791         0.0002
LBW          2      1       0.0706       0.0370       3.6452         0.0562
NICU         2      1       0.5175       0.0341     229.8353         <.0001
Macrosomia   2      1       0.0348       0.0890       0.1531         0.6956
TenStep      2      1      -0.5757       0.0169    1166.5014         <.0001

Odds Ratio Estimates

                          Point          95% Wald
Effect                 Estimate      Confidence Limits

NoCollege  2 vs 1         1.446       1.398       1.495
cesarean   2 vs 1         1.126       1.088       1.165
PreTerm    2 vs 1         1.117       1.054       1.184
LBW        2 vs 1         1.073       0.998       1.154
NICU       2 vs 1         1.678       1.569       1.794
Macrosomia 2 vs 1         1.035       0.870       1.233
TenStep    2 vs 1         0.562       0.544       0.581


Association of Predicted Probabilities and Observed Responses

Percent Concordant          56.6     Somers' D     0.232
Percent Discordant          33.3     Gamma         0.259
Percent Tied                10.1     Tau-a         0.070
Pairs                 1504719469     c             0.616
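
For reading the tables above: with param=ref coding of two-level variables, each odds ratio is the exponential of the corresponding estimate, and the Wald limits are the estimate plus or minus a normal quantile times the standard error, exponentiated. Here is a minimal sketch that recomputes the NoCollege row and the Macrosomia p-value by hand, using only numbers printed in the listing. The data set name or_check and the variable names are made up for illustration.

```sas
/* Recompute one odds ratio row and one Wald p-value from the listing.
   Data set and variable names are illustrative, not from the original post. */
data or_check;
  est = 0.3688;                        /* Estimate, NoCollege 2 vs 1        */
  se  = 0.0171;                        /* Standard Error, NoCollege         */
  z   = probit(0.975);                 /* normal quantile, about 1.96       */
  odds_ratio = exp(est);               /* point estimate of the odds ratio  */
  lower      = exp(est - z*se);        /* lower 95% Wald limit              */
  upper      = exp(est + z*se);        /* upper 95% Wald limit              */
  p_macro    = 1 - probchi(0.1531, 1); /* Wald chi-square, 1 DF, Macrosomia */
  put odds_ratio= 8.3 lower= 8.3 upper= 8.3 p_macro= 8.4;
run;
```

The values written to the log should land close to the 1.446 (1.398, 1.495) and 0.6956 reported in the listing, apart from rounding in the printed estimate and standard error.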
Paige
Quartz | Level 8
While I am not familiar with the advice to use 0.25 as your cutoff, I would use 0.05 as the cutoff. In any event, it seems reasonable to remove Macrosomia from the model.
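
A sketch of that refit, under some assumptions: the response is taken to be FormulaSupp, as the posted output shows, and the original call is assumed to have included Macrosomia in both the CLASS and MODEL statements. The posted code names BreastFeeding and leaves Macrosomia out, so the exact original call is a guess.

```sas
/* Reduced model: same as the fit in the listing, minus Macrosomia.
   Response name FormulaSupp is assumed from the posted output. */
proc logistic data=nbscrBirthVars;
  class NoCollege (ref="1") cesarean (ref="1") PreTerm (ref="1")
        LBW (ref="1") NICU (ref="1") TenStep (ref="1") / param=ref;
  model FormulaSupp (event="2") = NoCollege cesarean PreTerm LBW NICU TenStep;
run;
```

Comparing AIC and SC between this fit and the full one, and checking that the remaining estimates do not move much, is a reasonable sanity check before settling on the reduced model.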
lvm
Rhodochrosite | Level 12
There are many stepwise variable-selection options in PROC LOGISTIC. Check out the documentation for the MODEL statement. But note: one should be cautious with all of these methods. Use them as an exploratory guide, not as a final model-selection method. Model selection (i.e., variable selection in a model) is a complex endeavor.
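
As one example of those options, here is a minimal sketch of backward elimination using the variables from the posted output. The response name FormulaSupp and the inclusion of Macrosomia are assumptions read off the listing, and the 0.05 stay level is only the cutoff suggested earlier in this thread, not a recommendation.

```sas
/* Backward elimination: start from the full model and drop the effect
   with the largest Wald p-value until every remaining effect has
   p <= SLSTAY. Exploratory only; variable names assumed from the listing. */
proc logistic data=nbscrBirthVars;
  class NoCollege (ref="1") cesarean (ref="1") PreTerm (ref="1")
        LBW (ref="1") NICU (ref="1") Macrosomia (ref="1")
        TenStep (ref="1") / param=ref;
  model FormulaSupp (event="2") = NoCollege cesarean PreTerm LBW NICU
                                  Macrosomia TenStep
        / selection=backward slstay=0.05 details;
run;
```

Changing slstay=0.05 to slstay=0.25 gives the looser rule asked about in the original question. The DETAILS option prints what happens at each step, so it is easy to see which effect was removed and why.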
