How to do multivariate analysis in SAS (proc logistic)

Posted 05-26-2011 08:38 AM
(2815 views)

I've been reading about multivariate analysis and proc logistic, and although there are some online descriptions of multivariate analysis, few describe how to do it in SAS. I need something that takes me step by step through the output to determine what adjustments I need to make (i.e., when to exclude a given independent variable).

From what I've read and been told, my interpretation is that if the p-value of any independent variable is above .25, I should exclude the variable with the highest p-value, and repeat until all p-values are below .25. Is that a standard and accepted approach?

Any help is greatly appreciated.

Thanks.

4 REPLIES

It is hard to answer any of this without a more detailed description of what your predictors are, what your dependent variables are, and what you hope to learn from this analysis.

Also, based on my understanding of the word "multivariate", PROC LOGISTIC does not do multivariate analyses. To me, multivariate means multiple response variables, analyzed with respect to their joint (correlated) distributions. Maybe you are using this word to mean something other than what I think it means?

Message was edited by: Paige

I'm probably using the word multivariate incorrectly.

This is the code I wrote to test the relationship between some binary (1=Yes, 2=No) independent variables and the dependent variable BreastFeeding (binary as well).

    proc logistic data=nbscrBirthVars;
      class NoCollege (ref="1") cesarean (ref="1") PreTerm (ref="1") LBW (ref="1") NICU (ref="1") TenStep (ref="1") / param=ref;
      model BreastFeeding (event="2") = NoCollege cesarean PreTerm LBW NICU TenStep;
    run;

The output is below. My understanding is that I would remove Macrosomia from the model because its Pr > ChiSq in the Type 3 analysis (0.6956) is greater than 0.25. Is that the standard way of determining what to remove?

Thanks.

    The LOGISTIC Procedure

    Model Information

    Data Set                      WORK.NBSCRBIRTHVARS
    Response Variable             FormulaSupp
    Number of Response Levels     2
    Model                         binary logit
    Optimization Technique        Fisher's scoring

    Number of Observations Read   106701
    Number of Observations Used   99826

    Response Profile

    Ordered Value   FormulaSupp   Total Frequency
    1               1             18503
    2               2             81323

    Probability modeled is FormulaSupp=2.

    NOTE: 6875 observations were deleted due to missing values for the response or explanatory variables.

    Class Level Information

    Class        Value   Design Variables
    NoCollege    1       0
                 2       1
    cesarean     1       0
                 2       1
    PreTerm      1       0
                 2       1
    LBW          1       0
                 2       1
    NICU         1       0
                 2       1
    Macrosomia   1       0
                 2       1
    TenStep      1       0
                 2       1

    Model Convergence Status

    Convergence criterion (GCONV=1E-8) satisfied.

    Model Fit Statistics

    Criterion   Intercept Only   Intercept and Covariates
    AIC         95717.853        93430.154
    SC          95727.364        93506.243
    -2 Log L    95715.853        93414.154

    Testing Global Null Hypothesis: BETA=0

    Test               Chi-Square   DF   Pr > ChiSq
    Likelihood Ratio   2301.6993    7    <.0001
    Score              2338.5007    7    <.0001
    Wald               2265.3540    7    <.0001

    Type 3 Analysis of Effects

    Effect        DF  Wald Chi-Square  Pr > ChiSq
    NoCollege     1   462.7169         <.0001
    cesarean      1   47.0002          <.0001
    PreTerm       1   13.8791          0.0002
    LBW           1   3.6452           0.0562
    NICU          1   229.8353         <.0001
    Macrosomia    1   0.1531           0.6956
    TenStep       1   1166.5014        <.0001

    Analysis of Maximum Likelihood Estimates

    Parameter       DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
    Intercept       1   0.7874    0.0964          66.7686          <.0001
    NoCollege   2   1   0.3688    0.0171          462.7169         <.0001
    cesarean    2   1   0.1185    0.0173          47.0002          <.0001
    PreTerm     2   1   0.1106    0.0297          13.8791          0.0002
    LBW         2   1   0.0706    0.0370          3.6452           0.0562
    NICU        2   1   0.5175    0.0341          229.8353         <.0001
    Macrosomia  2   1   0.0348    0.0890          0.1531           0.6956
    TenStep     2   1   -0.5757   0.0169          1166.5014        <.0001

    Odds Ratio Estimates

    Effect                Point Estimate  95% Wald Confidence Limits
    NoCollege 2 vs 1      1.446           1.398    1.495
    cesarean 2 vs 1       1.126           1.088    1.165
    PreTerm 2 vs 1        1.117           1.054    1.184
    LBW 2 vs 1            1.073           0.998    1.154
    NICU 2 vs 1           1.678           1.569    1.794
    Macrosomia 2 vs 1     1.035           0.870    1.233
    TenStep 2 vs 1        0.562           0.544    0.581

    Association of Predicted Probabilities and Observed Responses

    Percent Concordant   56.6          Somers' D   0.232
    Percent Discordant   33.3          Gamma       0.259
    Percent Tied         10.1          Tau-a       0.070
    Pairs                1504719469    c           0.616
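As a rough check on how the columns in the output above relate to one another, the DATA step sketch below recomputes the Macrosomia odds ratio, its 95% Wald limits, and the Wald chi-square from the printed estimate and standard error. The variable names are illustrative only, and because the inputs are the rounded values from the listing, the results should agree with the listing only approximately.

```sas
/* Sketch: relate Estimate and Standard Error to the odds ratio and Wald test. */
/* Inputs are the rounded Macrosomia values copied from the listing above.     */
data _null_;
  est = 0.0348;                       /* log-odds estimate, level 2 vs 1        */
  se  = 0.0890;                       /* standard error of the estimate         */
  z   = quantile('NORMAL', 0.975);    /* about 1.96 for a 95% interval          */

  odds_ratio = exp(est);              /* point estimate of the odds ratio       */
  lower      = exp(est - z*se);       /* lower 95% Wald confidence limit        */
  upper      = exp(est + z*se);       /* upper 95% Wald confidence limit        */

  wald = (est/se)**2;                 /* Wald chi-square with 1 DF              */
  p    = 1 - probchi(wald, 1);        /* Pr > ChiSq                             */

  put odds_ratio= lower= upper= wald= p=;   /* written to the SAS log           */
run;
```

A confidence interval that contains 1, as it does for Macrosomia, corresponds to a Wald p-value above 0.05.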

While I am not familiar with the advice to use 0.25 as your cutoff, I would use 0.05 as the cutoff. In any event, it seems reasonable to remove Macrosomia from the model.
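A minimal sketch of refitting without Macrosomia follows. It assumes the model behind the posted output used FormulaSupp as the response and included Macrosomia as a predictor, which is what the listing shows (the code in the post names BreastFeeding and omits Macrosomia). The WHERE statement is an added assumption: it keeps the reduced model on the same observations as the full model so that fit statistics such as AIC remain comparable.

```sas
/* Full model, as implied by the posted listing (assumption, see above). */
proc logistic data=nbscrBirthVars;
  class NoCollege (ref="1") cesarean (ref="1") PreTerm (ref="1") LBW (ref="1")
        NICU (ref="1") Macrosomia (ref="1") TenStep (ref="1") / param=ref;
  model FormulaSupp (event="2") = NoCollege cesarean PreTerm LBW NICU
                                  Macrosomia TenStep;
run;

/* Reduced model: Macrosomia dropped from the CLASS and MODEL statements. */
proc logistic data=nbscrBirthVars;
  where not missing(Macrosomia);   /* keep the same rows the full model used */
  class NoCollege (ref="1") cesarean (ref="1") PreTerm (ref="1") LBW (ref="1")
        NICU (ref="1") TenStep (ref="1") / param=ref;
  model FormulaSupp (event="2") = NoCollege cesarean PreTerm LBW NICU TenStep;
run;
```

Comparing the "Intercept and Covariates" column of the Model Fit Statistics table between the two runs gives one way to judge whether dropping the variable costs anything in fit.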

There are many stepwise variable-selection options in proc logistic. Check out the documentation for the MODEL statement. But note: one should be cautious with all of these methods. Use them as an exploratory guide, not as a final model-selection method. Model selection (i.e., variable selection in a model) is a complex endeavor.
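For illustration only, here is a sketch of one such option: backward elimination through the MODEL statement's SELECTION= option, with SLSTAY= set to the 0.25 threshold from the original question. The response and predictor names are taken from the posted listing and are an assumption about the intended model; the threshold is the poster's, not a recommendation.

```sas
/* Sketch: backward elimination with a 0.25 significance level to stay. */
proc logistic data=nbscrBirthVars;
  class NoCollege (ref="1") cesarean (ref="1") PreTerm (ref="1") LBW (ref="1")
        NICU (ref="1") Macrosomia (ref="1") TenStep (ref="1") / param=ref;
  model FormulaSupp (event="2") = NoCollege cesarean PreTerm LBW NICU
                                  Macrosomia TenStep
        / selection=backward   /* start with all effects, remove one at a time */
          slstay=0.25          /* an effect stays only if its p-value <= 0.25  */
          details;             /* print the statistics behind each step        */
run;
```

With the posted estimates, a run like this would be expected to drop only Macrosomia, which matches the manual rule described in the question. The caution above still applies: treat the result as exploratory.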
