David_M
Fluorite | Level 6

Ok, thanks for the clarification. So will PLS also generate regression coefficients that will map my 35 variables to the Y response in its model?

PaigeMiller
Diamond | Level 26

@David_M wrote:

Ok, thanks for the clarification. So will PLS also generate regression coefficients that will map my 35 variables to the Y response in its model?


Yes

--
Paige Miller
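Paige's "yes" can be made concrete. PLS fits the model on a few latent components, but each component is a linear combination of the original predictors, so the fit maps back to one coefficient per original variable. The following is a minimal pure-Python sketch of single-response PLS (PLS1, NIPALS algorithm) that performs that mapping. It is an illustration under stated assumptions, not PROC PLS's implementation: it centers but does not scale the predictors, and the function name `pls1` and the example data are hypothetical, so PROC PLS output may differ.

```python
import math

def pls1(X, y, n_components):
    """Fit a single-response PLS model (PLS1) with NIPALS.

    X is a list of rows, y a list of responses. Columns are centered but NOT
    scaled (an assumption of this sketch). Returns (intercept, coefficients),
    where the coefficients apply to the ORIGINAL predictors:
    y_hat = intercept + sum_j coef[j] * x[j].
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_means[j] for j in range(p)] for row in X]  # deflated X
    f = [v - y_mean for v in y]                                   # deflated y
    R, P, Q = [], [], []  # weights on original X, x-loadings, y-loadings
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_a = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q_a = sum(f[i] * t[i] for i in range(n)) / tt
        # Express this score in terms of the original X:
        # r_a = w_a - sum over earlier components b of (p_b . w_a) * r_b
        r_a = w[:]
        for p_b, r_b in zip(P, R):
            c = sum(a * b for a, b in zip(p_b, w))
            r_a = [ra - c * rb for ra, rb in zip(r_a, r_b)]
        R.append(r_a)
        P.append(p_a)
        Q.append(q_a)
        # Deflate X and y by the part explained by this component
        E = [[E[i][j] - t[i] * p_a[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q_a * t[i] for i in range(n)]
    coef = [sum(q * r[j] for q, r in zip(Q, R)) for j in range(p)]
    intercept = y_mean - sum(c * m for c, m in zip(coef, x_means))
    return intercept, coef

# Hypothetical data: 3 predictors, the third nearly a copy of the first.
X = [[1.0, 2.0, 1.1], [2.0, 1.0, 1.9], [3.0, 5.0, 3.2],
     [4.0, 3.0, 3.9], [5.0, 6.0, 5.1], [6.0, 4.0, 6.0]]
y = [3.1, 1.2, 7.8, 5.9, 11.2, 8.1]
b0, b = pls1(X, y, n_components=2)
print(b0, b)
```

With as many components as there are linearly independent predictors, the coefficients coincide with ordinary least squares; with fewer components they are restricted to a lower-dimensional subspace, which is what helps PLS keep estimates stable under collinearity. Because each coefficient is expressed in its predictor's own units, comparing coefficient magnitudes is only meaningful once the predictors are on a common scale.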
David_M
Fluorite | Level 6

So ... is it safe to say those predictors with very small or minute coefficients do not contribute much to the model, hence they could be potential candidates for 'elimination'?

Season
Barite | Level 11

Yes. (It seems that Paige has not responded to your question. I am not sure why, but I will answer it anyway so that you don't have to wait too long.)

One more point I would like to add: PROC PLS is suitable only for building linear PLS models. In other words, the dependent variables must be continuous if you wish to apply PROC PLS; you cannot build PLS logistic regression models with it. Other software packages have to be used in that case.

David_M
Fluorite | Level 6

Thank you for this clarification, but now I'm a bit confused. If PLS only works with continuous response variables (SAS help is about as clear as mud on this matter), then are these my only options for reducing collinearity amongst mixed variables?

1. Use PROC REG with the CLASS statement containing my categorical variables to perform VIF or GVIF analysis. Then, as @Ksharp had suggested, eliminate the variables with a VIF > 5 (my collinearity threshold for now), and continue with PROC LOGISTIC for ordinal regression using the new, uncorrelated variables.

2. Use PROC PCA, with the risk of ending up with a reduced set of uninterpretable components.

3. What do you suggest is the best way to reduce multicollinearity amongst 35 confounders (categorical and continuous variables) to improve a logistic regression model of 1 ordinal outcome and 1 ordinal predictor?

BTW, all my data come from a 400-respondent survey, if that helps. I'm not sure SAS survey-specific functions help much here.
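The VIF screen in option 1 can be sketched outside SAS. Below is a minimal pure-Python illustration, assuming numeric predictors only (0/1 dummies are fine; multi-level categorical variables would need a generalized VIF instead) and ignoring survey weights and design. It uses the identity that VIF_j, i.e. 1/(1 - R_j^2) from regressing predictor j on all the others, equals the j-th diagonal element of the inverse correlation matrix. The function names, the threshold of 5 taken from the post, and the simulated variables age, tenure and shift are all hypothetical.

```python
import math
import random

def correlation_matrix(columns):
    """Pearson correlation matrix of equal-length numeric columns (none constant)."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix: perfect collinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vifs(columns):
    """VIF_j = j-th diagonal element of the inverse correlation matrix = 1 / (1 - R_j^2)."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

def drop_high_vif(data, threshold=5.0):
    """Drop the predictor with the largest VIF, recompute, repeat until all VIFs <= threshold.

    Assumption: numeric predictors only, no survey weights or design information.
    """
    names = list(data)
    while len(names) > 1:
        scores = vifs([data[name] for name in names])
        worst = max(range(len(names)), key=lambda j: scores[j])
        if scores[worst] <= threshold:
            break
        names.pop(worst)
    return names

# Hypothetical simulated example: tenure is almost a linear function of age.
random.seed(1)
age = [random.gauss(40, 10) for _ in range(400)]
tenure = [a - 22 + random.gauss(0, 1) for a in age]
shift = [random.gauss(0, 1) for _ in range(400)]
print(drop_high_vif({"age": age, "tenure": tenure, "shift": shift}))
```

The sketch removes one variable at a time and recomputes, because removing one predictor changes the VIFs of the rest. In the simulated example both age and tenure have VIFs near 100, so dropping every variable above 5 in a single pass would discard both, whereas removing either one alone brings all remaining VIFs close to 1.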

Season
Barite | Level 11

OK, so now that your problem concerns survey data analysis, things are much more complicated than I anticipated.

None of us knew that your data involved survey data analysis, and that fact calls all of our previous suggestions into question. Anyway, there are R packages designed for testing for collinearity in complex survey data analysis, such as CRAN: Package svydiags (see svycollinear: Condition indexes and variance decompositions in general... in svydiags: Regression Mo....). Alternatively, methods like Variable selection with LASSO regression for complex survey data - Iparragirre - 2023 - Stat - Wiley... could be used for dealing with collinearity.

David_M
Fluorite | Level 6

Thank you for your suggestions. Yes, this is a matched paired survey from a longitudinal study of 400 healthcare workers whose characteristics I'm trying to analyze at two time points only, pre- and post-Covid. The characteristics are Job Satisfaction (JS), Intention to Leave the workplace (ITL) and Respect/Civility at Work (RW), which are in the form of Likert scales (1-4 and 1-5). I want to develop 2 models by performing a standard logistic regression between JS and RW and another between ITL and RW at both the pre- and post-Covid time points, so 4 models in total but with similar analytical methods for all 4. The results, conclusions and implications for each model would be different.

As previously mentioned, I've identified 30+ mixed-type (categorical and continuous) confounders that I want to add to my models, but I need to eliminate or drastically reduce any correlation between them to prevent unstable regression estimates, inflated standard errors, etc. These confounders will be treated as standard predictors in my models. All 4 models will end up with this same set of de-correlated confounders.

Are there no SAS procedures or methods suitable for this task?

My apologies for not specifying at the outset that I was dealing with survey data. I didn't think it mattered whether my confounders were survey-based or not.

Season
Barite | Level 11

@David_M wrote:

My apologies for not specifying at the outset that I was dealing with survey data. I didn't think it mattered whether my confounders were survey-based or not.


Never mind. But please be aware from now on that complex survey data complicates nearly every aspect of the statistical analysis. Every statistical method you employ has to be one adapted to complex survey data, not a generic one. In fact, I am not aware of any detrimental effect that collinearity would have on the estimates of the regression coefficients in logistic regression, but I know that they are severely biased if you neglect the complex survey nature of your data. It would therefore be much better to point out at the outset that your data originated from a complex survey the next time you raise a question on complex survey data analysis.


@David_M wrote:

Thank you for your suggestions. Yes, this is a matched paired survey from a longitudinal study of 400 healthcare workers whose characteristics I'm trying to analyze at two time points only, pre- and post-Covid. The characteristics are Job Satisfaction (JS), Intention to Leave the workplace (ITL) and Respect/Civility at Work (RW), which are in the form of Likert scales (1-4 and 1-5). I want to develop 2 models by performing a standard logistic regression between JS and RW and another between ITL and RW at both the pre- and post-Covid time points, so 4 models in total but with similar analytical methods for all 4. The results, conclusions and implications for each model would be different.

As previously mentioned, I've identified 30+ mixed-type (categorical and continuous) confounders that I want to add to my models, but I need to eliminate or drastically reduce any correlation between them to prevent unstable regression estimates, inflated standard errors, etc. These confounders will be treated as standard predictors in my models. All 4 models will end up with this same set of de-correlated confounders.

Are there no SAS procedures or methods suitable for this task?


Unfortunately, to the best of my knowledge, the answer is "yes": there are none, unless you learn the formulae and write the code yourself in SAS.

By the way, complex survey data analysis is a field of statistics that receives relatively little attention, so very few researchers work on statistical diagnostics such as detecting and correcting collinearity. In fact, as Taylor H. Lewis, author of the book Complex Survey Data Analysis with SAS (Taylor & Francis), noted, the topic of statistical diagnostics for complex survey data models is "in its nascent stage". It is therefore quite likely that you will encounter a problem in this field that, to date, nobody has found a way to deal with.

David_M
Fluorite | Level 6

In general, can you explain to me why my simple survey is a complex survey for SAS? I only have two groups of people with only 3 observations.

Season
Barite | Level 11

@David_M wrote:

In general, can you explain to me why my simple survey is a complex survey for SAS? I only have two groups of people with only 3 observations.


First of all, whether a survey is complex has little to do with how many observations a group has; whether a survey can be called "complex" is determined by its sampling scheme. Surveys whose sampling schemes involve multistage sampling, stratification, or clustering are called complex surveys. By the way, didn't you say in your last post that your dataset contained 400 observations? Why is it 3*2=6 now?

Second, whether a survey is complex matters little in itself, because statistical methods for complex surveys are equally suitable for surveys of any complexity. Therefore, what matters when deciding which statistical methods to use is whether your data are survey data, not whether your survey is complex.

David_M
Fluorite | Level 6

Yes, the whole survey has 400 observations but I'm only working with 3 main observations per time period plus confounders.

Ksharp
Super User
Nope. PROC LOGISTIC is totally unlike PROC PLS.
You need to delete the variables by hand.
Season
Barite | Level 11

Take a look at this discussion, a question I raised in this forum about two years ago: Solved: How can I perform principal component analysis for logistic regression... - SAS Support Comm.... A reference I find extremely helpful is SAS Usage Note 32471: Testing assumptions in logit, probit, Poisson and other generalized linear models. Special thanks to @StatDave for pointing me to this note.

By the way, I agree with @PaigeMiller's remarks on collinearity: you cannot eliminate collinearity; you can only reduce its impact on the results.

