Tommy1201
Calcite | Level 5

Hello Everyone,

I would be very grateful if someone could help me with conducting a (PLS) regression analysis in a 3x2x2 factorial design (for my Ph.D.).

I understand the use of regression methods when they are applied to simple designs (i.e., one "true" categorical variable with few levels and several continuous predictors),

but my situation (picture attached) is giving me a big headache. The question is: what is the best way to develop a model to predict (Y7, Y8, Y9) from (Y1, Y2, Y3, Y4, Y5, Y6) that covers all combinations of the categorical variables (X1, X2, X3), as is done in the attached paper?


And finally, what is the SAS command to do this?


Thank you.


data.jpg
SteveDenham
Jade | Level 19

The relationship between the Y variables can be easily modeled with PROC PLS.  Example 70.1 in the documentation is a good starting point.
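A minimal sketch of such a PROC PLS call, predicting Y7 through Y9 from Y1 through Y6, might look like the following. The data set name mydata is an assumption (the thread never names one), and leave-one-out cross validation is only an illustrative choice for picking the number of factors.

```sas
/* Hypothetical data set name: mydata (not given in the thread). */
proc pls data=mydata method=pls cv=one;
   /* Responses Y7-Y9 modeled from predictors Y1-Y6;            */
   /* the design variables X1-X3 are deliberately left out.     */
   /* SOLUTION prints the final regression coefficients.        */
   model Y7 Y8 Y9 = Y1 Y2 Y3 Y4 Y5 Y6 / solution;
run;
```

CV=ONE requests leave-one-out cross validation, which tends to suit small data sets; other CV= settings may fit better depending on how the runs were replicated.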

Think of the X variables as defining a response surface on which Y1 through Y9 are measured.  You have prior knowledge that Y1 through Y6 will help predict Y7 through Y9.  The X variables are codes for Y1 through Y6, and shouldn't enter into the analysis (just my opinion).

However, have you looked at response surface regression of the Y variables on the design (X) variables, especially of Y1 through Y6?  If there is an extremely strong relationship of Y1 through Y6 on the design, you may be better off looking at a multivariate response of Y7, Y8, Y9  on the design variables.

Steve Denham

Tommy1201
Calcite | Level 5

Thank you, Steve, for the generous help.

Response surface regression is often used in my area of science, especially in research involving, for example, the effect of pressure and temperature on certain properties of substances. In my case, the X2 variable is purely categorical (1 = yes, 2 = no), while X1 and X3 are continuous in nature.


If I'm not mistaken, I can use this method to predict Y7-Y9 from Y1-Y6, but only through X1 and X3. What do you think: would it be better to use PLS regression, but for each level of X1, X2, X3 separately?


Thanks.

SteveDenham
Jade | Level 19

I would not do it separately.

I am a bit confused, though, about the X variables.  You say X1 and X3 are continuous.  However, you only use fixed values, so those values are categorical/ordinal, else you would not have a 3x2x2 factorial.

PROC PLS works best when both the LHS and RHS variables are continuous, and would be appropriate for exploring the relationships between the Y variables.  To look at the X - Y relationship, my approach would be a multivariate ANOVA, something like:

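/* Multivariate ANOVA of Y1 through Y6 on the full 3x2x2 factorial */
proc glm;                           /* no DATA= given, so the most recently created data set is used */
   class X1 X2 X3;                  /* treat all three design variables as categorical */
   model Y1--Y6 = X1|X2|X3;         /* main effects plus all interactions */
   manova h=_all_ / printe printh;  /* multivariate tests for every effect, printing the E and H matrices */
run;
quit;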

Does this make sense?

Steve Denham


PaigeMiller
Diamond | Level 26

Some things don't add up here.

If you have a 3x2x2 factorial design, how can you have 6 different independent variables (Y1-Y6)?

If you have any factorial design, there really is no reason to use PLS, because ANOVA/ANCOVA and/or regression with multiple dependent variables is a superior method (although if you use enough dimensions in PLS, the predicted values will be equivalent).

--
Paige Miller
SteveDenham
Jade | Level 19

I think the design has twelve factors (3x2x2) described by the X variables.  The experiment was run and Y1 through Y9 were measured.  The OP is now looking to predict Y7 through Y9 from the Y1 through Y6 values.  That does fit PLS.

It would really help to have definitions of all of these variables, and a reason to believe that Y7 through Y9 are functionally dependent on the other Y variables.

Steve Denham

PaigeMiller
Diamond | Level 26

A "3x2x2 Factorial Design", according to the original post, is 12 runs, not 12 factors. An orthogonal experiment (if that's what it is, and it certainly sounds like one to me) would not be a candidate for PLS.

--
Paige Miller
SteveDenham
Jade | Level 19

I agree if the interest were in predicting Y7 through Y9 from the design variables.  But I don't think that is what was asked.  Consider the design variables as a mechanism to produce independent variables Y1 through Y6.  That is what the LWT paper did, in essence.

Steve Denham

PaigeMiller
Diamond | Level 26

In the cited paper, Y1 through Y3 are the original variables from a 3x2x2 factorial, Y4 through Y6 are two-way interactions of the original variables. All of this is handled easily in PROC GLM as a 3-factor experiment with main effects and two-way interactions, and multiple response variables.
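A minimal sketch of that kind of PROC GLM model, with main effects and two-way interactions only and several responses, could look like this. The data set name mydata and the choice of Y7 through Y9 as the responses are assumptions made for illustration; neither is specified in this post.

```sas
/* Hypothetical data set name: mydata; responses Y7-Y9 are an assumed choice. */
proc glm data=mydata;
   class X1 X2 X3;                 /* three design factors (3x2x2)                     */
   model Y7 Y8 Y9 = X1|X2|X3@2;    /* @2 keeps main effects and two-way terms only     */
   manova h=_all_;                 /* multivariate tests across the three responses    */
run;
quit;
```

Dropping the three-way interaction with @2 leaves degrees of freedom for error even when each of the 12 cells has a single run.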

I fail to see any benefit of using PLS here. PLS would create new predictor variables which are linear combinations of Y1 through Y6 (i.e. linear combinations of main effects and two-way interactions). How would you interpret those, compared to interpreting an ordinary least squares model containing main effects and two-way interactions? In my opinion, the cited paper is a great example of an author choosing a popular technique (PLS) that is completely inappropriate and getting an article published, even though the technique adds more confusion to the analysis than would ordinary least squares regression.

--
Paige Miller
SteveDenham
Jade | Level 19

Looking at this study in analogy to the LWT paper, I see:

Design variables (here the X variables, in the LWT paper 3 factors defined in section 2.1)

A set of dependent variables (here Y1 through Y6, in the LWT paper the TBARS variables)

A second set of dependent variables (here Y7 through Y9, in the LWT paper the volatile compound concentrations)

The relationship between the dependent variables and the design variables has one type of analysis (see my recommendation for PROC GLM, in the LWT paper, a MANOVA).

And finally, an examination of the relationship between the first set of dependent variables and the second set.  In the LWT paper, PLS is used (Unscrambler 8.0.5), and here I propose PROC PLS, with the caveat that the first set of variables somehow be a functional progenitor of the second set.

I find this an interesting approach, as the relationship between these variables may be of greater interest than the relationship of either to the design variables.  The design is used to set up the system to obtain values that are nearly impossible to design into a biological system.

Steve Denham

Tommy1201
Calcite | Level 5

Thank you both for your help and constructive criticism.

Instead of MANOVA in the first part, I used a three-way ANOVA because of the small sample size per cell and the inability to test the homogeneity of variance-covariance matrices.

(I was thinking before about using factorial MANOVA on principal components, but parallel analysis tells me (for the physico-chemical properties) that I can extract only one component.

When I use the Kaiser-Guttman "eigenvalues greater than one" rule instead, PCA extracts two components, but under non-orthogonal rotation (direct oblimin and promax) they are only very weakly correlated, which is less than MANOVA requires.)

As for examining the relationship between the sets of dependent variables, I do not know; it might not be a bad idea to use structural equation modelling techniques and a mediation model instead of PLS.


Model(s).jpg
Tommy1201
Calcite | Level 5

Once again, I need your professional advice.

Might a structural equation modelling approach be a better solution for examining the relationship between the sets of dependent variables (see the picture attached to my previous post) in a 3x2x2 factorial design, given the previous comments about PLS regression?

Thanks.

SteveDenham
Jade | Level 19

Maybe.  What I know about structural equation modeling can be written on a grain of rice with a paint roller, so I am afraid I will have to bow out on this one.

Steve Denham

Tommy1201
Calcite | Level 5

Anyway, I very much appreciate your help, Steve. Thanks.

Tomislav
