11-30-2014 02:47 PM
In a design with one factor (3 levels) and multiple continuous variables, is it statistically correct to use unstandardized coefficients to obtain scores for the canonical discriminant functions, and then use the function that explains most of the variance as a predictor in a linear regression for further study?
11-30-2014 03:30 PM
Using the same observations to estimate factors and to assess the proportion of variance explained will systematically yield optimistic fit statistics. A better approach is to split your dataset randomly into two disjoint sets. You estimate the factors with the first set and assess their performance with the second set.
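A minimal sketch of that split-sample idea in Python. The data are synthetic, and the variable names, the 50/50 split, and the one-predictor linear fit are all assumptions made for illustration, not the poster's data or model. The line is estimated on one half and R² is then computed on both halves for comparison.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(xs, ys, slope, intercept):
    """Proportion of variance in ys explained by the given line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def split_indices(n, seed=0):
    """Randomly split range(n) into two disjoint halves."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    half = n // 2
    return idx[:half], idx[half:]

# Synthetic data (hypothetical): y depends linearly on x plus Gaussian noise.
data_rng = random.Random(42)
x = [data_rng.uniform(0.0, 10.0) for _ in range(60)]
y = [2.0 + 0.5 * xi + data_rng.gauss(0.0, 1.0) for xi in x]

train_idx, test_idx = split_indices(len(x), seed=1)
x_train = [x[i] for i in train_idx]
y_train = [y[i] for i in train_idx]
x_test = [x[i] for i in test_idx]
y_test = [y[i] for i in test_idx]

# Estimate on the first set only, then evaluate on both sets.
slope, intercept = fit_line(x_train, y_train)
r2_train = r_squared(x_train, y_train, slope, intercept)
r2_test = r_squared(x_test, y_test, slope, intercept)

print("R^2 on estimation set:", round(r2_train, 3))
print("R^2 on held-out set:  ", round(r2_test, 3))
```

The held-out R² is the more honest of the two figures, because the second set played no part in estimating the coefficients.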
11-30-2014 06:37 PM
Thank you for your professional assistance,
I will certainly run the analysis with two random, disjoint data sets, but there is another "problem" that is giving me a bad headache...
When I calculate scores for the discriminant functions via MANOVA, I get two statistically significant functions, but the 1st discriminant function (DF) explains 92% of the variance.
Before the regression analysis, I plotted the Y variable against the 1st DF alone and got what looks like an exponential relationship. I don't know how to set up an equation for that relationship (picture 1 in the attachment)
or how to interpret the regression results.
On the other hand, I tried to extract one component (from the 3 original variables) via PCA, and it captured 84% of their variance. (I don't know whether this is a correct way of using PCA to isolate only one component.)
Plotting the Y variable against the PCA component, I get a "quadratic" relationship (picture 2). This is much easier for me to work with.
How can I do everything statistically correctly without overcomplicating things, given the theoretical basis of the scientific field and the objective of the analysis?
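For the exponential-looking relationship mentioned above, one common way to set up the equation is y = a * exp(b * x), fitted by ordinary least squares on ln(y) versus x. The sketch below is only an illustration on synthetic, noise-free data; the function name `fit_exponential` is hypothetical, and the approach assumes every Y value is positive and that errors are roughly multiplicative. Whether an exponential form suits the actual data in picture 1 cannot be judged from here.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * exp(b * x) by least squares on ln(y) vs x.

    Requires every y to be strictly positive, because of the log transform.
    """
    if any(y <= 0 for y in ys):
        raise ValueError("log-linear fit needs all y > 0")
    log_ys = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(xs) / n
    ml = sum(log_ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (ly - ml) for x, ly in zip(xs, log_ys))
    b = sxy / sxx                  # slope on the log scale
    a = math.exp(ml - b * mx)      # back-transformed intercept
    return a, b

# Synthetic, noise-free example (hypothetical values): y = 1.5 * exp(0.4 * x)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.5 * math.exp(0.4 * x) for x in xs]

a, b = fit_exponential(xs, ys)
print("a =", round(a, 4), " b =", round(b, 4))
print("multiplier in y per unit increase in x:", round(math.exp(b), 4))
```

On interpretation: in this form, each one-unit increase in x multiplies the predicted y by exp(b), rather than adding a fixed amount as in a straight-line model.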
12-01-2014 10:48 AM
Data reduction finds components that are orthogonal to each other, which helps produce more stable regression coefficients. In the analysis above, it seems the 3 original variables are highly correlated, and the first PC alone explains 84% of the total variation. I would suggest plotting the Y variable against each of the 3 original variables to see whether you can find a linear relationship and build a regression model with only one of the original variables.
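As a numerical companion to those plots, the sketch below computes the Pearson correlation between Y and each candidate predictor and reports the strongest one. The three predictors `x1`, `x2`, `x3` and all values are made up for illustration. A low Pearson r does not rule out a curved relationship, so the plots are still worth looking at.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: three correlated predictors and one response.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "x2": [2.1, 1.9, 3.2, 3.8, 5.1, 5.9],
    "x3": [6.0, 4.5, 5.2, 3.1, 2.8, 1.0],
}
y = [1.1, 2.0, 2.9, 4.2, 5.0, 6.1]

correlations = {name: pearson_r(values, y) for name, values in predictors.items()}
for name, r in correlations.items():
    print(name, "r =", round(r, 3))

# Candidate single predictor: the one with the largest absolute correlation.
best = max(correlations, key=lambda name: abs(correlations[name]))
print("strongest linear association with Y:", best)
```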
12-01-2014 01:31 PM
Thank you for your help.
(I will try to describe the problem concretely.)
Yes, those 3 variables are highly correlated with each other (r = 0.70 to max. 0.85), because they describe changes in 3 different colour characteristics (L*, a*, b*) as a function of fish fillet salting time (2, 4 and 6 hours).
We know that in a system with a high salt concentration, muscle tissue releases water and "absorbs" salt, and the colour changes. Those 3 colour variables are also highly correlated with the water and salt content of the fish muscle during the process.
(In this context, I will not describe the other measured Y variables related to the study.)
Changes in water and salt content are the most important physicochemical characteristics that allow us to "monitor" the microbiological stability of these products, so that they will be safe for human consumption.
My goal for the statistical analysis is to predict water and salt content from the 3 colour variables or, to be more precise, to open up the possibility of industrial use. I first tried canonical correlation analysis, then PLS and Correlated Component Regression (in XLSTAT).
When I consulted the XLSTAT staff, they told me: "Unfortunately, the PLSR and CCR are not solutions for residual heteroscedasticity and/or autocorrelation of residuals", which are issues in my regression models.
After that, I tried to use a discriminant function (from MANOVA) that combines those 3 colour variables as the predictor. Or (I don't know) maybe a neural network model would solve the problem best, with maximum predictive capability?
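Whichever model is used in the end, the autocorrelation of residuals mentioned above can be checked with the Durbin-Watson statistic. A rough sketch follows; the residual values are invented, and it assumes the residuals are listed in a meaningful order (for example, measurement order), which is not stated in the thread. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. This statistic says nothing about heteroscedasticity, which needs a separate check.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals. Ranges from 0 to 4."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den

# Invented residual sequences, in (assumed) measurement order.
runs_of_same_sign = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]   # positive autocorrelation
alternating_sign = [1.0, -1.0, 1.0, -1.0]               # negative autocorrelation

print("runs of same sign:", round(durbin_watson(runs_of_same_sign), 3))
print("alternating sign: ", round(durbin_watson(alternating_sign), 3))
```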