Tommy1201
Calcite | Level 5

Hello everyone,

In a design with one factor (3 levels) and multiple continuous variables, is it statistically correct to use unstandardized coefficients to obtain scores for the canonical discriminant functions, and then use the function that explains most of the variance as a predictor in a linear regression for further study?

Thanks,

Tomislav

4 REPLIES
PGStats
Opal | Level 21

Using the same observations to estimate factors and to assess the proportion of variance explained will systematically yield optimistic fit statistics. A better approach is to split your dataset randomly into two disjoint sets. You estimate the factors with the first set and assess their performance with the second set.
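
The split-sample idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are synthetic, a single-predictor least-squares line stands in for the estimated discriminant function, and the helper names (`fit_line`, `r_squared`, `split_half`) are hypothetical. The point is that the fit statistic is computed on observations that were not used for estimation.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(x, y, a, b):
    """R^2 of the line y = a + b*x evaluated on the data (x, y)."""
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def split_half(n, seed=0):
    """Randomly split indices 0..n-1 into two disjoint halves."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    half = n // 2
    return idx[:half], idx[half:]

# Synthetic data (assumption): one predictor score and one response.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(60)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

train_idx, test_idx = split_half(len(x), seed=1)
x_tr, y_tr = [x[i] for i in train_idx], [y[i] for i in train_idx]
x_te, y_te = [x[i] for i in test_idx], [y[i] for i in test_idx]

a, b = fit_line(x_tr, y_tr)                # estimate on the first set only
r2_train = r_squared(x_tr, y_tr, a, b)     # optimistic: same data as the fit
r2_test = r_squared(x_te, y_te, a, b)      # assessed on the held-out set
print(f"train R^2 = {r2_train:.3f}, holdout R^2 = {r2_test:.3f}")
```

With a real dataset, the discriminant function coefficients would be estimated on the first set and then applied unchanged to the second set before computing any fit statistic.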

PG

Tommy1201
Calcite | Level 5

Thank you for your professional assistance,

I will certainly do the analysis with two random, disjoint data sets, but there is another "problem" that is giving me a bad headache.

When I calculate scores for the discriminant functions via MANOVA, I get two statistically significant functions, but the first discriminant function (DF) explains 92% of the variance.

Before the regression analysis, I plotted the Y variable against the first DF alone and got what looks like an exponential relationship (picture 1 in the attachment). I don't know how to set up an equation for that relationship, or how to interpret the regression results.
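
One common way to write down an exponential relationship is y = a * exp(b * x), which becomes linear after taking logs: ln(y) = ln(a) + b * x. The sketch below fits that form by ordinary least squares on ln(y). It assumes that all Y values are strictly positive and that the curve in picture 1 really is exponential, neither of which can be checked from the post; the data and the function name `fit_exponential` are made up for illustration.

```python
import math

def fit_exponential(x, y):
    """Fit y = a * exp(b * x) by least squares on ln(y); needs all y > 0."""
    if any(yi <= 0 for yi in y):
        raise ValueError("log-linear fit needs strictly positive y values")
    ly = [math.log(yi) for yi in y]
    n = len(x)
    mx, mly = sum(x) / n, sum(ly) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (li - mly) for xi, li in zip(x, ly))
    b = sxy / sxx
    a = math.exp(mly - b * mx)   # back-transform the intercept
    return a, b

# Synthetic, noise-free example (assumption): x plays the role of the DF score.
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [3.0 * math.exp(0.4 * xi) for xi in x]

a, b = fit_exponential(x, y)
factor = math.exp(b)   # multiplicative change in y per one-unit increase in x
print(f"a = {a:.3f}, b = {b:.3f}, y is multiplied by {factor:.3f} per unit of x")
```

Interpretation under this model: `a` is the predicted Y at x = 0, and each one-unit increase in x multiplies the predicted Y by exp(b). Fitting on the log scale also assumes multiplicative errors, so the residuals should be examined on the log scale.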

On the other hand, I tried to extract one component from the 3 original variables via PCA, and it captured 84% of the variance. (I don't know whether this is a correct way of using PCA to isolate only one component.)
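
For reference, the share of variance attributed to the first principal component can be reproduced by hand. The sketch below assumes PCA on the correlation matrix (standardized variables), in which case the total variance equals the number of variables and the share is the largest eigenvalue divided by 3. The data are synthetic and `first_pc_share` is a hypothetical helper; the largest eigenvalue is found by simple power iteration.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def first_pc_share(columns, iters=500):
    """Share of total variance on PC1, using the correlation matrix."""
    p = len(columns)
    R = [[pearson(ci, cj) for cj in columns] for ci in columns]
    # Asymmetric start vector so it is unlikely to be orthogonal to PC1.
    v = [1.0 / (2 ** i) for i in range(p)]
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    # Rayleigh quotient of the unit vector v approximates the top eigenvalue.
    lam = sum(v[i] * sum(R[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam / p   # trace of a correlation matrix is p

# Synthetic data (assumption): three variables driven by one latent factor.
rng = random.Random(1)
t = [rng.gauss(0, 1) for _ in range(90)]
cols = [
    [-ti + 0.5 * rng.gauss(0, 1) for ti in t],
    [ti + 0.5 * rng.gauss(0, 1) for ti in t],
    [ti + 0.5 * rng.gauss(0, 1) for ti in t],
]
share = first_pc_share(cols)
print(f"PC1 share of variance: {share:.2%}")
```

If the PCA was run on the covariance matrix instead, the share would be the largest eigenvalue divided by the sum of the variable variances, and the result would depend on the measurement scales.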

Plotting the Y variable against the first principal component, I get a "quadratic" relationship (picture 2). This is now much easier for me to work with.

How can I do everything statistically correctly without overcomplicating things, given the theoretical basis of the scientific field and the objective of the analysis?

Thanks,

Tomislav


Attachments: picture2.jpg, picture1.jpg
stat_sas
Ammonite | Level 13

Hi,

Data reduction finds components that are orthogonal to each other, which helps produce more stable coefficients. In the above analysis, it seems the 3 original variables are highly correlated, and the first PC alone explains 84% of the total variation. I would suggest plotting the Y variable against each of the 3 original variables to see whether you can find a linear relationship, and then building a regression model with only one of the original variables.
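
As a numerical companion to those plots, one could compute the R^2 of a simple linear regression of Y on each original variable separately. The sketch below uses synthetic data; the variable names (`L_star`, `a_star`, `b_star`, `water`) and all numbers are placeholders, not values from the study. It complements the plots rather than replacing them, since R^2 alone cannot reveal curvature.

```python
import random

def simple_r2(x, y):
    """R^2 of the least-squares line of y on x (the squared Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)

# Synthetic data (assumption): a latent "process progress" t drives everything.
rng = random.Random(7)
t = [rng.uniform(0, 1) for _ in range(45)]
predictors = {
    "L_star": [60 - 10 * ti + rng.gauss(0, 1.5) for ti in t],
    "a_star": [2 + 4 * ti + rng.gauss(0, 0.8) for ti in t],
    "b_star": [10 + 5 * ti + rng.gauss(0, 1.0) for ti in t],
}
water = [75 - 12 * ti + rng.gauss(0, 1.0) for ti in t]

# One single-predictor regression per original variable.
r2_by_var = {name: simple_r2(col, water) for name, col in predictors.items()}
best = max(r2_by_var, key=r2_by_var.get)
for name, r2 in r2_by_var.items():
    print(f"{name}: R^2 = {r2:.3f}")
print("strongest single linear predictor:", best)
```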

Tommy1201
Calcite | Level 5

Hi,

Thank you for your help.

(I will try to concretely describe the problem)

Yes, those 3 variables are highly correlated with each other (r = 0.70 to 0.85 at most), because they describe how 3 different colour characteristics (L*, a*, b*) change with fish fillet salting time (2, 4, and 6 hours).

We know that in a system with a high salt concentration, muscle tissue releases water and "absorbs" salt, and colour changes occur. Those 3 colour variables are also highly correlated with the water and salt content of the fish muscle during the process.

(In this context, I will not describe the other measured Y variables related to the study.)

Changes in water and salt content are the most important physicochemical characteristics that allow us to "monitor" the microbiological stability of these products, so that they will be safe for human consumption.

The goal of my statistical analysis is to predict water and salt content from the 3 colour variables or, more precisely, to open up the possibility of industrial use. I first tried canonical correlation analysis, then PLS and Correlated Component Regression (in XLSTAT).

After I consulted the XLSTAT staff, they told me: "Unfortunately, the PLSR and CCR are not solutions for residual heteroscedasticity and/or autocorrelation of residuals", which are the issues in my regression models.
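
Residual autocorrelation, one of the two issues named in that quote, is often screened with the Durbin-Watson statistic, d = sum((e_t - e_{t-1})^2) / sum(e_t^2). The sketch below is a diagnostic only and does not fix anything. It assumes the residuals are listed in a meaningful order (for example, by salting time), and the residual values shown are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals given in observation order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical residuals (assumption): a run of positives, then negatives.
residuals = [0.5, 0.4, 0.6, 0.3, -0.2, -0.4, -0.5, -0.3]
d = durbin_watson(residuals)
print(f"Durbin-Watson d = {d:.3f}")
```

Values of d near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.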

After that, I tried using the discriminant function (from MANOVA) that combines those 3 colour variables as a predictor. Or (I don't know) maybe a neural network model would resolve the problem best, with maximum predictive capability?

Thanks,

Tomislav
