help with trimming independent variables in a conf...

08-14-2013 04:41 PM

Hi,

I have a relatively confusing experiment that tries to quantify an individual's fitness level based on different types of exercises.

There are 50 subjects in total. Each subject is categorized into one of 3 groups (control/mov/fit) and did 20 different types of exercise (vertical jump, hand grip strength, ....), which are to be used to explain the variables in experiment 2.

Once each subject has completed the 20 initial exercises, they proceed to experiment 2, where each subject performs 8 different tasks (symmetric pull, asymmetric pull,....) and 3 spine measurements (compression and 2 shears) are recorded. What I want to ask is: how do the 20 exercises explain the 3 spine measurements?

I'm unsure which methods to use. I've been told PCA (not sure what this is), multiple regression (stepwise), or multivariate regression (gets tricky with multi-level (tasks)), or maybe other methods?!? I'm not even sure whether the number of the 20 IVs (exercises) that matter differs between tasks for the DVs (compression, shears). For example, for symmetric pull, there may be 7 exercises that explain the 3 DVs well, while for asymmetric pull there might be only 3 exercises. Or maybe there are 6 exercises that explain the 3 DVs for all 8 tasks.

Any suggestions on how to tackle this problem will be greatly appreciated.

attached is a sample data

thanks.

ming

Accepted Solutions

08-22-2013 04:59 PM

Your PROC GLM code looks alright, except that I've forgotten what the variable GROUP in the PROC GLM CLASS statement refers to. For each of the tasks within each kind of exercise, this PROC GLM code will perform multivariate analysis of variance (MANOVA).

To have all the exercises included together, I'd change the syntax of the PROC GLM "paragraph" to the following:

proc glm data=long;

class task exercise;

model std_comp std_shr1 std_shr2 = task(exercise) / solution;

manova h=_all_ / printe printh summary;

run;

quit;

This will perform MANOVAs for tasks nested within each exercise. This is equivalent to having an interaction term, TASK*EXERCISE, without any "main-effect" terms for TASK and EXERCISE. This so-called "cell-means" model assumes that the "effect" of a specific TASK on the three dependent variables may differ by the kind of EXERCISE and that the "effect" of a specific EXERCISE on the three dependent variables may differ by the kind of TASK.

An alternative model would include main-effect independent variable terms for TASK and EXERCISE in the MODEL statement as well as their interaction. Then, the interaction term, TASK*EXERCISE, would be interpreted as showing additional effects on the three dependent variables after the effects of the main-effect terms were accounted for:

model std_comp std_shr1 std_shr2=task exercise task*exercise / solution;

However, the original description of your study seemed to fit the nested cell-means model more closely than this latter "saturated" model.

You could do doubly multivariate repeated measures using PROC MIXED. I can't tell why you received the error message about non-convergence, because there are several reasons why this may have happened (see the PROC MIXED documentation). For such an analysis, you do not need to standardize the dependent variables.

I would change your SAS syntax to the following:

* For each kind of exercise and each dependent variable, create a separate dependent variable;

* with the same name, RESPONSE, but indexed by the variable, VAR.;

data long;

length var $ 12;

set wide;

array e{20} exerc1-exerc20;

array dv{3} comp shear1 shear2;

array varlist{3} $ 12 _temporary_ ("comp" "shear1" "shear2");

do i=1 to 20;

exercise=e{i};

do j=1 to 3;

response=dv{j};

var=varlist{j};

output long;

end;

end;

drop i j comp shear1 shear2 exerc1-exerc20;

run;

proc sort data=long;

by subject exercise task var;

run;

proc mixed data=long;

class subject exercise task var;

model response = var task exercise var*task var*exercise task*exercise var*task*exercise

/ solution ddfm=kenwardroger;

repeated var task / type=un@un subject=subject rcorr;

run;

quit;

This saturated proc mixed model may not converge either. You may have to reduce the number of independent variable terms and to simplify the variance-covariance structure.

All Replies

08-17-2013 03:40 PM

If the three dependent variables are in the same units, first consider a multivariate analysis of variance (MANOVA) on these three dependent variables of the eight different tasks by each of the twenty exercises using PROC GLM (see the documentation):

proc sort; by exercise; run;

proc glm; by exercise;

class task;

model compression shear1 shear2=task / solution;

manova h=_all_ / printe printh summary;

run;

quit;

This would look at the "effects" of the eight tasks on the three dependent variables, taking into account the correlation among these dependent variables, for each of the twenty exercises taken separately.

You can specify, using PROC FORMAT, the task that is the "reference" category among all the eight tasks. For example, to specify the "asymmetric pull" as the reference category of tasks, use the following (or similar syntax):

proc format;

value taskfm 1="Symmetric pull" 2="} Asymmetric pull" etc. 8="Last task";

run;

Then use the option, ORDER=FORMATTED, on the PROC GLM statement, and add the following statement,

format task taskfm.;

to the PROC GLM "paragraph". The leading "}" in the label is presumably deliberate: it sorts after letters, so "Asymmetric pull" becomes the last formatted level, which PROC GLM treats as the reference level.

Another possibility would be to "nest" task within exercise instead of looking at the effects of the tasks on the dependent variables separately. This nesting assumes that each combination of task and exercise (up to 160 of them) is considered separately and similarly:

proc glm order=formatted;

class task exercise;

model compression shear1 shear2=task(exercise) / solution;

manova h=_all_ / printe printh summary;

format task taskfm. exercise exercisef.;

run;

quit;

By the way, PCA is the abbreviation for principal components analysis, which tries to identify linear combinations of numerical, interval-ratio level variables that best "explain" the combined variance of these variables. PCA attempts to reduce the number of variables by using fewer principal components as independent variables in your analysis. Since your independent variables, exercise and task, are nominal variables, PCA is not relevant for them. However, you could use PCA to develop a single component to represent your three interval-ratio level dependent variables; whether this is worthwhile, you'd have to decide.
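As a rough illustration of the last idea, a single principal component for the three dependent variables could be requested with PROC PRINCOMP. This is only a sketch under assumptions: the data set name WIDE and the variable names comp, shear1 and shear2 are borrowed from code posted elsewhere in this thread, and PC_SCORES is a made-up output data set name.

```sas
/* Sketch only: assumes a data set WIDE holding comp, shear1 and shear2. */
/* PC_SCORES is a hypothetical output data set name.                     */
proc princomp data=wide out=pc_scores n=1;
  var comp shear1 shear2;   /* correlation matrix is analyzed by default */
run;
/* PC_SCORES holds the original variables plus the score variable Prin1. */
```

Because PROC PRINCOMP works from the correlation matrix unless the COV option is given, the differing units of compression and shear should not dominate the component. Whether Prin1 is an adequate summary would have to be judged from the proportion of variance it accounts for in the output.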

08-19-2013 03:18 PM

Hi,

Thanks for the suggestion! Just a question: if the 3 DVs do not have the same units (the unit for compression is not the same as for shear), does that mean I cannot use MANOVA?

thanks.

ming

08-19-2013 03:58 PM

Unfortunately, you should use the same units when you use MANOVA. One of the advantages of MANOVA over that of separate ANOVAs is that MANOVA accounts for the correlation among related DVs. One subterfuge you can try is to create standardized values for your DVs by subtracting the mean DV from the individual value of the DV and dividing this difference by the standard deviation of the DV. Then, for all the DVs, these standardized DVs are unit-less with a mean of zero and a variance of 1. The problem with this "solution" is the interpretation of the results in terms of standardized units. An alternative solution is only to analyze multiple DVs with the same units using MANOVA and to analyze single DVs with different units using ANOVA.
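The standardization described above can be sketched with PROC STANDARD. The data set names WIDE and WIDE_STD are assumptions for illustration; the variable names follow the code posted later in this thread.

```sas
/* Sketch only: WIDE and WIDE_STD are assumed data set names. */
proc standard data=wide mean=0 std=1 out=wide_std;
  var comp shear1 shear2;   /* replaced in WIDE_STD by their z-scores */
run;
```

As written, each DV is standardized over all observations pooled together. Adding a BY statement (on sorted data) would standardize within groups instead, which is a different choice with a different interpretation.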

08-22-2013 01:42 PM

Hi,

So here is my code according to your first suggestion (MANOVA for each of the exercises separately) with standardized DVs (comp and the 2 shear variables).

data wide;

input task$ subject$ group$ time$ height weight std_comp std_shr1 std_shr2 exerc1-exerc20;

data long; set wide;

array e(20) exerc1-exerc20;

do i = 1 to 20;

exercise = i;

exerc = e(i);

output;

end;

drop i exerc1-exerc20;

proc sort data=long;

by exercise;

proc glm data=long; by exercise;

class task group;

model std_comp std_shr1 std_shr2 = task /solution;

manova h=_all_ /printe printh summary;

run;

quit;

Does that look right? How do I do the same thing but with all the exercises included together (i.e., not a separate MANOVA for each exercise)? How about doing doubly multivariate repeated measures using PROC MIXED? I attempted that, got the error message "did not converge", and it stopped.

Here is my PROC MIXED code (I left the DVs as their usual values, not standardized). Should I standardize them?

data wide;

input task$ subject$ group$ time$ height weight comp shear1 shear2 exerc1-exerc20;

data long; set wide;

length var $12.;

response = comp; var = 'comp'; output;

response = shear1; var = 'shear1'; output;

response = shear2; var = 'shear2'; output;

drop comp shear1 shear2;

proc mixed data=univariate;

class task var subject group;

model response = task group height weight exerc1-exerc20;

repeated var task / type=un@ar(1) subject=subject;

run;

quit;

thanks for your help!

ming

08-23-2013 12:58 PM

Hi,

Thanks for the suggestion on the nested cell-means model. I think that is the proper model based on your description. The ANOVA table for each DV (comp, shear1, shear2) showed p<.0001. Looking at the parameter estimates for each task(exercise) combination for each DV, some have p>.05 and many have p<.05. Is there a way to determine which exercise(s) significantly explain the variance of the DVs? Since the parameter estimates show the effect of task(exercise) over and above the other IVs and the F-test showed significance, doesn't that mean I need to keep all IVs (task(exercise)) in the model? The MANOVA test results (Wilks, Pillai...) all showed p<.0001, so what does this mean? Are all task(exercise) combinations important in the model, so that I can't trim the 20 exercises down to a smaller number?

thanks a lot for your help!

ming

08-24-2013 05:59 PM

The ANOVA models for each DV show that some TASK(EXERCISE) combinations statistically significantly predicted individual DVs. The significance probabilities for the parameter estimates indicate which of these combinations statistically significantly predicted these DVs. These combinations are the likely ones to analyze further. However, it is possible that some exercises "affected" some DVs only through some but not all tasks and that other exercises affected the same DVs through the same tasks or through other tasks. Thus, one exercise may not be better than another in predicting a specific DV.

The R-squared statistic is printed with each ANOVA table. Sometimes the quantity, 1 - Wilks' lambda, is interpreted as a multivariate counterpart of the R-squared statistic: in this example, this quantity would be the proportion of the variance of the three DVs "explained" by the TASK(EXERCISE) combination. However, in small samples, this quantity may be biased and misleading.

The statistically significant MANOVA results for the TASK(EXERCISE) combinations indicate that the pattern of means of the three DVs differed statistically significantly across those specific TASK(EXERCISE) combinations. You might try plotting the means of these DVs by the different combinations to show where they might differ. The partial correlation matrix from the error SSCP matrix shows the magnitude and the significance probability of the correlations between pairs of the dependent variables after accounting for the TASK(EXERCISE) combination.

I have not seen your program's output, but it may be that some exercises significantly predict some of your DVs, but not other DVs, only when combined with specific tasks, and that other exercises significantly predict the same DVs when combined with either the same tasks or with other tasks. So, you may not be able to trim down the number of exercises.
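One way to produce the suggested plot of means is PROC SGPLOT with a VLINE statement. This is a sketch only, assuming the long data set and the standardized DV name std_comp from the code posted earlier in this thread; the same step would be repeated for std_shr1 and std_shr2.

```sas
/* Sketch only: assumes LONG contains task, exercise and std_comp. */
proc sgplot data=long;
  /* one line per exercise, showing the mean of std_comp at each task */
  vline task / response=std_comp stat=mean group=exercise markers;
run;
```

With 20 exercises the plot will be crowded, so restricting the data with a WHERE statement to a few exercises at a time may make it easier to read.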

08-26-2013 02:20 PM

Hi,

After sorting out the significance probabilities for the parameter estimates for these combinations, the results showed that the following task(exercise) combinations are not significant for the DVs:

For comp, ASYM(all 20 exercises), CHOP(all 20 exercises), and SYMM(all 20 exercises)

For shear1, HOH(all 20 exercises) and SYMM(all 20 exercises)

For shear2, PUSH(all 20 exercises) and SYMM(all 20 exercises)

It looks very task dependent: for task SYMM, no exercise is useful for explaining the variance in the 3 DVs, and depending on the DV, different task(s) do not help in explaining the variance. So there is no eliminating any of the 20 exercises. Am I on the right track?

Here are the interaction plots of the 3 DVs. I'm not sure what to make out of these.

As for the MANOVA results, Wilks' lambda is 0.03786, so this would mean that task(exercise) explains ~4% of the variance of the 3 DVs. That's not very much.

thanks for your help!

ming

08-26-2013 04:22 PM

Actually, the multivariate statistic corresponding to the R-squared statistic ("proportion of variance explained") is 1 - Wilks' lambda, so that the 180 [=9 tasks * 20 exercises] TASK(EXERCISE) combinations explain slightly more than 96% of the variability in the dependent variables. However, you are correct that it is difficult to understand what is going on with the results in this experiment. You are also correct that none of the exercises in the SYMM task appear to affect any of the three dependent variables. None of the exercises for a few of the other tasks appear to affect specific dependent variables.

Can you generate interaction plots with the dependent variable on the Y-axis and the different exercises (instead of the tasks) on the X-axis? Perhaps such plots will provide a different perspective on the relationships between the TASK(EXERCISE) combinations and these dependent variables.
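Spelled out with the Wilks' lambda value of 0.03786 reported in the previous post, the arithmetic is:

```latex
\[
1 - \Lambda = 1 - 0.03786 = 0.96214 \approx 96.2\%
\]
```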

08-26-2013 05:17 PM

The interaction plots with exercise on the X-axis will give you all zeros for every exercise (i.e., a horizontal line at 0), because the exercise doesn't change the DVs; it's the tasks that change the DVs.

08-27-2013 01:31 PM

Hi,

I'm also trying the PROC MIXED doubly multivariate approach. If I use your code above, SAS runs for over 30 min and then shows "insufficient memory". So I deleted "exercise" from the CLASS statement, and it runs much faster but with this message...

NOTE: An infinite likelihood is assumed in iteration 0 because of a nonpositive definite estimated R matrix for subject S02.

here are the only changes I made...

proc sort data=long;

by subject task var;

proc mixed data=long;

class subject task var;

model response = var task|exercise /solution ddfm=kenwardroger;

repeated var task / type=un@un subject=subject rcorr;

run;

thanks for your help!

ming

08-27-2013 01:57 PM

Deleting EXERCISE from the CLASS statement means that SAS interprets the TASK|EXERCISE term in the MODEL statement as consisting of a nominal variable, TASK; a continuous variable, EXERCISE; and an interaction between the nominal variable TASK and the continuous variable EXERCISE. Since EXERCISE is not a continuous variable but a nominal variable, this model does not make any sense, even if it runs much faster than the original model with EXERCISE as a nominal variable.

The PROC MIXED documentation describes several techniques to reduce the running time. One possibility is to change the variance-covariance matrix TYPE in the REPEATED statement from TYPE=UN@UN to TYPE=UN@CS, which reduces the number of parameters to estimate for this matrix at the expense of assuming a single compound symmetry (CS) parameter to model the variability in the TASKs. A second possibility is to analyze your data in pieces using a BY variable (for example, BY EXERCISE), though this has its own problems.
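For concreteness, the first possibility amounts to a one-option change in the REPEATED statement. The step below is a sketch that repeats the statements from the earlier PROC MIXED code in this thread with only the TYPE= value altered; nothing in the thread establishes that it will converge or fit in memory on these data.

```sas
/* Sketch only: same statements as the earlier PROC MIXED step in this thread, */
/* with the covariance structure changed from UN@UN to UN@CS.                  */
proc mixed data=long;
  class subject exercise task var;   /* EXERCISE stays in CLASS so it is nominal */
  model response = var task exercise var*task var*exercise task*exercise var*task*exercise
        / solution ddfm=kenwardroger;
  repeated var task / type=un@cs subject=subject rcorr;
run;
```

Note that the bar notation TASK|EXERCISE is shorthand for TASK EXERCISE TASK*EXERCISE, so how each variable is treated depends entirely on whether it appears in the CLASS statement.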

08-27-2013 03:36 PM

I see what you mean. But "exercise", as it is currently defined ("exercise = e(i)"), is a continuous variable (i.e., the measured values of 20 exercises * 50 subjects). Maybe I could assign another variable, "exerclist", to index the exercises (similar to varlist) and put exerclist in the CLASS statement. But then that wouldn't use the exercise values...

any thoughts around that?

thanks.

08-27-2013 04:20 PM

this is the output from the PROC mixed with "exercise" in the class statement.

I changed the code in the data long step to exercise=i instead of exercise=e(i). The output is the same, except that exercise=i makes it a nominal variable, whereas with exercise=e(i) the nominal variable shows 526 values.

error message is still the same,

ERROR: Unable to allocate sufficient memory: a request for 2760K bytes exceeded the 1254K available. Note that the deficit amount may not be the amount of memory needed for a successful run, since it does not reflect subsequent allocations by this or other processes.

ERROR: The SAS System stopped processing this step because of insufficient memory.

thanks.

ming