BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

The GLIMMIX documentation (for 14.3 anyway) has more detail. Read

 

GLIMMIX | LSMEANS | ADJUST=

 

then ask again if you still can't make sense of it.

nlpurumi
Obsidian | Level 7

I still wanted to test the normality and homoscedasticity assumptions with codes and I faced an error message like this:

 

For normality assumption testing, the code I used is this:

 

proc mixed data=dissertation1 covtest method=ml;*plots=all;
class group subjectid slc trial1;
model gmp_syl_log = group|slc|trial1/ ddfm=kr2 outp=R;
random intercept / subject=subjectid(group);
repeated trial1 / subject=slc*subjectid(group) type=sp(pow)(trial1);
/*lsmeans trial1 / diff adjust=simulate(seed=98375);
lsmeans group*slc ;*/
run;

ods graphics on;
proc univariate data=R normal;
BY group slc;
var Resid;
histogram / normal
ctext = blue;
run;

 

The error I received is this:

ERROR: Data set WORK.R is not sorted in ascending sequence. The current BY group has SLC = 3 and
the next BY group has SLC = 1.

 

Could you please help me one more time?

In addition, I changed the covariance structure to sp(pow)(trial1) because the trials had unequally spaced time intervals. There were different numbers of fillers.

Thanks in advance.

 

Linda

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

The error I received is this:

ERROR: Data set WORK.R is not sorted in ascending sequence. The current BY group has SLC = 3 and
the next BY group has SLC = 1.

 


 

The ERROR tells you what you need to know: when you use a BY statement, the data set must be sorted in advance in the same order specified by the BY statement. Try sorting the dataset R and see if that works.

 


In addition, I changed the covariance structure to sp(pow)(trial1) because the trials had unequally spaced time intervals. There were different numbers of fillers.

 


 

The variable for distance in SP(POW)(c-list) must be numeric and it must reflect the actual "distance" represented by each level of TRIAL1. Read 

 

MIXED | REPEATED | TYPE=

 

Using SP(POW)(TRIAL1) will not work because (1) TRIAL1 is in the CLASS statement and so is not numeric, and (2) the levels of TRIAL1 (1,2,3,...,10) do not reflect the different numbers of "fillers".

 

nlpurumi
Obsidian | Level 7

Thanks for your advice. After changing the order of variables and sort the R dataset, the normality test ran well.

 

Thanks.

 

Regarding the covariance structure, there is not specific interval and the time intervals among trials vary and depend on each individual. This is because each individual had the capability to move on to the next trial when they completed responding to the current task in each trial.

In this type of unequally time spaced dataset, do you have any recommendation in terms of what code should be used to specify covariance structure? Can I just use "UN" instead of "AR(1), ARH(1) for heteroscedastic data, or SP(pow) (c-list)"?

 

Or can I use VC because following description about VC (http://documentation.sas.com/?docsetId=statug&docsetTarget=statug_mixed_syntax14.htm&docsetVersion=1...) says that VC specifies standard variance components and is the default structure for both the RANDOM and REPEATED statements. In the RANDOM statement, a distinct variance component is assigned to each effect. In the REPEATED statement, this structure is usually used only with the GROUP= option to specify a heterogeneous variance model.

I like the fact that VC with "Group=" option handles heterogeneous variance model, which is the case in my result.

 

 

SP(pow) (c-list) was something I found in  Littell et al.'s(2006) book.

 

If I ask one more question, I want to report the F statistics in my document.

I wonder if I can assume the second degrees of freedom I need to report is written under the heading of Den DF in the second column of the SAS output.

So if I write, "a significant group by SLC interaction (F(2,359)=18.14, p<.0001), the main effect of group (F(1,46.2)=48.96, p<.0001), and the main effect of SLC (F(2,359)=148.47, p<.0001) were observed." will this be a correct statement?

 

Thanks again in advance.

 

ex) SAS Output:

gmp_syl.png
sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

Regarding the covariance structure, there is not specific interval and the time intervals among trials vary and depend on each individual. This is because each individual had the capability to move on to the next trial when they completed responding to the current task in each trial.

In this type of unequally time spaced dataset, do you have any recommendation in terms of what code should be used to specify covariance structure? Can I just use "UN" instead of "AR(1), ARH(1) for heteroscedastic data, or SP(pow) (c-list)"?

 

 

 

 

 

I'd say that you need to rethink what TRIAL1 represents. To this point, I have been thinking of it as a repeated measurement through time in response to a consistent treatment. But apparently each level of TRIAL1 represents a level of another factor ("task"), complicated by the fact that the response of a subject to the different tasks may be autocorrelated and that the degree of autocorrelation may vary by subject. The complexity of this study has now exceeded what we can resolve in this forum. You will want to find a statistical consultant at your institution; this problem needs a lot of discussion, and the context of the study is important and needs to be considered.

 


If I ask one more question, I want to report the F statistics in my document.

I wonder if I can assume the second degrees of freedom I need to report is written under the heading of Den DF in the second column of the SAS output.

So if I write, "a significant group by SLC interaction (F(2,359)=18.14, p<.0001), the main effect of group (F(1,46.2)=48.96, p<.0001), and the main effect of SLC (F(2,359)=148.47, p<.0001) were observed." will this be a correct statement?

 

 

 

 

Yes, the format for the F statistic is F(numerator df, denominator df).

 

nlpurumi
Obsidian | Level 7

But apparently each level of TRIAL1 represents a level of another factor ("task"), complicated by the fact that the response of a subject to the different tasks may be autocorrelated and that the degree of autocorrelation may vary by subject. 

 

--> My reply: Actually, each task is repeated ten times over ten trials, but these tasks were randomly mixed in each experimental block. So I am not sure if the trial1 really represents tasks. It may not.

 

After consulting with one faculty and students majoring in biostats, I decided to ignore the unequally spaced intervals among the trials because it is unlikely that variation in terms of the time each trial occurred caused by each individual would change how they perform each task. 

 

I will determine the best covariance structure by examining the fit statistics.

 

And thanks for confirming that I can use info under Den DF as my second degrees of freedom.

 

 

 

 

 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

Now I'm confused about TRIAL1 and the random mixing of tasks and "experimental blocks" that have not been previously addressed. My confusion is fine, I don't need to have a clear understanding at this point, but given that I am not at all sure about your experimental design, the statistical model that I suggested might well be incorrect. 

 

There is no "perfect" model. The treatment and design structure of the model should match the experimental design. But "good enough" could be sufficient when it comes to distributional assumptions, covariance structures, etc.

 

Good luck!

 

nlpurumi
Obsidian | Level 7

The participants produced 12 sequences (4 3-syllable sequences, 4 6-syllable sequences, 4 9-syllable sequences) 2 times in one block. They produced these five times over five blocks.

They produced each sequence 10 times.

So basically what I am interested in is the comparison among three sequence length conditions and blocks mean nothing.

The tasks were blocked only to give participants a few minute breaks.

Although my research questions do not question about trial effect, I am still interested in it and I assume there might be learning effect when participants repeat the same task over ten times (trial1). So I wanted to include this as one of random or repeated variable in the model.

 

Among 12 sequences, I picked 1 3-syllable sequence, 1 6-syllable sequence, and 1 9-syllable sequence as my target sequences for analysis. The other ones were fillers and were not included in the dataset I provided.

 

12 sequences were randomly mixed, but I pre-determined this random order before the experiment, so every participant underwent the same order of tasks.

 

I don't know if this design changes any of your codes.

Because I face some problems whenever I include random and repeat statements together in one model and somehow your codes work fine, I am careful to change any parts of your code.

So if you think your code is not reflecting my design, I am not capable of changing it without causing errors.

 

These are the final codes I used. I am also attaching my final dataset. Could you please help confirming this looks fine?

After testing with these codes, normality assumption did not meet even with the log tranformed gmp_syl, so I decided to report only glimmix results.  Although both proc glimmix and proc mixed produced the same results.

 

Thanks.

 

 

/* Import data from xlsx */
libname a1 'C:\Data Analysis\Experimental Results\SAS';

proc import datafile="C:\Data Analysis\Experimental Results\SAS\dissertation.xlsx"

out=Work.dissertation

dbms=xlsx

replace;

run;

/* Delete unfilled columns and unfilled rows */
data dissertation1;
set dissertation;
/*drop var20-var56;*/
where whole_1st= 1;
run;

/* Variable transformation */
data dissertation1;
set dissertation1;
gmp_syl_log = log(gmp_syl);
gmp_syl_sqrt = sqrt(gmp_syl);
run;

title1 "Coding check";
proc freq data=dissertation1;
table group subjectid slc block trial1;
run;

title1 "Pattern of observations";
proc tabulate data=dissertation1(where=(gmp_syl ne .));
class group subjectid slc trial1 block;
table group*subjectid, slc*trial1 / misstext="X";
run;

proc sort data=dissertation1;
by group subjectid slc trial1;
run;
title1 "Observed data";
proc sgpanel data=dissertation1 noautolegend;
panelby slc group / columns=2;
series x=trial1 y=gmp_syl / group=subjectid markers;
run;
title1 "Observed data, log scale";
proc sgpanel data=dissertation1 noautolegend;
panelby slc group / columns=2;
series x=trial1 y=gmp_syl_log / group=subjectid markers;
run;
proc sgpanel data=dissertation1;
panelby group subjectid / columns=6;
series x=trial1 y=gmp_syl_log / group=slc markers;
run;
/*unconditional model ICC check*/
proc mixed data=dissertation1 ic;
class group subjectid slc trial1;
model GMP_syl_log= trial1/solution;
random intercept trial1/sub=subjectid(group);
run;
/*homoscedasticity assumption testing*/
proc mixed data=dissertation1 ic;
class group subjectid slc trial1;
model GMP_syl_log= group|slc|trial1 /ddfm=kr2 outp=R;
random intercept / subject=subjectid(group);
repeated trial1 / subject=slc*subjectid(group) type=ar(1);
run;
data R1; set R;
absr=ABS(RESID);
run;
proc glm data=R1;
class group subjectid slc trial1;
model ABSR=group|slc|trial1;
run;

/*normality assumption testing*/
proc mixed data=dissertation1 covtest method=ml;*plots=all;
class group slc subjectid trial1;
model gmp_syl_log = group|slc|trial1/ ddfm=kr2 outp=R;
random intercept / subject=subjectid(group);
repeated trial1 / subject=slc*subjectid(group) type=ar(1);
/*lsmeans trial1 / diff adjust=simulate(seed=98375);
lsmeans group*slc ;*/
run;
proc sort data=R;
by group slc subjectid trial1;
run;
ods graphics on;
proc univariate data=R normal;
BY group slc ;
var Resid;
histogram / normal
ctext = blue;
run;
title1 "AR(1) among TRIAL1 levels using MIXED";
proc mixed data=dissertation1 covtest ;*plots=all;
class group subjectid slc trial1;
model gmp_syl_log = group|slc|trial1 / ddfm=kr2;
random intercept / subject=subjectid(group);
repeated trial1 / subject=slc*subjectid(group) type=ar(1);
lsmeans trial1 / diff adjust=simulate(seed=98375);
lsmeans group*slc ;
run;
title1 "AR(1) among TRIAL1 levels using GLIMMIX";
proc glimmix data=dissertation1 plots=(studentpanel boxplot(fixed student));
class group subjectid slc trial1;
model gmp_syl_log = group|slc|trial1 / ddfm=kr2;
random intercept / subject=subjectid(group);
random trial1 / subject=slc*subjectid(group) type=ar(1) residual;
lsmeans trial1 / diff adjust=simulate(seed=98375) lines;
lsmeans group*slc / plot=meanplot(sliceby=group join cl);
lsmestimate group*slc "Group effect: SLC 1 v 2" 1 -1 0 -1 1 0,
"Group effect: SLC 1 v 3" 1 0 -1 -1 0 1,
"Group effect: SLC 2 v 3" 0 1 -1 0-1 1
/ adjust=simulate(seed=29847);
run;

 

 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

Why did you have each participant do 10 trials? What research questions do you want to answer about trials?

 

Is SLC the sequence length condition (3-, 6-, or 9-syllable)?

 

Did you use the same order of sequences in all 10 trials? And, to confirm, the same order of sequences with all participants? In other words, there was only one order of sequences for the whole study? Do you think that order of sequences would affect the response?

How did you choose which 3 sequences of the 12 to use for analysis?

 

nlpurumi
Obsidian | Level 7

Why did you have each participant do 10 trials? What research questions do you want to answer about trials?

A: It is to obtain enough number of trials for each target sequence. Just one trial may not work.

 

Is SLC the sequence length condition (3-, 6-, or 9-syllable)?

A: Yes.

 

Did you use the same order of sequences in all 10 trials?

A: I am not sure what you are asking here.

The variable "Trial1" reflects the order of 10 trials that were produced for each sequence .

So the trial 1 was produced before trial2, trial 2 was produced before trial 3, etc.

This order does not necessarily match with the numbers in the file name though.

The 12 sequences were randomly mixed up in each block.

Block 1 had different order of 12 sequences from block 2, the block 2 had different order of 12 sequences from block 3, etc.

 

And, to confirm, the same order of sequences with all participants? In other words, there was only one order of sequences for the whole study?

A: Yes.

 

Do you think that order of sequences would affect the response?

A: No. That is why I randomly mixed up but made sure 12 sequences were produced twice in one block.

 

How did you choose which 3 sequences of the 12 to use for analysis?

A: Randomly.

 

I think the dataset I uploaded in my previous reply contained some errors. I uploaded the right one again.

nlpurumi
Obsidian | Level 7

I think the dataset I uploaded in my previous reply contained some errors. I uploaded the right one again.

 

Assuming the current code I described in my previous reply is correct, I start to write up the results.

 

Now I wonder how I write the codes to examine posthoc analysis when interaction effect is significant.

For example, if I want to come up with p-value for the effect of one variable (e.g., Trial) at each level of another variable (e.g., SLC).

Will these two lines for such simple main effects?

 

lsmeans trial1 / diff adjust=simulate(seed=98375) lines;
lsmeans group*slc / plot=meanplot(sliceby=group join cl); 

 

Thanks for your help again in advance.

 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

Assuming the current code I described in my previous reply is correct, I start to write up the results.

 

Now I wonder how I write the codes to examine posthoc analysis when interaction effect is significant.

For example, if I want to come up with p-value for the effect of one variable (e.g., Trial) at each level of another variable (e.g., SLC).

Will these two lines for such simple main effects?

 

lsmeans trial1 / diff adjust=simulate(seed=98375) lines;
lsmeans group*slc / plot=meanplot(sliceby=group join cl); 

I would say that you do not know yet what the correct statistical model is for your study, and that writing results is premature. In addition, I suspect that you do not fully understand your statistical model or the code that implements it, and understanding is absolutely a prerequisite. 

 

Pairwise mean comparisons are legitimate but are of little use in interpreting interaction in my opinion: interactions reflect unequal pairwise comparisons, and so merely looking at pairwise comparisons misses the point of interaction. I recommend using pertinent contrasts instead. In my code suggestion, I have included contrasts that I think could be sensible. Figure out what they do, and see whether you agree that they are pertinent.

 

 

nlpurumi
Obsidian | Level 7

Sorry for my late reply to this comment.

I realized you suggested pertinent contrast, however it is not the term I know.

 

Your codes looks like it provides the marginal effect, but not the simple main effect at each line of the other dependent variable.

 

I have a flow chart to follow when interaction is significant or not significant.

In that flow chart, it was recommended to examine interaction contrast or simple main effect if the interaction is significant.

That was why I looked for the codes how I can examine simple main effects.

If you can teach me what the pertinent contrast is and why it is better than pairwise comparison, I will use the information your codes provided, but for now because I am not fully understanding what you are suggesting, I think I will go with the direction what I have learned at classes.

 

Thanks for your suggestion though.

 

 

 

sld
Rhodochrosite | Level 12 sld
Rhodochrosite | Level 12

I think the dataset I uploaded in my previous reply contained some errors. I uploaded the right one again.


 

I think the most recently uploaded dataset may still have problems. Check SubjectID=24 in Group=2.



Why did you have each participant do 10 trials? What research questions do you want to answer about trials?

A: It is to obtain enough number of trials for each target sequence. Just one trial may not work.

 


Your response implies that the multiple trials are merely subsamples--that you do not have a hypothesis about the effect of trials on the mean of the response. (For example, subjects do not get better or worse over multiple trials, or subjects do not get weary over multiple trials.) If this implication is correct, then you would not include TRIAL1 as a fixed effect in the MODEL statement. if it is not correct, then you have not given enough thought to your study and the research questions you are trying to answer.


A: I am not sure what you are asking here.

The variable "Trial1" reflects the order of 10 trials that were produced for each sequence .

So the trial 1 was produced before trial2, trial 2 was produced before trial 3, etc.

This order does not necessarily match with the numbers in the file name though.

The 12 sequences were randomly mixed up in each block.

Block 1 had different order of 12 sequences from block 2, the block 2 had different order of 12 sequences from block 3, etc.

 


Did you use the same order of sequences in all 10 trials?

Before you do another study of this type, please spend some time learning about crossover designs.

 

If Trial1 levels are in fact subsamples, this is the modeling approach I would consider. I would report the second of the two models below.

 

/*  Import data from CSV */
PROC IMPORT OUT= WORK.dissertation0 
            DATAFILE= "Dissertation.csv" 
            DBMS=CSV REPLACE;
     GETNAMES=YES;
     DATAROW=2; 
RUN;
/*   Extract analysis data */
data dissertation;
    set dissertation0;
    where whole_1st=1;
    drop filenames filenames_a subjectid2 trial trial2 ;
    run;

/*   Variable transformation */
data dissertation;
    set dissertation;
    gmp_syl_log = log(gmp_syl);
    run;

title1 "Pattern of observations";
proc tabulate data=dissertation(where=(gmp_syl ne .));
    class group subjectid syl trial1 block;
    table group*subjectid, block*trial1*syl / misstext="X";
    run;

/*  I think SubjectID=24 in Group 2 has bad data */
data dissertation;
    set dissertation;
    if subjectid=24 and group=2 then delete;
    run;

title1 "Trial1 levels as random subsamples";
proc glimmix data=dissertation plots=(studentpanel boxplot(fixed student));
    class group subjectid syl trial1;
    model gmp_syl_log = group|syl / ddfm=kr2;
    random intercept / subject=subjectid(group);
    random syl / subject=subjectid(group);
    random intercept / subject=trial1(subjectid group);
    lsmeans group*syl / plot=meanplot(sliceby=group join cl);
    lsmestimate group*syl "Group effect: SYL 1 v 2" 1 -1 0 -1 1 0,
                          "Group effect: SYL 1 v 3" 1 0 -1 -1 0 1,
                          "Group effect: SYL 2 v 3" 0 1 -1  0-1 1
        / adjust=simulate(seed=29847);
    run;

/*  Variances for SYL levels appear unequal
    Fit the heterogeneous variances model and compare AICc values. 
    The heterogeneous variances model provides better fit
    but does not alter the conclusions. */
title1 "Trial1 levels as random subsamples";
title2 "with heterogeneous SYL variances";
proc glimmix data=dissertation plots=(studentpanel boxplot(fixed student));
    class group subjectid syl trial1;
    model gmp_syl_log = group|syl / ddfm=kr2;
    random intercept / subject=subjectid(group);
    random syl / subject=subjectid(group);
    random intercept / subject=trial1(subjectid group);
    random _residual_ / group=syl;
    lsmeans group*syl / plot=meanplot(sliceby=group join cl);
    lsmestimate group*syl "Group effect: SYL 1 v 2" 1 -1 0 -1 1 0,
                          "Group effect: SYL 1 v 3" 1 0 -1 -1 0 1,
                          "Group effect: SYL 2 v 3" 0 1 -1  0-1 1
        / adjust=simulate(seed=29847);
    run;

 

For this model, your approach to assessing normality and homogeneity of variance assumptions does not assess the appropriate values because your approach is based on residuals, which are subsamples and not true experimental units.

 

nlpurumi
Obsidian | Level 7

If I readdress my question, could you please help me understand following two lines?

 
random intercept / subject=trial1(subjectid group);   (--> Does it mean to report the variance due to a variable "trial"? Is there a possibility that "trial" was treated as "repeated" variable in proc glimmix through this code, because proc glimmix does not have repeated statement?)
random _residual_ / group=slc;                         (--> I think I have a vague sense that residuals are divided by the level of slc, but I am not quite understanding why we divide them. How does it take care of heteroscedasticity problem?)
 
Thanks in advance.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is ANOVA?

ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 32 replies
  • 2366 views
  • 2 likes
  • 2 in conversation