FabioMC
Obsidian | Level 7

Dear SAS Forum,
I have to analyze the data from an experiment with four Pools (boxes filled with water) involving 2 species of water plants that are either subjected to a treatment or not (treatment = metals added to the water; control = nothing added).
At each time and in each Pool, four "parts" (Leaf/Stem, Roots, Water near the plant, Soil near the plant) of one plant were analyzed for metal concentration (the response variable). So 48 sub-samples (12 sample plants × 4 parts) were analyzed, and the concentration of each of sixteen metals was measured on every sub-sample.
The design is the following:
Pool Species Treat Part Time Id
P1 TL C L/S T0 1
P1 TL C R T0 1
P1 TL C Soil T0 1
P1 TL C Water T0 1
P1 TL C L/S T1 2
P1 TL C R T1 2
P1 TL C Soil T1 2
P1 TL C Water T1 2
P1 TL C L/S T2 3
P1 TL C R T2 3
P1 TL C Soil T2 3
P1 TL C Water T2 3
P2 TP C L/S T0 4
P2 TP C R T0 4
P2 TP C Soil T0 4
P2 TP C Water T0 4
P2 TP C L/S T1 5
P2 TP C R T1 5
P2 TP C Soil T1 5
P2 TP C Water T1 5
P2 TP C L/S T2 6
P2 TP C R T2 6
P2 TP C Soil T2 6
P2 TP C Water T2 6
P3 TL M L/S T0 7
P3 TL M R T0 7
P3 TL M Soil T0 7
P3 TL M Water T0 7
P3 TL M L/S T1 8
P3 TL M R T1 8
P3 TL M Soil T1 8
P3 TL M Water T1 8
P3 TL M L/S T2 9
P3 TL M R T2 9
P3 TL M Soil T2 9
P3 TL M Water T2 9
P4 TP M L/S T0 10
P4 TP M R T0 10
P4 TP M Soil T0 10
P4 TP M Water T0 10
P4 TP M L/S T1 11
P4 TP M R T1 11
P4 TP M Soil T1 11
P4 TP M Water T1 11
P4 TP M L/S T2 12
P4 TP M R T2 12
P4 TP M Soil T2 12
P4 TP M Water T2 12

We have repetition through time with Pool as the subject, and through part with Id (sample plant) as the subject.
I wonder how to analyze this design to estimate, if possible, the effects of species, time and treatment, and the interactions of time and treatment with species.
I realize that the design may not be the best and that there are few individuals (Id).
I tried the following univariate syntax (the 16 metals analyzed separately). Var is the metal type variable; y is the concentration of the metal.

proc mixed data=dati_l;
by var;
class Var Species Time Pool Treatment Id Part_of_plants;
model y= Species Time Treatment Species*Time Species*Treatment /s outp=outp ;
repeated Part_of_plants/subject=Id ;
random time /subject=Pool s;
run;


The syntax above ran, but there were problems with estimating the random effects or with convergence.
What could be the best syntax for the repeated/random structure of data?
Can a multivariate (considering simultaneously all 16 metals analyzed) analysis help?
Thank You in advance for helping,
Fabio

5 REPLIES
StatsMan
SAS Super FREQ

There is a lot to consider here! What kind of errors or warnings do you receive from PROC MIXED with this model?

 

One potential problem I see is the way that ID is set up.  For your POOL=P1, you have these observations:

 

Pool Species Treat Part Time Id
P1 TL C L/S T0 1
P1 TL C R T0 1
P1 TL C Soil T0 1
P1 TL C Water T0 1
P1 TL C L/S T1 2
P1 TL C R T1 2
P1 TL C Soil T1 2
P1 TL C Water T1 2
P1 TL C L/S T2 3
P1 TL C R T2 3
P1 TL C Soil T2 3
P1 TL C Water T2 3

 

ID=1 is only observed at TIME=T0. ID=2 is only observed at TIME=T1. ID=3 is only observed at TIME=T2. That would imply that you have 3 plants in POOL=P1, but you only took measurements on one plant at each time point? The same pattern repeats for the other levels of POOL. Are your repeated measures on each plant just the 4 parts of the plant, with no repeated measures over time?
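
One way to confirm this pattern in the data is to tabulate the Pool, Id and Time combinations. This is a minimal sketch, assuming the data set name (dati_l) and the variable names (var, Pool, Id, Time) used in the original post; the sorted copy dati_chk is a hypothetical name introduced here.

```sas
/* Sort a copy so that BY processing works (dati_chk is a hypothetical name). */
proc sort data=dati_l out=dati_chk;
  by var;
run;

/* List every Pool x Id x Time combination that occurs, one metal at a time. */
proc freq data=dati_chk;
  by var;
  tables Pool*Id*Time / list nocum nopercent;
run;
```

If each Id is listed under a single Time with a frequency of 4 (one row per part), then the only repeated measures on a plant are its parts, and repetition over time happens at the Pool level.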

 

 

FabioMC
Obsidian | Level 7

Hi StatsMan,

Here each Id is analyzed 4 times, through its parts; there are four parts (L/S, R, Soil, Water). Id is not repeated through time.

Instead, Pool is repeated through time (3 times: at T0, T1, T2).

I tried the following GLIMMIX syntax and it worked (well).

I am in doubt especially about the structure of the repeated effects (Id through parts and Pool through time).

Thanks,

Fabio 

----

proc glimmix data=dati_l ;
by var;
class Var Species Time Pool Treatment Id Part_of_plants;
t2 = timen / 100;
model y= timen |Treatment |Species | Part_of_plants /s ;
random t2 / subject=Pool type=rsmooth
knotmethod=kdtree(bucket=100 knotinfo) s;

random Part_of_plants /subject=id residual;
nloptions tech=newrap;
run;

 

 

----

StatsMan
SAS Super FREQ

The radial smoother is a nice alternative to the spatial structure and could be used here.

 

However, I am concerned that you have four different responses here (L/S, R, Soil, and Water).  You can try a multivariate approach, but if these four responses are measuring different things then breaking them out into four univariate models is a good alternative.  In that model, you probably would not need the second RANDOM statement.
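
A rough sketch of what the four separate univariate models could look like, adapted from the GLIMMIX step posted above: Part_of_plants moves from the MODEL statement to the BY statement, and the second RANDOM statement is dropped. The data set and variable names are those of the original post; the sorted copy dati_by_part is a hypothetical name. With only 12 observations per BY group there is no guarantee that this converges, so treat it as a starting point rather than a recommended model.

```sas
/* BY processing needs sorted data (dati_by_part is a hypothetical name). */
proc sort data=dati_l out=dati_by_part;
  by var Part_of_plants;
run;

proc glimmix data=dati_by_part;
  by var Part_of_plants;            /* one model per metal and per part */
  class Species Treatment Pool;
  t2 = timen / 100;                 /* rescaled time, as in the posted code */
  model y = timen|Treatment|Species / s;
  /* radial smoother over time within each pool, as in the posted code */
  random t2 / subject=Pool type=rsmooth
              knotmethod=kdtree(bucket=100 knotinfo) s;
  nloptions tech=newrap;
run;
```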

FabioMC
Obsidian | Level 7

 

 

Thank You.
Could you suggest the syntax that you think would be best?

 

sld
Rhodochrosite | Level 12

Based on your description of your study, I would say that you have a serious design problem: both SPECIES and TREATMENT are assigned to POOL (i.e., POOL is the experimental unit for these two factors), but you have no replicates of the SPECIES and TREATMENT combinations. You have only one pool for each combination.

 

If you assume that there is no interaction of SPECIES and TREATMENT, then you could use the 1 df for SPECIES*TREATMENT as the error for testing SPECIES and TREATMENT. Is that a legitimate approach? It depends upon the extent to which there actually is no SPECIES*TREATMENT interaction. And with only 1 denominator df, the tests would have low power.
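
A minimal sketch of that no-interaction approach, fitted separately for each metal and each part. It assumes the data set and variable names from the original post; the sorted copy dati_s is a hypothetical name. Leaving SPECIES*TREATMENT out of the MODEL statement is what frees the single pool-level df for the Pool variance. With so few df the Pool variance estimate may well be zero, so this shows the structure of the model and is not an endorsement of the inference.

```sas
/* Sorted copy for BY processing (dati_s is a hypothetical name). */
proc sort data=dati_l out=dati_s;
  by var Part_of_plants;
run;

proc mixed data=dati_s;
  by var Part_of_plants;              /* one fit per metal and per part */
  class Species Treatment Time Pool;
  /* Species*Treatment is deliberately omitted: its 1 df becomes the
     pool-level error used to test Species and Treatment. */
  model y = Species Treatment Time Species*Time Treatment*Time
        / ddfm=satterthwaite;
  random Pool;                        /* whole-plot (pool) error term */
run;
```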

 

Unless you are willing to assume that there is no TIME effect and no PART effect, you cannot consider "repetition through time" or "through part" to be valid replications. You clearly think that there may be TIME and/or PART differences, so I doubt that you can find replicates in this fashion, apart from other concerns I would have about the validity of this.

 

Ignoring the lack of replication problem for the moment.... Plants (aka ID) are clustered within POOLs, and IDs are the experimental unit associated with the TIME factor. At this point, your design resembles a split plot where POOL is the whole plot unit, SPECIES and TREATMENT are the whole plot factors; ID is the subplot unit, and TIME is the subplot factor. If 3 plants in each pool were randomly assigned at the beginning of the study to be destructively sampled at different TIMEs, then you do not need anything fancy for the covariance structure: I'd consider either CS or CSH.

 

With respect to PART: Do you want to compare metal concentrations among PARTs, to do separate analysis of each PART, to analyze the mean over PARTs, or something else?

 

But back to lack of replication. You need to sort that problem out first, to your satisfaction and to the satisfaction of anyone who will be evaluating your work (committee, reviewers, etc.) before turning your attention to your other analysis questions. You could look into the methodologies presented here

https://www.amazon.com/Analysis-Messy-Data-Nonreplicated-Experiments/dp/0412063719

While searching for that link I noticed a symposium on the topic here

https://dl.sciencesocieties.org/publications/cs/tocs/46/6#h1-ANALYSIS OF UNREPLICATED EXPERIMENTS (S...

It may be that a descriptive approach would be best, with no statistical inference tests.

 
