BookmarkSubscribeRSS Feed
Kurt
Calcite | Level 5

Hello,

I am working with a large dataset (over 12,000 lines of data) of measured values for soil moisture. The experimental design is somewhat complex: measurements collected at four distances in relation to a central tree row (DIST), within five silvicultural treatments (TRT), over three years (YEAR), at two sites (SITE). Each site contains four blocks (BLOCK). Essentially, DIST is nested within TRT which is nested within YEAR within SITE.

The premise of the experiment is to determine how soil moisture varies in relation to distance from trees due to silvicultural treatment. Soil moisture measurements were collected at each site every two weeks, during each of the three growing seasons. After examining residuals and running the Kolmogorov-Smirnov test, I have determined the data is not normally distributed, even when it is log transformed. Therefore, I've decided testing with Proc Glimmix is necessary.

The SAS code I'm using runs properly and converges, but I am unsure I've specified the SUBJECT correctly and after running it several ways it seems specifying the correct subject can drastically affect the output. The code is as follows:

PROC GLIMMIX DATA=KURT;

CLASS BLOCK SITE YEAR TRT DIST;

MODEL MOISTURE = SITE YEAR TRT DIST SITE*TRT YEAR*TRT TRT*DIST SITE*TRT*DIST YEAR*TRT*DIST SITE*YEAR*TRT*LOCATION;

RANDOM _RESIDUAL_/ TYPE=CS SUBJECT=REP(TRT*DIST);

LSMEANS SITE*YEAR*TRT*DIST/ PDIFF=ALL ADJUST=TUKEY LINES SLICE=SITE;

NLOPTIONS TECH=NRRIDG;

RUN;

When I run this, I get a significant SITE*YEAR*TRT*LOCATION effect. However, my other inclination was to run the model with SUBJECT=REP(SITE*YEAR*TRT*DIST), and running this model I do not get the significant four way interaction....so I'm curious if anyone has an idea which is the proper way to conduct this analysis?

This will be the first of my heiarchial testing for treatment effects. If site interacts with treatment, I will be analyzing the data seperately sites. At that point, if year interacts with treatment I will analyze each year within each site seperately by sampling date.

Any advise concerning this question would be greatly appreciated!

Thanks,

Kurt

3 REPLIES 3
SteveDenham
Jade | Level 19

Hi Kurt,

A couple of things to think about.  The posted model is an R side model with the rep within treatment and distance as the subject.  In this case, all site and year observations are 'lumped'.  Consequently, it would imply a lot more power to detect differences, which you would not see with the alternative subject specification.  In addition, in the alternative specification, part of the variability originally all attributed to the fixed effects is now assigned to the subject variability.

So, here are questions I would think about.  First, with no distribution specified in the model statement, you are assuming a Gaussian distribution, which you said doesn't apply.  Correct specification of the distribution will be critical to any tests.  Second, you mention that measurements were taken every two weeks, but I see no accommodation of that source of variability (and correlated, at that) in the model.  Essentially, you are modeling the mean value over all weeks with this analysis.  Third, unless you identify substantial differences in variance due to the fixed effects you mentioned in the submodel analyses, I would not fit different models by site or year.  They are a part of the study design, and are not independent.  I understand not fitting SITE as a random effect as there are only two instances, but it does seem to be a whole plot blocking effect, which would be random.  The model I would try (and note try, because it is pretty complex for a starting place) would be:

PROC GLIMMIX DATA=KURT;

CLASS BLOCK SITE YEAR TRT DIST rep week;

MODEL MOISTURE = TRT DIST week trt*week trt*dist trt*dist*week;

random block site year block*site block*year block*site*year;

RANDOM week/ residual TYPE=CS SUBJECT=REP(TRT*DIST);

LSMEANS TRT*DIST/ PDIFF=ALL ADJUST=TUKEY LINES ;

NLOPTIONS TECH=NRRIDG;

RUN;

My additions are in lower case.  Note that if you want BLUPs for individual sites and years, these will require ESTIMATE statements.

Steve Denham

Kurt
Calcite | Level 5

Steve,

Thank you for your help! However, I've ran into two (at least) additional problems. First, as you mentioned, it is important to understand the distribution of your data. My data appear normally distributed when examining the histograms in PROC UNIVARIATE, however, based on the Kolmogorov-Smirnov test (< 0.01) they are not normally distributed. I attempted a test to determine the distribution (below),

ODS SELECT PARAMETER ESTIMATES GOODNESSOFFIT FITQUANTILES MYHIST;

PROC UNIVARIATE DATA=KURT;

VAR MOISTURE;

HISTOGRAM / MIDPOINTS=0.2 TO 1 BY 0.2

LOGNORMAL WEIBULL GAMMA

VAXIS=AXIS1

NAME='MYHIST';

INSET N MEAN (5.3) STD='STD DEV' (5.3) SKEWNESS (5.3)/POSITION=NE HEADER='SUMMARY STATISTICS';

AXIS1 LABEL=(A=90 R=0);

RUN;

however, it seems the dataset also does not qualify under the lognormal, Weibull, or gamma distributions either (p < 0.01 for all). Is it possible this dataset is normally distributed (as the residual plots suggest) but due to the large size of the dataset the K-S test is incorrect? I also tried a second method to determine distribution but SAS didn't recognize the LOSS statement (it appears in red instead of blue).

proc severity data=KURT crit=aicc;

  loss MOISTURE;

  dist _predefined_;

run;

And secondly, when I try to run the model you specified I get an error message stating "The initial estimates did not yield a vaid objective function". I read up on this and tried to address this problem by adding an "INITITER=" option, but whatever number I put, the program will not properly run.

Do you have any further ideas for addressing the distribution of the data or getting the model to perform correctly? Your help is greatly appreciated!

SteveDenham
Jade | Level 19

Regarding normality:  As big as this dataset is, even trivial differences will turn up significant for most of the tests of normality.  I think you would be better off looking at QQ plots.  What you probably have are some extreme values (I hesitate to call them outliers) that might be influencing these tests.  I would say that an assumption of normality is justifiable.

So, on to fitting the model.  If the residuals are normal, then for a model this complex, you may wish to look at HPMIXED.  Perhaps something like:

proc hpmixed data=kurt;
CLASS BLOCK SITE YEAR TRT DIST rep week;

MODEL MOISTURE = TRT DIST week trt*week trt*dist trt*dist*week;

random block site year block*site block*year block*site*year;

repeated week/ residual TYPE=AR(1) SUBJECT=REP(TRT*DIST);

test trt dist trt*dist;

run;

While HPMIXED does not accept the STORE statement, you can pass the covariance parameters to GLIMMIX, and use the results.  See Example 46.3 in the HPMIXED documentation for STAT12.3 as a start.

Steve Denham

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 808 views
  • 3 likes
  • 2 in conversation