
Multiple Linear Regression Analysis of Data from Field Expt. using RCBD

Occasional Contributor
Posts: 17


I did a field experiment using RCBD with three blocks and three treatments.

Within each treatment, data were collected at two points.

Data collection was done over two years. There were four and five data collection times from each point during the first and second years, respectively. Data collection was done every 30 days.

In addition, possible explanatory variables were collected, including BD, pH, TC, TN, CNR, WC, and TMP.

The WC and TMP data were measured concurrently with the response variable, while the rest were collected one time only.

I want to do a multiple linear regression analysis between the response variable and all the aforementioned explanatory variables for each treatment.

My main concerns are: 1) I have unbalanced data; 2) I have two groups of explanatory variables (collected one time only, and collected multiple times); 3) I am not sure how to proceed with the analysis (e.g., should I average data over the three blocks, or include block in the model, which I don't know how to do?).

I can do MLR analysis using PROC REG on data from non-RCBD experiments.

I have no experience with PROC AUTOREG (I suspect my data are autocorrelated) or PROC GLMSELECT.

If you have time and it’s not too much to ask, please express your suggestions in the form of comments and SAS code.

Thank you.

TD21

Respected Advisor
Posts: 2,655

Re: Multiple Linear Regression Analysis of Data from Field Expt. using RCBD

I guess I would think of this as the following:

Fixed class effects: treatment, point, year, time, and appropriate interactions to give the skeleton ANOVA for an RCBD

Fixed covariate effects: BD, pH, TC, TN, CNR, WC, and TMP

Random effect: block

The subject here (I think) is point within block within treatment.

My idea of a mixed model would look something like:

proc mixed;

class block treatment point year time; /* block must be a CLASS variable: it is used in RANDOM and in SUBJECT= */

model response = treatment*year*time BD pH TC TN CNR WC TMP / solution ddfm=kr(firstorder);

repeated time / subject=point(block*treatment) type=ar(1); /* May want to explore other covariance structures */

random block; /* and perhaps block*year if the blocks are not identically laid out over years */

run;

Note that this is a "means model".  Because of the unequal replication in time, least squares means for the main effect of treatment would not be estimable.  However, they could be constructed using LSMESTIMATE statements, but we can get to that if this approach seems logical.

Steve Denham

Occasional Contributor
Posts: 17

Re: Multiple Linear Regression Analysis of Data from Field Expt. using RCBD

Thanks Steve. This is the code I used for the ANOVA without the fixed covariate effects. I tried four covariance structures (CS, CSH, AR(1), and UN), and CSH gave the best fit statistics.

PROC Mixed DATA=WORK.NFldGrav_Final PLOTS(ONLY) = (ResidualPanel(Marginal));

CLASS Block Treat Point Year Time;

MODEL NM = Treat|Year|Point  Time(Year) Treat*Time(Year) Point*Time(Year);

RANDOM Block Block*Treat  Block*Treat*Point;

REPEATED Time/SUBJECT=Point(Block*Treat*Year) TYPE = CSH;

LSMEANS Treat*Year Treat*Year*Point Treat*Time(Year)/ADJUST=TUKEY SLICE=(Treat Year Point Time)CL;

LSMEANS Treat*Point*Time(Year)/ADJUST=TUKEY SLICE=(Treat Year Point Time)CL;

RUN;

The approach you suggested for the analysis that includes the fixed covariate effects seems logical. I would just like to know whether there is a way to detect collinearity using this approach.

While I was browsing for ideas on how to do the analysis with fixed covariates, I found two options: do a principal component analysis (PCA) first and then MLR, or do MLR only, with automatic selection and variance inflation factors to detect collinearity. However, I don't know how to do PCA in SAS, much less with RCBD data. I have experience doing MLR in SAS, but not with data from an RCBD experiment. So my approach would be to do MLR by treatment and point, with both the dependent variable and the covariates averaged over the three blocks. To illustrate:

Treatment 1, Point 1

PROC REG DATA = WORK.NField  PLOTS (ONLY) = (CP);

STEPWISE: MODEL NM = BD TN TC CNR WC TMP/ SELECTION = STEPWISE; RUN;

PROC REG DATA = WORK.NField PLOTS;

FULL:  MODEL NM = BD TN TC CNR WC TMP / VIF; RUN;

.

.

.

Treatment 3, Point 2

PROC REG DATA = WORK.NField  PLOTS (ONLY) = (CP);

STEPWISE: MODEL NM= BD TN TC CNR WC TMP/ SELECTION = STEPWISE; RUN;

PROC REG DATA = WORK.NField PLOTS;

FULL:  MODEL NM = BD TN TC CNR WC TMP / VIF; RUN;
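The per-group steps above could also be written once with BY-group processing instead of being repeated for each treatment and point. This is a minimal sketch, not the poster's code: it assumes WORK.NField holds all groups in one data set and contains grouping variables named Treat and Point (names borrowed from the PROC MIXED step; they may differ in the real data).

```sas
/* Sketch only: assumes WORK.NField contains the grouping variables
   Treat and Point (assumed names) alongside NM and the covariates. */
proc sort data=WORK.NField out=work.nfield_sorted;
  by Treat Point;
run;

proc reg data=work.nfield_sorted plots=none;
  by Treat Point;   /* one regression per treatment-by-point group */
  FULL: model NM = BD TN TC CNR WC TMP / vif;
run;
quit;
```

One thing to watch for: covariates that were measured only once take very few distinct values within a treatment-by-point group. If such a covariate is constant within a group, it is collinear with the intercept there, and PROC REG will report that the model is not of full rank for that group.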

I didn't proceed with this idea, because I am not sure how much information would be lost by averaging over the three blocks, or whether doing the analysis per treatment is a correct approach at all, on top of leaving time out of the equation. The bottom line is that I would like to know whether some of the fixed covariates (after eliminating collinear variables) or all of them significantly explain the variation in the dependent variable for each treatment, at each point within each treatment. Thanks.
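For the "PCA first, then MLR" option mentioned above, a minimal sketch in SAS could use PROC PRINCOMP followed by PROC REG on the component scores. The data set name WORK.NField and the covariate list are taken from the PROC REG steps; the output data set name and the choice to keep two components are purely illustrative assumptions, and the sketch ignores the block structure and repeated measures.

```sas
/* Sketch only: work.nfield_pc and the number of retained components
   are illustrative assumptions, not recommendations. */
proc princomp data=WORK.NField out=work.nfield_pc;
  var BD TN TC CNR WC TMP;   /* default analysis is on the correlation matrix */
run;

/* The OUT= data set holds the original variables plus the scores
   Prin1, Prin2, ... (default prefix), which are mutually uncorrelated. */
proc reg data=work.nfield_pc plots=none;
  PCR: model NM = Prin1 Prin2 / vif;
run;
quit;
```

The eigenvalue table from PROC PRINCOMP shows how much of the covariate variance each component carries, which is the usual basis for deciding how many scores to keep. The price is interpretability: each component is a blend of all six covariates.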

Respected Advisor
Posts: 2,655

Re: Multiple Linear Regression Analysis of Data from Field Expt. using RCBD

Well, collinearity among the covariates can be examined without accounting for the random effects or the fixed classification effects.  For collinearity purposes, you can consider those effects as affecting only the "intercept".  So to check on the covariates, I would just use PROC REG, and not even include all of the other factors.  Also, don't use stepwise methods; search this site and SAS-L for reasons not to.  LASSO might be OK, but it would be really hard to beat subject matter knowledge.  So what happens if you ran:

PROC REG DATA = WORK.NField PLOTS;

FULL:  MODEL NM = BD TN TC CNR WC TMP / VIF; RUN;

and none of the covariates looked like they were strongly correlated with the others?  I assume you would fit all of them.  Then in PROC MIXED you could look at the Type 3 tests of significance, and remove, en bloc, those that didn't look like they were of utility.  And if one or two did have large VIF values, you could eliminate them at the PROC REG step.
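As a rough sketch of that second step, the covariates that survive the VIF screen could be added to the ANOVA model posted earlier in the thread. This assumes the covariates are present in WORK.NFldGrav_Final (the thread does not confirm this) and keeps the CSH structure only because it was reported as the best fit without covariates; DDFM=KR is likewise an assumption carried over from the earlier suggestion.

```sas
/* Sketch only: assumes BD, TN, TC, CNR, WC and TMP exist in
   WORK.NFldGrav_Final; the covariance structure may need re-checking
   once covariates are in the model. */
proc mixed data=WORK.NFldGrav_Final;
  class Block Treat Point Year Time;
  model NM = Treat|Year|Point Time(Year) Treat*Time(Year) Point*Time(Year)
             BD TN TC CNR WC TMP / solution ddfm=kr;
  random Block Block*Treat Block*Treat*Point;
  repeated Time / subject=Point(Block*Treat*Year) type=csh;
run;
```

PROC MIXED prints the "Type 3 Tests of Fixed Effects" table by default, and the SOLUTION option adds the estimated slope for each covariate.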

But it comes down to the substantive questions that you are trying to answer with the analysis.  Simply coming up with a "best-fitting" model is probably not the objective of the study.
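If a data-driven screen is still wanted alongside subject matter knowledge, the LASSO option mentioned above could be sketched with PROC GLMSELECT. The data set name follows the PROC REG step; the seed value and the five-fold cross validation are arbitrary assumptions. Note that this treats the observations as independent, so it ignores blocks and repeated measures and is at best a screening aid.

```sas
/* Sketch only: the seed and the number of CV folds are arbitrary choices. */
proc glmselect data=WORK.NField seed=20240 plots=coefficients;
  model NM = BD TN TC CNR WC TMP
        / selection=lasso(choose=cv stop=none) cvmethod=random(5);
run;
```

Any covariates retained this way would still need to be carried into the mixed model and judged there.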

Steve Denham

Occasional Contributor
Posts: 17

Re: Multiple Linear Regression Analysis of Data from Field Expt. using RCBD

Thanks Steve.

TD21
