Multiple Linear Regression Analysis of Data from F...

08-24-2014 03:20 PM

I did a field experiment using RCBD with three blocks and three treatments.

At each treatment, data were collected at two points.

Data collection was done over two years. There were four and five data collection times from each point during the first and second years, respectively. Data collection was done every 30 days.

In addition, possible explanatory variables were collected including B, P, C, N, R, WC, and TM.

The WC and TM data were measured concurrent with collection of the response variable, while the rest were collected one time only.

I want to do a multiple linear regression analysis between the response variable and all the aforementioned explanatory variables for each treatment.

My main concerns are: **1)** I have unbalanced data, **2)** I have two groups of explanatory variables (collected one time only, and collected multiple times), **3)** I am not sure how to proceed with the analysis (e.g., should I average data over the three blocks or include block in the model, which I don't know how to do?).

I can do MLR analysis using PROC REG on data from non-RCBD experiments.

I have no experience using PROC AUTOREG (I suspect my data are autocorrelated) or PROC GLMSELECT.

If you have time and it’s not too much to ask, please express your suggestions in the form of comments and SAS code.

Thank you.

TD21

Posted in reply to TD21

08-27-2014 11:20 AM

I guess I would think of this as the following:

Fixed class effects: treatment, point, year, time, and appropriate interactions to give the skeleton ANOVA for an RCBD

Fixed covariate effects: BD, pH, TC, TN, CNR, WC, and TMP

Random effect: block

The subject here (I think) is point within block within treatment.

My idea of a mixed model would look something like:

proc mixed;

class block treatment point year time; /* block is a classification variable: needed for RANDOM block and the SUBJECT= effect */

model response = treatment*year*time BD pH TC TN CNR WC TMP / solution ddfm=kr(firstorder);

repeated time / subject=point(block*treatment) type=ar(1); /* May want to explore other covariance structures */

random block; /* and perhaps block*year if the blocks are not identically laid out over years */

run;

Note that this is a "means model". Because of the unequal replication in time, least squares means for the main effect of treatment would not be estimable. However, they could be constructed using LSMESTIMATE statements; we can get to that if this approach seems logical.
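As a rough sketch of the LSMESTIMATE idea, a treatment mean can be built by averaging the treatment*year*time cell means that actually exist (four times in year 1, five in year 2). Several things here are assumptions rather than anything confirmed in the thread: the dataset name `mydata` is hypothetical, the levels are assumed to sort as treatment 1 to 3, year 1 to 2, and time 1 to 5 within year, and the subject effect includes year so that time levels are not duplicated within a subject. The `E` option prints the coefficients so they can be checked against the real level ordering.

```sas
/* Sketch only: mydata is a hypothetical dataset name */
proc mixed data=mydata;
   class block treatment point year time;
   model response = treatment*year*time BD pH TC TN CNR WC TMP
         / solution ddfm=kr(firstorder);
   /* Assumption: time is coded 1-4 (year 1) and 1-5 (year 2),
      so year is part of the subject effect */
   repeated time / subject=point(block*treatment*year) type=ar(1);
   random block;
   /* Treatment 1 mean: average of its 9 observed year-by-time cells.
      Each bracket is [coefficient, treatment year time], with levels
      given as ordinal positions in the CLASS level ordering. */
   lsmestimate treatment*year*time 'Treatment 1 mean'
      [1, 1 1 1] [1, 1 1 2] [1, 1 1 3] [1, 1 1 4]
      [1, 1 2 1] [1, 1 2 2] [1, 1 2 3] [1, 1 2 4] [1, 1 2 5]
      / divisor=9 e cl;
run;
```

The same pattern, with the first level position changed to 2 or 3, would give the other treatment means; differences between treatments could be written by mixing positive and negative coefficients in one statement.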

Steve Denham

Posted in reply to SteveDenham

08-28-2014 01:49 PM

Thanks Steve. This is the code I used for the ANOVA without the fixed covariate effects. I tried four covariance structures (CS, CSH, AR(1), and UN), and CSH gives the best fit statistics.

PROC MIXED DATA=WORK.NFldGrav_Final PLOTS(ONLY) = (ResidualPanel(Marginal));

CLASS Block Treat Point Year Time;

MODEL NM = Treat|Year|Point Time(Year) Treat*Time(Year) Point*Time(Year);

RANDOM Block Block*Treat Block*Treat*Point;

REPEATED Time / SUBJECT=Point(Block*Treat*Year) TYPE=CSH;

LSMEANS Treat*Year Treat*Year*Point Treat*Time(Year) / ADJUST=TUKEY SLICE=(Treat Year Point Time) CL;

LSMEANS Treat*Point*Time(Year) / ADJUST=TUKEY SLICE=(Treat Year Point Time) CL;

RUN;
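One way to compare the four covariance structures side by side, sketched here under some assumptions, is to refit the same model with each TYPE= value and stack the fit statistics tables. The macro name `fit_cov` and the dataset names `fit_cs`, `fit_csh`, `fit_ar1`, `fit_un`, and `fit_all` are made up for illustration; the model statements are copied from the code above.

```sas
/* Hypothetical helper: refit the model with a given covariance structure
   and save the fit statistics table under a chosen dataset name */
%macro fit_cov(type, label);
   ods output FitStatistics=fit_&label;
   proc mixed data=WORK.NFldGrav_Final;
      class Block Treat Point Year Time;
      model NM = Treat|Year|Point Time(Year) Treat*Time(Year) Point*Time(Year);
      random Block Block*Treat Block*Treat*Point;
      repeated Time / subject=Point(Block*Treat*Year) type=&type;
   run;
%mend fit_cov;

%fit_cov(cs,    cs)
%fit_cov(csh,   csh)
%fit_cov(ar(1), ar1)
%fit_cov(un,    un)

/* Stack the tables and record which fit each row came from */
data fit_all;
   length structure $41;
   set fit_cs fit_csh fit_ar1 fit_un indsname=src;
   structure = src;
run;

proc print data=fit_all noobs;
   var structure Descr Value;
run;
```

If one of the structures fails to converge, its fit statistics table may not be created and the stacking step would need that dataset removed from the SET statement.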

The approach you mentioned for the analysis that includes the fixed covariate effects seems logical. I would just like to know if there is a way to detect collinearity using this approach.

While I was browsing for ideas on how to do the analysis with fixed covariates, I found that one solution is to do principal component analysis first and then MLR, or MLR only, with automatic selection and variance inflation factors to detect collinearity. However, I don't know how to do PCA in SAS, much less with an RCBD. I have experience doing MLR in SAS, but not with data from an RCBD experiment. So my approach would be to do MLR by treatment and point, with both the dependent variable and the covariates averaged over the three blocks. To illustrate:

**Treatment 1, Point 1**

PROC REG DATA = WORK.NField PLOTS (ONLY) = (CP);

STEPWISE: MODEL NM = BD TN TC CNR WC TMP/ SELECTION = STEPWISE; RUN;

PROC REG DATA = WORK.NField PLOTS;

FULL: MODEL NM = BD TN TC CNR WC TMP / VIF; RUN;

.

.

.

**Treatment 3, Point 2**

PROC REG DATA = WORK.NField PLOTS (ONLY) = (CP);

STEPWISE: MODEL NM= BD TN TC CNR WC TMP/ SELECTION = STEPWISE; RUN;

PROC REG DATA = WORK.NField PLOTS;

FULL: MODEL NM = BD TN TC CNR WC TMP / VIF; RUN;

I didn't proceed with this idea because I am not sure how much information would be lost by averaging over the three blocks, or whether doing the analysis per treatment is even a correct approach, in addition to taking time out of the equation. The bottom line is that I would just like to know whether some (after eliminating collinear variables) or all of the fixed covariate effects significantly explain the variation in the dependent variable for each treatment at each point within each treatment. Thanks.
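For the PCA-then-MLR route mentioned above, a minimal sketch might run PROC PRINCOMP on the covariates and then regress the response on the leading component scores. The output dataset name `pc_scores` and the choice of two components are assumptions for illustration; this ignores the block, treatment, and time structure entirely, so it only shows the mechanics.

```sas
/* Principal components of the covariates
   (computed from the correlation matrix by default) */
proc princomp data=WORK.NField out=pc_scores;
   var BD TN TC CNR WC TMP;
run;

/* Regress the response on the first two component scores.
   Prin1 and Prin2 are the default score names written to OUT=. */
proc reg data=pc_scores;
   model NM = Prin1 Prin2;
run;
quit;
```

Because the component scores are uncorrelated with one another, collinearity among the regressors is no longer an issue, at the cost of coefficients that are harder to interpret in terms of the original variables.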

Posted in reply to TD21

08-28-2014 03:05 PM

Well, collinearity can be examined without treating the random effects as random and without the fixed classification effects. For collinearity purposes, you can consider them as affecting only the "intercept". So to check on the covariates, I would just use PROC REG and not even include the other factors. Also, don't use stepwise methods; search this site and SAS-L for reasons not to. LASSO might be OK, but it would be really hard to beat subject matter knowledge. So what happens if you run:

PROC REG DATA = WORK.NField PLOTS;

FULL: MODEL NM = BD TN TC CNR WC TMP / VIF; RUN;

and none of the covariates looked like they were strongly correlated with the others? I assume you would fit them all. Then in PROC MIXED you could look at the Type 3 tests of significance and remove, en bloc, those that didn't look useful. And if one or two did have large VIF values, you could eliminate them at the PROC REG step.
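A slightly fuller version of that collinearity check, offered as a sketch, could pair the VIF output with pairwise correlations and the collinearity diagnostics PROC REG already provides. The dataset and variable names are the ones used above; what counts as a "large" VIF is a judgment call (values above about 10 are a commonly quoted rule of thumb, not a hard threshold).

```sas
/* Pairwise correlations among the covariates */
proc corr data=WORK.NField;
   var BD TN TC CNR WC TMP;
run;

/* Variance inflation factors, tolerances, and condition indices
   (COLLINOINT leaves the intercept out of the diagnostics) */
proc reg data=WORK.NField;
   FULL: model NM = BD TN TC CNR WC TMP / vif tol collinoint;
run;
quit;
```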

But it comes down to the substantive questions that you are trying to answer with the analysis. Simply coming up with a "best-fitting" model is probably not the objective of the study.

Steve Denham

Posted in reply to SteveDenham

08-28-2014 03:45 PM

Thanks Steve.

TD21