OLS residual from PROC MIXED

05-16-2013 05:55 PM

Hi everyone,

I am trying to fit a geostatistical model to spatial data, **Z(s) = U + e(s)**, where Z(s) denotes the yield measured at a fixed location s (with easting and northing as the coordinate system) in a spatial domain D, U is the mean, and e(s) is the residual error at location s. This is an intercept-only (mean) model. The large-scale variation is attributed to U, and the small-scale (microscale) variation is attributed to the stochastic component e(s).

The data are spatial, and the residual term is spatially autocorrelated. I am trying to use the linear mixed model in SAS to account for the spatial dependence of the error structure, using:

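%macro spatialcov(cov=);

/* Intercept-only model; &cov selects the spatial covariance structure */

proc mixed data=spatial method=reml;

model yield=;

repeated /subject=intercept type=sp(&cov) (easting northing);

run;

quit;

%mend spatialcov;

/* Fit the spherical, exponential, and Gaussian structures */

%spatialcov(cov=sph);

%spatialcov(cov=exp);

%spatialcov(cov=gau);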

I intend to use the PARMS statement to provide initial starting values for the geostatistical parameters (range, sill). I read that initial values of these parameters can be obtained by inspecting the variogram of the OLS residuals.
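For illustration, a minimal sketch of how starting values might be passed with PARMS. The numbers are hypothetical placeholders, not values derived from this data set, and the parameter order is an assumption that should be checked against the Covariance Parameter Estimates table of a first run.

```sas
/* Sketch only: the PARMS values below are hypothetical placeholders. */
proc mixed data=spatial method=reml;
   model yield = ;
   /* Without LOCAL there is no nugget: the covariance parameters are */
   /* the range (SP(SPH)) followed by the residual variance (sill).   */
   repeated / subject=intercept type=sp(sph) (easting northing);
   parms (50) (0.8);   /* hypothetical: range = 50 m, sill = 0.8 */
run;
```

If the LOCAL option is added to the REPEATED statement to include a nugget effect, a third covariance parameter appears, so PARMS then needs three values in the order shown in that table.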

My question is: can we obtain the OLS residuals from PROC MIXED by using **model yield=/outp=residat;** and examining the data set residat, without the REPEATED statement given above? My worry is that the MIXED procedure uses a GLS (generalized least squares) approach to estimate the parameters, even though the residuals are assumed to be independent as long as the REPEATED statement is not used. Am I right to treat the residuals from **model yield=/outp=residat** as OLS residuals? I want to fit a variogram model to the residuals to get starting values for the geostatistical parameters (sill, range, nugget). Does anyone have another way of determining starting values for the sill, range, and nugget? Please help me out.

Thank you.
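On the OLS question: with no RANDOM or REPEATED statement, the assumed covariance matrix is proportional to the identity, so the GLS estimate of the fixed effects coincides with the OLS estimate. For an intercept-only model that estimate is the sample mean. A minimal sketch of a cross-check, assuming the data set and variable names used above (the name max_abs_diff is made up for illustration):

```sas
/* OUTP= writes the original variables plus Pred and Resid. */
proc mixed data=spatial method=reml;
   model yield = / outp=residat;
run;

/* Compare Resid with yield minus the overall sample mean. */
proc sql;
   select max(abs(r.Resid - (r.yield - m.mean_yield))) as max_abs_diff
   from residat as r,
        (select mean(yield) as mean_yield from spatial) as m;
quit;
```

A max_abs_diff that is zero up to rounding error would indicate that the Resid column matches the OLS residuals for this model.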

Accepted Solutions

06-10-2013 09:58 AM

Since you don't have any fixed effects in your model (other than the intercept), you can use VARIOGRAM directly on your data (you don't need OLS residuals).

See my comment in the other post regarding memory limitations in 9.3 (or earlier versions). You probably need to look at subsets of your data.
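As a rough illustration of running VARIOGRAM directly on the yield values, the sketch below first inspects the pairwise distances and then computes the empirical semivariogram. The data set and variable names follow the original post. NHCLASSES=, LAGD= and MAXLAG= are placeholder values chosen for illustration, and empvar is a hypothetical output data set name.

```sas
/* Step 1: look at the distribution of pairwise distances to help */
/* choose a lag distance and a number of lags.                     */
proc variogram data=spatial;
   compute novariogram nhclasses=30;
   coordinates xc=easting yc=northing;
   var yield;
run;

/* Step 2: empirical semivariogram of the raw yield values.       */
/* LAGD= and MAXLAG= are placeholders; set them from step 1.      */
proc variogram data=spatial outvar=empvar;
   compute lagd=10 maxlag=30;
   coordinates xc=easting yc=northing;
   var yield;
run;
```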

All Replies

05-16-2013 08:12 PM

I'm curious why you want to use a general mixed models approach instead of PROC KRIGE2D, which is specialized for spatial models and spatial covariance? See this paper for some description and examples:

Everything In Its Place: Efficient Geostatistical Analysis with SAS/STAT Spatial Procedures

05-17-2013 01:46 PM

Thank you, Rick, for your response. I thought PROC KRIGE2D was mainly needed for interpolation at a later stage of the analysis. That is a good suggestion, and I will read through the paper for a better understanding. Meanwhile, I would like to know whether the residuals obtained through the statement model yield=/outp=residat in PROC MIXED, without a REPEATED statement, can be referred to as OLS residuals. Without a REPEATED statement, PROC MIXED assumes that the residual error term is independent and normally distributed with a mean of zero and constant variance. Please clarify this for me.

06-09-2013 07:29 PM

Dear Rick,

I could not see how to use PROC KRIGE2D to fit spatial structures such as spherical, exponential, Gaussian, power, linear, or linear log. It seems the REML algorithm in SAS cannot handle my spatial data set of 12,009 observations. Modeling the residual R structure gives a 12009 x 12009 variance-covariance matrix. The error message I received is:

Unable to allocate sufficient memory: a request for 1126689K bytes exceeded the 392454K available. Note that the deficit amount may not be the amount of memory needed for a successful run, since it does not reflect subsequent allocations by this or other processes.

ERROR: The SAS System stopped processing this step because of insufficient memory

What do you advise? I got this message despite running the program on a system with 512 GB of RAM. Does this mean SAS cannot handle the data?
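A quick arithmetic check suggests where the requested amount comes from. Assuming one dense 12009 x 12009 matrix of 8-byte numeric values (an assumption about the internals, not something the log states), the size works out to roughly 1,126,688K, which lines up with the 1126689K request in the message:

```sas
/* Size of one dense n x n matrix of 8-byte values, in kilobytes. */
data _null_;
   n = 12009;
   kbytes = n * n * 8 / 1024;
   put kbytes= comma14.;
run;
```

The 392454K reported as available is what the SAS session could allocate at that point. That is typically governed by session settings such as the MEMSIZE system option rather than by the RAM installed on the machine.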

06-11-2013 11:56 AM

Dear lvm,

I have decided to work on a subset of the data, rather than the whole data set, when fitting the spatial structure. I have two years of data from the same location, that is, two environments. Each environment has at least 12,000 observations, with easting, northing, and yield as attributes. Easting and northing are projected coordinates in metres, converted from longitude and latitude. I intend to take a subset of data from the same geographical area in each environment for spatial modeling before the combined analysis. I thought of using an IF statement to define the subset. It is like taking a field plot and analyzing the observations from one particular region of the field. Please let me know if there is a better approach.

Thank you.

06-11-2013 02:45 PM

You can create a subset of your data (from a randomly selected intact region) in a separate DATA step and then analyze this data set. Or you could randomly sample the locations from the entire data set, possibly using SURVEYSELECT to do the selections. I can see both approaches. The problem with limiting your analysis to an intact section of the field is that you will not have any information on spatial associations over large distances (by definition, you will not be considering the largest distance lags if you are only looking at a section). Of course, it is possible that you will find no spatial correlations at the largest distances. The problem with a random sample of all the locations is that you may not have sufficient numbers of locations at a given distance to obtain precise parameter estimates. I would consider doing the analysis on multiple sections or multiple samplings to see if your results are consistent.
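The two subsetting approaches described above could be sketched as follows. The coordinate bounds, sample size, and seed are hypothetical placeholders, and section and sample are made-up data set names.

```sas
/* Option A: an intact rectangular section of the field.  */
/* The easting/northing bounds are hypothetical.          */
data section;
   set spatial;
   where (500000 <= easting  <= 500200)
     and (4100000 <= northing <= 4100200);
run;

/* Option B: a simple random sample of locations from the */
/* whole field. SAMPSIZE= and SEED= are placeholders.     */
proc surveyselect data=spatial out=sample
                  method=srs sampsize=2000 seed=20130611;
run;
```

To check consistency across several samplings, the REPS= option of SURVEYSELECT draws multiple samples in one call and identifies them with a Replicate variable in the output data set.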

You could thus do most of your analysis with VARIOGRAM (you don't need to do this with residuals when your model is only a constant for the fixed effects). I am guessing that you can have much larger data sets with VARIOGRAM. Plus, you have to be careful in using REML or ML (the MIXED approach for the spatial parameters) for fitting spatial covariances (or semi-variances). Those large spatial lags with few observations can be too influential (the spatial covariance parameter estimates may be poor, unless you have good starting values). Note that MIXED uses GLS for the fixed effects, but REML/ML for the random effects (which includes the terms in a REPEATED statement). This is all very nicely described in the spatial analysis chapter in Littell et al. (2006), SAS for Mixed Models, 2nd edition. Weighted least squares (as in VARIOGRAM) is useful for this analysis.

There is trial and error in getting good starting values for covariance/semi-variance models (sills, etc.). Schabenberger & Pierce (2002), Contemporary Statistical Models for the Plant and Soil Sciences, has a very good description of this. But also check out the VARIOGRAM User's Guide; recent versions of the proc have very good modeling capabilities.
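One way to get candidate starting values is to let VARIOGRAM fit theoretical models to the empirical semivariogram. This is only a sketch: it assumes a release in which PROC VARIOGRAM has a MODEL statement, and the LAGD= and MAXLAG= values are placeholders.

```sas
/* Fit spherical, exponential and Gaussian forms to the   */
/* empirical semivariogram and compare them.              */
proc variogram data=spatial;
   compute lagd=10 maxlag=30;            /* placeholder lag settings */
   coordinates xc=easting yc=northing;
   var yield;
   model form=auto(mlist=(sph, exp, gau) nest=1);
run;
```

The fitted scale, range and nugget estimates in the output could then serve as starting values for PARMS in MIXED, after matching them to the order of the MIXED covariance parameters.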

06-11-2013 02:50 PM

Also, check out this from the SAS Global Forum:

http://support.sas.com/resources/papers/proceedings10/337-2010.pdf