Moshood
Fluorite | Level 6


Hi everyone,

I am trying to fit a geostatistical model to spatial data, Z(s) = μ + e(s), where Z(s) denotes the yield measured at fixed location s (with easting and northing as the coordinate system) in a spatial domain D, μ is the mean, and e(s) is the residual error at location s. This is an intercept-only (mean) model: the large-scale variation is attributed to μ, and the small-scale (microscale) variation is attributed to the stochastic component e(s).

These are spatial data with spatially autocorrelated residuals. I am trying to use the linear mixed model in SAS (PROC MIXED) to account for the spatial dependence in the error structure, using the following macro:

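%macro spatialcov(cov=);
   proc mixed data=spatial method=reml;
      model yield=;   /* intercept-only (mean) model */
      /* one subject for the whole field; spatial covariance in easting/northing */
      repeated / subject=intercept type=sp(&cov)(easting northing);
   run;
%mend spatialcov;

/* the calls must use the macro's name, spatialcov */
%spatialcov(cov=sph);   /* spherical   */
%spatialcov(cov=exp);   /* exponential */
%spatialcov(cov=gau);   /* Gaussian    */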
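For reference, the three structures the macro cycles through can be written out explicitly. The sketch below is a Python illustration (not SAS code) of the SP(SPH), SP(EXP) and SP(GAU) covariance functions in the σ², ρ parameterization as I understand it from the PROC MIXED documentation, plus the semivariance γ(d) = nugget + σ² − C(d) that a variogram plot shows. The function names are hypothetical. Note that ρ does not mean the same "range" in all three: the spherical covariance reaches zero at d = ρ, while the exponential and Gaussian models only drop to about 5% of σ² near d = 3ρ and d = √3·ρ, so a range read off a variogram would need converting before it goes into PARMS. (A nugget is not part of these functions; in PROC MIXED it is usually added with the LOCAL option of the REPEATED statement.)

```python
import math

# Covariance functions with partial sill sigma2 and range parameter rho,
# following the SP(SPH), SP(EXP) and SP(GAU) forms.
def cov_sph(d, sigma2, rho):
    """Spherical: exactly zero for d >= rho."""
    if d >= rho:
        return 0.0
    r = d / rho
    return sigma2 * (1.0 - 1.5 * r + 0.5 * r ** 3)

def cov_exp(d, sigma2, rho):
    """Exponential: never exactly zero; about 5% of sigma2 at d = 3*rho."""
    return sigma2 * math.exp(-d / rho)

def cov_gau(d, sigma2, rho):
    """Gaussian: about 5% of sigma2 at d = sqrt(3)*rho."""
    return sigma2 * math.exp(-(d / rho) ** 2)

def semivariance(cov, d, sigma2, rho, nugget=0.0):
    """gamma(d) = nugget + sigma2 - C(d) for d > 0; gamma(0) = 0 by definition."""
    if d == 0:
        return 0.0
    return nugget + sigma2 - cov(d, sigma2, rho)

# Compare the three shapes for a hypothetical sigma2 = 2 and rho = 10 m.
for d in (0.0, 5.0, 10.0, 20.0, 30.0):
    print(d, [round(semivariance(f, d, 2.0, 10.0, nugget=0.5), 3)
              for f in (cov_sph, cov_exp, cov_gau)])
```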

I intend to use the PARMS statement to provide initial starting values for the geostatistical parameters (range, sill). I read that initial values of these parameters can be obtained by inspecting a variogram of the OLS residuals.
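A minimal sketch of what "inspecting a variogram" involves: the classical (Matheron) estimator averages half the squared differences between all pairs of observations whose separation falls in the same distance bin. It is written in Python purely for illustration; the bin width `lag` and the number of bins `nlags` are assumptions you would choose from the field dimensions (roughly what the LAGDISTANCE= and MAXLAGS= options control in PROC VARIOGRAM). The double loop visits every pair, about 72 million for 12,009 observations, which is why this is usually done on a subset or with a dedicated procedure.

```python
import math

def empirical_semivariogram(points, values, lag, nlags):
    """Matheron estimator: gamma(h) = sum (z_i - z_j)^2 / (2 N(h)) per lag bin.

    points: list of (easting, northing); values: list of yields.
    A pair at distance d goes into bin k = floor(d / lag); pairs beyond
    nlags * lag are ignored. Returns (mean distance, gamma, number of pairs)
    for each non-empty bin, in order of distance.
    """
    sums = [0.0] * nlags
    dists = [0.0] * nlags
    counts = [0] * nlags
    n = len(points)
    for i in range(n):
        xi, yi = points[i]
        for j in range(i + 1, n):
            xj, yj = points[j]
            d = math.hypot(xi - xj, yi - yj)
            k = int(d // lag)
            if k < nlags:
                sums[k] += (values[i] - values[j]) ** 2
                dists[k] += d
                counts[k] += 1
    return [(dists[k] / counts[k], sums[k] / (2 * counts[k]), counts[k])
            for k in range(nlags) if counts[k] > 0]
```

Plotting gamma against mean distance gives the picture from which nugget (intercept), sill (plateau) and range (where the plateau starts) are eyeballed.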

My question: can I obtain OLS residuals from PROC MIXED by running it without the REPEATED statement above, using model yield= / outp=residat; and examining the residat data set? My worry is that PROC MIXED estimates parameters by generalized least squares (GLS), even when the residuals are assumed to be independent (that is, when no REPEATED statement is used). Is it correct to treat the residuals from model yield= / outp=residat; as OLS residuals? I want to fit a variogram model to these residuals to obtain initial starting values for the geostatistical parameters (sill, range, nugget). Does anyone know another way to determine starting values for the sill, range, and nugget? Any help is appreciated.

Thank you.
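On the OLS question: with no RANDOM or REPEATED statement, the marginal covariance in PROC MIXED is V = σ²I, and the GLS estimator μ̂ = (1′V⁻¹1)⁻¹ 1′V⁻¹z for an intercept-only model collapses to the sample mean, which is exactly the OLS estimate. Under that setup the OUTP= residuals should therefore be ordinary z − z̄ residuals. The Python sketch below (hypothetical helper names, with a small hand-written solver to stay within the standard library) computes the GLS mean for an arbitrary V and shows that it matches the OLS mean when V = σ²I but not when the errors are correlated.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def gls_mean(z, v):
    """GLS estimate of mu in z = mu*1 + e with Var(e) = V (V symmetric)."""
    w = solve(v, [1.0] * len(z))  # w = V^-1 1
    return sum(wi * zi for wi, zi in zip(w, z)) / sum(w)

z = [4.0, 5.0, 9.0]
ols = sum(z) / len(z)                                  # OLS mean = 6.0
indep = [[2.5 if i == j else 0.0 for j in range(3)] for i in range(3)]
corr = [[0.5 ** abs(i - j) for j in range(3)] for i in range(3)]
print(ols, gls_mean(z, indep), gls_mean(z, corr))     # correlated case gives about 6.2
print([zi - ols for zi in z])                          # OLS residuals
```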

Accepted solution: lvm's first reply below.

Rick_SAS
SAS Super FREQ

I'm curious why you want to use a general mixed-model approach instead of PROC KRIGE2D, which is specialized for spatial models and spatial covariance. See this paper for a description and examples:

Everything In Its Place: Efficient Geostatistical Analysis with SAS/STAT Spatial Procedures

Moshood
Fluorite | Level 6

Thank you, Rick, for your response. I thought PROC KRIGE2D was mainly needed for interpolation at a later stage of the analysis. That is a good suggestion, and I will read through the paper for a better understanding. Meanwhile, I would like to know whether the residuals obtained with the statement model yield= / outp=residat; in PROC MIXED can be referred to as OLS residuals when no REPEATED statement is used. Without a REPEATED statement, PROC MIXED assumes that the residual errors are independent and normally distributed with mean zero and constant variance. Please clarify this for me.

Moshood
Fluorite | Level 6

Dear Rick,

I could not see how to use PROC KRIGE2D to fit spatial structures such as spherical, exponential, Gaussian, power, linear, and linear-log. It also seems that the REML algorithm in SAS cannot handle my spatial data set of 12,009 observations: modeling the residual (R) structure gives a 12009 × 12009 variance-covariance matrix. The error message I received is:

Unable to allocate sufficient memory: a request for 1126689K bytes exceeded the 392454K available. Note that the deficit amount may not be the amount of memory needed for a successful run, since it does not reflect subsequent allocations by this or other processes.
ERROR: The SAS System stopped processing this step because of insufficient memory.


What do you advise? I got this message despite running the program on a system with 512 GB of RAM. Does this mean SAS cannot handle the data?
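The size of the failed request can be reproduced by hand, which suggests (an inference, not something the log states) that it corresponds to one dense 12009 × 12009 matrix of 8-byte doubles: 12009² × 8 / 1024 ≈ 1,126,689 K. Because that matrix grows with n², halving the number of observations cuts it to a quarter, which is one reason working on subsets helps. The 392454K figure is what SAS reported as available to the step; it may be limited by the session's memory settings rather than by the machine's physical RAM. The Python sketch below just does the arithmetic; the 2000-point subset is a hypothetical example.

```python
import math

def dense_matrix_kib(n, bytes_per_value=8):
    """KiB needed for one dense n x n matrix of 8-byte doubles (rounded up)."""
    return math.ceil(n * n * bytes_per_value / 1024)

def max_n_single_matrix(available_kib, bytes_per_value=8):
    """Largest n whose single dense n x n matrix fits in available_kib.

    This is an optimistic upper bound: the procedure needs more than one
    matrix of this size, plus other work space."""
    return math.isqrt(available_kib * 1024 // bytes_per_value)

print(dense_matrix_kib(12009))      # 1126689, the amount in the error message
print(dense_matrix_kib(2000))       # 31250 for a hypothetical 2000-point subset
print(max_n_single_matrix(392454))  # 7087
```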

lvm
Rhodochrosite | Level 12

Since you don't have any fixed effects in your model (other than the intercept), you can use VARIOGRAM directly on your data (you don't need OLS residuals).

See my comment in the other post regarding memory limitations in 9.3 (or earlier versions). You probably need to look at subsets of your data.
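Why the raw data suffice here: each semivariance term is half a squared difference, ½(z_i − z_j)², and subtracting the same fitted mean from every observation cancels in that difference. The Python sketch below, with made-up coordinates and yields, computes the variogram cloud (every pair's distance and semivariance) for the raw yields and for the intercept-only OLS residuals and finds them identical up to rounding. With genuine fixed effects, such as a trend in easting and northing, the fitted values differ between locations and this no longer holds.

```python
import math

def variogram_cloud(points, values):
    """All pairs as (distance, 0.5 * (z_i - z_j)^2): the raw variogram cloud."""
    out = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            out.append((d, 0.5 * (values[i] - values[j]) ** 2))
    return out

# Hypothetical locations (metres) and yields.
points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
yields = [5.2, 6.1, 4.8, 7.0]

mean = sum(yields) / len(yields)
resid = [z - mean for z in yields]   # OLS residuals of the intercept-only model

raw = variogram_cloud(points, yields)
cen = variogram_cloud(points, resid)
for (d, g_raw), (_, g_res) in zip(raw, cen):
    print(round(d, 2), round(g_raw, 4), round(g_res, 4))
```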

Moshood
Fluorite | Level 6

Dear lvm,

I have decided to work on a subset of the data rather than the whole data set when fitting the spatial structure. I have two years of data from the same location, that is, two environments. Each environment has at least 12,000 observations, with easting, northing, and yield as attributes. Easting and northing are projected coordinates in metres, converted from longitude and latitude. I intend to take the subset of data from the same geographical area in each environment for spatial modeling before the combined analysis. I thought of using an IF statement to define the subset. It is like considering a field plot and taking observations from one particular region of the field for analysis. Please let me know if there is a better approach.

Thank you.
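The IF-statement idea amounts to a rectangular window on the projected coordinates, applied identically to both environments so the two subsets cover the same ground. The Python sketch below shows that filter; the window limits and the three-column row layout (easting, northing, yield) are hypothetical, and in SAS the same condition would go in a DATA step subsetting IF on easting and northing.

```python
def window_subset(rows, e_min, e_max, n_min, n_max):
    """Keep (easting, northing, yield) rows inside a rectangular window.

    Same condition a DATA step subsetting IF on easting and northing would apply."""
    return [r for r in rows
            if e_min <= r[0] <= e_max and n_min <= r[1] <= n_max]

# Hypothetical data: one list per environment (year), coordinates in metres.
year1 = [(100.0, 200.0, 5.1), (150.0, 250.0, 4.9), (400.0, 900.0, 6.3)]
year2 = [(110.0, 210.0, 5.6), (420.0, 880.0, 6.0)]

# Apply the same window to both years.
window = dict(e_min=0.0, e_max=200.0, n_min=0.0, n_max=300.0)
sub1 = window_subset(year1, **window)
sub2 = window_subset(year2, **window)
print(len(sub1), len(sub2))
```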

lvm
Rhodochrosite | Level 12

You can create a subset of your data (from a randomly selected intact region) in a separate DATA step and then analyze this data set. Or you could randomly sample the locations from the entire data set, possibly using SURVEYSELECT to do the selections. I can see merits in both approaches. The problem with limiting your analysis to an intact section of the field is that you will not have any information on spatial associations over large distances (by definition, you will not be considering the largest distance lags if you are only looking at a section). Of course, it is possible that you will find no spatial correlations at the largest distances. The problem with a random sample of all the locations is that you may not have sufficient numbers of locations at a given distance to obtain precise parameter estimates. I would consider doing the analysis on multiple sections or multiple samplings to see if your results are consistent.
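The concern about random samples can be checked directly by counting how many location pairs land in each distance bin for a few independent draws. The Python sketch below draws simple random samples without replacement (the same idea as SURVEYSELECT with METHOD=SRS, though this is not its implementation) from a hypothetical 20 × 20 grid at 5 m spacing and tabulates pairs per 10 m lag; sparse bins flag lags where the semivariance estimate will be imprecise.

```python
import math
import random

def srs_locations(rows, n, seed):
    """Simple random sample of n rows without replacement; a fixed seed
    makes the draw repeatable."""
    return random.Random(seed).sample(rows, n)

def pairs_per_lag(rows, lag, nlags):
    """Number of location pairs in each distance bin [k*lag, (k+1)*lag)."""
    counts = [0] * nlags
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            k = int(math.dist(rows[i][:2], rows[j][:2]) // lag)
            if k < nlags:
                counts[k] += 1
    return counts

# Hypothetical field: 20 x 20 grid, 5 m spacing, rows are (easting, northing, yield).
rows = [(5.0 * i, 5.0 * j, 0.0) for i in range(20) for j in range(20)]

# Several independent samplings, to see whether pair counts (and later the
# fitted parameters) are consistent from draw to draw.
for seed in (1, 2, 3):
    sample = srs_locations(rows, 100, seed)
    print(seed, pairs_per_lag(sample, 10.0, 10))
```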

You could thus do most of your analysis with VARIOGRAM (you don't need to do this with residuals when your model is only a constant for the fixed effects). I am guessing that you can have much larger data sets with VARIOGRAM. Plus, you have to be careful in using REML or ML (the MIXED approach for the spatial parameters) for fitting spatial covariances (or semi-variances). Those large spatial lags with few observations can be too influential (the spatial covariance parameter estimates may be poor, unless you have good starting values). Note that MIXED uses GLS for the fixed effects, but REML/ML for the random effects (which includes the terms in a REPEATED statement). This is all very nicely described in the spatial analysis chapter in Littell et al. (2006), SAS for Mixed Models, 2nd edition. Weighted least squares (as in VARIOGRAM) is useful for this analysis.
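To make the weighted least squares idea concrete, the Python sketch below fits an exponential semivariogram (nugget, partial sill, range parameter ρ) to binned estimates by minimizing Cressie's (1985) criterion, Σ N_j (γ̂_j / γ(h_j) − 1)², over a coarse grid. The weighting gives more influence to bins with many pairs and to small lags, which counters the problem of sparse long lags described above. This is a generic illustration, not necessarily the exact criterion or optimizer PROC VARIOGRAM uses; the grid values and the synthetic bins are hypothetical.

```python
import itertools
import math

def exp_model(h, nugget, psill, rho):
    """Exponential semivariogram with nugget, partial sill and range parameter rho."""
    return nugget + psill * (1.0 - math.exp(-h / rho))

def cressie_wls(bins, nugget, psill, rho):
    """Cressie's WLS criterion: sum N_j * (gamma_hat_j / gamma(h_j) - 1)^2."""
    total = 0.0
    for h, g_hat, npairs in bins:
        g = exp_model(h, nugget, psill, rho)
        total += npairs * (g_hat / g - 1.0) ** 2
    return total

def fit_by_grid(bins, nuggets, psills, rhos):
    """Return the (nugget, psill, rho) on the grid that minimizes the criterion."""
    return min(itertools.product(nuggets, psills, rhos),
               key=lambda p: cressie_wls(bins, *p))

# Synthetic bins (distance, gamma_hat, pairs) generated from known parameters,
# with fewer pairs at longer lags.
bins = [(h, exp_model(h, 0.5, 2.0, 10.0), n)
        for h, n in zip(range(5, 55, 5), range(400, 150, -25))]
fit = fit_by_grid(bins, [0.0, 0.25, 0.5, 1.0], [1.0, 1.5, 2.0, 3.0], [5.0, 10.0, 20.0])
print(fit)
```

In practice a grid search like this is only a way to get into the right neighbourhood; the resulting values are natural candidates for PARMS starting values.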

There is trial and error in getting good starting values for covariance/semi-variance models (sills, etc.). Schabenberger & Pierce (2002), Contemporary Statistical Models for the Plant and Soil Sciences, has a very good description of this. But also check out the PROC VARIOGRAM chapter of the SAS/STAT User's Guide; recent versions of the procedure have very good modeling capabilities.
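One common rule of thumb for a first set of starting values, sketched in Python under stated assumptions: take the first bin's semivariance as the nugget, the average of the last few bins as the sill, and the first lag reaching 95% of the nugget-to-sill rise as the practical range. The practical range is then converted to the ρ of each SP model (ρ = range for SPH, about range/3 for EXP, about range/√3 for GAU). The first-bin nugget tends to be an overestimate, and the order in which PARMS expects the parameters should be checked against the Covariance Parameter Estimates table of a trial run; the function names and example bins are hypothetical.

```python
import math

def starting_values(bins, tail=3):
    """Rough starting values from a binned semivariogram.

    bins: list of (lag distance, gamma_hat), sorted by distance.
    Returns (nugget, sill, practical range)."""
    nugget = bins[0][1]                              # first bin ~ nugget
    sill = sum(g for _, g in bins[-tail:]) / tail    # plateau ~ total sill
    target = nugget + 0.95 * (sill - nugget)
    practical = next((h for h, g in bins if g >= target), bins[-1][0])
    return nugget, sill, practical

def sas_rho(practical_range, model):
    """Convert a practical range to the rho of SP(SPH), SP(EXP) or SP(GAU)."""
    factor = {"sph": 1.0, "exp": 3.0, "gau": math.sqrt(3.0)}[model]
    return practical_range / factor

bins = [(5, 0.9), (10, 1.6), (15, 2.1), (20, 2.35),
        (25, 2.45), (30, 2.5), (35, 2.5), (40, 2.5)]
nugget, sill, practical = starting_values(bins)
print(nugget, sill - nugget, practical)              # nugget, partial sill, range
print({m: round(sas_rho(practical, m), 2) for m in ("sph", "exp", "gau")})
```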

lvm
Rhodochrosite | Level 12

Also, check out this paper from SAS Global Forum 2010:

http://support.sas.com/resources/papers/proceedings10/337-2010.pdf
