BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
tdlang
Calcite | Level 5

Hello all,

First question I have posted, so please let me know if addition information is needed beyond that provided below:

I am looking at a dataset of fish density values, recorded over 3 years at 3 different locations.

  • These density values are obtained during surveys that are composed of multiple transect lines at a single site
  • Transect "curtains" are further divided into grids (distance x depth), of which each cell is assigned a density and coordinate
  • Additional parameters (Bottom habitat, salinity, temperature, etc) are assigned to the cells
  • The density values have a very very skewed distribution (many small, few very large), hence the use of GLIMMIX to attempt to address this.

I am interested in the effects of these "additional parameters" on density, though I cannot ignore the spatial nature of the data (fish group together, nearby cells will tend to have more similar densities).

I am assuming that spatial correlations would only exist within a single survey.

My model currently looks like this:

proc glimmix data=Data.BankPresent pconv =1e-5 maxopt=30;

    class Site Habitat Season Survey;

    model Density= Relief DShallow MeanSalinity MeanTemp Site|Habitat|Season

        / dist=gamma link=log;

    random Survey / subject=intercept type=sp(exp) (Easting Northing);

run;

I am unsure of where to put the spatial autocorrelation component [type=sp(exp) (Easting Northing)], or if this model is doing what I have attempted to describe above.

If needed/of interest, the variables included are:

  • Relief: continuous, vertical relief of the seafloor below cell i
  • DShallow: continuous, distance from cell i to the shallowest point at the site
  • MeanSalinity: continuous, salinity of cell i
  • MeanTemp: continuous, temperature of cell i
  • Site: categorical, site at which the survey was conducted (1,2,3)
  • Habitat: categorical, habitat classification of seafloor below cell i (1,2,3,4,5)
  • Season: categorical, season in which the survey was conducted (1,2,3,4)

To reiterate, I want to run the above model with a consideration for spatial autocorrelation on a survey-by-survey basis.

Thank you for any information you may be able to provide, applicable references, or questions of clarification.

1 ACCEPTED SOLUTION

Accepted Solutions
SteveDenham
Jade | Level 19

Very close.  Rearrange the random statement to:

random intercept/subject=survey type=sp(exp)(easting northing);

And I think you will be able to fit the model (of course, this supposes sufficient data...).  This does kind of depend on the easting, northing values being unique within survey.  If you have exact duplicates, you may need to specify the subject with greater detail (say survey(season), if the measurements were taken at exactly the same locations in each season).

Once you get that, you may want to consider that season induces autocorrelation, and so a second random term that addresses that may be worth a look.

Steve Denham

View solution in original post

7 REPLIES 7
SteveDenham
Jade | Level 19

Very close.  Rearrange the random statement to:

random intercept/subject=survey type=sp(exp)(easting northing);

And I think you will be able to fit the model (of course, this supposes sufficient data...).  This does kind of depend on the easting, northing values being unique within survey.  If you have exact duplicates, you may need to specify the subject with greater detail (say survey(season), if the measurements were taken at exactly the same locations in each season).

Once you get that, you may want to consider that season induces autocorrelation, and so a second random term that addresses that may be worth a look.

Steve Denham

tdlang
Calcite | Level 5

Thank you very much Steve.

I gave that a shot and it does run, however in the resulting "Covariance Parameter Estimates," the spatial term comes back with an estimate of 1.000 and no standard error.

Cov Parm     Subject     Estimate     Standard Error

Variance       Name        0.1279        0.04371

SP(EXP)      Name        1.0000         .

Residual                        8.4454        0.06207

This yields a model that is no different from one that was run without the type= option included.

Could this be due to the easting/northing duplication you mentioned?

It is possible that there is duplication of easting/northing pairs:

  • Within a Survey (since grid cells are in multiple layers, stacked cells would receive the same easting/northing)
    • Can the type=sp option handle three dimensions? I think that would solve this problem.
  • Between surveys (since surveys were conducted in the same locations multiple times)

I appreciate the response very much, one step closer!

SteveDenham
Jade | Level 19

I would think the duplication may well be the problem--you are seeing perfect correlation within each survey, which may not be surprising if we are dealing with a marginal over all the fixed effects.  I have not seen three dimensional spatial covariances, and I don't think GLIMMIX allows them.  What might work is if you included layer in the model, and then would try subject=survey group=layer in the random statement.  That should fit separate 2 dimensional errors for each layer by survey combination.  Hopefully, your data are rich enough in both detail and N to fit such a model.

Steve Denham

tdlang
Calcite | Level 5

Thanks again Steve.

I have given this a run as well, looks to increase computing time dramatically so I am still waiting on output (11hrs and counting!) but I will report back once it finishes.

From reading about the options again, this seems to make sense.

I appreciate your help putting these things in a context I can understand.

SteveDenham
Jade | Level 19

Think about how many more covariance parameters you are now fitting simultaneously.  CPU time for this goes up on the order of n log n, I think.

Steve Denham

tdlang
Calcite | Level 5

Not much progress here. Even after including the group=layer option, the sp(exp) term for each group is still 1.

I double checked that there are no duplicated coordinates within the Name*Layer combinations.

As for data richness in N, each Name*Layer combination has at least 1 observation (usually many, many more, dataset is 37,000 observations).

I can see how those groups with a single observation would cause a problem, but I would think that since I have tried to "partition" the calculations, the problem would be confined to only those groups with a single observation. I may be mistaken, as I don't know exactly how SAS is computing this giant covariance matrix.

Even when I restrict the analysis to those Name*Layer combinations with at least 3 observations, I have the same problem.

Besides this, I cannot think of anything else that might be causing the problem. I am open to suggestions and, as always, appreciative of your comments.

SteveDenham
Jade | Level 19

Have you tried other spatial autocorrelation types?  Anisotropic spatial power, maybe?  You might also want to look at Radial Smoothing Based on Mixed Models in the documentation for a multidimensional approach.

This is the part where I grasp at straws.

Steve Denham

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2821 views
  • 4 likes
  • 2 in conversation