sg248
Fluorite | Level 6

I have time-series cross-section data (a balanced panel) with multiple markets and multiple weeks for each market. I have the spatial location of each market as (latitude, longitude) coordinates. So the data look like this (these are made up):

 

Market  latitude  longitude  Week  Y   X
23      8.73      77.53      1     34  11
23      8.73      77.53      2     21  12
23      8.73      77.53      3     62  14
24      6.73      87.53      1     24  9
24      6.73      87.53      2     45  8
24      6.73      87.53      3     71  14
......

I would like to model the spatial covariance in the error between markets, as a function of spatial proximity. I tried the following program:

 

proc mixed;
   model y = x;
   repeated week / subject=market type=sp(exp)(latitude longitude);
run;

 

The error this produces is "a nonpositive definite estimated R matrix for subject 1". I think this is happening because (latitude longitude) is the same for every week within a subject. But I cannot figure out how to specify the model so that I can allow for covariance between markets as a function of (latitude longitude). Any suggestions are greatly appreciated.
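The hunch in the last paragraph can be checked directly. This is a short derivation, assuming the TYPE=SP(EXP) form given in the PROC MIXED documentation, where $d_{ij}$ is the Euclidean distance between the coordinates of observations $i$ and $j$ in the same subject. With SUBJECT=market, all weeks of one market share one location, so every $d_{ij}$ is 0 (shown for 3 weeks):

```latex
\begin{aligned}
[R_{\text{market}}]_{ij} &= \sigma^2 \exp(-d_{ij}/\rho) = \sigma^2 \quad (d_{ij} = 0), \\
R_{\text{market}} &= \sigma^2 \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix} = \sigma^2\,\mathbf{1}\mathbf{1}^\top, \\
v^\top R_{\text{market}}\, v &= \sigma^2 \,(\mathbf{1}^\top v)^2 = 0 \quad \text{for } v = (1, -1, 0)^\top .
\end{aligned}
```

The block has rank 1, so it is not positive definite. Distances between different markets never enter R in this specification, because observations in different subjects are modeled as uncorrelated.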

5 replies
mkeintz
PROC Star

I'm not even sure you would get spatial covariance using lat/long, regardless of the nonpositive definite matrix. Just specifying lat/long means you think there is an east/west or north/south (or combination) trend over your study region. But aren't you really interested in the spatial analog of serial autocorrelation? I mean, isn't it "economic distance from competing/cooperating markets" that you care about? I don't know if you're really interested in estimating the impact of other markets, or just eliminating that impact to assess other relations, but I don't see how lat/long will help with either objective.

 

Without getting into robust spatial analysis (see https://support.sas.com/rnd/app/stat/procedures/SpatialAnalysis.html), I think you're just trying to get the impact of distance from other markets on each given market.  Or more likely, just the impact of the nearest markets.

 

I'm just speculating here, but ...

  1. If you believe that influence decays with distance (probably with distance squared, as in a gravity model), why not simplify and capture the most important part? That is, generate for each market a weighted sum of the relevant values for all other markets within a given distance, i.e. within a suitably small circle. You could probably add a second group of intermediate-distance markets as well if you think they could be relevant.
  2. This approach, of course, assumes that your markets are on a "homogeneous transport plane" (i.e. distance is symmetric, and a given distance, e.g. 20 miles, has the same impact in densely populated regions as in sparsely populated ones).
  3. If the fixed-size circle technique leaves some markets without a competitor market, or you don't like the homogeneous distance implication, then perhaps you can just take the closest 1 or 2 competitors to each of your markets. That would presumably capture the most relevant spatial interactions. And it kind of assumes that spatial competition has already generated nearest neighbors at the economically relevant distance, regardless of actual mileage.
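The fixed-radius idea in points 1 and 2 can be sketched in PROC SQL. This is a minimal sketch under assumptions that are not in the thread: the input data set is named PANEL (hypothetical) and has the question's variables, the 50 km radius is an arbitrary placeholder, the weight is 1/distance² (the gravity-style decay from point 1), and the summed value is the neighbors' Y in the same week. GEODIST (SAS 9.2 and later) returns great-circle distance in kilometers for coordinates in degrees with the 'K' option.

```sas
/* Sketch only: PANEL, PAIRS, PANEL_NBR and RADIUS_KM are hypothetical names. */
%let radius_km = 50;   /* placeholder radius in kilometers */

proc sql;
   /* every pair of distinct markets observed in the same week */
   create table pairs as
   select a.market, a.week, b.y as y_other,
          geodist(a.latitude, a.longitude, b.latitude, b.longitude, 'K')
             as dist_km
   from panel as a, panel as b
   where a.week = b.week and a.market ne b.market;

   /* inverse-distance-squared weighted sum of neighbors' Y inside the radius;
      co-located markets (distance 0) are skipped to avoid dividing by zero */
   create table panel_nbr as
   select p.market, p.week, p.y, p.x,
          coalesce(sum(q.y_other / q.dist_km**2), 0) as wy_near,
          count(q.market) as n_near   /* 0 when no market is in range */
   from panel as p
        left join pairs as q
          on  p.market = q.market and p.week = q.week
          and q.dist_km > 0 and q.dist_km <= &radius_km
   group by p.market, p.week, p.y, p.x;
quit;

/* neighbor term as an extra covariate; the random market effect is an assumed choice */
proc mixed data=panel_nbr;
   class market;
   model y = x wy_near / solution;
   random market;
run;
```

Using neighbors' same-week Y as a regressor makes WY_NEAR a spatial lag of the outcome, which is generally endogenous, so its coefficient (and possibly the X coefficient) can be biased; neighbors' X, or Y lagged one week, are common substitutes if that is a concern.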

 

At least this approach would unburden you of the fixed lat/long values for each market.
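The nearest-competitor variant in point 3 can be sketched the same way. Assumptions not in the thread: a data set named PANEL (hypothetical) with the question's variables, k = 2 neighbors, great-circle distance from GEODIST in kilometers, and the neighbors' average Y in the same week as the summary value. Ties in distance are broken arbitrarily.

```sas
/* Sketch only: PANEL, LOCS, DISTS, NN2 and PANEL_NN are hypothetical names. */
proc sql;
   /* one row per market location (fixed across weeks in a balanced panel) */
   create table locs as
   select distinct market, latitude, longitude
   from panel;

   /* distance from every market to every other market, closest first */
   create table dists as
   select a.market, b.market as nbr,
          geodist(a.latitude, a.longitude, b.latitude, b.longitude, 'K')
             as dist_km
   from locs as a, locs as b
   where a.market ne b.market
   order by a.market, dist_km;
quit;

/* keep the 2 closest competitors of each market */
data nn2;
   set dists;
   by market;
   if first.market then nn_rank = 0;
   nn_rank + 1;
   if nn_rank <= 2;
run;

/* mean Y of each market's 2 nearest competitors in the same week */
proc sql;
   create table panel_nn as
   select p.market, p.week, p.y, p.x,
          avg(q.y) as y_nn2,
          count(*) as n_nn
   from panel as p, nn2 as n, panel as q
   where p.market = n.market
     and q.market = n.nbr
     and q.week   = p.week
   group by p.market, p.week, p.y, p.x;
quit;
```

Y_NN2 can then enter the MODEL statement like any other covariate. Because the neighbor list is built once from fixed locations, every market gets a competitor even where a fixed radius would leave it isolated.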

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
sg248
Fluorite | Level 6
Hi mkeintz,



Thank you for your response.



Indeed, I am looking for the spatial analog of autocorrelation. And I am only looking to control for (or eliminate) that effect in order to correctly assess the effect of X on Y. I am not interested in the pattern of spatial covariance per se.



I think what you are proposing is that I include a weighted sum of Y_m' as a predictor in the model for Y_m, where m' are markets in the neighborhood of market m.



I was hoping that I could achieve this by allowing the errors from a model without such predictors to be correlated based on the spatial location of the markets. I thought that is what Proc Mixed allowed via the TYPE = SP option, and that the different types of spatial covariance structures (e.g. EXP, LIN, etc.) allowed different relationships between the distance separating two markets and the strength of the correlation between their errors.



If that is not so, then what does such a covariance matrix represent?
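A sketch of what such a structure represents, based on the spatial covariance forms listed in the PROC MIXED documentation for TYPE=SP (restated here, so check the exact parameterization there): the R matrix is block-diagonal, with one block per level of the SUBJECT= effect. Errors in different blocks are uncorrelated, and inside a block the covariance depends only on the distance $d$ between two observations' coordinates. If the subject is a week and the coordinates are (latitude longitude), each block $R_w$ is a markets-by-markets matrix:

```latex
\begin{aligned}
R &= \operatorname{diag}(R_1, \dots, R_W), \qquad [R_w]_{mm'} = \sigma^2 f(d_{mm'}), \\
f_{\mathrm{EXP}}(d) &= \exp(-d/\rho), \qquad f_{\mathrm{GAU}}(d) = \exp(-d^2/\rho^2), \\
f_{\mathrm{LIN}}(d) &= (1 - \rho d)\,\mathbf{1}(\rho d \le 1), \qquad
f_{\mathrm{SPH}}(d) = \Big(1 - \tfrac{3d}{2\rho} + \tfrac{d^3}{2\rho^3}\Big)\,\mathbf{1}(d \le \rho).
\end{aligned}
```

In every case $f(0) = 1$, and the correlation between two markets' errors in the same week shrinks as the markets move apart, which is the spatial analog of serial autocorrelation. For EXP, GAU and SPH a larger $\rho$ means a longer range; for LIN, $\rho$ is the rate of decline, so a larger $\rho$ means a shorter range.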



Re: How do I model a Spatial Covariance structure for panel data in Proc Mixed?

I'm not even sure you would get spatial covariance use lat/long, regardless of the nonpositive definite matrix. Just specifying lat/long menas you think there is an east/west or north/south (or combination) trend over your study region. But aren't you really interested in the spatial analog of serial autocorrelation? I mean, isn't it "economic distance from competing/cooperationg markets" that you care about? I don't know if you're really interested in estimating the impact of other markets, or just eliminating that impact to assess other relations, but I don't see how lat/long will help with either objective.



Without getting into robust spatial analysis (see https://support.sas.com/rnd/app/stat/procedures/SpatialAnalysis.html), I think you're just trying to get the impact of distance from other markets on each given market. Or more likely, just the impact of the nearest markets.



I'm just speculating here, but ...
If you believe that influence decays with distance (and probably distance-squared as in a gravity model), why not simplify and get the most important part? That is generate, for each market, a weighted sum of relevant values for all other markets within a given distance, i.e. within a suitably small circle. You could probably get a second group of intermediate distance markets as well if you think they could be relevant. This approach, of course, assumes that your markets are on a " homogeneous transport plane" (i.e. symmetric and a given distance (e.g. 20 miles) has the same impact in densely populated regions as sparsely positive). If the fixed-size circle technique leaves some markets without a competitor market, or you don't like the homogeneous distance implication, then perhaps you can just take the closest 1 or 2 competitors to each of your markets. That would presumably capture the most relevent spatial interactions. And it kind-of assumes that spatial competition has already generated nearest neighbors at the economically relevant distance, regardless of actual mileage.



At least this approach would unburden you of the fixed lat/long values for each market.




mkeintz
PROC Star

@sg248:

 

First, I hadn't been aware of the type=sp parameter. And I think you're right about what it is supposed to do for you, namely estimate spatial autocorrelation. I've looked at some examples of proc mixed with "type=sp" and I can't find any that treats your situation: constant lat/long within a subject.

 

This goes beyond my depth of understanding, but does this link (http://www.ats.ucla.edu/stat/sas/faq/spatial_reg.htm) offer any possibility? It uses "/subject=intercept" and then a type=sp. If you have a time variable, would it work if you used time as a third spatial dimension?

     "repeated / subject=intercept type=sp(expa) (time lat long)"

I use "expa" instead of "exp", because the SAS documentation states that EXP is two-dimensional.  EXPA allows more dimensions.  
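A sketch of two ways to make the coordinates vary within a subject, assuming the data set is called PANEL (hypothetical name) and holds the variables shown in the question. Option 1 swaps the roles of market and week: each week is a subject, so each R block is the markets-by-markets spatial matrix and weeks are independent; the RANDOM market statement is an optional, assumed addition that keeps a market's errors correlated across weeks. Option 2 is the single-block space-time version suggested above; EXPA estimates a separate range and power for each coordinate, which matters when weeks and degrees are on different scales, but it builds one N × N matrix and may be slow on a large panel. In both, distance is computed in the units of the coordinates (raw degrees here), which is only approximate because a degree of longitude shrinks away from the equator; projecting to planar coordinates first is an alternative.

```sas
/* Sketch only. PANEL is a hypothetical data set name with the variables
   shown in the question: market latitude longitude week y x. */

/* Option 1: spatial correlation among markets within each week */
proc sort data=panel;
   by week market;
run;

proc mixed data=panel;
   class market week;
   model y = x / solution;
   random market;      /* optional: market effect shared across weeks */
   repeated / subject=week type=sp(exp)(latitude longitude);
run;

/* Option 2: one space-time block; WEEK is a numeric coordinate here,
   so it is not listed in a CLASS statement */
proc mixed data=panel;
   model y = x / solution;
   repeated / subject=intercept type=sp(expa)(week latitude longitude);
run;
```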

 

And yes, in the absence of a solution using "type=sp", I was proposing Y_m' as a predictor.

 

sg248
Fluorite | Level 6

Thank you, mkeintz.

The ats.ucla link that you provided led me to this reference book

http://ebooks.cawok.pro/SAS.Publishing.SAS.for.Mixed.Models.2nd.Edition.Mar.2006.pdf

that seems to have a very detailed chapter 11 on using Proc Mixed to estimate Spatial Covariance structures. I hope to find a solution there. I appreciate the input and help.

mkeintz
PROC Star

@sg248

 

I'm quite curious to know the resolution of this problem.   Please post it when you find one.

 

 

