WEIGHT MATRIX for PROC SPATIELREG

Cruise · Posted 09-29-2018 01:32 PM

I'm trying to conduct Spatial Autoregressive Regression model using PROC SPATIALREG which requires pre-defined spatial matrix data. The below hash object macro is used from Silva 2018, SUGI paper "Creating a Spatial Weights Matrix for the SPATIALREG Procedure".

https://www.sas.com/content/dam/SAS/support/en/sas-global-forum-proceedings/2018/2481-2018.pdf

I defined Tiger shapefile at ZCTA level (1794 unique ZIP codes) for New York in the macro call and it produces 4 datasets below.

%Wmatrix(map=uniq_ny_zcta1,id=nctid,lat=y,long=x,out=neighbor, type=contiguity); run;

/*RESULTING MATRIX FILES:
NEIGHBOR,        N=154, column=2
Wmatrix,         N=154, column=154
Wmatrix_compact, N=154, column=3
Wmatrix_sdr,     N=154, column=154*/

For the next step with Wmatrix created, i defined my health outcome-exposure data in the PROC SPATIALREG as shown beow. Pancreatic_cancer dataset has incidence rates of pancreatic cancer at each 1606 unique ZIP codes of New York.

When I run PROC SPATIALREG as shown below, I get an error: Input dataset and spatial matrix do not have equal number of observations. I don't have exact phrase to recreate the error message on my laptop SAS but the meaning was Wmatrix and pancreatic cancer datasets are required to have same number of observations.

But how? what can i do differently here to make number of observations equal at both datasets?

Any hints are appreciated !

proc spatialreg data=pancreatic_cancer /*N=1606 unique zip-codes*/ 
                Wmat=Wmatrix NONORMALIZE;
   model incidence_rate_cancer = n_benzene_sites_at_zipcodes 
								 n_chemical_sites_at_zipcpdes
                                 n_volatilechemicals_sites_zipcodes/ type=SAR;
run;

/*IMPORT TIGER SHAPEFILE AT ZIP-CODE LEVEL FOR NEW YORK*/

proc mapimport out=ny_tiger_zip /*N=1,827,869*/
datafile="...\tl_2010_36_zcta510.shp";
run;

proc sql; /*N=1,794 UNIQUE ZIP CODES IN NEW YORK STATE*/
select count(distinct ctid) as n_ctid
from newyork_ct1;
quit; 

/*DEDUPLICATE CENSUS SHAPEFILE BY UNIQUE ZIP-CODE*/
proc sort nodupkey data=ny_tiger_zip
out=uniq_ny_zcta;
by ZCTA5CE10; 
run;

/*CONVERT CHARACTERS TO NUMERIC*/
data uniq_ny_zcta1;  format ZCTA5CE10 $9.; set uniq_ny_zcta; 
long=input(INTPTLAT10,12.);
lat=input(INTPTLON10,12.);
nctid=input(ZCTA5CE10,9.);
run;

/*CALL FOR MACRO FOR UNIQUE ZIP-CODE LEVEL SHAPEFILE*/

%Wmatrix(map=uniq_ny_zcta1,id=nctid,lat=y,long=x,out=neighbor, type=contiguity); run;

/*RESULTING MATRIX FILES:

NEIGHBOR,        N=154, column=2
Wmatrix,         N=154, column=154
Wmatrix_compact, N=154, column=3
Wmatrix_sdr,     N=154, column=154
*/

/*PANCREATIC CANCER-HEALTH OUTCOME DATA*/

proc spatialreg data=pancreatic_cancer /*N=1606 unique zip-codes*/ 
                Wmat=Wmatrix NONORMALIZE;
   model incidence_rate_cancer = n_benzene_sites_at_zipcodes 
								 n_chemical_sites_at_zipcpdes
                                 n_volatilechemicals_sites_zipcodes/ type=SAR;
run;


/*MACRO*/
%macro Wmatrix(map=,id=,lat=,long=,type=contiguity,centroid_lat=,
centroid_long=, distance=,neighbors=,out=neighbor);
%if %upcase(&type)=CONTIGUITY %then %do;
%if &neighbors = %then %do;
/********Defining the Neighborhood for contiguity ***********/
proc sort data=&map out=&map.2 nodupkey; by &long &lat &id;run;
data _hashmap_ (keep=&long &lat z &id);
retain z;
set &map.2(where=(&long NE . and &lat NE .));
by &long &lat;
if first.&long or first.&lat then z=1;
z=z+1;
run;
proc means data=_hashmap_ noprint;
output out=maxz max(z)=mz; run;
/*put the maximum value of z into macro variable maxnb*/
data _null_;
set maxz;
call symputx("maxnb",mz);
run;
%put &maxnb;
proc sort data=&map.2; by &id;run;
data nonb (keep=myid);
/*length &long &lat z 8; format &id 16.;*/
if _n_=1 then do;
/*this hash object holds a second copy of the entire map for comparison*/
declare hash fc(dataset: "_hashmap_");
fc.definekey("&long","&lat","z");
fc.definedata("&long","&lat","z","&id");
fc.definedone();
call missing (&long,&lat,z,&id);
/*this hash object will hold the rook neighbors for each area: they have two
points in common*/
declare hash rook();
rook.definekey("&id","myid");
rook.definedata("&id","myid");
rook.definedone();
end;
/*this hash object holds the bishop neighbors for each area: they have a
point
in common*/
declare hash bishop();
bishop.definekey("&id","myid");
bishop.definedata("&id","myid");
bishop.definedone();
foundnb="N";
do until (last.myid);
set &map.2 (keep=&id &long &lat rename=(&id=myid &long=myx &lat=myy)
where=(myx NE .
and myy NE .)) end=eof;
by myid;
do n=1 to &maxnb.; /*this is max number of points in common =max z*/
rc=fc.find(key:myx, key:myy, key:n);
if rc=0 and myid NE &id then do; 
nbrc=rook.check(key:&id, key:myid);
if nbrc=0 then do;
rc2=rook.add(key:&id, key:myid, data:&id, data:myid);
foundnb="Y";
end;
else rc1=rook.add(key:&id, key:myid, data:&id, data:myid);
end;
end;*do &maxnb.;
end;*end DOW loop;
if foundnb="N" then output nonb;
if eof then rook.output(dataset:"&out");
run;
proc sort data=&out;by &id myid;run;
proc sort data=&out(keep=&id) nodupkey out=_tab_;by &id;run;
data _tab_;set _tab_;idi=_n_;run;
proc sort data=&out;by &id;run;
data _tab2_;merge &out _tab_;by &id;run;
proc sort data=_tab2_;by myid;run;
data _tab2_;merge _tab2_ _tab_(rename=(idi=idj &id=myid));by myid;run;
proc sort data=_tab2_;by &id;run;
proc sql;drop table _tab_,_hashmap_,Nonb,Maxz,&map.2;quit; 
data Wmatrix_compact;set &out(rename=myid=c&id);
value=1;
run;
%end;
%else %do;
%if &centroid_lat= or &centroid_long= %then %do;
%annomac;
proc sort data=&map(keep=&id &lat &long rename=(&long=y &lat=x)) out=_&map;by
&id;run;
%centroid(_&map,&out,&id);
proc sql;drop table _&map;quit;
%end;
%else %do;
data &out;set &map(keep=&id &centroid_lat &centroid_long
rename=(&centroid_long=y &centroid_lat=x));run;
%end;
proc iml;
use &out;
read all var{x y} into COORD;
n=nrow(coord[,1]);
*print n;
d=j(1,3,0);
nome={"idi" "idj" "d"};
create _dist_ from d[colname=nome];
do i=1 to n;
 do j=1 to n;
 d[1]=i;
 d[2]=j;
 d[3]=sqrt((COORD[i,1]-COORD[j,1])**2+(COORD[i,2]-COORD[j,2])**2);
 append from d;
 end;
end;
close _dist_; 
quit;
data _dist_;set _dist_;if d=0 then delete;run;
proc sort data=_dist_;by idi d;run;
data _dist_;retain seq 0;set _dist_;by idi;
if first.idi then seq=1;
else seq+1;
run;
data _tab2_;set _dist_(where=(seq<=&neighbors));run;
data &out;set _tab2_;run;
data Wmatrix_compact;set &out;
rename idi=&id idj=c&id;value=1;
drop seq d;
run;
proc sql;drop table _dist_;quit;
%end;
/********creating W Matrix ***********/
proc iml;
use _tab2_ var{idi idj};
read all;
n=max(idj);
w=j(n,n,0);
do h=1 to nrow(idi);
w[idi[h],idj[h]]=1;
end;
wpdr=j(n,n,0);
soma=j(n,1,0);
do i=1 to n;
do j=1 to n;
soma[i]=soma[i]+w[i,j];
end;
end;
do i=1 to n;
do j=1 to n;
if soma[i]=0 then wpdr[i,j]=0;
else wpdr[i,j]=w[i,j]/soma[i];
end;
end;
create Wmatrix from w; 
append from w;
create Wmatrix_sdr from wpdr;
append from wpdr;
quit;
proc sql;drop table _tab2_;quit;
%end;
%else %if %upcase(&type)=DISTANCE %then %do;
%if &centroid_lat= or &centroid_long= %then %do;
%annomac;
proc sort data=&map(keep=&id &lat &long rename=(&long=y &lat=x)) out=_&map;by
&id;run;
%centroid(_&map,&out,&id);
proc sql;drop table _&map;quit;
%end;
%else %do; 
data &out;set &map(keep=&id &centroid_lat &centroid_long
rename=(&centroid_long=y &centroid_lat=x));run;
%end;
proc iml;
use &out;
read all var{x y} into COORD;
n=nrow(coord[,1]);
*print n;
d=j(1,3,0);
nome={"idi" "idj" "d"};
create _dist_ from d[colname=nome];
do i=1 to n;
 do j=i+1 to n;
 d[1]=i;
 d[2]=j;
 d[3]=sqrt((COORD[i,1]-COORD[j,1])**2+(COORD[i,2]-COORD[j,2])**2);
 append from d;
 end;
end;
close _dist_;
quit;
data &out;set _dist_;run;
proc sql;drop table _dist_;quit;
data Wmatrix_compact;
set &out(rename=(idi=&id idj=c&id)) &out(rename=(idi=c&id idj=&id));
%if &distance= %then %do;
value=1/d;
%end;
%else %do;
if d<=&distance then value=1;
if value=. then delete;
%end;
keep &id c&id value;
run;
proc sort data=Wmatrix_compact;by &id c&id;run; 
/********creating W Matrix ***********/
proc iml;
use &out var{idi idj d};
read all;
n=max(idj);
w=j(n,n,0);
do h=1 to nrow(idi);
%if &distance= %then %do;
w[idi[h],idj[h]]=1/d[h];
w[idj[h],idi[h]]=1/d[h];
%end;
%else %do;
if d[h]<=&distance then do;
w[idi[h],idj[h]]=1;
w[idj[h],idi[h]]=1;
end;
%end;
end; 
wpdr=j(n,n,0);
soma=j(n,1,0);
do i=1 to n;
do j=1 to n;
soma[i]=soma[i]+w[i,j];
end;
end;
do i=1 to n;
do j=1 to n;
if soma[i]=0 then wpdr[i,j]=0;
else wpdr[i,j]=w[i,j]/soma[i];
end;
end;
create Wmatrix from w;
append from w;
create Wmatrix_sdr from wpdr;
append from wpdr;
quit;
%end;
%mend Wmatrix;

@Reeza @Rick_SAS

Rick_SAS · Posted 09-29-2018 06:04 PM

> Any hints are appreciated !

1. Reread the paper, paying particular attention to any examples

2. Alan da Silva is a nice guy and wants people to use his macro. Contact him directly if you can't determine the problem from re-reading the paper.

Cruise · Posted 09-30-2018 12:50 PM

@Rick_SAS

Thanks Rick,

1. I studied Getting started example for proc spatial reg here: https://support.sas.com/documentation/onlinedoc/ets/ex_code/143/sprgs.html

But it didn'texplain where weight data came from. It was used as pre-defined dataset. Wmatrix dataset has 49 rows as same as in Crimeoh dataset defined in the proc spatialreg. No further justifications or explanations provided.

2. I wrote to Alan da Silva and hope he'll respond. He used columbus map in his macro without reference to data source and its contents.

3. Do you want to move my post to "SAS/IML Software and Matrix Computations"? you know better though.

doylejm · Posted 10-03-2018 12:03 PM

Did you find a resolution to this problem? I am getting the same error message. I used the %WMATRIX macro and am trying to use the compact form of the W matrix that the macro creates, for use in the SPATIALREG procedure. My Wmatrix has 130 distinct IDS, my data set for the regression is a panel data set with n=130 over 14 years. I have double checked that the 130 ids in each dataset match.

jm

doylejm · Posted 10-03-2018 12:46 PM

actually I got it to run using a single year of data. Now I need to figure out how to run this on a panel of data with n=130 counties and t=14 years of data. The error message says that the number of spatial IDS is not the same in the compact W matrix (has n=130 spatial ID values) and in the panel data set (which has the same n=130 counties repeated for each of 14 years). Any suggestions? thank you

Cruise · Posted 10-04-2018 03:35 AM

I still am stuck with exact same issue you're having here. I wrote to the author Alan da Silva a week ago. He had not gotten back to me yet. Please let me know or post solution if you figure out before I do. Thanks. I'll appreciate that.

doylejm · Posted 10-12-2018 02:13 PM

Sorry for the delay. I have not found a solution, and a probably going to switch to stata as it appears to be able to copy the matrix appropriately for use in a balanced panel. I want to import the matrix created by sas into stata. trying to figure that out now.

Cruise · Posted 10-12-2018 02:20 PM

@doylejm

Thanks for update. I was advised by one of the professors teaching spatial statistics to consider 'sdep' package in R. Just thinking of trying this. Please let me know is STAT works out.

https://cran.r-project.org/web/packages/spdep/spdep.pdf

WEIGHT MATRIX for PROC SPATIELREG

Re: WEIGHT MATRIX for PROC SPATIELREG

Re: WEIGHT MATRIX for PROC SPATIELREG

Re: WEIGHT MATRIX for PROC SPATIELREG

Re: WEIGHT MATRIX for PROC SPATIELREG

Re: WEIGHT MATRIX for PROC SPATIELREG

Re: WEIGHT MATRIX for PROC SPATIELREG

Re: WEIGHT MATRIX for PROC SPATIELREG