Solved: Re: Faster way to get local average of cells in matrix?

jmoody77 · Posted 01-28-2021 11:45 AM

Hi -

I'm using IML to do an agent based model, treating the matrix space as the substrate agents move over.

As part of the input for agents I'd like to smooth the data around all other agents' values. So for every ij cell in the matrix, the goal is to return the average of all neighboring cells (including index). The code listed below works; but I'm curious if there's a way to do this that doesn't involve looping over all cells.

Thanks in advance!

Jim

proc iml;

/* function to pull out the subset of cells adjacent 
to selected node, including index */
start nrhd(inmat,idx);
  rc=ndx2sub(dimension(inmat),idx); /* give xy coord */
  rs=(rc[1]-1):(rc[1]+1); /* get nbrhd rows*/
  rs=xsect(rs,1:nrow(inmat)); /* limit by bndry */

  cs=(rc[2]-1):(rc[2]+1); /* get nbrhd cols */
  cs=xsect(cs,1:ncol(inmat));
  nbhd=inmat[rs,cs]; 
 return(nbhd);
 free rs cs;
finish;

/* silly example */
mat=shape(1:25,5,5);
newmat=mat>100; /*silly way to get empty matrix. I know.  being lazy.*/
do i=1 to 25;
 nbrs_i=nrhd(mat,i);
 val=mat[nbrs_i][:]; 
 newmat[i]=val;
end;


print mat newmat;

quit;

Rick_SAS · Posted 01-28-2021 02:39 PM

This computation is an example of a 2D convolution. Usually computations like this are performed by "padding" the boundary of the matrix with missing values (or sometimes zeros) so that you can average the neighbors without worrying about whether a cell is in the first or last row/column.

There are lots of different averaging schemes. It looks like you want to compute an evenly weighted average of each 3x3 set of cells. Other schemes might put more weight on the center of the square or less weight in the corners. To accommodate different weighting schemes, researchers often use a "mask" of weights to indicate how much weight to give to each element in the 3x3 grid. They also have to decide how to average the boundary cells. For your example, the weight matrix would be j(3,3,1) and wouldn't affect the result, so I have omitted it.

I think your solution is fine. But since you asked for an alternative way to solve the problem, you might consider using the padding idea, as follows:

proc iml;
mat=shape(1:30,5,6);
nr = nrow(mat);
nc = ncol(mat);
avg=j(nr, nc, 0);

pad = j(nr+2, nc+2, .);   /* add row/col of missing values on boundary */
pad[ 2:nr+1, 2:nc+1 ] = mat;
do i = 1 to nr;
  do j = 1 to nc;
     avg[i,j] = ( pad[i:i+2, j:j+2] )[:];   /* average */
  end;
end;
print avg;

By the way, these ideas a used in studying automata, such as Conway's Game of Life and Wolfram's Rule 30, which I have explored in SAS/IML.

View solution in original post

Rick_SAS · Posted 01-28-2021 02:39 PM

This computation is an example of a 2D convolution. Usually computations like this are performed by "padding" the boundary of the matrix with missing values (or sometimes zeros) so that you can average the neighbors without worrying about whether a cell is in the first or last row/column.

There are lots of different averaging schemes. It looks like you want to compute an evenly weighted average of each 3x3 set of cells. Other schemes might put more weight on the center of the square or less weight in the corners. To accommodate different weighting schemes, researchers often use a "mask" of weights to indicate how much weight to give to each element in the 3x3 grid. They also have to decide how to average the boundary cells. For your example, the weight matrix would be j(3,3,1) and wouldn't affect the result, so I have omitted it.

I think your solution is fine. But since you asked for an alternative way to solve the problem, you might consider using the padding idea, as follows:

proc iml;
mat=shape(1:30,5,6);
nr = nrow(mat);
nc = ncol(mat);
avg=j(nr, nc, 0);

pad = j(nr+2, nc+2, .);   /* add row/col of missing values on boundary */
pad[ 2:nr+1, 2:nc+1 ] = mat;
do i = 1 to nr;
  do j = 1 to nc;
     avg[i,j] = ( pad[i:i+2, j:j+2] )[:];   /* average */
  end;
end;
print avg;

By the way, these ideas a used in studying automata, such as Conway's Game of Life and Wolfram's Rule 30, which I have explored in SAS/IML.

Rick_SAS · Posted 01-28-2021 02:45 PM

PS. Your program doesn't actually work because you are returning inmat[rs,cs]. You should be returning the subscripts, not the matrix values.

jmoody77 · Posted 01-28-2021 04:41 PM

Yup...good eye; thanks. the nrhd() function pulls out the values of the input "mat" matrix -- i.e. returns the submatrix w. i in the center. The example here only worked because the cell values are the index values!. Pure coincidence on how I constructed the example matrix.

In case anyone is interested, this modification fixes it:

do i=1 to 25;
    nbrs_i=nrhd(mat,i);
	*val=mat[nbrs_i][:]; /* this is wrong */
	val=nbrs_i[:]; /* this is right */
	newmat[i]=val;
  end;

jmoody77 · Posted 01-28-2021 04:42 PM

thanks much; appreciate it!

Rick_SAS · Posted 01-28-2021 05:03 PM

The "pad" method seems to be substantially faster. On a 600x400 matrix, the "Nbhd" method takes about 4.5 seconds on my system whereas the "Pad" method takes about 0.2 seconds (about 25 times faster).

jmoody77 · Posted 01-29-2021 11:05 AM

Very cool. Thank you!

Rick_SAS · Posted 02-02-2021 09:56 AM

Do you have other questions? If not, you can close this thread by choosing a response to mark as the "Accepted Solution."

jmoody77 · Posted 02-02-2021 09:58 AM

ahh...sorry; done!