Programming the statistical procedures from SAS

Bivariate kernel density estimation with PROC KDE: setting the bandwidth

Occasional Contributor

Dear community, I don't know if this is the right place for my post, but my question is the following: I want to estimate a bivariate KDE and set the BWM= value separately for each variable. This works for values up to 20. But if I set it as recommended in the PROC KDE manual (a plug-in value calculated from the standard deviations of the variables), it fails with the error message "The SAS System stopped processing this step because of insufficient memory". In addition, I chose ngrid=400 gridl=5799520 gridu=5837259. I'm working with SAS Studio. Any ideas why this happens? In R the same algorithm with the same values works very well. Thanks in advance for your help! lalilu
SAS Super FREQ

For an example, see http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_kde_examples...

The memory problem might be caused by using too large a value for NGRID=. A bivariate grid has NGRID points in each direction, so if you double that number, SAS uses four times as much memory and computation. Try dialing back that number to NGRID=150 or 200, which is still a very fine grid.

If that doesn't fix the problem, please post your complete PROC KDE code.
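As a rough illustration of that scaling, assuming the same NGRID= value is used for both variables so that the grid holds NGRID x NGRID points:

```latex
\begin{eqnarray*}
400^2 & = & 160\,000 \mbox{ grid points} \\
200^2 & = & 40\,000 \mbox{ grid points} \\
150^2 & = & 22\,500 \mbox{ grid points}
\end{eqnarray*}
```

Going from NGRID=400 to NGRID=200 therefore cuts the number of grid points by a factor of 4.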

Occasional Contributor

Dear Rick, thanks for your answer. I've already tried this. It only works if I set ngrid to 10. To explain a bit more about what I'm doing: I'm translating an existing function from R (https://www.rdocumentation.org/packages/Kernelheaping/versions/1.5/topics/dshapebivr) into SAS. This algorithm can estimate the density of anonymised geographical data. I'm working with the election data of Berlin (Germany). You are right, 100 should be fine; then I get data accurate to around 0.5 km^2. And until now there is no method available to plot it like image.plot in R, right? You can find my code and the data in the attachment; just add your own path for reading the data in. I've already programmed everything, and I also get adequate estimates without defining bwm. In that case it is not clear to me why they are good, because the literature says something different. Thanks again for your help.
[3 attachments]
Occasional Contributor

A small correction: I also get adequate estimates without defining bwm inside my algorithm. In the first KDE, where I calculate a pilot estimate, I chose the bwm by cutting the number down until it worked (for example, from 8000 to 8). I know this is not a good approach, but the result is OK.
SAS Super FREQ

"there is no method available to plot it like image.plot in R, right?"

PROC KDE provides the PLOTS= option, which can produce a surface plot or a contour plot of the density estimate. If you need something fancier, you can also write the density estimate to a data set (OUT= option) and use SAS graphics to visualize it. For example, see

How to create a surface plot in SAS

How to create a contour plot in SAS
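
A minimal sketch of the PLOTS= and OUT= options mentioned above. It uses the sample data set sashelp.iris and the output name work.kde_grid purely as stand-ins; neither comes from the original question, so substitute your own data set and variables.

```sas
/* PLOTS= relies on ODS Graphics being enabled */
ods graphics on;

/* sashelp.iris and its variables are placeholders for illustration only */
proc kde data=sashelp.iris;
    bivar SepalLength SepalWidth
          / plots=(contour surface)   /* built-in contour and surface plots */
            out=work.kde_grid;        /* gridded density estimate for custom graphics */
run;

/* inspect the first few rows of the gridded estimate */
proc print data=work.kde_grid(obs=5);
run;
```

The OUT= data set can then be passed to other SAS graphics procedures, as described in the articles listed above.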

 

SAS Employee

Let me refer to the manual with regard to bivariate bandwidth selection:

 

For the bivariate case, Wand and Jones (1993) note that automatic bandwidth selection is both difficult and computationally expensive. Their study of various ways of specifying a bandwidth matrix also shows that using two bandwidths, one in each coordinate’s direction, is often adequate. PROC KDE enables you to adjust the two bandwidths by specifying a multiplier for the default bandwidths recommended by Bowman and Foster (1993):

 

\begin{eqnarray*}
h_{X} & = & {\hat\sigma}_{X} n^{-1/6} \\
h_{Y} & = & {\hat\sigma}_{Y} n^{-1/6}
\end{eqnarray*}

 

Here ${\hat\sigma }_{X}$ and ${\hat\sigma }_{Y}$ are the sample standard deviations of X and Y, respectively. These are the optimal bandwidths for two independent normal variables that have the same variances as X and Y. They are, therefore, conservative in the sense that they tend to oversmooth the surface.

 

The bandwidth calculation due to Bowman and Foster is performed internally by PROC KDE, and the initial bandwidths are set accordingly.

 

You can specify the BWM= option to adjust the aforementioned bandwidths to provide the appropriate amount of smoothing for your application.

 

The final bandwidth used for computing the KDE is the initial bandwidth times BWM. If you want more smoothing than the default, set BWM > 1.0. If you want less smoothing than the default, set BWM < 1.0.
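
Written out for the X direction, with $h_{X}$ the default bandwidth from the manual excerpt and $\mathrm{BWM}_{X}$ the multiplier given for X (the superscript "final" is notation introduced here for illustration only):

```latex
\begin{eqnarray*}
h_{X}^{\mathrm{final}} & = & \mathrm{BWM}_{X} \cdot h_{X}
  \; = \; \mathrm{BWM}_{X} \cdot {\hat\sigma}_{X} n^{-1/6}
\end{eqnarray*}
```

For example, BWM=2 doubles the default bandwidth and BWM=0.5 halves it; the same relation holds for Y.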

 

Your IML code

 

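*  calculate bwm plug-in;
proc iml;
    use data;
    /* columns of xy: 1 = X, 2 = Y, 3 = FREQ */
    read all var {X Y FREQ} into xy;
    /* total sample size = sum of the frequencies */
    n=sum(xy[,3]);
    /* std(X) * n^(-1/6); note that std() here does not weight rows by FREQ */
    stdx=std(xy[,1])/(n**(1/6));
    call symput("stdxg", char(stdx));
    /* std(Y) * n^(-1/6), stored in macro variable stdyg */
    stdy=std(xy[,2])/(n**(1/6));
    call symput("stdyg", char(stdy));
quit;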

 

duplicates the internal initial bandwidth calculation.  When you set BWM to &stdxg and &stdyg in

 

proc kde data=data;
bivar (X (bwm=&stdxg ngrid=&gridsize gridl=&minX gridu=&maxX ) 
       Y (bwm=&stdyg ngrid=&gridsize gridl=&minY gridu=&maxY )) / plots=all;
       freq FREQ;
run;

 

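the final bandwidth used to calculate the KDE is the default bandwidth multiplied by itself, ${\hat\sigma}^{2} n^{-1/3}$. It therefore scales with the variance in each dimension rather than with the standard deviation. Is this what you want?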
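
A minimal sketch of two ways to avoid the doubled scaling, reusing the data set and macro variables from the code above. The first assumes the default Bowman and Foster bandwidths are what is wanted, so BWM= stays at 1. The second assumes a specific final bandwidth is wanted; the macro variables targetx, targety, bwmx and bwmy and the value 500 are hypothetical placeholders, not part of the original code. Because &stdxg and &stdyg only approximate the internal default, the resulting multipliers are approximate as well.

```sas
/* Option 1: keep the default bandwidths; BWM=1 leaves them unchanged */
proc kde data=data;
    bivar (X (bwm=1 ngrid=&gridsize gridl=&minX gridu=&maxX)
           Y (bwm=1 ngrid=&gridsize gridl=&minY gridu=&maxY)) / plots=all;
    freq FREQ;
run;

/* Option 2: request a specific final bandwidth.                     */
/* targetx and targety are hypothetical values in the units of X, Y. */
%let targetx = 500;
%let targety = 500;

/* multiplier = wanted bandwidth / default bandwidth */
%let bwmx = %sysevalf(&targetx / &stdxg);
%let bwmy = %sysevalf(&targety / &stdyg);

proc kde data=data;
    bivar (X (bwm=&bwmx ngrid=&gridsize gridl=&minX gridu=&maxX)
           Y (bwm=&bwmy ngrid=&gridsize gridl=&minY gridu=&maxY)) / plots=all;
    freq FREQ;
run;
```

With either option the multiplier is a unitless factor, so the final bandwidth stays on the scale of the data.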

Occasional Contributor

This was exactly my problem. Thank you very much for your help.