lalilu
Calcite | Level 5
Dear community, I don't know if this is the right place for my post, but my question is the following: I want to estimate a bivariate KDE and set the BWM separately for each variable. This works for BWM values up to 20. But if I set it as recommended in the PROC KDE manual (calculating a plug-in from the standard deviations of the variables), it fails with the error message "The SAS System stopped processing this step because of insufficient memory". In addition, I chose ngrid=400 gridl=5799520 gridu=5837259. I'm working with SAS Studio. Any ideas why this happens? In R the same algorithm with the same values works very well. Thanks in advance for your help! lalilu
6 REPLIES
Rick_SAS
SAS Super FREQ

For an example, see http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_kde_examples...

The memory problem might be caused by using too large a value for NGRID=. If you double that number, SAS uses four times as much memory and computation.  Try dialing back that number to NGRID=150 or 200, which is still a very fine grid.
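As a rough illustration of that scaling, assuming the bivariate estimate is evaluated on an NGRID by NGRID grid (one NGRID value per direction):

```latex
\[
400 \times 400 = 160{,}000 \ \text{grid points},
\qquad
200 \times 200 = 40{,}000 \ \text{grid points}
\]
```

So going from NGRID=400 to NGRID=200 cuts the number of grid points to a quarter.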

 

If that doesn't fix the problem, please post your complete PROC KDE code.

lalilu
Calcite | Level 5
Dear Rick, thanks for your answer. I've already tried this. It only works if I set ngrid to 10. To explain a bit more what I'm doing: I'm translating an existing function from R (https://www.rdocumentation.org/packages/Kernelheaping/versions/1.5/topics/dshapebivr) into SAS. This algorithm can estimate the density of anonymised geographical data. I'm working with the election data of Berlin (Germany). You are right, 100 should be fine; then I get exact data for around 0.5 km^2. And so far there is no method available to plot it like image.plot in R, right? You can find my code and the data in the attachment; just add your path for reading in. I've already programmed everything, and I also get adequate estimates without defining bwm. In that case it is not clear to me why they are good, because the literature says something different. Thanks again for your help.
lalilu
Calcite | Level 5
A little correction: I also get adequate estimates without defining bwm inside my algorithm. In the first KDE, where I calculated a pilot estimate, I chose the bwm by cutting the number down until it worked (for example, from 8000 to 8). I know this is not a good approach, but the result is OK.
Rick_SAS
SAS Super FREQ

"there is no method available to plot it like image.plot in R, right?"

 

PROC KDE provides the PLOTS= option which can plot a surface plot or contour plot of the density estimate. If you need something fancier, you can also output the KDE and use SAS graphics to visualize the estimate.  For example, see

How to create a surface plot in SAS

How to create a contour plot in SAS

 

dougc
SAS Employee

Let me refer to the manual with regard to bivariate bandwidth selection:

 

For the bivariate case, Wand and Jones (1993) note that automatic bandwidth selection is both difficult and computationally expensive. Their study of various ways of specifying a bandwidth matrix also shows that using two bandwidths, one in each coordinate’s direction, is often adequate. PROC KDE enables you to adjust the two bandwidths by specifying a multiplier for the default bandwidths recommended by Bowman and Foster (1993):

 

$$h_{X} = \hat\sigma_{X}\, n^{-1/6}, \qquad h_{Y} = \hat\sigma_{Y}\, n^{-1/6}$$

 

Here ${\hat\sigma }_{X}$ and ${\hat\sigma }_{Y}$ are the sample standard deviations of X and Y, respectively. These are the optimal bandwidths for two independent normal variables that have the same variances as X and Y. They are, therefore, conservative in the sense that they tend to oversmooth the surface.

 

The bandwidth calculation due to Bowman and Foster is performed internally by PROC KDE, and the initial bandwidths are set accordingly.

 

You can specify the BWM= option to adjust the aforementioned bandwidths to provide the appropriate amount of smoothing for your application.

 

The final bandwidth used for computing the KDE is the initial bandwidth times BWM.  If you want more smoothing than the default, set BWM > 1.0.  If you want less smoothing than the default, set BWM < 1.0.
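Written out for the X direction, the relationship is as follows. Here $h_X^{*}$ is a hypothetical target bandwidth introduced only for illustration; it is not a PROC KDE option.

```latex
\begin{align*}
h_X^{\text{final}} &= \mathrm{BWM}_X \cdot \hat\sigma_X\, n^{-1/6} \\
\mathrm{BWM}_X &= \frac{h_X^{*}}{\hat\sigma_X\, n^{-1/6}}
\end{align*}
```

The second line gives the multiplier needed to reach a chosen bandwidth $h_X^{*}$. BWM = 1 reproduces the default bandwidth, and the same holds for Y.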

 

Your IML code

 

*  calculate bwm plug-in;
proc iml;
    use data;
    read all var {X Y FREQ} into xy;
    n=sum(xy[,3]);
    stdx=std(xy[,1])/(n**(1/6));
    call symput("stdxg", char(stdx));
    stdy=std(xy[,2])/(n**(1/6));
    call symput("stdyg", char(stdy));
quit;

 

duplicates the internal initial bandwidth calculation.  When you set BWM to &stdxg and &stdyg in

 

proc kde data=data;
bivar (X (bwm=&stdxg ngrid=&gridsize gridl=&minX gridu=&maxX ) 
       Y (bwm=&stdyg ngrid=&gridsize gridl=&minY gridu=&maxY )) / plots=all;
       freq FREQ;
run;

 

the final bandwidth used to calculate the KDE is proportional to the variance in each dimension ($\hat\sigma^{2} n^{-1/3}$) rather than to the standard deviation.  Is this what you want?
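If the plug-in bandwidths themselves are what you are after, one option is to leave the multiplier at 1, so that the bandwidths PROC KDE computes internally are used unchanged. A minimal sketch, reusing the macro variables and BIVAR syntax from your code above (not run against your data):

```sas
proc kde data=data;
   /* BWM=1 keeps the default Bowman-Foster bandwidths as they are; */
   /* the grid macro variables are assumed to be defined as in your program */
   bivar (X (bwm=1 ngrid=&gridsize gridl=&minX gridu=&maxX)
          Y (bwm=1 ngrid=&gridsize gridl=&minY gridu=&maxY)) / plots=all;
   freq FREQ;
run;
```

Values other than 1 then scale the default bandwidth up or down, so the PROC IML step is not needed for this purpose.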

lalilu
Calcite | Level 5
this was exactly my Problem. Thank you very much for your help.
