Tiffanie
Calcite | Level 5

Dear community,

 

I am comparing the default options for kernel density estimates in PROC UNIVARIATE (HISTOGRAM statement, KERNEL option) and PROC KDE (UNIVAR statement).
There is one point for which I couldn't find the information for PROC UNIVARIATE: the number of grid points.
For PROC KDE, it can be chosen with the NGRID= option and is set to 401 points by default.

For PROC UNIVARIATE, on the other hand, it seems to be fixed with no way of modifying it, but I cannot find what its default value is.

Could you help me with this, please?

 

Thanks a lot.

Tiffanie.

Accepted Solutions
Rick_SAS
SAS Super FREQ

There isn't a simple answer, but if you don't use the LOWER= or UPPER= options, the OUTKERNEL= data set is formed by doing the following:

1. Divide the range (max-min) by 128. This is the step size, dx = (max-min)/128.

2. Evaluate the kernel on the 128 intervals whose endpoints are min, min+dx, min+2*dx, ..., max.

3. Usually, we can't stop there, because we want the KDE to integrate to unity over the support of the distribution. So start adding more grid points before x=min and after x=max until the integral is approximately 1. For example, on the left add the points

..., min-3*dx, min-2*dx, min-dx

and on the right add the points

max+dx, max+2*dx, max+3*dx, ...

4. Stop adding points in the tail when the tail area is inconsequential, such as less than 1E-6.

 

As a result, you'll always get at least 128 points, but sometimes you will get 160 or 170 or more points. It depends on the area in the tails of the distribution, which depends on the data and on the bandwidth of the kernel.
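
One way to see this on your own data is to write the grid to an OUTKERNEL= data set and count its rows. The following is only a sketch: SASHELP.STOCKS is used just because it ships with SAS, the data set names WORK.KERN and WORK.KDE_OUT are arbitrary, and the variable name _VALUE_ is what I recall from the OUTKERNEL= documentation, so run PROC CONTENTS on WORK.KERN if the query complains.

```sas
/* Write the kernel density grid from PROC UNIVARIATE to a data set. */
/* WORK.KERN is an arbitrary name chosen for this illustration.      */
proc univariate data=sashelp.stocks;
   var close;
   histogram close / kernel outkernel=work.kern;
run;

/* Count the grid points and show the range they cover.              */
/* _VALUE_ is assumed to hold the x-value of each grid point.        */
proc sql;
   select count(*)     as n_points,
          min(_value_) as grid_min,
          max(_value_) as grid_max
   from work.kern;
quit;

/* For comparison: PROC KDE with an explicit NGRID= value.           */
proc kde data=sashelp.stocks;
   univar close / ngrid=401 out=work.kde_out;
run;

proc sql;
   select count(*) as n_points_kde
   from work.kde_out;
quit;
```

If the description above holds, grid_min and grid_max from the first query should fall outside the data range whenever the tails needed extra points, whereas the PROC KDE count should simply match the NGRID= value you requested.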

ballardw
Super User

I am not sure what you mean by "grid points" in terms of PROC UNIVARIATE. If you mean the tick marks used by the GRID option, you can set a value list with the VAXIS= option, as in:

proc univariate data=sashelp.stocks;
   var close;
   histogram /vaxis = (0 to 25 by 5);
run;

 

You can control the bars using the BARWIDTH= option, which specifies how wide each bar of the histogram should be, or the MIDPOINTS= option, which lists the center of each displayed bar.

Tiffanie
Calcite | Level 5
Thanks a lot for your very clear answer. That was exactly what I wanted to know.
