Statistical Procedures

Programming the statistical procedures from SAS
Tiffanie
Calcite | Level 5

Dear community,

 

I am comparing the default settings for kernel density estimates in PROC UNIVARIATE (HISTOGRAM statement, KERNEL option) and PROC KDE (UNIVAR statement).
There is one item I could not find for PROC UNIVARIATE: the number of grid points.
In PROC KDE it can be set with the NGRID= option, and it defaults to 401 points.

In PROC UNIVARIATE, on the other hand, it seems to be fixed with no way to change it, but I cannot find what the default value is.

Could you help me with this, please?

 

Thanks a lot.

Tiffanie.

Accepted Solutions
Rick_SAS
SAS Super FREQ

There isn't a simple answer, but if you don't use the LOWER= or UPPER= options, the OUTKERNEL= data set is formed by doing the following:

1. Divide the range (max-min) by 128. This is the step size, dx = (max-min)/128.

2. Evaluate the kernel on the 128 intervals whose endpoints are min, min+dx, min+2*dx, ..., max.

3. Usually, we can't stop there, because we want the KDE to integrate to unity over the support of the distribution. So start adding more grid points before x=min and after x=max until the integral is approximately 1. For example, on the left add the points

..., min-3*dx, min-2*dx, min-dx

and on the right add the points

max+dx, max+2*dx, max+3*dx, ...

4. Stop adding points in the tail when the tail area is inconsequential, such as less than 1E-6.

 

As a result, you'll always get at least 128 points, but sometimes you will get 160 or 170 or more points. It depends on the area in the tails of the distribution, which depends on the data and on the bandwidth of the kernel.
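
The four steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code PROC UNIVARIATE actually runs. It assumes a normal (Gaussian) kernel, assumes the tail area is measured as the KDE's area beyond the outermost grid point, and uses 1E-6 as the cutoff. The names kde_grid, left_tail_area, right_tail_area, n_intervals, and tail_tol are hypothetical. The sketch returns interval endpoints, so its minimum count is 129 rather than 128; which convention SAS uses for the count is not confirmed here.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def left_tail_area(x, data, bandwidth):
    """Area of a Gaussian KDE to the left of x (normal kernel is an assumption)."""
    return sum(normal_cdf((x - xi) / bandwidth) for xi in data) / len(data)


def right_tail_area(x, data, bandwidth):
    """Area of a Gaussian KDE to the right of x."""
    return sum(normal_cdf((xi - x) / bandwidth) for xi in data) / len(data)


def kde_grid(data, bandwidth, n_intervals=128, tail_tol=1e-6):
    """Build the evaluation grid described in steps 1-4 (illustrative only)."""
    lo, hi = min(data), max(data)
    if hi <= lo or bandwidth <= 0:
        raise ValueError("need a positive data range and a positive bandwidth")

    # Step 1: step size from the data range.
    dx = (hi - lo) / n_intervals

    # Steps 3-4: extend each side by dx until the leftover tail area is tiny.
    n_left = 0
    while left_tail_area(lo - n_left * dx, data, bandwidth) >= tail_tol:
        n_left += 1
    n_right = 0
    while right_tail_area(hi + n_right * dx, data, bandwidth) >= tail_tol:
        n_right += 1

    # Step 2 plus the extensions: equally spaced points covering [min, max]
    # and the added tail points on both sides.
    return [lo + i * dx for i in range(-n_left, n_intervals + n_right + 1)]


data = [float(v) for v in range(11)]   # toy data: 0, 1, ..., 10
grid = kde_grid(data, bandwidth=1.0)
print(len(grid), grid[0], grid[-1])
```

In this sketch a smaller bandwidth gives thinner tails and therefore fewer added points, which matches the remark that the final count depends on both the data and the bandwidth.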


3 REPLIES
ballardw
Super User

I am not sure what you mean by "grid points" in the context of PROC UNIVARIATE. If you mean the tick marks used by the GRID option, you can set a value list with the VAXIS= option, as in:

proc univariate data=sashelp.stocks;
   var close;
   histogram /vaxis = (0 to 25 by 5);
run;

 

You can control the histogram bars with the BARWIDTH= option, which specifies how wide each bar should be, or with the MIDPOINTS= option, which lists the center of each displayed bar.


Tiffanie
Calcite | Level 5
Thanks a lot for your very clear answer. That was exactly what I wanted to know.
