How to get the estimated density function from pro...

12-09-2016 06:07 AM

I would like to estimate the density function, and from it the cumulative distribution function (CDF), of a continuous variable (TimeToEvent) with PROC KDE.

proc KDE data=DistributionSet;
   univar TimeToEvent / percentiles=0 to 100 by 0.01;
   ods output percentiles=P1CDF;
run;

As I only have one variable, I use a UNIVAR statement.

For smoothing I use the SJPI method.

If I output the estimated density function, it looks just great.

However, in a next step, I would like to use the estimated density function to map probabilities (such as P(X<=x)) to certain values of the continuous variable (TimeToEvent) in a database. To do that, I need access to either the expression for the estimated smooth density function or a fine grid of values of either the estimated smooth density function or the cumulative distribution function.

The only solution I could come up with is the following: I output about 10000 percentile values (percentiles=0 to 100 by 0.01). However, I think these percentile values are calculated directly from the sample of the TimeToEvent variable (when I plot the values I see a step function), not from the estimated smooth density function.

Could you suggest a better solution on how to "access" the estimated smooth density function?

Accepted Solutions

Solution

12-09-2016
08:59 AM

12-09-2016 08:41 AM

Sure. The KDE is a nonparametric estimate of the density function (PDF) on a finite domain. To get the CDF from the PDF, numerically integrate the PDF. See the article "The area under a density curve: Nonparametric estimates." The article uses PROC UNIVARIATE, but you can use PROC KDE if you prefer.

If you use a very fine grid for the KDE, you should be able to closely approximate the CDF and therefore the probabilities for any interval. You can use the NGRID= option to specify how many grid points you want in the output.
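As a minimal sketch of the numerical integration described above, the following evaluates the KDE on a fine grid and accumulates the area under the density with the trapezoidal rule. DistributionSet and TimeToEvent come from the question; the data set names KDEGrid and KDECdf and the variable cdf are hypothetical names chosen for illustration. The sketch assumes the OUT= data set of the UNIVAR statement contains the grid points in the variable value and the density estimate in the variable density.

```sas
/* Evaluate the KDE on a fine grid of 1001 points.                  */
/* OUT= writes one row per grid point (assumed variables: value,    */
/* density).                                                        */
proc kde data=DistributionSet;
   univar TimeToEvent / method=sjpi ngrid=1001 out=KDEGrid;
run;

/* Trapezoidal rule: running area under the density, accumulated    */
/* from the left end of the grid.                                   */
data KDECdf;
   set KDEGrid;
   retain cdf 0;
   prevX = lag(value);      /* previous grid point                  */
   prevF = lag(density);    /* density at the previous grid point   */
   if _n_ > 1 then cdf = cdf + 0.5*(density + prevF)*(value - prevX);
   drop prevX prevF;
run;
```

Because the grid covers only a finite interval, the accumulated value at the last grid point may fall somewhat short of 1. If that matters, the GRIDL= and GRIDU= options can widen the grid, or cdf can be divided by its final value.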

If you require interpolation, see the article "Interpolation in SAS." I think if you use lots of grid points (maybe 1001) you won't need interpolation. After all, the KDE is only an estimate.

As you say, for each test value in the database, you can get an approximate probability by finding the first x value in the linear grid that is greater than or equal to the test value and looking up the CDF for that x.
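The lookup just described could be sketched as follows. This assumes a data set KDECdf that holds the grid points in value and the cumulative values in cdf, and a data set TestValues that holds the test values in TimeToEvent; all of these names are hypothetical. Because the CDF never decreases along the grid, the smallest cdf among the grid points at or above a test value is the cdf at the first such grid point.

```sas
/* For each test value, take the CDF at the first grid point that   */
/* is greater than or equal to it.                                  */
proc sql;
   create table Lookup as
   select t.TimeToEvent,
          min(c.cdf) as prob     /* cdf at the first value >= test value */
   from TestValues as t
        left join KDECdf as c
        on c.value >= t.TimeToEvent
   group by t.TimeToEvent;
quit;
```

Test values above the largest grid point get a missing prob, and repeated test values collapse to one row each. The inequality join compares every test value with every grid point, so for a very large database a sorted DATA step lookup may be a better fit.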
