Solved
Established User
Posts: 1

# How to get the estimated density function from proc kde

I would like to estimate the (cumulative) density function of a continuous variable (TimeToEvent) with proc kde.

```
proc KDE data=DistributionSet;
   univar TimeToEvent / percentiles=0 to 100 by 0.01;
   ods output percentiles=P1CDF;
run;
```

As I only have one variable, I use a UNIVAR statement.

For smoothing I use the SJPI method.

If I output the estimated density function, it looks just great.

However, in a next step, I would like to use the estimated density function to map probabilities (such as P(X<=x)) to certain values of the continuous variable (TimeToEvent) in a database. To do that, I need access to either the expression of the estimated smooth density function or a fine grid of values of either the estimated smooth density function or the cumulative distribution function.

The only solution I could come up with is the following: I output 10000 percentile values (percentiles=0 to 100 by 0.01). However, I think these percentile values are calculated directly from the sample of the TimeToEvent variable (if I plot the values I see a stepwise function), not based on the estimated smooth density function.

Could you suggest a better solution on how to "access" the estimated smooth density function?

Accepted Solutions
Solution
‎12-09-2016 08:59 AM
SAS Super FREQ
Posts: 3,900

## Re: How to get the estimated density function from proc kde

Sure. The KDE is a nonparametric density function (PDF) on a finite domain. To get the CDF from the PDF, numerically integrate the PDF. See the article "The area under a density curve: Nonparametric estimates". The article uses PROC UNIVARIATE, but you can use PROC KDE if you prefer.

If you use a very fine grid for the KDE, you should be able to closely approximate the CDF and therefore the probabilities for any interval. You can use the NGRID= option to specify how many grid points you want in the output.
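As a minimal sketch of this idea (not taken from the linked article), the following evaluates the KDE on 1001 grid points and accumulates the area with the trapezoidal rule. The data set names `KDEGrid` and `CDFGrid` are made up for illustration. The code assumes that the OUT= data set of the UNIVAR statement contains the variables `value` (grid point) and `density` (estimated PDF); check the output of your SAS release before relying on those names.

```
/* Evaluate the KDE on a fine grid.                                   */
/* METHOD=SJPI is the bandwidth method mentioned in the question.     */
/* KDEGrid is a hypothetical name for the OUT= data set.              */
proc kde data=DistributionSet;
   univar TimeToEvent / method=sjpi ngrid=1001 out=KDEGrid;
run;

/* Trapezoidal rule: add the area between consecutive grid points.    */
/* Assumes KDEGrid has the variables VALUE and DENSITY.               */
data CDFGrid;
   set KDEGrid;
   prevValue   = lag(value);     /* call LAG on every row, never inside an IF */
   prevDensity = lag(density);
   if _n_ = 1 then cdf = 0;      /* no area accumulated at the first grid point */
   else cdf + 0.5 * (density + prevDensity) * (value - prevValue);
   keep value density cdf;
run;
```

Because the grid covers only a finite interval, the last value of `cdf` may be slightly less than 1. If that matters for your application, one option is to divide `cdf` by its final value.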

If you require interpolation, see the article "Interpolation in SAS". I think if you use lots of grid points (maybe 1001) you won't need interpolation. After all, the KDE is only an estimate.

As you say, for each test value in the database, you can get an approximate probability by finding the first x value in the linear grid that is greater than or equal to the test value and looking up the CDF for that x.
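The lookup can be sketched as follows. This is only an illustration under stated assumptions: `CDFGrid` and `TestValues` are hypothetical data sets, filled here with a few toy rows so the example is self-contained. In practice the grid would hold the integrated KDE with variables `value` and `cdf`, and the test values would come from your database.

```
/* Toy CDF grid (hypothetical values, for illustration only) */
data CDFGrid;
   input value cdf;
   datalines;
0 0.0
1 0.2
2 0.7
3 1.0
;

/* Toy test values (hypothetical) */
data TestValues;
   input TimeToEvent;
   datalines;
0.5
2
2.4
;

/* For each test value, keep the grid points at or above it.          */
/* Because the CDF is nondecreasing, the smallest CDF among those     */
/* points is the CDF at the first grid point >= the test value.       */
proc sql;
   create table Mapped as
   select t.TimeToEvent, min(g.cdf) as prob
   from TestValues as t
        left join CDFGrid as g
        on g.value >= t.TimeToEvent
   group by t.TimeToEvent;
quit;
/* Mapped: 0.5 -> 0.2, 2 -> 0.7, 2.4 -> 1.0 */
```

A test value larger than the last grid point has no matching row, so `prob` is missing for it. The GROUP BY also collapses repeated test values into one row. For a large database, a sorted DATA step merge may be faster than this join.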
