berencsiklara
Calcite | Level 5

I would like to estimate the (cumulative) density function of a continuous variable (TimeToEvent) with proc kde.

 

proc KDE data=DistributionSet;
  univar TimeToEvent / percentiles=0 to 100 by 0.01;
  ods output percentiles=P1CDF;
run;

 

As I have only one variable, I use a UNIVAR statement.

For smoothing I use the SJPI method.

If I output the estimated density function, it looks just great.

 

However, as a next step, I would like to use the estimated density function to map probabilities such as P(X<=x) to certain values of the continuous variable (TimeToEvent) in a database. To do that, I need access to either the expression for the estimated smooth density function or a fine grid of values of the estimated density function or the cumulative distribution function.

 

The only solution I could come up with is the following: I output roughly 10,000 percentile values (percentiles=0 to 100 by 0.01). However, I think these percentile values are calculated directly from the sample of the TimeToEvent variable (if I plot the values, I see a step function), not from the estimated smooth density function.

 

Could you suggest a better solution on how to "access" the estimated smooth density function?

1 ACCEPTED SOLUTION
Rick_SAS
SAS Super FREQ

Sure. The KDE is a nonparametric estimate of the density function (PDF) on a finite domain. To get the CDF from the PDF, numerically integrate the PDF. See the article "The area under a density curve: Nonparametric estimates." The article uses PROC UNIVARIATE, but you can use PROC KDE if you prefer.

 

If you use a very fine grid for the KDE, you should be able to closely approximate the CDF and therefore the probabilities for any interval. You can use the NGRID= option to specify how many grid points you want in the output.
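
For illustration, here is a minimal sketch of the grid-plus-integration idea using the trapezoidal rule. It assumes that the OUT= data set of the UNIVAR statement stores the grid points in a variable named value and the density estimates in a variable named density; check this with PROC CONTENTS on your release. The data set names KDEOut and KDECDF are hypothetical.

```sas
/* Evaluate the KDE on a fine grid of 1001 points.                    */
/* Assumption: OUT= writes the grid to VALUE and the PDF to DENSITY.  */
proc kde data=DistributionSet;
  univar TimeToEvent / method=sjpi ngrid=1001 out=KDEOut;
run;

/* Trapezoidal rule: accumulate the area under the density curve. */
data KDECDF;
  set KDEOut;
  retain CDF 0;
  prevX = lag(value);     /* previous grid point        */
  prevD = lag(density);   /* density at previous point  */
  if _N_ > 1 then
    CDF = CDF + 0.5*(density + prevD)*(value - prevX);
  drop prevX prevD;
run;
```

The last value of CDF is typically a little less than 1 because the grid covers a finite interval. If that matters, you can divide every CDF value by the final one.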

 

If you require interpolation, see the article "Interpolation in SAS." I think if you use lots of grid points (maybe 1001), you won't need interpolation. After all, the KDE is only an estimate.

 

As you say, for each test value in the database, you can get an approximate probability by finding the first x value in the linear grid that is greater than or equal to the test value and looking up the CDF for that x.
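
One way to sketch that lookup is with PROC SQL. This assumes a data set KDECDF that holds the grid points (value) and the cumulative probabilities (CDF), and a data set TestSet with an identifier ID and the values to score in TestValue; all of these names are hypothetical. Because the CDF is nondecreasing, the smallest CDF among grid points at or above the test value is the CDF at the first such grid point.

```sas
/* For each test value, take the CDF at the first grid point >= the value. */
/* TestSet, ID, TestValue, and KDECDF are hypothetical names.              */
proc sql;
  create table TestProb as
  select t.ID, t.TestValue, min(g.CDF) as Prob
  from TestSet as t left join KDECDF as g
    on g.value >= t.TestValue
  group by t.ID, t.TestValue;
quit;
```

Test values above the largest grid point get a missing Prob; you would probably treat those as 1. A missing TestValue compares lower than every grid point, so you may want to exclude missing values first. The join compares every test value with every grid point, so for a very large database a DATA step lookup or PROC FORMAT might be a better choice.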
