SAS Procedures

hua · Posted 10-22-2017 04:40 PM

Hello all,

I'm trying to fit my precipitation data to a generalized pareto distribution and find the 95th percentile . And of the three parameters in generalized pareto distribution (sigma, mu, and xi), sigma and mu are fixed, parameter xi is unknown.

I'm considering to use maximum likelihood to estimate xi of generalized pareto distribution, but I don't know what procedure should I use.

I tried the "proc nlin":

proc nlin data= have;
parameters kai=1;
model F=(1/sigma)*((1+xi*((Prep-mu)/sigma)**(-1-1/xi)));
run;

Prep as the precipitation samples, F as corresponding frequencies (probabilities), mu and sigma are fixed parameters. But under proc nlin, the estimation methods I could choose don't include maximum likelihood.

Anyone has any ideas about using maximum likelihood to estimate xi in generalized pareto distribution?

Or if there has any other method to fit my data to a generalized pareto distribution with fixed mu and sigma and get the parameter xi and get the percentile?

I will really appreciate if there has any ideas or thoughts from you all!

Regards,

Hua

Rick_SAS · Posted 10-26-2017 10:33 AM

> 1. I could get the parameters now, but did not see where is the percentile. And don't know how to output them in a new table(dataset).

You didn't specify the SIGMA= value. You originally said that the threshold and scale parameters are fixed.

You can use the ODS OUTPUT statement to create a data set from any SAS table. The FitQuantiles table contains the results of the PERCENT= option, but the table will only appear if the MLE for the Pareto distribution converges. Here is an example that generates the tables and saves the estimates:

ods trace on;
proc univariate data=sashelp.cars;
where EngineSize>3;
   var enginesize;
   histogram enginesize / pareto(theta=2.9 sigma=1.5 percents=(5 95));
   ods output ParameterEstimates=PE FitQuantiles=FQ;
   ods select Histogram ParameterEstimates FitQuantiles GoodnessOfFit;
run; 
ods trace off;

> 2. Since I have a lot of groups with different parameters, it's not possible to manually write every corresponding parameter in each code. And I tried but it seems like I could not use one variable in the dataset as a parameter in the code. What I means is : I add a column named "a" in dataset "have", and use THETA=a in the pareto options.

I don't think PROC UNIVARIATE supports an option like that.

> 3. Do you know how can I define :

$\text{[math]}$ width of histogram interval
$\text{[math]}$ vertical scaling factor

You don't need to worry about those parameters. They appear in the formula for a mathematical reason (so that the density integrates to 1 on whatever vertical scale is being used.) PROC UNIVARIATE handles those values automatically.

> 4. How can I know that how does the data fit the distribution, like R2, how to evaluate this fitting.

Look at the GoodnessOfFit table, which provides GOF statistics such as Kolmogorov-Smirnov and Anderson-Darling tests.

View solution in original post

PGStats · Posted 10-23-2017 12:12 AM

Proc univariate fits generalized Pareto distributions with the histogram statement. Check the PARETO and PERCENTS options.

PG

hua · Posted 10-25-2017 05:34 PM

@PGStats wrote:

Proc univariate fits generalized Pareto distributions with the histogram statement. Check the PARETO and PERCENTS options.

Thank you for giving ideas that using proc univariate, I viewed the SAS Procedures Guide about proc univariate, and used following code:

proc univariate data=have;
histogram Prep/ pareto(THETA= 3 PERCENT=0.95);
ods select histogram;
run;

But I still have some questions:

1. I could get the parameters now, but did not see where is the percentile. And don't know how to output them in a new table(dataset).

2. Since I have a lot of groups with different parameters, it's not possible to manually write every corresponding parameter in each code. And I tried but it seems like I could not use one variable in the dataset as a parameter in the code. What I means is : I add a column named "a" in dataset "have", and use THETA=a in the pareto options.

3. Do you know how can I define :

$\text{[math]}$ width of histogram interval
$\text{[math]}$ vertical scaling factor

these two parameters? It seems like the histogram I get from the code is automatically have an interval? Where should I define them?

4. How can I know that how does the data fit the distribution, like R2, how to evaluate this fitting.

Really thank you for help! It helped me a lot!

Best,

Hua

Rick_SAS · Posted 10-24-2017 04:01 PM

As @PGStats says, PROC UNIVARIATE can fit the generalized Pareto distribution. However, to address your question, you can use PROC NLMIXED (without a random component) to fit data using MLE. For details and an example, see "Two ways to compute maximum likelihood estimates in SAS."

Rick_SAS · Posted 10-26-2017 10:33 AM

> 1. I could get the parameters now, but did not see where is the percentile. And don't know how to output them in a new table(dataset).

You didn't specify the SIGMA= value. You originally said that the threshold and scale parameters are fixed.

You can use the ODS OUTPUT statement to create a data set from any SAS table. The FitQuantiles table contains the results of the PERCENT= option, but the table will only appear if the MLE for the Pareto distribution converges. Here is an example that generates the tables and saves the estimates:

ods trace on;
proc univariate data=sashelp.cars;
where EngineSize>3;
   var enginesize;
   histogram enginesize / pareto(theta=2.9 sigma=1.5 percents=(5 95));
   ods output ParameterEstimates=PE FitQuantiles=FQ;
   ods select Histogram ParameterEstimates FitQuantiles GoodnessOfFit;
run; 
ods trace off;

> 2. Since I have a lot of groups with different parameters, it's not possible to manually write every corresponding parameter in each code. And I tried but it seems like I could not use one variable in the dataset as a parameter in the code. What I means is : I add a column named "a" in dataset "have", and use THETA=a in the pareto options.

I don't think PROC UNIVARIATE supports an option like that.

> 3. Do you know how can I define :

$\text{[math]}$ width of histogram interval
$\text{[math]}$ vertical scaling factor

You don't need to worry about those parameters. They appear in the formula for a mathematical reason (so that the density integrates to 1 on whatever vertical scale is being used.) PROC UNIVARIATE handles those values automatically.

> 4. How can I know that how does the data fit the distribution, like R2, how to evaluate this fitting.

Look at the GoodnessOfFit table, which provides GOF statistics such as Kolmogorov-Smirnov and Anderson-Darling tests.

hua · Posted 10-26-2017 11:02 AM

@Rick_SAS wrote:

> 1. I could get the parameters now, but did not see where is the percentile. And don't know how to output them in a new table(dataset).

You didn't specify the SIGMA= value. You originally said that the threshold and scale parameters are fixed.

You can use the ODS OUTPUT statement to create a data set from any SAS table. The FitQuantiles table contains the results of the PERCENT= option, but the table will only appear if the MLE for the Pareto distribution converges. Here is an example that generates the tables and saves the estimates:
ods trace on;
proc univariate data=sashelp.cars;
where EngineSize>3;
   var enginesize;
   histogram enginesize / pareto(theta=2.9 sigma=1.5 percents=(5 95));
   ods output ParameterEstimates=PE FitQuantiles=FQ;
   ods select Histogram ParameterEstimates FitQuantiles GoodnessOfFit;
run; 
ods trace off;
> 2. Since I have a lot of groups with different parameters, it's not possible to manually write every corresponding parameter in each code. And I tried but it seems like I could not use one variable in the dataset as a parameter in the code. What I means is : I add a column named "a" in dataset "have", and use THETA=a in the pareto options.

I don't think PROC UNIVARIATE supports an option like that.

> 3. Do you know how can I define :

$\text{[math]}$ width of histogram interval

$\text{[math]}$ vertical scaling factor

You don't need to worry about those parameters. They appear in the formula for a mathematical reason (so that the density integrates to 1 on whatever vertical scale is being used.) PROC UNIVARIATE handles those values automatically.

> 4. How can I know that how does the data fit the distribution, like R2, how to evaluate this fitting.

Look at the GoodnessOfFit table, which provides GOF statistics such as Kolmogorov-Smirnov and Anderson-Darling tests.

Thank you so much for your answer. It's very helpful for me. And I'm still reading the website you provided about PROC NLMIXED, hope it can help me in some way.

hua · Posted 10-26-2017 11:09 AM

Do you know if I can define h, which is the width of histogram interval?

Rick_SAS · Posted 10-26-2017 11:15 AM

Yes. See the article "Choosing bins for histograms in SAS", which discusses the MIDPOINTS= and ENDPOINTS= options on the HISTOGRAM statement.

hua · Posted 10-27-2017 03:50 PM

Yes! Thank you so much, I tried the code you provided, it works well, but still I'm wondering if I could output the parameters and percentile in a new dataset, not only displayed in a view window.

Rick_SAS · Posted 10-27-2017 04:03 PM

I've already answered that question, and the code I provided writes two data set. Read the article

"ODS OUTPUT: Store any statistic created by any SAS procedure"

If you run PROC PRINT or PROC CONTENTS on the 'PE' and 'FQ' data sets, you should see the information you want.

SAS Procedures

Using maximum likelihood to estimate parameter of generalized pareto distribution

Re: Using maximum likelihood to estimate parameter of generalized pareto distribution

Re: Using maximum likelihood to estimate parameter of generalized pareto distribution

Re: Using maximum likelihood to estimate parameter of generalized pareto distribution

Re: Using maximum likelihood to estimate parameter of generalized pareto distribution

Re: Using maximum likelihood to estimate parameter of generalized pareto distribution

Re: Using maximum likelihood to estimate parameter of generalized pareto distribution

Re: Using maximum likelihood to estimate parameter of generalized pareto distribution

Re: Using maximum likelihood to estimate parameter of generalized pareto distribution

Re: Using maximum likelihood to estimate parameter of generalized pareto distribution

Re: Using maximum likelihood to estimate parameter of generalized pareto distribution

Follow Us

What is...

SAS Procedures

Our biggest data and AI event of the year.

SAS Training: Just a Click Away

Follow Us

What is...