BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hua
Obsidian | Level 7 hua
Obsidian | Level 7

Hello all,

 

I'm trying to fit my precipitation data to a generalized pareto distribution and find the 95th percentile . And of the three parameters in generalized pareto distribution (sigma, mu, and xi), sigma and mu are fixed, parameter xi is unknown.

I'm considering to use maximum likelihood to estimate xi of generalized pareto distribution, but I don't know what procedure should I use.

I tried the "proc nlin":

proc nlin data= have;
parameters kai=1;
model F=(1/sigma)*((1+xi*((Prep-mu)/sigma)**(-1-1/xi)));
run;

Prep as the precipitation samples, F as corresponding frequencies (probabilities), mu and sigma are fixed parameters. But under proc nlin, the estimation methods I could choose don't include maximum likelihood. 

Anyone has any ideas about using maximum likelihood to estimate xi in generalized pareto distribution?

Or if there has any other method to fit my data to a generalized pareto distribution with fixed mu and sigma and get the parameter xi and get the percentile? 

I will really appreciate if there has any ideas or thoughts from you all!

 

Regards,

Hua

1 ACCEPTED SOLUTION

Accepted Solutions
Rick_SAS
SAS Super FREQ

> 1. I could get the parameters now, but did not see where is the percentile. And don't know how to output them in a new table(dataset).

 

You didn't specify the SIGMA= value. You originally said that the threshold and scale parameters are fixed.

 

You can use the ODS OUTPUT statement to create a data set from any SAS table. The FitQuantiles table contains the results of the PERCENT= option, but the table will only appear if the MLE for the Pareto distribution converges. Here is an example that generates the tables and saves the estimates:

 

 

ods trace on;
proc univariate data=sashelp.cars;
where EngineSize>3;
   var enginesize;
   histogram enginesize / pareto(theta=2.9 sigma=1.5 percents=(5 95));
   ods output ParameterEstimates=PE FitQuantiles=FQ;
   ods select Histogram ParameterEstimates FitQuantiles GoodnessOfFit;
run; 
ods trace off;

 

> 2. Since I have a lot of groups with different parameters, it's not possible to manually write every corresponding parameter in each code. And I tried but it seems like I could not use one variable in the dataset as a parameter in the code. What I means is : I add a column named "a" in dataset "have", and use THETA=a in the pareto options.

 

I don't think PROC UNIVARIATE supports an option like that. 

 

> 3. Do you know how can I define :

  •  width of histogram interval

  •  vertical scaling factor

 

You don't need to worry about those parameters. They appear in the formula for a mathematical reason (so that the density integrates to 1 on whatever vertical scale is being used.)  PROC UNIVARIATE handles those values automatically.

 

> 4. How can I know that how does the data fit the distribution, like R2, how to evaluate this fitting.

 

Look at the GoodnessOfFit table, which provides GOF statistics such as Kolmogorov-Smirnov and Anderson-Darling tests.

 

View solution in original post

9 REPLIES 9
PGStats
Opal | Level 21

Proc univariate fits generalized Pareto distributions with the histogram statement. Check the PARETO and PERCENTS options.

PG
hua
Obsidian | Level 7 hua
Obsidian | Level 7

@PGStats wrote:

Proc univariate fits generalized Pareto distributions with the histogram statement. Check the PARETO and PERCENTS options.


 

Thank you for giving ideas that using proc univariate, I viewed the SAS Procedures Guide about proc univariate, and used following code:

proc univariate data=have;
histogram Prep/ pareto(THETA= 3 PERCENT=0.95);
ods select histogram;
run;

 

But I still have some questions:

1. I could get the parameters now, but did not see where is the percentile. And don't know how to output them in a new table(dataset).

2. Since I have a lot of groups with different parameters, it's not possible to manually write every corresponding parameter in each code. And I tried but it seems like I could not use one variable in the dataset as a parameter in the code. What I means is : I add a column named "a" in dataset "have", and use THETA=a in the pareto options.

3. Do you know how can I define :

  • width of histogram interval

  • vertical scaling factor

these two parameters? It seems like the histogram I get from the code is automatically have an interval? Where should I define them?

4. How can I know that how does the data fit the distribution, like R2, how to evaluate this fitting.

 

Really thank you for help! It helped me a lot!

 

Best,

Hua

Rick_SAS
SAS Super FREQ

As @PGStats says, PROC UNIVARIATE can fit the generalized Pareto distribution. However, to address your question, you can use PROC NLMIXED (without a random component) to fit data using MLE. For details and an example, see "Two ways to compute maximum likelihood estimates in SAS."

Rick_SAS
SAS Super FREQ

> 1. I could get the parameters now, but did not see where is the percentile. And don't know how to output them in a new table(dataset).

 

You didn't specify the SIGMA= value. You originally said that the threshold and scale parameters are fixed.

 

You can use the ODS OUTPUT statement to create a data set from any SAS table. The FitQuantiles table contains the results of the PERCENT= option, but the table will only appear if the MLE for the Pareto distribution converges. Here is an example that generates the tables and saves the estimates:

 

 

ods trace on;
proc univariate data=sashelp.cars;
where EngineSize>3;
   var enginesize;
   histogram enginesize / pareto(theta=2.9 sigma=1.5 percents=(5 95));
   ods output ParameterEstimates=PE FitQuantiles=FQ;
   ods select Histogram ParameterEstimates FitQuantiles GoodnessOfFit;
run; 
ods trace off;

 

> 2. Since I have a lot of groups with different parameters, it's not possible to manually write every corresponding parameter in each code. And I tried but it seems like I could not use one variable in the dataset as a parameter in the code. What I means is : I add a column named "a" in dataset "have", and use THETA=a in the pareto options.

 

I don't think PROC UNIVARIATE supports an option like that. 

 

> 3. Do you know how can I define :

  •  width of histogram interval

  •  vertical scaling factor

 

You don't need to worry about those parameters. They appear in the formula for a mathematical reason (so that the density integrates to 1 on whatever vertical scale is being used.)  PROC UNIVARIATE handles those values automatically.

 

> 4. How can I know that how does the data fit the distribution, like R2, how to evaluate this fitting.

 

Look at the GoodnessOfFit table, which provides GOF statistics such as Kolmogorov-Smirnov and Anderson-Darling tests.

 

hua
Obsidian | Level 7 hua
Obsidian | Level 7

@Rick_SAS wrote:

> 1. I could get the parameters now, but did not see where is the percentile. And don't know how to output them in a new table(dataset).

 

You didn't specify the SIGMA= value. You originally said that the threshold and scale parameters are fixed.

 

You can use the ODS OUTPUT statement to create a data set from any SAS table. The FitQuantiles table contains the results of the PERCENT= option, but the table will only appear if the MLE for the Pareto distribution converges. Here is an example that generates the tables and saves the estimates:

 

 

ods trace on;
proc univariate data=sashelp.cars;
where EngineSize>3;
   var enginesize;
   histogram enginesize / pareto(theta=2.9 sigma=1.5 percents=(5 95));
   ods output ParameterEstimates=PE FitQuantiles=FQ;
   ods select Histogram ParameterEstimates FitQuantiles GoodnessOfFit;
run; 
ods trace off;

 

> 2. Since I have a lot of groups with different parameters, it's not possible to manually write every corresponding parameter in each code. And I tried but it seems like I could not use one variable in the dataset as a parameter in the code. What I means is : I add a column named "a" in dataset "have", and use THETA=a in the pareto options.

 

I don't think PROC UNIVARIATE supports an option like that. 

 

> 3. Do you know how can I define :

  •  width of histogram interval

  •  vertical scaling factor

 

You don't need to worry about those parameters. They appear in the formula for a mathematical reason (so that the density integrates to 1 on whatever vertical scale is being used.)  PROC UNIVARIATE handles those values automatically.

 

> 4. How can I know that how does the data fit the distribution, like R2, how to evaluate this fitting.

 

Look at the GoodnessOfFit table, which provides GOF statistics such as Kolmogorov-Smirnov and Anderson-Darling tests.

 


Thank you so much for your answer.  It's very helpful for me. And I'm still reading the website you provided about PROC NLMIXED, hope it can help me in some way.

hua
Obsidian | Level 7 hua
Obsidian | Level 7
Do you know if I can define h, which is the width of histogram interval?
hua
Obsidian | Level 7 hua
Obsidian | Level 7

Yes! Thank you so much, I tried the code you provided, it works well, but still I'm wondering if I could output the parameters and percentile in a new dataset, not only displayed in a view window.

 

Rick_SAS
SAS Super FREQ

I've already answered that question, and the code I provided writes two data set. Read the article 

"ODS OUTPUT: Store any statistic created by any SAS procedure"

If you run PROC PRINT or PROC CONTENTS on the 'PE' and 'FQ' data sets, you should see the information you want.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 2011 views
  • 4 likes
  • 3 in conversation