Re: Modelling data using inflated beta regression and PROC NLMIXED

Quantopic · Posted 08-23-2017 06:07 AM

Hi all SAS users,

I'm trying to model loss-given-default data based on SAS paper 1593-2014 (see http://support.sas.com/resources/papers/proceedings14/1593-2014.pdf).

In particular, the third model proposed by the authors is an inflated beta regression model, which is implemented by the following SAS program:

proc nlmixed data=MyData tech=quanew maxiter=3000 maxfunc=3000 qtol=0.0001;
   parms b0-b14=0.0001
         pie=0.2
         kesai=0.3
         phi=2;
   cov_mu=b0+b1*Var1+b2*Var2+…+b14*Var14;
   mu=logistic(cov_mu);
   if RR=0
      then loglikefun=log(pie)+log(1-kesai);
   if RR>=1
      then loglikefun=log(pie)+log(kesai);
   if 0<RR<1
      then loglikefun=log(1-pie)+lgamma(phi)-lgamma(mu*phi)-lgamma((1-mu)*phi)
           +(mu*phi-1)*log(RR)+((1-mu)*phi-1)*log(1-RR);
   predict pie*kesai+(1-pie)*mu out=Inf_beta_output (keep=instrument_id RR pred);
   model RR~general(loglikefun);
run;
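The three if-branches in this program are the log of a three-part mixture: a point mass at 0, a point mass at 1, and a beta density in between. As a quick way to see what each parameter does outside SAS, here is a minimal Python sketch (not the paper's code; the function names are hypothetical, and mu is passed in directly instead of being computed from the covariates) that evaluates the same per-observation log-likelihood and the same mean as the PREDICT statement.

```python
import math


def inflated_beta_loglik(rr, mu, pie, kesai, phi):
    """Log-likelihood of one observation, mirroring the three SAS branches."""
    if rr == 0:
        # point mass at 0 with probability pie * (1 - kesai)
        return math.log(pie) + math.log(1 - kesai)
    if rr >= 1:
        # point mass at 1 (values above 1 are treated as 1) with probability pie * kesai
        return math.log(pie) + math.log(kesai)
    if 0 < rr < 1:
        # (1 - pie) times a beta density with shapes mu*phi and (1 - mu)*phi
        return (math.log(1 - pie)
                + math.lgamma(phi) - math.lgamma(mu * phi) - math.lgamma((1 - mu) * phi)
                + (mu * phi - 1) * math.log(rr)
                + ((1 - mu) * phi - 1) * math.log(1 - rr))
    # the SAS program has no branch for negative values either
    raise ValueError("RR must be >= 0")


def inflated_beta_mean(mu, pie, kesai):
    """Expected value of RR, the same expression as the PREDICT statement."""
    return pie * kesai + (1 - pie) * mu
```

With mu=0.5 and phi=2 the beta part is the uniform distribution on (0,1), so any 0<RR<1 contributes exactly log(1-pie), which makes a convenient sanity check.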

I would like to understand what the parameters pie, kesai and phi are and how they can be computed/estimated.

I know that the beta distribution has two shape parameters, usually named alpha and beta, which can be computed from the sample mean and variance by using the method of moments, but I do not know how to relate those parameters to the ones used in the program (pie, kesai and phi).

Could you suggest a solution or an explanation of these parameters?
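The beta branch of the program uses the mean/precision parameterization, so the usual shape parameters correspond to alpha = mu*phi and beta = (1-mu)*phi; equivalently mu = alpha/(alpha+beta) and phi = alpha+beta, and the variance is mu(1-mu)/(1+phi). A minimal Python sketch of the method of moments under that mapping follows (the function name is hypothetical). It assumes it is applied only to the observations strictly between 0 and 1, since the values at 0 and 1 belong to the point masses governed by pie and kesai rather than to the beta part.

```python
def beta_moments(values):
    """Method-of-moments beta fit, returned in both parameterizations."""
    n = len(values)
    m = sum(values) / n
    v = sum((x - m) ** 2 for x in values) / (n - 1)  # sample variance (n - 1 denominator)
    if not 0 < v < m * (1 - m):
        raise ValueError("variance must lie in (0, m(1 - m)) for a beta fit")
    phi = m * (1 - m) / v - 1  # solves Var = mu(1 - mu) / (1 + phi) for phi
    alpha, beta = m * phi, (1 - m) * phi  # shape parameters used by lgamma(mu*phi), etc.
    return {"alpha": alpha, "beta": beta, "mu": m, "phi": phi}
```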

Thanks all!

Rick_SAS · Posted 08-23-2017 08:29 AM

Those are the parameters in the mixture model on pp. 4-5 of the referenced paper.

pie = π = mixing probability, the Greek letter 'pi'

kesai = ψ = the Greek letter 'psi' (the authors apparently thought this was 'xi', which is pronounced ke-sai)

phi = φ = the Greek letter 'phi'
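Reading the posted program term by term (taking exp of each loglikefun branch), the mixture density these three parameters belong to can be sketched as follows, using the letters above. This is reconstructed from the code, not copied from the paper, whose notation may differ; note that the program treats any RR >= 1 as the point mass at 1.

```latex
f(y \mid \pi, \psi, \mu, \phi) =
\begin{cases}
\pi\,(1-\psi), & y = 0,\\[4pt]
\pi\,\psi, & y = 1,\\[4pt]
(1-\pi)\,\dfrac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma\big((1-\mu)\phi\big)}\,
  y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}, & 0 < y < 1,
\end{cases}
\qquad
\mu = \frac{1}{1 + e^{-\mathbf{x}^{\top}\mathbf{b}}},
\qquad
\operatorname{E}[Y] = \pi\psi + (1-\pi)\,\mu .
```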

Quantopic · Posted 08-23-2017 07:25 PM

Thanks for the answer, @Rick_SAS, but I meant how to estimate them, assuming they are something like distribution parameters.

In your opinion, is there a way to derive these values from the data?

Rick_SAS · Posted 08-23-2017 07:56 PM

The paper that you link and the code that you posted show how to fit the parameters to data. What else are you looking for?

Quantopic · Posted 08-24-2017 03:27 AM

Hi @Rick_SAS, as I understand it, the paper I linked and the related code show how to estimate the parameters of the inflated beta regression, whereas I was referring to the input parameters pie, kesai and phi.

These are the parameters of the distributions, and in the paper their values are given.

I am asking for a way to obtain these values from the data because I did not understand from the paper how to do that.

Thanks.

Rick_SAS · Posted 08-24-2017 05:24 AM

Those parameters appear in the PARMS statement. Therefore they are not fixed values; they are parameters in the model that are estimated from the data as part of the maximum likelihood estimation (MLE). Near the top of p. 5 the authors say "The unknown parameters [including pie, kesai, and phi] are solved by standard optimization algorithms" in PROC NLMIXED.
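In the posted program pie and kesai carry no covariates and enter the log-likelihood only through the boundary terms and log(1-pie), so the MLE that NLMIXED computes for them has a closed form. The derivation below is worked out from the code, not taken from the paper. Here n_0 counts observations with RR = 0, n_1 those with RR >= 1, n_c those with 0 < RR < 1, n = n_0 + n_1 + n_c, and Beta(y; a, b) denotes the beta density.

```latex
\begin{aligned}
\ell(\pi,\psi,\mathbf{b},\phi)
 &= (n_0+n_1)\log\pi + n_c\log(1-\pi) + n_1\log\psi + n_0\log(1-\psi)
  + \sum_{i:\,0<y_i<1} \log \mathrm{Beta}\big(y_i;\ \mu_i\phi,\ (1-\mu_i)\phi\big),\\[6pt]
\frac{\partial\ell}{\partial\pi} &= \frac{n_0+n_1}{\pi} - \frac{n_c}{1-\pi} = 0
  \;\Longrightarrow\; \hat\pi = \frac{n_0+n_1}{n},\\[6pt]
\frac{\partial\ell}{\partial\psi} &= \frac{n_1}{\psi} - \frac{n_0}{1-\psi} = 0
  \;\Longrightarrow\; \hat\psi = \frac{n_1}{n_0+n_1}.
\end{aligned}
```

The beta terms involve neither π nor ψ, so the regression coefficients b and the precision φ are the parameters that genuinely need the numerical optimizer.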

Are you asking how to find the INITIAL GUESSES for the PARMS statement? You can often guess a value based on graphical analysis and preliminary modeling, or use the tips at "How to find an initial guess for an optimization."
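One simple way to get data-driven starting values, sketched in Python under the assumption that the response values are available as a list (the function name is hypothetical, and this heuristic is not taken from the paper or the linked article): use the observed proportions of 0s and 1s for pie and kesai, the method of moments on the values strictly between 0 and 1 for phi, and the logit of their mean for the intercept b0, with the remaining slopes started at 0.

```python
import math


def initial_guesses(rr_values):
    """Heuristic starting values for PARMS in the zero/one-inflated beta model."""
    n = len(rr_values)
    n0 = sum(1 for y in rr_values if y == 0)
    n1 = sum(1 for y in rr_values if y >= 1)  # the program treats RR >= 1 as 1
    interior = [y for y in rr_values if 0 < y < 1]
    if n0 + n1 == 0 or len(interior) < 2:
        raise ValueError("need boundary values and at least two interior values")
    pie = (n0 + n1) / n   # share of observations at 0 or 1
    kesai = n1 / (n0 + n1)  # share of 1s among the boundary observations
    m = sum(interior) / len(interior)
    v = sum((y - m) ** 2 for y in interior) / (len(interior) - 1)
    phi = m * (1 - m) / v - 1  # method of moments: Var = mu(1 - mu) / (1 + phi)
    b0 = math.log(m / (1 - m))  # logit of the interior mean; other slopes start at 0
    return {"pie": pie, "kesai": kesai, "phi": phi, "b0": b0}
```

These values would replace the hand-picked 0.2, 0.3 and 2 in the PARMS statement. Because pie and kesai have no covariates in this program, their proportions are not only good starting values but should coincide with the estimates NLMIXED reports.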

If you provide example data, we could show you with code that pie, kesai, and phi are part of the parameter estimates table in the output.

Quantopic · Posted 08-24-2017 05:56 AM

Thanks @Rick_SAS for providing the link to the article about the initial guess computation.

Really useful!

Anyway, I have attached a sample dataset with 1,000 records and the variables used in the program; I would really appreciate it if you could show me how to do this with an example.

Thanks again!

Rick_SAS · Posted 08-24-2017 09:17 AM

What variable is the response? Which are the independent variables?

Quantopic · Posted 08-24-2017 09:30 AM

v_lgd_da_mora_st_out is the response variable, and the other ones are the independent variables; sorry for not making that clear.

Rick_SAS · Posted 08-24-2017 09:42 AM

Here is the "translation" from your variables to the variables in the paper:

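proc nlmixed data=MyData tech=quanew maxiter=3000 maxfunc=3000 qtol=0.0001;
   parms b0-b5=1      /* coefficients of the linear predictor for mu */
         pie=0.2      /* probability that RR is at a boundary (0 or 1) */
         kesai=0.3    /* probability that RR = 1, given that RR is at a boundary */
         phi=2;       /* precision of the beta component */
   /* map your variable names to the names used in the paper */
   RR = v_lgd_da_mora_st_out;
   Var1 = flg_proc_conc;
   Var2 = dummy_ipo_p;
   Var3 = dummy_pegno_p;
   Var4 = dummy_pers_p;
   Var5 = cat_syges_rt;

   cov_mu=b0+b1*Var1+b2*Var2+b3*Var3+b4*Var4+b5*Var5;
   mu=logistic(cov_mu);    /* mean of the beta component, in (0,1) */
   if RR=0                 /* point mass at 0 */
      then loglikefun=log(pie)+log(1-kesai);
   if RR>=1                /* point mass at 1 */
      then loglikefun=log(pie)+log(kesai);
   if 0<RR<1               /* beta density with shapes mu*phi and (1-mu)*phi */
      then loglikefun=log(1-pie)+lgamma(phi)-lgamma(mu*phi)-lgamma((1-mu)*phi)
           +(mu*phi-1)*log(RR)+((1-mu)*phi-1)*log(1-RR);
   /* predicted mean of RR: pie*kesai + (1-pie)*mu */
   predict pie*kesai+(1-pie)*mu out=Inf_beta_output (keep=v_lgd_da_mora_st_out pred);
   model RR~general(loglikefun);
run;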

StatDave · Posted 08-23-2017 11:14 AM

If this is, as it seems, a finite mixture model, then you might find it easier to implement in PROC FMM.

Quantopic · Posted 08-23-2017 07:29 PM

Hi @StatDave, thanks for the reply.

Actually, my problem is understanding how to estimate these parameters, not how to use PROC NLMIXED.

Anyway, if you could suggest some example code using PROC FMM, it would certainly be appreciated.

Thanks!

SAS_Rob · Posted 08-24-2017 10:11 AM

Based on the description at the beginning of the paper, it seems they are fitting something similar to this example from the Proc FMM documentation.

http://support.sas.com/documentation/cdl/en/statug/68162/HTML/default/viewer.htm#statug_fmm_examples...

In any case, FMM gives you estimates of the mixing probabilities directly (or they can be modeled using additional effects with the PROBMODEL statement).