01-19-2011 09:07 AM

Hello guys,

I have a problem integrating a regression function inside SAS.

Essentially, I want to run a regression with the recovery rate of loans as the dependent variable and the loan size as the explanatory variable. However, I run into huge problems: a non-normal distribution, heteroscedasticity, etc. Log forms of the explanatory variable did not help either.

The academic literature suggests a solution based on a model developed by Papke & Wooldridge (1996). I have to implement their regression model in SAS or Enterprise Guide.

They model the conditional mean as $E(y \mid x) = G(x\beta)$ with the log-log function $G(x\beta) = \exp(-\exp(-x\beta))$, which is pretty standard.

Following Papke & Wooldridge, the non-linear estimation procedure maximizes a Bernoulli log-likelihood function:

l (b) = y[logG*(xb)] + (1-y)*log[1-G*(xb)]

They further show that their quasi-maximum likelihood estimators are consistent and asymptotically normal.
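Spelled out for a sample of $n$ loans (a sketch of the standard Papke & Wooldridge setup, with $g$ and the sandwich matrices $\hat A$, $\hat B$ introduced here only for illustration): the recovery rate may take any value in $[0,1]$, the Bernoulli log-likelihood is maximized over $b$ even though $y_i$ is not binary, and inference relies on a robust "sandwich" covariance because the Bernoulli variance $G(1-G)$ need not be the true variance.

$$G(z) = \exp(-\exp(-z)), \qquad g(z) = G'(z) = e^{-z}\,G(z)$$
$$\hat\beta = \arg\max_b \sum_{i=1}^{n} \ell_i(b), \qquad \ell_i(b) = y_i \log G(x_i b) + (1-y_i)\log[1-G(x_i b)], \quad 0 \le y_i \le 1$$
$$s_i(b) = \nabla_b \ell_i(b) = \frac{g(x_i b)\,[\,y_i - G(x_i b)\,]}{G(x_i b)\,[\,1-G(x_i b)\,]}\; x_i'$$
$$\widehat{\operatorname{Avar}}(\hat\beta) = \hat A^{-1}\hat B\,\hat A^{-1}, \qquad \hat A = -\sum_{i=1}^{n} \nabla_b^2\,\ell_i(\hat\beta), \qquad \hat B = \sum_{i=1}^{n} s_i(\hat\beta)\,s_i(\hat\beta)'$$

The score $s_i$ has expectation zero whenever the mean $G(x_i b)$ is correctly specified, whatever the true distribution of $y_i$, which is why only the mean function has to be right for consistency.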

Does anybody have an idea what code I need to write, or whether Enterprise Guide can do this with a little extra programming?

Posted in reply to deleted_user

01-19-2011 12:27 PM

What is G*(xb) in the log-likelihood function? Assuming that G*(xb) should really be G(xb) and that y[logG*(xb)] should be y*log[G(xb)], the problem can be handled quite easily using the NLIN or NLMIXED procedures in SAS/STAT and probably using the MODEL procedure in SAS/ETS. Since I am most familiar with the syntax for the NLMIXED procedure, I will provide code for that.

```sas
proc nlmixed data=mydata;
   eta = b0 + b1*x1 + b2*x2 + ... ;
   mu = exp(-exp(eta));
   ll = y*log(mu) + (1-y)*log(1-mu);
   model y ~ general(ll);
run;
```

If my assumptions about G*(xb) and y[logG*(xb)] are wrong, then please clarify the problem.
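One way to try the NLMIXED approach before touching the real data is to simulate a fractional response from the log-log mean and fit it. This is a sketch under assumptions: the data set SIM, the variables x1 and y, the coefficients 0.5 and 0.8, the seed, and the beta-distributed outcome are all hypothetical. It uses the mean exp(-exp(-eta)) from the original post and adds the EMPIRICAL option, which asks NLMIXED for sandwich standard errors, the usual choice when the Bernoulli likelihood is only a quasi-likelihood.

```sas
/* Hypothetical simulated data: fractional response y in (0,1) with a log-log mean */
data sim;
   call streaminit(2011);                        /* fixed seed for reproducibility */
   do i = 1 to 500;
      x1 = rand('normal');                       /* hypothetical regressor */
      mu = exp(-exp(-(0.5 + 0.8*x1)));           /* G(xb) with b0=0.5, b1=0.8 */
      y  = rand('beta', 20*mu, 20*(1-mu));       /* beta outcome whose mean is mu */
      output;
   end;
   drop i mu;
run;

/* Bernoulli quasi-log-likelihood; EMPIRICAL requests sandwich standard errors */
proc nlmixed data=sim empirical;
   parms b0=0 b1=0;                              /* starting values */
   eta = b0 + b1*x1;                             /* linear index x*b */
   mu  = exp(-exp(-eta));                        /* log-log mean G(x*b) */
   ll  = y*log(mu) + (1-y)*log(1-mu);            /* Bernoulli log-likelihood, y fractional */
   model y ~ general(ll);
run;
```

The estimates should be in the neighborhood of the simulated 0.5 and 0.8; once that works, the same PARMS, ETA, MU and LL lines can be pointed at the real data set and variables.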

Posted in reply to Dale

01-19-2011 04:57 PM

Hello Dale,

First of all, thank you for your answer!

You are right about your assumptions: I meant G(xb) and y*log[G(xb)].

However, could you write the code for SAS/ETS? Admittedly, I am not the world's best SAS programmer, and I do not know how to get the program to use quasi-maximum likelihood estimators, etc.

You would save my life

Kind regards,

Lasse

Posted in reply to deleted_user

01-19-2011 07:29 PM

Lasse,

I should note that computing quasi-likelihood estimates requires a function that specifies both the mean and the variance of the response. So far, you have provided a mean function and a likelihood function, which allows maximum likelihood estimation of the parameters.

If you want to employ a quasi likelihood estimator (to handle overdispersion?), then you need to also specify the variance function. With a variance function, you could probably obtain estimates via quasi likelihood using the GENMOD procedure.

I am not really experienced with the ETS module in SAS. My experience is almost entirely with SAS/STAT. The NLMIXED code which I previously provided will produce ML estimates according to the log-likelihood function you provided. (Given that the NLMIXED code is really quite simple, I don't think you should have any trouble implementing that code if you are willing to employ ML instead of quasi-likelihood.) I would note that you can add overdispersion into the model estimated via the NLMIXED procedure by adding a random intercept term to the linear predictor. So, there are options for dealing with overdispersion even if you employ ML estimation.
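The random-intercept idea could look roughly like the following sketch. Assumptions: the data set RECOV, the observation index OBS_ID, the variance parameter S2U and its starting value are hypothetical, and an observation-level random effect is only one of several ways to add extra-binomial variation; the fit may need care with starting values, since large values of the random effect push the mean toward 0 or 1.

```sas
/* Hypothetical observation index so that each record carries its own random effect */
data recov;
   set SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
   obs_id = _n_;
run;

proc nlmixed data=recov;
   parms b0=0 b1=0 s2u=0.5;                    /* hypothetical starting values */
   bounds s2u > 0;                             /* keep the random-effect variance positive */
   eta = b0 + b1*EAD_gesamt + u;               /* linear predictor plus random intercept u */
   mu  = exp(-exp(-eta));                      /* log-log mean */
   ll  = NEQ_UNBES_ABGEZ*log(mu) + (1-NEQ_UNBES_ABGEZ)*log(1-mu);
   model NEQ_UNBES_ABGEZ ~ general(ll);
   random u ~ normal(0, s2u) subject=obs_id;   /* one random intercept per observation */
run;
```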

In order to provide any more assistance, I would need to know the variance function - in which case I could probably help construct PROC GENMOD code which would return quasi-likelihood estimates OR I can help you construct code which incorporates a random effect to account for overdispersion employing some variation of the NLMIXED code previously specified. But I can't really help you with code employing the ETS module.
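If the variance function is taken to be the Bernoulli form mu*(1-mu), which is the one implied by the Papke & Wooldridge quasi-likelihood, a quasi-likelihood fit could be sketched with PROC GLIMMIX in place of the GENMOD procedure mentioned above, using a user-defined _VARIANCE_ assignment. This is an assumption-laden sketch: the variance choice, LINK=LOGLOG (intended as the link whose inverse is exp(-exp(-eta))), the extra scale parameter from RANDOM _RESIDUAL_ and the EMPIRICAL sandwich option should all be checked against the GLIMMIX documentation for your release.

```sas
proc glimmix data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_ empirical;
   _variance_ = _mu_*(1-_mu_);                 /* assumed variance function V(mu) = mu(1-mu) */
   model NEQ_UNBES_ABGEZ = EAD_gesamt / link=loglog solution;
   random _residual_;                          /* multiplicative scale (dispersion) parameter */
run;
```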

Posted in reply to Dale

01-20-2011 07:32 AM

Just as an explanation, I want to measure the effect of the loan size (EAD_gesamt) on the recovery rate (NEQ_UNBES_ABGEZ).

Setting aside the quasi-maximum likelihood issue and using FIML (even though FIML assumes normality?!), I have attempted to reproduce your code with my dependent variable, NEQ_UNBES_ABGEZ, and explanatory variable, EAD_gesamt:

```sas
proc model data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
   PARMS G b0 b1;
   y = b0 + b1*EAD_gesamt;
   G = exp(-exp(-y));
   NEQ_UNBES_ABGEZ = y*log(G)+(1-y)*log(1-G);
   FIT NEQ_UNBES_ABGEZ / FIML;
run;
```

Does that code look about right? I noticed that a "-" was missing in your definition of the log-log function; is that correct?

Furthermore, I get the following warning:

"WARNING: The 'parameter' G is assigned a value by the model program. This parameter will not be estimated."

Also, the results look a bit odd; I probably made several mistakes. Help is needed!

Thanks for all help!

Kind regards,

Lasse

Posted in reply to deleted_user

01-20-2011 01:27 PM

Lasse,

I did miss the minus sign in front of my eta term and should have specified

mu = exp(-exp(-eta));

I don't think that the PROC MODEL code should specify G as a parameter of the model. G is a function of the parameters b0 and b1. The WARNING message returned by the PROC MODEL procedure is indicating this same thing - that G is not a parameter.

I see, too, that you are assigning your response variable NEQ_UNBES_ABGEZ the value y*log(G)+(1-y)*log(1-G). This is undoubtedly wrong. The left hand side of

NEQ_UNBES_ABGEZ = y*log(G)+(1-y)*log(1-G);

should not be your response variable. Rather, the left-hand side should be an equation name, and the response variable should enter on the right-hand side in place of y. Without knowing much about PROC MODEL syntax, I would guess that your code should be something like:

MY_EQN = NEQ_UNBES_ABGEZ*log(G)+(1-NEQ_UNBES_ABGEZ)*log(1-G);

followed by a fit statement which names MY_EQN for optimization via FIML. So, I would suggest the following code:

```sas
proc model data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
   PARMS b0 b1;
   y = b0 + b1*EAD_gesamt;
   G = exp(-exp(-y));
   MY_EQN = NEQ_UNBES_ABGEZ*log(G)+(1-NEQ_UNBES_ABGEZ)*log(1-G);
   FIT MY_EQN / FIML;
run;
```

As I indicated previously, I am much more familiar with the NLMIXED procedure than I am with the MODEL procedure. I would suggest that you compare results from the above PROC MODEL code with the following NLMIXED code:

```sas
proc nlmixed data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
   PARMS b0=0 b1=0;
   eta = b0 + b1*EAD_gesamt;
   mu = exp(-exp(-eta));
   ll = NEQ_UNBES_ABGEZ*log(mu)+(1-NEQ_UNBES_ABGEZ)*log(1-mu);
   model NEQ_UNBES_ABGEZ ~ general(ll);
run;
```

If the PROC MODEL code is constructed properly, results should be quite similar for the two procedures.

Posted in reply to Dale

01-20-2011 04:31 PM

Hello Dale,

thanks for your great help! You are the best!

While your code works fine, I just cannot get the statement to work under SAS/ETS.

I have spent the last two hours trying to get it to work with the ERRORMODEL statement and failed miserably (serious headache).

Would it be possible for you to add a normality test (Jarque-Bera?), White's heteroscedasticity test, and an autocorrelation test to your model, ideally with residual plots? I could do this under SAS/ETS myself, but I cannot solve the problem with the ERRORMODEL statement.

Kind regards and thanks for your great assistance,

Lasse

Posted in reply to deleted_user

01-21-2011 10:26 AM

You may be trying to get MODEL to do something it is not designed for. To start with, the code you are trying incorrectly tells the procedure that the log-likelihood *is* the response variable. MODEL is a wonderful procedure, but it primarily deals with normal error distributions. The FIML method is explicitly for multivariate normal distributions, and I don't think you can change that. With other estimation methods, one can use the ERRORMODEL statement to choose other continuous distributions (Cauchy, etc.), although I have not done this myself. You want to use a binomial or Bernoulli distribution, which is (of course) discrete. There are options for defining your own error distribution with the ERRORMODEL statement, and you will have to read some of the documentation to learn about this (there are examples in the User's Guide). Caution: given that MODEL is primarily for data with continuous distributions, I don't know what happens even if you get the program to run with your own objective function for a discrete variable.

The NLMIXED procedure would definitely be my first choice to deal with your problem (with the code given by another poster).

01-21-2011 10:45 AM

Thank you for the response, lvm! You are definitely right on all counts.

I attempted to specify a different distribution with the ERRORMODEL statement. I did manage to write the code for the ERRORMODEL attempt, but the answers I got were somewhat spurious.

Consequently, I am sticking with the NLMIXED method proposed by Dale. I just want to do a rudimentary analysis of the results. For that I need the distribution of the residuals and heteroscedasticity and autocorrelation tests. I will try to integrate those into the NLMIXED code myself, but, as always, help would be highly appreciated.
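A rudimentary residual check along those lines could be sketched as follows. Assumptions: the output data set names QMLE_PRED and QMLE_RESID and the two residual definitions are hypothetical choices; the EMPIRICAL option requests sandwich standard errors, which already allow for heteroscedasticity; and the NORMAL option of PROC UNIVARIATE reports tests such as Shapiro-Wilk, Kolmogorov-Smirnov, Cramér-von Mises and Anderson-Darling rather than Jarque-Bera. For a fractional response, non-normal and heteroscedastic residuals are expected and do not by themselves invalidate the quasi-likelihood estimates. An autocorrelation test would also need a time or ordering variable, which the thread does not describe, so it is left out.

```sas
/* Refit with sandwich standard errors and save the fitted means G(xb) */
proc nlmixed data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_ empirical;
   parms b0=0 b1=0;
   eta = b0 + b1*EAD_gesamt;
   mu  = exp(-exp(-eta));
   ll  = NEQ_UNBES_ABGEZ*log(mu) + (1-NEQ_UNBES_ABGEZ)*log(1-mu);
   model NEQ_UNBES_ABGEZ ~ general(ll);
   predict mu out=qmle_pred;                   /* OUT= data set gains the variable Pred */
run;

/* Raw residuals and Pearson residuals scaled by the Bernoulli variance Pred*(1-Pred) */
data qmle_resid;
   set qmle_pred;
   resid   = NEQ_UNBES_ABGEZ - Pred;
   pearson = resid / sqrt(Pred*(1-Pred));
run;

/* Distribution of the residuals: normality tests plus a histogram with a normal overlay */
proc univariate data=qmle_resid normal;
   var resid;
   histogram resid / normal;
run;

/* Visual heteroscedasticity check: Pearson residuals against loan size */
proc sgplot data=qmle_resid;
   scatter x=EAD_gesamt y=pearson;
   refline 0 / axis=y;
run;
```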

01-21-2011 01:53 PM

Following the comments by lvm, I looked at the ERRORMODEL statement. It appears that PROC MODEL code that is consistent with PROC NLMIXED code would be something like:

```sas
proc model data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
   PARMS b0 b1;
   y = b0 + b1*EAD_gesamt;
   G = exp(-exp(-y));
   neg_ll = -(NEQ_UNBES_ABGEZ*log(G)+(1-NEQ_UNBES_ABGEZ)*log(1-G));
   ERRORMODEL binom ~ General(neg_ll);
run;
```

Now, how you can integrate estimation of an autocorrelation in the above code, I don't know and don't have time to investigate. However, I can suggest use of the GLIMMIX procedure as an alternative approach which can estimate both likelihood and quasi-likelihood models. For the quasi-likelihood model, you could incorporate some autocorrelation structure in the estimation model.

GLIMMIX code to estimate the likelihood model specified above would be

```sas
proc glimmix data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
   model NEQ_UNBES_ABGEZ = EAD_gesamt / dist=binary link=loglog;
run;
```

We can modify the above GLIMMIX code to estimate a model with an AR(1) covariance structure for the residuals by adding a random statement as shown below:

```sas
proc glimmix data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
   model NEQ_UNBES_ABGEZ = EAD_gesamt / dist=binary link=loglog;
   random _residual_ / subject=intercept type=ar(1);
run;
```

Note that with the addition of the "random _residual_" statement, a quasi-likelihood estimation procedure is invoked.

Rather than an AR(1) covariance structure, you might want to estimate a model with a first-order autoregressive moving-average covariance structure. That can be done by replacing the TYPE=AR(1) specification with TYPE=ARMA(1,1). There are many other covariance structures which can be estimated with the GLIMMIX procedure. Most of them will probably not be of interest to you.

I would note that the above code assumes that all observations belong to a single time series. If you have multiple time series (for instance, you have collected time series values for multiple companies or multiple countries), then you must use a different SUBJECT= specification. Again, a better description of the statistical problem would enable a more authoritative response.
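For reference on the LINK= choice in GLIMMIX: the log-log mean function used throughout this thread and the complementary log-log (CLOGLOG) link are mirror images of each other, so they are not interchangeable. As far as I know, GLIMMIX lists both a CLOGLOG and a LOGLOG link, and the inverse of LOGLOG is the $G$ used in the NLMIXED code; this is worth confirming in the link-function table of the documentation.

$$\text{log-log: } \mu = G(\eta) = \exp(-\exp(-\eta)), \qquad \eta = -\log(-\log \mu)$$
$$\text{complementary log-log: } \mu = 1 - \exp(-\exp(\eta)), \qquad \eta = \log(-\log(1-\mu))$$
$$E[y \mid x] = 1 - \exp(-\exp(x\beta)) \iff E[1-y \mid x] = \exp(-\exp(-x(-\beta))) = G(-x\beta)$$

So a complementary log-log fit of $y$ with coefficients $\beta$ is the log-log model for $1-y$ with coefficients $-\beta$, and its estimates will generally not match those from the log-log NLMIXED fit of $y$.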
