deleted_user
Not applicable
Hello guys,

I have a problem integrating a regression function inside SAS.

Essentially, I want to run a regression with the recovery rate of loans as the dependent variable and the size of the loan as the explanatory variable. However, this runs into serious problems: a non-normal distribution, heteroscedasticity, etc. I tried log transformations of the explanatory variable and so on, but that does not help.

The academic literature suggests a solution to this problem based on a model developed by Papke & Wooldridge (1996). I have to implement their regression model in SAS or Enterprise Guide.

They define the log-log function as E(y|x) = G(xß), where G(xß) = exp(-exp(-xß)), which is pretty standard.

Following Papke and Wooldridge, the nonlinear estimation procedure maximizes a Bernoulli log-likelihood function:

l (b) = y[logG*(xb)] + (1-y)*log[1-G*(xb)]

They further postulate that their quasi maximum likelihood estimators are consistent and asymptotically normal.

Does anybody have an idea what code I need to write, or whether Enterprise Guide can fit this function with a little extra programming?
9 REPLIES
Dale
Pyrite | Level 9
What is G*(xb) in the log-likelihood function? Assuming that G*(xb) should really be G(xb) and that y[logG*(xb)] should be y*log[G(xb)], the problem can be handled quite easily using the NLIN or NLMIXED procedures in SAS/STAT and probably using the MODEL procedure in SAS/ETS. Since I am most familiar with the syntax for the NLMIXED procedure, I will provide code for that.

proc nlmixed data=mydata;
  eta = b0 + b1*x1 + b2*x2 + ... ;
  mu = exp(-exp(eta));
  ll = y*log(mu) + (1-y)*log(1-mu);
model y ~ general(ll);
run;

If my assumptions about G*(xb) and y[logG*(xb)] are wrong, then please clarify the problem.
deleted_user
Not applicable
Hello Dale,

First of all, thank you for your answer!

You are right about your assumptions, I meant G(xb) and y*logG(xb).
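
For reference, the model with those corrections applied can be written out as follows. This only restates the formulas from this thread in cleaner notation; the observation index i is added here for readability.

```latex
\[
  E(y \mid x) = G(x\beta), \qquad G(x\beta) = \exp\{-\exp(-x\beta)\}
\]
\[
  \ell_i(b) = y_i \log G(x_i b) + (1 - y_i) \log\bigl[1 - G(x_i b)\bigr]
\]
```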

However, can you generate the code for SAS/ETS? I am not the world's best SAS-programmer admittedly, and I do not know how to get the program to utilize quasi maximum likelihood estimators, etc...

You would save my life 😉

Kind regards,

Lasse
Dale
Pyrite | Level 9
Lasse,

I should note that in order to compute quasi likelihood estimates, a function that specifies the mean and variance of the response is employed. To this point in time, you have provided a mean function and a likelihood function. That allows obtaining a maximum likelihood estimate of the parameters.

If you want to employ a quasi likelihood estimator (to handle overdispersion?), then you need to also specify the variance function. With a variance function, you could probably obtain estimates via quasi likelihood using the GENMOD procedure.
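
As a rough sketch of what a quasi-likelihood fit could look like once a variance function is chosen: the code below assumes the Bernoulli-type variance mu*(1-mu) and uses PROC GLIMMIX rather than GENMOD, because GLIMMIX accepts a user-written variance function through the `_variance_` symbol. The names mydata, y, and x1 are placeholders, and the variance function is an assumption, not something stated in this thread.

```sas
/* Sketch only: mydata, y, and x1 are placeholder names. */
proc glimmix data=mydata;
  /* Assumed variance function: Var(y|x) proportional to mu*(1-mu) */
  _variance_ = _mu_ * (1 - _mu_);
  /* No DIST= option is given; the mean and variance functions define the fit. */
  /* LINK=LOGLOG has inverse link mu = exp(-exp(-eta)). */
  model y = x1 / link=loglog solution;
  /* Residual scale parameter, which absorbs overdispersion */
  random _residual_;
run;
```

If a different variance function is more plausible for the data, only the `_variance_` line would need to change.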

I am not really experienced with the ETS module in SAS. My experience is almost entirely with SAS/STAT. The NLMIXED code which I previously provided will produce ML estimates according to the log-likelihood function you provided. (Given that the NLMIXED code is really quite simple, I don't think you should have any trouble implementing that code if you are willing to employ ML instead of quasi-likelihood.) I would note that you can add overdispersion into the model estimated via the NLMIXED procedure by adding a random intercept term to the linear predictor. So, there are options for dealing with overdispersion even if you employ ML estimation.
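
A minimal sketch of the random-intercept idea, under stated assumptions: mydata, y, and x1 are placeholders, loan_id is a hypothetical identifier variable, and s2u is a made-up name for the variance of the random intercept. The mean function uses the log-log form exp(-exp(-eta)) from the original question.

```sas
/* Sketch only: mydata, y, x1, loan_id, and s2u are hypothetical names. */
/* NLMIXED expects the data to be grouped by the SUBJECT= variable. */
proc sort data=mydata;
  by loan_id;
run;

proc nlmixed data=mydata;
  parms b0=0 b1=0 s2u=1;
  /* Random intercept u enters the linear predictor */
  eta = b0 + b1*x1 + u;
  mu  = exp(-exp(-eta));
  ll  = y*log(mu) + (1-y)*log(1-mu);
  model y ~ general(ll);
  /* u is normal with mean 0 and variance s2u, one draw per loan_id */
  random u ~ normal(0, s2u) subject=loan_id;
run;
```

An estimate of s2u close to zero would suggest that the random intercept adds little beyond the plain ML fit.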

In order to provide any more assistance, I would need to know the variance function - in which case I could probably help construct PROC GENMOD code which would return quasi-likelihood estimates OR I can help you construct code which incorporates a random effect to account for overdispersion employing some variation of the NLMIXED code previously specified. But I can't really help you with code employing the ETS module.
deleted_user
Not applicable
Just as an explanation, I want to measure the effect of the loan size (EAD_gesamt) on the recovery rate (NEQ_UNBES_ABGEZ).

Assuming we neglect the quasi-maximum-likelihood issue and use FIML (even though it assumes normality?!), I have attempted to reproduce your code using my dependent variable, NEQ_UNBES_ABGEZ, and explanatory variable, EAD_gesamt:

proc model data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;

PARMS G b0 b1;

y = b0 + b1*EAD_gesamt;

G = exp(-exp(-y));

NEQ_UNBES_ABGEZ = y*log(G)+(1-y)*log(1-G);

FIT NEQ_UNBES_ABGEZ / FIML;

run;

Does that code seem about right? I noticed that a "-" was missing in your definition of log-log function, is that correct?

Furthermore, I get the following warning:

"WARNING: The 'parameter' G is assigned a value by the model program. This parameter will not be estimated."

Also, the results seem a bit funky, I probably made several mistakes.... Help is needed!

Thanks for all help!

Kind regards,

Lasse
Dale
Pyrite | Level 9
Lasse,

I did miss the minus sign in front of my eta term and should have specified

  mu = exp(-exp(-eta));

I don't think that the PROC MODEL code should specify G as a parameter of the model. G is a function of the parameters b0 and b1. The WARNING message returned by the PROC MODEL procedure is indicating this same thing - that G is not a parameter.

I see, too, that you are assigning your response variable NEQ_UNBES_ABGEZ the value y*log(G)+(1-y)*log(1-G). This is undoubtedly wrong. The left hand side of

  NEQ_UNBES_ABGEZ = y*log(G)+(1-y)*log(1-G);

should not be your response variable. Rather, the left hand side should be an equation name, and the response variable should enter on the right hand side in place of y. Without knowing much about PROC MODEL syntax, I would guess that your code should be something like:

  MY_EQN = NEQ_UNBES_ABGEZ*log(G)+(1-NEQ_UNBES_ABGEZ)*log(1-G);

followed by a fit statement which names MY_EQN for optimization via FIML. So, I would suggest the following code:

proc model data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
  PARMS b0 b1;
  y = b0 + b1*EAD_gesamt;
  G = exp(-exp(-y));
  MY_EQN = NEQ_UNBES_ABGEZ*log(G)+(1-NEQ_UNBES_ABGEZ)*log(1-G);
  FIT MY_EQN / FIML;
run;


As I indicated previously, I am much more familiar with the NLMIXED procedure than I am with the MODEL procedure. I would suggest that you compare results from the above PROC MODEL code with the following NLMIXED code:

proc nlmixed data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
  PARMS b0 b1 0;
  eta = b0 + b1*EAD_gesamt;
  mu = exp(-exp(-eta));
  ll = NEQ_UNBES_ABGEZ*log(mu)+(1-NEQ_UNBES_ABGEZ)*log(1-mu);
  model NEQ_UNBES_ABGEZ ~ general(ll);
run;


If the PROC MODEL code is constructed properly, results should be quite similar for the two procedures.
deleted_user
Not applicable
Hello Dale,

thanks for your great help! You are the best!

While your NLMIXED code works fine, I just cannot get the statement to work under SAS/ETS.

I tried to get it to work with the ERRORMODEL statement, but I have failed miserably for the last 2 hours... (Hardcore headache)

Could you add a normality test (Jarque-Bera?), White's heteroscedasticity test, and an autocorrelation test to your model, if possible with plots of the residuals? I could do it under SAS/ETS myself, but I cannot solve the problem with the ERRORMODEL statement :(...

Kind regards and thanks for your great assistance,

Lasse
lvm
Rhodochrosite | Level 12
You may be trying to get MODEL to do something that it is not designed for. To start with, the code you are trying is incorrectly telling the procedure that the log-likelihood is the response variable. MODEL is a wonderful procedure, but it primarily deals with normal error distributions. The FIML method is explicitly for multivariate normal distributions, and I don't think you can change that. With other estimation methods, one can use the ERRORMODEL statement to choose other distributions that are continuous (Cauchy, etc.), although I have not done this myself. You are wanting to use a binomial or Bernoulli distribution, which is discrete (of course). There are options for defining your own error distribution with the ERRORMODEL statement, and you will have to read some of the documentation to learn about this (there are examples in the User's Guide). Caution: given that MODEL is primarily for data with continuous distributions, I don't know what happens even if you get the program to run with your own objective function for a discrete variable.

The NLMIXED procedure would definitely be my first choice to deal with your problem (with the code given by another poster).
deleted_user
Not applicable
Thank you for the response, lvm! You are definitely right on all counts.

I attempted to specify a different distribution with the ERRORMODEL statement. I actually succeeded in writing the code for that attempt; however, the results I got looked spurious.

Consequently, I am sticking to the NLMIXED method proposed by Dale. I just want to do a rudimentary analysis of the results. For that I need the distribution of the residuals, plus heteroscedasticity and autocorrelation tests. I will attempt to integrate those into the NLMIXED code myself, yet help would, as always, be highly appreciated :)...
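
One possible starting point for that residual analysis, as a sketch rather than a tested solution: the PREDICT statement in NLMIXED writes the fitted mean to a data set, and residuals can be built from it in a DATA step. The data set and model variables follow this thread; pred, resid, raw_resid, and pearson are made-up names, and the Pearson scaling assumes the Bernoulli-type variance mu*(1-mu). The NORMAL option of PROC UNIVARIATE reports Shapiro-Wilk and EDF-based tests, not Jarque-Bera, and White's test and autocorrelation tests are not covered here.

```sas
/* Sketch only: pred, resid, raw_resid, and pearson are hypothetical names. */
proc nlmixed data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
  parms b0=0 b1=0;
  eta = b0 + b1*EAD_gesamt;
  mu  = exp(-exp(-eta));
  ll  = NEQ_UNBES_ABGEZ*log(mu) + (1-NEQ_UNBES_ABGEZ)*log(1-mu);
  model NEQ_UNBES_ABGEZ ~ general(ll);
  /* Write the fitted mean for every observation to PRED (variable Pred) */
  predict mu out=pred;
run;

data resid;
  set pred;
  raw_resid = NEQ_UNBES_ABGEZ - Pred;
  /* Pearson residual under the assumed variance mu*(1-mu) */
  if 0 < Pred < 1 then pearson = raw_resid / sqrt(Pred*(1-Pred));
run;

/* Normality tests and a histogram of the residuals */
proc univariate data=resid normal;
  var pearson;
  histogram pearson / normal;
run;

/* Residuals against loan size: a fan shape would hint at heteroscedasticity */
proc sgplot data=resid;
  scatter x=EAD_gesamt y=pearson;
  refline 0 / axis=y;
run;
```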
Dale
Pyrite | Level 9
Following the comments by lvm, I looked at the ERRORMODEL statement. It appears that PROC MODEL code that is consistent with PROC NLMIXED code would be something like:

proc model data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
  PARMS b0 b1;
  y = b0 + b1*EAD_gesamt;
  G = exp(-exp(-y));
  neg_ll = -(NEQ_UNBES_ABGEZ*log(G)+(1-NEQ_UNBES_ABGEZ)*log(1-G));
  ERRORMODEL binom ~ General(neg_ll);
run;


Now, how you can integrate estimation of an autocorrelation in the above code, I don't know and don't have time to investigate. However, I can suggest use of the GLIMMIX procedure as an alternative approach which can estimate both likelihood and quasi-likelihood models. For the quasi-likelihood model, you could incorporate some autocorrelation structure in the estimation model.

GLIMMIX code to estimate the likelihood model specified above would be

proc glimmix data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
  /* link=loglog has inverse link exp(-exp(-eta)), matching G above */
  model NEQ_UNBES_ABGEZ = EAD_gesamt / dist=binary link=loglog;
run;

We can modify the above GLIMMIX code to estimate a model with an AR(1) covariance structure for the residuals by adding a random statement as shown below:

proc glimmix data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
  /* link=loglog has inverse link exp(-exp(-eta)), matching G above */
  model NEQ_UNBES_ABGEZ = EAD_gesamt / dist=binary link=loglog;
  random _residual_ / subject=intercept type=ar(1);
run;

Note that with the addition of the "random _residual_" statement, a quasi-likelihood estimation procedure is invoked.

Rather than an AR(1) covariance structure, you might want to estimate a model with a first-order autoregressive moving-average covariance structure. That can be done by replacing the TYPE=AR(1) specification with TYPE=ARMA(1,1). There are many other covariance structures which can be estimated with the GLIMMIX procedure. Most of them will probably not be of interest to you.

I would note that the above code assumes that all observations belong to a single time series. If you have multiple time series (for instance, you have collected time series values for multiple companies or multiple countries), then you must use a different SUBJECT= specification. Again, a better description of the statistical problem would enable a more authoritative response.
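
For the multiple-series case, a sketch of how the SUBJECT= specification might change. Everything beyond the data set and model variables is an assumption: company_id is a hypothetical variable identifying each series, the observations are assumed to be in time order within each company, and the variance function mu*(1-mu) is supplied explicitly in place of a DIST= option so that a fractional response can be modeled by quasi-likelihood.

```sas
/* Sketch only: company_id is a hypothetical series identifier. */
proc glimmix data=SASUSER.PREDLINREGPREDICTIONSFILTER_FOR_;
  class company_id;
  /* Assumed variance function: Var(y|x) proportional to mu*(1-mu) */
  _variance_ = _mu_ * (1 - _mu_);
  /* LINK=LOGLOG has inverse link mu = exp(-exp(-eta)) */
  model NEQ_UNBES_ABGEZ = EAD_gesamt / link=loglog solution;
  /* AR(1) residual correlation within each company, independent across companies */
  random _residual_ / subject=company_id type=ar(1);
run;
```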
