Re: PROC GENMOD Custom Link Functions

Xeonzinc · Posted 08-11-2021 02:19 AM

Hi,

I'm trying to use custom link functions in PROC GENMOD (using the FWDLINK and INVLINK statements). Simple functions work: for example, the emulated logistic link shown below runs fine and gives effectively the same result as the built-in function:

Custom Function:

proc genmod data = TMP00001.upl_dat_dedup_rd (obs=1000000) ;
class m e adj_v (ref = FIRST)/param = glm;
fwdlink link = log(_MEAN_/(1-_MEAN_));
invlink ilink = 1/(1+exp(-_XBETA_));
model def (descending) = m e adj_v/ dist = binomial type3;
run;

Built-in Function:

proc genmod data = TMP00001.dat_dedup_rd (obs=1000000) ;
class m e adj_v (ref = FIRST)/param = glm;
model def (descending) = m e adj_v/ dist = binomial link=logit type3;
run;

However, when I try to do much else I keep getting various errors, mainly the ones shown below, with results that are clearly wrong:

WARNING: The specified model did not converge.

ERROR: Error in computing deviance function.

When running:

proc genmod data = TMP00001.dat_dedup_rd (obs=1000000) ;
mn = _MEAN_;
xb = _XBETA_;
ilnk = exp(xb);
if ilnk > 1 then ilnk = 1;
if mn > 1 then mn = 1;
lnk = log(mn);

class m e adj_v (ref = FIRST)/param = glm;
fwdlink link = lnk;
invlink ilink = ilnk;
model def (descending) = m e adj_v/ dist = binomial type3;
run;

I suspect there may be some statistical/technological restrictions which cause this, but without understanding them it's proving tricky to work within those rules and get things running. I have tried setting custom variance and deviance functions (variance / deviance statements) which should match the custom link function, but haven't had any success. Can anyone offer any support with defining custom link functions for GENMOD and the rules they must follow?

pmbrown · Posted 08-11-2021 03:13 AM

The first thing I'd try (given the failed-to-converge warning) is to increase the iterations on the MODEL statement: maxiter=100. Then I'd try another procedure, e.g. NLMIXED (without the RANDOM statement), and specify the likelihood. Clutching at straws, though.
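As a sketch of where that option goes, here is the working custom logistic link from the first post with MAXITER=100 added on the MODEL statement. The data set and variable names are copied from the original post; a higher iteration limit only gives the fit more attempts and will not repair an ill-defined link.

```sas
/* Sketch only: raises the iteration limit from the default to 100. */
proc genmod data = TMP00001.dat_dedup_rd (obs=1000000);
  class m e adj_v (ref = FIRST) / param = glm;
  fwdlink link = log(_MEAN_/(1-_MEAN_));   /* forward link g(mu) */
  invlink ilink = 1/(1+exp(-_XBETA_));     /* inverse link g^-1(eta) */
  model def (descending) = m e adj_v / dist = binomial type3 maxiter = 100;
run;
```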

StatDave · Posted 08-11-2021 11:22 AM

It looks like you are trying to create a log-binomial model which can be done with the DIST=BIN and LINK=LOG options in PROC GENMOD, probably for the purpose of estimating relative risks. If you tried those options but ran into fitting errors, then see this note which discusses this model in detail along with dealing with fitting errors and estimating relative risks from the model.
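For reference, a minimal sketch of that built-in log-binomial fit, reusing the data set and variables from the original post (assumed unchanged). With LINK=LOG the exponentiated coefficients are interpreted as relative risks, and the fit can still fail when a fitted probability approaches 1.

```sas
/* Sketch only: log-binomial model using the built-in log link. */
proc genmod data = TMP00001.dat_dedup_rd (obs=1000000);
  class m e adj_v (ref = FIRST) / param = glm;
  model def (descending) = m e adj_v / dist = bin link = log type3;
run;
```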

Rick_SAS · Posted 08-11-2021 01:50 PM

Can you explain why you are using these statements?

if ilnk > 1 then ilnk = 1;
if mn > 1 then mn = 1;

The assumption for a link function is that it is monotonic and differentiable (so that it has a differentiable inverse as well). This definition looks neither monotonic (very important) nor differentiable (less important).
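A short worked note on why the capped definition runs into this, as a sketch. The symbols g, η (for _XBETA_) and μ (for _MEAN_) are notation introduced here, and the link to the deviance error is a plausible explanation rather than something confirmed in the thread.

```latex
% Inverse link implied by "ilnk = exp(xb); if ilnk > 1 then ilnk = 1;"
\[
g^{-1}(\eta) = \min\!\left(e^{\eta},\, 1\right),
\qquad
\frac{d\,g^{-1}}{d\eta} =
\begin{cases}
e^{\eta}, & \eta < 0,\\
0, & \eta > 0.
\end{cases}
\]
% Every \eta \ge 0 maps to \mu = 1, so the function is flat there
% (not strictly monotonic) and has a kink at \eta = 0.

% Binomial unit deviance for a binary response y:
\[
d(y,\mu) = 2\left[\, y \log\frac{y}{\mu} + (1-y)\log\frac{1-y}{1-\mu} \right]
\]
% For y = 0 and \mu = 1 the second term contains \log(1/0), which is not finite.
```

So once any fitted value reaches the cap, the gradient with respect to the parameters vanishes for that observation and the deviance for a non-event is undefined.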

Xeonzinc · Posted 08-12-2021 04:00 AM

Thanks, I think those were the rules I was searching for (and half suspected along with some others).

My aim is to change the shape of the logistic s-curve substantially. I am in a situation where most modelling volume occurs in the 'low probability' region of ~<=10%, where the relative impact of predictors is a constant ~2.7x for every '1' movement in logistic space (illustrated below).
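The ~2.7x figure can be checked with a short derivation (a sketch; η denotes the linear predictor and p the modelled probability, notation introduced here):

```latex
\[
p(\eta) = \frac{e^{\eta}}{1+e^{\eta}},
\qquad
\frac{p(\eta+1)}{p(\eta)}
= \frac{e\,(1+e^{\eta})}{1+e^{\eta+1}}
= \frac{e}{1+(e-1)\,p(\eta)}
\]
% p -> 0    : ratio -> e, about 2.72
% p = 0.01  : ratio is about 2.67
% p = 0.10  : ratio is about 2.32
% p = 0.50  : ratio is about 1.46
```

So the multiplier is close to e only for very small probabilities and already shrinks noticeably by 10%.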

My concerns are twofold:

1) That constant 2.7x relative relationship may not be appropriate for my data. I can't see any reason it should be constant across my data range, and I would like to vary it.

2) Using models built on the <10% probability space to predict higher probabilities seems inappropriate (ignoring the extra issues of using models outside of observed data), as the modelled relationships entirely change how they are applied above this point (hence my attempt to use the exp function capped at 1, which would at least have a constant 2.7x relationship across the full range of probabilities.)

Given the rules you have suggested I think I can iteratively modify the logit curve to determine if shape adjustments lead to improved fitting.

Rick_SAS · Posted 08-12-2021 05:44 AM

I have never tried to do what you are attempting, but perhaps an alternative is to enhance the model rather than change the link function. If you know that the response is different when X is small as compared to when X is large, then either

1) Construct a discrete indicator variable C=('Small', 'Med', 'Large') and use a model such as
   MODEL Y = C C*X X
2) Use a spline effect to capture the nonlinear relationship between X and Y.
3) Use a segmented model, where the model is different for different portions of the domain.

Examples of these models are in "Piecewise regression models and spline effects" and "Segmented regression models in SAS".
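The first two options could be sketched roughly as follows. The data set WORK.HAVE and the variables Y (binary response), X (continuous) and C (band) are hypothetical placeholders. The spline sketch uses PROC LOGISTIC on the assumption that the EFFECT statement is available there rather than in PROC GENMOD; check the documentation for your SAS release.

```sas
/* Option 1 (sketch): separate intercept and slope for each band of C. */
proc genmod data = work.have;              /* WORK.HAVE is hypothetical */
  class c;
  model y (descending) = c c*x x / dist = binomial link = logit type3;
run;

/* Option 2 (sketch): spline effect in X, knots placed at percentiles. */
proc logistic data = work.have descending;
  effect spl = spline(x / knotmethod = percentiles(5));
  model y = spl;
run;
```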

Xeonzinc · Posted 08-12-2021 07:17 AM

Thanks. On 1), this is an idea that's been explored, but X in my case is driven by a variety of class and continuous variables, with their weights being derived during the regression. It is not possible to know the specific X for an observation prior to the regression, as the data are binary, so bands for C cannot be predetermined.

One idea being considered is to perform a 2 step regression, deriving X without C, then using that output to allocate C-bands for each observation, before re-running the regression with X (components) & C. However this is not ideal as every time C changes, so will X. Potentially repeating the process should converge on a stable solution for X and C eventually, but the hope with modifying the link function was that this could all be done in one step.

For 2), as X is not a single variable but a combination of variables derived in the regression, I don't believe it is possible to ask it to derive an overall spline applied on top of the individual variables while their weights are also being derived at the same time (a similar issue to 1)).

3) is very interesting. I've not come across the NLIN procedure; the example of probit binomial regression in the documentation suggests it might do the job. I notice the overview recommends other procedures for maximum likelihood estimation, but I assume that's just because they are easier to use for the majority of applications, whereas PROC NLIN is more complex but offers much more flexibility? I think this holds the most potential for achieving everything I am trying to do in a single-step procedure. I will see where I can get with it, thank you.
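As one possible starting point, following the NLMIXED suggestion earlier in the thread rather than NLIN, the inverse link can be written as an ordinary programming statement and swapped for another smooth, strictly increasing curve. This is a minimal sketch: WORK.HAVE, Y, X, the parameter names b0 and b1 and their starting values are all hypothetical, and class predictors would need to be coded as dummy variables beforehand.

```sas
/* Sketch only: binary regression with a user-written inverse link. */
proc nlmixed data = work.have;             /* WORK.HAVE is hypothetical */
  parms b0 = -3 b1 = 0;                    /* assumed starting values */
  eta = b0 + b1*x;                         /* linear predictor */
  p = 1 / (1 + exp(-eta));                 /* replace with the curve to be tested */
  model y ~ binary(p);                     /* Bernoulli likelihood */
run;
```

Whatever expression replaces the logistic curve should keep p strictly between 0 and 1, otherwise the log-likelihood cannot be evaluated.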
