Xeonzinc
Fluorite | Level 6

Hi,

I'm trying to use custom link functions in PROC GENMOD (via the FWDLINK and INVLINK statements). I can do simple functions, such as emulating the logistic link function as shown below, which works fine and gives effectively the same result as the built-in link:

Custom Function:

proc genmod data = TMP00001.upl_dat_dedup_rd (obs=1000000) ;
class m e adj_v (ref = FIRST)/param = glm;
fwdlink link = log(_MEAN_/(1-_MEAN_));
invlink ilink = 1/(1+exp(-_XBETA_));
model def (descending) = m e adj_v/ dist = binomial type3;
run;

Built-in Function:

proc genmod data = TMP00001.dat_dedup_rd (obs=1000000) ;
class m e adj_v (ref = FIRST)/param = glm;
model def (descending) = m e adj_v/ dist = binomial link=logit type3;
run;

However, when I try to do much else I keep getting errors, mainly the ones shown below, along with results that are clearly wrong:

    WARNING: The specified model did not converge.

    ERROR:  Error in computing deviance function.

When running:

proc genmod data = TMP00001.dat_dedup_rd (obs=1000000) ;
mn = _MEAN_;
xb = _XBETA_;
ilnk = exp(xb);
if ilnk > 1 then ilnk = 1;
if mn > 1 then mn = 1;
lnk = log(mn);

class m e adj_v (ref = FIRST)/param = glm;
fwdlink link = lnk;
invlink ilink = ilnk;
model def (descending) = m e adj_v/ dist = binomial type3;
run;

I suspect there may be some statistical or technical restrictions causing this, but without understanding them it's proving tricky to work within those rules and get things running. I have also tried setting custom variance and deviance functions (VARIANCE and DEVIANCE statements) to match the custom link function, but haven't had any success. Can anyone offer any help with defining custom link functions for GENMOD and the rules they must follow?

6 replies
pmbrown
Quartz | Level 8

The first thing I'd try (given the failed-to-converge warning) is to increase the number of iterations with the MAXITER= option on the MODEL statement, e.g. maxiter=100. Then I'd try another procedure, e.g. PROC NLMIXED (without a RANDOM statement), and specify the likelihood directly. Clutching at straws, though.
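pmbrown's second suggestion can be sketched as follows: PROC NLMIXED maximizes a log likelihood you write yourself, so the inverse link is just an expression you type in. This is a minimal sketch under assumptions: the data set SIM, the numeric predictor X and the starting values are hypothetical stand-ins, and because NLMIXED has no CLASS statement, class effects such as m, e and adj_v would first have to be coded as 0/1 dummy variables. Whatever link you swap in still needs to be monotonic and differentiable for the optimizer to behave.

```sas
/* Hypothetical stand-in data: binary DEF driven by one numeric X */
data sim;
   call streaminit(2024);
   do i = 1 to 20000;
      x   = rand('normal');
      p   = 1/(1 + exp(-(-3 + 0.8*x)));   /* true model uses a logit link */
      def = rand('bernoulli', p);
      output;
   end;
   drop i p;
run;

/* Maximum likelihood fit with the inverse link written out explicitly */
proc nlmixed data=sim;
   parms b0=-2 b1=0;                       /* starting values (assumed) */
   xb = b0 + b1*x;                         /* linear predictor */
   mu = 1/(1 + exp(-xb));                  /* inverse link: replace with a custom
                                              monotonic, differentiable function */
   ll = def*log(mu) + (1 - def)*log(1 - mu);   /* Bernoulli log likelihood */
   model def ~ general(ll);
   ods output ParameterEstimates=pe;
run;
```

The built-in `model def ~ binary(mu);` gives the same likelihood; GENERAL() is shown because it makes the likelihood explicit and is easy to modify.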

StatDave
SAS Super FREQ

It looks like you are trying to create a log-binomial model which can be done with the DIST=BIN and LINK=LOG options in PROC GENMOD, probably for the purpose of estimating relative risks. If you tried those options but ran into fitting errors, then see this note which discusses this model in detail along with dealing with fitting errors and estimating relative risks from the model.
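A minimal sketch of the log-binomial model StatDave describes, run on simulated data (the data set SIM2, the 0/1 predictor G and the true relative risk exp(0.5) are hypothetical). MAXITER= is raised as pmbrown suggests, and the EXP option on the ESTIMATE statement reports exp(β), which under a log link is the relative risk.

```sas
/* Hypothetical stand-in data generated from a log-binomial model */
data sim2;
   call streaminit(7);
   do i = 1 to 20000;
      g   = (rand('uniform') < 0.5);       /* 0/1 exposure indicator */
      p   = exp(-3 + 0.5*g);               /* log link: RR for g = exp(0.5) */
      def = rand('bernoulli', p);
      output;
   end;
   drop i p;
run;

/* Built-in log link with the binomial distribution */
proc genmod data=sim2;
   model def(descending) = g / dist=binomial link=log maxiter=100;
   estimate 'RR for g' g 1 / exp;          /* exp(beta) = relative risk */
   ods output ParameterEstimates=pe2;
run;
```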

Rick_SAS
SAS Super FREQ

Can you explain why you are using these statements?

if ilnk > 1 then ilnk = 1;
if mn > 1 then mn = 1;

The assumption for a link function is that it is monotonic and differentiable (so that it has a differentiable inverse as well). This definition looks neither monotonic (very important) nor differentiable (less important).
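Rick's point can be made concrete with a short worked equation. This is a sketch, assuming the usual binomial variance μ(1 − μ): GENMOD's iterations use dμ/dη, which requires a strictly monotonic, differentiable link g. The capped inverse link h(η) in the original post is non-decreasing but not strictly monotonic, has a kink at η = 0 (left derivative 1, right derivative 0), and once it reaches μ = 1 the variance vanishes and the deviance of any observation with y = 0 is infinite, which is plausibly where the "Error in computing deviance function" comes from.

```latex
\begin{aligned}
&\eta = g(\mu), \qquad \mu = g^{-1}(\eta), \qquad
 \frac{d\mu}{d\eta} = \frac{1}{g'(\mu)} \\[6pt]
&h(\eta) = \min\left(e^{\eta},\, 1\right) =
 \begin{cases} e^{\eta}, & \eta \le 0,\\ 1, & \eta > 0, \end{cases}
 \qquad \frac{dh}{d\eta} = 0 \ \text{for } \eta > 0 \\[6pt]
&\mu = 1:\quad \operatorname{Var}(Y) = \mu(1-\mu) = 0, \qquad
 \text{deviance contribution for } y = 0:\ -2\log(1-\mu) = \infty
\end{aligned}
```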

Xeonzinc
Fluorite | Level 6

Thanks, I think those were the rules I was searching for (and half suspected along with some others).

My aim is to change the shape of the logistic S-curve substantially. Most of my modelling volume falls in the low-probability region (roughly 10% or below), where the relative impact of the predictors is a roughly constant ~2.7x for every 1-unit movement on the logit scale (illustrated below).

[Image: logit_exp.png]
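The ~2.7x figure is e ≈ 2.718. Under the logit link, p(η+1)/p(η) = e·(1 + e^η)/(1 + e^(η+1)), which tends to e only as p → 0. The data step below is a sketch that tabulates this ratio (the data set name RATIO is arbitrary). By the formula the ratio is about 2.67 at p = 1% and about 2.32 at p = 10%, so the relationship is only approximately constant even inside the low-probability region.

```sas
/* Relative change in probability for a one-unit step on the logit scale.
   LOGISTIC(x) returns 1/(1+exp(-x)). */
data ratio;
   do eta = -6 to 2;
      p0    = logistic(eta);
      p1    = logistic(eta + 1);
      ratio = p1 / p0;        /* equals e*(1+exp(eta))/(1+exp(eta+1)) */
      output;
   end;
run;

proc print data=ratio noobs;
   format p0 p1 ratio 8.4;
run;
```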

My concerns are twofold:

1) The constant 2.7x relative relationship may not be appropriate for my data. I can't see any reason it should be constant across my data range and would like to be able to vary it.

2) Using models built on the <10% probability space to predict higher probabilities seems inappropriate (even ignoring the extra issues of extrapolating outside the observed data), because the modelled relationships change entirely above this point. Hence my attempt to use the exp function capped at 1, which would at least give a constant 2.7x relationship across the full range of probabilities.

Given the rules you have described, I think I can iteratively modify the logit curve to determine whether shape adjustments lead to a better fit.
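One way to change the shape of the S-curve while keeping the link strictly monotonic and differentiable is a one-parameter family that contains the logit as a special case. The sketch below uses the Aranda-Ordaz asymmetric family, μ = 1 − (1 + λe^η)^(−1/λ), whose inverse is η = log(((1 − μ)^(−λ) − 1)/λ). For λ > 0 it is strictly increasing, it reduces to the logit at λ = 1 and approaches the complementary log-log link as λ → 0. The choice of this family is an assumption for illustration, not something from the thread, and the data set SIM3 and macro variable LAMBDA are hypothetical. You would refit for several fixed λ values and compare the log likelihoods.

```sas
/* Hypothetical stand-in data */
data sim3;
   call streaminit(11);
   do i = 1 to 20000;
      x   = rand('normal');
      def = rand('bernoulli', logistic(-3 + 0.8*x));
      output;
   end;
   drop i;
run;

%let lambda = 1;   /* 1 reproduces the logit link; values toward 0
                      approach the complementary log-log link */

proc genmod data=sim3;
   /* forward link: eta = log(((1-mu)**(-lambda) - 1)/lambda) */
   fwdlink link  = log(((1 - _MEAN_)**(-&lambda) - 1) / &lambda);
   /* inverse link: mu = 1 - (1 + lambda*exp(eta))**(-1/lambda) */
   invlink ilink = 1 - (1 + &lambda*exp(_XBETA_))**(-1/&lambda);
   model def(descending) = x / dist=binomial;
   ods output ParameterEstimates=pe3;
run;
```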

Rick_SAS
SAS Super FREQ

I have never tried to do what you are attempting, but perhaps an alternative is to enhance the model rather than change the link function. If you know that the response is different when X is small as compared to when X is large, then either

  1. Construct a discrete indicator variable C=('Small', 'Med', 'Large') and use a model such as
    MODEL Y = C C*X X
  2. Use a spline effect to capture the nonlinear relationship between X and Y.
  3. Use a segmented model, where the model is different for different portions of the domain
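Options 1 and 2 can be sketched in PROC GENMOD as follows. This is a minimal illustration on simulated data: the data set SIM4, the single predictor X, the true nonlinear relationship and the band cut-offs for C are all hypothetical. The first model is the MODEL Y = C C*X X form above; the second uses an EFFECT statement to build a natural cubic spline in X.

```sas
/* Hypothetical stand-in data with a nonlinear effect of X */
data sim4;
   call streaminit(5);
   length c $5;
   do i = 1 to 20000;
      x   = 4*rand('uniform') - 2;
      def = rand('bernoulli', logistic(-3 + x + 0.5*(x > 0)*x));
      if x < -0.7 then c = 'Small';        /* cut-offs are hypothetical */
      else if x < 0.7 then c = 'Med';
      else c = 'Large';
      output;
   end;
   drop i;
run;

/* 1. Indicator variable C with band-specific slopes */
proc genmod data=sim4;
   class c;
   model def(descending) = c c*x x / dist=binomial link=logit;
run;

/* 2. Spline effect for X */
proc genmod data=sim4;
   effect spl = spline(x / naturalcubic knotmethod=percentiles(4));
   model def(descending) = spl / dist=binomial link=logit;
   ods output ModelFit=fit;
run;
```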

Examples of these models are in the following articles:
Piecewise regression models and spline effects

and

Segmented regression models in SAS

Xeonzinc
Fluorite | Level 6

Thanks. On 1), this is an idea that's been explored, but X in my case is driven by a variety of class and continuous variables whose weights are derived during the regression. Because the data are binary, the specific X for an observation cannot be known before the regression, so the bands for C cannot be predetermined.

One idea being considered is a two-step regression: derive X without C, use that output to allocate each observation to a C band, then rerun the regression with X (components) and C. However, this is not ideal, because every time C changes, so will X. Repeating the process may eventually converge on a stable solution for X and C, but the hope with modifying the link function was that this could all be done in one step.
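One pass of the two-step idea can be sketched with the XBETA= keyword on GENMOD's OUTPUT statement, which saves each observation's fitted linear predictor. This is a minimal sketch under assumptions: the data set SIM5, predictors X1 and X2 and the band cut-offs are hypothetical, and a full version would loop (for example in a macro) until the band assignments stop changing, which is not guaranteed to happen.

```sas
/* Hypothetical stand-in data: X is a combination of two predictors */
data sim5;
   call streaminit(9);
   do i = 1 to 20000;
      x1  = rand('normal');
      x2  = rand('normal');
      def = rand('bernoulli', logistic(-3 + 0.6*x1 + 0.4*x2));
      output;
   end;
   drop i;
run;

/* Step 1: fit without C and save the linear predictor XB */
proc genmod data=sim5;
   model def(descending) = x1 x2 / dist=binomial;
   output out=step1 xbeta=xb;
run;

/* Step 2: band the fitted linear predictor into C (cut-offs hypothetical) */
data step1;
   set step1;
   length c $5;
   if xb < -3.5 then c = 'Small';
   else if xb < -2.5 then c = 'Med';
   else c = 'Large';
run;

/* Step 3: refit with C and band-specific slopes; repeat steps 2-3 as needed */
proc genmod data=step1;
   class c;
   model def(descending) = c x1 x2 c*x1 c*x2 / dist=binomial;
   ods output ParameterEstimates=pe5;
run;
```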

For 2), as X is not a single variable but a combination of variables derived in the regression, I don't believe it is possible to ask it to fit an overall spline on top of the individual variables while their weights are being derived at the same time (a similar issue to 1).

3) is very interesting. I've not come across the NLIN procedure before; the example of probit binomial regression in the documentation suggests it might do the job. I notice the overview recommends other procedures for maximum likelihood estimation, but I assume that's just because they are easier to use for most applications, whereas PROC NLIN is more complex but offers much more flexibility? I think this holds the most potential for achieving everything I am trying to do in a single step. I will see how far I can get with it, thank you.

