All,
I'm trying to extend the following macro (from Peter Song, Univ Michigan; http://www-personal.umich.edu/~pxsong/qif_package/QIFv02.sas), which fits a quadratic inference function (QIF; a type of marginal generalized linear model akin to generalized estimating equations), to accommodate a negative binomial distribution. The macro is not too long, but I won't copy it here since I've provided a link. Fair warning: I'm brand spanking new to the world of IML, so (1) please pardon my ignorance, and (2) it's okay to let me know I'm in over my head...
Problem 1: Implementing a negative binomial distribution QIF
Proposed solution: The negative binomial application should be similar to the Poisson except for the specification of the variance function. Thus, in every place where a Poisson distribution is referenced in the macro, I've added a new section corresponding to the NEGBIN. These sections are identical in all cases (e.g., the calculation of pearson and deviance residuals, the calculation of ui) except in how the variance function is defined. For the Poisson, this particular section of the macro looks like this:
%else %if &dist = POISSON %then %do;
ui = exp( (xi*beta) );
fui = log(ui);
fui_dev = diag(ui);
vui = diag(sqrt(1/ui));
%end;
I think it should look like this for NB:
%else %if &dist = NEGBIN %then %do;
ui = exp( (xi*beta) );
fui = log(ui);
fui_dev = diag(ui)*diag(1+ui);
vui = diag(sqrt(1/ui))*diag(sqrt(1/(1+ui)));
%end;
Problem 2: Rounding errors preclude calculation of SVD and GINV
Proposed solution: ???
Some (hopefully) relevant log output...
ERROR: No convergence of singular value decomposition due to rounding errors.
ERROR: Execution error as noted previously. (rc=100)
operation : GINV at line 5471 column 1
operands : arsumc
arsumc 30 rows 30 cols (numeric)
statement : ASSIGN at line 5471 column 1
I've attached a *.SAS file containing the modified macro, my data, and the particular QIF call that produced the above error. I used the Poisson in this call to avoid any potential errors in my adaptation to the negative binomial...
Thanks very much for any help,
Adam Smith
Department of Natural Resources Science
University of Rhode Island
In the example that you provide, the arsumc matrix is a 30x30 matrix with most elements about 1E130.
That's the source of the error reported by GINV.
if iteration=1 then do;
_min = min(arsumc);
_max = max(arsumc);
_mean = arsumc[:];
print _min _max _mean;
end;
_min=6113822.2
_max=1.589E133
_mean=2.959E131
The negative binomial (NB) model should have a parameter, k, so I don't think your formula is correct.
Your first question is statistical, rather than having to do with matrices and linear algebra. I'm not an expert in GEEs or using generalized linear models, but I discussed this with a colleague. As best we can tell, the macro writer is using the following definitions for a model with link function g():
Ui: mean
Fui: g(mean), i.e. linear predictor
Fui_dev: diagonal matrix of weights for Fisher scoring = 1/(v(mu)*dg(mu)**2) (not sure about this one: gamma has negative sign?)
Vui: diagonal matrix of inverse of square root of variance.
If these are right, for a log-linked NB with dispersion parameter k, you might try
Ui = exp(xi*beta)
Fui = log(ui)
Fui_dev = diag(ui/(1+k*ui))
Vui = diag( 1/sqrt(ui+k*ui##2))
As I've said, this is a guess. You might get better answers from the SAS Discussion Forum on SAS/STAT and Statistical Procedures.
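For what it's worth, the guessed expressions can be checked against the definitions listed above. This is only a sketch, and it assumes the common NB variance function V(mu) = mu + k*mu^2 with a log link; that parameterization is an assumption, not something confirmed by the macro:

```latex
\begin{align*}
\mu &= \exp(x_i \beta), \qquad g(\mu) = \log \mu, \qquad g'(\mu) = \frac{1}{\mu} \\
V(\mu) &= \mu + k\mu^{2} \\
\text{weight} &= \frac{1}{V(\mu)\, g'(\mu)^{2}}
             = \frac{\mu^{2}}{\mu + k\mu^{2}}
             = \frac{\mu}{1 + k\mu} \\
V(\mu)^{-1/2} &= \frac{1}{\sqrt{\mu + k\mu^{2}}}
\end{align*}
```

Setting k = 0 gives a weight of mu and an inverse root variance of 1/sqrt(mu), which agrees with the Poisson branch of the macro (fui_dev = diag(ui), vui = diag(sqrt(1/ui))).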
For your second question, I'm away from my office, so can't reproduce the error. Perhaps when you correctly specify the NB model, this second error will go away. Or someone else might be able to help.
Hi Rick,
Thanks very much for your reply. I'm hoping I can feel my way through this...
The negative binomial (NB) model should have a parameter, k, so I don't think your formula is correct.
Typically, yes, I agree. However, when the REPEATED statement is used in GENMOD (invoking the GEE), ML estimates of the scale (or dispersion for NegBin) disappear, perhaps into the "nuisance" variation associated with the clustering? If k is required, however, I don't have any idea how to get ML estimates of it in IML?
As best we can tell, the macro writer is using the following definitions for a model with link function g():
Ui: mean
Fui: g(mean), i.e. linear predictor
Fui_dev: diagonal matrix of weights for Fisher scoring = 1/(v(mu)*dg(mu)**2) (not sure about this one: gamma has negative sign?)
Vui: diagonal matrix of inverse of square root of variance.
Agree on Ui and Fui. I'll have to defer to you on the others, although the negative sign in the gamma confused me as well.
If these are right, for a log-linked NB with dispersion parameter k, you might try
Ui = exp(xi*beta)
Fui = log(ui)
Fui_dev = diag(ui/(1+k*ui))
Vui = diag( 1/sqrt(ui+k*ui##2))
Tried it, and understandably it's looking for a matrix "k". I'll run it by the other forum as well.
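The "matrix k" complaint arises because IML needs k to hold a value before it appears in an expression. A minimal sketch of the suggested NB statements, assuming k is simply fixed at a known value (the value 0.5, the toy xi and the toy beta are all invented for illustration; nothing here estimates k):

```sas
proc iml;
   /* Toy inputs, invented for illustration only */
   xi   = {1 0, 1 1, 1 2};   /* design matrix for one cluster */
   beta = {0.2, 0.1};        /* current coefficient values    */
   k    = 0.5;               /* ASSUMPTION: fixed dispersion, not estimated */

   ui      = exp(xi*beta);                 /* mean under the log link     */
   fui     = log(ui);                      /* linear predictor            */
   fui_dev = diag(ui / (1 + k*ui));        /* suggested Fisher weights    */
   vui     = diag(1 / sqrt(ui + k*ui##2)); /* 1/sqrt(variance), diagonal  */

   print ui fui, fui_dev, vui;
quit;
```

Whether a fixed k is statistically acceptable for a QIF fit is a separate question that this sketch does not answer.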
For your second question, I'm away from my office, so can't reproduce the error. Perhaps when you correctly specify the NB model, this second error will go away. Or someone else might be able to help.
Alas, no. It's not specific to the NegBin, but occurs with other distributions (e.g., Poisson) as well.
Thanks again,
Adam
In the example that you provide, the arsumc matrix is a 30x30 matrix with most elements about 1E130.
That's the source of the error reported by GINV.
if iteration=1 then do;
_min = min(arsumc);
_max = max(arsumc);
_mean = arsumc[:];
print _min _max _mean;
end;
_min=6113822.2
_max=1.589E133
_mean=2.959E131
Thanks Rick... arsumc gets out of hand quickly, on iteration 1 in fact. I've been running through the meat of the macro variable by variable, and I've looked into the math behind QIF as best I can (link here, if anyone is interested). I don't think that (1) the macro is up to my specific GEE model (i.e., NegBin with an offset term), or that (2) I have enough of a handle on the linear algebra or IML coding to tackle it. Guess I'll look into alternatives.
Thanks so much for the help.
One general bit of advice: it is usually a poor idea to interlace macro code and IML. It is almost always unnecessary, and it makes debugging a real pain. By restructuring the logic, you can usually avoid %IF/%THEN and other macro logic and use IML statements instead.
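As a rough illustration of that advice, the %IF/%THEN branch on &dist could be moved into an IML module that tests an ordinary character variable. This is a sketch only: the module name MeanPieces and its argument list are hypothetical, and only the Poisson statements are taken from the macro excerpt quoted earlier in the thread.

```sas
%let dist = POISSON;   /* stands in for the macro's &dist parameter */

proc iml;
   /* Hypothetical module: output arguments first, inputs last */
   start MeanPieces(ui, fui, fui_dev, vui, xi, beta, dist);
      if upcase(dist) = "POISSON" then do;
         ui      = exp(xi*beta);       /* mean                     */
         fui     = log(ui);            /* linear predictor         */
         fui_dev = diag(ui);           /* weights                  */
         vui     = diag(sqrt(1/ui));   /* 1/sqrt(variance)         */
      end;
      else print "Unsupported distribution:" dist;
   finish;

   /* Toy inputs, invented for illustration only */
   xi   = {1 0, 1 1, 1 2};
   beta = {0.2, 0.1};
   dist = "&dist";     /* the macro value is resolved once, here */

   run MeanPieces(ui, fui, fui_dev, vui, xi, beta, dist);
   print ui fui;
quit;
```

With this layout the macro variable is resolved in a single assignment, and the branching logic can be stepped through and printed like any other IML code.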