njgrubic
Fluorite | Level 6

I am attempting to build a modified Poisson regression model with cluster-level groups, an approach verified by Zou & Donner (2011) for correlated binary data. I ran the code below and have been unable to produce any output after more than 24 hours of run time (sample size > 300,000).

proc genmod data=caresmodifiedpoisson;
  class EMSid
        ruca(ref="1_urbancore")
        gender(ref="Female")
        race1(ref="White")
        rhythmtype(ref="Shockable")
        ses(ref="Q5")
        etiology(ref="Presumed Cardiac Etiology")
        location1(ref="Non-Pu")
        witness(ref="Unwitnessed Arrest");
  model outcome = year ruca age gender rhythmtype race1 witness location1 etiology ses
        / dist=poisson link=log;
  repeated subject=EMSid / type=unstr;
run;

After the note that the algorithm converged, the following error appears: "WARNING: The number of response pairs for estimating correlation is less than or equal to the number of regression parameters. A simpler correlation model might be more appropriate". Is there any way I can improve the statistical efficiency of this model? Or is there a simpler modelling approach I should consider?

ballardw
Super User

Basically too many parameters.

I have no clue what Witness might be. I would consider examining it for possible elimination: it sounds like it could have many values, which would make the groups that include it sparse for the other parameters. The same goes for Location.

 

How many levels do you have for each of those parameters?
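One way to answer this, sketched under the assumption that the data set and variable names are the ones from the PROC GENMOD step above: the NLEVELS option of PROC FREQ prints a "Number of Variable Levels" table, and NOPRINT on the TABLES statement suppresses the individual frequency tables.

```sas
/* Count the distinct levels of each CLASS variable used in the model.  */
/* Data set and variable names are taken from the original GENMOD step. */
proc freq data=caresmodifiedpoisson nlevels;
  tables EMSid ruca gender race1 rhythmtype ses etiology location1 witness / noprint;
run;
```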

 

SES, if it means "socio-economic status", may be another contender if you have a large number of levels, especially when paired with Race. For example, for Race=A (generic) you might have only one or two values of SES, and not the whole spectrum represented for the other parameters. Low SES values are unlikely to appear in geographic locations with high living expenses.
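A quick way to check for the sparsity described above is to cross-tabulate the two variables and list the thin cells. This is a sketch: the choice of race1*ses is just the example pair from this post, and the cutoff of 10 observations is an arbitrary illustration, not a rule.

```sas
/* Cross-tabulate race1 by ses; SPARSE keeps zero-count combinations */
/* in the OUT= data set so empty cells are not silently dropped.     */
proc freq data=caresmodifiedpoisson;
  tables race1*ses / norow nocol nopercent sparse out=race_ses_counts;
run;

/* List the combinations with fewer than 10 observations */
/* (10 is an arbitrary illustrative cutoff).             */
proc print data=race_ses_counts;
  where count < 10;
run;
```

The same pattern applies to other pairs, such as witness*location1.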

 

Please note: SAS warnings are not errors. They are there to let you know there might be a problem and to reconsider your approach if needed. The results are not "wrong" but may well be questionable.

 

I'm not sure how your comment "verified by Zou & Donner (2011) for correlated binary data" applies. Was that verification done with as many parameters as you use? As many levels?

StatDave
SAS Super FREQ

This message is the result of having very few measurements at some time points. With few measurements at some time point(s), there is very little data available to estimate the correlations involving those time points. In the case of TYPE=UN, this message occurs whenever the number of response pairs available for a particular (j,k) combination of positions in the working correlation matrix (call it n*) is less than or equal to the number of model parameters, p. For example, if only one cluster has the largest size, then n*=1 for the pairs involving the last position, and 1 is always <= p, even in an intercept-only model where p=1.
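The counting argument can be written out as follows. This is a sketch that assumes K clusters, cluster i contributing m_i observations that fill positions 1, ..., m_i of the working correlation matrix (no gaps from a WITHINSUBJECT= effect), and p regression parameters:

$$
n^*_{jk} = \#\{\, i \in \{1,\dots,K\} : m_i \ge \max(j,k) \,\}, \qquad 1 \le j < k \le m_{\max} = \max_i m_i
$$

$$
\min_{j<k} n^*_{jk} = \#\{\, i : m_i = m_{\max} \,\}, \qquad \text{warning whenever } n^*_{jk} \le p
$$

$$
\text{number of correlation parameters under TYPE=UN} = \frac{m_{\max}(m_{\max}-1)}{2}
$$

As an illustration: if 1444 is the number of clusters, the average cluster holds more than 300,000 / 1444 ≈ 207.8 observations, so m_max ≥ 208 and TYPE=UN needs at least 208 × 207 / 2 = 21,528 correlation parameters.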

 

Try removing the TYPE=UN option to use a simpler correlation structure. Without TYPE=, the REPEATED statement uses the default working independence structure (TYPE=IND).
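A sketch of that change, reusing the original model unchanged except for the working correlation. TYPE=IND is written out explicitly here even though it is the default, and the GEE parameter estimates table reports empirical (robust) standard errors by default.

```sas
proc genmod data=caresmodifiedpoisson;
  class EMSid
        ruca(ref="1_urbancore")
        gender(ref="Female")
        race1(ref="White")
        rhythmtype(ref="Shockable")
        ses(ref="Q5")
        etiology(ref="Presumed Cardiac Etiology")
        location1(ref="Non-Pu")
        witness(ref="Unwitnessed Arrest");
  model outcome = year ruca age gender rhythmtype race1 witness location1 etiology ses
        / dist=poisson link=log;
  /* Working independence: no correlation parameters to estimate. */
  /* TYPE=EXCH would be another single-parameter alternative.     */
  repeated subject=EMSid / type=ind;
run;
```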

njgrubic
Fluorite | Level 6

@StatDave Thanks for this explanation. My clustering variable has 1444 levels, so it makes sense that some of the j,k combinations may have fewer observations than the total number of model parameters (11 in total).

 

I tried your suggestion of removing the TYPE=UN option. When I do this, I get the following messages in my log: "WARNING: The specified model did not converge" and "ERROR: The mean parameter is either invalid or at a limit of its range for some observations". Is my only option now to either find a clustering variable with fewer levels or remove it from the model completely?

 

Thanks for your guidance and support.

StatDave
SAS Super FREQ

It is not clear if you are saying that the number of clusters (that is, the number of levels of EMSID) is 1444, or that the number of measurements in each cluster (and therefore the size of the working correlation matrix) is 1444. How many levels does EMSID have and what is the largest number of observations within any cluster? 
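Both numbers can be obtained with a short PROC SQL step. This is a sketch assuming the data set and cluster variable names from the original post; the table name cluster_sizes is hypothetical.

```sas
proc sql;
  /* One row per cluster with its number of observations */
  create table cluster_sizes as
    select EMSid, count(*) as n_obs
    from caresmodifiedpoisson
    group by EMSid;

  /* Number of clusters (levels of EMSid) and the largest cluster size */
  select count(*)   as n_clusters,
         max(n_obs) as max_cluster_size
    from cluster_sizes;

  /* How many clusters reach that largest size */
  select count(*) as n_largest
    from cluster_sizes
    where n_obs = (select max(n_obs) from cluster_sizes);
quit;
```

COUNT(*) counts every row, whereas PROC GENMOD drops observations with missing values in any model variable, so the sizes GENMOD actually uses can be smaller.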

 

If your response is binary, then use DIST=BINOMIAL LINK=LOGIT instead. 
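A sketch of that alternative, assuming outcome is coded 0/1 with 1 as the event of interest (the EVENT='1' value is that assumption):

```sas
proc genmod data=caresmodifiedpoisson;
  class EMSid
        ruca(ref="1_urbancore")
        gender(ref="Female")
        race1(ref="White")
        rhythmtype(ref="Shockable")
        ses(ref="Q5")
        etiology(ref="Presumed Cardiac Etiology")
        location1(ref="Non-Pu")
        witness(ref="Unwitnessed Arrest");
  /* Assumes outcome is 0/1 and models the probability that outcome = 1 */
  model outcome(event='1') = year ruca age gender rhythmtype race1 witness location1 etiology ses
        / dist=binomial link=logit;
  repeated subject=EMSid / type=ind;
run;
```

The trade-off is in interpretation: the logit link yields odds ratios, while the log link of the modified Poisson model yields risk ratios.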

 
