What is the OBSMARGINS dataset format for PROC GENMOD?

pblls — Tue, 18 Jan 2022 10:01:22 GMT

Hi all,

I'm trying to specify coefficients for LSMEANS class levels in PROC GENMOD, but am struggling to get this working. Consider the following example:

data cohorts;
   fixed = 1;
   cohort = 1; numer = 10; ldenom = log(100); output;
   cohort = 2; numer =  9; ldenom = log(100); output;
   cohort = 3; numer =  1; ldenom =   log(2); output;
run;

proc genmod data=cohorts;
   class cohort fixed;
   model numer = fixed cohort / dist=poisson offset=ldenom;
   lsmeans fixed / e diff cl;
run;

I would like the LSMEANS coefficients for cohort (displayed by the E option) to be 0.495, 0.495 and 0.01 respectively, in proportion to the denom values (100, 100 and 2), but they default to 0.333 each. I see that OBSMARGINS=<OM-data-set> should let me specify this, and that this dataset should contain 'all model variables except the dependent one' (fixed and cohort in my case), but I don't see how to specify the value of the coefficient itself. Requesting observed margins without an OM dataset doesn't change anything, because my input has just one aggregated record per cohort, which is unfortunately all I have.
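For reference, the target coefficients are each denominator divided by the total: 100/202 ≈ 0.495 for cohorts 1 and 2, and 2/202 ≈ 0.0099 for cohort 3. A minimal sketch of that arithmetic follows; the data set name DENOMS and the variables DENOM and WEIGHT are illustrative and not part of the original example.

```sas
/* Hypothetical table: one row per cohort with its denominator */
data denoms;
   input cohort denom;
   datalines;
1 100
2 100
3 2
;
run;

/* Weight = cohort denominator divided by the sum of all denominators */
proc sql;
   select cohort,
          denom,
          denom / (select sum(denom) from denoms) as weight format=6.4
   from denoms;
quit;
```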

This feels like it should be easy, but I'm completely failing to find any example or specification for this OM dataset format. Any suggestions?

Re: What is the OBSMARGINS dataset format for PROC GENMOD?

pblls — Tue, 18 Jan 2022 10:32:39 GMT

After some further testing, it seems that only the number of observations per class level in the OM dataset matters, i.e. the following addition should be correct:

data obsm;
   fixed = 1;
   cohort = 1; do i = 1 to 100; output; end;
   cohort = 2; do i = 1 to 100; output; end;
   cohort = 3; do i = 1 to   2; output; end;
   drop i;
run;

proc genmod data=cohorts;
   ...
   lsmeans fixed / ... om=obsm;
run;

The problem is now that this gives a segfault in PROC GENMOD on our central SAS installation, so I guess we'll be reaching out to Tech Support.
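The hard-coded DO loops in the OBSM step of this post could also be driven by a table of denominators. The sketch below assumes a hypothetical DENOMS data set with one row per cohort, and it rounds the denominator because follow-up time need not be an integer; the PROC FREQ step is only a check that the row counts are proportional to the denominators.

```sas
/* Hypothetical table: one row per cohort with its denominator */
data denoms;
   input cohort denom;
   datalines;
1 100
2 100
3 2
;
run;

/* Write one OM row per unit of denominator */
data obsm;
   set denoms;
   fixed = 1;
   do i = 1 to round(denom);   /* ROUND guards against non-integer denominators */
      output;
   end;
   keep fixed cohort;
run;

/* Check: cohort percentages should mirror the denominators */
proc freq data=obsm;
   tables cohort;
run;
```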

Re: What is the OBSMARGINS dataset format for PROC GENMOD?

Rick_SAS — Tue, 18 Jan 2022 11:19:41 GMT

What version of SAS are you running? Submit

%put &=SYSVLONG4;

and send back the line that appears in the log that looks something like this:

121 %put &=SYSVLONG4;
SYSVLONG4=9.04.01M6P11072018

When I run your program on SAS 9.4M7, it does not crash and it gives an answer, so perhaps this issue has been resolved in a more recent version of SAS.

In your program, the FIXED variable has only one level. That is a degenerate situation. Please try your program when the CLASS variables have more than one level. For example, you might try running the following:

data cohorts;
   do fixed = 1 to 2;
      cohort = 1; numer = 10; ldenom = log(100); output;
      cohort = 2; numer =  9; ldenom = log(100); output;
      cohort = 3; numer =  1; ldenom =   log(2); output;
   end;
run;

proc genmod data=cohorts;
   class cohort fixed;
   model numer = fixed cohort / dist=poisson offset=ldenom;
   lsmeans fixed / e diff cl;
run;

data obsm;
   do fixed = 1 to 2;
      cohort = 1; do i = 1 to 100; output; end;
      cohort = 2; do i = 1 to 100; output; end;
      cohort = 3; do i = 1 to   2; output; end;
   end;
   drop i;
run;

proc genmod data=cohorts;
   class cohort fixed;
   model numer = fixed cohort / dist=poisson offset=ldenom;
   lsmeans fixed /  e diff cl om=obsm;
run;

Re: What is the OBSMARGINS dataset format for PROC GENMOD?

pblls — Tue, 18 Jan 2022 14:13:58 GMT

We're also on 9.4M7:

SYSVLONG4=9.04.01M7P08052020

My example and your extension both work on a second, local installation of SAS (but not on the server). However, if I run the same code with actual data (a few thousand observations in the obsm dataset), I get a segfault in that local environment as well. Both installations run the same SAS version.

This will probably not be fixed with just some additional SAS code, so I'll try to get some technical support.

To continue in the spirit of this thread, is there a way to specify the margins without creating that number of observations? Ideally I would like to do something like this and not need a huge number of records:

data obsm;
   do fixed = 1, 2;
      cohort = 1; _MARGIN_ = 100; output;
      cohort = 2; _MARGIN_ = 100; output;
      cohort = 3; _MARGIN_ =   2; output;
   end;
run;
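One possible way to avoid the expanded data set, offered as an untested sketch and not as confirmation that a _MARGIN_-style variable exists: the ESTIMATE statement in PROC GENMOD accepts explicit coefficients, so the cohort weights can be typed in directly. The labels are illustrative, and I have not verified that the results match the OM= LS-means exactly. The example uses the two-level FIXED data from earlier in the thread.

```sas
data cohorts;
   do fixed = 1 to 2;
      cohort = 1; numer = 10; ldenom = log(100); output;
      cohort = 2; numer =  9; ldenom = log(100); output;
      cohort = 3; numer =  1; ldenom =   log(2); output;
   end;
run;

proc genmod data=cohorts;
   class cohort fixed;
   model numer = fixed cohort / dist=poisson offset=ldenom;
   /* All coefficients are divided by DIVISOR=, so the cohort weights   */
   /* become 100/202, 100/202 and 2/202; E prints the coefficients and  */
   /* EXP exponentiates the estimate of the linear combination.         */
   estimate 'fixed=1, weighted' intercept 202 fixed 202 0 cohort 100 100 2
            / divisor=202 exp e;
   estimate 'fixed=2, weighted' intercept 202 fixed 0 202 cohort 100 100 2
            / divisor=202 exp e;
run;
```

Like LSMEANS, this is a weighted average on the log (linear predictor) scale, so the exponentiated value is a weighted geometric mean of the cohort rates and not total events divided by total denominator.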

Re: What is the OBSMARGINS dataset format for PROC GENMOD?

StatDave — Tue, 18 Jan 2022 17:06:48 GMT

It's not at all clear what your ultimate goal is with this, but it appears that you simply have an observed proportion for each of three groups. If the goal is to compare those group proportions, this can be done by fitting an appropriate model and using the NLMeans macro to make pairwise comparisons. Since the data are just counts from an aggregated binary response, the appropriate model is a logistic model. The following code fits the model and then does the comparisons with the NLMeans macro following the discussion in this note:

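data cohorts;
   input cohort num den;
   datalines;
1 10 100
2 9 100
3 1 2
;
run;

/* Events/trials logistic model; save the LS-means coefficients and the fitted model */
proc logistic data=cohorts;
   class cohort / param=glm;
   model num/den = cohort;
   lsmeans cohort / ilink e diff cl;
   ods output coef=c;
   store log;
run;

/* The NLMeans macro must already be defined in the SAS session */
%nlmeans(instore=log, coef=c, link=logit)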

Re: What is the OBSMARGINS dataset format for PROC GENMOD?

pblls — Wed, 19 Jan 2022 11:41:18 GMT

Ah, yes, I didn't really go into the 'why' because unfortunately we're mostly stuck with this method for historical reasons. The goal is to provide confidence intervals for an event rate, and without the covariate the model matches very closely with the method of Ulm (10.1093/oxfordjournals.aje.a115507) which was used on the pool of cohorts.

The data are event counts with follow-up time in the denominator. Although I only have aggregates per cohort, individuals may have more than one event, so I'm not sure logistic regression is appropriate here.
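For comparison, the covariate-free rate model mentioned above can be sketched as an intercept-only Poisson fit with the log follow-up time as offset. This is an assumption about what that model looks like, not the code actually used; the label 'pooled rate' is illustrative. With only an intercept, the exponentiated estimate equals total events over total denominator (20/202 for the example data).

```sas
data cohorts;
   cohort = 1; numer = 10; ldenom = log(100); output;
   cohort = 2; numer =  9; ldenom = log(100); output;
   cohort = 3; numer =  1; ldenom =   log(2); output;
run;

proc genmod data=cohorts;
   /* No covariates: a single common event rate across cohorts */
   model numer = / dist=poisson offset=ldenom;
   /* EXP converts the log-rate estimate and its limits to the rate scale */
   estimate 'pooled rate' intercept 1 / exp;
run;
```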

Re: What is the OBSMARGINS dataset format for PROC GENMOD?

StatDave — Wed, 19 Jan 2022 14:37:47 GMT

If you counted the number of events that occur in each cohort (the numerator) out of a total number of observed individuals in each cohort (the denominator), then at the individual level the response is binomial and the logistic model is appropriate. If, for some reason, you want to assume that the cohorts have the same event probability, then you could simply remove COHORT from the model and estimate the common event probability:

proc logistic data=cohorts;
   model num/den = ;
   estimate 'pr' intercept 1 / ilink;
run;