## How do you perform Ad Hoc Correction test statistics

Solved
Occasional Contributor
Posts: 6

# How do you perform Ad Hoc Correction test statistics

Simple Question:

Is there a way to alter the value of "n" used in the calculation of Standard Errors for the Logistic Procedure?

Details:

To produce an unbiased sample that reflects the proper sampling rates of events and parameter values, the independent-observations assumption must sometimes be violated.  Additionally, certain hazard models allow individual-time-period observations across time, but the sample is independent only at the level of individuals (meaning each individual's set of time-period observations is independent of other individuals', but the observations themselves are not).  Without correcting the standard errors for this downward bias on the test statistics, the risk of Type 1 error is greatly increased.  Is there a way to alter how this is calculated directly, to avoid making manual corrections to the Wald chi-square tests and p-values?

For example, consider the following data.  We wish to estimate the hazard rate on these data (ignoring many potential sources of bias for simplicity of the example).  The raw test statistics will use 13 for "n" in the test-statistic calculation, but I want them to use 3 instead, since there are only 3 individual, independent processes:

| Individual | Time Period | Y | X1 | X2 |
|---|---|---|---|---|
| 1 | 1 | 0 | 4 | 5 |
| 1 | 2 | 0 | 4 | 5 |
| 1 | 3 | 0 | 2 | 5 |
| 1 | 4 | 0 | 2 | 4 |
| 1 | 5 | 0 | 2 | 4 |
| 1 | 6 | 1 | 1 | 4 |
| 2 | 3 | 0 | 4 | 3 |
| 2 | 4 | 0 | 5 | 4 |
| 2 | 5 | 0 | 6 | 4 |
| 3 | 2 | 0 | 3 | 2 |
| 3 | 3 | 0 | 2 | 2 |
| 3 | 4 | 0 | 2 | 1 |
| 3 | 5 | 1 | 2 | 1 |

SAS 9.4 EG 6.1 32-bit  SAS 9.4 EG 64-bit

Accepted Solutions
Solution
‎04-05-2016 10:53 AM
Occasional Contributor
Posts: 6

## Re: How do you perform Ad Hoc Correction test statistics

The sample data was cobbled together purely to describe the problem.  Specific information would be a violation of my company's intellectual property.  However, I found a solution to my problem and will share it.  It was far simpler than I imagined.

If the inflation factor is known, as in the example above:

13 observations / 3 subpopulations = 4.33 independent groups

However, the heterogeneity correction constant applied by SCALE=<constant> is squared, so we must take its square root:

sqrt(4.33) ≈ 2.08

From this, we can rescale the confidence intervals and p-values directly using the SCALE= option:

```
proc logistic data=indata;
   model Y = X1 X2 / scale=2.08;
run;
```

Despite not sharing the real data or approach, Steve is correct in identifying the missing random effect.  This simple scaling embeds assumptions about the relevance of start-points for each individual's process.  Depending on how start-points and end-points are captured in the data-generating process, and for what underlying reasons, this can create substantial bias in estimating the model.  For a clearer understanding of this example and its applicability to various event-modeling problems, and more specifically why this was not a substantial concern for my real data, consider Allison (1982) and Shumway (2001, 2004).  Duration-capturing elements such as age or event-life can be included in the model to verify that the start-point concerns are not driving the model.

All Replies
Super User
Posts: 10,219

## Re: How do you perform Ad Hoc Correction test statistics

I am not sure if you are talking about conditional logistic regression?  If so, check the STRATA statement.
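For reference, a minimal sketch of that suggestion on the posted example data, assuming it sits in a data set named indata with the columns shown above (the STRATA statement conditions the likelihood on each individual):

```
proc logistic data=indata;
   strata Individual;                 /* condition on each individual's stratum */
   model Y(event='1') = X1 X2;
run;
```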
Posts: 2,655

## Re: How do you perform Ad Hoc Correction test statistics

[ Edited ]

My suggestion (as it almost always is, it seems) is to look at doing the logistic regression in PROC GLIMMIX, with subject as a RANDOM effect.  This should correctly cluster the data, and result in the correct degrees of freedom for tests and confidence intervals.  For a worked example, see Example 45.18 Weighted Multilevel Model for Survey Data in the PROC GLIMMIX documentation (SAS/STAT 14.1).
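A minimal sketch of that suggestion, assuming the posted example data is in a data set named indata and Individual is the clustering variable:

```
proc glimmix data=indata method=laplace;
   class Individual;
   model Y(event='1') = X1 X2 / dist=binary link=logit solution;
   random intercept / subject=Individual;   /* subject-level random effect */
run;
```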

Steve Denham

SAS Super FREQ
Posts: 3,841

## Re: How do you perform Ad Hoc Correction test statistics

@SteveDenham I thought that you might suggest this. For those of us who are not experts in this area, could you briefly explain why you did not recommend GENMOD and the REPEATED statement (GEE approach)?
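For comparison, the GEE approach being asked about might be sketched roughly like this, again assuming the example data is in a data set named indata (the exchangeable working correlation is an illustrative choice, not a recommendation):

```
proc genmod data=indata descending;
   class Individual;
   model Y = X1 X2 / dist=binomial link=logit;
   repeated subject=Individual / type=exch;  /* exchangeable working correlation */
run;
```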

Posts: 2,655

## Re: How do you perform Ad Hoc Correction test statistics

@Rick_SAS, my concern here was not with the repeated nature of the data, but with its clustered nature by subject, which would constitute a random effect that isn't modeled in GENMOD or GEE.  The sample data really looks like that in Ex. 45.18, without the sampling weights, which could be added in a subsequent analysis.

Steve Denham

SAS Super FREQ
Posts: 3,841

## Re: How do you perform Ad Hoc Correction test statistics

Okay. Interesting. Do you also want to AGGREGATE over the individuals?

Occasional Contributor
Posts: 6

## Re: How do you perform Ad Hoc Correction test statistics

[ Edited ]

I was trying to aggregate over individuals, as this is the effect I am trying to proxy via SCALE=.  But the real data can include anywhere between 150 and 10,000 individuals.  Additionally, I was having trouble figuring out how to use it to assign these groups globally, since they are not a predictor in the model.  I reviewed everything I could find on the AGGREGATE and SCALE options trying to figure this out, and gave up when I found that I could simplify by just embedding assumptions via a direct scaling factor for the standard errors.  It remains unclear to me how substantial the risk is given other assumptions in the approach, but so far, testing seems to be in line with expectations.
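For what it's worth, the two options can be combined so that the grouping variable stays out of the model; a sketch on the example data, assuming Individual defines the subpopulations (worth verifying against the LOGISTIC documentation):

```
proc logistic data=indata;
   model Y(event='1') = X1 X2 / aggregate=(Individual) scale=pearson;
run;
```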

I should also note that moderate Type 1 error is not generally the end of the world for what is being done here, but it should be tested somewhat accurately to prevent potentially serious and dangerous misspecification.  I find that it is very easy to overstate the importance of p-values in this field, especially when there are micronumerosity concerns.  Nonetheless, the CIs and p-values should be close to accurate, or harmful decisions can be made.

Posts: 2,655

## Re: How do you perform Ad Hoc Correction test statistics

[ Edited ]

@DLBarker, I love the term micronumerosity.  In my field, it is generally referred to as pseudo-replication, and is associated with experimental-unit confusion.  Here, though, it appears you have both random and repeated effects, and one of the built-in benefits of the mixed-model approach is the correct assignment of degrees of freedom (provided that a correctly specified model exists and is applied).  The field that seems most concerned about using a mixed-model approach, in my experience, is econometrics, and the reason often given is the "bias" associated with minimized variance estimators.  Laplace or adaptive quadrature methods go a long way toward alleviating this, but that is only my opinion.  I would cite Stroup (2012), Bolker (2009), and Bates (2014) for approaches to minimizing bias in a generalized linear mixed-modeling schema.

Steve Denham

Occasional Contributor
Posts: 6

## Re: How do you perform Ad Hoc Correction test statistics

[ Edited ]

Thank you @SteveDenham

I am going to be doing further research on this and the applications of the GLIMMIX procedure to these problems. I just ordered a copy of Stroup's book, which seems to deal exclusively with GLMM approaches in SAS. I look forward to the insight it may provide for future approaches...however, I am unsure how well I will be able to create a scorecard specification from it. It will be another year before I have to do a methodological review though, so I have time to research, play around and test.

Occasional Contributor
Posts: 6

## Re: How do you perform Ad Hoc Correction test statistics

@SteveDenham  I do still have the fear that a GLMM may overcorrect when there are too few observations per individual.  Again, I will just have to test this concern in the future.  Testing and simulation are generally easier than simply pondering the impact.

Posts: 2,655

## Re: How do you perform Ad Hoc Correction test statistics

@DLBarker, I agree on the problem of insufficient information: it makes the solutions unstable, if they converge at all.  Simpler models are then often used, and the true research question is not addressed.  The simulation approach is where I would go, and simulating correlated/clustered data that does not fit a multivariate normal is a daunting task in itself.  Check out Bolker's text on ecological models.  You'll have to translate from R to SAS in a lot of places, but the theoretical approach should help.
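A minimal sketch of simulating clustered binary data with a subject-level random intercept, along the lines discussed (all parameter values here are illustrative assumptions, not estimates from the thread):

```
data sim;
   call streaminit(20160405);
   do Individual = 1 to 200;
      u = rand('normal', 0, 0.5);          /* subject-level random intercept */
      do TimePeriod = 1 to 5;
         X1 = rand('normal');
         eta = -1 + 0.8*X1 + u;            /* linear predictor */
         Y = rand('bernoulli', logistic(eta));
         output;
      end;
   end;
   drop u eta;
run;
```

Fitting the GLIMMIX model to data generated this way would let you check whether the standard errors and degrees of freedom behave as expected at your cluster sizes.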

Steve Denham
