I work with logistic regression quite a bit, but I haven't come across anything in the SAS documentation that outlines how to do this. Here's a quick example data set. Let's say I'm measuring the failure rate of two pieces of equipment at different temperatures to see if an older model (type) has a significantly higher failure rate:
DATA equipmentfail;
INPUT type temp n failures ;
DATALINES;
1 90 20 2
1 100 20 7
1 110 20 12
1 120 20 17
2 90 20 5
2 100 20 10
2 110 20 15
2 120 20 20
;
proc logistic data=equipmentfail plots=effect;
class type;
model failures/n = temp type ;
run;
In this case, there is a significant difference in failure rates between equipment types. However, what I'm trying to figure out is what to do if the way you measure failure is biased. Let's say it is known that the method for measuring failure overpredicts failure by an average of 10%, plus or minus 5% (95% confidence interval). You can reduce the number of failure events by the known average bias of 10%, but that doesn't account for the variation around that average. Is there some way in SAS procs like PROC LOGISTIC to account for this variation, or is it something that likely has to be done by hand? Thanks.
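To make the numbers in the question concrete, here is the arithmetic, assuming (this is not stated in the question) that the 95% interval comes from a roughly normal distribution for the overcount fraction $b$, and using the 17-failure row as an example:

```latex
\[
\hat b = 0.10, \qquad 1.96\,\sigma_b = 0.05 \;\Rightarrow\; \sigma_b = \frac{0.05}{1.96} \approx 0.0255
\]
\[
y_{\text{corr}} = (1 - \hat b)\, y_{\text{obs}}, \qquad
\operatorname{SD}(y_{\text{corr}}) \approx \sigma_b \, y_{\text{obs}}
\]
\[
y_{\text{obs}} = 17: \quad y_{\text{corr}} = 0.9 \times 17 = 15.3, \qquad
15.3 \pm 1.96 \times 0.0255 \times 17 \approx 15.3 \pm 0.85
\]
```

The ±0.85 band reflects only the uncertainty in the correction factor, with the observed count held fixed; binomial sampling variation comes on top of it.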
The error process that you describe would result in a bias in the number of failures AND in extra variation in the observed number of failures. You can correct for the bias using your best estimate (10%) and check for extra variation with goodness-of-fit statistics. If the Deviance/DF ratio is substantially greater than 1, you can account for the overdispersion by adding the SCALE=Deviance option to your MODEL statement (this is definitely not the case with your example data):
DATA equipmentfail;
INPUT type temp n failures;
correctedFailures = round(0.9*failures);
DATALINES;
1 90 20 2
1 100 20 7
1 110 20 12
1 120 20 17
2 90 20 5
2 100 20 10
2 110 20 15
2 120 20 20
;
proc logistic data=equipmentfail plots=effect;
class type;
model correctedFailures/n = temp type / lackfit /* scale=Deviance */ ;
run;
PG
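One way to carry the ±5% uncertainty into the model, rather than applying only the 10% point correction, is the Monte Carlo sketch below. It is not from the thread and has not been run: it draws the overcount fraction many times (assumed Normal with mean 0.10 and SD 0.05/1.96), refits the model for each draw, and pools the results with PROC MIANALYZE using Rubin's rules, so the between-draw variance reflects the uncertainty in the bias. The dataset names `biasdraws` and `est`, the indicator `old`, the seed, and the number of draws (50) are illustrative choices. It assumes `equipmentfail` was created with the DATA step above.

```sas
/* Monte Carlo propagation of the bias uncertainty (a sketch, not from the thread).
   Assumption: the overcount fraction is Normal(0.10, 0.05/1.96),
   i.e. the stated 95% interval of +/- 5% read as +/- 1.96 SD. */
data biasdraws;
  call streaminit(20240506);              /* arbitrary seed for reproducibility */
  do _Imputation_ = 1 to 50;              /* 50 draws of the bias fraction */
    bias = rand('normal', 0.10, 0.05/1.96);
    do i = 1 to nobs;                     /* copy every row of the original data */
      set equipmentfail point=i nobs=nobs;
      old = (type = 1);                   /* hypothetical 0/1 indicator for the older model */
      correctedFailures = round((1 - bias)*failures);
      output;
    end;
  end;
  stop;                                   /* required when reading with POINT= */
run;

/* Fit the same model to each draw; save estimates plus covariance matrix */
proc logistic data=biasdraws outest=est covout noprint;
  by _Imputation_;
  model correctedFailures/n = temp old;
run;

/* Pool with Rubin's rules: between-draw variance carries the bias uncertainty */
proc mianalyze data=est;
  modeleffects Intercept temp old;
run;
```

Treating the draws as imputations is an approximation; comparing the pooled standard error for `old` with the one from the single-correction fit shows how much the bias uncertainty matters.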
Read the following references:
Magder LS, Hughes JP. Logistic regression when the outcome is measured with uncertainty. American Journal of Epidemiology 1997;146(2):195-203.
Neuhaus JM. Bias and efficiency loss due to misclassified responses in binary regression. Biometrika 1999;86(4):843-855.
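The references above treat the problem as outcome misclassification. As a rough sketch of that idea (an illustration, not the method of either paper): if the sensitivity (se) and specificity (sp) of the failure measurement were known, the recorded count can be modeled as binomial with probability se*p + (1 - sp)*(1 - p), where p is the true failure probability from the logistic model, and fitted directly in PROC NLMIXED. The values se = 1 and sp = 0.95 are hypothetical; translating "10% plus or minus 5% overcounting" into these rates depends on how the error arises. Parameter names and starting values are also illustrative, and the code has not been run.

```sas
/* Misclassification-adjusted binomial likelihood (a sketch).
   se and sp are HYPOTHETICAL values, not derived from the thread. */
proc nlmixed data=equipmentfail;
  parms b0=-10 btemp=0.1 bold=0;          /* rough starting values */
  se = 1.00;                              /* assumed sensitivity: true failures always recorded */
  sp = 0.95;                              /* assumed specificity: 5% of non-failures recorded as failures */
  if type = 1 then old = 1;               /* 0/1 indicator for the older model */
  else old = 0;
  eta  = b0 + btemp*temp + bold*old;      /* linear predictor for the TRUE failure probability */
  p    = 1 / (1 + exp(-eta));
  pobs = se*p + (1 - sp)*(1 - p);         /* probability a unit is RECORDED as a failure */
  model failures ~ binomial(n, pobs);
run;
```

Rerunning over a range of plausible se and sp values gives a simple sensitivity analysis for the equipment-type effect (`bold`).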