McNemar Test with a Zero Cell

mrprerost · Posted 01-04-2019 09:58 AM

A previous user asked a question about how to run a McNemar's test with cells that have 0 frequencies, and a solution was proposed:

data have;
input view success count;
datalines;
1 1 30
1 0 0
2 1 26
2 0 4
;
run;
proc freq data=have; 
	weight count;
	tables view*success / agree;
	exact mcnem;
run;

This works when you have the summary data, but can the same result be achieved when you are using an actual dataset (not inputting the 2x2 cell counts) and not using a 'weight' statement? I'm using SAS 9.4.

lopezr · Posted 01-04-2019 11:39 AM

You can achieve this with the actual data set by assigning weight = 1 to observations in your data and adding a single observation representing the 0 frequency cell with a weight of 0.

data have (drop = i);
    /* generate actual data set */
    do i = 1 to 30;
        view = 1;
        success = 1;
        weight = 1;
        output;
    end;
    do i = 1 to 26;
        view = 2;
        success = 1;
        weight = 1;
        output;
    end;
    do i = 1 to 4;
        view = 2;
        success = 0;
        weight = 1;
        output;
    end;
    /* 0 freq cell with weight 0 */
    do i = 1 to 1;
        view = 1;
        success = 0;
        weight = 0;
        output;
    end;
run;

proc freq data=have; 
    weight weight;
    tables view*success / agree;
    exact mcnem;
run;

FreelanceReinh · Posted 01-04-2019 01:28 PM

Hello @mrprerost and welcome to the SAS Support Communities!

For a single zero cell in the 2x2 table it is not necessary to create an extra observation with weight 0: both levels of each variable already occur in the data, so PROC FREQ builds the full 2x2 table and reports a frequency of 0 for the empty cell. So, in @lopezr's code you can omit the last DO loop without changing the result.

Please note, however, that the problem in the other thread involved two zero cells in the same row. (Unfortunately, the accepted solution there did not produce the correct result, but the original poster, Ahmed_Hegazy, then provided a correct solution.) In this case you do need a zero-weight observation in the dataset, together with the ZEROS option of the WEIGHT statement, because one level of VIEW1 would otherwise not occur in the table at all. As above, it is sufficient to create a single observation (the second zero-weight cell then arises automatically). So, an equivalent dataset for the correct solution in the other thread could be created as:

data viewcomp1;
do _n_=1 to 26;
  view1='Success'; view2='Success';
  count=1;
  output;
end;
do _n_=1 to 4;
  view1='Success'; view2='Failure';
  count=1;
  output;
end;
view1='Failure';
count=0;
output;
run;

(Note that the value of VIEW2 in the last observation is 'Failure' from the preceding DO loop, but 'Success' would be fine as well. Even a missing value would work.)

Now the correct result can be replicated with:

proc freq data = viewcomp1;
weight count / zeros;
tables view1*view2 / agree;
exact mcnem;
run;

Edit: Obviously, the DATA step above could be simplified to:

data viewcomp1;
view1='Success';
count=1;
do _n_=1 to 26;
  view2=view1;
  output;
end;
do _n_=1 to 4;
  view2='Failure';
  output;
end;
view1=view2;
count=0;
output;
run;
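
As a rough cross-check of the exact p-value, the test can be reproduced by hand from the two discordant cells. This is only a sketch: it assumes that the exact McNemar test is the usual two-sided binomial test with probability 0.5 on the discordant pairs. The names CHECK, B, C and P_EXACT are made up for illustration. B and C are the discordant counts of VIEWCOMP1 (4 and 0).

```sas
/* Sketch: exact two-sided McNemar p-value from the discordant counts, */
/* assuming a binomial(b+c, 0.5) reference distribution.               */
data check;
  b = 4;   /* view1='Success', view2='Failure' */
  c = 0;   /* view1='Failure', view2='Success' (the zero cell) */
  /* twice the lower binomial tail at min(b,c), capped at 1 */
  p_exact = min(1, 2*cdf('BINOMIAL', min(b, c), 0.5, b + c));
  put p_exact=;   /* writes the value to the log */
run;
```

With b=4 and c=0 the expression evaluates to 2 x 0.5**4 = 0.125, which can be compared with the exact p-value that PROC FREQ prints.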

mrprerost · Posted 01-08-2019 08:20 AM

Thank you both @lopezr and @FreelanceReinh. Both of your solutions presuppose that the cell counts are known, and I didn't specify that I was looking for a solution where these counts are not known. A colleague of mine suggested the following solution and it worked, so I am including it here. Var1 and var2 are grouping variables and var3 and var4 are the variables in the 2x2 table.

proc means data=xxxxxx completetypes noprint nway;
class var1 var2 var3 var4;
output out=freqnew(rename=(_freq_=count) drop=_type_);
run;

proc sort data=freqnew;
by var1 var2 var3 var4 _stat_ count;
run;

data analysis (keep=var1 var2 var3 var4 count);
set freqnew; by var1 var2 var3 var4 _stat_ count;
if first.var4;
run;

proc print data=analysis noobs;
title1 "Zero Table Cell Added to Raw Data";
run;

Proc Freq data=analysis;
by var1 var2;
Tables var3*var4 / missing agree;
exact mcnem;
weight count / zeros;
run;

FreelanceReinh · Posted 01-08-2019 12:46 PM

I see what you mean. The datasets @lopezr and I produced (using hardcoded numbers of observations like 26 and 4) were just to demonstrate that the McNemar test involving zero cells can be performed without having summary data. Of course, I would not recommend creating detail data from existing summary data for this purpose, let alone using hardcoded numbers. (The required zero-weight observation could have been appended without knowing these numbers.)
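
A minimal sketch of that last remark, appending the zero-weight observation without knowing any cell counts. It assumes a detail dataset named DETAIL (hypothetical) with character variables VIEW1 and VIEW2 of length 7 or more, in which VIEW1='Failure' never occurs. DETAIL_W and COUNT are likewise made-up names.

```sas
/* Give every real observation weight 1 and append one */
/* zero-weight observation for the level that is missing. */
data detail_w;
  set detail end=last;
  count = 1;
  output;
  if last then do;
    view1 = 'Failure';  /* level absent from DETAIL (assumption) */
    count = 0;          /* VIEW2 keeps the value of the last observation */
    output;
  end;
run;

proc freq data=detail_w;
  weight count / zeros;
  tables view1*view2 / agree;
  exact mcnem;
run;
```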

Please note that your PROC MEANS approach (using the COMPLETETYPES option) would not have worked in the situation of the other thread you referred to in your original post: There the problem was that the value VIEW1='Failure' did not occur in the initial dataset. COMPLETETYPES would not add it. Instead, the CLASSDATA= option could have been used (or a preloaded format), as shown below.

/* Create test data for demonstration */

data have;
view1='Success';
do _n_=1 to 26;
  view2=view1;
  output;
end;
do _n_=1 to 4;
  view2='Failure';
  output;
end;
run;

/* Create a class dataset in order to introduce the non-existing level of VIEW1 */

data cldat;
view1='Failure';
view2=view1;
run; /* A more general version of CLDAT could contain all four
        combinations of 'Success' and 'Failure'. */

/* Produce input dataset for PROC FREQ */

proc summary data=have classdata=cldat nway;
class view1 view2;
output out=want;
run;

/* Perform the desired McNemar test */

proc freq data = want;
weight _freq_ / zeros;
tables view1*view2 / agree;
exact mcnem;
run;
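
The preloaded-format alternative mentioned above could look roughly like the following sketch. The format name $OUTCOME and the output dataset WANT2 are made up for illustration. PRELOADFMT must be combined with COMPLETETYPES (or EXCLUSIVE, or ORDER=DATA), so here all four combinations of 'Success' and 'Failure' should end up in WANT2, the missing ones with _FREQ_=0.

```sas
/* Define the two levels that the 2x2 table should contain */
proc format;
  value $outcome 'Success' = 'Success'
                 'Failure' = 'Failure';
run;

/* Preload the format so that unobserved levels are created too */
proc summary data=have completetypes nway;
  class view1 view2 / preloadfmt;
  format view1 view2 $outcome.;
  output out=want2;
run;

proc freq data=want2;
  weight _freq_ / zeros;
  tables view1*view2 / agree;
  exact mcnem;
run;
```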
