BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
docfak
Fluorite | Level 6

Hello,

 

I am trying to compute a Cochran's Q using proc freq with 24 indexed variables, but SAS freezes as I think the line of code is too long. 

Is there a trick to allow SAS to compute the line of code?

 

My code is:

 

proc freq data=mydata;
tables var1*var2*var3*var4*var5*var6*var7*var8*var9*var10*var11*var12*var13*var14*var15*var16*var17*var18*var19*var20*var21*var22*var23*var24 /agree;
run;

Many thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hello @docfak,

 

This is interesting! Even with

ods select CochransQ;

which is more restrictive than 

ods exclude crosstabfreqs;

PROC FREQ would probably take a while to run your code with this 1000-obs. sample dataset HAVE (in place of your MYDATA):

%let m=24;

data have;
call streaminit(27182818);
array var[&m];
do i=1 to 1000;
  do _n_=1 to dim(var);
    var[_n_]=rand('bern',0.5);
  end;
  output;
end;
run;

I haven't actually run your PROC FREQ step with m=24, but observed that for m=17, 18, 19, 20 the run times approximately doubled with each additional variable (like 5, 9, 19, 39 seconds).

 

Then I looked up the formula for Cochran's Q (and the associated p-value) in the most recent SAS 9.4 documentation (https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_freq_details78.htm). After some more research and manual calculations I realized that the formula for Cochran's Q on that page (and in my local SAS 9.4M5 help files) is wrong! I will report this to SAS. Comparing older versions of the documentation, starting at SAS version 8 (http://v8doc.sas.com/sashtml/proc/zeq-comp.htm), it appears that the error was introduced in the documentation of SAS 9.2 (https://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0...) and then copied to versions 9.3 and 9.4. In version 8 and also as "recently" as v9.1.3 (https://support.sas.com/documentation/onlinedoc/91pdf/index_913.html, see Procedures Guide base_proc_8977.pdf linked there) the formula was correct:

Correct formula in SAS 9.1.3 Procedures Guide, p. 1621Correct formula in SAS 9.1.3 Procedures Guide, p. 1621

Incorrect formula in SAS 9.4/Viya 3.5 documentationIncorrect formula in SAS 9.4/Viya 3.5 documentation

Using the correct formula (more precisely: a formula equivalent to the correct one) I was able to reproduce the results from PROC FREQ for m=17, 18, 19, 20, but with run times <1 second, and the case m=24 went about as fast as the others. Here's the code I used:

ods select none;
ods output onewayfreqs=freqs;
proc freq data=have;
tables var:;
run;
ods select all;

data freqs1(keep=n);
set freqs(rename=(frequency=n));
if sum(of var:);
run;

proc summary data=freqs1;
var n;
output out=stats(drop=_:) css=c sum=t;
run;

data want(keep=s c t Q DF p);
do until(last);
  set have end=last;
  s+(sum(of var:)**2);
end;
set stats;
Q=(&m-1)*c/(t-s/&m);
DF=&m-1;
p=1-cdf('chisq',Q,DF);
format p pvalue6.4;
label Q="Cochran's Q"
      p="p-value of Cochran's Q test";
run;

Log (incl. creation of sample data):

Spoiler
1    %let m=24;
2
3    data have;
4    call streaminit(27182818);
5    array var[&m];
6    do i=1 to 1000;
7      do _n_=1 to dim(var);
8        var[_n_]=rand('bern',0.5);
9      end;
10     output;
11   end;
12   run;

NOTE: The data set WORK.HAVE has 1000 observations and 25 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


13
14   ods select none;
15   ods output onewayfreqs=freqs;
16   proc freq data=have;
17   tables var:;
18   run;

NOTE: The data set WORK.FREQS has 48 observations and 53 variables.
NOTE: There were 1000 observations read from the data set WORK.HAVE.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.09 seconds
      cpu time            0.09 seconds


19   ods select all;
20
21   data freqs1(keep=n);
22   set freqs(rename=(frequency=n));
23   if sum(of var:);
24   run;

NOTE: There were 48 observations read from the data set WORK.FREQS.
NOTE: The data set WORK.FREQS1 has 24 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


25
26   proc summary data=freqs1;
27   var n;
28   output out=stats(drop=_:) css=c sum=t;
29   run;

NOTE: Multiple concurrent threads will be used to summarize data.
NOTE: There were 24 observations read from the data set WORK.FREQS1.
NOTE: The data set WORK.STATS has 1 observations and 2 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.03 seconds
      cpu time            0.07 seconds


30
31   data want(keep=s c t Q DF p);
32   do until(last);
33     set have end=last;
34     s+(sum(of var:)**2);
35   end;
36   set stats;
37   Q=(&m-1)*c/(t-s/&m);
38   DF=&m-1;
39   p=1-cdf('chisq',Q,DF);
40   format p pvalue6.4;
41   label Q="Cochran's Q"
42         p="p-value of Cochran's Q test";
43   run;

NOTE: There were 1000 observations read from the data set WORK.HAVE.
NOTE: There were 1 observations read from the data set WORK.STATS.
NOTE: The data set WORK.WANT has 1 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Result (dataset WANT):

   s         c              t       Q       DF         p

152459    4314.96       12097    17.2762    23    0.7955

 

So, if you only need Cochran's Q and the associated p-value, it seems that the manual calculation beats PROC FREQ.

View solution in original post

10 REPLIES 10
ballardw
Super User

How many levels are involved with each of those variables? IF you have 2 levels per variable you are looking at 2**22 (4,194,304) 2-by-2 tables.

 

"Freeze" in this case, I would guess means you don't see anything because SAS is making all those html tables to display all of the frequencies involved.

 

I might suggest placing this immediately before your Proc Freq code to suppress the humoungous

ods exclude crosstabfreqs;

 

Are you sure that your data meets the requirements for Agreement calculations as in " AGREE statistics are computed only for tables where the number of rows equals the number of columns." ?

 

 

docfak
Fluorite | Level 6

I have two levels per variable.

When I use 12 variables (var1-var12), I can compute Cochran's Q.

Using your line of code, I get this error

 

ERROR:  An exception has been encountered.
Veuillez contacter le support technique et transmettez-lui les informations suivantes :

Le nom de la tâche SAS est [FREQ]
ERROR:  Violation de l'accès en lecture FREQ
Exception en (0753146E)
Task Traceback
Address   Frame     (DBGHELP API Version 4.0 rev 5)
000000000753146E  00000000007F9F70  sasods:tkvercn1+0x10042E
000000000753CD31  00000000007FA2E0  sasods:tkvercn1+0x10BCF1
000000000753C30A  00000000007FA370  sasods:tkvercn1+0x10B2CA
000000000F604DB5  00000000007FA378  sasfreq:tkvercn1+0x73D75
000000000F5B177D  00000000007FCC60  sasfreq:tkvercn1+0x2073D
000000000F59450B  00000000007FFB20  sasfreq:tkvercn1+0x34CB
00000000035BA366  00000000007FFB28  sashost:Main+0x11EA6
00000000035C0574  00000000007FFF20  sashost:Main+0x180B4
00007FFCCAD07034  00000000007FFF28  KERNEL32:BaseThreadInitThunk+0x14
00007FFCCB422651  00000000007FFF58  ntdll:RtlUserThreadStart+0x21

 

FreelanceReinh
Jade | Level 19

Hello @docfak,

 

This is interesting! Even with

ods select CochransQ;

which is more restrictive than 

ods exclude crosstabfreqs;

PROC FREQ would probably take a while to run your code with this 1000-obs. sample dataset HAVE (in place of your MYDATA):

%let m=24;

data have;
call streaminit(27182818);
array var[&m];
do i=1 to 1000;
  do _n_=1 to dim(var);
    var[_n_]=rand('bern',0.5);
  end;
  output;
end;
run;

I haven't actually run your PROC FREQ step with m=24, but observed that for m=17, 18, 19, 20 the run times approximately doubled with each additional variable (like 5, 9, 19, 39 seconds).

 

Then I looked up the formula for Cochran's Q (and the associated p-value) in the most recent SAS 9.4 documentation (https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/statug/statug_freq_details78.htm). After some more research and manual calculations I realized that the formula for Cochran's Q on that page (and in my local SAS 9.4M5 help files) is wrong! I will report this to SAS. Comparing older versions of the documentation, starting at SAS version 8 (http://v8doc.sas.com/sashtml/proc/zeq-comp.htm), it appears that the error was introduced in the documentation of SAS 9.2 (https://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#procstat_freq_a0...) and then copied to versions 9.3 and 9.4. In version 8 and also as "recently" as v9.1.3 (https://support.sas.com/documentation/onlinedoc/91pdf/index_913.html, see Procedures Guide base_proc_8977.pdf linked there) the formula was correct:

Correct formula in SAS 9.1.3 Procedures Guide, p. 1621Correct formula in SAS 9.1.3 Procedures Guide, p. 1621

Incorrect formula in SAS 9.4/Viya 3.5 documentationIncorrect formula in SAS 9.4/Viya 3.5 documentation

Using the correct formula (more precisely: a formula equivalent to the correct one) I was able to reproduce the results from PROC FREQ for m=17, 18, 19, 20, but with run times <1 second, and the case m=24 went about as fast as the others. Here's the code I used:

ods select none;
ods output onewayfreqs=freqs;
proc freq data=have;
tables var:;
run;
ods select all;

data freqs1(keep=n);
set freqs(rename=(frequency=n));
if sum(of var:);
run;

proc summary data=freqs1;
var n;
output out=stats(drop=_:) css=c sum=t;
run;

data want(keep=s c t Q DF p);
do until(last);
  set have end=last;
  s+(sum(of var:)**2);
end;
set stats;
Q=(&m-1)*c/(t-s/&m);
DF=&m-1;
p=1-cdf('chisq',Q,DF);
format p pvalue6.4;
label Q="Cochran's Q"
      p="p-value of Cochran's Q test";
run;

Log (incl. creation of sample data):

Spoiler
1    %let m=24;
2
3    data have;
4    call streaminit(27182818);
5    array var[&m];
6    do i=1 to 1000;
7      do _n_=1 to dim(var);
8        var[_n_]=rand('bern',0.5);
9      end;
10     output;
11   end;
12   run;

NOTE: The data set WORK.HAVE has 1000 observations and 25 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


13
14   ods select none;
15   ods output onewayfreqs=freqs;
16   proc freq data=have;
17   tables var:;
18   run;

NOTE: The data set WORK.FREQS has 48 observations and 53 variables.
NOTE: There were 1000 observations read from the data set WORK.HAVE.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.09 seconds
      cpu time            0.09 seconds


19   ods select all;
20
21   data freqs1(keep=n);
22   set freqs(rename=(frequency=n));
23   if sum(of var:);
24   run;

NOTE: There were 48 observations read from the data set WORK.FREQS.
NOTE: The data set WORK.FREQS1 has 24 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


25
26   proc summary data=freqs1;
27   var n;
28   output out=stats(drop=_:) css=c sum=t;
29   run;

NOTE: Multiple concurrent threads will be used to summarize data.
NOTE: There were 24 observations read from the data set WORK.FREQS1.
NOTE: The data set WORK.STATS has 1 observations and 2 variables.
NOTE: PROCEDURE SUMMARY used (Total process time):
      real time           0.03 seconds
      cpu time            0.07 seconds


30
31   data want(keep=s c t Q DF p);
32   do until(last);
33     set have end=last;
34     s+(sum(of var:)**2);
35   end;
36   set stats;
37   Q=(&m-1)*c/(t-s/&m);
38   DF=&m-1;
39   p=1-cdf('chisq',Q,DF);
40   format p pvalue6.4;
41   label Q="Cochran's Q"
42         p="p-value of Cochran's Q test";
43   run;

NOTE: There were 1000 observations read from the data set WORK.HAVE.
NOTE: There were 1 observations read from the data set WORK.STATS.
NOTE: The data set WORK.WANT has 1 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

Result (dataset WANT):

   s         c              t       Q       DF         p

152459    4314.96       12097    17.2762    23    0.7955

 

So, if you only need Cochran's Q and the associated p-value, it seems that the manual calculation beats PROC FREQ.

docfak
Fluorite | Level 6

This is very interesting, thank you very much!

Tom
Super User Tom
Super User

Why do you think those two different ways of writing the formula are different?

image.png

Is of the form A*((B*C)/D)

image.png

Has the same terms re-arranged into: B*A*C/D

FreelanceReinh
Jade | Level 19

Thanks, @Tom, for chiming in. I see what you mean, but my understanding is that the term is not part of the sum (j=1, ..., m) because it doesn't contain j and it's not enclosed in parentheses either. (Also, I think that my "manual" calculation is consistent with this interpretation and its results matched those from PROC FREQ in the few cases I checked.) So, multiplying m (from the second form of the equation) into the numerator of the fraction would result in mSTj² − m, but the correct term is mSTj² − T² (as in the first equation).

 

I have sent a notification to SAS via the "Feedback" link on the documentation page and I will report their response (if any) in this thread.

 

Addendum:

Another strong argument supporting my point is:

We have T=STj and, by Cauchy-Schwarz inequality, T² ≤ mSTj²  (with equality holding in the case T1=...=Tm), so the numerator of the fraction in the old (correct) version of the formula (as I read it) is non-negative. The denominator is non-negative as well because T=SSk and 0 ≤ Sk ≤ m for all k=1, ..., N, which implies mT ≥SSk² (with equality holding in the case S1, ..., SN Î {0, m}). Hence, QC cannot be negative, which is plausible as it is a chi-square statistic. However, the numerator of the new (incorrect) formula is non-positive (since STj² ≤ (STj)² = T²), even regardless of being part of the sum or not. Together with the non-negative denominator QC could never be positive, which would be absurd for a chi-square statistic.

 

 

FreelanceReinh
Jade | Level 19

Update 2022-10-13

 

I have not received any response from SAS to my notifications regarding the Cochran's Q formula in question (emails to yourturn@sas.com sent on October 10, 2021 and June 11, 2022), and the documentation has not been revised either. This must be a rare exception because in Re: SAS Certifiication & Documentation Questions (March 2022) it says: "Our team responds to every request within 2 business days."

MarkBright
SAS Moderator

Hi @free. You're right...unanswered feedback from our team is a rare exception. 🙂  I'm on the team that monitors the yourturn@sas.com inbox and we're committed to addressing each response we receive. I don't see your feedback emails on this topic from the dates you mention (10/10/21 and 6/11/22). I wonder if there was an email delivery snafu.

Regardless, I'd like to be sure we address your feedback. I will inform our team about this Communities thread to investigate the doc. In the meantime, please feel free to send another email to yourturn@sas.com about this same issue so we can be sure your emails are delivering accurately.      

FreelanceReinh
Jade | Level 19

Many thanks, @MarkBright, for your prompt response. Indeed, lost emails were my only explanation for this. I'm going to re-send the email from last June.

FreelanceReinh
Jade | Level 19

Update 2022-10-17

 

The typo in Cochran's Q formula in the current PROC FREQ documentation was confirmed by SAS Technical Support earlier today (and indeed within two business days after I had re-sent the lost email). The correction (replacement of by T²/m) will be made "in the next release of the documentation."

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1189 views
  • 2 likes
  • 5 in conversation