jbegovac
Fluorite | Level 6

I have noticed that the Cochran-Armitage Test for Trend gives different results when an interval variable is grouped by proc format compared to recoding. For example:

proc format;
   value EngineSizeFmt
      low -< 3 = "0-3"
        3 -< 5 = "3-4"
        5 -< 9 = "5-8"
      ;
run;

proc freq data=sashelp.cars;
   where origin in("Europe" "USA");
   tables EngineSize*origin / trend chisq nopercent norow;
   format EngineSize EngineSizeFmt.;
run;

data work.cars;
   set sashelp.cars;
   if EngineSize lt 3 then esize=1;
   if EngineSize ge 3 and EngineSize lt 5 then esize=2;
   if EngineSize ge 5 then esize=3;
run;

proc freq data=work.cars;
   where origin in("Europe" "USA");
   tables ESize*origin / trend chisq nopercent norow;
run;

 

The frequencies and the chi-square test are the same, but the trend test is different. Any thoughts? Which should be used?

1 ACCEPTED SOLUTION

Accepted Solutions
JackieJ_SAS
SAS Employee

Building off the comments of @ballardw 

Here is the output after the table that I get from your two PROC FREQ calls:

JackieJ_SAS_2-1740849403752.png

You'll notice that the output for the Mantel-Haenszel Chi-Square is different too. Both the MH and CA test for trend use scores in their calculation. 

According to the documentation for the SCORES= option on the TABLES statement, you can specify the SCOROUT option on the TABLES statement to display the scores used:
SAS Help Center: TABLES Statement

I used the SCOROUT option and got these scores for your two PROC FREQ calls. The scores differ, which is why the MH and CA tests give different results across the two calls:

JackieJ_SAS_3-1740850504025.png
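To reproduce the score listing yourself, a minimal sketch is to add SCOROUT to both original calls. It assumes the EngineSizeFmt format and the work.cars data set from the question already exist in the session. As I read the PROC FREQ documentation on table scores, a formatted numeric variable is scored by the smallest internal value in each formatted level rather than by 1, 2, 3, but treat that as something to confirm in the printed scores rather than as settled.

```sas
/* Sketch: print the row and column scores PROC FREQ uses for each table. */
/* Assumes EngineSizeFmt and work.cars were created as in the question.   */
proc freq data=sashelp.cars;
   where origin in ("Europe" "USA");
   tables EngineSize*origin / trend chisq scorout nopercent norow;
   format EngineSize EngineSizeFmt.;
run;

proc freq data=work.cars;
   where origin in ("Europe" "USA");
   tables esize*origin / trend chisq scorout nopercent norow;
run;
```

If the two score listings differ, the MH and CA results are expected to differ as well, since both statistics are built from those scores.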

 


7 REPLIES 7
Ksharp
Super User

That is because the order of "Europe" and "USA" is reversed.

For the first PROC FREQ:

Ksharp_0-1740813556234.png

For the second PROC FREQ:

Ksharp_1-1740813593207.png

Therefore, you could get the reverse result.

jbegovac
Fluorite | Level 6

Thanks, Ksharp, for answering. But both frequency tables look the same: the first column is Europe and the second is USA. I am probably missing something.

ballardw
Super User

I suspect it may have something to do with this detail from the Cochran-Armitage details

 For character variables, the table scores for the row variable are the row numbers 
(for example, 1 for the first row, 2 for the second row, and so on). For numeric variables,
the table score for each row is the numeric value of the row level.

The "formatted" value is character.
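One way to check which numeric values sit behind each formatted level is to list the minimum and maximum EngineSize per band. This is only a diagnostic sketch: the data set name work.cars_chk and the variable es_band are made up for illustration, and it assumes the EngineSizeFmt format from the question is already defined.

```sas
/* Sketch: show the range of raw EngineSize values inside each format band. */
/* work.cars_chk and es_band are hypothetical names used only here.         */
data work.cars_chk;
   set sashelp.cars;
   where origin in ("Europe" "USA");
   length es_band $8;
   es_band = put(EngineSize, EngineSizeFmt.);  /* character label of the band */
run;

proc means data=work.cars_chk n min max;
   class es_band;
   var EngineSize;
run;
```

Comparing these per-band values with the scores that PROC FREQ reports (for example through the SCOROUT option) should show whether the formatted table is scored by row number or by an underlying numeric value.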

 

Some of the cells are a bit small (the largest engine-size group has only 29 of the 270 observations), so using rank scores instead of the default table scores may be appropriate, and it does yield the same result for both tables. That is why I think there may be some oddity in the score calculation for the formatted values.

 

proc freq data=work.cars;
where origin in("Europe" "USA");
tables ESize*origin EngineSize*origin / trend scores=rank;
   format EngineSize EngineSizeFmt.;

 run;

which without the frequency tables generates:

Statistics for Table of esize by Origin

Cochran-Armitage Trend Test
(Rank Scores)
Statistic (Z) -2.0200
One-sided Pr < Z 0.0217
Two-sided Pr > |Z| 0.0434

and

Statistics for Table of EngineSize by Origin

Cochran-Armitage Trend Test
(Rank Scores)
Statistic (Z) -2.0200
One-sided Pr < Z 0.0217
Two-sided Pr > |Z| 0.0434

 

PaigeMiller
Diamond | Level 26

Why group a continuous variable at all? Especially if you are concerned about trend ...

--
Paige Miller
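For comparison, a minimal sketch of the trend test on the ungrouped values follows. With no format applied, the default table scores for the row variable are the engine sizes themselves. NOPRINT on the TABLES statement is used here only to hide the long crosstabulation; the requested statistics are still displayed.

```sas
/* Sketch: Cochran-Armitage trend test on the raw, ungrouped EngineSize. */
/* NOPRINT suppresses the crosstab but keeps the requested tests.        */
proc freq data=sashelp.cars;
   where origin in ("Europe" "USA");
   tables EngineSize*origin / trend noprint;
run;
```

Whether the ungrouped or the grouped analysis is more appropriate depends on the question being asked, so this is offered as a point of comparison rather than a recommendation.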
jbegovac
Fluorite | Level 6

Thanks to @JackieJ_SAS and @ballardw for explaining the difference. Good to know that the CA and MH tests can differ depending on how a continuous variable is grouped.
