Solved: PROC FORMAT for grouping a continuous variable and the trend test

jbegovac · Posted 03-01-2025 01:57 AM

I have noticed that the Cochran-Armitage test for trend gives different results when an interval variable is grouped with PROC FORMAT than when it is recoded in a DATA step. For example:

proc format;
   value EngineSizeFmt
      low -< 3 = "0-3"
      3 -< 5 = "3-4"
      5 -< 9 = "5-8";
run;

proc freq data=sashelp.cars;
   where origin in ("Europe" "USA");
   tables EngineSize*origin / trend chisq nopercent norow;
   format EngineSize EngineSizeFmt.;
run;

data work.cars;
   set sashelp.cars;
   if EngineSize lt 3 then esize = 1;
   else if EngineSize lt 5 then esize = 2;
   else esize = 3;
run;

proc freq data=work.cars;
   where origin in ("Europe" "USA");
   tables ESize*origin / trend chisq nopercent norow;
run;

The frequencies and the chi-square test are the same, but the trend test is different. Any thoughts? Which should be used?


Ksharp · Posted 03-01-2025 02:20 AM

That is because the order of "Europe" and "USA" is reversed.

For the first PROC FREQ:

For the second PROC FREQ:

Therefore, you could get a reversed result.

jbegovac · Posted 03-01-2025 03:00 AM

Thanks, Ksharp, for answering. But both frequency tables look the same: the first column is Europe and the second is USA. I am probably missing something.

Ksharp · Posted 03-01-2025 08:08 PM

Sorry. My mistake.

ballardw · Posted 03-01-2025 04:17 AM

I suspect it may have something to do with this detail from the Cochran-Armitage test documentation:

"For character variables, the table scores for the row variable are the row numbers (for example, 1 for the first row, 2 for the second row, and so on). For numeric variables, the table score for each row is the numeric value of the row level."

The "formatted" value is character.

Some of the cells are a bit small (the largest engine-size group has only 29 of the 270 observations), so perhaps rank scores are more appropriate than the default table scores, and they do yield the same result for both variables. That is why I think there may be some oddity in the score calculation for the formatted values.

proc freq data=work.cars;
   where origin in ("Europe" "USA");
   tables ESize*origin EngineSize*origin / trend scores=rank;
   format EngineSize EngineSizeFmt.;
run;

which, without the frequency tables, generates:

Statistics for Table of esize by Origin

Cochran-Armitage Trend Test (Rank Scores)
Statistic (Z)           -2.0200
One-sided Pr < Z         0.0217
Two-sided Pr > |Z|       0.0434

and

Statistics for Table of EngineSize by Origin

Cochran-Armitage Trend Test (Rank Scores)
Statistic (Z)           -2.0200
One-sided Pr < Z         0.0217
Two-sided Pr > |Z|       0.0434
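One way to test the scoring explanation is sketched here under a stated assumption: that, as the "Scores" section of the PROC FREQ documentation describes (worth verifying), when several numeric values fall into the same formatted level, the table score for that level is the smallest of those unformatted values. If that holds, the formatted call scores the three groups by their smallest engine sizes rather than by 1, 2, 3. The data set `work.cars_min` and the variable `esize_min` are hypothetical names introduced only for this check.

```sas
/* Same grouping format as in the question */
proc format;
   value EngineSizeFmt
      low -< 3 = "0-3"
      3 -< 5 = "3-4"
      5 -< 9 = "5-8";
run;

/* Hypothetical check: give every observation the smallest EngineSize found in its
   formatted group (after the same WHERE filter). ASSUMPTION: this is the table score
   PROC FREQ uses for a formatted numeric level. SAS remerges the MIN back onto each row. */
proc sql;
   create table work.cars_min as
   select *, min(EngineSize) as esize_min
   from sashelp.cars
   where origin in ("Europe" "USA")
   group by put(EngineSize, EngineSizeFmt.);
quit;

/* If the assumption holds, the trend test and the Mantel-Haenszel chi-square here
   should match the call that uses FORMAT EngineSize EngineSizeFmt. */
proc freq data=work.cars_min;
   tables esize_min*origin / trend chisq scorout nopercent norow;
run;
```

If the results match the formatted call but not the 1, 2, 3 recoding, the difference comes from the row scores themselves; rank scores agree in both cases because the grouping, and therefore the ranking of the levels, is identical.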

PaigeMiller · Posted 03-01-2025 06:05 AM

Why group a continuous variable at all, especially if you are concerned about trend?

--
Paige Miller
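A minimal sketch of that suggestion (not code from the thread): without a format, each distinct EngineSize value is its own row level and, following the table-score rule quoted earlier, its score is the engine size itself. The PROC LOGISTIC step is a model-based alternative that keeps EngineSize continuous; choosing "USA" as the event level is an arbitrary assumption.

```sas
/* Trend test on ungrouped engine sizes: every distinct EngineSize value is a row level */
proc freq data=sashelp.cars;
   where origin in ("Europe" "USA");
   /* NOPRINT on TABLES hides the large crosstabulation but still prints the statistics */
   tables EngineSize*origin / trend noprint;
run;

/* Model-based alternative: probability of USA origin as a function of continuous EngineSize */
proc logistic data=sashelp.cars;
   where origin in ("Europe" "USA");
   model origin(event="USA") = EngineSize;
run;
```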

JackieJ_SAS · Posted 03-01-2025 12:37 PM

Building off the comments of @ballardw

Here is the output after the table that I get from your two PROC FREQ calls:

You'll notice that the Mantel-Haenszel chi-square output is different too. Both the MH test and the CA test for trend use scores in their calculations.

According to the documentation for the SCORES= option in the TABLES statement, you can specify the SCOROUT option in the TABLES statement to display the scores that are used:
SAS Help Center: TABLES Statement

I used the SCOROUT option to get the scores for your two PROC FREQ calls. The scores are different, which is why the MH and CA tests give different results across the two calls.
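A minimal sketch of the SCOROUT approach described in this reply, assuming the format and recoded data set from the original question (repeated so the code runs on its own). SCOROUT is added to both TABLES statements so the row scores used by TREND and by the Mantel-Haenszel chi-square can be compared directly.

```sas
proc format;
   value EngineSizeFmt
      low -< 3 = "0-3"
      3 -< 5 = "3-4"
      5 -< 9 = "5-8";
run;

/* Recode into three numeric groups, as in the question */
data work.cars;
   set sashelp.cars;
   if EngineSize lt 3 then esize = 1;
   else if EngineSize lt 5 then esize = 2;
   else esize = 3;
run;

/* Formatted grouping: SCOROUT shows which numeric score each formatted level receives */
proc freq data=sashelp.cars;
   where origin in ("Europe" "USA");
   tables EngineSize*origin / trend chisq scorout nopercent norow;
   format EngineSize EngineSizeFmt.;
run;

/* Recoded grouping: the row scores are the values 1, 2, and 3 */
proc freq data=work.cars;
   where origin in ("Europe" "USA");
   tables esize*origin / trend chisq scorout nopercent norow;
run;
```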

jbegovac · Posted 03-01-2025 02:48 PM

Thanks to @JackieJ_SAS and @ballardw for explaining the difference. Good to know that the CA and MH tests can differ depending on how a continuous variable is grouped.
