Fluorite | Level 6

## paired categorical data

Hi all,
I am working on a dataset, where I have to test if there is a significant difference in categorical variables from one timepoint to another in the same person.

My variables contain 2 or 3 categories (0, 1, 2).

Which test will be suitable to use - especially for the 3 category variable?

Thanks

1 ACCEPTED SOLUTION

Super User

## Re: paired categorical data

@lone0708 wrote:

I am looking to test for a significant difference between the categorical variables themselves. My dataset looks like this:

| Patient | Time1 | Time2 | position_time1 | position_time2 | light_time1 | light_time2 |
|---|---|---|---|---|---|---|
| A | 13:14 | 14:00 | 0 | 1 | 1 | 0 |
| B | 12:00 | 12:15 | 2 | 2 | 1 | 1 |
| C | 12:13 | 14:45 | 3 | 1 | 0 | 1 |

I want to test whether position and light are generally the same at the two timepoints, or whether, for example, position 3 is overrepresented at time 1. I hope that makes sense.

PROC FREQ with the EXPECTED option on the TABLES statement sounds like what you might be looking for.

Here's a brief example creating a data set with two variables to "compare". The RAND('integer', n) function returns a random integer from 1 to n.

You can see the counts at the intersections of the values and compare them with an "expected" value based on the distribution.

```sas
data example;
    /* should produce relatively similar distributions */
    do i=1 to 50;
        x= rand('integer',3);
        y= rand('integer',3);
        output;
    end;
    /* now add some to bias a variable, y won't have any 3 */
    do i=51 to 100;
        x= rand('integer',3);
        y= rand('integer',2);
        output;
    end;
run;

proc freq data=example;
    tables x*y / expected chisq;
run;
```

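Throw in a chi-square test (the CHISQ option) and you have a statistic for the table. Strictly speaking, on a two-way table of x by y it tests whether the two variables are associated, so read it alongside the observed and expected counts rather than as a direct test of equal distributions.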

7 REPLIES 7
Super User

## Re: paired categorical data

With categorical values, any difference is significant. So I would simply count the distinct values:

```sas
data have;
    input person $ cat_var;
    datalines;
A 0
A 1
B 0
B 0
;

/* list every person with more than one distinct category value */
proc sql;
    create table want as
    select person
    from have
    group by person
    having count(distinct cat_var) > 1
    ;
quit;
```
Diamond | Level 26

## Re: paired categorical data

@lone0708 wrote:

Hi all,
I am working on a dataset, where I have to test if there is a significant difference in categorical variables ...

Do you mean a significant difference between some statistic for the categorical variables (if so, what statistic?) or a significant difference between the categorical variables themselves (if so, please explain in a lot more detail)?

--
Paige Miller
Fluorite | Level 6

## Re: paired categorical data

I am looking to test for a significant difference between the categorical variables themselves. My dataset looks like this:

| Patient | Time1 | Time2 | position_time1 | position_time2 | light_time1 | light_time2 |
|---|---|---|---|---|---|---|
| A | 13:14 | 14:00 | 0 | 1 | 1 | 0 |
| B | 12:00 | 12:15 | 2 | 2 | 1 | 1 |
| C | 12:13 | 14:45 | 3 | 1 | 0 | 1 |

I want to test whether position and light are generally the same at the two timepoints, or whether, for example, position 3 is overrepresented at time 1. I hope that makes sense.

Diamond | Level 26

## Re: paired categorical data

I am looking to test for a significant difference between the categorical variables themselves.

I am very confused. As far as I know, this can't be done. It is not a statistical concept to test categorical variables themselves. (Or in the trivial sense, they are always different). The only statistical concept is to test statistics for each categorical variable to see if the statistics are different in the different categories, and you seem to be saying that's not what you want.

In the data set you show, describe the steps (in words) to show how you would answer the question.

--
Paige Miller
Super User

## Re: paired categorical data

Wouldn't a simple PROC FREQ show you an imbalance in these values?
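
A minimal sketch of that idea, assuming the three rows posted earlier in the thread are read into a data set called `have` (the data set name, the `$` character type for patient, and the `time5.` informat are my assumptions, not something stated in the original post):

```sas
/* Read the three posted rows; names follow the posted column headers */
data have;
    input patient $ time1 :time5. time2 :time5.
          position_time1 position_time2 light_time1 light_time2;
    format time1 time2 time5.;
    datalines;
A 13:14 14:00 0 1 1 0
B 12:00 12:15 2 2 1 1
C 12:13 14:45 3 1 0 1
;
run;

/* One-way frequency tables: compare the category counts at time 1 and time 2 */
proc freq data=have;
    tables position_time1 position_time2 light_time1 light_time2;
run;
```

With only three rows the output is purely illustrative. On the full data, the one-way tables show the category distribution at each timepoint side by side, but they do not by themselves give a test that accounts for the pairing within patient.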

## Re: paired categorical data

Hi @lone0708,

Do you mean a test for marginal homogeneity (i.e., whether the distribution of "position" has changed from time 1 to time 2, and similar for "light")?

If so, the test "equivalent to Bhapkar’s test" presented in Example 35.7 Repeated Measures, 4 Response Levels, 1 Population of the PROC CATMOD documentation might be appropriate, especially in the case of more than two categories. (See also https://support.sas.com/kb/39/243.html.) For dichotomous variables (e.g., if "light" is either 0 or 1) McNemar's test should be applicable, see the Tests and Measures of Agreement available in PROC FREQ.

Example:

```sas
/* Create sample data for demonstration */

data have;
    call streaminit(27182818);
    do patient=1 to 250;
        time1=round(rand('integer','8:00't,'14:00't),60);
        time2=time1+round(rand('integer','0:15't,'6:00't),60);
        position_time1=rand('table',0.2, 0.3, 0.4)-1;
        position_time2=rand('table',0.25,0.35,0.25)-1;
        light_time1=rand('bern',0.6);
        light_time2=rand('bern',0.5);
        output;
    end;
    format time: time5.;
run;

/* Perform tests for marginal homogeneity */

proc catmod data=have namelen=29;
    response marginals;
    model position_time1*position_time2=_response_ / freq design;
    repeated time 2;
quit;

proc freq data=have;
    tables light_time1*light_time2 / agree;
run;
```

Edit: Note that the difference between time 1 and time 2, be it 15 minutes or 6 hours, is disregarded in these tests.
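
As a possible complement (my addition, not part of the example above): when the AGREE option is applied to a square table larger than 2×2, PROC FREQ reports Bowker's test of symmetry instead of McNemar's test. Symmetry is a stronger hypothesis than marginal homogeneity, so this is a related check rather than a replacement for the CATMOD analysis. The sketch below uses a hypothetical data set `demo` with made-up probabilities and assumes position is coded 0, 1, 2 at both times:

```sas
/* Hypothetical demo data: 200 patients, position coded 0/1/2 at both times */
data demo;
    call streaminit(12345);
    do patient = 1 to 200;
        /* rand('table', ...) returns 1, 2 or 3; subtract 1 to get 0/1/2 */
        position_time1 = rand('table', 0.3, 0.3, 0.4) - 1;
        position_time2 = rand('table', 0.4, 0.3, 0.3) - 1;
        output;
    end;
run;

/* AGREE on a square 3x3 table requests Bowker's symmetry test (plus kappa) */
proc freq data=demo;
    tables position_time1*position_time2 / agree;
run;
```

The table has to be square for this to work, so every category must appear at both timepoints. If a level is missing at one time, the table is no longer square and the agreement statistics are not produced as-is.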
