Hi all,
I am working on a dataset where I have to test whether there is a significant difference in categorical variables from one time point to another in the same person.
My variables contain 2 or 3 categories (0, 1, 2).
Which test would be suitable, especially for the 3-category variables?
Thanks
With categorical values, any difference is significant. So I would simply count the distinct values:
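/* Long layout: one row per person and time point */
data have;
  input person $ cat_var;
  datalines;
A 0
A 1
B 0
B 0
;

/* Keep only persons whose category changed between observations */
proc sql;
  create table want as
    select person
    from have
    group by person
    having count(distinct cat_var) > 1
  ;
quit;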
@lone0708 wrote:
Hi all,
I am working on a dataset, where I have to test if there is a significant difference in categorical variables ...
Do you mean a significant difference between some statistic for the categorical variables (if so, which statistic?), or a significant difference between the categorical variables themselves (if so, please explain in a lot more detail)?
I am looking to test for a significant difference between the categorical variables themselves. My dataset looks like this:
Patient  Time1  Time2  position_time1  position_time2  light_time1  light_time2
A        13:14  14:00  0               1               1            0
B        12:00  12:15  2               2               1            1
C        12:13  14:45  3               1               0            1
I want to test whether position and light are generally the same at the two time points, or whether, for example, position 3 is overrepresented at time 1. I hope that makes sense.
I am looking to test for a significant difference between the categorical variables themselves.
I am very confused. As far as I know, this can't be done. It is not a statistical concept to test categorical variables themselves. (Or in the trivial sense, they are always different). The only statistical concept is to test statistics for each categorical variable to see if the statistics are different in the different categories, and you seem to be saying that's not what you want.
In the data set you show, describe the steps (in words) to show how you would answer the question.
Wouldn't a simple PROC FREQ show you an imbalance in these values?
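As a minimal sketch of that idea, assuming the three rows posted above are read into a data set called have (the data set name and the TIME5. informat are my assumptions, not the poster's), one-way frequency tables put the time 1 and time 2 distributions side by side:

```sas
/* Read the three example rows from the post (wide layout, one row per patient). */
data have;
  input patient $ time1 :time5. time2 :time5.
        position_time1 position_time2 light_time1 light_time2;
  format time1 time2 time5.;
  datalines;
A 13:14 14:00 0 1 1 0
B 12:00 12:15 2 2 1 1
C 12:13 14:45 3 1 0 1
;

/* One-way tables: compare the distribution of each variable at time 1 and time 2. */
proc freq data=have;
  tables position_time1 position_time2 light_time1 light_time2;
run;
```

With only three patients this is purely descriptive: it can show an imbalance but does not test it.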
@lone0708 wrote:
I am looking to test for a significant difference between the categorical variables themselves. My dataset looks like this:
Patient  Time1  Time2  position_time1  position_time2  light_time1  light_time2
A        13:14  14:00  0               1               1            0
B        12:00  12:15  2               2               1            1
C        12:13  14:45  3               1               0            1
I want to test whether position and light are generally the same at the two time points, or whether, for example, position 3 is overrepresented at time 1. I hope that makes sense.
PROC FREQ with the EXPECTED option on the TABLES statement sounds like what you might be looking for.
Here's a brief example that creates a data set with two variables to "compare". The RAND('integer', n) function returns random integers from 1 to n.
You can see the counts at the intersections of the values and compare them with an "expected" value based on the distribution.
data example;
  /* should produce relatively similar distributions */
  do i=1 to 50;
    x=rand('integer',3);
    y=rand('integer',3);
    output;
  end;
  /* now add some to bias a variable; y won't have any 3 */
  do i=51 to 100;
    x=rand('integer',3);
    y=rand('integer',2);
    output;
  end;
run;

proc freq data=example;
  tables x*y / expected chisq;
run;
Throw in a chi-square test (the CHISQ option) and you have a statistic that compares the observed counts with the expected ones.
Hi @lone0708,
Do you mean a test for marginal homogeneity (i.e., whether the distribution of "position" has changed from time 1 to time 2, and similar for "light")?
If so, the test "equivalent to Bhapkar’s test" presented in Example 35.7 Repeated Measures, 4 Response Levels, 1 Population of the PROC CATMOD documentation might be appropriate, especially in the case of more than two categories. (See also https://support.sas.com/kb/39/243.html.) For dichotomous variables (e.g., if "light" is either 0 or 1) McNemar's test should be applicable, see the Tests and Measures of Agreement available in PROC FREQ.
Example:
/* Create sample data for demonstration */
data have;
call streaminit(27182818);
do patient=1 to 250;
time1=round(rand('integer','8:00't,'14:00't),60);
time2=time1+round(rand('integer','0:15't,'6:00't),60);
position_time1=rand('table',0.2, 0.3, 0.4)-1;
position_time2=rand('table',0.25,0.35,0.25)-1;
light_time1=rand('bern',0.6);
light_time2=rand('bern',0.5);
output;
end;
format time: time5.;
run;
/* Perform tests for marginal homogeneity */
proc catmod data=have namelen=29;
response marginals;
model position_time1*position_time2=_response_ / freq design;
repeated time 2;
quit;
proc freq data=have;
tables light_time1*light_time2 / agree;
run;
Edit: Note that the difference between time 1 and time 2, be it 15 minutes or 6 hours, is disregarded in these tests.
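A possible complement, sketched under the assumption that the simulated data set have from the example above is used: for a square table with more than two categories, the AGREE option in PROC FREQ reports Bowker's test of symmetry in place of McNemar's test. Symmetry is a stronger hypothesis than marginal homogeneity, so this is better treated as a cross-check than as a replacement for the CATMOD analysis.

```sas
/* Bowker's test of symmetry for the position table.       */
/* AGREE needs a square table, i.e. the same set of levels */
/* must occur at time 1 and at time 2.                     */
proc freq data=have;
  tables position_time1*position_time2 / agree;
run;
```

Like the tests above, this uses only the paired categories and ignores the elapsed time between the two measurements.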