lone0708
Fluorite | Level 6

Hi all, 
I am working on a dataset where I have to test whether there is a significant difference in categorical variables from one time point to another within the same person.

My variables contain 2 or 3 categories (0, 1, 2).

Which test is suitable, especially for the three-category variable?

 

Thanks

1 ACCEPTED SOLUTION

ballardw
Super User

@lone0708 wrote:

I am searching to test significant difference between the categorical variables themselves. My dataset looks like this:

 

Patient  Time1  Time2  Position_time1  Position_time2  Light_time1  Light_time2
A        13:14  14:00  0               1               1            0
B        12:00  12:15  2               2               1            1
C        12:13  14:45  3               1               0            1

 

I want to test whether position and light are generally the same at the two time points, or whether, for example, position 3 is overrepresented at time 1. I hope that makes sense.


PROC FREQ with the EXPECTED option on the TABLES statement sounds like what you might be looking for.

Here's a brief example that creates a data set with two variables to "compare". The RAND('integer', n) function creates random integers in the interval 1 to n.

You can see the counts at the intersections of the values and compare them with an "expected" value based on the distribution.

data example;
   /*should produce relatively similar distributions*/
   do i=1 to 50;
      x= rand('integer',3);
      y= rand('integer',3);
      output;
   end;
   /* now add some to bias a variable, y won't have any 3*/
   do i=51 to 100;
      x= rand('integer',3);
      y= rand('integer',2);
      output;
   end;
run;

proc freq data=example;
   tables x*y /expected chisq;
run;

Add the CHISQ option, as in the TABLES statement above, and you also get a chi-square test of association between the two variables.
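
As a rough sketch of how the same request might look on data shaped like the example in the question: the data set name `have` and the lower-case variable names are assumptions for illustration, and the three rows are simply the ones quoted above. Three observations are far too few for a meaningful chi-square test (SAS will warn about small expected counts), so this only shows the mechanics.

```sas
/* The three example rows from the question; HAVE is a hypothetical name */
data have;
   input patient $ position_time1 position_time2 light_time1 light_time2;
   datalines;
A 0 1 1 0
B 2 2 1 1
C 3 1 0 1
;
run;

/* Cross-tabulate time 1 against time 2 for each variable.        */
/* EXPECTED prints expected cell counts; CHISQ adds the chi-square */
/* statistics; NOROW/NOCOL/NOPERCENT just trim the cell contents.  */
proc freq data=have;
   tables position_time1*position_time2
          light_time1*light_time2 / expected chisq norow nocol nopercent;
run;
```

With the full data set, each table shows how often every time-1 category is paired with every time-2 category, next to the count expected if the two were unrelated.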


7 REPLIES
Kurt_Bremser
Super User

With categorical values, any difference is significant. So I would simply count the distinct values:

data have;
input person $ cat_var;
datalines;
A 0
A 1
B 0
B 0
;

proc sql;
create table want as
  select
    person
  from have
  group by person
  having count(distinct cat_var) > 1
;
quit;
PaigeMiller
Diamond | Level 26

@lone0708 wrote:

Hi all, 
I am working on a dataset, where I have to test if there is a significant difference in categorical variables ...


Do you mean a significant difference between some statistic for the categorical variables (if so, which statistic?), or a significant difference between the categorical variables themselves (if so, please explain in a lot more detail)?

--
Paige Miller
lone0708
Fluorite | Level 6

I am searching to test significant difference between the categorical variables themselves. My dataset looks like this:

 

Patient  Time1  Time2  Position_time1  Position_time2  Light_time1  Light_time2
A        13:14  14:00  0               1               1            0
B        12:00  12:15  2               2               1            1
C        12:13  14:45  3               1               0            1

 

I want to test whether position and light are generally the same at the two time points, or whether, for example, position 3 is overrepresented at time 1. I hope that makes sense.

PaigeMiller
Diamond | Level 26

I am searching to test significant difference between the categorical variables themselves.


I am very confused. As far as I know, this can't be done: testing categorical variables themselves is not a statistical concept (or, in the trivial sense, they are always different). The only statistical concept is to test statistics for each categorical variable to see whether the statistics differ between the categories, and you seem to be saying that's not what you want.

 

Using the data set you show, describe in words the steps you would take to answer the question.

--
Paige Miller
FreelanceReinh
Jade | Level 19

Hi @lone0708,

 

Do you mean a test for marginal homogeneity (i.e., whether the distribution of "position" has changed from time 1 to time 2, and similar for "light")?

 

If so, the test "equivalent to Bhapkar’s test" presented in Example 35.7 Repeated Measures, 4 Response Levels, 1 Population of the PROC CATMOD documentation might be appropriate, especially in the case of more than two categories. (See also https://support.sas.com/kb/39/243.html.) For dichotomous variables (e.g., if "light" is either 0 or 1) McNemar's test should be applicable, see the Tests and Measures of Agreement available in PROC FREQ.

 

Example:

/* Create sample data for demonstration */

data have;
call streaminit(27182818);
do patient=1 to 250;
  time1=round(rand('integer','8:00't,'14:00't),60);
  time2=time1+round(rand('integer','0:15't,'6:00't),60);
  position_time1=rand('table',0.2, 0.3, 0.4)-1;
  position_time2=rand('table',0.25,0.35,0.25)-1;
  light_time1=rand('bern',0.6);
  light_time2=rand('bern',0.5);
  output;
end;
format time: time5.;
run;

/* Perform tests for marginal homogeneity */

proc catmod data=have namelen=29;
response marginals;
model position_time1*position_time2=_response_ / freq design;
repeated time 2;
quit;

proc freq data=have;
tables light_time1*light_time2 / agree;
run;

Edit: Note that the difference between time 1 and time 2, be it 15 minutes or 6 hours, is disregarded in these tests.
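
A possible variation, sketched under stated assumptions: for a square table larger than 2x2, the AGREE option in PROC FREQ reports Bowker's symmetry test, and for a 2x2 table the EXACT statement with MCNEM adds an exact p-value to McNemar's test. Symmetry is a stronger hypothesis than marginal homogeneity, so this is not a replacement for the CATMOD analysis above. The data set `sim`, the seed, and the probabilities are invented for illustration and are not the asker's data.

```sas
/* Simulated data only: SIM, the seed and the probabilities are arbitrary */
data sim;
   call streaminit(12345);
   do patient = 1 to 250;
      position_time1 = rand('table', 0.3, 0.3, 0.4) - 1;   /* values 0, 1, 2 */
      position_time2 = rand('table', 0.4, 0.3, 0.3) - 1;   /* values 0, 1, 2 */
      light_time1 = rand('bernoulli', 0.6);                /* values 0, 1    */
      light_time2 = rand('bernoulli', 0.5);
      output;
   end;
run;

/* 3x3 table: AGREE reports Bowker's symmetry test and kappa statistics */
proc freq data=sim;
   tables position_time1*position_time2 / agree;
run;

/* 2x2 table: AGREE reports McNemar's test; EXACT MCNEM adds an exact p-value */
proc freq data=sim;
   tables light_time1*light_time2 / agree;
   exact mcnem;
run;
```

The symmetry test needs a square table, so it is only available when both time points show the same set of categories. As with the tests above, the length of time between the two measurements plays no role.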
