Solved
Contributor
Posts: 22

# data analysis with categorical response

Hi All,

I just got some data from my experiment. The dependent variable values in the data set were categorically assigned as 0, 2, 4, 6, 8, 10. Other than the categorical response, the data set is quite simple, with only one treatment and 10 replicates. I'm just wondering if it's necessary to do some data transformation for the categorical response? And what would be a good procedure for the analysis? PROC MIXED? PROC GLIMMIX?

Hope you could give me some advice on analyzing data with categorical response. I'll really appreciate it.

Thanks!

Accepted Solutions
Solution
‎06-20-2017 12:56 PM
Posts: 2,065

## Re: data analysis with categorical response

[ Edited ]
disease severity was assigned according to when the plants showed disease symptoms: 1-6 days=10, 7-10 days=8, 11-16 days=6, 17-22 days=4, 23-28 days=2, and no symptoms at day 28=0

The onset of symptoms is on a continuous scale, although the recording of the levels breaks the values into discrete numbers. Whether you analyze this as continuous or discrete, both are approximations to the actual number of days to onset of symptoms. Which is the better approximation? No way of knowing, but I lean towards continuous. In fact, you could select the midpoint of the range as your continuous level, which would be an even better approximation: 3.5 = 1-6 days, 8.5 = 7-10 days, 13.5 = 11-16 days, etc. (And if you're going to do a similar study in the future, don't group the results into discrete categories; record the actual number of days!)
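The midpoint recoding could be sketched as a DATA step like the one below. This is untested and only an illustration: it assumes the `severity` data set and the `disease` variable posted in this thread, and the names `severity_mid` and `days_mid` are hypothetical. The midpoints for the 17-22 and 23-28 day ranges (19.5 and 25.5) follow the same arithmetic as the ones listed above. A score of 0 has no day range, so it is set to missing here, which is an assumption and drops those plants from any analysis of `days_mid`.

```sas
/* Sketch: recode each severity score to the midpoint of its day range. */
data severity_mid;
   set severity;
   select (disease);
      when (10) days_mid = 3.5;   /* 1-6 days   */
      when (8)  days_mid = 8.5;   /* 7-10 days  */
      when (6)  days_mid = 13.5;  /* 11-16 days */
      when (4)  days_mid = 19.5;  /* 17-22 days */
      when (2)  days_mid = 25.5;  /* 23-28 days */
      otherwise days_mid = .;     /* score 0: no symptoms by day 28 (assumption: left missing) */
   end;
run;
```

`days_mid` could then take the place of `disease` as the response in the MODEL statement.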

proc glimmix data=severity;
class isolate rep;
model disease=isolate;
random rep;
lsmeans isolate/ lines;
run;

I don't see a need for rep in the model. If you leave it out, the replicates are lumped into the random error, which is where they should be. If you are going to explicitly include rep in the model, it must be nested within isolate, as in

```
/* in PROC GLIMMIX a random effect is listed only in RANDOM, not in MODEL */
model disease = isolate;
random rep(isolate);
```

otherwise you will not get the right result.
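For the simpler option of leaving rep out entirely, a minimal untested sketch might look like this. It assumes the `severity` data set and variable names from this thread; the `adjust=tukey` option is an addition for illustration, not part of the original code.

```sas
/* Sketch: one-way model with isolate as the only effect.
   Plant-to-plant (replicate) variation goes into the residual error. */
proc glimmix data=severity;
   class isolate;
   model disease = isolate;
   lsmeans isolate / lines adjust=tukey;  /* adjust=tukey is an assumption, added for multiple comparisons */
run;
```

With 21 isolates there are 210 pairwise comparisons, which is why some multiplicity adjustment is worth considering here.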

--
Paige Miller

All Replies
Super User
Posts: 10,214

## Re: data analysis with categorical response

If all of your X and Y variables are categorical, try PROC CATMOD.
Contributor
Posts: 22

## Re: data analysis with categorical response

Yes, in my case, Y, which has values of 10, 8, 6, 4, 2, 0, represents the disease severity caused by a pathogen. X is the different pathogenic isolates, which is also categorical. What I am trying to do is compare the disease severity caused by different pathogenic isolates on the same host.

I'll take a look at Proc catmod and try it out, thanks again!

Posts: 2,065

## Re: data analysis with categorical response

No need for transformation.

Can these categorical data levels be assumed to actually be on a continuous scale or ordinal scale?

There are many choices for analysis:

PROC GLIMMIX

PROC LOGISTIC

PROC CATMOD

PROC GENMOD

With the limited information you have provided, I don't think we can advise further.

--
Paige Miller
Contributor
Posts: 22

## Re: data analysis with categorical response

Hi PaigeMiller,

My experiment was to compare the disease severity caused by 21 pathogenic isolates on the same host plant. 10 host plants were used as replicates for each single isolate inoculation, so there were 21*10=210 plants used in total. Disease severity was assigned according to when the plants showed disease symptoms: 1-6 days=10, 7-10 days=8, 11-16 days=6, 17-22 days=4, 23-28 days=2, and no symptoms at day 28=0. I've attached the data set here, if you would like to take a look at it. I'd like to use PROC GLIMMIX, but I am not sure how to specify that the response is categorical in the SAS code. Hope you could give me more suggestions. Thanks in advance!

Posts: 2,065

## Re: data analysis with categorical response

People here don't usually open Excel files for fear of viruses or other executables included.

I'm not sure why you need to consider these as categories. It seems that treating the results as numeric ought to work better than treating them as categories.

In that case, the code should be relatively simple:

UNTESTED CODE

```
proc glimmix;
class pathogen;
model y = pathogen;
run;
```
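For comparison, if the scores were treated as ordered categories instead of numbers, one possibility is a cumulative logit model. The sketch below is untested and reuses the placeholder names `y` and `pathogen` from the code above. The `dist=multinomial` and `link=cumlogit` options are what tell PROC GLIMMIX that the response is ordinal.

```sas
/* Sketch: ordinal (cumulative logit) model for the 0-10 severity score.
   Like the code above, this uses the most recently created data set. */
proc glimmix;
   class pathogen;
   model y = pathogen / dist=multinomial link=cumlogit solution;
run;
```

With many pathogen levels and only 10 plants each, a model like this may have trouble converging if some pathogens show almost no variation in the score.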
--
Paige Miller
Contributor
Posts: 22

## Re: data analysis with categorical response

Oh.. sorry about the Excel files.

Since the dependent variable only takes the values 10, 8, 6, 4, 2, 0, I considered it categorical.

I guess this is a really naive question: can I treat the dependent variable as numeric although the values are not continuously assigned?

The following is the data set and the SAS code I tried; I just added rep as a random effect.

``````
data severity;
infile datalines expandtabs; /* the data lines below are tab-separated */
input isolate $ rep disease;
datalines;
R0-G5-6	1	8
R0-G5-6	2	10
R0-G5-6	3	6
R0-G5-6	4	8
R0-G5-6	5	8
R0-G5-6	6	6
R0-G5-6	7	6
R0-G5-6	8	6
R0-G5-6	9	6
R0-G5-6	10	8
R0-G5-6A	1	0
R0-G5-6A	2	0
R0-G5-6A	3	0
R0-G5-6A	4	0
R0-G5-6A	5	0
R0-G5-6A	6	10
R0-G5-6A	7	0
R0-G5-6A	8	0
R0-G5-6A	9	0
R0-G5-6A	10	0
R0-G5-6B	1	8
R0-G5-6B	2	8
R0-G5-6B	3	8
R0-G5-6B	4	8
R0-G5-6B	5	8
R0-G5-6B	6	10
R0-G5-6B	7	6
R0-G5-6B	8	8
R0-G5-6B	9	8
R0-G5-6B	10	8
R0-G5-6C	1	10
R0-G5-6C	2	10
R0-G5-6C	3	8
R0-G5-6C	4	8
R0-G5-6C	5	8
R0-G5-6C	6	10
R0-G5-6C	7	6
R0-G5-6C	8	8
R0-G5-6C	9	8
R0-G5-6C	10	6
R0-G5-6E	1	6
R0-G5-6E	2	6
R0-G5-6E	3	4
R0-G5-6E	4	6
R0-G5-6E	5	6
R0-G5-6E	6	8
R0-G5-6E	7	6
R0-G5-6E	8	6
R0-G5-6E	9	8
R0-G5-6E	10	8
R0-G5-6F	1	4
R0-G5-6F	2	6
R0-G5-6F	3	6
R0-G5-6F	4	6
R0-G5-6F	5	8
R0-G5-6F	6	8
R0-G5-6F	7	8
R0-G5-6F	8	6
R0-G5-6F	9	8
R0-G5-6F	10	6
R0-G5-6G	1	6
R0-G5-6G	2	4
R0-G5-6G	3	6
R0-G5-6G	4	8
R0-G5-6G	5	6
R0-G5-6G	6	8
R0-G5-6G	7	8
R0-G5-6G	8	10
R0-G5-6G	9	2
R0-G5-6G	10	0
R0-G5-6H	1	10
R0-G5-6H	2	10
R0-G5-6H	3	4
R0-G5-6H	4	10
R0-G5-6H	5	6
R0-G5-6H	6	6
R0-G5-6H	7	8
R0-G5-6H	8	6
R0-G5-6H	9	10
R0-G5-6H	10	8
R0-G5-6I	1	6
R0-G5-6I	2	6
R0-G5-6I	3	8
R0-G5-6I	4	6
R0-G5-6I	5	6
R0-G5-6I	6	8
R0-G5-6I	7	8
R0-G5-6I	8	6
R0-G5-6I	9	8
R0-G5-6I	10	6
R0-G5-6J	1	8
R0-G5-6J	2	8
R0-G5-6J	3	8
R0-G5-6J	4	8
R0-G5-6J	5	8
R0-G5-6J	6	6
R0-G5-6J	7	6
R0-G5-6J	8	6
R0-G5-6J	9	8
R0-G5-6J	10	6
R0-G2-6	1	8
R0-G2-6	2	8
R0-G2-6	3	6
R0-G2-6	4	8
R0-G2-6	5	6
R0-G2-6	6	4
R0-G2-6	7	8
R0-G2-6	8	8
R0-G2-6	9	2
R0-G2-6	10	6
R0-G2-6A	1	6
R0-G2-6A	2	8
R0-G2-6A	3	8
R0-G2-6A	4	6
R0-G2-6A	5	8
R0-G2-6A	6	0
R0-G2-6A	7	8
R0-G2-6A	8	0
R0-G2-6A	9	0
R0-G2-6A	10	0
R0-G2-6B	1	6
R0-G2-6B	2	0
R0-G2-6B	3	0
R0-G2-6B	4	8
R0-G2-6B	5	6
R0-G2-6B	6	4
R0-G2-6B	7	4
R0-G2-6B	8	2
R0-G2-6B	9	6
R0-G2-6B	10	8
R0-G2-6C	1	8
R0-G2-6C	2	8
R0-G2-6C	3	8
R0-G2-6C	4	0
R0-G2-6C	5	0
R0-G2-6C	6	8
R0-G2-6C	7	0
R0-G2-6C	8	6
R0-G2-6C	9	6
R0-G2-6C	10	4
R0-G2-6D	1	2
R0-G2-6D	2	6
R0-G2-6D	3	4
R0-G2-6D	4	0
R0-G2-6D	5	2
R0-G2-6D	6	8
R0-G2-6D	7	0
R0-G2-6D	8	6
R0-G2-6D	9	6
R0-G2-6D	10	0
R0-G2-6E	1	0
R0-G2-6E	2	0
R0-G2-6E	3	8
R0-G2-6E	4	6
R0-G2-6E	5	6
R0-G2-6E	6	2
R0-G2-6E	7	8
R0-G2-6E	8	8
R0-G2-6E	9	4
R0-G2-6E	10	6
R0-G2-6F	1	2
R0-G2-6F	2	0
R0-G2-6F	3	8
R0-G2-6F	4	6
R0-G2-6F	5	6
R0-G2-6F	6	8
R0-G2-6F	7	6
R0-G2-6F	8	6
R0-G2-6F	9	8
R0-G2-6F	10	8
R0-G2-6G	1	0
R0-G2-6G	2	6
R0-G2-6G	3	0
R0-G2-6G	4	6
R0-G2-6G	5	2
R0-G2-6G	6	8
R0-G2-6G	7	6
R0-G2-6G	8	6
R0-G2-6G	9	6
R0-G2-6G	10	8
R0-G2-6H	1	6
R0-G2-6H	2	4
R0-G2-6H	3	8
R0-G2-6H	4	6
R0-G2-6H	5	2
R0-G2-6H	6	0
R0-G2-6H	7	8
R0-G2-6H	8	8
R0-G2-6H	9	8
R0-G2-6H	10	6
R0-G2-6I	1	4
R0-G2-6I	2	0
R0-G2-6I	3	0
R0-G2-6I	4	6
R0-G2-6I	5	6
R0-G2-6I	6	2
R0-G2-6I	7	6
R0-G2-6I	8	4
R0-G2-6I	9	8
R0-G2-6I	10	0
R0-G2-6J	1	6
R0-G2-6J	2	4
R0-G2-6J	3	6
R0-G2-6J	4	6
R0-G2-6J	5	6
R0-G2-6J	6	2
R0-G2-6J	7	6
R0-G2-6J	8	0
R0-G2-6J	9	2
R0-G2-6J	10	8
;

proc print data=severity;
run;

proc glimmix data=severity;
class isolate rep;
model disease=isolate;
random rep;
lsmeans isolate / lines;
run;
``````

Contributor
Posts: 22

## Re: data analysis with categorical response

Thank you so much for the suggestions and the correction of the SAS code!

I actually have the actual number of days recorded, so I can definitely try that for the analysis.

Thanks again! I really appreciate your great help!

Contributor
Posts: 22

## Re: data analysis with categorical response

Hi PaigeMiller,

Sorry to keep bugging you. I just came across this question when I was trying to do the analysis using the actual number of days instead of the disease severity values assigned.

The experiment was done over a 28-day period, and some of the plants did not show any symptoms at day 28. Theoretically, the number of days for the plants that did not show any symptoms (healthy plants) would be infinite. How should I deal with this kind of situation? Hope you could give me some tips.

Thank you!

Posts: 2,065

## Re: data analysis with categorical response

This is called right-censored data: the measurement stops at some time, but the event you'd like to observe hasn't occurred yet. Here is an example where right-censored data is analyzed:

https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_lifereg_sec...
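As a rough, untested sketch of what such an analysis might look like for this experiment: the code below assumes a hypothetical data set `days_raw` with variables `isolate`, `rep`, and `days`, where `days` is missing for plants that showed no symptoms by day 28. The names `onset` and `censored` and the choice of a Weibull distribution are also assumptions for illustration.

```sas
/* Sketch: flag plants with no symptoms by day 28 as right-censored at 28 days. */
data onset;
   set days_raw;                 /* hypothetical input: isolate, rep, days */
   censored = missing(days);     /* 1 if no symptoms were seen, 0 otherwise */
   if censored then days = 28;   /* censoring time = end of the experiment */
run;

/* The value in parentheses tells PROC LIFEREG which value of CENSORED
   marks a right-censored observation. */
proc lifereg data=onset;
   class isolate;
   model days*censored(1) = isolate / dist=weibull;
run;
```

This keeps the healthy plants in the analysis as "at least 28 days" instead of dropping them or assigning them an arbitrary number.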

--
Paige Miller
Contributor
Posts: 22