jjin0322
Calcite | Level 5

Hi All,

 

I just got some data from my experiment. The dependent variable values in the data set were categorically assigned as 0, 2, 4, 6, 8, 10. Other than the categorical response, the data set is quite simple, with only one treatment and 10 replicates. I'm wondering whether it's necessary to transform the data for the categorical response, and which procedure would be a good choice for the analysis: PROC MIXED? PROC GLIMMIX?

 

Hope you could give me some advice on analyzing data with categorical response. I'll really appreciate it.

 

Thanks!

Ksharp
Super User
If all of your X and Y variables are categorical, try PROC CATMOD.
jjin0322
Calcite | Level 5

Thanks for your suggestion!

 

Yes, in my case, Y, which takes the values 10, 8, 6, 4, 2, 0, represents the disease severity caused by a pathogen. X is the pathogenic isolate, which is also categorical. What I am trying to do is compare the disease severity caused by different pathogenic isolates on the same host.

 

I'll take a look at Proc catmod and try it out, thanks again!

PaigeMiller
Diamond | Level 26

No need for transformation.

 

Can these categorical data levels be assumed to actually be on a continuous scale or ordinal scale?

 

There are many choices for analysis

PROC GLIMMIX

PROC LOGISTIC

PROC CATMOD

PROC GENMOD

 

With the limited information you have provided, I don't think we can advise further

--
Paige Miller
jjin0322
Calcite | Level 5

Hi PaigeMiller,

 

Thanks for your kind reply!

 

My experiment was to compare the disease severity caused by 21 pathogenic isolates on the same host plant. 10 host plants were used as replicates for each single isolate inoculation, so there were 21*10=210 plants in total. Disease severity was assigned according to when the plants showed disease symptoms: 1-6 days=10, 7-10 days=8, 11-16 days=6, 17-22 days=4, 23-28 days=2, and no symptoms at day 28=0. I've attached the data set here, if you would like to take a look at it. I'd like to use PROC GLIMMIX, but I am not sure how to specify that the response is categorical in the SAS code. I hope you can give me more suggestions. Thanks in advance!
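For reference, here is a minimal sketch of one way an ordered categorical response can be declared in PROC GLIMMIX, using a cumulative logit model. The data set name `severity` and the variable names `isolate` and `disease` are assumptions taken from the code posted later in this thread; this is an illustration, not a recommendation over the numeric analysis discussed below.

```sas
/* Sketch only: treat disease as an ordered categorical response.
   Data set and variable names (severity, isolate, disease) are assumed. */
proc glimmix data=severity;
    class isolate;
    /* cumulative logit model for an ordinal response */
    model disease = isolate / dist=multinomial link=cumlogit solution;
run;
```

With 21 isolates and only 10 plants each, some isolates have nearly constant scores, so a model like this may have trouble converging.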

PaigeMiller
Diamond | Level 26

People here don't usually open Excel files for fear of viruses or other executables included.

 

I'm not sure why you need to consider these as categories; it seems that treating the results as numeric ought to work better.

 

In that case, the code should be relatively simple:

 

UNTESTED CODE

proc glimmix;
class pathogen;
model y = pathogen;
run;
--
Paige Miller
jjin0322
Calcite | Level 5

Oh.. sorry about the Excel files.

Since the dependent variable only takes the values 10, 8, 6, 4, 2, 0, I considered it categorical.

I guess this is a really naive question: can I treat the dependent variable as numeric even though the values are not continuously assigned?

The following is the data set and the SAS code I tried; I just added rep as a random effect.

Really appreciated your help!

   

data severity;
infile datalines expandtabs; /* the values below are tab-separated */
input isolate $ rep disease; 
datalines;

R0-G5-6	1	8
R0-G5-6	2	10
R0-G5-6	3	6
R0-G5-6	4	8
R0-G5-6	5	8
R0-G5-6	6	6
R0-G5-6	7	6
R0-G5-6	8	6
R0-G5-6	9	6
R0-G5-6	10	8
R0-G5-6A	1	0
R0-G5-6A	2	0
R0-G5-6A	3	0
R0-G5-6A	4	0
R0-G5-6A	5	0
R0-G5-6A	6	10
R0-G5-6A	7	0
R0-G5-6A	8	0
R0-G5-6A	9	0
R0-G5-6A	10	0
R0-G5-6B	1	8
R0-G5-6B	2	8
R0-G5-6B	3	8
R0-G5-6B	4	8
R0-G5-6B	5	8
R0-G5-6B	6	10
R0-G5-6B	7	6
R0-G5-6B	8	8
R0-G5-6B	9	8
R0-G5-6B	10	8
R0-G5-6C	1	10
R0-G5-6C	2	10
R0-G5-6C	3	8
R0-G5-6C	4	8
R0-G5-6C	5	8
R0-G5-6C	6	10
R0-G5-6C	7	6
R0-G5-6C	8	8
R0-G5-6C	9	8
R0-G5-6C	10	6
R0-G5-6E	1	6
R0-G5-6E	2	6
R0-G5-6E	3	4
R0-G5-6E	4	6
R0-G5-6E	5	6
R0-G5-6E	6	8
R0-G5-6E	7	6
R0-G5-6E	8	6
R0-G5-6E	9	8
R0-G5-6E	10	8
R0-G5-6F	1	4
R0-G5-6F	2	6
R0-G5-6F	3	6
R0-G5-6F	4	6
R0-G5-6F	5	8
R0-G5-6F	6	8
R0-G5-6F	7	8
R0-G5-6F	8	6
R0-G5-6F	9	8
R0-G5-6F	10	6
R0-G5-6G	1	6
R0-G5-6G	2	4
R0-G5-6G	3	6
R0-G5-6G	4	8
R0-G5-6G	5	6
R0-G5-6G	6	8
R0-G5-6G	7	8
R0-G5-6G	8	10
R0-G5-6G	9	2
R0-G5-6G	10	0
R0-G5-6H	1	10
R0-G5-6H	2	10
R0-G5-6H	3	4
R0-G5-6H	4	10
R0-G5-6H	5	6
R0-G5-6H	6	6
R0-G5-6H	7	8
R0-G5-6H	8	6
R0-G5-6H	9	10
R0-G5-6H	10	8
R0-G5-6I	1	6
R0-G5-6I	2	6
R0-G5-6I	3	8
R0-G5-6I	4	6
R0-G5-6I	5	6
R0-G5-6I	6	8
R0-G5-6I	7	8
R0-G5-6I	8	6
R0-G5-6I	9	8
R0-G5-6I	10	6
R0-G5-6J	1	8
R0-G5-6J	2	8
R0-G5-6J	3	8
R0-G5-6J	4	8
R0-G5-6J	5	8
R0-G5-6J	6	6
R0-G5-6J	7	6
R0-G5-6J	8	6
R0-G5-6J	9	8
R0-G5-6J	10	6
R0-G2-6	1	8
R0-G2-6	2	8
R0-G2-6	3	6
R0-G2-6	4	8
R0-G2-6	5	6
R0-G2-6	6	4
R0-G2-6	7	8
R0-G2-6	8	8
R0-G2-6	9	2
R0-G2-6	10	6
R0-G2-6A	1	6
R0-G2-6A	2	8
R0-G2-6A	3	8
R0-G2-6A	4	6
R0-G2-6A	5	8
R0-G2-6A	6	0
R0-G2-6A	7	8
R0-G2-6A	8	0
R0-G2-6A	9	0
R0-G2-6A	10	0
R0-G2-6B	1	6
R0-G2-6B	2	0
R0-G2-6B	3	0
R0-G2-6B	4	8
R0-G2-6B	5	6
R0-G2-6B	6	4
R0-G2-6B	7	4
R0-G2-6B	8	2
R0-G2-6B	9	6
R0-G2-6B	10	8
R0-G2-6C	1	8
R0-G2-6C	2	8
R0-G2-6C	3	8
R0-G2-6C	4	0
R0-G2-6C	5	0
R0-G2-6C	6	8
R0-G2-6C	7	0
R0-G2-6C	8	6
R0-G2-6C	9	6
R0-G2-6C	10	4
R0-G2-6D	1	2
R0-G2-6D	2	6
R0-G2-6D	3	4
R0-G2-6D	4	0
R0-G2-6D	5	2
R0-G2-6D	6	8
R0-G2-6D	7	0
R0-G2-6D	8	6
R0-G2-6D	9	6
R0-G2-6D	10	0
R0-G2-6E	1	0
R0-G2-6E	2	0
R0-G2-6E	3	8
R0-G2-6E	4	6
R0-G2-6E	5	6
R0-G2-6E	6	2
R0-G2-6E	7	8
R0-G2-6E	8	8
R0-G2-6E	9	4
R0-G2-6E	10	6
R0-G2-6F	1	2
R0-G2-6F	2	0
R0-G2-6F	3	8
R0-G2-6F	4	6
R0-G2-6F	5	6
R0-G2-6F	6	8
R0-G2-6F	7	6
R0-G2-6F	8	6
R0-G2-6F	9	8
R0-G2-6F	10	8
R0-G2-6G	1	0
R0-G2-6G	2	6
R0-G2-6G	3	0
R0-G2-6G	4	6
R0-G2-6G	5	2
R0-G2-6G	6	8
R0-G2-6G	7	6
R0-G2-6G	8	6
R0-G2-6G	9	6
R0-G2-6G	10	8
R0-G2-6H	1	6
R0-G2-6H	2	4
R0-G2-6H	3	8
R0-G2-6H	4	6
R0-G2-6H	5	2
R0-G2-6H	6	0
R0-G2-6H	7	8
R0-G2-6H	8	8
R0-G2-6H	9	8
R0-G2-6H	10	6
R0-G2-6I	1	4
R0-G2-6I	2	0
R0-G2-6I	3	0
R0-G2-6I	4	6
R0-G2-6I	5	6
R0-G2-6I	6	2
R0-G2-6I	7	6
R0-G2-6I	8	4
R0-G2-6I	9	8
R0-G2-6I	10	0
R0-G2-6J	1	6
R0-G2-6J	2	4
R0-G2-6J	3	6
R0-G2-6J	4	6
R0-G2-6J	5	6
R0-G2-6J	6	2
R0-G2-6J	7	6
R0-G2-6J	8	0
R0-G2-6J	9	2
R0-G2-6J	10	8
;

proc print data=severity;
run;

proc glimmix data=severity;
class isolate rep;
model disease=isolate;
random rep;
lsmeans isolate/ lines;
run;

 

PaigeMiller
Diamond | Level 26
"disease severity was assigned according to when the plants showed disease symptoms: 1-6 days=10, 7-10 days=8, 11-16 days=6, 17-22 days=4, 23-28 days=2, and no symptoms at day 28=0."

 

 

The onset of symptoms is on a continuous scale, although the recording breaks the values into discrete levels. Whether you analyze this as continuous or discrete, both are approximations to the actual number of days to onset of symptoms. Which is the better approximation? There is no way of knowing, but I lean towards continuous. In fact, you could use the midpoint of each range as your continuous value, which would be an even better approximation: 3.5 = 1-6 days, 8.5 = 7-10 days, 13.5 = 11-16 days, etc. (And if you're going to do a similar study in the future, don't group the results into discrete categories; record the actual number of days!)
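The midpoint recoding described above could be sketched as follows. The output data set name `severity_days` and the new variable `days` are hypothetical; the score of 0 (no symptoms by day 28) has no range and therefore no midpoint, so it is left missing here as an assumption about how to handle it.

```sas
/* Sketch: convert each severity score to the midpoint of its day range. */
data severity_days;
    set severity;
    select (disease);
        when (10) days = 3.5;   /* 1-6 days   */
        when (8)  days = 8.5;   /* 7-10 days  */
        when (6)  days = 13.5;  /* 11-16 days */
        when (4)  days = 19.5;  /* 17-22 days */
        when (2)  days = 25.5;  /* 23-28 days */
        otherwise days = .;     /* score 0: no symptoms by day 28, no midpoint */
    end;
run;
```

Leaving the score-0 plants missing drops them from any model fitted to `days`, which is a real loss of information for isolates that rarely caused symptoms.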

 

proc glimmix data=severity;
class isolate rep;
model disease=isolate;
random rep;
lsmeans isolate/ lines;
run;

 

I don't see a need for rep in the model. If you leave it out, the replicates are lumped into the random error, which is where they should be. If you are going to explicitly include rep in the model, it must be nested within isolate, because replicate 1 of one isolate is a different plant from replicate 1 of another isolate, as in

 

model disease = isolate rep(isolate);
random rep(isolate);

otherwise you will not get the right result.
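The simpler option mentioned above, leaving rep out entirely, might look like the sketch below. It reuses the data set and variable names from the posted code and is untested.

```sas
/* Sketch: rep omitted, so plant-to-plant variation goes into the residual error. */
proc glimmix data=severity;
    class isolate;
    model disease = isolate;
    lsmeans isolate / lines;   /* compare isolate means, with a lines display */
run;
```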

 

--
Paige Miller
jjin0322
Calcite | Level 5

Thank you so much for the suggestions and the correction of the SAS code!

 

I actually have the number of days recorded, so I can definitely try that for the analysis.

 

Thanks again! I really appreciated your great help! 

jjin0322
Calcite | Level 5

Hi PaigeMiller,

 

Sorry to keep bugging you. I just came across this question when I was trying to do the analysis using the actual number of days instead of the disease severity values assigned. 

 

The experiment ran for 28 days, and some of the plants did not show any symptoms by day 28. Theoretically, the number of days to symptoms for those plants (healthy plants) would be infinite. How should I deal with this situation? I hope you can give me some tips.

 

Thank you!

PaigeMiller
Diamond | Level 26

This is called right-censored data: the measurement stops at some time, but the event you'd like to observe hasn't occurred yet. Here is an example where right-censored data are analyzed:

 

https://support.sas.com/documentation/cdl/en/statug/63033/HTML/default/viewer.htm#statug_lifereg_sec...
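As a rough sketch of how this might apply to the experiment in this thread, the code below assumes a hypothetical data set `onset` with variables `isolate` and `days`, where `days` is missing for plants that showed no symptoms by day 28. The Weibull distribution is also only an assumption; another distribution may suit the data better.

```sas
/* Sketch: flag plants without symptoms as right-censored at day 28. */
data onset_cens;
    set onset;                    /* hypothetical input: isolate, days */
    censored = missing(days);     /* 1 = no symptoms by day 28 */
    if censored then days = 28;   /* censoring time = end of the experiment */
run;

proc lifereg data=onset_cens;
    class isolate;
    /* days*censored(1): observations with censored=1 are right-censored */
    model days*censored(1) = isolate / dist=weibull;
run;
```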

--
Paige Miller
jjin0322
Calcite | Level 5

Thank you so much! 
