07-29-2014 09:40 AM
Could you help me with this analysis?
I have around 70 IDs, and each of them belongs to one of three types (type1, type2, or type3).
We have several variables, both character and numeric, as shown below.
I want to test statistically whether any of these variables predisposes a person to one particular type rather than the others.
I think we need regression analysis, but I am not sure how to proceed because there are both numeric and character variables.
I am also not sure which values of the character variables should be coded as 1 and which as 0.
Please help me with this analysis.
ID   Type   var1(char)  var2(char)   var3(num)  var4(num)  var5(num)
101  type1  M           bacteria     1          .          1
102  type2  F           no bacteria  .          1          1
103  type3  M           no bacteria  1          1          1
104  type2  M           bacteria     1          .          1
105  type1  F           bacteria     .          .          1
07-29-2014 03:44 PM
Looks to me like a GENMOD/GLIMMIX problem, with a multinomial nominal dependent variable (Type), two class variables, and three numeric variables. The most concerning thing is the number of missing values for the numeric predictors. Does a 1 indicate the presence of something and a missing value its absence?
As for translating the character variables to numeric: it makes no difference which level gets a 0 and which gets a 1, so long as the variables are dichotomous. You can deal with that in several ways. A CLASS statement with the REF= option, translation in a DATA step, and coding statements in GENMOD/GLIMMIX are three examples.
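A minimal sketch of the first two options, under some assumptions: the data set name `have` and the new variable names `female` and `bact` are hypothetical, the level values are taken from the sample rows in the question, and PROC LOGISTIC is used here only because its CLASS statement with REF= is well documented. The reference level affects how the estimates are read, not how well the model fits.

```sas
/* Option 1: let the procedure build the dummy variables.          */
/* REF= picks the level that is coded as the baseline (the "0").   */
proc logistic data=have;
   class var1 (ref='M') var2 (ref='no bacteria') / param=ref;
   model type (ref='type1') = var1 var2 / link=glogit;
run;

/* Option 2: recode by hand in a DATA step.                        */
/* Comparisons are case-sensitive; any other value, including a    */
/* blank, leaves the new variable missing rather than forcing 0.   */
data have_coded;
   set have;
   if var1 = 'F' then female = 1;
   else if var1 = 'M' then female = 0;
   if var2 = 'bacteria' then bact = 1;
   else if var2 = 'no bacteria' then bact = 0;
run;
```

With option 1, swapping the reference level only flips the sign of the corresponding estimates.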
Another approach would be clustering into 3 clusters and comparing the results to Type, via PROC FREQ.
07-29-2014 04:16 PM
Thanks for the reply.
Yes, 1 indicates presence and missing indicates absence.
Do I need to convert the missing values to zeros?
For some variables, missing could mean that no value was recorded. How do I deal with these? For example, gender could be missing for some IDs.
"bacteria" and "no bacteria" are the only two possible values of var2 above, but some IDs could have a missing value because they didn't take the test or something like that.
I was reading this paper earlier today, which uses logistic regression. Can I go with this approach?
See the bottom part of page 3.
It also explains the REF= options.
07-30-2014 08:15 AM
What it now looks to me like is that all of the independent variables can be considered as CLASS variables. I don't know quite yet how you are going to deal with the missings for var3, var4 and var5, if some are zeroes and some are not observed. That part of the data will have to be cleaned up.
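As a rough sketch of one possible clean-up step, the DATA step below sets missing values of var3 to var5 to 0. It assumes that, for these three variables, a missing value always means absence, which the thread has not fully established. The data set names `have` and `have_clean` are hypothetical. Values that are missing because nothing was recorded (such as a missing gender or an untaken bacteria test) are deliberately left alone.

```sas
/* Assumption: for var3-var5, missing means "absent", so recode it to 0. */
data have_clean;
   set have;
   array ind{*} var3 var4 var5;
   do i = 1 to dim(ind);
      if missing(ind{i}) then ind{i} = 0;
   end;
   drop i;
run;

/* Tabulate every predictor, counting missing values as a level, */
/* to see what is still unobserved after the recode.             */
proc freq data=have_clean;
   tables var1 var2 var3 var4 var5 / missing;
run;
```

If some missing values in var3 to var5 are really "not observed", they would need to be told apart from the true zeros before a step like this is applied.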
For now, a simple approach would be PROC GLIMMIX to handle the nominal nature of Type. Here is some code to get started:
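proc glimmix data=yourdata;
   class var1 var2 var3 var4 var5;
   /* link=glogit requests generalized logits, which treat Type as nominal */
   model type = var1 var2 var3 var4 var5 / dist=multinomial link=glogit solution;
run;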
07-29-2014 10:24 PM
Looks to me like the ideal candidate for PROC ADAPTIVEREG, which I haven't tried yet. It handles nominal dependent variables, missing predictors, variable selection, and nonlinear effects. Almost too good to be true.
Anybody tried it?
07-30-2014 08:18 AM
Haven't tried ADAPTIVEREG, but adaptive splines look pretty powerful for noisy data, data with strange missing patterns, and what I would call "clumpy" data.
08-03-2014 03:30 AM
Hi. Maybe you should use a generalized logits model, which is a multinomial logistic model.
PROC LOGISTIC can analyze a nominal response variable. Since the response variable has no ordering, we can no longer fit a proportional odds model to the data, but we can fit a generalized logits model. (You can also try PROC CATMOD.)
Since all of your variables are categorical, get the cell counts before fitting the generalized logits model.
You should fit the saturated model first; the following code fits only the main-effects model.
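proc sql;
   /* One row per observed combination. Selecting only the GROUP BY columns */
   /* keeps PROC SQL from remerging the count onto every original row.      */
   create table temp as
   select type, var1, var2, var3, var4, var5, count(*) as count
   from have
   group by type, var1, var2, var3, var4, var5;
quit;

proc logistic data=temp order=internal;
   freq count;
   class var1-var5 / order=data;
   model type = var1-var5 / link=glogit scale=none aggregate;
run;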