robertrao
Quartz | Level 8


Hi,

Could you help me with this analysis?

I have around 70 IDs, and each of them belongs to exactly one of 3 types (type1, type2, or type3).

We also have several variables, both character and numeric, as shown below.

I want to test statistically whether any of those variables predispose a person to one particular type (of those 3 types) rather than the others.

I think we need to go with regression analysis, but I am not sure about the process because there are both numeric and character variables.

I am also not sure which values of the character variables should be coded as 1 and which as 0.

Please help me with this analysis.

ID    Type    var1 (char)   var2 (char)    var3 (num)   var4 (num)   var5 (num)
101   type1   M             bacteria       1            .            1
102   type2   F             no bacteria    .            1            1
103   type3   M             no bacteria    1            1            1
104   type2   M             bacteria       1            .            1
105   type1   F             bacteria       .            .            1

Thanks

SteveDenham
Jade | Level 19

Looks to me like a GENMOD/GLIMMIX problem, with a multinomial nominal dependent variable (Type), and then two class variables and two numeric variables.  The most concerning thing is the number of missing values for the numeric predictors.  Does the 1 indicate the presence of something and missing non-presence?

As far as translating the character variables to numeric goes, it makes no difference which level gets a 0 and which gets a 1, so long as the variables are dichotomous. You can deal with that in several ways; three examples are a CLASS statement with the REF= option, translation in a data step, and coding statements in GENMOD/GLIMMIX.
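As a rough illustration of the first of those options, here is a minimal sketch using PROC LOGISTIC. It assumes a data set named have (a placeholder name) with the columns from the question, and the reference levels are picked arbitrarily; choosing a different reference level changes how the estimates are expressed but not the fitted model.

```sas
/* Sketch only: HAVE is a placeholder data set name, and the       */
/* reference levels below are arbitrary choices for illustration.  */
proc logistic data=have;
  /* PARAM=REF builds 0/1 dummy variables; REF= names the level    */
  /* that is coded 0 for each character variable.                  */
  class var1 (ref='M') var2 (ref='no bacteria') / param=ref;
  /* LINK=GLOGIT fits generalized logits for the nominal response; */
  /* REF= on the response names its baseline category.             */
  model type (ref='type1') = var1 var2 / link=glogit;
run;
```

With this coding, each estimate compares a non-reference level (for example F against M) within the logit of type2 or type3 against type1.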

Another approach would be clustering into 3 clusters and comparing the results to Type, via PROC FREQ.
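The clustering idea could be sketched roughly as follows. PROC FASTCLUS (k-means) is only one possible choice and is an assumption here, as are the data set name have and the new indicator variables; k-means on 0/1 indicators is crude, so treat the result as exploratory.

```sas
/* Sketch only: HAVE and the indicator names are placeholders.     */
data coded;
  set have;
  /* A comparison yields 1 or 0. Note that a missing value is      */
  /* coded 0 here, which may not be appropriate for every variable.*/
  male     = (var1 = 'M');
  bacteria = (var2 = 'bacteria');
  p3 = (var3 = 1);
  p4 = (var4 = 1);
  p5 = (var5 = 1);
run;

/* Form 3 clusters; OUT= adds a CLUSTER variable to each row.      */
proc fastclus data=coded maxclusters=3 out=clustered noprint;
  var male bacteria p3 p4 p5;
run;

/* Cross-tabulate the observed type against the assigned cluster.  */
proc freq data=clustered;
  tables type*cluster / norow nocol nopercent;
run;
```

If the clusters line up with Type, most of the counts in the table will fall in one cell per row.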

Steve Denham

robertrao
Quartz | Level 8

Hi Steve,

Thanks for the reply.

Yes, 1 indicates presence and missing indicates non-presence.

Do I need to convert the missing values to zeros?

For some variables, missing could mean that no value was recorded. How do I deal with these? For example, gender could be missing for some IDs.

"bacteria" and "no bacteria" are the only two possible values of var2 above, but some IDs could have a missing value because they didn't take the test or something like that.

I was reading this paper earlier today, which uses logistic regression. Can I go with this approach?

See the bottom part of page 3.

It also has the REF= options explained.

http://support.sas.com/resources/papers/proceedings12/317-2012.pdf

SteveDenham
Jade | Level 19

Hi Robert,

It now looks to me as though all of the independent variables can be treated as CLASS variables.  I don't know quite yet how you are going to deal with the missings for var3, var4 and var5, if some are zeroes and some are not observed.  That part of the data will have to be cleaned up.
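One possible way to do that cleanup is a data step like the sketch below. It assumes that missing always means "not present" for var3, var4 and var5, which only the data owner can confirm; the data set names have and clean are placeholders. Values that are missing because nothing was recorded should stay missing, and by default the modeling procedures drop any observation with a missing CLASS or response value.

```sas
/* Sketch only: assumes missing means "absent" for var3-var5.      */
/* HAVE and CLEAN are placeholder data set names.                  */
data clean;
  set have;
  array flags {*} var3 var4 var5;
  do i = 1 to dim(flags);
    /* Recode missing to 0 so that absence becomes a real level.   */
    if missing(flags{i}) then flags{i} = 0;
  end;
  drop i;
  /* var1 and var2 are left untouched: a blank there means the     */
  /* value was not recorded, not that it is absent.                */
run;
```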

For now a simple approach would be PROC GLIMMIX to handle the nominal nature of type.  Here is some code to get started :

proc glimmix data=yourdata;
  class var1 var2 var3 var4 var5;
  model type = var1 var2 var3 var4 var5 / dist=multinomial
                                          link=glogit
                                          solution
                                          oddsratio;
run;

Steve Denham


PGStats
Opal | Level 21

Looks to me like the ideal candidate for PROC ADAPTIVEREG, which I haven't tried yet. It handles nominal dependent variables, missing predictors, variable selection, nonlinear effects. Almost too good to be true.

Anybody tried it?

PG

SteveDenham
Jade | Level 19

Haven't tried ADAPTIVEREG, but adaptive splines look pretty powerful for noisy data, data with strange missing patterns and what I would call "clumpy" data.

Steve Denham

Ksharp
Super User

Hi. Maybe you should use a generalized logits model, which is the multinomial logistic model for a nominal response.

PROC LOGISTIC can analyze a nominal response variable. Because the response has no ordering, a proportional odds model no longer applies, but you can fit a generalized logits model instead (you can also try PROC CATMOD).

Since all of your variables are categorical, get the cell counts before fitting the generalized logits model.

You had better fit a saturated model first; the following code fits only the main-effects model.

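/* Count the observations in each combination of levels.          */
/* Only the grouping columns are selected: SELECT * would remerge */
/* the count onto every original row and inflate the frequencies. */
proc sql;
  create table temp as
    select type, var1, var2, var3, var4, var5, count(*) as count
    from have
    group by type, var1, var2, var3, var4, var5;
quit;

/* Main-effects generalized logits model on the aggregated data.  */
proc logistic data=temp order=internal;
  freq count;
  class var1 - var5 / order=data;
  model type = var1 - var5 / link=glogit scale=none aggregate;
run;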

Xia Keshan
