## Regression Analysis???

Super Contributor
Posts: 1,041

Hi,

Could you help me with this analysis?

I have around 70 IDs, and each of them belongs to one of 3 types (type1, type2, or type3).

We have several variables, both character and numeric, as shown below.

I want to test statistically whether any of those variables predisposes a person to one particular type (of those 3 types) over the others.

I think we need to go with regression analysis, but I am not sure about the process because there are both numeric and character variables.

Also, I am not sure which of the values under those character variables should be given a 1 and which a 0.

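| ID  | Type  | var1 (char) | var2 (char) | var3 (num) | var4 (num) | var5 (num) |
|-----|-------|-------------|-------------|------------|------------|------------|
| 101 | type1 | M           | bacteria    | 1          | .          | 1          |
| 102 | type2 | F           | no bacteria | .          | 1          | 1          |
| 103 | type3 | M           | no bacteria | 1          | 1          | 1          |
| 104 | type2 | M           | bacteria    | 1          | .          | 1          |
| 105 | type1 | F           | bacteria    | .          | .          | 1          |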

Thanks

Posts: 2,655

## Re: Regression Analysis???

Looks to me like a GENMOD/GLIMMIX problem, with a multinomial nominal dependent variable (Type), and then two class variables and three numeric variables.  The most concerning thing is the number of missing values for the numeric predictors.  Does a 1 indicate the presence of something and a missing value its absence?

As far as translating the character variables to numeric, it makes no difference which level gets a 0 and which gets a 1, so long as the variables are dichotomous.  You can deal with that in several ways: a CLASS statement with the REF= option, translation in a data step, and coding statements in GENMOD/GLIMMIX are 3 examples.
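As a rough sketch of the first two options, assuming the data set is named `have` (the name used later in this thread) and using PROC LOGISTIC purely for illustration. The reference levels picked here are arbitrary, and `have_coded`, `male`, and `bacteria_ind` are hypothetical names:

```sas
/* Option 1: let the procedure build the 0/1 coding.               */
/* REF= names the baseline level; switching it flips the sign of   */
/* the estimate but does not change the fit of the model.          */
proc logistic data=have;
   class var1 (ref='F') var2 (ref='no bacteria') / param=ref;
   model type = var1 var2 / link=glogit;
run;

/* Option 2: build the indicators yourself in a DATA step.         */
/* Rows with a missing character value keep a missing indicator.   */
data have_coded;
   set have;
   if not missing(var1) then male = (var1 = 'M');
   if not missing(var2) then bacteria_ind = (var2 = 'bacteria');
run;
```

Either way the model is the same; only the interpretation of the coefficient (which level is compared against which) changes.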

Another approach would be clustering into 3 clusters and comparing the results to Type, via PROC FREQ.
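A minimal sketch of the comparison step only, assuming some clustering step has already produced a data set (here called `clustered`, a hypothetical name) with one row per ID, the original `type`, and a hypothetical variable `cluster` taking 3 values:

```sas
/* Cross-tabulate the known type against the assigned cluster.     */
/* CHISQ gives the usual association tests; FISHER requests the    */
/* exact test, which may be safer with only about 70 IDs.          */
proc freq data=clustered;
   tables type*cluster / chisq fisher;
run;
```

If the clusters line up with Type, most of the counts will fall in one cell per row of the table.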

Steve Denham

Super Contributor
Posts: 1,041

## Re: Regression Analysis???

Hi Steve,

Yes, 1 indicates presence and missing indicates non-presence.

Do I need to convert the missing values to zeros?

For some variables, missing could mean that no value was recorded. How do I deal with these? For example, gender could be missing for some IDs.

"bacteria" and "no bacteria" are the only two possible values of var2 above, but some IDs could have a missing value because they didn't take the test or something like that.

I was reading this paper earlier today, which uses logistic regression. Can I go with this approach?

See the bottom part of page 3.

It also has the REF= option explained.

http://support.sas.com/resources/papers/proceedings12/317-2012.pdf

Posts: 2,655

## Re: Regression Analysis???

Hi Robert,

It now looks to me like all of the independent variables can be treated as CLASS variables.  I don't know quite yet how you are going to deal with the missing values for var3, var4 and var5, if some are true zeroes and some are simply not observed.  That part of the data will have to be cleaned up.
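One possible cleanup, sketched under the assumption that for var3, var4 and var5 a missing value always means "absent" (if it can also mean "not recorded", those rows cannot be recoded this way). The data set names `have` and `have_clean` are placeholders:

```sas
/* Recode missing to 0 for the three presence/absence flags only.  */
/* var1 and var2 are left alone: a missing gender or test result   */
/* is genuinely unknown, not a zero.                                */
data have_clean;
   set have;
   array flags {3} var3 var4 var5;
   do i = 1 to dim(flags);
      if missing(flags{i}) then flags{i} = 0;
   end;
   drop i;
run;

/* Check how many values are still missing in each predictor.      */
proc freq data=have_clean;
   tables var1 var2 var3 var4 var5 / missing;
run;
```

Observations that still have a missing CLASS value after this step are dropped by the modeling procedures by default, so the PROC FREQ output shows how many IDs would be lost.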

For now, a simple approach would be PROC GLIMMIX to handle the nominal nature of type.  Here is some code to get started:

proc glimmix data=yourdata;

class var1 var2 var3 var4 var5;

model type = var1 var2 var3 var4 var5 / dist=multinomial link=glogit solution oddsratio;

run;

Steve Denham

Posts: 5,045

## Re: Regression Analysis???

Looks to me like the ideal candidate for PROC ADAPTIVEREG, which I haven't tried yet. It handles nominal dependent variables, missing predictors, variable selection, nonlinear effects. Almost too good to be true.

Anybody tried it?

PG

Posts: 2,655

## Re: Regression Analysis???

Haven't tried ADAPTIVEREG, but adaptive splines look pretty powerful for noisy data, data with strange missing patterns and what I would call "clumpy" data.

Steve Denham

Super User
Posts: 10,205

## Re: Regression Analysis???

Hi. Maybe you should use a generalized logits model, which is the multinomial logistic model for a nominal response.

PROC LOGISTIC can analyze nominal response variables. Since the response variable has no ordering, we can no longer fit a proportional odds model to the data, but we can fit a generalized logits model (you can also try PROC CATMOD).

Since all of your variables are categorical, get the counts for each combination of values before fitting the generalized logits model.

You'd better fit the saturated model before submitting the following code, which fits only the main-effects model.

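```sas
/* One row per distinct covariate pattern, with its frequency.        */
/* Columns are listed explicitly: SELECT * would also pull in ID and  */
/* remerge the count onto every original row, inflating the weights.  */
proc sql;
   create table temp as
   select type, var1, var2, var3, var4, var5, count(*) as count
   from have
   group by type, var1, var2, var3, var4, var5;
quit;

/* Main-effects generalized logit model on the aggregated counts. */
proc logistic data=temp order=internal;
   freq count;
   class var1 - var5 / order=data;
   model type = var1 - var5 / link=glogit scale=none aggregate;
run;
```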

Xia Keshan
