11-27-2012 03:07 AM
I need to do a logistic regression. My outcome is dichotomous. The independent variables are of different types, as shown in the table below. I am in doubt about which SAS code I should use.
I hope you can help.
| ID | outcome | Var1 (categorical) | Var2 (dichotomous) | Var3 (continuous) |
11-27-2012 07:35 AM
This should get you started, but beware: there are a lot of pitfalls in this area, the primary one being whether you have enough outcomes of interest for the number of variables included in the model. I am choosing PROC GENMOD because of the presence of the categorical variables Var1 and Var2, and the ease of specifying their effects in PROC GENMOD as opposed to PROC LOGISTIC.
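proc genmod data=yourdata;
   class var1 var2;
   /* binomial distribution with a logit link gives a logistic regression */
   model outcome = var1 var2 var3 / dist=binomial link=logit;
   /* ilink reports the least squares means on the probability scale */
   lsmeans var1 var2 / ilink;
run;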
Other things to consider--you have a continuous variable. The analysis I have given here is referred to in the literature as analysis of covariance, and this particular model assumes that the "slope" due to var3 is constant across all levels of var1 and var2. Without knowing how much data is available, I don't know whether you can efficiently investigate whether or not there is evidence for this assumption. Anyway, this should get you started. Stop back in when you have tried it, and see if it is giving you answers that are interpretable. Be sure to read the documentation, not only for PROC GENMOD, but for PROC GLIMMIX and PROC LOGISTIC to get an understanding of exactly what SAS is doing in each of these.
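One way to look for evidence against the common-slope assumption is to add interactions between var3 and the class variables and inspect their Type 3 tests. This is only a sketch that reuses the placeholder names from the code above (yourdata, outcome, var1 to var3); whether the data can support the extra parameters is an assumption that has to be checked.

```sas
proc genmod data=yourdata;
   class var1 var2;
   /* var1*var3 and var2*var3 allow the var3 slope to differ by level */
   /* type3 requests likelihood ratio tests for each model effect     */
   model outcome = var1 var2 var3 var1*var3 var2*var3
         / dist=binomial link=logit type3;
run;
```

If the interaction tests show little evidence of differing slopes, the simpler common-slope model above is easier to interpret.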
12-19-2012 03:39 AM
Dear Steve Denham
Thank you for your helpful reply. I am sorry for getting back to you this late. I ended up using PROC LOGISTIC and worked it over with a member of the statistical department. But thank you anyway.
12-31-2012 02:23 AM
Dear Steve Denham,
I ran into the same problem. After I posted
(Proc Logistic result not include ordinal variables), I found your reply, which was very useful.
So I treated those class variables as continuous, with the code below:
proc logistic data=slide.sb_vm_training outmodel=slide.model;
   model dv = N2 N3 N4 N5 N6 N7 N10 N11 N12 N13 Prin1 Prin2 Prin3 factor1 factor2 factor3 factor4 factor5 factor6 factor7 factor8 / selection=stepwise;
run;
Then the N variables did enter the model.
Can you give a brief explanation of why PROC LOGISTIC has shortcomings when nominal and continuous variables are combined?
Or can you point me to some papers to read?
12-31-2012 07:59 AM
Point one: Combining nominal and continuous variables often leads to quasi-complete separation. For papers on this problem, read the documentation for PROC LOGISTIC, and follow the references given there.
Point two: Search this site and the SAS-L listserv for comments regarding stepwise selection of variables. In particular, find the paper by Flom and Cassell at http://www.nesug.org/proceedings/nesug07/sa/sa07.pdf. Stepwise has a variety of problems, not the least of which is that all of the p values associated with the parameters are wrong, as the distributional assumptions are not met. The p values are also biased toward zero, that is, too small.
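On point one: if separation does show up (PROC LOGISTIC writes a warning to the log when it detects it), one commonly used remedy is Firth's penalized likelihood, available through the FIRTH option on the MODEL statement. The sketch below uses placeholder names (yourdata, outcome, var1 to var3) and assumes the outcome is coded 0/1; none of that comes from your data.

```sas
proc logistic data=yourdata;
   /* reference-cell coding for the nominal predictors */
   class var1 var2 / param=ref;
   /* firth: penalized likelihood, keeps estimates finite under separation */
   /* clparm=pl: profile-likelihood confidence limits for the parameters   */
   /* event='1' assumes a 0/1 outcome where 1 is the event of interest     */
   model outcome(event='1') = var1 var2 var3 / firth clparm=pl;
run;
```

This keeps the nominal variables in a CLASS statement instead of treating them as continuous, and it fits one prespecified model with no stepwise selection.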
01-04-2013 03:02 AM
Dear Steve Denham,
Thank you for your wonderful explanation.
Can I ask one more question?
proc princomp data=slide.sb_vm10 cov outstat=temp_prin1;
run;
For example, the variables in group A have a large range, within (-1M, 1M), while the variables in group B have a small range, within (-1, 1).
It seems that the eigenvector coefficients, for example those of Prin1, will be zero for the variables in group B.
Do you know how to deal with this?
Thanks in advance.
01-04-2013 07:12 AM
Rescale! Remember that principal components and the resulting eigenvalues are based on the amount of variability explained. If all of the variability is in group A, then the component will only have a loading on A, as B contributes almost nothing to the total variability.
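Two possible ways to rescale, sketched with the data set name from the question. Without the COV option, PROC PRINCOMP analyzes the correlation matrix, which is equivalent to standardizing every variable to unit variance first. Alternatively, standardize explicitly with PROC STANDARD and keep COV. The names work.sb_vm10_std and temp_prin2 are made up for illustration, and with no VAR statement both procedures use every numeric variable, so add one if the data set also holds IDs or the response.

```sas
/* Option 1: no COV option, so the correlation matrix is analyzed */
proc princomp data=slide.sb_vm10 outstat=temp_prin1;
run;

/* Option 2: standardize to mean 0 and std 1, then analyze covariances */
proc standard data=slide.sb_vm10 mean=0 std=1 out=work.sb_vm10_std;
run;

proc princomp data=work.sb_vm10_std cov outstat=temp_prin2;
run;
```

Both options should give the same components, since the covariance matrix of standardized variables is the correlation matrix of the originals.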