12-09-2015 10:45 AM
I use PROC LOGISTIC to find the most predictive combination of parameters (model) for a diagnosis. The code is below:
%let list_param = AP01 AP02 AP03 AP04 AP05 AP06 AP07 AP08 AP09 AP10 AP11 AP12 AP13 AP14 AP15 AP16 AP17 AP18 AP19 AP20
AP21 AP22 AP23 AP24 AP25 AP26 AP27 AP28 AI01 AI02 AI03 AI04 AI05 AI06 AI07 AI08 AI09 AI10 AI11 AI12
AI13 AI14 AI15 AI16 AI17 AI18 AI19 AI20 AI21;
proc logistic data=Source plots(only)=roc;
model acrstat = &list_param. age sexn reasvis / CTABLE pprob=0.5 SELECTION=stepwise;
output out=probs6 PREDPROBS=i;
run;
acrstat is the dependent variable; it indicates whether the diagnosis is present or not.
Predictors: the list of parameters (&list_param), age and sex (sexn) are all numeric variables; reasvis is a character variable.
There are only 9 patients with the diagnosis and 320 without it (329 subjects in total) in the Source dataset.
My problem is that the procedure predicts that no diagnosis will occur at all. I think this is because of the small number of subjects with the diagnosis.
Is it possible to find a model that will predict the occurrence of the diagnosis?
12-09-2015 11:17 AM
The large number of variables is perhaps a factor as well. When you look at the 9 diagnosed subjects and all of the variables in the model, do any two have the same factors? You don't indicate what your variables represent, but if they are all of a "yes/no" or "present/not present" type it should be easy to check. If none of them have any in common, then I wouldn't be surprised.
You could try Selection=Score to see if any of the subsets do very well.
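A minimal sketch of what the Selection=Score suggestion could look like, reusing the dataset and macro variable from the original post. The event='1' coding of acrstat, the BEST=, START= and STOP= values, and leaving out reasvis are all assumptions made for illustration; as far as I recall, best-subset selection does not accept CLASS effects, so the character variable is omitted here.

```sas
/* Sketch only: best-subset search over small models.          */
/* Assumes acrstat is coded 1 for diagnosis (event='1').       */
/* BEST=3 lists the three highest-scoring subsets of each size */
/* from 1 to 3 predictors; with 9 events, larger models are    */
/* unlikely to be stable.                                      */
proc logistic data=Source;
  model acrstat(event='1') = &list_param. age sexn
        / selection=score best=3 start=1 stop=3;
run;
```

The output ranks subsets by the score chi-square, so it shows whether any small combination of parameters stands out before a full model is fitted.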
12-10-2015 02:52 AM
You don't indicate what your variables represent
The variables are numeric with 1 for 'Yes' and 0 for 'No'.
Also, the removal of parameters that are not statistically significant is performed by SELECTION=stepwise.
12-09-2015 01:03 PM
This is a logistic regression model with a low event rate. There are some known ways to deal with this, including assigning prior probabilities via a Bayesian methodology.
You can find a better model by removing some variables that don't add value to the regression.
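One related option, which may or may not be what is meant above, is Firth's penalized likelihood (the FIRTH option of the MODEL statement). The penalty corresponds to a Jeffreys prior on the coefficients and is often used with rare events. The sketch below is an illustration only: the short predictor list (AP01 AP02) is a hypothetical reduced model, not a recommendation, and event='1' assumes acrstat is coded 1 for diagnosis.

```sas
/* Sketch only: Firth-penalized logistic regression.            */
/* AP01 and AP02 stand in for whatever reduced set of           */
/* predictors is chosen; they are placeholders.                 */
proc logistic data=Source;
  class reasvis / param=ref;       /* reasvis is a character variable */
  model acrstat(event='1') = AP01 AP02 age sexn reasvis
        / firth clparm=pl;         /* profile-likelihood CIs */
run;
```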
12-10-2015 02:41 AM
The removal of parameters that are not statistically significant is performed by the SELECTION=stepwise option. But despite this, I get a classification table with no predicted diagnoses.
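A possible explanation for the empty classification table: with 9 events in 329 subjects the overall event rate is about 0.027, so fitted probabilities will rarely exceed the pprob=0.5 cutoff, whatever variables are selected. A minimal sketch of requesting the classification table over a range of lower cutoffs follows. The cutoff range is chosen only for illustration, and event='1' assumes acrstat is coded 1 for diagnosis.

```sas
/* Sketch only: classification table at several cutoffs near   */
/* the observed event rate (9/329, about 0.027) instead of 0.5. */
proc logistic data=Source plots(only)=roc;
  class reasvis / param=ref;
  model acrstat(event='1') = &list_param. age sexn reasvis
        / selection=stepwise
          ctable pprob=(0.01 to 0.10 by 0.01);
  output out=probs6 predprobs=i;
run;
```

Each row of the resulting table gives sensitivity and specificity at one cutoff, which makes it easier to see whether the model separates the 9 diagnosed subjects at all.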