I have categorical variables (Education and Gender) and numeric variables (Age, Income, and number of policies). Age and Income should be grouped as shown below. I want to fit a logistic model to the data below. Which of the following programs is correct? In other words, should the categorical variables (Education and Gender) and the binned numeric variables (Age and Income) be listed explicitly as class variables (Code 2), or can all of the variables simply be listed together in the MODEL statement, as in Code 1?
Target | Age | Income | Education | Gender | Policy_count |
1 | 26 | 500000 | Graduate | F | 2 |
0 | 38 | 300000 | Graduate | M | 4 |
0 | 42 | 1000000 | Post Graduate | F | 3 |
0 | 68 | 2200000 | Post Graduate | M | 5 |
0 | 18 | 65000 | 12th | M | 1 |
0 | 71 | 3500000 | Post Graduate | M | 2 |
1 | 40 | 2400000 | Post Graduate | M | 2 |
1 | 43 | 5000000 | Post Graduate | M | 1 |
1 | 52 | 7000000 | Post Graduate | M | 5 |
0 | 61 | 10000000 | PHD | F | 7 |
0 | 33 | 650000 | Graduate | M | 3 |
0 | 14 | 80000 | 10th | M | 1 |
0 | 58 | 200000 | Graduate | M | 4 |
Age_Group | Income_group |
<20 | <100000 |
20-30 | 100000-500000 |
30-40 | 500000-1000000 |
40-50 | 1000000-2000000 |
50-60 | >2000000 |
>60 |
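For reference, the grouping in the table above could be sketched as a DATA step along these lines. This is only an illustration under stated assumptions: the input data set name RAW is hypothetical, the output is named TEST to match the PROC LOGISTIC calls, and the bin edges are treated as lower-inclusive (so 30 falls in 30-40 and 60 falls in >60), which the table itself does not specify.

```sas
/* Sketch only. Assumptions: RAW is a hypothetical input data set with     */
/* numeric Age and Income; bin edges are lower-inclusive (not stated in    */
/* the table above).                                                       */
data test;
  set raw;
  length Age_Group $5 Income_group $15;

  /* Check for missing first: SAS sorts a missing numeric below any number */
  if missing(Age) then Age_Group = ' ';
  else if Age < 20 then Age_Group = '<20';
  else if Age < 30 then Age_Group = '20-30';
  else if Age < 40 then Age_Group = '30-40';
  else if Age < 50 then Age_Group = '40-50';
  else if Age < 60 then Age_Group = '50-60';
  else Age_Group = '>60';  /* holds 60 as well under the assumed edges */

  if missing(Income) then Income_group = ' ';
  else if Income < 100000  then Income_group = '<100000';
  else if Income < 500000  then Income_group = '100000-500000';
  else if Income < 1000000 then Income_group = '500000-1000000';
  else if Income < 2000000 then Income_group = '1000000-2000000';
  else Income_group = '>2000000';
run;
```

Because the resulting Age_Group and Income_group are character variables, they can only enter the model as classification effects.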
/*Code 1:*/
proc logistic data=test descending
plots(only)=(roc(id=obs) effect) PLOTS(MAXPOINTS=NONE)
namelen=34 outmodel=Logistic_Result;
model target=
Age_Group
Income_group
Policy_count
Gender
Education
/ selection=stepwise
slentry=0.05
slstay=0.05
outroc = ROC_Stats
lackfit rsq stb;
output out =pred p=phat ;
run;
/*Code 2:*/
PROC LOGISTIC DATA=test
Namelen=34 PLOTS(ONLY)=ALL;
CLASS age_group (PARAM=EFFECT) income_group (PARAM=EFFECT) Gender (PARAM=EFFECT) Education (PARAM=EFFECT);
Model target (event='1')= Policy_count
/ SELECTION=STEPWISE
SLE=0.05
SLS=0.05
LACKFIT
LINK=LOGIT
CLPARM=WALD
CLODDS=WALD
ALPHA=0.05;
RUN;
Neither Code 1 nor Code 2 is correct. Code 1 has no CLASS statement, and Code 2 leaves every variable except Policy_count out of the MODEL statement. You need to list Age_group, Income_group, Education, and Gender in the CLASS statement (as in Code 2) and also in the MODEL statement (as in Code 1).
PARAM=EFFECT is the default in PROC LOGISTIC, so you can omit that option unless you want to state it explicitly.
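A minimal sketch of what that combination might look like is below. It reuses the data set and variable names from the question; which options to keep (stepwise selection, LACKFIT, and so on) is a judgment call, not something fixed by this advice.

```sas
/* Sketch only: all four grouped/categorical variables appear in both the */
/* CLASS and the MODEL statement; Policy_count stays continuous.          */
proc logistic data=test;
  class Age_Group Income_group Gender Education / param=effect;
  model target(event='1') = Age_Group Income_group Policy_count
                            Gender Education
        / selection=stepwise slentry=0.05 slstay=0.05 lackfit;
  output out=pred p=phat;
run;
```

Writing `event='1'` on the response makes the modeled outcome explicit, which serves the same purpose as the DESCENDING option in Code 1 for a 0/1 target.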
I hope this helps.
Only Education and Gender need to be included in the CLASS statement.
proc logistic data=test descending
plots(only)=(roc(id=obs) effect) PLOTS(MAXPOINTS=NONE)
namelen=34 outmodel=Logistic_Result;
class Gender Education;
model target=
Age_Group
Income_group
Policy_count
Gender
Education
/ selection=stepwise
slentry=0.05
slstay=0.05
outroc = ROC_Stats
lackfit rsq stb;
output out =pred p=phat ;
run;