pdhara
Calcite | Level 5

I have categorical variables (Education and Gender) and numerical variables (Age, Income, and number of policies). Age and Income should be grouped as shown below. I want to fit a logistic model to the data below. Which of the following two versions of the code is correct? In other words, should the categorical variables (Education and Gender) and the binned numeric variables (Age and Income) be listed explicitly as CLASS variables, as in Code 2, or can all of these variables simply be put in the MODEL statement, as in Code 1?

 

Target  Age  Income    Education      Gender  Policy_count
1       26   500000    Graduate       F       2
0       38   300000    Graduate       M       4
0       42   1000000   Post Graduate  F       3
0       68   2200000   Post Graduate  M       5
0       18   65000     12th           M       1
0       71   3500000   Post Graduate  M       2
1       40   2400000   Post Graduate  M       2
1       43   5000000   Post Graduate  M       1
1       52   7000000   Post Graduate  M       5
0       61   10000000  PHD            F       7
0       33   650000    Graduate       M       3
0       14   80000     10th           M       1
0       58   200000    Graduate       M       4

 

Age_Group  Income_group
<20        <100000
20-30      100000-500000
30-40      500000-1000000
40-50      1000000-2000000
50-60      >2000000
>60
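
For reference, the grouping in the table above could be derived with a DATA step along the following lines. This is only a sketch: the output data set name test_binned is hypothetical, and the table does not say which group the interior boundary values (30, 40, 50 for Age; 500000 and 1000000 for Income) belong to, so the code assumes each of those values falls in the higher group.

```sas
/* Sketch only: derive Age_Group and Income_group from Age and Income.     */
/* Assumption: interior boundary values (e.g. Age = 30) go to the higher   */
/* group. The edge labels "<20" and ">60" fix where 20 and 60 must go.     */
data test_binned;                       /* hypothetical output data set */
  set test;
  length Age_Group $5 Income_group $15;

  /* Missing numerics compare as less than any number, so test them first */
  if missing(Age) then Age_Group = ' ';
  else if Age < 20  then Age_Group = '<20';
  else if Age < 30  then Age_Group = '20-30';
  else if Age < 40  then Age_Group = '30-40';
  else if Age < 50  then Age_Group = '40-50';
  else if Age <= 60 then Age_Group = '50-60';
  else Age_Group = '>60';

  if missing(Income) then Income_group = ' ';
  else if Income < 100000   then Income_group = '<100000';
  else if Income < 500000   then Income_group = '100000-500000';
  else if Income < 1000000  then Income_group = '500000-1000000';
  else if Income <= 2000000 then Income_group = '1000000-2000000';
  else Income_group = '>2000000';
run;
```

Because the resulting group variables are character strings, they can only enter a model as classification effects.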

 

 

/*Code 1:*/

proc logistic data=test descending
     plots(only)=(roc(id=obs) effect)  PLOTS(MAXPOINTS=NONE)
     namelen=34 outmodel=Logistic_Result;
  model target=
        Age_Group
        Income_group
        Policy_count
        Gender
        Education
        / selection=stepwise
          slentry=0.05
          slstay=0.05
          outroc = ROC_Stats
          lackfit rsq stb;
  output out =pred p=phat ;
run;

 

 

/*Code 2:*/

PROC LOGISTIC DATA=test
     Namelen=34  PLOTS(ONLY)=ALL;
  CLASS  age_group (PARAM=EFFECT)  income_group (PARAM=EFFECT)  Gender (PARAM=EFFECT)  Education (PARAM=EFFECT);
  Model target (event='1')=  Policy_count
        / SELECTION=STEPWISE
          SLE=0.05
          SLS=0.05
          LACKFIT
          LINK=LOGIT
          CLPARM=WALD
          CLODDS=WALD
          ALPHA=0.05;
RUN;

 

1 ACCEPTED SOLUTION

sld
Rhodochrosite | Level 12

Neither Code 1 nor Code 2 is correct. Age_group, Income_group, Education, and Gender must appear in the CLASS statement (as in Code 2) and also in the MODEL statement (as in Code 1).

PARAM=EFFECT is the default in PROC LOGISTIC, so you can omit that specification unless you want to state it explicitly.
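
A minimal sketch of what that combination might look like, using the CLASS variables from Code 2 and the MODEL effects from Code 1. The selection options are copied from the original code; the plot and output options are left out for brevity, and nothing here has been run against the poster's data.

```sas
/* Sketch only: all four grouped/categorical variables appear in both   */
/* the CLASS statement and the MODEL statement.                         */
proc logistic data=test;
  /* PARAM=EFFECT is the default, so it is not repeated here */
  class Age_Group Income_group Education Gender;
  /* event='1' models the probability that target equals 1 */
  model target(event='1') = Age_Group Income_group Policy_count Gender Education
        / selection=stepwise
          slentry=0.05
          slstay=0.05
          lackfit;
run;
```

Policy_count stays out of the CLASS statement, so it is treated as a continuous predictor.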

 

I hope this helps.

 


2 REPLIES
Ksharp
Super User

Only Education and Gender need to be included in the CLASS statement.

proc logistic data=test descending
     plots(only)=(roc(id=obs) effect)  PLOTS(MAXPOINTS=NONE)
     namelen=34 outmodel=Logistic_Result;
  class Gender Education;
  model target= Age_Group Income_group Policy_count Gender Education
        / selection=stepwise
          slentry=0.05
          slstay=0.05
          outroc = ROC_Stats
          lackfit rsq stb;
  output out =pred p=phat ;
run;
 

 


 

