juanvg1972
Pyrite | Level 9

Hi,

I am using logistic regression to predict a binary target variable (1, 0).
One of the variables in my model is a classification variable.

a_type = ("high", "medium", "low") is a predictor variable, not the target.

I use PROC LOGISTIC.

I don't know whether it is recommended to transform this variable into dummy variables like this:

a_type_high = (1,0)
a_type_medium = (1,0)
a_type_low = (1,0)

I suppose that kind of variable is better for logistic regression, isn't it?
If I don't transform the variable, does the procedure do the transformation automatically?

 

Another question: I also have several continuous/quantitative variables, like sales (0-50) and mkt_exp (0-1000).
Do I have to normalize them so that each variable has mean = 0 and std = 1? Is that needed?

 

Thanks

 

ACCEPTED SOLUTION

JasonXin
SAS Employee
Hi,

When you list the variable in the CLASS statement of PROC LOGISTIC, the CLASS statement option PARAM=EFFECT creates the dummy variables for you. The other commonly used option is PARAM=GLM. There is no hard-and-fast rule as to which one is better; for predictive modeling with PROC LOGISTIC, these two are the most popular. There are about eight other options you may explore if your work is more sensitive to the design matrix.

A more important issue for you is actually how missing values on the categorical variable are handled. The CLASS statement has a MISSING option you can read about. In general, PROC LOGISTIC has been continuously optimized so that the user does not have to spend time coding things manually.

As for normalization, the question really comes down to whether the model is sensitive to the distribution of the input variables, that is, the interval variables. If a variable is very far from normal, you should not normalize it. Other factors include:

1. Your link function. Many link functions are distribution tolerant, but not all.
2. Sample size. Many modelers tend to ignore normality of the input variables when the model universe is big.
3. Normality really matters only if a univariate study of the input variable is critical for fitting the model. In models like logistic regression, interactions among inputs are often more influential.
4. If you do normalize an interval input, the marginal improvement in its contribution to the model's overall predictive accuracy tends to be, first, hard to measure and, second, if measurable, insignificant.

Hope this helps. Thanks for using SAS.

Jason Xin
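
For illustration, here is a minimal sketch of the CLASS statement approach described above. The dataset name work.customers and the simulated values are assumptions made up for this example; only the variable names a_type, sales, and mkt_exp come from the question, and target stands in for the unnamed 1/0 target. PARAM=EFFECT is the default parameterization in PROC LOGISTIC, so it is written out only for clarity.

```sas
/* Hypothetical data: simulate 500 rows so the example can run on its own. */
/* The coefficients below are arbitrary and for illustration only.         */
data work.customers;
  call streaminit(2024);
  length a_type $6;
  do i = 1 to 500;
    u = rand('uniform');
    if u < 0.3 then a_type = 'high';
    else if u < 0.7 then a_type = 'medium';
    else a_type = 'low';
    sales   = 50 * rand('uniform');     /* roughly the 0-50 range   */
    mkt_exp = 1000 * rand('uniform');   /* roughly the 0-1000 range */
    eta = -1 + 0.04*sales + 0.001*mkt_exp + 0.8*(a_type = 'high');
    target = rand('bernoulli', 1 / (1 + exp(-eta)));
    output;
  end;
  drop i u eta;
run;

/* CLASS builds the design columns for a_type; no manual dummies needed. */
/* MISSING treats a missing a_type as its own level instead of dropping  */
/* the row. Swap param=effect for param=glm to compare the two codings.  */
proc logistic data=work.customers;
  class a_type / param=effect missing;
  model target(event='1') = a_type sales mkt_exp;
run;
```

The "Class Level Information" table in the output shows how each level of a_type was coded, which is a quick way to check what the procedure did on your behalf.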

4 REPLIES
Reeza
Super User

Categorical variables should be placed in the CLASS statement. 

 

If it's your first time doing an analysis, I suggest finding a worked example, working through it, and then moving on to your own data.

The documentation has a good example of analysis with categorical predictors. 

 

Another resource:

http://www.ats.ucla.edu/stat/sas/dae/logit.htm

 

Normalization is up to you. If you choose to do so, look at proc stdize. 
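
If you do decide to standardize, a sketch along these lines may help. It assumes a dataset named work.customers (a hypothetical name) that already contains the sales and mkt_exp variables from the question; the output name work.customers_std is also made up. METHOD=STD subtracts the mean and divides by the standard deviation, which is the "mean = 0, std = 1" scaling asked about.

```sas
/* Assumes work.customers exists with numeric columns sales and mkt_exp. */
/* method=std: location = mean, scale = standard deviation.              */
proc stdize data=work.customers out=work.customers_std method=std;
  var sales mkt_exp;
run;

/* Inspect the mean and standard deviation of the rescaled variables. */
proc means data=work.customers_std mean std;
  var sales mkt_exp;
run;
```

Keep in mind that for an ordinary (unpenalized) logistic regression, this kind of linear rescaling changes the size of the coefficients but not the predicted probabilities, so it is mostly a matter of interpretation and numerical convenience.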

 

Are you using SAS Enterprise Miner?

juanvg1972
Pyrite | Level 9

Thanks, I am using Enterprise Guide, not Miner.

I am not sure when to standardize and when not to.

 

Thanks for your help

Reeza
Super User

In the CLASS statement, look at the parameterization options. AFAIK param = Ref is the most common, and most easily interpretable way of specifying your variables. Make sure you review the design matrix and understand your output. 
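
As a rough sketch of what reviewing the design matrix can look like: the step below assumes a dataset named work.customers (hypothetical) with the variables target, a_type, sales, and mkt_exp, and picks 'low' as the reference level purely for illustration. OUTDESIGN= writes the design matrix to a dataset, and OUTDESIGNONLY skips the model fit so you can look at the coding first.

```sas
/* Assumes work.customers exists; 'low' as reference level is an arbitrary choice. */
/* outdesignonly: build and save the design matrix without fitting the model.      */
proc logistic data=work.customers outdesign=work.design outdesignonly;
  class a_type (ref='low') / param=ref;
  model target(event='1') = a_type sales mkt_exp;
run;

/* Look at the first few rows to see the 0/1 columns created for a_type. */
proc print data=work.design(obs=10);
run;
```

With reference coding, each a_type coefficient is read as the difference in log odds between that level and the reference level, which is why many people find it the easiest to interpret. Remove OUTDESIGNONLY once you are happy with the coding and want the actual fit.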
