Logistic regression (PROC LOGISTIC) variable preparation questions

Frequent Contributor
Posts: 93

Hi,

I am using logistic regression to predict a binary target variable (1, 0).
One of the variables in my model is a classification variable.

a_type = ("high", "medium", "low") is a predictor variable, not the target.

I use PROC LOGISTIC.

I don't know whether it is recommended to transform this variable into dummy variables like this:

a_type_high = (1,0)
a_type_medium = (1,0)
a_type_low = (1,0)

I suppose that kind of variable is better for logistic regression, isn't it?
If I don't transform the variable, does the procedure do the transformation automatically?

 

Another question: I also have several continuous (quantitative) variables, such as sales (0-50) and mkt_exp (0-1000).
Do I have to normalize them so that each has mean 0 and standard deviation 1? Is that needed?

 

Thanks

 


Accepted Solutions
Solution
‎08-05-2016 10:52 AM
SAS Employee
Posts: 122

Re: Logistic regression (PROC LOGISTIC) variable preparation questions

Hi,

When you list the variable in the CLASS statement of PROC LOGISTIC, the CLASS option PARAM=EFFECT creates the dummy variables for you. The other commonly used option is PARAM=GLM. There is no hard and fast rule as to which one is better; these two are the most popular when PROC LOGISTIC is used for predictive modeling. There are roughly eight other parameterizations you can explore if your work is sensitive to the design matrix.

A more important issue for you may be how missing values of the categorical variable are handled. The CLASS statement has a MISSING option that is worth reading about. In general, PROC LOGISTIC has been optimized continuously so that users do not have to spend time coding these things manually.

As for normalization, the question is really whether the model is sensitive to the distribution of the interval input variables. If a variable is very far from normal, you should not normalize it. Other factors include:

1. Your link function. Many link functions are distribution tolerant, but not all.
2. Sample size. Many modelers tend to ignore normality of the input variables when the modeling universe is large.
3. Normality really matters only if the univariate study of an input variable is critical for fitting the model. In models like logistic regression, interactions among inputs are often more influential.
4. If you do normalize an interval input, the marginal improvement in its contribution to the model's overall predictive accuracy tends to be, first, hard to measure and, second, if measurable, insignificant.

Hope this helps. Thanks for using SAS.

Jason Xin
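A minimal sketch of the CLASS statement approach described above. The data set name mydata and the target name target are assumptions made for illustration; a_type, sales and mkt_exp come from the question. PARAM=EFFECT is the PROC LOGISTIC default, so it is written out here only to make the choice visible.

```sas
/* Sketch only: mydata and target are assumed names. */
proc logistic data=mydata;
  /* PROC LOGISTIC builds the design columns for a_type itself.  */
  /* MISSING treats a missing a_type as a valid level; omit it   */
  /* to drop observations with a missing a_type instead.         */
  class a_type / param=effect missing;
  model target(event='1') = a_type sales mkt_exp;
run;
```

Replacing param=effect with param=glm gives the other parameterization mentioned in the reply; no hand-built a_type_high, a_type_medium or a_type_low columns are needed in either case.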


All Replies
Super User
Posts: 17,865

Re: Logistic regression (PROC LOGISTIC) variable preparation questions

Categorical variables should be placed in the CLASS statement. 

 

If it's your first time doing this kind of analysis, I suggest finding a worked example, working through it, and then moving on to your own data.

The documentation has a good example of analysis with categorical predictors. 

 

Another resource:

http://www.ats.ucla.edu/stat/sas/dae/logit.htm

 

Normalization is up to you. If you choose to do so, look at PROC STDIZE.
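If you do decide to standardize, a sketch along these lines may help. The data set names mydata and mydata_std are assumptions made for illustration; sales and mkt_exp are the variables from the question. METHOD=STD centers each variable at its mean and scales by its standard deviation.

```sas
/* Sketch only: mydata and mydata_std are assumed data set names. */
proc stdize data=mydata out=mydata_std method=std;
  var sales mkt_exp;   /* each becomes (x - mean) / standard deviation */
run;
```

For an ordinary maximum likelihood fit with an intercept, a linear rescaling like this changes the scale of the coefficients but not the fitted probabilities, so it is mainly a matter of interpretation.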

 

Are you using SAS Enterprise Miner?

Frequent Contributor
Posts: 93

Re: Logistic regression (PROC LOGISTIC) variable preparation questions

Thanks, I am using Enterprise Guide, not Miner.

I am not sure when to standardize and when not to.

 

Thanks for your help

Super User
Posts: 17,865

Re: Logistic regression (PROC LOGISTIC) variable preparation questions

In the CLASS statement, look at the parameterization options. As far as I know, PARAM=REF is the most common and most easily interpretable way of specifying your variables. Make sure you review the design matrix and understand your output.
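One way to review the design matrix is sketched here. The names mydata, target and design are assumptions made for illustration, and choosing "low" as the reference level is only an example. OUTDESIGN= writes the design matrix to a data set, and OUTDESIGNONLY skips the model fit so that only that data set is produced.

```sas
/* Sketch only: mydata, target and design are assumed names. */
proc logistic data=mydata outdesign=design outdesignonly;
  /* With PARAM=REF and "low" as the reference level, the a_type */
  /* columns compare high and medium against low.                */
  class a_type (ref='low') / param=ref;
  model target(event='1') = a_type sales mkt_exp;
run;

/* Look at the first few rows of the generated design columns. */
proc print data=design(obs=10);
run;
```

Dropping OUTDESIGNONLY fits the model as usual while still saving the design matrix.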

☑ This topic is SOLVED.
