03-18-2021
Tenno
Calcite | Level 5
Member since
07-27-2011
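Latest posts by Tenno
- chi2_discretize.sas: 2746 views, posted 09-07-2014 09:57 PM
- Re: How can I encode the class values of a categorical variable into a continuous variable?: 5853 views, posted 12-03-2012 11:58 AM
- Re: How can I encode the class values of a categorical variable into a continuous variable?: 5853 views, posted 12-02-2012 03:32 PM
- How can I encode the class values of a categorical variable into a continuous variable?: 6178 views, posted 11-21-2012 11:25 AM
- Math column vector into IML column vector upon creation: 1555 views, posted 12-13-2011 09:45 AM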
Activity Feed for Tenno
- Posted chi2_discretize.sas on SAS Communities Library. 09-07-2014 09:57 PM
- Posted Re: How can I encode the class values of a categorical variable into a continuous variable? on SAS Data Science. 12-03-2012 11:58 AM
- Posted Re: How can I encode the class values of a categorical variable into a continuous variable? on SAS Data Science. 12-02-2012 03:32 PM
- Posted How can I encode the class values of a categorical variable into a continuous variable? on SAS Data Science. 11-21-2012 11:25 AM
- Posted Math column vector into IML column vector upon creation on SAS/IML Software and Matrix Computations. 12-13-2011 09:45 AM
My Library Contributions
09-07-2014
09:57 PM
DESCRIPTION: %CHI2_DISCRETIZE performs supervised discretization of a continuous variable based on the values of a categorical variable.
The chi-squared (χ²) algorithm recursively subdivides the values of the continuous variable until the criterion function (logworth, that is, -log10 of the test's p-value) no longer improves.
SYNTAX: See macro header for definition of variables
EXAMPLE: See initial comments in macro for example of use.
See %REPLICATE_RESULTS macro at end of file for additional examples of use.
NOTES: The data must not contain missing values.
AUTHOR: Ross Bettinger
DATE: 07Sep2014
REFERENCE: www.wuss.org/proceedings11/Papers_Bettinger_R_74935.pdf
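The recursive idea described above can be illustrated with a minimal Python sketch. This is not the %CHI2_DISCRETIZE macro itself: it assumes a binary (0/1) target so that the chi-squared test has 1 degree of freedom, and it stops when the best split's logworth falls below a hypothetical threshold `min_logworth`, which stands in for the macro's own stopping rule. The function names are illustrative only.

```python
import math

def logworth(table):
    """-log10(p-value) of the chi-squared test on a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    n = a + b + c + d
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if 0 in (r1, r2, c1, c2):
        return 0.0  # a degenerate table carries no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)
    p = math.erfc(math.sqrt(chi2 / 2.0))  # chi-squared survival function, 1 df
    return -math.log10(p) if p > 0 else float("inf")

def best_split(pairs):
    """Scan midpoints between distinct x values; return (logworth, cutpoint)."""
    total1 = sum(y for _, y in pairs)
    total0 = len(pairs) - total1
    best = (0.0, None)
    left1 = left0 = 0
    for i in range(len(pairs) - 1):
        x, y = pairs[i]
        left1 += y
        left0 += 1 - y
        nxt = pairs[i + 1][0]
        if nxt == x:
            continue  # cannot cut between tied values
        lw = logworth([[left1, left0], [total1 - left1, total0 - left0]])
        if lw > best[0]:
            best = (lw, (x + nxt) / 2.0)
    return best

def discretize(pairs, min_logworth=2.0):
    """Return sorted cutpoints for (x, y) pairs with y in {0, 1}."""
    pairs = sorted(pairs)
    lw, cut = best_split(pairs)
    if cut is None or lw < min_logworth:
        return []  # assumed stopping rule: no sufficiently significant split
    left = [p for p in pairs if p[0] < cut]
    right = [p for p in pairs if p[0] >= cut]
    return discretize(left, min_logworth) + [cut] + discretize(right, min_logworth)

# Example: the target switches from 0 to 1 between x = 19 and x = 20.
data = [(x, 1 if x >= 20 else 0) for x in range(40)]
cuts = discretize(data)
```

Each recursive call works only on the observations inside one interval, so later cutpoints refine earlier ones rather than replacing them.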
- Find more articles tagged with:
- chi-squared
- cutpoint
- cutset
- decision_tree
- discretization
- enterprise_miner
12-03-2012
11:58 AM
Thanks, Damien. It is very important not to introduce new errors into the problem one is trying to solve, since they may confound the results.
12-02-2012
03:32 PM
This is an informed answer. Thank you, Mr. Levine.

Upon reflection, I could also expand the categorical variable into its levels using GLM encoding, creating a binary indicator vector for each observation in which the indicator for the observed class level is set to 1 and all other indicators are set to 0. Then I could run a principal components analysis on the indicator variables and take the first principal component score, which would represent the projection along the axis of maximum variance and hence explanatory power. Regardless of technique, however, I would have to create a framework (did someone say "Write a SAS macro"?) to apply the technique to every categorical variable to be encoded. But this would not be a significant task.

A related question: if I use the target (dependent) variable in constructing the encoded representation of the categorical variable, am I introducing bias into the solution? Bias would distort the modeling results, and could come, for example, from dependencies in the data introduced by sampling. Perhaps using target information is not a recommended practice. What do we think about this in general?
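As a rough illustration of the indicator-plus-PCA idea, here is a minimal pure-Python sketch. It is not SAS code and not a tested recipe: the function names are hypothetical, and the first principal component is found by simple power iteration on the covariance matrix of the indicator columns, which is an assumption made to keep the example free of external libraries.

```python
import math

def one_hot(values):
    """GLM-style encoding: one 0/1 indicator column per class level."""
    levels = sorted(set(values))
    index = {lev: j for j, lev in enumerate(levels)}
    rows = [[1.0 if index[v] == j else 0.0 for j in range(len(levels))]
            for v in values]
    return levels, rows

def first_pc_scores(rows, iters=200):
    """Project each row onto the first principal component (power iteration)."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centered = [[r[j] - means[j] for j in range(k)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(k)]
           for a in range(k)]
    # Non-uniform start: the all-ones vector lies in the covariance null space,
    # because the indicator columns of every row sum to 1.
    v = [1.0 / (j + 1) for j in range(k)]
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # only one level present: nothing to project
        v = [x / norm for x in w]
    return [sum(c[j] * v[j] for j in range(k)) for c in centered]

values = ["A", "A", "A", "A", "B", "B", "C"]
levels, rows = one_hot(values)
scores = first_pc_scores(rows)  # one real value per observation
```

Every observation with the same class level receives the same score, so the encoding yields at most as many distinct values as there are levels. The sign of the scores is arbitrary, and when several levels are equally frequent the leading component is not unique.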
11-21-2012
11:25 AM
I would like to transform a categorically-valued predictor variable into a continuously-valued predictor variable, for example from character class values into real-valued representations of those values. I know that I can do this in several ways: by substituting the frequency of a level for the level value itself, or by computing the entropy of a level.

I want to generalize the interpretation of the Information Value of a variable from the binary "good/bad" classification frequently used in credit scoring to a multiclass 1-versus-many representation of the 1-of-N GLM encoding. For example, if there are 3 class values with labels 'A', 'B', 'C', I would compute the information value of each in turn versus the other two: 'A' vs ('B', 'C'), 'B' vs ('A', 'C') and 'C' vs ('A', 'B'). This would let me represent a multiclass categorical variable numerically as a single real-valued variable. I know that this technique produces only N distinct values, but I will be able to use existing code that works well on continuous-valued variables, and I do not know how to incorporate a GLM-encoded categorical variable into my work.

Is there a better way than Information Value to transform a categorical variable into a continuous variable? How does Enterprise Miner process categorical variables? Does EM convert a categorical variable into a real-valued variable and then use the real values in splitting a target variable?
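To make the two encodings concrete, here is a minimal Python sketch of frequency encoding and of one possible reading of the 1-versus-many Information Value. Several things are assumptions rather than part of the question: the target is taken to be binary (1 = good, 0 = bad), each level is compared with all other levels pooled as a two-bin variable, and a hypothetical smoothing constant `smooth` is added to the counts so that empty cells do not produce a logarithm of zero.

```python
import math
from collections import Counter

def frequency_encode(values):
    """Map each level to its relative frequency in the data."""
    counts = Counter(values)
    n = len(values)
    return {lev: c / n for lev, c in counts.items()}

def one_vs_rest_iv(values, target, smooth=0.5):
    """Information Value of the indicator (level vs. all others), per level."""
    goods = sum(target)
    bads = len(target) - goods
    good_by = Counter(v for v, t in zip(values, target) if t == 1)
    bad_by = Counter(v for v, t in zip(values, target) if t == 0)
    iv = {}
    for lev in set(values):
        bins = ((good_by[lev], bad_by[lev]),                  # the level itself
                (goods - good_by[lev], bads - bad_by[lev]))   # all other levels
        total = 0.0
        for g, b in bins:
            dist_good = (g + smooth) / (goods + 2 * smooth)
            dist_bad = (b + smooth) / (bads + 2 * smooth)
            total += (dist_good - dist_bad) * math.log(dist_good / dist_bad)
        iv[lev] = total
    return iv

values = list("AAAABBBBCCCC")
target = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
freq = frequency_encode(values)
iv = one_vs_rest_iv(values, target)
encoded = [iv[v] for v in values]  # single real-valued column
```

Note that Information Value is unsigned, so a level that is mostly "good" and a level that is equally mostly "bad" receive the same value under this reading; a signed quantity such as the level's weight of evidence would keep them apart.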
12-13-2011
09:45 AM
In mathematical notation, many vectors are n x 1 by definition, while IML vectors are 1 x n upon creation. If I want to use a vector as n x 1, I have to transpose it or "j" it into a vertical rather than horizontal orientation. Is there an IML option that I do not know about that would automatically create vectors as n x 1?