I am curious about how the logp values of different variables are calculated when building a Loss Given Default (LGD) model using a decision tree. In the examples I found online, the logp calculation is always explained with a binary target variable. In an LGD model, however, the target variable is continuous rather than binary. How does the logp calculation work when assigning "values" to variables for prediction in this case? Since there are no "good" and "bad" values of the target variable, the classification method and the logp calculation must be different.
I do not have a direct answer to your question, but here is what I know that may be relevant:
(1) Trees are naturally categorical. When they encounter continuous variables, they categorize them as they grow (i.e., during the model building process). After all, trees are chiefly classification tools, and they are less powerful at predicting a continuous outcome. For instance, in "Example 15.3 Creating a Regression Tree" in the documentation of the HPSPLIT procedure, which I mention in more detail below, the predicted outcome for each end node is simply the arithmetic mean of the target values of the observations in that node. Please note that while this resembles the rationale of linear regression, the continuous independent variables are still subject to categorization in tree-based models, which leads to a loss of information.
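To make the point about categorization and end-node means concrete, here is a minimal sketch in plain Python (not SAS). It searches for the cut point on one continuous predictor that best separates a continuous target, scores each candidate with a one-way ANOVA F statistic, and predicts the mean of each side. As far as I understand, a logworth-type measure for a continuous target would be -log10 of the p-value of such an F test, but that is my assumption and is not confirmed anywhere in this thread; the sketch stops at the F statistic to stay within the standard library. The function names `sse` and `best_split` and the loan-to-value / LGD figures are made up for illustration.

```python
from statistics import mean


def sse(values):
    """Sum of squared deviations of the values from their mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return the threshold on x that maximizes the F statistic for y.

    Hypothetical helper: needs at least 3 observations, because the
    F statistic for a two-branch split uses n - 2 degrees of freedom.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    total = sse([target for _, target in pairs])
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal x values
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        within = sse(left) + sse(right)   # variation left inside the branches
        between = total - within          # variation explained by the split
        f_stat = between / (within / (n - 2)) if within > 0 else float("inf")
        if best is None or f_stat > best["f_stat"]:
            best = {
                "threshold": (pairs[i - 1][0] + pairs[i][0]) / 2,
                "f_stat": f_stat,
                "left_mean": mean(left),    # prediction for x below the threshold
                "right_mean": mean(right),  # prediction for x above the threshold
            }
    return best


# Made-up data: loan-to-value ratio as the predictor, LGD as the target.
ltv = [0.50, 0.60, 0.70, 0.80, 0.90, 1.00]
lgd = [0.10, 0.12, 0.11, 0.40, 0.45, 0.42]

result = best_split(ltv, lgd)
print(result)
```

The continuous predictor ends up reduced to two categories (below and above the threshold), and every loan in a category receives the same predicted LGD, which is the loss of information mentioned above.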
(2) I happen to be reading a book on credit risk analysis, "Credit Risk Analytics: Measurement Techniques, Applications, and Examples in SAS" (Wiley ...). It discusses modeling LGD in SAS as well as building tree-based models in SAS Enterprise Miner, so you may want to see whether it has anything useful for your case. The book also covers more sophisticated aspects of credit risk modeling and the use of more advanced SAS procedures such as the QLIM procedure, which other books do not usually discuss.
(3) If you also have access to SAS itself rather than only SAS Enterprise Miner, take a look at the HPSPLIT procedure. It is a statistical procedure capable of building tree-based models. More information on PROC HPSPLIT can be found in the SAS documentation (SAS Help).