About jlh368

Tom · 06-02-2021

@jlh368 wrote: Thanks, that is a good direction to question. The SQL column is varchar(max). It is used to store multiple types of data with an associated key field. It looks like the field that is being returned was created as an integer when it should have been a character field of length 10. Instead it was stored as an integer, hence the max length standard integer being returned. Thanks!

So the cause seems to be the way the variable is defined in the remote database. The variable is being defined as length $32767 in SAS because that is the maximum length that SAS allows for a character variable. If you are using implicit pass-through, you can control the SAS type created by using the DBSASTYPE= data set option.

The second part of your message does not make any sense to me. If the variable is defined as an INTEGER in the remote database, then SAS will transfer it to a numeric variable (SAS stores all numbers as 8-byte floating point numbers).
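As a rough illustration of that last point, here is a small check written in Python rather than SAS. It assumes the 8-byte numbers behave like IEEE 754 doubles, as they do on most current platforms, and it uses 2147483647 (the largest 32-bit signed integer) as a stand-in for a 10-digit INTEGER value; both are assumptions, not details from the thread.

```python
# Assumption: an 8-byte SAS numeric behaves like an IEEE 754 double,
# which is what Python's float is on common platforms.
INT32_MAX = 2**31 - 1            # 2147483647, a 10-digit value (assumed example)
as_double = float(INT32_MAX)     # the same value as an 8-byte floating point number

# The round trip is exact, so nothing is lost by holding it as a number.
round_trip_exact = int(as_double) == INT32_MAX

# Integers stay exact in a double up to 2**53; just past that, neighbors collide.
LARGEST_EXACT = 2**53
collides_beyond_limit = float(LARGEST_EXACT) == float(LARGEST_EXACT + 1)

print(len(str(INT32_MAX)), round_trip_exact, collides_beyond_limit)
```

Under those assumptions, a remote INTEGER column has no reason to arrive as a $32767 character variable; that length points to a character-type column on the database side.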

jlh368 · 11-28-2017

Hi, Are you able to run the nodes in the other projects, or just view them? I had a similar issue, and it was caused by an EM license update that I hadn't applied. Larry

DougWielenga · 08-04-2017

Larry, First of all, kudos on reading the documentation! I will confess that I missed that detail in the documentation.

As a general rule, altering the posterior probabilities to be centered closer to the population values does not change the sort order of the observations. Also, the probability estimated by the model would likely be optimistic even if the data set was not oversampled, since the model is typically optimized on the data used to build/validate it. As a result, the best assessment of model performance comes from putting the model into use. SAS Model Manager is a product designed to monitor model performance over time and can perform retraining when the performance declines. Since models do not tend to perform as well in practice as they have on the training/validation data (e.g. because time has passed, market penetration has changed, economic pressures might be different, etc.), I would have had no issue in assigning prior probabilities and decision weights in the Input Data Source node and then including the Ensemble node later.

The probabilities themselves are not as much of a concern to me as the sort order of the resulting scored data. I have talked with one customer for whom the predicted probabilities themselves were quite important, but it is important to note that each observation in the data will either have the event or not in a binary target scenario. Probability only makes sense when looking at subgroups of observations. Since the adjustment for priors really impacts where the probabilities are centered, it is possible that some groups end up with probabilities higher than the adjustment suggested while other groups have probabilities that are lower. The Decisions node allows you to assign weights, which can then be multiplied by the probability of each event to determine which outcome is the most profitable (or least costly). In the end, these calculations are attempting to represent possible business goals.

I always recommend a more direct approach, however, where you set up the priors and decision weights in the Input Data Source so that they are available to the modeling nodes, but then focus on the sort order of the results, paying less attention to the computed probabilities or the 'Decision' unless the decision weights completely represent the business objective. When all is said and done, your mileage might differ, in which case you might consider trying both approaches (one specifying decision weights and priors in the Input Data Source node, and the other not specifying them at all prior to modeling) and then choose the approach which seems to perform best on your data.

I am doubtful that going through the extra work of setting up a Decisions node after each node, which the documentation could be interpreted to suggest, will be as good a use of your time as investigating more models. If you are intent on getting probabilities that have been adjusted overall to be more like the population, using the Decisions node after the modeling node is the only way to do that. Let me know what you think. Cordially, Doug
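The two ideas in this reply, that adjusting for priors recenters the probabilities without reordering the observations, and that decision weights are multiplied by the probabilities to pick the most profitable outcome, can be sketched in a few lines. This is a minimal Python sketch using the standard prior-correction formula for a binary target; it is not Enterprise Miner's actual score code, and the scores, the 50% sample rate, the 5% population rate, and the profit weights are all hypothetical values chosen for illustration.

```python
def adjust_for_priors(p_event, sample_rate, population_rate):
    """Rescale an event probability from an oversampled model to a population prior."""
    event = p_event * population_rate / sample_rate
    nonevent = (1.0 - p_event) * (1.0 - population_rate) / (1.0 - sample_rate)
    return event / (event + nonevent)


def best_decision(p_event, weights):
    """Return the decision with the highest expected profit.

    weights maps each decision to (profit if event, profit if non-event).
    """
    expected = {
        decision: p_event * w_event + (1.0 - p_event) * w_nonevent
        for decision, (w_event, w_nonevent) in weights.items()
    }
    return max(expected, key=expected.get), expected


# Hypothetical model scores from a sample with 50% events; population has 5%.
scores = [0.80, 0.55, 0.30, 0.10]
adjusted = [adjust_for_priors(p, 0.50, 0.05) for p in scores]

# Ranking observations by raw or adjusted probability gives the same order.
rank_raw = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
rank_adj = sorted(range(len(adjusted)), key=lambda i: adjusted[i], reverse=True)

# Hypothetical decision weights: contacting earns 10 on an event, costs 1 otherwise.
weights = {"contact": (10.0, -1.0), "skip": (0.0, 0.0)}
decisions = [best_decision(p, weights)[0] for p in adjusted]

print(rank_raw == rank_adj, [round(p, 3) for p in adjusted], decisions)
```

Because the adjustment is a monotone function of the raw probability, the ranking cannot change; only the decisions, which depend on where the probabilities are centered and on the weights, are affected.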

jlh368 · 08-02-2017

I took a deeper dive into the example listed above, and I realize there are many inputs that affect the score percentages. The change I had questioned below, the scoring percentages being closer to the original data set percentages, was the effect of the sample proportion. I adjusted the data partition percentages from 50/50 to 70/30 (train/validate) and noticed the change in the model. This change, in turn, affected the scoring proportions. I also saw the updated prior probabilities in the SAS score code node. In short, it was doing what it was supposed to do, and I learned a bit. Any suggestions on topics to follow up on from here?

jlh368 · 07-12-2017

Thank you!

Online Status	Offline
Date Last Visited	‎02-21-2024 12:48 PM

Re: SQL Passthrough creates character vars of length 32767

SQL Passthrough creates character vars of length 32767

Re: unknown error when opening new project - Sas enterprise miner

Re: Oversample and Score classification example

Re: Tip: How to model a rare target using an oversample approach in S...

SAS Enterprise Miner 14.1 Ensemble model with Decision Node

Oversample and Score classification example

Re: Enterprise miner - WARNING: Physical file does not exist

Enterprise miner - WARNING: Physical file does not exist

Tip: How to Apply Path Analysis in SAS® Enterprise Miner™ to Gain Insi...

Tip: How to model a rare target using an oversample approach in SAS® ...

10 SAS Enterprise Miner shortcuts you’ll want to keep handy

Re: ensemble models

Re: SAS Enterprise Miner 14.1 Ensemble model with Decision Node
