Predictive Modelling using SAS EM

Contributor
Posts: 41

Hi All,

I am currently constructing a predictive modelling data layer. Just wondering if a layer is allowed to have duplicate payments, i.e. the same id, but with the payment binned twice into two different bins. My initial thought is that a payment should be in one bucket or the other, not both, as duplicates would compromise the layer and any subsequent modelling built on it.

Thanks

Graham Rice

SAS Employee
Posts: 68

Re: Predictive Modelling using SAS EM

Posted in reply to gra_in_aus

Currently, binning forces each value into exactly one bin. With fuzzy clustering, you could instead assign probabilistic boundaries so that a value can span clusters.
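To make the contrast concrete, here is a minimal Python sketch (not SAS EM's own implementation). It assumes hypothetical bin edges and cluster centers, and uses the standard fuzzy c-means membership formula as one possible way to let a value belong partly to several clusters.

```python
from bisect import bisect_right

def hard_bin(value, edges):
    """Hard binning: return the index of the single bin containing value."""
    return bisect_right(edges, value)

def fuzzy_memberships(value, centers, m=2.0):
    """Fuzzy c-means style memberships of value in each cluster.

    Assumes distinct centers and fuzzifier m > 1; memberships sum to 1.
    """
    dists = [abs(value - c) for c in centers]
    if 0.0 in dists:
        # The value sits exactly on a center: full membership there.
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    power = 2.0 / (m - 1.0)
    weights = [d ** -power for d in dists]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical payment amount, bin edges, and cluster centers.
payment = 100.0
bin_id = hard_bin(payment, edges=[50.0, 500.0])           # exactly one bin
memberships = fuzzy_memberships(payment, [50.0, 150.0])   # shared across clusters
```

With hard binning the payment gets a single bin index, while the fuzzy version returns one weight per cluster, so no row has to be duplicated to express partial membership.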

Contributor
Posts: 41

Re: Predictive Modelling using SAS EM

Hi Jonathan,

Thanks for your reply... I don't think I explained my problem correctly... apologies. I am in the process of creating a predictive modeling data layer using SAS Base. I have been asked to bin the same payment into differing descriptive bins. My concern is that if a payment is binned into two differing bins, these duplicate payments will make the model and any results incorrect. I think this may be the case. However, I am not crash hot on predictive modeling using SAS EM.
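The concern can be illustrated with a small Python sketch. It assumes a hypothetical long-format layer in which each row is a payment id, an amount, and a bin label; the ids, amounts, and labels are made up for illustration only.

```python
from collections import Counter

# Hypothetical long-format layer: one row per (payment id, amount, bin label).
rows = [
    (101, 250.0, "mid_value"),
    (101, 250.0, "late_payment"),  # same payment placed in a second bin
    (102, 80.0, "low_value"),
    (103, 900.0, "high_value"),
]

# Ids that appear on more than one row, i.e. payments binned more than once.
id_counts = Counter(pid for pid, _, _ in rows)
duplicated_ids = sorted(pid for pid, n in id_counts.items() if n > 1)

# Summing over rows double counts the duplicated payment.
naive_total = sum(amount for _, amount, _ in rows)

# Keeping one amount per id gives the total actually paid.
amount_by_id = {pid: amount for pid, amount, _ in rows}
true_total = sum(amount_by_id.values())
```

Any row-level aggregate or model fitted on such a layer would weight payment 101 twice, which is the distortion described above.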

Thanks again

Graham Rice

SAS Employee
Posts: 122

Re: Predictive Modelling using SAS EM

Posted in reply to gra_in_aus

Graham,

Yes, you can definitely code the same payment values under differing binning criteria/schemas/rules/cuts/wishes. This is often seen with modelers using BASE. With EM, it is not unusual to see a modeler add a SAS Code node to run his or her custom coding on the same variables, alongside whatever EM is doing with those variables, to compare and test. The logic behind this is: while rules of thumb or general guidelines often apply, the 'best' cuts/bins are often determined by trial and error.

After coding the same payment variables into differing bucket variables, you should, though, expect them to be highly correlated. Depending on the specifics, sometimes you select one over the others; sometimes you combine them through PCA or factor analysis. The reality is that data, in your case the payment data, are typically NOT collected with any analytics in mind. The data are just ENTERED into your database, and you have to configure them to suit your models. The payment variable is like your foot: of course you should try different pairs of shoes to decide which one fits best.
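As a rough sketch of this idea in Python (rather than a SAS Code node), the example below codes each payment under two hypothetical binning schemes as separate bucket variables, keeping one row per id, and then checks how correlated the two variables are. The ids, amounts, edges, and variable names are assumptions for illustration.

```python
from bisect import bisect_right
from math import sqrt

# Hypothetical payments: (id, amount).
payments = [(1, 40.0), (2, 120.0), (3, 300.0), (4, 750.0), (5, 1500.0), (6, 90.0)]

# Two hypothetical binning schemes applied to the same amount.
coarse_edges = [100.0, 1000.0]
fine_edges = [50.0, 100.0, 250.0, 500.0, 1000.0]

def bin_index(amount, edges):
    """Return the index of the bin that contains amount."""
    return bisect_right(edges, amount)

# One row per payment id; each scheme becomes its own bucket variable.
layer = [
    {
        "id": pid,
        "amount": amt,
        "bin_coarse": bin_index(amt, coarse_edges),
        "bin_fine": bin_index(amt, fine_edges),
    }
    for pid, amt in payments
]

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

r = pearson([row["bin_coarse"] for row in layer],
            [row["bin_fine"] for row in layer])
```

Because each scheme is a column rather than an extra row, no payment is duplicated, and a high value of `r` signals that only one of the two bucket variables (or a combination of them) is likely to be needed in the model.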

Jason Xin
