Pregel
Calcite | Level 5

Hello all, 

 

I have a score from a binary classification model (the probability of some event of interest). The score was generated by an external provider. I don't have access to the underlying data or model logic, but I have seen the model metrics, which indicate that it is strong.

 

I want to augment this prediction with additional data attributes (that were not available to the external provider). I can see 3 ways of doing this:

1) Use the probability score as a direct input into a new model which also consumes the additional data (this can lead to instability)

2) Create a new model using the additional data and ensemble its probability score with that of the external model (this has the potential to hurt the overall prediction)

3) Use an incremental learning approach (similar to boosting): set the probability score of the external model as the baseline and then incrementally improve the fit using the additional data

 

Incremental learning is possible in Python with the xgboost package. I'm keen to understand whether it's possible using either SAS EM or base code. Has anyone attempted anything like this?
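To make option 3 concrete, here is a minimal sketch in plain Python rather than SAS or xgboost. It fixes the logit of the external score as an offset and fits a logistic regression on the additional data on top of it. The synthetic data, the function names and the learning-rate settings are all assumptions made for illustration; nothing here comes from the external provider's model.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p, eps=1e-6):
    # Clip so that scores of exactly 0 or 1 do not produce infinities.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def log_loss(y, p, eps=1e-12):
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total / len(y)

def fit_offset_logistic(X, y, offset, lr=0.1, n_iter=500):
    # Gradient descent on the log loss of sigmoid(offset + b + w.x).
    # The offset is held fixed; only w and b are learned.
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi, oi in zip(X, y, offset):
            err = sigmoid(oi + b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(k):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b

# Synthetic illustration (assumed data): the external score only "sees" z,
# while the outcome also depends on the additional attribute x.
random.seed(0)
X, y, ext_score = [], [], []
for _ in range(500):
    z, x = random.gauss(0, 1), random.gauss(0, 1)
    ext_score.append(sigmoid(1.5 * z))
    X.append([x])
    y.append(1 if random.random() < sigmoid(1.5 * z + 1.0 * x) else 0)

offset = [logit(p) for p in ext_score]   # external model as the baseline
w, b = fit_offset_logistic(X, y, offset)

combined = [sigmoid(o + b + sum(wj * xj for wj, xj in zip(w, xi)))
            for o, xi in zip(offset, X)]
baseline_loss = log_loss(y, ext_score)
combined_loss = log_loss(y, combined)
print("baseline log loss:", baseline_loss)
print("combined log loss:", combined_loss)
```

Because the fit starts at w = 0, b = 0, it starts exactly at the external model's predictions and only moves away where the additional data supports it. As far as I know, the xgboost equivalent is to pass the same logit as `base_margin`, and in SAS the `OFFSET=` option on the `MODEL` statement of PROC LOGISTIC may offer a comparable starting point for a linear version; treat both as pointers to check against the documentation rather than tested recipes.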

 

Many thanks,

Shane 

 

7 REPLIES
Reeza
Super User
Why isn't rebuilding a model from scratch, using the old scoring as a baseline, an option?
Pregel
Calcite | Level 5

Good question. Training a new model on the additional data attributes is an option. However, the external model was trained on a data source that is very different from the source of the additional data attributes. Given these diverse data sources, our hypothesis is that the figurative sum of both models would be better than either model on its own, hence the desire to combine them.

Reeza
Super User
You have the scoring rules/criteria though, i.e. you can apply it to a new data set?
Pregel
Calcite | Level 5

No, I don't have the logic, and the data sets are mutually exclusive.

Reeza
Super User
Well, if you don't have the logic/scoring code, you can't use it to score new data or to build future models and predictions, so what value would it add to your future process?
Pregel
Calcite | Level 5
I won’t have the scoring logic or underlying data but I will receive the score on a monthly subscription basis, hence the need for an incremental learning approach.
Reeza
Super User

I would use that as a variable in my model, but I’m also biased against using outsourced models where it’s a black box. I’m primarily in health care, though, where the stakes are people’s lives. I’d definitely be building my own models, and would probably use theirs as a predictor to see if I could reverse engineer it with my data and then build it out. Assuming that’s not against your terms of service.



 
