munitech4u
Quartz | Level 8

I ran gradient boosting in EM and want to apply the attached scoring code to a new dataset. How can I use it for scoring, given that it's written without setting a dataset and some of the variables it uses are not present in my data?

1 ACCEPTED SOLUTION

WendyCzika
SAS Employee

Just like with all data step score code from EM:

data /* name of data set containing scores */;
  set /* data that you want to score */;
  %inc "/app/sasdata/EBI_ADVANL/EM_Projects/churn/Workspaces/EMWS1/Boost/EMPUBLISHSCORE.sas" /* or paste in the score code */;
run;
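
As a concrete sketch of the template above, with the placeholders filled in: the data set names work.new_customers and work.scored are hypothetical and only for illustration, and the path is the one from the post.

```sas
/* Hypothetical names: work.new_customers is the data to score,
   work.scored receives the input columns plus the scores */
data work.scored;
  set work.new_customers;
  %include "/app/sasdata/EBI_ADVANL/EM_Projects/churn/Workspaces/EMWS1/Boost/EMPUBLISHSCORE.sas";
run;

/* Quick look at the first few scored rows */
proc print data=work.scored(obs=5);
run;
```

The score code runs once per observation read by the SET statement, which is why the file itself contains no DATA or SET statement.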


12 REPLIES
Reeza
Super User

You can't easily. How are you getting data in without a SET statement?

munitech4u
Quartz | Level 8

But my dataset does not have some of the variables that appear in the code. I am not sure how this code is generated.

Reeza
Super User

If you built a model that requires certain variables and you want to score with the same model you need those variables. If you don't have those variables, then either remove them and rebuild your model, OR change your model so it can handle missing values by including missing values for that variable in the training data.

munitech4u
Quartz | Level 8
Well, my input dataset does not have some of these variables, such as those starting with "_", but they are in the scoring code. That is what I am confused about.
Reeza
Super User

Did you build your model?

In the process did you create any new variables? SAS may have automatically named them. SAS may also be creating automatic variables required in your model, for intermediate steps.

Your input dataset needs to match the structure of your training data set. Same variables, same names, same types and same levels for categorical data. 
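
One way to check this before scoring is to compare the variable lists of the two data sets. This is only a sketch: work.train and work.to_score are hypothetical names standing in for your training data and the data you want to score.

```sas
/* Hypothetical names: work.train = training data, work.to_score = new data */
proc contents data=work.train out=work.train_vars(keep=name type) noprint;
run;

proc contents data=work.to_score out=work.score_vars(keep=name type) noprint;
run;

proc sql;
  /* Variables in the training data that are missing from the scoring data
     or have a different type (in PROC CONTENTS output, 1 = numeric, 2 = character) */
  select t.name,
         t.type as train_type,
         s.type as score_type
  from work.train_vars as t
       left join work.score_vars as s
         on upcase(t.name) = upcase(s.name)
  where s.name is null or t.type ne s.type;
quit;
```

The target variable will normally show up in this list, since new data usually does not have it; that is expected. Only the model inputs need to match.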

My suggestion would be to try and see what happens. 

@WendyCzika has shown the correct way to score a new dataset. 

munitech4u
Quartz | Level 8
I applied one oversampling step, but I think that should affect only the target variable and nothing else.
WendyCzika
SAS Employee

I'm guessing the _ variables that you mean, if they are not input variables, are created BY the scoring code - they don't need to be in the data you are scoring, so it should be fine.

munitech4u
Quartz | Level 8
If you look at the top of the code, you will notice:


********** LEAF 1 NODE 2467 ***************;
IF _ARB_BADF_ EQ 0 THEN DO;

Won't it throw an error?
WendyCzika
SAS Employee

It's defined above that:

_ARB_BADF_ = 0;
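
To illustrate why this works, here is a minimal sketch using the sample table sashelp.class. The names work.demo, _flag_ and ok are made up for the example and do not come from the EM score code.

```sas
data work.demo;
  set sashelp.class;            /* the input has no variable named _flag_ */
  _flag_ = 0;                   /* the assignment creates it inside the step */
  if _flag_ eq 0 then ok = 1;   /* so it can be tested right away */
run;
```

Any variable assigned in a DATA step becomes part of that step, whether or not it exists in the input data. Even a variable that is referenced but never assigned would only produce an "uninitialized" note in the log and a missing value, not an error.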

munitech4u
Quartz | Level 8
Oh, I overlooked that. Seems like I am good to go then. Thanks!
JasonXin
SAS Employee
Hi,

When we build EM flows, EM writes out SAS code in the background behind most nodes. When a model is built and its scoring code is generated, the code typically retains all the analytically relevant preprocessing that leads up to the final scoring equation. For example, if a variable transformation node is involved in the flow, all the transformations are retained automatically, including all the renames, what-ifs, and so on. There is also a separate SAS Score Code Node that helps.

One may ask: I transformed 800 variables and derived 1000 more, but only used 12 in the final model. Does the scoring code contain all of them? No, the scoring code contains only what survives into the final model. If you see both score code and optimized score code, make sure you pick the optimized one.

The exception is custom code that you insert with a SAS Code node: it is not copied over automatically. This approach also may not work for some methods, such as random forest, but I recall gradient boosting is fine.

Over the years I have seen EM users go back to the transformation node to pick up the code behind the scenes, study it, and improve their SAS programming that way. Although conventions like the leading _ appear a bit odd, the coding there is 'best'. Enjoy.

Jason Xin
