Anurag12
Fluorite | Level 6

Hello all,

 

I have a few questions about SAS Viya (Model Studio) and hope to get answers here.

 

1) Can we download score code from the Open Source Code node in Model Studio? Whether the answer is yes or no, what can be done with that score code?

 

2) Which node (Imputation, Filtering, or Replacement) is best for removing potential outliers from the dataset, and why?

 

3) What happens when we apply the 'Early Stopping' criteria in the Gradient Boosting node?

 

4) What happens if we score a holdout dataset in the Pipeline Comparison tab of Model Studio?

 

5) How do we create missing value indicators in the Imputation node?

 

 

Thank you in advance!

Accepted Solutions
chmedi
SAS Employee

Hello, in response to your questions:

 

  1. No, score code cannot be downloaded from the Open Source Code node in Viya 3.5 or Viya 4. In Viya 4, however, you can register a model (to Model Manager) from the Open Source Code node if the pipeline has only Python score code.
  2. The Replacement node is used to identify outlier observations using several methods (Std Deviations from the Mean, Absolute Deviations from the Median, Extreme percentiles) and then replace those outlier values with the calculated limit value, or missing.  The Anomaly Detection node is also used to identify anomalies (outliers), but this node by default removes those observations in the pipeline for training.
  3. From the documentation:  Early stopping takes advantage of the fact that boosting is an iterative process. This means that the prediction error can be measured on a validation data set at each iteration of the process. When the prediction error on the validation data meets specified criterion, the gradient boosting process stops, yielding a model that is less overtrained than if the boosting process were allowed to continue until completion.  https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=v_010&docsetId=casml&docsetTarget=casml... 
  4. You are prompted to select a CAS table, and the Champion model is used to score that table.  Those scoring results are then included in the pipeline comparison.
  5. In the properties for the Imputation node there is the Indicators section, where you can choose to generate a single indicator and/or unique indicators.  The single indicator is a count of the number of inputs that are missing for an observation.  For unique indicators, a binary missing indicator is generated for each input variable.
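
To make items 2, 3, and 5 more concrete, here is a rough plain-Python sketch of the ideas described above. It is not what Model Studio runs internally. The function names, the `M_` and `N_MISSING` column names, and the "stop after N non-improving iterations" rule are assumptions chosen for illustration only.

```python
from statistics import mean, stdev


def replace_outliers(values, n_std=3.0):
    """Item 2 (sketch): clip values beyond mean +/- n_std standard
    deviations to the computed limit; missing values (None) are left alone."""
    present = [v for v in values if v is not None]
    mu, sd = mean(present), stdev(present)
    lower, upper = mu - n_std * sd, mu + n_std * sd
    return [v if v is None else min(max(v, lower), upper) for v in values]


def missing_indicators(rows, inputs):
    """Item 5 (sketch): one binary flag per input ("unique" indicators)
    plus a per-row count of missing inputs ("single" indicator).
    The M_ / N_MISSING names are hypothetical."""
    out = []
    for row in rows:
        flags = {f"M_{name}": int(row.get(name) is None) for name in inputs}
        flags["N_MISSING"] = sum(flags.values())
        out.append(flags)
    return out


def early_stopping_iteration(validation_errors, stagnation=2):
    """Item 3 (sketch): return the 1-based iteration at which boosting stops,
    assuming the criterion is `stagnation` consecutive iterations without
    a new best validation error."""
    best = float("inf")
    since_best = 0
    for i, err in enumerate(validation_errors, start=1):
        if err < best:
            best, since_best = err, 0
        else:
            since_best += 1
            if since_best >= stagnation:
                return i
    return len(validation_errors)


if __name__ == "__main__":
    print(replace_outliers([1, 2, 3, 4, 100], n_std=1.0))
    print(missing_indicators([{"age": None, "income": 50}], ["age", "income"]))
    print(early_stopping_iteration([0.5, 0.4, 0.35, 0.36, 0.37, 0.30]))
```

In the last call the validation error stops improving after the third iteration, so under the assumed rule the loop stops at iteration 5 and never reaches the later, lower error. That is the trade-off early stopping makes in exchange for a less overtrained model and shorter training time.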

Anurag12
Fluorite | Level 6
Thanks for the reply.
