Solved: Doubts in SAS Model Studio/Data Mining and Machine Learning

Anurag12 · Posted 03-20-2021 07:01 AM

Hello all,

I have a few questions about SAS Viya (Model Studio) and am hoping to get answers here.

1) Can we download score code from the Open Source Code node in Model Studio? Whether the answer is yes or no, what can be done with that score code?

2) Which node is best for removing potential outliers from a dataset (Imputation, Filtering, or Replacement), and why?

3) What happens when we apply the 'Early Stopping' criteria in the Gradient Boosting node?

4) What happens if we score a holdout dataset in the Pipeline Comparison tab of Model Studio?

5) How do we create missing value indicators in the Imputation node?

Thank you in advance!

chmedi · Posted 03-24-2021 11:55 AM

Hello, in response to your questions:

1) No, score code cannot be downloaded from the Open Source Code node in Viya 3.5 or Viya 4. In Viya 4, however, you can register a model (to Model Manager) in the Open Source Code node if the pipeline has only Python score code.

2) The Replacement node identifies outlier observations using one of several methods (standard deviations from the mean, absolute deviations from the median, or extreme percentiles) and then replaces those outlier values with either the calculated limit value or missing. The Anomaly Detection node also identifies anomalies (outliers), but by default it removes those observations from the pipeline for training.

3) From the documentation: Early stopping takes advantage of the fact that boosting is an iterative process. This means that the prediction error can be measured on a validation data set at each iteration of the process. When the prediction error on the validation data meets specified criterion, the gradient boosting process stops, yielding a model that is less overtrained than if the boosting process were allowed to continue until completion. https://go.documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=v_010&docsetId=casml&docsetTarget=casml...

4) You are prompted to select a CAS table, and the Champion model is used to score that table. Those scoring results are then included in the pipeline comparison.

5) In the properties for the Imputation node there is an Indicators section, where you can choose to generate a single indicator, unique indicators, or both. The single indicator is a count of the number of inputs that are missing for an observation. With unique indicators, a binary missing indicator is generated for each input variable.
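
On the outlier question (2), it may help to see the Replacement idea as plain arithmetic. The following is a minimal Python sketch, not SAS code and not the node's actual implementation. The function names, the multiplier `k`, the percentile fraction `pct`, and the use of the median absolute deviation for the "absolute deviations from the median" method are all assumptions made for illustration.

```python
import statistics

def replacement_limits(values, method="std", k=3.0, pct=0.05):
    """Return (lower, upper) limits used to flag outliers.

    method "std": mean +/- k * sample standard deviation
    method "mad": median +/- k * median absolute deviation (assumed reading)
    method "pct": rank-based pct and (1 - pct) percentiles, no interpolation
    """
    data = sorted(v for v in values if v is not None)
    if method == "std":
        center = statistics.mean(data)
        spread = statistics.stdev(data)
        return center - k * spread, center + k * spread
    if method == "mad":
        center = statistics.median(data)
        spread = statistics.median([abs(v - center) for v in data])
        return center - k * spread, center + k * spread
    if method == "pct":
        lo_idx = int(pct * (len(data) - 1))
        hi_idx = len(data) - 1 - lo_idx
        return data[lo_idx], data[hi_idx]
    raise ValueError("unknown method: " + method)

def replace_outliers(values, lower, upper, with_missing=False):
    """Replace values outside [lower, upper] with the limit, or with None."""
    result = []
    for v in values:
        if v is None or lower <= v <= upper:
            result.append(v)          # keep in-range and already-missing values
        elif with_missing:
            result.append(None)       # outlier becomes missing
        else:
            result.append(lower if v < lower else upper)  # outlier becomes the limit
    return result

values = [10, 11, 12, 13, 14, 100]
lower, upper = replacement_limits(values, method="mad", k=3.0)
capped = replace_outliers(values, lower, upper)
blanked = replace_outliers(values, lower, upper, with_missing=True)
```

Note that the row count is unchanged in both cases: the outlying value is overwritten rather than the observation being dropped, which is the difference from the Anomaly Detection node's default behaviour described above.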
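
On the early stopping question (3), the logic quoted from the documentation can be sketched roughly as follows. This is a generic illustration in Python, not the Gradient Boosting node's algorithm. The stopping rule used here (stop after `patience` consecutive iterations without an improvement larger than `tolerance`) and both parameter names are assumptions; the node's own criterion is whatever its properties specify.

```python
def early_stopping_iteration(validation_errors, patience=3, tolerance=0.0):
    """Scan per-iteration validation errors and decide where boosting stops.

    Returns (stop_iteration, best_iteration), both 1-based.
    Boosting stops once `patience` iterations have passed since the best
    validation error was last improved by more than `tolerance`.
    """
    best_error = float("inf")
    best_iter = 0
    for i, err in enumerate(validation_errors, start=1):
        if err < best_error - tolerance:
            best_error = err      # new best validation error
            best_iter = i
        elif i - best_iter >= patience:
            return i, best_iter   # stagnated long enough: stop here
    # Ran through every iteration without triggering the stopping rule.
    return len(validation_errors), best_iter

# Validation error falls, then drifts upward as the model starts to overfit.
errors = [0.50, 0.40, 0.35, 0.36, 0.37, 0.38, 0.30]
stop_at, best_at = early_stopping_iteration(errors, patience=3)
```

In this made-up sequence the loop stops at iteration 6 and never reaches iteration 7, so the trade-off is visible: early stopping saves iterations and limits overtraining, but a later improvement can be missed if the stopping rule is too strict.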
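
On the missing indicator question (5), the two indicator types can be sketched in a few lines of Python. This is only an illustration of what the generated columns contain, not the Imputation node's code. The column names `M_<input>` and `N_MISSING` are hypothetical and will not match the names Model Studio generates.

```python
def add_missing_indicators(rows, inputs, single=True, unique=True):
    """Return copies of `rows` with missing-value indicator columns added.

    unique: one binary column per input (1 if that input is missing, else 0)
    single: one column holding the count of missing inputs for the row
    """
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows are not modified
        flags = {name: int(row.get(name) is None) for name in inputs}
        if unique:
            for name, flag in flags.items():
                new_row["M_" + name] = flag          # hypothetical column name
        if single:
            new_row["N_MISSING"] = sum(flags.values())  # hypothetical column name
        out.append(new_row)
    return out

rows = [
    {"age": 30, "income": None},
    {"age": None, "income": None},
]
flagged = add_missing_indicators(rows, inputs=["age", "income"])
```

The indicators are computed from the values as they were before imputation, so a model can still learn from the fact that a value was missing even after the gap has been filled.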

Anurag12 · Posted 03-27-2021 03:16 PM

Thanks for the reply.
