Watch this Ask the Expert session to learn the key benefits of using SAS® Visual Machine Learning and how SAS supports your migration to this platform.
Watch the Webinar
You will learn:
How easy it is to build machine learning pipelines and find good models in Model Studio.
About well-integrated SAS tools that cover the full analytics life cycle.
How intelligent automation combined with human oversight provides robust decision making.
Which tools can support your transition from SAS® Enterprise Miner™ to SAS Visual Machine Learning.
The questions from the Q&A segment held at the end of the webinar are listed below and the slides from the webinar are attached.
Q&A
How big is the dataset you are working with?
The data is very small. Maybe ~5-10 MB. It comes from an old training course. We used this data to relate to users of SAS Enterprise Miner.
Is there a limit on the number of observations used in the open source modeling nodes?
There is no strictly enforced limit. By default, the node samples the data down to 10,000 observations, but that is only an option, and you can tell it not to sample.
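To illustrate the "sample by default, but let the user turn it off" behavior described above, here is a minimal Python sketch. It assumes simple random sampling without replacement; the answer does not say which sampling method the node actually uses, and the function name `downsample` and its parameters are hypothetical.

```python
import random


def downsample(rows, max_rows=10_000, seed=42):
    """Return all rows if sampling is off or the data is small enough,
    otherwise a simple random sample of max_rows rows (assumed method)."""
    if max_rows is None or len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return rng.sample(rows, max_rows)


rows = list(range(25_000))                 # stand-in for 25,000 observations
sampled = downsample(rows)                 # default: capped at 10,000 rows
full = downsample(rows, max_rows=None)     # sampling turned off
print(len(sampled), len(full))
```

Passing `max_rows=None` plays the role of telling the node not to sample.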
Looking at the decision tree (and other classification objective functions), is there a way to configure a profit/loss matrix for misclassification? For example, in our use case false positives are very bad, while other outcomes are less costly.
We are working on adding support for defining a profit or cost matrix in the project to consider in the assessment.
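Until that support is available, the idea of a cost matrix can be sketched outside the product. The Python example below is not a SAS feature; it is a minimal illustration that assumes a binary target, a hypothetical cost table in which a false positive costs ten times as much as a false negative, and made-up scores. It picks the classification cutoff with the lowest total cost.

```python
# Hypothetical cost matrix keyed by (actual, predicted); values are illustrative.
COST = {
    (1, 1): 0.0,   # true positive
    (0, 0): 0.0,   # true negative
    (0, 1): 10.0,  # false positive: heavily penalized in this use case
    (1, 0): 1.0,   # false negative
}


def total_cost(actual, predicted, cost=COST):
    """Sum the cost of every (actual, predicted) pair."""
    return sum(cost[(a, p)] for a, p in zip(actual, predicted))


def best_threshold(actual, scores, thresholds, cost=COST):
    """Return the cutoff whose predictions have the lowest total cost."""
    def cost_at(threshold):
        predicted = [1 if s >= threshold else 0 for s in scores]
        return total_cost(actual, predicted, cost)
    return min(thresholds, key=cost_at)


actual = [0, 0, 1, 1, 0, 1]                 # made-up labels
scores = [0.2, 0.6, 0.7, 0.9, 0.4, 0.55]    # made-up predicted probabilities
print(best_threshold(actual, scores, [0.3, 0.5, 0.65]))
```

With these numbers the highest cutoff wins, because accepting one false negative is cheaper than accepting one false positive.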
On that overview slide showing the analytics life cycle, is all of that included in SAS Visual Machine Learning or do you have to buy multiple products to get all of that?
Everything you saw there comes as part of SAS Visual Machine Learning except for decisioning where you can pull in different analytics and models from different domains. That is an add-on piece. Now, we do offer these as a package. That would be the next level up, SAS Data Science Decisioning.
For open source code nodes, does Python need to be installed on CAS controller? CAS workers? SPRE?
Yes, Python must be installed somewhere. As currently implemented, the node runs on whatever machine is designated as the SAS compute server, which is essentially what you're referring to as SPRE. That machine has the SAS programming runtime environment, so Python must either be installed there or at least be accessible from there, for example through a mapped network drive, because we essentially issue the Python command with the code you provided as the script to run. So, the Python command must be able to run on that machine.
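One way to check the requirement above is to confirm that a Python command resolves and runs on the compute server machine. The sketch below uses only the Python standard library; the command name `python` is an assumption (your site may use `python3` or a full path), and the helper names are hypothetical.

```python
import shutil
import subprocess
import sys


def find_python(command="python"):
    """Return the full path of the command if it is on PATH, else None."""
    return shutil.which(command)


def python_version(executable):
    """Run '<executable> --version' and return the reported version string."""
    result = subprocess.run(
        [executable, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Python 3 prints the version to stdout; fall back to stderr just in case.
    return (result.stdout or result.stderr).strip()


# Fall back to the interpreter running this script if 'python' is not on PATH.
path = find_python() or sys.executable
print(path, python_version(path))
```

If `find_python` returns `None` on the compute server, the node would have no Python command to issue there.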
Follow up question on open source nodes. Is it possible to use different Python environments to ensure package versioning when necessary?
Yes. You can select from a set of Python environments that have been defined as accessible from the SAS Visual Machine Learning environment. We've had many customer requests for this for exactly that reason: different versions are needed, or users want to experiment with different versions. You can find it in the properties on the right.
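When switching between environments, it can help to have the script itself report which interpreter and package versions it is running with. This is a minimal standard-library sketch, not part of the product; the function name `environment_report` and the package names passed to it are illustrative assumptions.

```python
import sys
from importlib import metadata


def environment_report(packages):
    """Report the interpreter path, Python version, and installed versions
    of the named packages (None when a package is not installed)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {
        "executable": sys.executable,
        "python": sys.version.split()[0],
        "packages": versions,
    }


# Package names here are examples only; list whatever your model code needs.
report = environment_report(["numpy", "scikit-learn"])
print(report)
```

Running the same snippet under each selectable environment shows whether the expected versions are actually in effect.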
We sometimes have to leverage datasets that may range up to 30 million records. Can this handle it nimbly and how long will it take to run?
The environment overall can certainly handle data sets with many millions of records. How nimbly it handles them comes down to the algorithm you use for model training and whether that algorithm is designed for that number of records, so the answer is case by case and depends on exactly what you're doing. Loading that volume of data into the environment, visualizing it, and exploring it is no problem; the question is mostly about algorithm training.
Most of our data is in Oracle. Is there a suggested method of connection for optimization?
We have connectors written for all the different types of data sources, including Oracle, and they are written to optimize any data transfer needed. I believe the connector has different options you can set.
To use the SAS Visual Machine Learning environment shown in the demo, what do I need to have on my machine to try this on my own data?
We have a trials environment that's hosted at SAS, which is a great place to try this. Try SAS Viya.
Are there any YouTube videos with interactive examples?
We have an excellent SAS users YouTube channel: https://www.youtube.com/sasusers
You can try SAS Viya for free here: https://www.sas.com/en_us/trials/software/visual-data-science-decisioning/viya-trial-form.html
What about Teradata? Are there methods of direct pass-through to speed up the data feed, or is it better to have a SAS dataset (development sample) and run the models on the latter?
This is a response from my colleague Brian Kinnebrew:
PROC SQL can run inside Teradata (in-database) through implicit or explicit pass-through with SAS/ACCESS. For more information, see this internal link: SAS Help Center: SQL Pass-Through Facility Specifics for Teradata
Recommended Resources
SAS Visual Machine Learning Support
How Do I Move From SAS Enterprise Miner to SAS Viya?
SAS Visual Data Mining and Machine Learning Programming Guide
Getting Started with SAS Visual Data Mining and Machine Learning in Model Studio
Please see additional resources in the attached slide deck.
Want more tips? Be sure to subscribe to the Ask the Expert board to receive follow up Q&A, slides and recordings from other SAS Ask the Expert webinars.