Hello,
I am using SAS Viya Model Studio ML and DM for educational purposes. I use a data set that I partition into 70% training and 30% validation. I assume that if I build the project from scratch, the software randomly chooses different training and validation sets each time, so the results of, e.g., the Decision Tree will be slightly different every time. Is that right?
If so, is there a way to set a seed so that every time I create the project, the training and validation sets are the same?
One solution I have found is to add a binary partition variable to the data set so that the sets are the same every time, but I was wondering whether I can do this without the extra variable, via a seed. Setting a seed was possible in SAS EM.
Thanks in advance,
Andreas
Hi Andreas,
OK, I spoke with a colleague in R&D, and they confirmed that the seed for partitioning data within Model Studio 8.5 is a fixed value. It is the same value for each project you create and for each run of the data node.
You should be able to verify this by looking at summary statistics for each of the partitioned tables. They should be the same. If you are seeing behaviour which suggests that the partitioning of data is not consistent, then please contact Technical Support and provide some examples.
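As a rough illustration of that check, here is a minimal Python sketch that compares per-partition summary statistics from two runs. It is not Model Studio code: the rows are made-up stand-ins for exported partitioned tables, and the column names `_PartInd_` and `x` are assumptions. Matching statistics suggest the partitions are the same, although they do not strictly prove it.

```python
from statistics import mean

def partition_summary(rows, part_col="_PartInd_", value_col="x"):
    """Return {partition value: (row count, rounded mean of value_col)}."""
    groups = {}
    for row in rows:
        groups.setdefault(row[part_col], []).append(row[value_col])
    return {k: (len(v), round(mean(v), 6)) for k, v in groups.items()}

# Hypothetical exports of the partitioned table from two runs of the data node.
# Row order differs on purpose; only the partition assignment matters.
run_a = [
    {"_PartInd_": 1, "x": 2.0},
    {"_PartInd_": 1, "x": 4.0},
    {"_PartInd_": 0, "x": 9.0},
]
run_b = [
    {"_PartInd_": 0, "x": 9.0},
    {"_PartInd_": 1, "x": 4.0},
    {"_PartInd_": 1, "x": 2.0},
]

same_partitions = partition_summary(run_a) == partition_summary(run_b)
print(same_partitions)  # True
```

If the summaries differ between runs, that would be the kind of example worth sending to Technical Support.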
If you are seeing slightly different results for each run of the model, then the algorithms that underpin each modelling technique may have seed/starting values that are chosen at random or that can be specified by the user. The VDMML documentation may help. https://go.documentation.sas.com/?docsetId=casml&docsetTarget=casml_whatsnew_sect003.htm&docsetVersi...
You are correct that there is no option for the user within the Model Studio GUI to set the seed value. Your feedback has been passed on to R&D.
Cheers, Simon
Register today and join us virtually on June 16!
sasglobalforum.com | #SASGF
View now: on-demand content for SAS users
You may also find this recently published article helpful: https://communities.sas.com/t5/SAS-Communities-Library/SAS-Model-Studio-8-5-projects-and-considerati...
Hello Simon,
Thanks for your answer!
So do you agree that if I build the project from scratch, the software randomly chooses different training and validation sets each time, so the results of, e.g., the Decision Tree will be slightly different every time?
If so, is there a way to set a seed in the Model Studio GUI so that every time I create the project, the training and validation sets are the same?
Hi Andreas,
I received an update from my colleagues on this topic.
In essence, once you have created a Model Studio project that uses data 'x', your partitions will remain the same every time the data node is run.
However, if you create multiple projects that use the same set of data 'x', the partitions will differ across the projects.
If you are teaching students, each student has their own project, and you really want them to have identical partitions for data 'x', then use the program method I outline in the communities article: have each student run the proc partition with the same seed to create identical partitions.
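To illustrate why a shared seed gives every student the same split, here is a minimal Python sketch. It is plain Python rather than proc partition syntax, and the function name, the 70/30 split, and the seed value 12345 are assumptions chosen only for the example.

```python
import random

def add_partition_indicator(n_rows, train_fraction=0.7, seed=12345):
    """Return a list of 1 (training) / 0 (validation) flags, one per row."""
    rng = random.Random(seed)  # private generator with a fixed seed
    n_train = round(n_rows * train_fraction)
    flags = [1] * n_train + [0] * (n_rows - n_train)
    rng.shuffle(flags)  # the shuffle order is fully determined by the seed
    return flags

# Two "students" partition the same 10-row data set with the same seed.
student_a = add_partition_indicator(10)
student_b = add_partition_indicator(10)

print(student_a == student_b)  # True: same seed, same split
print(sum(student_a))          # 7 rows flagged as training
```

Changing the seed for one student would generally produce a different assignment, which mirrors the behaviour described above for separate projects.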
Sorry for any confusion. As mentioned before, we have passed on feedback to R&D asking for Model Studio users to be able to set the seed in the Model Studio GUI.
Thanks, Simon
Hi Simon!
Thanks for the update.
Another good idea to pass on to R&D is to make a seed available for the event-based sampling facility. I think that currently, every time you create a project, it samples the events and the non-events with a new seed, so the results won't be the same.
Thanks,
Andreas
Hi Andreas,
I will pass your feedback regarding the seed for event-based sampling on to R&D.
Cheers, Simon