andrea_magatti

Hi, community,

I'd like to describe two situations in which SAS Viya 3.5 behaves differently from SAS 9.4.

The cases:

  1. dataSciencePilot.featureMachine
  2. PROC PARTITION vs. PROC SURVEYSELECT

In the first case, I encountered strange behavior with a simple dataset of about 15k observations and around 120 features.

On the first run, on a machine with 512 GB of RAM and 80 cores, I added some date variables to the input list.

To my surprise, the action set consumed all the available RAM and swap, and the operating system (Red Hat) killed the CAS process. I then realized that the date variables were not useful for the model I was building, so I dropped them.

With that change, the process completed in 30 seconds, so I assume that something about the date variables (perhaps their distribution) causes the problem. My question is: why didn't Viya write any warning to the log?

 

Second case

While sampling a dataset for a t-SNE analysis, I first used PROC SURVEYSELECT for stratified sampling. By mistake, I added the dataset's logical key to the BY group.

The procedure log reported (correctly):

ERROR: The number of strata, 14551, is greater than the total sample size, 1456.

That's fine! I recognized my mistake, corrected it, and got the results I needed.
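To make the condition behind that SURVEYSELECT error concrete, here is a minimal sketch in Python (not SAS or CAS code) of a pre-check that compares the number of distinct strata with the total sample size before any sampling is attempted. The function name `check_strata`, its parameters, and the toy data are hypothetical, and the sketch does not reproduce SAS's actual allocation rules; it only shows why stratifying on a unique key fails: every row becomes its own stratum.

```python
def check_strata(rows, strata_key, total_size):
    """Hypothetical pre-check mirroring the SURVEYSELECT error message:
    fail fast if there are more strata than units in the total sample."""
    # Count distinct strata values; a unique logical key yields one stratum per row
    n_strata = len({strata_key(row) for row in rows})
    if n_strata > total_size:
        raise ValueError(
            f"The number of strata, {n_strata}, is greater than "
            f"the total sample size, {total_size}."
        )
    return n_strata, total_size


if __name__ == "__main__":
    # Toy data (hypothetical): 1000 rows with a unique id and a 5-level segment
    rows = [{"id": i, "segment": i % 5} for i in range(1000)]

    # Sensible stratification: 5 strata for a total sample of 100 units
    print(check_strata(rows, lambda r: r["segment"], 100))  # (5, 100)

    # The mistake: stratifying on the unique key gives 1000 strata
    try:
        check_strata(rows, lambda r: r["id"], 100)
    except ValueError as err:
        print(err)
```

A check like this costs one pass over the strata column, which is why failing early with a clear message is cheap compared with attempting the sampling.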

Then I tried the same (erroneous) approach with PROC PARTITION and got the same result as with dataSciencePilot.featureMachine: the action set consumed all the RAM and swap space, with no warning in the log.

Could you explain this behavior?

I appreciate any help you can provide.
