Rodgers_125
Obsidian | Level 7

Hi experts,


I want to apply some data mining techniques with SAS in a big data project.

I’m planning my methodology (a Gantt chart for the project) and I have some questions, because I don’t want to “kill” the SAS machine with a large amount of data to analyze:


1) Is it a good choice to divide the data into 3 data sets (training, test, and validation) in the big data tool? I usually use SAS Enterprise Miner on the target data.

2) Or should I select only a subset of my data, store it in SAS files, and then use SAS Enterprise Miner to create these 3 data sets?

What is the best option?

Thanks!

Accepted Solutions
RadhikhaMyneni
SAS Employee

You could use the HPA (High-Performance Analytics) nodes in Enterprise Miner for your data (800 GB). This also requires a cluster of machines or an MPP (Massively Parallel Processing) setup, so that the data can be distributed across them to perform the modeling computations, similar to what you are planning to do manually. To use HPA in an MPP setup in EM, you will need a SAS High-Performance Data Mining license. Here is a tip that introduces HPA and other SAS products that handle large data: SAS High-Performance Analytics tip #1: How it differs from SAS Grid & SAS In-Memory Analytics

 

If you want additional details about HPA in Enterprise Miner, continue reading subsequent tips in this series:

SAS High-Performance Analytics tip #2: HPDM nodes in SAS Enterprise Miner

SAS High-Performance Analytics tip #3: Example flow diagram in SAS Enterprise Miner

SAS High-Performance Analytics tip #4: Scoring with SAS Enterprise Miner

SAS High-Performance Analytics tip #5: Scoring with Analytic Store files

 

Hope this helps!

5 REPLIES
Reeza
Super User

How 'big' is your data?

 

The partitioning of datasets has nothing to do with data size; it's a methodological consideration.
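
To illustrate that the split is a property of the method rather than of the data volume, here is a minimal sketch in Python (not SAS or Enterprise Miner code). It assumes each record has a stable ID; the function names, the 60/20/20 proportions, and the choice of an MD5 hash are illustrative assumptions, not something stated in this thread.

```python
import hashlib


def assign_partition(record_id: str, train: float = 0.6, validation: float = 0.2) -> str:
    """Deterministically map a record ID to 'train', 'validation', or 'test'.

    The proportions are hypothetical defaults; whatever is left over
    after train + validation goes to the test set.
    """
    digest = hashlib.md5(record_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the hash as a number in [0, 1).
    u = int.from_bytes(digest[:8], "big") / 2**64
    if u < train:
        return "train"
    if u < train + validation:
        return "validation"
    return "test"


def partition_counts(ids):
    """Count how many IDs land in each partition, one ID at a time."""
    counts = {"train": 0, "validation": 0, "test": 0}
    for rid in ids:
        counts[assign_partition(str(rid))] += 1
    return counts
```

Because the assignment depends only on the ID, the same rule can be applied row by row in whatever upstream tool holds the data, and a record always lands in the same partition no matter how large the table is or how often the job is rerun.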

Rodgers_125
Obsidian | Level 7

About 800 GB.

Yes, but I'm afraid of putting all the data into SAS Enterprise Miner.

Reeza
Super User

At the end of the day it will depend on your setup. 

 

My guess is that's going to be too big 😩

Rodgers_125
Obsidian | Level 7

I guess I'll have to do some segmentation in the big data tool before I load the data sets into SAS. If I define rules there that split the data into clusters, each with a smaller amount of data, then I can use SAS Enterprise Miner. But in that case, I will have multiple diagrams in SAS Enterprise Miner... 😞
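
For the other idea raised in the question (taking only a subset of the data and storing it for Enterprise Miner), one common way to draw a fixed-size random sample from a source too large to hold in memory is reservoir sampling. The sketch below is plain Python, not a feature of SAS or of any particular big data tool; the function name, the seed, and the sample size are hypothetical.

```python
import random


def reservoir_sample(rows, k, seed=12345):
    """Return k rows chosen uniformly at random from an iterable of unknown length.

    Only k rows are kept in memory at any time (Algorithm R).
    If the source has fewer than k rows, all of them are returned.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Row i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row
    return sample
```

A single sample drawn this way would need only one diagram downstream, at the cost of modeling on a subset rather than on all 800 GB.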

