I am new to SAS EM. I have a dataset that contains over 3 million rows and around 100 variables. I would like to explore the distribution of one of these variables. But each time I run a histogram, it shows the distribution of a sample rather than of the whole dataset: it fetches only several thousand data points. Is there a property I should change in order to get a full picture of my dataset?
In an Enterprise Miner (Eminer) configuration, Eminer runs a workspace server behind the scenes.
You should also have Enterprise Guide (EGuide), which can access the same workspace server.
With that you can use basic SAS coding such as PROC MEANS or PROC UNIVARIATE. Eminer is based on the SEMMA approach.
There should be no limitations on the workspace server side (the usermod files are maintained by your SAS admin).
If there are limitations or restrictions, that is not good, as you will have to go into discussions to get your work done.
If you look at Eminer you will find a code and log tab. That is, you can write classic SAS code while you are running Eminer.
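As a rough sketch of what such classic SAS code could look like, the step below summarizes one variable over every row instead of a fetched sample. The libref `mylib`, the table `bigdata`, and the variable `my_var` are placeholders for illustration, not names from this thread; substitute your own.

```sas
/* Placeholder names: mylib.bigdata and my_var are assumptions for illustration. */
/* PROC MEANS reads every observation, so the statistics describe the full table. */
proc means data=mylib.bigdata n nmiss mean std min p25 median p75 max;
   var my_var;   /* the one variable whose distribution you want to summarize */
run;
```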
Eminer has a lot of documentation in the product's online help (this is from the 13.1 manual):
Setting the Sample Properties
Before creating graphs, you should sample the data set. Sampling reduces the processing time that is required to create the graphs and is especially important if you are creating graphs from a large data set. (Page 387 of the documentation.)
...
If you want to specify a custom fetch size (such as 50,000 observations) to be used in Explore windows during an Enterprise Miner session, you can use the EM_EXPLOREOBS_MAX macro variable to submit a statement via Program Manager or your start file:
%let EM_EXPLOREOBS_MAX=50000;
Do you want to consult a SAS platform admin? Check my profile (LinkedIn).
The site platfomadmin.com is owned by Paul Homes, who runs Metacoda.
There are parameters on the left side to adjust all kinds of options.
Are you comfortable with Enterprise Miner, or is this your first attempt at using it?
If it is your first time, please look at some demos and documented examples.
SAS Enterprise Miner (doc)
SAS Tutorials | SAS Training (Miner videos)
SAS Talks (full-screen samples)
There are several ways to get a bigger sample for exploration.
A short way to do it: on the main menu go to Options, then to Preferences (or you can press Ctrl+Shift+O). There is a menu called Interactive Sampling. Specify Sample Method as Random and Fetch Size as Max.
This will make all your explorations be based on a bigger sample. Note that the sample is only for visualization purposes; any model that you build will be based on your entire dataset, except for a few HP model nodes when they are not running in a distributed environment.
As Jaap recommends, one of the best ways to get started is to take a look at documents like Getting Started with SAS Enterprise Miner 13.1. Thanks for that link!
Good luck!
Miguel
Jaap, thank you for the links. I will be sure to spend some time studying them.
Miguel, I changed the Preferences, specifying Sample Method as Random and Fetch Size as Max. Now the number of fetched rows is 5,000. That is still far smaller than the entire dataset of 3M rows. I totally understand that the models will be based on the full dataset instead of the samples. But I am still curious whether there is a way to explore the entire dataset. Or is it just mission impossible in EM? Our EM is a Unix version, not a single-machine version. Will some configuration have to be done by the SAS administrator? Thank you!
Lychee,
Try this. Click on your data node (we often call this IDS for Input Data Source). Then on the menu on the left, click on the ellipsis for Variables, under the Columns submenu.
Once there, select a few variables and click Explore.
In my example I selected the target (late) and two other variables (ActualEllapsedTime and Airtime).
The window below opens. Then change Sample Method to Random and Fetch Size to Max. You should get a very high number of observations picked.
For my dataset I got 29,785 fetched rows. If this is still not enough for your exploration purposes, you can learn how to use proc means or proc univariate through the SAS Code node. It seems to me that the interactive explore mode in Enterprise Miner can only work with so many variables.
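A minimal sketch of that SAS Code node approach, assuming the node is connected directly after the data source so that the `&EM_IMPORT_DATA` macro variable resolves to the full imported table. `Airtime` is the variable from the example above; replace it with the variable you want to examine.

```sas
/* Intended for the Code Editor of a SAS Code node placed after the Input Data Source. */
/* Assumption: &EM_IMPORT_DATA resolves to the data set imported by this node.         */
proc univariate data=&EM_IMPORT_DATA;
   var Airtime;         /* variable whose distribution you want to see */
   histogram Airtime;   /* histogram built from all rows, not from a fetched sample */
run;
```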
I hope it helps,
Thanks,
Miguel
MiguelMaldonado, what is the total size of the dataset for which you got 29,785 fetched rows?
This dataset has a little more than 37 million rows. It is a very state-of-the-art machine, with High-Performance Data Mining enabled. Sorry, not trying to brag here.
HP Partition node log:

Stratification    Number of       Training        Validation
Variable          Observations    Observations    Observations
0                 30044537        21031915        9012622
1                  7015680         4911730        2103950
What Jaap means is that you can submit project start code in a menu like the one below. Click on the name of your project, then on the ellipsis for Project Start Code. I could not override the 29K fetched rows though... not sure what I am missing.
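A minimal sketch of what that Project Start Code could contain, using the macro variable quoted from the documentation earlier in the thread. The value 50000 is the documentation's own example; the `%put` line is an optional addition that only echoes the setting to the log so you can check it was picked up.

```sas
/* Entered in the Project Start Code editor described above. */
%let EM_EXPLOREOBS_MAX=50000;   /* custom fetch size for Explore windows */

/* Optional check (an addition, not from the documentation): write the value to the log. */
%put NOTE: EM_EXPLOREOBS_MAX=&EM_EXPLOREOBS_MAX;
```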
Bear with me a couple hours and I will send you an example of how to run proc univariate or proc means in a SAS Code Node.
Later,
Miguel
Jaap, I have been using SAS EM for one week. Could you clarify several terms mentioned above? Does Eminer mean Enterprise Miner? Is EGuide the same as Enterprise Guide? When you mentioned using a macro variable submitted via Program Manager or the start file, is this something I should do, or is it configuration our SAS administrator should do on the server side?
Yes, I used abbreviations:
- Enterprise Guide -> Eguide
- Enterprise Miner -> Eminer
Miguel has answered how to override Eminer defaults by using SAS macros.
These are your own settings to decide on and your own code options to manage.
The Eminer project is built up in a rather complex folder structure on the OS.
All nodes create datasets (SAS libraries), sometimes other types (logs/output), and on rare occasions SAS catalogs.
Imagine what happens when a sample is made from another dataset: it really makes a copy of the data.
In an environment like Miguel's this can be better optimized, which shifts one's sense of sizing and numbers.
This is my first time on this site. I can't believe that I learned so much. Thank you very much, Jaap.
Lychee,
I've been in your shoes. I taught myself Enterprise Miner back in the day, and I can guarantee that the one doc that will help you dominate the learning curve is SAS Enterprise Miner 13.1: Reference Help (it is in Jaap's links from earlier today). Put some time into it, and it will really pay off.
If you don't have a copy yet, talk to your SAS rep ASAP. You really need this book...
An extract of that book below to clarify the Project Start Code.
Good luck!
Miguel
By setting %let EM_EXPLOREOBS_MAX=4000000, I am able to pull the entire dataset into the Explore window. Thank you so so so... much!!!!