SAS Enterprise Miner shortcut: Use the Advanced option when profiling data

by Community Manager 3 weeks ago (1,212 Views)

Thank you to Philip Easterling, SAS Principal Systems Engineer, for contributing this tip as part of the SAS Enterprise Miner shortcut series. In his words:


Here’s a simple one which is sometimes overlooked because it’s not the SAS Enterprise Miner (EM) default. 


When you go through the “Create Data Source” wizard, you encounter the properties screen, which has two options: Basic and Advanced. Basic is the default and, from my observations, most people just go with it. However, I always switch to the Advanced option because it provides a convenient way to profile the data.


After completing the “Create Data Source” wizard steps and adding the data to an EM diagram as an Input Data Source node, I select the “Edit Variables” option for the node. When the variable list opens in a window, I check the “Statistics” box at the top right. This displays summary statistics for both the categorical and numeric variables, including the percent of rows with missing values for each variable.


You can click on a column heading (a statistic) to sort by ascending or descending values of that statistic. So, I can sort the missing values column to get an idea of which of my variables have a large percentage of missing values. Similarly, I can “profile” any other variable by looking at its values for any of the reported statistics. This reduces the need to switch to another tool, such as SAS Enterprise Guide, to profile the data.
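
To make the missing-values statistic concrete, here is a minimal Python sketch of the same idea outside EM: compute the percent of rows with a missing value for each variable, then sort by that percentage. It is an illustration only, not how EM computes its statistics. The function names (`percent_missing`, `sort_by_missing`) and the sample rows are hypothetical, and missing values are assumed to be represented as `None`.

```python
from typing import Dict, List, Optional, Tuple

# One row of data: variable name -> value, with None standing in for "missing".
Row = Dict[str, Optional[object]]


def percent_missing(rows: List[Row]) -> Dict[str, float]:
    """Return, for each variable, the percent of rows where it is missing."""
    if not rows:
        return {}
    # Collect variable names in first-seen order.
    names: List[str] = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)
    total = len(rows)
    return {
        name: 100.0 * sum(1 for row in rows if row.get(name) is None) / total
        for name in names
    }


def sort_by_missing(
    stats: Dict[str, float], descending: bool = True
) -> List[Tuple[str, float]]:
    """Sort variables by percent missing, like clicking the column heading."""
    return sorted(stats.items(), key=lambda item: item[1], reverse=descending)


# Hypothetical sample data, invented for this illustration.
rows: List[Row] = [
    {"age": 34, "income": None, "region": "East"},
    {"age": None, "income": None, "region": "West"},
    {"age": 51, "income": 72000, "region": "East"},
    {"age": 29, "income": None, "region": None},
]

for name, pct in sort_by_missing(percent_missing(rows)):
    print(f"{name}: {pct:.1f}% missing")
```

Sorting in descending order puts the variables with the most missing values at the top, which is the same view you get by sorting the missing-values column in the EM variable list.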


You can also profile the data during the “Create Data Source” process itself by checking the “Statistics” box at the top right of the variable list screen, which is the next screen after the Basic/Advanced selection step. If you stay with the default Basic option, the “Statistics” box is always grayed out and unavailable when you later choose “Edit Variables” in EM, so you miss out on this convenient data profiling method.



