
Tip: Defining Global Metadata for SAS® Enterprise Miner™ Projects

Started 05-15-2015
Modified 10-06-2015

SAS Enterprise Miner relies heavily on column metadata (information about your variables) in its data mining process.  SAS Enterprise Miner can create this metadata automatically: the Enterprise Miner Advisor in the Data Source Wizard assigns variable attributes such as role and level.  You can override these automatic settings either by editing the column metadata table that is displayed in the Wizard or by supplying SAS DATA step code in the Column Metadata step, such as:

 

/* Use the following code to change the variable role */
if level="INTERVAL" then role="INPUT";
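To see what a rule like this does before using it in the Wizard, you can try it on a small stand-in table. The following is a minimal sketch that runs in any SAS session. WORK.MOCK_CM and its three rows are made up for illustration, and the only assumption about the real column metadata table is that it has NAME, ROLE, and LEVEL columns, as used in this article. In the Wizard itself you would enter only the IF statement, not the surrounding DATA step.

```sas
/* Hypothetical stand-in for a column metadata table
   (not a table created by SAS Enterprise Miner) */
data work.mock_cm;
  length NAME $32 ROLE LEVEL $8;
  NAME="Horsepower"; ROLE="REJECTED"; LEVEL="INTERVAL"; output;
  NAME="Origin";     ROLE="REJECTED"; LEVEL="NOMINAL";  output;
  NAME="Weight";     ROLE="INPUT";    LEVEL="INTERVAL"; output;
run;

/* Apply the same rule shown above: every interval variable becomes an input */
data work.mock_cm_after;
  set work.mock_cm;
  if LEVEL="INTERVAL" then ROLE="INPUT";
run;

/* Compare the roles before and after the rule */
proc print data=work.mock_cm noobs; run;
proc print data=work.mock_cm_after noobs; run;
```

In this mock table, only Horsepower changes role, because it is the only interval variable that was not already an input.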

 

But what if other projects or users need access to this data source with the same column metadata?  Or what if you want to apply the same column metadata to variables with the same name in different data sets?  Here are three ways you can define global metadata that can be used across projects, users, and/or data sets:

 

  • If you want to be able to share the data source and its column metadata for a particular data set across projects or users, use the %EMDS macro to create a global data source definition.
  • If you want attributes such as role and level to be set in various data sources for variables with a certain name, define a columnmeta data set in the EMGMETA library.
  • If you want attributes such as role and level to be set in various data sources for all variables with a certain prefix, suffix, or string in their name, create a metacode.sas file in the folder corresponding to the EMGMETA library.

 

We will look at each of these in detail.

 

Use the %EMDS macro to create a global data source definition

 

When you use the Data Source Wizard to create a data source, that data source is available only to the current project.  Alternatively, you can run the %EMDS macro to create a data source definition for data saved to a shared SAS Enterprise Miner Global Data Sources (EMGDS) library so that other users and projects can also access it.  The following example code runs the %EMDS macro to create the data source definition and then uses DATA step code to modify the metadata, much as you would in the Data Source Wizard.

 

Note that this code must be run in a SAS session outside of SAS Enterprise Miner and before opening SAS Enterprise Miner.

 

libname EMGDS '/* put the path to your shared global data sources folder here */';

/* Create your own copy of SASHELP.CARS in your EMGDS library */
data EMGDS.MyCars;
  set sashelp.cars;
run;

/* Initialize the EMDS macro */
filename code catalog "sashelp.emutil.em_loadutilmacros.source";
%include code;

/* Create the data source and the initial column metadata table, assigning only the target variable */
%emds(data=emgds.MyCars,target=MPG_City);

/* Update the column metadata table, which has the _cm suffix, with other changes */
data emgds.MyCars_cm;
  set emgds.MyCars_cm;
  /* Any variable with the MPG_ prefix that is not the target will be rejected */
  if substr(NAME, 1, 4)="MPG_" and ROLE ne "TARGET" then ROLE="REJECTED";
run;

/* Cleanup: deassign the fileref */
filename code;
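Before opening SAS Enterprise Miner, it can help to confirm what the DATA step changed. The following is a minimal sketch to run in the same SAS session, assuming the _cm table contains NAME, ROLE, and LEVEL columns (NAME and ROLE are used in the code above; LEVEL is an assumption based on the attributes discussed in this article).

```sas
/* List each variable with its assigned attributes */
proc print data=emgds.MyCars_cm noobs;
  var NAME ROLE LEVEL;
run;

/* Count how many variables ended up in each role */
proc freq data=emgds.MyCars_cm;
  tables ROLE / nocum nopercent;
run;
```

If the code above ran as intended, MPG_City should be listed as TARGET and MPG_Highway as REJECTED.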

 

After running this code in your SAS session, open SAS Enterprise Miner, include the same LIBNAME statement for the EMGDS library in your Project Start Code, and run it.

 

libname EMGDS '/* put the path to your shared global data sources folder here */';

 

Now you will see MyCars listed in the Data Sources folder of the Project Panel.  Any other users and/or projects using this same EMGDS library will pick up this data source as well.  Any modifications you make to the metadata within the SAS Enterprise Miner project will be local to this project (saved to the SAS Enterprise Miner Local Data Source [EMLDS] library that is unique to your current project) and will not affect other users or projects.

 

Define a columnmeta data set in the EMGMETA library

 

You might have several data sets that contain the same variables, and you want these variables to have the same metadata attributes across data sets.  You can accomplish this by defining an Enterprise Miner Global Metadata (EMGMETA) library to store your global metadata definitions.  In this library, you can create global column metadata to share between users and projects.  The following example code shows how you can create a columnmeta data set in your EMGMETA library folder to set metadata attributes for certain variables.  In this case, the variable AGE is assigned a role of REJECTED and a level of INTERVAL.

 

Again, the code below must be run in a SAS session outside of SAS Enterprise Miner.

 

libname EMGMETA '/* put the path to your shared global metadata folder here */';

data emgmeta.columnmeta;
  length name $32 level role $8;
  name="age"; role="REJECTED"; level="INTERVAL"; output;
  name="Age"; role="REJECTED"; level="INTERVAL"; output;
  name="AGE"; role="REJECTED"; level="INTERVAL"; output;
run;
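The example above lists three capitalizations of the same name, which suggests that names are matched exactly. If you have more than a few variables, typing every variant by hand gets tedious. The following is a minimal sketch that generates the lowercase, proper-case, and uppercase variants in a loop. The second variable, "income", is hypothetical and is included only for illustration, and the output goes to a WORK data set so that an existing emgmeta.columnmeta is not overwritten while you experiment.

```sas
/* Sketch: build three capitalizations for each name in a list.
   "income" is a hypothetical variable added for illustration. */
data work.columnmeta_sketch;
  length name base $32 level role $8;
  role="REJECTED"; level="INTERVAL";
  do base="age", "income";
    name=lowcase(base);  output;  /* age, income */
    name=propcase(base); output;  /* Age, Income */
    name=upcase(base);   output;  /* AGE, INCOME */
  end;
  drop base;
run;

/* Inspect the result before copying it to emgmeta.columnmeta */
proc print data=work.columnmeta_sketch noobs; run;
```

Declaring BASE with an explicit length matters here: without it, SAS would size BASE from the first value in the list and truncate longer names.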

 

After running this code in your SAS session, open SAS Enterprise Miner, include the same LIBNAME statement for the EMGMETA library in your Project Start Code, and run it.

 

libname EMGMETA '/* put the path to your shared global metadata folder here */';

 

Now the variable AGE (with various capitalizations) is assigned the role of REJECTED and the level of INTERVAL whenever you create a new data source that contains this variable. Any modifications you make to the metadata within the SAS Enterprise Miner project will be local to this project (saved to the SAS Enterprise Miner Local Metadata [EMLMETA] library that is unique to your current project) and will not affect other users or projects.

 

Create a metacode.sas file in the folder corresponding to the EMGMETA library

 

A new feature in SAS Enterprise Miner 13.2 offers even more flexibility for defining global metadata. Suppose you have variables whose names all have a certain prefix or contain a certain string, and you want to set metadata attributes for these variables without having to list them all individually.  You can do this with SAS code in the Column Metadata step of the Data Source Wizard, or with a DATA step like the one that updates emgds.MyCars_cm in the %EMDS example above.  However, if you have multiple data sets that contain these variables, it can be burdensome to include that code each time a new data source is created.  As an alternative, you can put this code into a metacode.sas file in the folder that maps to your EMGMETA library so that it is automatically applied to all data sources that are created.

 

As above, you need to define the EMGMETA library in your SAS Enterprise Miner Project Start Code by including the following LIBNAME statement and running the code:

libname EMGMETA '/* put the path to your shared global metadata folder here */';

 

 

Next, you need to create a metacode.sas file in the folder corresponding to the EMGMETA library that contains the DATA step statements you want to execute, similar to the code you would include in the Data Source Wizard for defining column metadata.  For example:

 

      if index(upcase(name), 'DE')=1 then ROLE='REJECTED';
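Because metacode.sas is applied to every new data source, it is worth trying the rules on a small table first. The following is a minimal sketch that runs in any SAS session and shows one way to write a prefix, a suffix, and a contains rule. The table WORK.MOCK_NAMES, its variable names, and the _FLAG and TMP patterns are hypothetical. Only the IF statements would go into metacode.sas, not the surrounding DATA steps.

```sas
/* Hypothetical variable names for a dry run */
data work.mock_names;
  length NAME $32 ROLE $8;
  ROLE="INPUT";
  NAME="DemAge";      output;  /* starts with DE */
  NAME="Income";      output;  /* matches no rule */
  NAME="Cust_Flag";   output;  /* ends with _FLAG */
  NAME="Old_Tmp_Bal"; output;  /* contains TMP */
run;

data work.mock_names_after;
  set work.mock_names;
  /* Prefix: name starts with DE (the rule from the example above) */
  if index(upcase(NAME), 'DE')=1 then ROLE='REJECTED';
  /* Suffix: name ends with _FLAG; \s* allows for trailing blanks */
  if prxmatch('/_FLAG\s*$/i', NAME) > 0 then ROLE='REJECTED';
  /* Contains: TMP appears anywhere in the name */
  if index(upcase(NAME), 'TMP') > 0 then ROLE='REJECTED';
run;

proc print data=work.mock_names_after noobs; run;
```

In this mock table, Income is the only variable that keeps the INPUT role.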

 

Now when you create a new data source, all variables with the prefix “DE” in their name will be assigned a role of REJECTED.  Any other project or user that creates an EMGMETA library pointing to the same folder will also pick up this column metadata when defining a data source.

 

Note that you can also place the metacode.sas file in the Enterprise Miner Local Metadata (EMLMETA) folder of your project if you want this column metadata at an individual project level instead of being shared across projects.

 

The three methods for defining global column metadata discussed here provide you with an easy way to create metadata definitions that can be used in data sources across projects and users, and even across different data sets that have variables with the same name.  The benefit of these methods is that you only need to create the data source or the metadata definition one time, and it will automatically be accessible to all other projects and users that have the same global data source or global metadata library defined.  Additionally, the ability to use DATA step code with IF statements gives you a powerful way to set column attributes for many variables at once.

Comments
This helps a lot when customers have a long list of predefined column features that they want to force into EM from sources such as the Hive metadata store. One thing we have found is that we should be careful about scope. Some are still getting used to what the word GLOBAL means here. But so far there are more benefits than harms. Thanks, Jason Xin