Tip: How to create data sources from imported files and share them across multiple projects

by SAS Employee jakanj on ‎10-23-2014 01:01 PM - edited on ‎01-23-2017 11:19 AM by Community Manager (18,396 Views)

Have you used the File Import node to import external files in Enterprise Miner?  If so, you may have wondered how to create a data source and share it across projects without the manual intervention of copying a table and creating a data source. In this tip we will see how to use the Save Data node to save a SAS data set and use the SAS Enterprise Miner Global Data Sources library to create and share data sources across multiple projects. Let’s explore this topic now.

 

Data

We are going to import the soil.csv file, which contains African soil property data, into Enterprise Miner and then create Soil_Train as a data source. As a prerequisite to the steps in this tip, download the attached file soil.csv to a location you can access on the file system.

 

Import file in Enterprise Miner using File Import node

We are using the File Import node to import soil.csv into Enterprise Miner. You can also import Microsoft Excel, SAS JMP, SPSS, Stata, Tab-Delimited, Paradox and dBASE files using the File Import node. The File Import node is located on the Sample tab of the SAS Enterprise Miner toolbar.

  1. Launch EM then create a project and diagram.
  2. Place the File Import node on the diagram.
  3. Make the following changes in the File Import property panel:
    1. Set Advanced Advisor = Yes to use the Advanced Advisor to configure additional metadata properties.
    2. Click the ellipsis button for Import File to open the File Import window.
      1. Click Browse in the File Import window to navigate to the location of the soil.csv file.
      2. Select the soil.csv file and click the Open button in the Open window.
      3. Click OK in the File Import window to import the file.
  4. Run the File Import node.
  5. In the File Import node properties panel, click the ellipsis button in the Variables property to open the Variables window.
  6. Verify that all of the variables from the soil.csv file have been imported properly and have been assigned the correct role and level by the Advanced Advisor.
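
If you want to cross-check what the node imported, the following is a minimal sketch of reading the same CSV file with PROC IMPORT in the Program Editor. It is not what the File Import node runs internally; the file path and the output table name work.soil_check are assumptions made for illustration, so adjust them to your environment.

```sas
/* Hypothetical location: change this to wherever you saved soil.csv */
filename soilcsv 'C:\EMTip\soil.csv';

/* Read the CSV file into a temporary table (work.soil_check is an assumed name) */
proc import datafile=soilcsv
            out=work.soil_check
            dbms=csv
            replace;
    getnames=yes;        /* first row contains the variable names */
    guessingrows=1000;   /* scan more rows before choosing variable types */
run;

/* List the variables in file order so they can be compared with the Variables window */
proc contents data=work.soil_check varnum;
run;
```

Comparing the PROC CONTENTS listing with the Variables window is a quick way to spot a column that was read as character when you expected numeric.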

To use this SAS data set in a different diagram, you have to follow multiple steps. First you have to use the operating system to copy the imported data set (a .sas7bdat file such as fimport_train.sas7bdat) out of the project directory. Then you need to define a SAS library and manually create a data source that can be used in a different diagram. There is an easier way to accomplish this.

 

Save SAS data set using the Save Data node

The Save Data node is located on the Utility tab. Now, drag the Save Data node to the diagram. Connect the File Import node to the Save Data node as shown below.

flow.PNG

The Save Data node can export JMP, Excel 2010, CSV, and tab-delimited files. We are using this node to save a SAS data set in a pre-existing SAS library. To accomplish this, we first need to assign the library in the Project Start Code editor.

  1. To save the exported data as a SAS data set, you need to define a SAS library. To define the SAS library, follow these steps:

         a.  Make sure that the C:\EMTip\SoilData folder exists on the file system.

         b.  In the EM project panel, select your project.

         c.  In the properties panel, click the ellipsis button in the Project Start Code property to open the Project Start Code window.

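         d.  In the Project Start Code window, enter libname SoilData 'C:\EMTip\SoilData'; and click Run Now. This creates the library SoilData, which saves files in the C:\EMTip\SoilData directory. Use straight quotation marks; curly quotation marks pasted from a web page cause a syntax error.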

  2. Select the Save Data node and adjust the following properties.

    1. In the Filename Prefix property, enter Soil.
    2. In the File Format property, ensure SAS (.sas7bdat) is selected.
    3. Click the ellipsis button in the SAS Library Name property to open the Select a SAS Library window. Select SoilData and click OK.

  3. Run the Save Data node.

  4. In the operating system file browser, navigate to the C:\EMTip\SoilData directory and verify that the SAS data set soil_train was created successfully. Remember that this directory is located on the server rather than on the client.
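
As an alternative to checking the folder in the file browser, here is a minimal sketch of verifying the saved data set from the Program Editor. It assumes the SoilData library and the soil_train data set were created exactly as described in the steps above.

```sas
/* Same library definition as in the Project Start Code */
libname SoilData 'C:\EMTip\SoilData';

/* The LIBREF function returns 0 when the libref is assigned */
%put NOTE: SoilData libref status = %sysfunc(libref(SoilData));

/* List the data sets in the library and describe soil_train */
proc datasets library=SoilData memtype=data;
    contents data=soil_train;
run;
quit;
```

If soil_train does not appear in the listing, check the Filename Prefix and SAS Library Name properties of the Save Data node and rerun it.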

To use this data set in the project, you need to define the library SoilData to read the soil_train SAS data set and create the data source manually. If you want to use this data source in a different project, you have to create a new data source in the new project. To eliminate these manual steps, you can use the SAS Enterprise Miner Global Data Sources library.

 

Using SAS Enterprise Miner Global Data Sources Library to Share Data Source Definitions between Multiple Projects

You can define the EMGDS (SAS Enterprise Miner Global Data Sources) library in your project startup code.

 

   1. To define the EMGDS library, open the Project Start Code window and enter the following code. Make sure that the Global_Datasources folder exists on the file system.

 

libname EMGDS 'C:\Global_Datasources';

 

   2. Click Run Now. This creates the SAS Enterprise Miner Global Data Sources library.

   3. In the project panel, select View -> Program Editor to open the program editor.

   4. Run the following code.

 

filename code catalog "sashelp.emutil.emds.source";
%include code;

%emds(data=SoilData.soil_train,
      rootLibrary=EMGDS,
      target=Cover_Type,
      name=SoilData,
      userid=<userid>,
      tablerole=TRAIN,
      adviseMode=advanced);

 

   5. In the project panel, select View -> Refresh Project.

   6. Expand the Data Sources.

   7. Verify that the new data source, “soil_train”, appears in the Data Sources list. Now you can use this data source in different diagrams.

   8. In the project panel, select File -> New -> Project to create a new project.

   9. In the newly created project, open the Project Start Code window and enter the following code. Select Run Now to create the EMGDS and SoilData libraries.

 

libname EMGDS "C:\Global_Datasources";

libname SoilData 'C:\EMTip\SoilData';

 

   10. In the project panel, select View -> Refresh Project.

   11. Expand Data Sources.

   12. Verify that the soil_train data source appears in the Data Sources list.

   13. Right-click soil_train and select Edit Variables. Notice that the correct roles and levels are assigned and that the target role is assigned to the Cover_Type variable. Now you can use this data source for modeling in this project.
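
If the data source does not appear after you refresh the project, a common cause is that one of the two libraries is not assigned in that project. The following is a minimal sketch of a check you could run in the Program Editor. The macro name check_lib is hypothetical and is not part of Enterprise Miner; it only uses the standard LIBREF and PATHNAME functions.

```sas
/* Hypothetical helper: report whether a libref is assigned and where it points */
%macro check_lib(lib);
    %if %sysfunc(libref(&lib)) = 0 %then
        %put NOTE: Library &lib is assigned to %sysfunc(pathname(&lib));
    %else
        %put WARNING: Library &lib is not assigned. Check the Project Start Code.;
%mend check_lib;

%check_lib(EMGDS)
%check_lib(SoilData)

/* List the members that are stored in the global data sources library */
proc sql;
    select memname, memtype
    from dictionary.members
    where libname = 'EMGDS';
quit;
```

Both libraries must point to the same server folders in every project that shares the data source, which is why the start code in step 9 repeats both libname statements.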

 

Summary

We have seen how to use the File Import node to import a CSV file, the Save Data node to save data as a SAS data set, and the Global Data Sources library to share data sources between multiple projects.

Comments
by Senior User krkerwin
on ‎07-26-2016 02:49 PM

I have created my File Import node on the diagram, then accessed the properties to change the Advanced Advisor to Yes. Then when I click the Import File ellipsis button, the window pops up but the buttons to import are greyed out. How do I make them available?

 

Kathleen

by New Contributor Jeanette
on ‎09-12-2016 03:55 AM

In part 

Save SAS data set using the Save Data node

1d) In the Project Start Code window, enter SoilData ‘C:\EMTip\SoilData’;. Click Run Now. This creates the library SoilData to save files in the C:\EMTip\SoilData directory

did not work. I had to use 

LIBNAME SoilData 'C:\EMTip\SoilData';