Have you used the File Import node to import external files in Enterprise Miner? If so, you may have wondered how to create a data source and share it across projects without the manual intervention of copying a table and creating a data source. In this tip we will see how to use the Save Data node to save a SAS data set and use the SAS Enterprise Miner Global Data Sources library to create and share data sources across multiple projects. Let’s explore this topic now.
We are going to import the soil.csv file, which contains African soil property data, into Enterprise Miner, and then create Soil_Train as a data source. As a prerequisite to the steps in this tip, download the attached file soil.csv to a location you can access on the file system.
We are using the File Import node to import soil.csv into Enterprise Miner. You can also import Microsoft Excel, SAS JMP, SPSS, Stata, Tab-Delimited, Paradox and dBASE files using the File Import node. The File Import node is located on the Sample tab of the SAS Enterprise Miner toolbar.
To use this SAS data set in a different diagram, you have to follow multiple steps. First you have to use the operating system to copy the fimport.sas7bdat file out of the project directory. Then you need to define a SAS library and manually create a data source that can be used in a different diagram. There is an easier way to accomplish this.
The Save Data node is located on the Utility tab. Now, drag the Save Data node to the diagram. Connect the File Import node to the Save Data node as shown below.
The Save Data node can export JMP, Excel 2010, CSV, and tab-delimited files. We are using this node to save a SAS data set in a pre-existing SAS library. To accomplish this, we first need to assign the library in the project start code editor.
a. Make sure that the C:\EMTip\SoilData folder exists on the file system.
b. In the EM project panel, select your project.
c. In the properties panel, click the ellipsis button in the Project Start Code property to open the Project Start Code window.
d. In the Project Start Code window, enter libname SoilData 'C:\EMTip\SoilData';. Click Run Now. This creates the library SoilData, which saves files in the C:\EMTip\SoilData directory.
2. Select the Save Data node and adjust the following properties.
3. Run the Save Data node.
4. In the operating system file browser, navigate to the C:\EMTip\SoilData directory and verify that the SAS data set soil_train was created successfully. Remember that this directory is located on the server rather than on the client.
Now, to use this data set in the project, you need to define the library SoilData to read the soil_train SAS data set and create the data source manually. If you want to use this data source in a different project, you have to create a new data source in the new project. To eliminate these manual steps, you can use the SAS Enterprise Miner Global Data Sources library.
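As an alternative to the file browser, the check in the last step could also be done from the Enterprise Miner program editor (View -> Program Editor). The following is a minimal sketch, assuming the SoilData libref from the Project Start Code is assigned in the current session and that the Save Data node wrote a table named soil_train; neither statement is part of the original tip.

```sas
/* Assumption: the SoilData libref was assigned by the Project Start Code. */
/* EXIST returns 1 if the data set exists and 0 if it does not.            */
%put NOTE: SoilData.soil_train exists: %sysfunc(exist(SoilData.soil_train));

/* List the variables and attributes of the saved table. */
proc contents data=SoilData.soil_train;
run;
```

If the log shows 0, the Save Data node properties or the libref assignment are the first things to recheck.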
You can define the EMGDS (SAS Enterprise Miner Global Data Sources) library in your project startup code.
1. To define the EMGDS library, open the Project Start Code window and enter the following code. Make sure that the Global_Datasources folder exists on the file system.
libname EMGDS 'C:\Global_Datasources';
2. Click Run Now. This creates the SAS Enterprise Miner Global Data Sources library.
3. In the project panel, select View -> Program Editor to open the program editor.
4. Run the following code.
filename code catalog "sashelp.emutil.emds.source";
5. In the project panel, select View -> Refresh Project.
6. Expand the Data Sources.
7. Verify that the new data source, "soil_train", appears in the Data Sources list. Now you can use this data source in different diagrams.
8. In the project panel, select File -> New -> Project to create a new project.
9. In the newly created project, open the Project Start Code window and enter the following code. Select Run Now to create the EMGDS and SoilData libraries.
libname EMGDS "C:\Global_Datasources";
libname SoilData 'C:\EMTip\SoilData';
10. In the project panel, select View -> Refresh Project.
11. Expand Data Sources.
12. Verify that the soil_train data source appears in the Data Sources list.
13. Right-click soil_train and select Edit Variables. Notice that the correct roles and levels are assigned and that the target role is assigned to the Cover_Type variable. Now you can use this data source for modeling in this project.
We have seen how to use the File Import node to import a CSV file, the Save Data node to save the data as a SAS data set, and the Global Data Sources library to share data sources between multiple projects.
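If the data source does not appear in the new project, one possible first check is whether both librefs were actually assigned by the start code. The sketch below is only an illustration that uses standard Base SAS facilities (the LIBREF function and the DICTIONARY.TABLES view), run from the program editor; it assumes the two libname statements from the steps above have already been executed.

```sas
/* LIBREF returns 0 when the libref is assigned and a nonzero value otherwise. */
%put NOTE: EMGDS libref status: %sysfunc(libref(EMGDS));
%put NOTE: SoilData libref status: %sysfunc(libref(SoilData));

/* List the tables that SAS can see in each library. */
/* Library names are stored in uppercase in DICTIONARY.TABLES. */
proc sql;
  select libname, memname
  from dictionary.tables
  where libname in ('EMGDS', 'SOILDATA');
quit;
```

A nonzero status usually points to a missing folder or a path that exists on the client but not on the server.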