Splitting a Table by Column Contents?

Reply
Contributor
Posts: 51

Splitting a Table by Column Contents?

 

Pretty much every dataset I produce has some common columns, these contain the Sector, Region and Office that the line of data is linked to.  Data is often distributed to a Sector, a Region or an Office and usually this is done by dumping the entire dataset to an Excel spreadsheet and using slicers etc to allow individuals to make their own decisions about which data cut they want.  This is all fairly new as a way of working and I'd like to improve things because often having a full data-cut causes security concerns or simply means there's too much data for antiquated machines to run comfortably.

 

I'd like to add a Code Node to my Enterprise Guide process flows that creates multiple tables, one for each <parameter> and filters each table by that parameter.  The parameters would be the Sector, the Region or the Office, so for example if I wanted to use the Region, I would end up with 8 output tables, one for each Region.

 

If I could do that, that woujld be amazing.  But if it's relatively simple, I'd like to be able to use multiple parameters, as some times it would be very useful to break the data down further, for example a Region's data broken down by Sector.

 

In most cases I would want to export the results to Excel, I don't have access to any of the services around that other than right-click, 'export results as a step in the process'.  So a code that incorporated the export would be really helpful too.  Obviously I'd need to specify a location in the Code Node and the export needs to be .csv as the way Enterpise Guide builds .xls files (the referencing between the different .xml files is either relative or the opposite I can't remember which) means they can't be used as Data Connections in Excel.  Data Connections are really useful for us as they allow us to build process flows that automatically update the Excel-based products that we deliver.

 

Hope that all makes sense, but if any part isn't clear, just let me know.

 

Thanks in advance!

Super User
Posts: 6,942

Re: Splitting a Table by Column Contents?

To me, that would be a real case for stored processes. You can set prompts for the parameters (pre-populated with variable names) and then create the tables accordingly.

Since prompts are also available in EG (and you can create a stored process from a node), that is the first path I'd follow.

 

Do you have a STP server?

 

Basic code for splitting along parameters:

%let indata=sashelp.cars;
%let outlib=work;
%let param=origin;

proc sort
  data=&indata (keep=&param)
  out=lookup
  nodupkey
;
by &param;
run;

data _null_;
call execute ('data ');
do until (eof1);
  set lookup end=eof1;
  call execute("&outlib.." !! trim(&param) !! ' ');
end;
call execute ("; set &indata.;");
do until(eof2);
  set lookup end=eof2;
  call execute("if &param = '" !! trim(&param) !! "' then output &outlib.." !! trim(&param) !! ';');
end;
call execute('run;');
stop;
run;

At the end you'll have three datasets in work, dynamically created from sashelp.cars. All parameters are supplied as macro variables, so you can easily define prompts for that. You could also use the same lookup dataset to create a single export step for every origin, using call execute and a where dataset condition in the proc export.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 51

Re: Splitting a Table by Column Contents?

Thanks, that's very helpful!

 

I'm making this up as I go along really, so I don't know if I can store a process, but I'll look into that.  What's an STP Server?

 

Thanks again, really appreciate the help.

Contributor
Posts: 51

Re: Splitting a Table by Column Contents?

Oh, in your code example I'm specifying the input data table.  Is there anyway to have that pick up whatever table the Code Node is linked to?  

Contributor
Posts: 51

Re: Splitting a Table by Column Contents?

Having a look around, I can open Stored Processes, and I've found a load in a folder so it looks like that might be an option, so long as I can save to that folder.  That'll depend on permissions, but also whether I'm allowed to use the folder.  I'll investigate.  In the mean time, if I get the code to work, I can easily convert that into a Stored Process?

 

Super User
Posts: 6,942

Re: Splitting a Table by Column Contents?


paulrockliffe wrote:

Having a look around, I can open Stored Processes, and I've found a load in a folder so it looks like that might be an option, so long as I can save to that folder.  That'll depend on permissions, but also whether I'm allowed to use the folder.  I'll investigate.  In the mean time, if I get the code to work, I can easily convert that into a Stored Process?

 


There are several steps involved:

- identify which nodes in a EG project will have to go into one STP

- get the code from these nodes and combine them into one code node, or have them in one process flow

- define the user prompts and use the results of the prompts in the nodes or the code

- once that works, create the stored process from that

 

Requirements

- you need a physical location (a "stored process repository") on the server where the stored process code will be stored (you and sassrv need read/write/execute permissions - OS level - there)

- you need a folder in metadata where the stored process definition will be stored. My Folder can be used, but if you want to share your STP, a folder in "Shared Data" is needed. The necessary metadata permissions are required there.

 

For most of the tasks, Enterprise Guide is a good helper. Start small (think "hello world"), and move to more complex STPs from there.

 

Since a BI server usually includes a web component, trying

http://your_sas_server:7980/SASStoredProcess

should lead you to the web interface for stored processes.

7980 is the default port for the SAS web server as delivered with 9.4

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 6,942

Re: Splitting a Table by Column Contents?

A code node does not have dedicated "input data" or "output data" parameters like other nodes. That is because there is no way how EG could know beforehand if your code actually needs or creates data.

For this you will have to rely on defined user prompts.

A quick introduction to user prompts can be found here: http://support.sas.com/resources/papers/proceedings13/028-2013.pdf

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 6,942

Re: Splitting a Table by Column Contents?

In a BI Server environment (meaning you have a metadata server, application servers and a web interface to them), one of the elements is a stored process server. This means you have one or more SAS processes waiting, and when a request is made, the code for the STP is fetched and executed. Output from that process can be an output stream (eg HTML). Through the web interface, people can request STPs for execution.

 

A good starting point is here: https://support.sas.com/documentation/cdl/en/stpug/68399/HTML/default/viewer.htm#p0ct3m137hug02n1lqu...

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 51

Re: Splitting a Table by Column Contents?

OK, I've made a start with getting this to work within my setup and with a table that includes my population.

 

I've got it working with Office as the Parameter, but where the Office has two words (eg Castle House) in the name it is creating two identical tables, one named Castle, the other House.  Is that an easy fix as it then causes problems dealing with the data down-stream.  I've got two Regions with London in the name, this results on the code stoping once it's created the first table called "London".

 

 

Is it possible for the code to start by deleting any tables it's already created; running it a second time causes errors because tables already exist?  The error is ERROR: Data set WORK.LONDON is already open for output.  I've tried deleting the tables that are visible on the Process Flow, but they remain in WORK, so that doens't remove the error.  

 

Thanks again!

Super User
Posts: 6,942

Re: Splitting a Table by Column Contents?

When you prepare the lookup table, you can create two variables; one for the actual contents of your parameter variable, and the other with a value derived from that that contains a valid SAS name (no blanks or special characters).

If it's not that easy to create valid SAS names, consider just using &param.001, &param.002 and so on (just the name of the parameter and a sequential number), as these will surely be valid.

The problem with the locks should be fixed if you set your Enterprise Guide to not automatically open resulting datasets for display.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 51

Re: Splitting a Table by Column Contents?

Thanks, sorry do you mind explaining how I add the second column to the Lookup table created by the Proc Sort?

 

I'm very new to SQL and can follow what you've done but don't know how to add another variable when your code isn't adding the first variable, just keeping something that was already there.

Super User
Posts: 6,942

Re: Splitting a Table by Column Contents?

How you solve the problem of creating valid SAS names depends on the data you have. So have to apply a hefty dose of Maxim 3 (Know your data) to the problem. Do a proc freq on the variables in question, and see if a simple replacement of blanks with underlines solves it without creating unwanted similar values.

A first attempt might look like that:

%let indata=sashelp.baseball;
%let outlib=work;
%let param=team;

proc sort
  data=&indata (keep=&param)
  out=lookup
  nodupkey
;
by &param;
run;

data lookup;
set lookup;
vname = translate(trim(&param),'_',' ');
run;

data _null_;
call execute ('data ');
do until (eof1);
  set lookup end=eof1;
  call execute("&outlib.." !! trim(vname) !! ' ');
end;
call execute ("; set &indata.;");
do until(eof2);
  set lookup end=eof2;
  call execute("if &param = '" !! trim(&param) !! "' then output &outlib.." !! trim(vname) !! ';');
end;
call execute('run;');
stop;
run;

If it's hard to create valid dataset names (special characters etc.), the intermediate step for preparing lookup might be changed to this:

data lookup;
set lookup;
vname = "&param" !! put(_n_,z4.);
run;

Now the datasets will be numbered, so you will have to rely on the lookup dataset to tell you which dataset belongs where.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 51

Re: Splitting a Table by Column Contents?

Thanks, I'll take a look at that.  

 

The data is only going to be split by three columns, all contain only names of either a Region, Office, or Sector.  There's something like 10 offices, 8 regions and 10 sectors and most are single words, though there are some with multiples words separated by a space and a few where there's an ampersand in there.  To get a valid SAS name I'd need to transform the contents by deleting ampersands, removeing double spaces and then replacing spaces with underscores.  That would cover every possible eventuality for the data I have.  

 

Or I could get the chap that builds the table I'm using to modify it so that it only contains valid results, that would robably be easiest!

Contributor
Posts: 51

Re: Splitting a Table by Column Contents?

Right, got that working I think, though it was creating some data set names that are longer than 32 characters, so I've had to use substr to knock the end of the names.  I cna't work out how to remove an ampersand without ending up with two underscores next to each other, but that's a minor problem.

 

Do you know how I go about getting each output table to automatically export to a folder as a .csv file?

 

I presume if I use prompts to control the code node that the only way to be able to easily apply the code in a new project is to use the Stored Process?

 

Thanks again, this has been a great help!

Super User
Posts: 6,942

Re: Splitting a Table by Column Contents?

The lookup file created in my last version already contains the name of the dataset, which can also be used as a valid filename for the OS.

So you can use it to dynamically create the export steps:

%let outpath=/somewhere/subdir/;

data _null_;
set lookup;
call execute("proc export data=&outlib.." !! trim(vname) !! " outfile='&outpath." !! trim(vname) !! ".csv' dbms=csv; run;");
run;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Ask a Question
Discussion stats
  • 23 replies
  • 148 views
  • 1 like
  • 2 in conversation