SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Using User Defined Functions and Subroutines in DI Studio

Reply
SAS Employee
Posts: 51

Using User Defined Functions and Subroutines in DI Studio

OVERVIEW

The ability to create User Defined Functions/Subroutines in SAS was introduced in version 9.2. One of my first uses for user defined functions was to derive a Date Dimension key. Date Dimensions are one of the few types of tables for which I create a smart key, usually in the format CCYYMMDD. To derive this from an incoming Oracle date/time value, I had to enter this behemoth into the Expression Editor in DI Studio:

input(compress(put(datepart(),yymmdd10.),'-'),8.)


After the release of 9.2, I just had to enter this:

deriveDateDimPK(,”DATETIME”);


This is certainly easier to remember! Making use of this convenience requires a few steps, which are discussed below.

CREATE YOUR FUNCTION OR SUBROUTINE WITH PROC FCMP




PROC FCMP OUTLIB=..;
function (, $, ) <$>

return ();
endsub;
QUIT;


Some explanation of the pseudo code above: All functions and subroutines must be declared within PROC FCMP. The OUTLIB parameter specifies the library, dataset and package in which functions/subroutines will be stored. Technically, each line of a function is stored in a SAS dataset along with the function name. A package is like a namespace in other languages: a way to disambiguate functions of the same name within the same library and dataset.
Within PROC FCMP, the declaration of a function is fairly simple, as the syntax above demonstrates. This pseudo code is actually a bit simplified from what you’ll find in the online doc, but the basics are shown:

  1. Every function has a name
  2. A function can have zero or more arguments enclosed in parentheses.
  3. Function parameter values may be numerics or strings. One places a “$” after the parameter name to indicate that it is a string.
  4. Functions may return a numeric or a string. If the function returns a string, the “$” symbol must come after the VARARGS.
  5. A function may accept a variable number of parameters, with parameters after the named parameters being passed into the function as an array (I won’t cover that in this post).
  6. Every function must return a value prior to the “endsub” statement.

So, the deriveDateDimPK() function I referred to above could be implemented like so:


PROC FCMP OUTLIB=sasuser.funcs.dmart;
function deriveDateDimPK (inputField, inputFormat $);
length dateDimKey 3
inDate 3;
if inputformat eq "DATETIME" then inDate = datepart(inputField);
else inDate = inputField;
return (input(compress(put(inDate,yymmdd10.),'-'),8.));
endsub;
QUIT;


Notice that the syntax used above is much like that of the DATA Step. You can refer to standard DATA Step functions and even your own user defined functions if they’re declared prior to their use (either above the current function/subroutine in PROC FCMP or via the “cmplib” option, which is discussed below). One can also declare local variables and make use of conditional and iterative logic. There are some things that can be used in the DATA Step that are NOT available in user defined functions (DATA and OUTPUT statements, reading from or writing to files other than LOG and PRINT with INFILE and FILE statements, etc). A complete list of the differences between PROC FCMP functions/subroutines and the DATA Step can be found at http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a003262740.htm.

Most of what is written above applies to subroutines as well as functions. The differences with a subroutine are that subroutines have output parameters but NO return values.


POINTING TO FUNCTION PACKAGES

Before a function or subroutine may be called from a DATA Step, it’s parent library/dataset must be referenced using the CMPLIB option. Using the example from above, the options statement would be:


OPTIONS CMPLIB=sasuser.funcs;


Notice that only the library and dataset levels are necessary and that the package may be omitted. This options statement may be specified in one of several places to make the functions callable in DI Studio, including , including the function code file itself, the “sasv9.cfg” file, the “autoexec” file for the current Application Server Context or in pre-process code for the step or job.

EXAMPLE USAGE
The DATA Step program below shows the function defined above in action:


/* Create a table with date/time values */
DATA work.funcInput;
KEEP ID TRANSACTION_DTTM;
startDate = today();
do i = 0 to 9;
ID=i+1;
TRANSACTION_DTTM = dhms(startDate-i,0,0,0);
OUTPUT;
end;
RUN;
/* Show usage of deriveDateDim() function */
DATA _NULL_;
SET work.funcInput;
dateDimPK = deriveDateDimPK(TRANSACTION_DTTM,"DATETIME");
put ID= TRANSACTION_DTTM= datetime20. dateDimPK=;
RUN;


The PROC SQL code below shows the same type of usage:

PROC SQL;
select ID,
TRANSACTION_DTTM,
deriveDateDimPK(TRANSACTION_DTTM,"DATETIME")
from work.funcInput;
QUIT;

SAS user defined functions should only be used in PROC SQL when querying SAS data. If these are used when querying RDBMS data, the query will not be processed in the database.


PROMOTING FUNCTIONS BETWEEN ENVIRONMENTS

Remember from above that functions are stored in SAS datasets. That being the case, they may be promoted as a Library and Table along with the rest of your DI Studio objects in a SAS package file using the “Export…” menu.


CONCLUSION
Functions and subroutines can be of great help to reuse logic between jobs and projects, much more cleanly than was the case before their introduction. This post showed a simple use of functions. There are a lot of other useful things you can do with functions, like loading arrays from a dataset, writing arrays to a dataset, etc. If you want to learn more about what you can do with PROC FCMP and functions and subroutines, please use the following link: http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a003262717.htm.
PROC Star
Posts: 1,232

Re: Using User Defined Functions and Subroutines in DI Studio

Nice introduction to the benefits of user functions.

Just a note that often the SAS macro language can provide similar functionality.

So in this case, if your desired code was:

  input(compress(put(datepart(<date/time field>),yymmdd10.),'-'),8.)

Your macro could be:

%macro makedate(field);

  input(compress(put(datepart(&field),yymmdd10.),'-'),8.)

%mend makedate;


Used like:

DATA _null_;

       SET work.funcInput;

       dateDimPK = %makedate(TRANSACTION_DTTM);

       put ID= TRANSACTION_DTTM= datetime20. dateDimPK=;

RUN;

Not a criticism of your post.  Always more than one way to skin a cat.

Respected Advisor
Posts: 3,894

Re: Using User Defined Functions and Subroutines in DI Studio

Hi Tim

Thanks for the valuable information.

Could you please elaborate a bit more section "PROMOTING FUNCTIONS BETWEEN ENVIRONMENTS"? (the link there seems incomplete)

What confuses me:

"Remember from above that functions are stored in SAS datasets."  I would have thought these are compiled entries in a SAS Catalog

"That being the case, they may be promoted as a Library and Table along with the rest of your DI Studio objects in a SAS package file..." But these packages are only metadata. The compiled functions as such won't be moved this way. Also as this is a catalog and not a table there won't be table metadata.

I was thinking so far that I would need some DI once-off job with the PROC FCMP code, that this code gets promoted and then needs to be re-run in the target environment, and if not compiling into the default location one would also need to amend the global option cmplib (eg. in the autoexec of the target environment).

.... but may be I'm missing something here. Could you please give more detail?

Thanks

Patrick

Ask a Question
Discussion stats
  • 2 replies
  • 1303 views
  • 0 likes
  • 3 in conversation