
Building a User Defined Function Library with Proc FCMP - The Namecase Function

by Super Contributor on ‎08-16-2017 12:10 PM (1,654 Views)

Introduction

 

It often feels as though, no matter how many inbuilt functions a language has (and SAS has a huge number), there is always one you would like to have but find is missing. SAS programmers have traditionally worked around this either by hard-coding their algorithms or by writing reusable macros. Since the release of SAS 9.2, however, we have been able to build our own user-defined functions (UDFs) with Proc FCMP. This series of articles focuses on building a portable library of UDFs, in much the same way that programmers have built their own macro libraries, and showcases some of the facilities of Proc FCMP that can add value to your programming efforts.

 

This first article focuses on one function: the namecase function. A data cleaning exercise on people’s names usually has to ensure proper capitalization. For most names, the inbuilt propcase function is enough, as it capitalizes the first letter of each word and makes all other letters lower case. This does not work, however, for a significant number of names: propcase incorrectly turns MacArthur into Macarthur, O’Neill into O’neill and von Richtofen into Von Richtofen. The namecase function that we will create attempts to rectify that problem, although given the inherently personal nature of names it cannot be guaranteed to cover every possibility. If you find other names where propcase doesn’t work as required, you can add them to the custom namecase function.
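The propcase behaviour described above can be checked with a short DATA step before any custom function is built. This is only a sketch: the variable names name and cased are illustrative and are not part of the article's code.

```sas
/* Sketch: show what the inbuilt propcase function does to the three names */
/* mentioned above. The variable names are illustrative only.              */
data _null_;
    length name cased $30;
    do name='MacArthur', "O'Neill", 'von Richtofen';
        cased=propcase(name);
        put name= cased=;
    end;
run;
```

The log should show the three unwanted results listed above, which are the cases that namecase sets out to handle.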

 

The Code

 

The code for namecase is fairly straightforward although a few points should be mentioned:

 

1. In the Proc FCMP statement the destination for the compiled function is a three-part name of the form:

 

Libname.dataset.package

 

Functions are stored in special SAS data sets and organized into packages – for namecase we will create a package called char into which we will store all functions which operate on character values.

 

2. One of the most important things to note in the code is that the number of words in a name is obtained and then used to help determine which type of name we are processing – as a fall-back anything which doesn’t fit into one of the special categories is converted to proper case.
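The two points above can be sketched together in a much smaller function. This is a hedged illustration rather than the namecase code itself: prefixtype is a hypothetical helper that only reports which category a name falls into, and the WORK library stands in for a permanent UDF library so that nothing is written to disk permanently.

```sas
/* Sketch only: prefixtype is a hypothetical helper, not part of the library */
proc fcmp outlib=work.funcs.char;   /* libname.dataset.package */
    function prefixtype(lastname $) $12;
        length lowname $50 kind $12;
        lowname=lowcase(lastname);

        /* The word count decides which set of rules is tried */
        numwords=countw(lastname);

        kind='other';               /* fall-back category */
        if numwords=1 then kind='single';
        else if numwords=2 then do;
            if scan(lowname,1) in ('van','von','de','ap') then kind='prefix';
        end;
        else if numwords=3 then do;
            if scan(lowname,1)='van' and scan(lowname,2)='der' then kind='van der';
        end;

        return(kind);
    endsub;
run;

/* CMPLIB points at the two-level libname.dataset name, not at a package */
options cmplib=work.funcs;

data _null_;
    length lastname $30 kind $12;
    do lastname='smith', 'von richtofen', 'van der merwe', 'le blanc';
        kind=prefixtype(lastname);
        put lastname= kind=;
    end;
run;
```

Setting the fall-back value first and overriding it only when a rule matches keeps the branching shallow. The full function instead repeats the propcase fall-back in each branch, which works equally well.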

 

Here is the full code

 

 

 


/* Define the location of the UDF library */

libname udflib '/folders/myshortcuts/Dropbox/SAS/UDFLib';

/************************************************************************************
Title:             Namecase
Purpose:           Converts a name into the proper case taking account of special cases
Restriction:       Handles names of 50 characters or less
Syntax:            NAMECASE(argument)
Required Argument: A character constant, variable or expression
Details:           The namecase function converts names into their proper case taking
                   account of the following prefixes:
                   Mc, Mac, O', d', van, von, de, ap, du, van der
Examples:          namecase("o'neill") - returns O'Neill
                   namecase("d'artagnan") - returns d'Artagnan
************************************************************************************/

proc fcmp outlib=udflib.funcs.char;

    function namecase(lastname $) $50;

        /* Declare the working variables explicitly rather than relying on default lengths */
        length newname lowname $50;

        /* Count the number of words in the name so we can handle names like von Richtofen */
        numwords=countw(lastname);

        /* Convert the name into lowercase so we know what state we're starting from */
        lowname=lowcase(lastname);

        /* Single-word names: Mc, Mac, O' and d' prefixes */
        if numwords=1 then do;
            if substr(lowname,1,2)="mc" then
                newname=cat(upcase(substr(lowname,1,1)),lowcase(substr(lowname,2,1)),
                            upcase(substr(lowname,3,1)),lowcase(substr(lowname,4)));
            else if substr(lowname,1,3)="mac" then
                newname=cat(upcase(substr(lowname,1,1)),lowcase(substr(lowname,2,2)),
                            upcase(substr(lowname,4,1)),lowcase(substr(lowname,5)));
            else if substr(lowname,1,2)="o'" then
                newname=cat(upcase(substr(lowname,1,1)),"'",
                            upcase(substr(lowname,3,1)),lowcase(substr(lowname,4)));
            else if substr(lowname,1,2)="d'" then
                newname=cat(lowcase(substr(lowname,1,1)),"'",
                            upcase(substr(lowname,3,1)),lowcase(substr(lowname,4)));
            else newname=propcase(lowname);
        end;

        /* Two-word names: van, von, de and ap stay in lower case */
        else if numwords=2 then do;
            if lowcase(scan(lowname,1)) in ('van','von','de','ap') then do;
                newname=cat(lowcase(scan(lowname,1))," ",propcase(scan(lowname,2)));
            end;
            /* du takes an initial capital, so propcase gives the required result */
            else if lowcase(scan(lowname,1)) in ('du') then do;
                newname=propcase(lowname);
            end;
            else newname=propcase(lowname);
        end;

        /* Three-word names: van der stays in lower case */
        else if numwords=3 then do;
            if lowcase(scan(lowname,1))="van" and lowcase(scan(lowname,2))="der" then do;
                newname=cat(lowcase(scan(lowname,1))," ",lowcase(scan(lowname,2))," ",
                            propcase(scan(lowname,3)));
            end;
            else newname=propcase(lowname);
        end;

        /* Handle any names which aren't special cases */
        else newname=propcase(lowname);

        /* Remove any surplus blanks from the start and end of the name */
        newname=strip(newname);

        /* Return the new, properly capitalized name */
        return(newname);

    endsub;

run;

 

 

We can test the function like this

 

 

 

libname udflib '/folders/myshortcuts/Dropbox/SAS/UDFLib';

options cmplib=udflib.funcs;

data namechars;
	length name $30;
	infile datalines dlm=",";
	input name;
	datalines;
smith
SMITH
o'reilly
O'REILLY
macarthur
MACARTHUR
mcdonald
MCDONALD
d'artagnan
D'ARTAGNAN
van winkle
VAN WINKLE
von richtofen
VON RICHTOFEN
VAN DER MERWE
van der merwe
;
run;

data _null_;
	set namechars;
	name=namecase(name);
	put name=;
run;

 

This writes the following to the SAS log

 

 

 
 name=Smith
 name=Smith
 name=O'Reilly
 name=O'Reilly
 name=MacArthur
 name=MacArthur
 name=McDonald
 name=McDonald
 name=d'Artagnan
 name=d'Artagnan
 name=van Winkle
 name=van Winkle
 name=von Richtofen
 name=von Richtofen
 name=van der Merwe
 name=van der Merwe

 

Conclusion

 

We have seen how to create a simple but useful character function that we can call from our SAS code to fill a gap in the SAS inbuilt functions. If you can see an improvement to this function, or if there are any other functions you would like to see covered, please leave a comment below.

Comments
by Super User
on ‎08-16-2017 01:06 PM

This is a useful example, thanks for sharing!

by PROC Star
2 weeks ago

 Very nice example. Thank you @ChrisBrooks :)

by Super Contributor
a week ago

Good example. Shows details of documenting the program for easy refreshing the mind when you return to the program after several months.

 

datasp

 
