The cat functions are arguably one of the most used recent additions to the SAS language. They are so useful that some have been clamouring to further extend their functions, like here and here.
I also have needed some improvements, so I finally bit the bullet and wrote the concatenation function that I needed.
The FCMP function presented here has not withstood the test of time, so may be imperfect and is to be used with caution.
This concatenation function can use delimiters and formats, allows missing values to be incorporated, allows quoting, and includes basic cleaning options such as upcase, lowcase, compbl, removing non-printable characters.
Of course, FCMP's limitations mean this function cannot be as convenient as a SAS Institute-issued function. The main restrictions are: Only strings can be concatenated, all strings must have the same length, and having to build an array before using the function.
Another useful feature (option v maybe?) would be allow using the format associated with each variable instead of supplying a global format to all variables. This too could only be done by a native function.
/***********************************************************************************************
FCMP Function Name CONCAT
------------------
Description This function concatenates text strings.
----------- It allows delimiters, formats, missing values, quoting, and includes basic
cleaning options such as upcase, lowcase, compbl, removing non-printable characters.
See the parameters section for more information
Parameters DLM $ Delimiter string to insert between concatenated strings
---------- The delimiter string is only used if option D is found
QUOTE $ Character(s) used to surround concatenated strings
Single and double quotes can be used
Brackets ( [ { < > } ] ) can be used. In this case:
Opening brackets ( [ { < are inserted at the start
Closing brackets > } ] ) are inserted at the end
Multiple characters can be used, except for single and double quotes
The quoting string is only used if option Q is found
FORMAT $ Name of the character format to apply
The format is only used if option F is found
OPTIONS $ Options driving concatenation. Case insensitive. These letters are recognised:
Q Use quoting characters(s) if provided
F Use format if provided
D Use delimiter in concatenated string (space if no DLM provided)
M Include missing string in concatenated string
T Trim string before adding to concatenated string
S Strip string before adding to concatenated string
C Compbl string before adding to concatenated string
U Upcase string before adding to concatenated string
L Lowcase string before adding to concatenated string
P Propcase string before adding to concatenated string
N Clean non-printable characters before adding to concatenated string
A[*] $ Character array containing the strings to concatenate
Example data TEST;
------- array ARR[6] $7 ( ' ' 'strawberry' ' ' 'custard apple' '"paul''s" berry ' ' ');
length S $ 200;
OPT='t '; S = concat('|' , '{' , '$revers20.', OPT, ARR); output;
OPT='d '; S = concat('|' , '{' , '$revers20.', OPT, ARR); output;
OPT='f '; S = concat('|' , '{' , '$revers20.', OPT, ARR); output;
OPT='q '; S = concat('|' , '{' , '$revers20.', OPT, ARR); output;
OPT='m '; S = concat('|' , '{' , '$revers20.', OPT, ARR); output;
OPT='u '; S = concat('|' , '{' , '$revers20.', OPT, ARR); output;
OPT='ft '; S = concat('-|-', ']]]', '$revers20.', OPT, ARR); output;
OPT='tfd '; S = concat('-|-', ']]]', '$hex20. ', OPT, ARR); output;
OPT='tdfqmu'; S = concat('-|-', ']]]', '$hex20. ', OPT, ARR); output;
OPT='tdfqmp'; S = concat(' | ', '{' , '$8. ', OPT, ARR); output;
OPT='qdtu '; S = concat(' | ', '"' , '$8. ', OPT, ARR); output;
run;
t strawbecustard"paul's
d strawbe |custard |"paul's
f ebwarts dratsuc s'luap"
q {strawbe }{custard }{"paul's }
m strawbe custard "paul's
u STRAWBE CUSTARD "PAUL'S
ft ebwartsdratsucs'luap"
tfd 7374726177626520-|-6375737461726420-|-227061756C277320
tdfqmu [[[2020202020202020]]]-|-[[[5354524157424520]]]-|-[[[2020202020202020]]]-|-[[[4355535441524420]]]-|-[[[225041554C275320]]]-|-[[[2020202020202020]]]
tdfqmp { } | {Strawbe} | { } | {Custard} | {"paul's} | { }
qdtu "STRAWBE" | "CUSTARD" | """PAUL'S"
Limitations 1- No parameter validation
----------- 2- Delimiter string max length = 32 characters
3- Quote string max length = 32 characters
4- The function does not recognise lengths shorter than 8 and sets the length to 8
This only matters if trimming is not requested
5- Only strings can be concatenated, numeric variables cannot be used
6- All strings must have the same length
7- Strings must be built into an array before using the function
8- There is no way to use the formats associated with the variables, only a global format
Author Christian Graffeuille
------
Changes
-------
***********************************************************************************************/
proc fcmp outlib=WORK.MYFUNCS.STR;
function concat(
DLM $ /* Delimiter string to insert between concatenated strings */
/* The delimiter string is only used if option D is found */
, QUOTE $ /* Character(s) used to surround concatenated strings */
/* Single and double quotes can be used */
/* Brackets ( [ { < > } ] )can be used. In this case: */
/* Opening brackets ( [ { < are inserted at the start */
/* Closing brackets > } ] ) are inserted at the end */
/* Multiple characters can be used, except for single and double quotes */
/* The quoting string is only used if option Q is found */
, FORMAT $ /* Name of the character format to apply */
/* The format is only used if option F is found */
, OPTIONS $ /* Options driving concatenation. These letters are recognised: */
/* Q Use quoting characters(s) if provided */
/* F Use format if provided */
/* D Use delimiter in concatenated string (space if no DLM provided) */
/* M Include missing string in concatenated string */
/* T Trim string before adding to concatenated string */
/* S Strip string before adding to concatenated string */
/* C Compbl string before adding to concatenated string */
/* U Upcase string before adding to concatenated string */
/* L Lowcase string before adding to concatenated string */
/* P Propcase string before adding to concatenated string */
/* N Clean non-printable characters before adding to concatenated string */
, A[*] $ /* Character array containing the strings to concatenate */
)
$ 32767;
%* Init ;
length RESULT S $327 Q1 Q2 OPT $32 L LF FIRSTOUT DO_S DO_T 8 ;
RESULT= 'a';
OPT = upcase(OPTIONS);
DO_S = index(OPT,'S');
DO_T = index(OPT,'T');
%* Get variable length. Sets the length to no less than 8 ;
L=length(A[1]||'a')-1;
%* Set surrounding (quoting) characters ;
if index(OPT,'Q') & lengthn(QUOTE ) then do;
if QUOTE in:('(',')') then do; Q1=repeat('(',length(QUOTE)-1); Q2=repeat(')',length(QUOTE)-1); end;
else if QUOTE in:('{','}') then do; Q1=repeat('{',length(QUOTE)-1); Q2=repeat('}',length(QUOTE)-1); end;
else if QUOTE in:('[',']') then do; Q1=repeat('[',length(QUOTE)-1); Q2=repeat(']',length(QUOTE)-1); end;
else if QUOTE in:('<','>') then do; Q1=repeat('<',length(QUOTE)-1); Q2=repeat('>',length(QUOTE)-1); end;
else if index(QUOTE,'"') then Q1='"';
else if index(QUOTE,"'") then Q1="'";
else do; Q1=QUOTE ; Q2=QUOTE ; end;
end;
%* Loop through the values and append;
do I = 1 to dim(A);
%* If value is missing then skip unless option M is present;
if lengthn(A[I]) | index(OPT,'M') then do;
%* Apply value clean-up options;
if index(OPT,'C') then S=compbl(trim(A[I])); else S=A[I];
if index(OPT,'U') then S=upcase (trim(S));
else if index(OPT,'L') then S=lowcase (trim(S));
else if index(OPT,'P') then S=propcase(trim(S));
%* Format value if needed;
if index(OPT,'F') & FORMAT ne ' ' then do;
S =putc(substrn(S,1,L),FORMAT)||'a';
LF=length(S)-1;
S =substrn(S,1,LF);
end;
%* Store value length;
LF=max(L,LF);
if index(OPT,'D') & FIRSTOUT then RESULT = substrn(RESULT,1,length(RESULT)-1)||DLM||'a'; %* Add delimiter if previous value found ;
%* Prepare value;
if Q1 in ('"',"'") then do; %* Use quote function ;
if DO_S then RESULT=substrn(RESULT,1,length(RESULT)-1)||quote(strip(S) ,Q1)||'a'; %* Strip or trim if needed ;
else if DO_T then RESULT=substrn(RESULT,1,length(RESULT)-1)||quote(trim (S) ,Q1)||'a';
else RESULT=substrn(RESULT,1,length(RESULT)-1)||quote(substrn(S,1,LF),Q1)||'a';
end;
else do; %* Add quote character(s) ;
if DO_S then RESULT=substrn(RESULT,1,length(RESULT)-1)||trimn(Q1)||strip(S) ||trimn(Q2)||'a'; %* Strip or trim if needed ;
else if DO_T then RESULT=substrn(RESULT,1,length(RESULT)-1)||trimn(Q1)||trim (S) ||trimn(Q2)||'a';
else RESULT=substrn(RESULT,1,length(RESULT)-1)||trimn(Q1)||substrn(S,1,LF)||trimn(Q2)||'a';
end;
FIRSTOUT=1; %* Flag used in case the 1st values=' ' & a delimiter is requested ;
end;
end;
return (substr(RESULT,1,length(RESULT)-1));
endsub;
run;
Can you provide a couple of examples of how this is used?
Well, that led through an interesting excursion in the help file. I had either forgotten about or never knew about the CATQ function and the family of "Excel" functions whose names end with "_slk" (I assume the suffix comes from the late lamented PROC SYLK).
I started looking at function help because I did remember the existence of the VVALUE function, but not all of the details. I was thinking that would be useful for formatting values, but it turns out that VVALUE works only inside a data step, or at least it is documented to work only in a data step. But you could use VVALUE to good effect with your function in a data step.
I would propose a different syntax for the new function, where you provide arguments in pairs of (value, value-options); you could provide separate options for every input value, which would allow the values to have different formats, quote some arguments but not others, and so forth. So the function would be something like
result = catall(delimiter, value1, options1, value2, options2, ...);
and an example usage would be
mynewvalue = catall(delimiter, id_num, 'FS',
birth_date, 'f=date11.',
name, 'q');
You're right, this would probably have to be provided as a built-in function by SAS, because some of the parameters could be numeric, and because to be most useful it would have to be callable from PROC SQL (which doesn't accept array arguments).
Hmm. PROC SQL seems to have a length limit of 200 characters for the value returned by some but not all functions, which makes many character functions less useful than they might be. I wonder why that might be, or if there is a workaround.
%let longval = %sysfunc(repeat(*, 400));
%let longlen = %length(&LONGVAL.);
%put &=LONGLEN;
proc sql;
create table temp
(
longval1 char(401),
longval2 char(401) ,
longval3 char(401) ,
name char(10)
);
insert into temp
select
repeat('*', 400) length=401 ,
"&LONGVAL." length=401,
upcase("&LONGVAL.") length=401 ,
name
from
sashelp.class
where
name='Alfred';
select
(length(longval1)) as longlen1 , /* 200 */
(length(longval2)) as longlen2 , /* 401 */
(length(longval3)) as longlen3 /* 401 */
from
temp;
quit;
@PaigeMiller There is a data step example in the file header, as well as the value output for each set of input values.
data TEST;
array ARR[6] $7 ( ' ' 'strawberry' ' ' 'custard apple' '"paul''s" berry ' ' ');
length S $ 200;
OPT='t '; S = concat('|' , '{' , '$revers20.', OPT, ARR); output;
OPT='d '; S = concat('|' , '{' , '$revers20.', OPT, ARR); output;
OPT='f '; S = concat('|' , '{' , '$revers20.', OPT, ARR); output;
OPT='q '; S = concat('|' , '{' , '$revers20.', OPT, ARR); output;
OPT='m '; S = concat('|' , '{' , '$revers20.', OPT, ARR); output;
OPT='u '; S = concat('|' , '{' , '$revers20.', OPT, ARR); output;
OPT='ft '; S = concat('-|-', ']]]', '$revers20.', OPT, ARR); output;
OPT='tfd '; S = concat('-|-', ']]]', '$hex20. ', OPT, ARR); output;
OPT='tdfqmu'; S = concat('-|-', ']]]', '$hex20. ', OPT, ARR); output;
OPT='tdfqmp'; S = concat(' | ', '{' , '$8. ', OPT, ARR); output;
OPT='qdtu '; S = concat(' | ', '"' , '$8. ', OPT, ARR); output;
run;
t strawbecustard"paul's d strawbe |custard |"paul's f ebwarts dratsuc s'luap" q {strawbe }{custard }{"paul's } m strawbe custard "paul's u STRAWBE CUSTARD "PAUL'S ft ebwartsdratsucs'luap" tfd 7374726177626520-|-6375737461726420-|-227061756C277320 tdfqmu [[[2020202020202020]]]-|-[[[5354524157424520]]]-|-[[[2020202020202020]]]-|-[[[4355535441524420]]]-|-[[[225041554C275320]]]-|-[[[2020202020202020]]] tdfqmp { } | {Strawbe} | { } | {Custard} | {"paul's} | { } qdtu "STRAWBE" | "CUSTARD" | """PAUL'S"
@JackHamilton Function VVALUE is indeed only accessible from a data step. While this function would be useful in SQL, it is not really meaningful in FCMP as the FCMP parameters are disconnected from the data step variables. So the idea of using the formats associated with the variables is reserved for an Institute-supplied function.
Regarding the variable-value pairs, it's not possible in the way described with the current limitation of FCMP as optional parameters are extremely sadly disallowed. I reckon forcing all variables to the same length and putting them in an array, as my function does, is painful enough; interleaving values and options in that array is a step too far, or at least should be a separate, even more flexible and syntaxically convoluted function.
As for the lengths, I am with you too. It is very disappointing that SQL misbehaves in this manner, and I would call this a defect.
Even forcing the length with the put function fails
insert into temp
select put(repeat('*', 400),$401.)
, "&LONGVAL."
, upcase("&LONGVAL.")
, name
from sashelp.class(obs=1);
SAS functions behave erratically in this respect, and even in a data step surprises abound.
data T; s1=repeat('aa',55); * 200; s2=upcase('aa'); * 2; s3=reverse('aa'); * 2; s4=cat('aa'); * 200; s5=put('aa',$char.); * 2; s6=putc('aa','$char.'); * 2; s7=prxchange('s/./b/',1,'a');* 200; s8=substr('aa',1,1); * 2; s9=compbl('aa'); * 2; s10=compress('aa'); * 2; s11=strip('aa'); * 2; s12=trim('aa'); * 2; s13=left('aa'); * 2; s13=coalescec('aa','bbb'); * 2; s14=choosec(1,'aa','bbb'); * 200; s15=dequote('aa'); * 2; s16=ifc(1,'aa','bbb'); * 200; s17=nliteral('aa'); * 200; s18=trimn('aa'); * 2; s19=substrn('aa',1,1); * 2; s20=translate('aa','a','b'); * 2; s21=transtrn('aa','a','b'); * 200; s22=tranwrd('aa','a','b'); * 200; s23=quote('aa'); * 200; s24=scan('aa',1); * 2; s25=subpad('aa',1,1); * 200; s26=nliteral('aa'); * 200; run;
The text in bold is a nudge to SAS Institute for action..
As I don't usually use PROC FCMP, isn't there some command you need to run before you can use CONCAT as a function?
If the function is saved as in the example
proc fcmp outlib=WORK.MYFUNCS.STR;
You need to set this option
options cmplib = WORK.MYFUNCS;
which acts a bit like option FMTSEARCH, for for compiled routines.
Or perhaps, just in case something is already using functions,
options append=cmplib work.myfuncs ;
Not very high on my list, but I wouldn't mind a REMOVE= option to remove an existing object from cmplib or ftmsearch or whatever else can use append or insert.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Lock in the best rate now before the price increases on April 1.
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.