BookmarkSubscribeRSS Feed

The CONCAT function to rule them all

Started ‎10-07-2021 by
Modified ‎10-07-2021 by
Views 4,632

The cat functions are arguably one of the most used recent additions to the SAS language. They are so useful that some have been clamouring to further extend their functions, like here and here

I also have needed some improvements, so I finally bit the bullet and wrote the concatenation function that I needed. 

 

The FCMP function presented here has not withstood the test of time, so may be imperfect and is to be used with caution.

This concatenation function can use delimiters and formats, allows missing values to be incorporated, allows quoting, and includes basic cleaning options such as upcase, lowcase, compbl, removing non-printable characters.

 

Of course, FCMP's limitations mean this function cannot be as convenient as a SAS Institute-issued function. The main restrictions are: Only strings can be concatenated, all strings must have the same length, and having to build an array before using the function.

Another useful feature (option v maybe?) would be allow using the format associated with each variable instead of supplying a global format to all variables. This too could only be done by a native function.

 

/***********************************************************************************************

    FCMP Function Name  CONCAT
    ------------------

    Description         This function concatenates text strings.
    -----------         It allows delimiters, formats, missing values, quoting, and includes basic 
                          cleaning options such as upcase, lowcase, compbl, removing non-printable characters. 
                        See the parameters section for more information

    Parameters          DLM     $  Delimiter string to insert between concatenated strings                 
    ----------                       The delimiter string is only used if option D is found                
                        QUOTE   $  Character(s) used to surround concatenated strings                      
                                     Single and double quotes can be used                                  
                                     Brackets ( [ { < > } ] ) can be used. In this case:                    
                                       Opening brackets ( [ { < are inserted at the start                  
                                       Closing brackets > } ] ) are inserted at the end                    
                                     Multiple characters can be used, except for single and double quotes  
                                     The quoting string is only used if option Q is found                  
                        FORMAT  $  Name of the character format to apply                                   
                                     The format is only used if option F is found                          
                        OPTIONS $  Options driving concatenation. Case insensitive. These letters are recognised:            
                                     Q Use quoting characters(s) if provided                               
                                     F Use format if provided                                              
                                     D Use delimiter in concatenated string (space if no DLM provided)     
                                     M Include missing string in concatenated string                       
                                     T Trim string before adding to concatenated string                    
                                     S Strip string before adding to concatenated string                   
                                     C Compbl string before adding to concatenated string                  
                                     U Upcase string before adding to concatenated string                  
                                     L Lowcase string before adding to concatenated string                 
                                     P Propcase string before adding to concatenated string                
                                     N Clean non-printable characters before adding to concatenated string 
                        A[*]    $  Character array containing the strings to concatenate                   

    Example             data TEST;
    -------               array ARR[6] $7 ( ' '  'strawberry'  ' '   'custard   apple'  '"paul''s" berry '  ' ');
                          length S $ 200;
                          OPT='t     '; S = concat('|'  , '{'  , '$revers20.', OPT, ARR); output;
                          OPT='d     '; S = concat('|'  , '{'  , '$revers20.', OPT, ARR); output;
                          OPT='f     '; S = concat('|'  , '{'  , '$revers20.', OPT, ARR); output;
                          OPT='q     '; S = concat('|'  , '{'  , '$revers20.', OPT, ARR); output;
                          OPT='m     '; S = concat('|'  , '{'  , '$revers20.', OPT, ARR); output;
                          OPT='u     '; S = concat('|'  , '{'  , '$revers20.', OPT, ARR); output;
                          OPT='ft    '; S = concat('-|-', ']]]', '$revers20.', OPT, ARR); output;
                          OPT='tfd   '; S = concat('-|-', ']]]', '$hex20.   ', OPT, ARR); output;
                          OPT='tdfqmu'; S = concat('-|-', ']]]', '$hex20.   ', OPT, ARR); output;
                          OPT='tdfqmp'; S = concat(' | ', '{'  , '$8.       ', OPT, ARR); output;
                          OPT='qdtu  '; S = concat(' | ', '"'  , '$8.       ', OPT, ARR); output;  
                        run;

                        t        strawbecustard"paul's
                        d        strawbe |custard |"paul's
                        f        ebwarts             dratsuc             s'luap"
                        q        {strawbe }{custard }{"paul's }
                        m                strawbe         custard "paul's
                        u        STRAWBE CUSTARD "PAUL'S
                        ft       ebwartsdratsucs'luap"
                        tfd      7374726177626520-|-6375737461726420-|-227061756C277320
                        tdfqmu   [[[2020202020202020]]]-|-[[[5354524157424520]]]-|-[[[2020202020202020]]]-|-[[[4355535441524420]]]-|-[[[225041554C275320]]]-|-[[[2020202020202020]]]
                        tdfqmp   { } | {Strawbe} | { } | {Custard} | {"paul's} | { }
                        qdtu     "STRAWBE" | "CUSTARD" | """PAUL'S"

    Limitations         1- No parameter validation
    -----------         2- Delimiter string max length = 32 characters 
                        3- Quote string max length = 32 characters
                        4- The function does not recognise lengths shorter than 8 and sets the length to 8
                           This only matters if trimming is not requested
                        5- Only strings can be concatenated, numeric variables cannot be used
                        6- All strings must have the same length
                        7- Strings must be built into an array before using the function
                        8- There is no way to use the formats associated with the variables, only a global format 
                         
    Author              Christian Graffeuille
    ------

    Changes             
    -------

***********************************************************************************************/

proc fcmp outlib=WORK.MYFUNCS.STR;
  function concat(
        DLM     $ /* Delimiter string to insert between concatenated strings                 */
                  /*   The delimiter string is only used if option D is found                */
      , QUOTE   $ /* Character(s) used to surround concatenated strings                      */
                  /*   Single and double quotes can be used                                  */
                  /*   Brackets ( [ { < > } ] )can be used. In this case:                    */
                  /*     Opening brackets ( [ { < are inserted at the start                  */
                  /*     Closing brackets > } ] ) are inserted at the end                    */
                  /*   Multiple characters can be used, except for single and double quotes  */
                  /*   The quoting string is only used if option Q is found                  */
      , FORMAT  $ /* Name of the character format to apply                                   */
                  /*   The format is only used if option F is found                          */
      , OPTIONS $ /* Options driving concatenation. These letters are recognised:            */
                  /*   Q Use quoting characters(s) if provided                               */
                  /*   F Use format if provided                                              */
                  /*   D Use delimiter in concatenated string (space if no DLM provided)     */
                  /*   M Include missing string in concatenated string                       */
                  /*   T Trim string before adding to concatenated string                    */
                  /*   S Strip string before adding to concatenated string                   */
                  /*   C Compbl string before adding to concatenated string                  */
                  /*   U Upcase string before adding to concatenated string                  */
                  /*   L Lowcase string before adding to concatenated string                 */
                  /*   P Propcase string before adding to concatenated string                */
                  /*   N Clean non-printable characters before adding to concatenated string */
      , A[*]    $ /* Character array containing the strings to concatenate                   */
      )
      $ 32767;

    %* Init ;
    length RESULT S $327 Q1 Q2 OPT $32 L LF FIRSTOUT DO_S DO_T 8 ;
    RESULT= 'a';                         
    OPT   = upcase(OPTIONS);                
    DO_S  = index(OPT,'S'); 
    DO_T  = index(OPT,'T'); 
    %* Get variable length. Sets the length to no less than 8 ;
    L=length(A[1]||'a')-1;         

    %* Set surrounding (quoting) characters ;                                              
    if index(OPT,'Q') & lengthn(QUOTE ) then do;
      if      QUOTE in:('(',')') then do; Q1=repeat('(',length(QUOTE)-1); Q2=repeat(')',length(QUOTE)-1); end;
      else if QUOTE in:('{','}') then do; Q1=repeat('{',length(QUOTE)-1); Q2=repeat('}',length(QUOTE)-1); end;
      else if QUOTE in:('[',']') then do; Q1=repeat('[',length(QUOTE)-1); Q2=repeat(']',length(QUOTE)-1); end;
      else if QUOTE in:('<','>') then do; Q1=repeat('<',length(QUOTE)-1); Q2=repeat('>',length(QUOTE)-1); end;
      else if index(QUOTE,'"')   then Q1='"'; 
      else if index(QUOTE,"'")   then Q1="'"; 
      else do; Q1=QUOTE ; Q2=QUOTE ; end;  
    end;                                                                

    %* Loop through the values and append;
    do I = 1 to dim(A);
      %* If value is missing then skip unless option M is present;
      if lengthn(A[I]) | index(OPT,'M') then do;    
        %* Apply value clean-up options; 
        if      index(OPT,'C') then S=compbl(trim(A[I])); else S=A[I];   
        if      index(OPT,'U') then S=upcase  (trim(S)); 
        else if index(OPT,'L') then S=lowcase (trim(S));
        else if index(OPT,'P') then S=propcase(trim(S));
        %* Format value if needed;
        if index(OPT,'F') & FORMAT ne ' ' then do; 
          S =putc(substrn(S,1,L),FORMAT)||'a';          
          LF=length(S)-1;                                           
          S =substrn(S,1,LF);
        end;                                            
        %* Store value length;
        LF=max(L,LF);

        if index(OPT,'D') & FIRSTOUT then RESULT = substrn(RESULT,1,length(RESULT)-1)||DLM||'a';                   %* Add delimiter if previous value found ;

        %* Prepare value;            
        if Q1 in ('"',"'") then do;                                                                                %* Use quote function              ;
          if      DO_S then RESULT=substrn(RESULT,1,length(RESULT)-1)||quote(strip(S)       ,Q1)||'a';             %* Strip or trim if needed         ;         
          else if DO_T then RESULT=substrn(RESULT,1,length(RESULT)-1)||quote(trim (S)       ,Q1)||'a';
          else              RESULT=substrn(RESULT,1,length(RESULT)-1)||quote(substrn(S,1,LF),Q1)||'a';  
        end;
        else do;                                                                                                   %* Add quote character(s)          ;    
          if      DO_S then RESULT=substrn(RESULT,1,length(RESULT)-1)||trimn(Q1)||strip(S)       ||trimn(Q2)||'a'; %* Strip or trim if needed         ;
          else if DO_T then RESULT=substrn(RESULT,1,length(RESULT)-1)||trimn(Q1)||trim (S)       ||trimn(Q2)||'a';
          else              RESULT=substrn(RESULT,1,length(RESULT)-1)||trimn(Q1)||substrn(S,1,LF)||trimn(Q2)||'a';

        end;
        FIRSTOUT=1;                                         %* Flag used in case the 1st values=' ' & a delimiter is requested ;
      end;
    end;
                    
    return (substr(RESULT,1,length(RESULT)-1));                                      
  endsub;

run;

 

 

 

 

Comments

Can you provide a couple of examples of how this is used?

Well, that led through an interesting excursion in the help file.  I had either forgotten about or never knew about the CATQ function and the family of "Excel" functions whose names end with "_slk" (I assume the suffix comes from the late lamented PROC SYLK).

 

I started looking at function help because I did remember the existence of the VVALUE function, but not all of the details.  I was thinking that would be useful for formatting values, but it turns out that VVALUE works only inside a data step, or at least it is documented to work only in a data step.  But you could use VVALUE to good effect with your function in a data step.

 

I would propose a different syntax for the new function, where you provide arguments in pairs of (value, value-options); you could provide separate options for every input value, which would allow the values to have different formats, quote some arguments but not others, and so forth.  So the function would be something like 

 

     result = catall(delimiter, value1, options1, value2, options2, ...);

 

and an example usage would be 

 

mynewvalue = catall(delimiter, id_num, 'FS',
birth_date, 'f=date11.',                                 name, 'q');

You're right, this would probably have to be provided as a built-in function by SAS, because some of the parameters could be numeric, and because to be most useful it would have to be callable from PROC SQL (which doesn't accept array arguments).  

 

Hmm.  PROC SQL seems to have a length limit of 200 characters for the value returned by some but not all functions, which makes many character functions less useful than they might be.  I wonder why that might be, or if there is a workaround.

 

%let longval = %sysfunc(repeat(*, 400));
%let longlen = %length(&LONGVAL.);
%put &=LONGLEN;

proc sql;
    create table temp 
        (
        longval1 char(401), 
        longval2 char(401) ,
        longval3 char(401) ,
        name char(10)
        );

    insert into temp 
        select
            repeat('*', 400) length=401 ,
            "&LONGVAL." length=401, 
            upcase("&LONGVAL.") length=401 ,
            name 
        from 
            sashelp.class
        where 
            name='Alfred';    
    select
        (length(longval1)) as longlen1 ,  /* 200 */
        (length(longval2)) as longlen2 ,  /* 401 */
        (length(longval3)) as longlen3    /* 401 */
    from
        temp;
quit;

 

 

 

 

 

 

@PaigeMiller There is a data step example in the file header, as well as the value output for each set of input values.

data TEST;
  array ARR[6] $7 ( ' '  'strawberry'  ' '   'custard   apple'  '"paul''s" berry '  ' ');
  length S $ 200;
  OPT='t     '; S = concat('|'  , '{'  , '$revers20.', OPT, ARR); output;
  OPT='d     '; S = concat('|'  , '{'  , '$revers20.', OPT, ARR); output;
  OPT='f     '; S = concat('|'  , '{'  , '$revers20.', OPT, ARR); output;
  OPT='q     '; S = concat('|'  , '{'  , '$revers20.', OPT, ARR); output;
  OPT='m     '; S = concat('|'  , '{'  , '$revers20.', OPT, ARR); output;
  OPT='u     '; S = concat('|'  , '{'  , '$revers20.', OPT, ARR); output;
  OPT='ft    '; S = concat('-|-', ']]]', '$revers20.', OPT, ARR); output;
  OPT='tfd   '; S = concat('-|-', ']]]', '$hex20.   ', OPT, ARR); output;
  OPT='tdfqmu'; S = concat('-|-', ']]]', '$hex20.   ', OPT, ARR); output;
  OPT='tdfqmp'; S = concat(' | ', '{'  , '$8.       ', OPT, ARR); output;
  OPT='qdtu  '; S = concat(' | ', '"'  , '$8.       ', OPT, ARR); output;  
run;
t        strawbecustard"paul's
d        strawbe |custard |"paul's
f        ebwarts             dratsuc             s'luap"
q        {strawbe }{custard }{"paul's }
m                strawbe         custard "paul's
u        STRAWBE CUSTARD "PAUL'S
ft       ebwartsdratsucs'luap"
tfd      7374726177626520-|-6375737461726420-|-227061756C277320
tdfqmu   [[[2020202020202020]]]-|-[[[5354524157424520]]]-|-[[[2020202020202020]]]-|-[[[4355535441524420]]]-|-[[[225041554C275320]]]-|-[[[2020202020202020]]]
tdfqmp   { } | {Strawbe} | { } | {Custard} | {"paul's} | { }
qdtu     "STRAWBE" | "CUSTARD" | """PAUL'S"

 

 

@JackHamilton Function VVALUE is indeed only accessible from a data step. While this function would be useful in SQL, it is not really meaningful in FCMP as the FCMP parameters are disconnected from the data step variables. So the idea of using the formats associated with the variables is reserved for an Institute-supplied function.

Regarding the variable-value pairs, it's not possible in the way described with the current limitation of FCMP as optional parameters are extremely sadly disallowed. I reckon forcing all variables to the same length and putting them in an array, as my function does, is painful enough; interleaving values and options in that array is a step too far, or at least should be a separate, even more flexible and syntaxically convoluted function.

As for the lengths, I am with you too. It is very disappointing that SQL misbehaves in this manner, and I would call this a defect.

Even forcing the length with the put function fails

 

    insert into temp 
        select put(repeat('*', 400),$401.) 
             , "&LONGVAL." 
             , upcase("&LONGVAL.") 
             , name 
        from sashelp.class(obs=1);    

SAS functions behave erratically in this respect, and even in a data step surprises abound.

data T;
 s1=repeat('aa',55);          * 200;
 s2=upcase('aa');             * 2; 
 s3=reverse('aa');            * 2; 
 s4=cat('aa');                * 200;
 s5=put('aa',$char.);         * 2; 
 s6=putc('aa','$char.');      * 2; 
 s7=prxchange('s/./b/',1,'a');* 200;
 s8=substr('aa',1,1);         * 2; 
 s9=compbl('aa');             * 2; 
 s10=compress('aa');          * 2; 
 s11=strip('aa');             * 2; 
 s12=trim('aa');              * 2; 
 s13=left('aa');              * 2; 
 s13=coalescec('aa','bbb');   * 2; 
 s14=choosec(1,'aa','bbb');   * 200;
 s15=dequote('aa');           * 2; 
 s16=ifc(1,'aa','bbb');       * 200;
 s17=nliteral('aa');          * 200;
 s18=trimn('aa');             * 2; 
 s19=substrn('aa',1,1);       * 2; 
 s20=translate('aa','a','b'); * 2; 
 s21=transtrn('aa','a','b');  * 200;
 s22=tranwrd('aa','a','b');   * 200;
 s23=quote('aa');             * 200;
 s24=scan('aa',1);            * 2; 
 s25=subpad('aa',1,1);        * 200;
 s26=nliteral('aa');          * 200;
run;

The text in bold is a nudge to SAS Institute for action..

 

 

As I don't usually use PROC FCMP, isn't there some command you need to run before you can use CONCAT as a function?

@PaigeMiller 

If the function is saved as in the example

proc fcmp outlib=WORK.MYFUNCS.STR; 

You need to set this option

options cmplib = WORK.MYFUNCS;  

which acts a bit like option FMTSEARCH, for for compiled routines. 

Or perhaps, just in case something is already using functions,

 

options append=cmplib work.myfuncs ;

Not very high on my list, but I wouldn't mind a REMOVE= option to remove an existing object from cmplib or ftmsearch or whatever else can use append or insert.

Version history
Last update:
‎10-07-2021 06:09 AM
Updated by:
Contributors

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started

Article Tags