BookmarkSubscribeRSS Feed
tomcmacdonald
Quartz | Level 8

From my understanding these three statements should be equivalent

 

 

CATX('', COL1, COL2, COL3)
TRIM(COL1) || TRIM(COL2) || TRIM(COL3)
PRXCHANGE('s/\s+//', -1, COL1) || PRXCHANGE('s/\s+//', -1, COL2) || PRXCHANGE('s/\s+//', -1, COL3)

 

COL1, COL2, and COL3 are characters.

 

I check for whitespace with:

 

PRXCHANGE('s/\s/X/', -1, CONCATENATED_COLUMN) 

 

1 and 3 produce equivalent columns but 2 does not.  All three produce trailing whitespace.  I would like to check this column for characters like:

 

CASE WHEN PRXMATCH('/^0+$/', CONCATENATED_COLUMN) THEN 1 ELSE 0 END AS FOO

But can't because of the whitespace that still appears.  Must I always account for whitespace whenever I do any operation like this?  Something like this:

 

 

CASE WHEN PRXMATCH('/^0+(?:\s*)$/', CONCATENATED_COLUMN) THEN 1 ELSE 0 END AS FOO

Or:

 

JOIN ON PRXCHANGE('s/\s+//', -1, TABLE_A.ID) = PRXCHANGE('s/\s+//', -1, TABLE_B.ID)

Because I can never be sure CATX or TRIM did their job? 

11 REPLIES 11
Reeza
Super User

Replace TRIM with STRIP and what does that do?

tomcmacdonald
Quartz | Level 8

Same result as TRIM. 

Cynthia_sas
SAS Super FREQ

Hi: I don't understand why you are even using CATX. That function assumes you want to introduce some character between every argument. Personally, if you want there NOT to be ANY whitespace or blank between the arguments, I would use CATT instead of CATX.

 

no_blank.png

Cynthia

tomcmacdonald
Quartz | Level 8

CATT eliminates only trailing whitespace according to:

http://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=n16y1vyi397p84n19dpvzw0pemsa.htm&...

CATX eliminates trailing and leading whitespace according to:

http://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=n0p7wxtk0hvn83n1pveisbcp2ae9.htm&...

That was my rationale for choosing CATX with a blank delimiter. 

 

My tentative solution to this is:

 

PRXCHANGE('s/\s+//', -1, (COL1 || COL2 || COL3))
Tom
Super User Tom
Super User

You told CATX() to insert a space between the values.  If you just place two quotes next to each other then you are telling SAS you want a blank string of length one.  You can generate an empty string using the TRIMN() function.

 

Don't use a function, catx(), designed to insert something between the values when you don't actually want something inserted. Use CATT() or CATS().

 

Also what does \s mean in regular expressions?  Is it only spaces? or does it include other "white space"?

 

 

tomcmacdonald
Quartz | Level 8

A 0-length character does not exist in SAS?  '' actually means ' '? 

 

According to the PERL regular expression manual:

\s means the five characters [ \f\n\r\t]

 

https://perldoc.perl.org/perlre.html

 

Tom
Super User Tom
Super User

For normal operations what would you do with a zero length character string?   If you assign it to an actual variable it will result in at least one space anyway.   SAS stores character variables as fixed length and fills (pads) with spaces.  It ignores trailing spaces when comparing strings.

 

If you want to eliminate trailing/leading characters other than spaces then you will need to use regex.

 

Or convert them to spaces and then use strip().

data _null_;
  length var1 var2 $10 ;
  var1='XYZ';
  var2=cat(' ','09'x,'XYZ','09'x,' ','A0'x);
  put (2*var1 2*var2) (=$quote. / =$hex./);
  if var1 ne var2 then put 'NOT EQUAL';
  if var1  = strip(translate(var2,' ','090A0D00A0'x)) then put 'EQUAL';
run;

 

 

 

 

 

 

 

Tom
Super User Tom
Super User

The first 6 look like normal control characters. Not sure what the others are used for.

The set of characters that are deemed whitespace are those that Unicode calls "Pattern White Space", namely:

 

  1. U+0009 CHARACTER TABULATION
  2. U+000A LINE FEED
  3. U+000B LINE TABULATION
  4. U+000C FORM FEED
  5. U+000D CARRIAGE RETURN
  6. U+0020 SPACE
  7. U+0085 NEXT LINE
  8. U+200E LEFT-TO-RIGHT MARK
  9. U+200F RIGHT-TO-LEFT MARK
  10. U+2028 LINE SEPARATOR
  11. U+2029 PARAGRAPH SEPARATOR

 

 

ballardw
Super User

It never hurts to provide a concrete example involving actual values and what the result is supposed to be at the end.

Showing what does not work does not really tell us what you want to accomplish.

 


@tomcmacdonald wrote:

From my understanding these three statements should be equivalent

 

 


CATX('', COL1, COL2, COL3)
TRIM(COL1) || TRIM(COL2) || TRIM(COL3)
PRXCHANGE('s/\s+//', -1, COL1) || PRXCHANGE('s/\s+//', -1, COL2) || PRXCHANGE('s/\s+//', -1, COL3)

What makes you think they should be equivalent?

For one thing you can get different default lengths.

From the documentation in CATX:

In a DATA step, if the CATX function returns a value to a variable that has not previously been assigned a length, then that variable is given a length of 200 bytes. If the concatenation operator (||) returns a value to a variable that has not previously been assigned a length, then that variable is given a length that is the sum of the lengths of the values that are being concatenated.

Throw Prxchange in with || and I am not going to guess what the overall default length for the result would be.

Strip or Trim(left(var)) would be much closer to cats.

Trim removes trailing blanks and returns one blank if the string is missing. So not the equivalent of using Cats or catx in either form.

data example;
   a='a ';
   b='  ';
   c=' c';
   abc = trim(a)||trim(b)||trim(c);
   hij = strip(a)||strip(b)||strip(c);
   xyz = cats(a,b,c);
   pdq = catx('',a,b,c);
run;
Cynthia_sas
SAS Super FREQ

Well, if you are worried about leading blanks, then use CATS...or, just do what we used to do before the CAT family of functions -- the original way is shown below --  using TRIM(LEFT(varname)) with the || operator or just using LEFT with CATT:

no_blank2.png

 

Cynthia

jim_cai
Calcite | Level 5

Any two of the three expressions are not equivalent to each other.

 

The CATX function is equivalent to strip(COL1)||" "||strip(COL2)||" "||strip(COL3), provided that a single blank character was defined to be the delimiter in the original example.

 

There are no functions equivalent to the second expression, although the closest one whould be CATT function, which is actually equivalent to trimn(COL1)||trimn(COL2)||trimn(COL3).

 

The third expression is equivalent to compress(COL1)||compress(COL2)||compress(COL3), resulting that all the space characters are removed, not just leading and trailing blanks.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 2389 views
  • 4 likes
  • 6 in conversation