BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Varrelle
Quartz | Level 8

Hello SAS Community:

 

I have a dataset with a character variable, I'll call it "char_var".  If I explore one of "char_var"'s observations, the value for "char_var" appears as 'hello', and when I display it with PROC PRINT, it looks just like 'hello' with no trailing or leading blanks.  When I apply the LENGTH function to the "char_var" variable however and assign that result to a variable I named "len", the value of "len" is 10.  Even when I apply the TRIM and STRIP functions, separately each to "char_var", first, before applying the LENGTH function, and assigning the results to separate variables "len_trim", and "len_strip", respectively, the value stored in each of these variables is also 10.

 

I guess there is some kind of non-printing character that is not getting removed? The reason I need to get just the 'printable' characters isolated is because the "char_var" column is supposed to serve as a "foreign key"  for another table (data set), and since I can't reduce the value of the "char_var" down to the length of just the printable characters, which would be 5, I can't match the value correctly with the 'hello' value in the joining table.

 

I tried using the NOTPRINT and INDEX, FIND functions as well to see if I could locate these 'invisible' characters, but I didn't have any success.

 

Any ideas on how I can strip these extra characters from the "char_var" variables' value to reduce its length to 5 (the length of the printable characters)? Or perhaps another strategy to be able to link the tables on this variable, despite the mismatching lengths?

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If you have determined that your character strings include 'A0'x characters then just convert them to spaces.

var = translate(var,' ','A0'x);

View solution in original post

11 REPLIES 11
Tom
Super User Tom
Super User

Are you running with double byte character set?  That would require 10 bytes to store 5 characters.

 

Otherwise look at the actual content of the variable using the $HEX format.

 

Most likely the invisible character in position 10 is one of

Null '00'x

Tab '09'x

Linefeed '0A'x

Carriage Return '0D'x

Non-breaking space 'A0'x

Varrelle
Quartz | Level 8
Hi @Tom

Thanks very much -- it turns out when I display the content of the "char_var" variable with the $HEXw. format, the string is being padded with the 'A0'x character. I'm not sure that I've adjusted the byte-configuration of my character set to double-bytes.

Is the padding of my string 'hello' with the 'Non-breaking space' character the reason it won't match the 'hello' value in my joining table in PROC SQL?
Tom
Super User Tom
Super User

If you have determined that your character strings include 'A0'x characters then just convert them to spaces.

var = translate(var,' ','A0'x);
s_lassen
Meteorite | Level 14

How to get rid of invisible control characters: use the COMPRESS function with 'C' as third parameter:

char_var=compress(char_var,,'C');
Tom
Super User Tom
Super User

@s_lassen wrote:

How to get rid of invisible control characters: use the COMPRESS function with 'C' as third parameter:

char_var=compress(char_var,,'C');

That will take care of TAB, LF and CR (and all characters that are smaller than a space) but it does not remove the non-breaking space character 'A0'x.

char_var=compress(translate(char_var,' ','A0'x),,'C');
Varrelle
Quartz | Level 8

Hi@s_lassen and  @Tom 

 

Thanks very much for pointing me to the COMPRESS function.  I think it will solve my problem.

 

Cheers!

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Varrelle 

 

In SAS, a character variable has a defined length, and if it contains a text that is shorter than the variable length, the remaining length is filled with spaces.

 

The trim function removes leading blanks, and the strip function removes both leading and trailing blanks, so to the returned value from the function doesn't have trailing blanks, but if the returned value is assigned the original variable, as in "var = strip(var)", the function result is written back in the original variable, which still has the same length, so it will be padded with blanks up to that length.

 

In most cases it doesn't matter (apart from waste of disk space), because comparisons are made from start until last non-blank character, so in a where-clause "ABC" is equal to "ABC      ".

 

But in find-functions it does matter, because (example)  "ABC        " cannot be found in the string "123 ABC XYZ", while trim("ABC    ") can. 

 

 

 

Varrelle
Quartz | Level 8

Hi @ErikLund_Jensen :

 

Thanks for the nice explanation.

 

I applied the TRIM and STRIP functions in the where clause of my PROC SQL call to match the 'char_var' variable's value with that in a joining table, but it wasn't returning a match.  It looks like the string was being padded with the 'A0'x character which perhaps TRIM and STRIP cannot handle?

 

Not really sure.

 

ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Varrelle 

 

Trim and strip remove only spaces (20x). Other characters (visible or not) can be removed with the compress function,

 

The function has a lot of options. E.g. compress(string,' ','nk') looks for spaces (specified as a character constant in parm 1), and it also looks for digits, the underscore character, and English letters (specified with the modifier "n" in parm 2), and the "k" modifier reverses the function, so these characters are kept instead of removed. Have a look in the manual

https://documentation.sas.com/doc/en/vdmmlcdc/1.0/lefunctionsref/n0fcshr0ir3h73n1b845c4aq58hz.htm

 

Note that all spaces are kept, including leading blanks, so a strip (or left) function on the result is still recommended.

Varrelle
Quartz | Level 8

Hi @ErikLund_Jensen 

 

This COMPRESS function is quite powerful -- thanks for sharing the link to the docs and for your explanation of some key use cases.  really neat -- thanks!

 

Best wishes,

Veeral

kilronan
SAS Employee

Hi - I had a similar problem. Some other environment (maybe from an Excel import to SAS) had created some kind of padding some places after a character string in my data set. I tried everything mentioned but couldn't get TRIM, STRIP etc to work. Also, I didn't know what kind of control characters they were - they didn't display. In the end this solution worked (where fixed_string is a character value and problem_string is the same value followed by unknown non-characters):

 

fixed_string=upcase(reverse(substrn(reverse(problem_string),anyalpha(reverse(problem_string)))));

 

It's a bit convoluted but did the trick.

Ronan

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 11677 views
  • 10 likes
  • 5 in conversation