BookmarkSubscribeRSS Feed
David_Billa
Rhodochrosite | Level 12

I've used the format $hex50. to identify any special characters or invisible characters in the string. Value for the string is coming from .xls. Assume string value is "Urology". When I executed the program with $hex50. with the value "Urology", I got following message in the log. How to interpret this code to see if/what special characters in the string? 

 

55726F6C6F67790D0A0D0A

Following code is removing the spaces between the strings. I want to remove special characters (spaces, CR, LF) before and after the string and NOT between the strings

 

compress(string,'090A0C0D20A0'x))

 

19 REPLIES 19
Tom
Super User Tom
Super User

Look up the ASCII codes.  Or just play around yourself with the using the $HEX format on single characters.

 

You have :

55

72

6F

6C

6F

67

79

0D

0A

0D

0A

 

The last four are two pairs of Carriage Return and Line Feed characters.  

You probably do NOT want those in your actual text data.  You could use COMPRESS() to remove them

string = compress(string,'0A0D'x);

Or perhaps use TRANSLATE() to replace them with spaces

string = translate(string,'  ','0A0D'x);
David_Billa
Rhodochrosite | Level 12

@Tom Thank you for the information. I want to remove the special characters  (spaces, CR, LF) before and after the string and NOT between the string. Currently following line is removing the spaces between the string which I don't want to do. 

 

compress(string,'090A0C0D20A0'x))

E.g. If the string is 'Banking and Insurance' and I don't want to make it as  'BankingandInsurance'  by removing the spaces between the string.

David_Billa
Rhodochrosite | Level 12

 I want to remove the special characters  (spaces, CR, LF) before and after the string and NOT between the string. Currently following line is removing the spaces between the string which I don't want to do. 

 

compress(string,'090A0C0D20A0'x))

E.g. If the string is 'Banking and Insurance' and I don't want to make it as  'BankingandInsurance'  by removing the spaces between the string.

Tom
Super User Tom
Super User

@David_Billa wrote:

 I want to remove the special characters  (spaces, CR, LF) before and after the string and NOT between the string. Currently following line is removing the spaces between the string which I don't want to do. 

 

compress(string,'090A0C0D20A0'x))

E.g. If the string is 'Banking and Insurance' and I don't want to make it as  'BankingandInsurance'  by removing the spaces between the string.


If you don't want to remove the spaces then take the 20 out of your COMPRESS() function all.

 

Usually I use translate to replace the goofy characters with spaces and use COMPBL() to reduce them to one per break.  You can use LEFT() to remove the leading spaces.  

left(compbl(translate(string,' ','090A0C0DA0'x)))

No need to use TRIM() , or STRIP(), to remove the trialing spaces because they will just get added back when you store the result into a variable.

David_Billa
Rhodochrosite | Level 12
I want to remove all the special/invisible characters including spaces
before and after the string and NOT between the string.
Reeza
Super User
What's a special character? Hyphen? Exclamation mark? This is why you need to list the characters of interest.
Tom
Super User Tom
Super User

@David_Billa wrote:
I want to remove all the special/invisible characters including spaces
before and after the string and NOT between the string.

If you want to use COMPRESS() to remove characters you have to tell it exactly what characters to remove.  So if you don't want it to remove spaces then don't tell it to remove them.

 

If it is easier you can use the K modifier and instead tell it which characters to keep.  For example if you only want to keep the normal ASCII characters just use something like:

new_string = compress(old_string,collate(32,126),'k');

32 is 20 in hex which is a space.  126 is 7E in hex which is a ~.  All of the chracters from space to ~ are in printable standard 7-bit ASCII characters. Everything else is either a control character like CR or LF or some strange non-ASCII character like accented letters or strange symbols.

 

If you are having trouble figuring out what characters you have that is what the $HEX format can help with.

 

The STRIP() function will remove leading and trailing SPACES.  It does not remove any other characters.  As I explained before there is no need to remove the trailing spaces since they just come back when you assign the value to a variable. SAS uses FIXED LENGTH character variables.  So they are always padded with spaces to fill the full storage length.  So LEFT() is enough to remove the leading spaces.

 

Not sure what you mean by "between strings".  To you mean between the words in the current character variable?  That is why I suggested using the COMPBL() function.  That will collapse multiple adjacent spaces into one space, but does not remove the space between the words.

 

Or are you planning to combine a series of character variables together?  If so then explain how you are going to combine them to help on preventing extra spaces being inserted.

Reeza
Super User
I believe 20 is the typical space characters. Why did you add all the whitespace characters to your remove list?
David_Billa
Rhodochrosite | Level 12
I'm not certain how to remove white spaces and special characters before
and after the strings. Any help?
Reeza
Super User
Assuming trim/strip have failed?
David_Billa
Rhodochrosite | Level 12
Right, trim has failed and hence I moved to compress.
Reeza
Super User

SAS automatically addes 20 (spaces to fill out a length of the variable) so that's fine, but otherwise, COMPRESS() seems to work.

 

 
data have;
input value $hex50.;
format value $hex50.;
cards;
55726F6C6F67790D0A0D0A
;
run;

proc print data=have;
run;
proc print data=have;
format value $50.;
run;

data want;
set have;
value2=strip(value);
value3=trim(value);
value4 = compress(value, '0D0A'x);

format value2 value3 value4 $hex50.;

run;

proc print data=want;
run;
David_Billa
Rhodochrosite | Level 12
Thank you for the instruction. Whether 0D0A in compress function removes
special/invisible characters as well along with whitespaces before and
after the strings?

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 19 replies
  • 1826 views
  • 8 likes
  • 5 in conversation