The character variabel CHARVAR have values mostly like:
0178 Adam
0245 Barbara
But there are some values with some leading blanks like:
^^0791 Cathy
^^0437 David
I want to get rid of the blanks.
STRIP(CHARVAR) and LEFT(CHARVAR) doesn´t work.
Is there any special problem with these functions, when the first character after the blanks is a zero?
The data are imported from Excel. I am using SAS Enterprise Guide.6.1.
No, there is no such problem. Are you sure you're dealing with blanks or could it be that these are other non-printable white-space characters?
If so then the most efficient way I can think of to get rid of any leading white-space characters would be using a regular expression as done in below code for "want2".
data sample;
length have $20;
have='09'x||'0791 Cathy';
want1=strip(have);
want2=prxchange('s/^\s*//o',1,have);
put have= $hex.;
put want1= $hex.;
put want2= $hex.;
run;
Message was edited by: Patrick Matter fixed the hex format
Thanks for your answer Patrick.
I have some problems interpreting 's/^\s*//o'. Can you elaborate on that, please.
And shouldn't the "1" in prxchange('s/^\s*//o',1,have) be "-1" if the number of non-printable characters are unknown?
I have no idea of what the possible non-printable characters could be. I used "^" above to indicate blanks.
Use "put <your variable>= $hex.;" in your code. This will show you in the log the hex codes so you can find out what these white-space characters really are.
As for the RegEx ^\s*
^ at the beginning of the string
\s any white-space character
* zero, one or many
And because this RegEx can match only once at the beginning of the string, a '1' should be sufficient.
Message was edited by: Patrick Matter fixed the hex format
You wrote: "Use "put <your variable>= hex.;" in your code.".
This gives:
^^0791 Cathy .
^^0437 David .
in the log, and here my ^^ stands for blanks.
This should indicate that the white-space characters really are blanks. It is very strange then, that the strip function doesn't work.
Where the dots come from I don't know.
The DOT come from you leaving off the period after the format HEX in your PUT statement. It should be $HEX. and you may want to include W as in $HEX8.;
attjooo wrote:
Where the dots come from I don't know.
You should have values like this is you really used the $HEX. format. Only the '2020' are spaces. The others are other binary codes that might look like spaces when printed.
94 data _null_;
95 x=' 0791 Cathy';
96 put x= $hex. ;
97 x='0000'x || '0791 Cathy';
98 put x= $hex.;
99 x='A020'x || '0791 Cathy';
100 put x= $hex.;
101 x='0909'x || '0791 Cathy';
102 put x= $hex.;
103
104 run;
x=202030373931204361746879
x=000030373931204361746879
x=A02030373931204361746879
x=090930373931204361746879
My data above was fake data.
Using $hex. the problematic variable values all starts with
A0203036
and then 34 more digits.
How should I interprete this?
¨'
'A0'x is character that some (particularly Microsoft products) use as a "non-breaking" space. Convert them to a real space and then STRIP() or LEFT() should work for you.
CHARVAR = left(translate(CHARVAR,' ','A0'x));
Thank you Tom. My problem is now solved.
Then please mark the most suitable answer as "correct" and answers which were helpful as "helpful".
I was looking for a button marked "Correct" to press, but I couldn't find any.
If the legitimate characters always begin with a "0", then you shouldn't have to figure out what the extra character is. You could try:
length first $ 1;
first = charvar;
if first ne '0' then charvar = comrpess(charvar, first);
If the first legitimate character might be a nonzero, the problem gets more complex but COMPRESS can probably handle it. There is a third parameter for COMPRESS, for example, that would let you keep all letters and digits and discard everything else.
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.