Dear community,
I want to drop the leading zeros in a string which looks like the below dataset. I am trying to match these numbers in another table (which has the same numbers but no leading zeros). Could you please help me do this. I tried this way, but all zero numbers are throwing error in my sas. Could you please help me. Thanks !!
data leadingzero; input number $; new_num1=substr(number, indexc(number,'123456789')); cards; 0123 123 124-487 007_0k 000033y 00000 ; run;
The log for error is
1 data leadingzero;
2 input number $;
3 new_num1=substr(number, indexc(number,'123456789'));
4 cards;
NOTE: Invalid second argument to function SUBSTR at line 3 column 10.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+
10 00000
number=00000 new_num1= _ERROR_=1 _N_=6
NOTE: The data set WORK.LEADINGZERO has 6 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.32 seconds
cpu time 0.04 seconds
Another solution would be:
new_num2=prxChange("s/^0+//", 1, number);
Depends what you want for those cases. One solution is:
new_num1=substr(number, max(1, indexc(number,'123456789')));
Another solution would be:
new_num2=prxChange("s/^0+//", 1, number);
Or maybe:
new_num3=prxChange("s/^0+(\S)/\1/", 1, number);
Use VERIFY
to determine position of first leading character that is not a 0
.
string = substr(string, VERIFY(string,'0'));
I assume that what you want to do when the number part is all zeros is not to return an empty string, but to have a single zero, as that is the normal numeric representation of the number zero.
In which case I assume you want '00x' to become '0x' and '000000' to become '0'.
PRX expressions seem a good way of accomplishing that: What you want to do is to delete any zeros that come before a digit (including zero):
new_num1=prxChange('s/^0*(?=\d)//', 1, number);
The PRX expression:
"^" means the beginning of the input string.
"0*" is any number of zeros.
"(?=\d)" is a look-ahead assertion, saying that after the string found there must be a single digit (including a zero).
Yes, all zeros throw an error because indexc(number,'123456789') returns 0, but 0 cannot be a value for the second argument of SUBSTR. Here is a solution that should work:
data leadingzero;
input number $;
new_num1 = substr(number,findc(number,'0','k'));
cards;
0123
123
124-487
007_0k
000033y
00000
;
It creates the following data table:
Hope, this helps.
Also, I would recommend not to rely on the default column length and assign it explicitly:
data leadingzero;
length number $10;
input number $;
new_num1 = substr(number,findc(number,'0','k'));
cards;
0123
123
124-487
007_0k
000033y
00000
;
Otherwise, you may run in a situation of truncated variable values. The length must be large enough to accommodate all zeros plus one.
But even more robust solution would be this:
data leadingzero;
length number $10;
input number $;
cards;
0123
123
124-487
007_0k
000033y
0000000000
;
data noleadingzero (drop=pos);
set leadingzero;
length new_num1 $10;
pos = findc(number,'0','k');
if pos=0 then new_num1='';
else new_num1=substr(number,pos);
run;
This way, your stripping leading zeros will not depend on possible errors in your input data set: you create your input data set first, make sure it is correct, and then use it in the next data step which takes care of stripping the leading zeros.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.