DATA Step, Macro, Functions and more

Converting text to numeric format issues

Accepted Solution Solved
Reply
Super Contributor
Posts: 395
Accepted Solution

Converting text to numeric format issues

 Hi, I have an account number that is text, and I want to conver it to number format.  It is in this format ($17.) and informat ($17.) and it looks like this  6.8347285748+014 , if I use the code below it converts it to numeric however it looks like this 683472857480000, and it makes the last 4 digits 0's.. Is there a way to fix this and not have the 4 0's, but the actual real digits?  Thanks.

 

New_acct = input(acct_id,17.);
format New_acct 17.;


Accepted Solutions
Solution
‎05-07-2016 03:05 PM
Trusted Advisor
Posts: 1,115

Re: Converting text to numeric format issues

Hi @podarum,

 

It's true that SAS can "safely" store integers in numeric variables only up to a certain size. (But this does not mean that it cannot deal at all with larger integers.) This maximum size is platform-dependent. For Windows/Unix systems the limit for a numeric variable of length 8 bytes (i.e. the maximum and default length) is 2**53 = 9,007,199,254,740,992, i.e. 9.007...E15 (see the table in @Reeza's linked document).

 

The number you mention is clearly smaller than that (by a factor of approx. 13). So, SAS would have no issues with this particular number. (Numbers with 17 digits, as could be contained in your Char-17 variable, would be a different story, though.)

 

However, if I understand your post correctly, the character value ACCT_ID "looks like ... 6.8347285748+014". Maybe this is a typo and you missed the "E" before "+014", because otherwise the INPUT function with informat 17. would result in a missing value and an "Invalid argument" note in the log.

 

In any case, if the character value contains scientific notation such as "...E+14" (or "...E+014" for that matter), your alarm bells should ring, as this would indicate potential (and even probable) loss of information at the time when the exact integer, say, 683472857475432, was converted to scientific notation. This may very well have happened outside SAS, before variable ACCT_ID was even created.

 

The permanent format and informat specification $17. of ACCT_ID indicate automated raw data import (e.g. using PROC IMPORT; was Excel involved??). The raw data value must have been already in scientific notation if it was read as such with informat $17.!

 

So, you have no chance to retrieve the "actual real digits" from a character string in ACCT_ID which doesn't contain them. Instead, you have to trace the value back and prevent whatever application it was from "displaying" the account number in scientific notation (e.g. format Excel cells as "Text"). After that you'll have to repeat the data import. If the account numbers have only 15 digits (like 683472857475432) and no (informative) leading zeros, you can safely store them in a numeric variable of length 8 (if you like), but I wouldn't recommend this. (The scientific notation might then crop up again ...)

View solution in original post


All Replies
Super User
Posts: 17,836

Re: Converting text to numeric format issues

Unfortunately SAS cannot deal with numbers that large Smiley Sad

 

It has to do with numerical precision and how computers store numbers more so than any SAS limitation. If you need to do calculations with these numbers DS2 can read/work with them but you'll have to store results back to character at the end or you'll lose the precision.

 

http://support.sas.com/documentation/cdl/en/lrcon/68089/HTML/default/viewer.htm#p0ji1unv6thm0dn1gp4t...

Super User
Posts: 5,257

Re: Converting text to numeric format issues

Why do you want it to be numerical?
I can see the possibility to earn a few bytes on each record but other than that?
And I can see a risk in loosing information. Account no might have leading zeros and not always have same length...?
Data never sleeps
Solution
‎05-07-2016 03:05 PM
Trusted Advisor
Posts: 1,115

Re: Converting text to numeric format issues

Hi @podarum,

 

It's true that SAS can "safely" store integers in numeric variables only up to a certain size. (But this does not mean that it cannot deal at all with larger integers.) This maximum size is platform-dependent. For Windows/Unix systems the limit for a numeric variable of length 8 bytes (i.e. the maximum and default length) is 2**53 = 9,007,199,254,740,992, i.e. 9.007...E15 (see the table in @Reeza's linked document).

 

The number you mention is clearly smaller than that (by a factor of approx. 13). So, SAS would have no issues with this particular number. (Numbers with 17 digits, as could be contained in your Char-17 variable, would be a different story, though.)

 

However, if I understand your post correctly, the character value ACCT_ID "looks like ... 6.8347285748+014". Maybe this is a typo and you missed the "E" before "+014", because otherwise the INPUT function with informat 17. would result in a missing value and an "Invalid argument" note in the log.

 

In any case, if the character value contains scientific notation such as "...E+14" (or "...E+014" for that matter), your alarm bells should ring, as this would indicate potential (and even probable) loss of information at the time when the exact integer, say, 683472857475432, was converted to scientific notation. This may very well have happened outside SAS, before variable ACCT_ID was even created.

 

The permanent format and informat specification $17. of ACCT_ID indicate automated raw data import (e.g. using PROC IMPORT; was Excel involved??). The raw data value must have been already in scientific notation if it was read as such with informat $17.!

 

So, you have no chance to retrieve the "actual real digits" from a character string in ACCT_ID which doesn't contain them. Instead, you have to trace the value back and prevent whatever application it was from "displaying" the account number in scientific notation (e.g. format Excel cells as "Text"). After that you'll have to repeat the data import. If the account numbers have only 15 digits (like 683472857475432) and no (informative) leading zeros, you can safely store them in a numeric variable of length 8 (if you like), but I wouldn't recommend this. (The scientific notation might then crop up again ...)

Super User
Posts: 17,836

Re: Converting text to numeric format issues

  If you take a 18 digit character and use input with best32 or even 32. does it convert correctly? It doesn't for me in SAS UE. 

Super Contributor
Posts: 395

Re: Converting text to numeric format issues

No it didn't for me either..

Trusted Advisor
Posts: 1,115

Re: Converting text to numeric format issues

[ Edited ]

Reeza wrote:

  If you take a 18 digit character and use input with best32 or even 32. does it convert correctly? It doesn't for me in SAS UE. 


Hi @Reeza,

 

With a standard SAS 9.4 installation (Windows) it depends on the integer and on the definition of a "correct" conversion. An integer with 18 decimal digits would require between (up to) 56 and 59 mantissa bits for an exact internal representation (depending on the size of the number). Given the 52 bits available, it's clear a priori that 4 - 7 bits will be missing, so that differences between the intended and the actual result of up to 2**4 - 1 = 15 in the case of 4 (lost) bits and up to 2**7 - 1 = 127 in the case of 7 bits could be anticipated.

 

In reality it's even worse. It seems that an additional error of 1 in the least significant bit is quite common. (I don't know if it can get worse than that.* ) Last year I observed this with longer integers, but I've just checked that the same effect occurs with some (but not all) 18-digit integers as well.

 

Example:

data _null_;
c='123456789012345678';
n=123456789012345678;
x=input(c,32.);
y=input(c,best32.);
put c=;
put (n x y) (=best24./);
put (n x y) (=hex16./);
run;

Result:

c=123456789012345678
n=123456789012345696
x=123456789012345696
y=123456789012345696
n=437B69B4BA630F36
x=437B69B4BA630F36
y=437B69B4BA630F36

The exact (mathematical) binary representation of this integer ends in ...001101001110. The 0011 corresponds to the hexadecimal 3 in ...F36. The last four bits do not fit into the 52-bit mantissa. So, naive expectation would be that they are either truncated or, at best, rounded. However, truncation would result in a 4 as the last hexadecimal digit and rounding in a 5, but in fact we observe a 6. This leads to an absolute error of 96-78=18 (decimal), i.e. greater than the anticipated maximum of 15.

 

So, even in this somewhat "elaborate" sense, the conversion is not quite correct.

 

In the above example the character-to-numeric conversion using the INPUT function gives the same result as direct numeric assignment (n=...), which I think is not surprising.

 

It would be interesting to learn why the additional error in the last bit occurs.

 

Do you get different results in SAS UE with the above example or do you have an example where the absolute error cannot be "explained" as above? Then I would like to add it to my collection of strange phenomena about numeric representation issues. :-)

 

Edit: * I do know examples of non-integers whose internal representation changes by more than 1 in the least significant bit if a number of zeros are appended to their decimal representation.

Super User
Posts: 17,836

Re: Converting text to numeric format issues

@FreelanceReinhard I get different results, probably within the margin of error I assume?

 

 c=123456789012345678
 n=123456789012345680
 x=123456789012345680
 y=123456789012345680
 n=437B69B4BA630F35
 x=437B69B4BA630F35
 y=437B69B4BA630F35
Trusted Advisor
Posts: 1,115

Re: Converting text to numeric format issues

Thanks @Reeza, this is interesting. So, in this particular case, SAS UE yields the best possible result, given the limited number of 52 mantissa bits. The theoretical exact result (with a 56-bit mantissa) would be 437B69B4BA630F34E in hexadecimal representation.

Super Contributor
Posts: 395

Re: Converting text to numeric format issues

Thank you FreelanceReinhard... the issue was with the excel import.  I made the field numeric in Excel and imported it again.. it worked fine.  Thanks again.

Super User
Posts: 10,500

Re: Converting text to numeric format issues

I am curious as to what arithmetic you intend to perform on your ID variable.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 1198 views
  • 4 likes
  • 5 in conversation