turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Converting text to numeric format issues

Topic Options

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

05-06-2016 11:31 PM

Hi, I have an account number that is text, and I want to conver it to number format. It is in this format ($17.) and informat ($17.) and it looks like this 6.8347285748+014 , if I use the code below it converts it to numeric however it looks like this 683472857480000, and it makes the last 4 digits 0's.. Is there a way to fix this and not have the 4 0's, but the actual real digits? Thanks.

New_acct = input(acct_id,17.);

format New_acct 17.;

Accepted Solutions

Solution

05-07-2016
03:05 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to podarum

05-07-2016 11:26 AM

Hi @podarum,

It's true that SAS can "safely" store integers in numeric variables only up to a certain size. (But this does *not* mean that it cannot deal at all with larger integers.) This maximum size is platform-dependent. For Windows/Unix systems the limit for a numeric variable of length 8 bytes (i.e. the maximum and default length) is 2**53 = 9,007,199,254,740,992, i.e. 9.007...E**15** (see the table in @Reeza's linked document).

The number you mention is clearly smaller than that (by a factor of approx. 13). So, SAS would have no issues with this particular number. (Numbers with **17** digits, as could be contained in your Char-17 variable, would be a different story, though.)

However, if I understand your post correctly, the *character *value ACCT_ID "looks like ... 6.8347285748+014". Maybe this is a typo and you missed the "E" before "+014", because otherwise the INPUT function with informat 17. would result in a missing value and an "Invalid argument" note in the log.

In any case, if the *character* value contains scientific notation such as "...E+14" (or "...E+014" for that matter), your alarm bells should ring, as this would indicate potential (and even probable) loss of information at the time when the exact integer, say, 6834728574** 75432**, was converted to scientific notation. This may very well have happened outside SAS, before variable ACCT_ID was even created.

The permanent format and informat specification $17. of ACCT_ID indicate automated raw data import (e.g. using PROC IMPORT; was Excel involved??). The raw data value must have been already in scientific notation if it was read as such with informat $17.!

So, you have no chance to retrieve the "actual real digits" from a character string in ACCT_ID which doesn't contain them. Instead, you have to trace the value back and prevent whatever application it was from "displaying" the account number in scientific notation (e.g. format Excel cells as "Text"). After that you'll have to repeat the data import. If the account numbers have only 15 digits (like 683472857475432) and no (informative) leading zeros, you can safely store them in a numeric variable of length 8 (if you like), but I wouldn't recommend this. (The scientific notation might then crop up again ...)

All Replies

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to podarum

05-06-2016 11:47 PM

Unfortunately SAS cannot deal with numbers that large

It has to do with numerical precision and how computers store numbers more so than any SAS limitation. If you need to do calculations with these numbers DS2 can read/work with them but you'll have to store results back to character at the end or you'll lose the precision.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to podarum

05-07-2016 01:50 AM

Why do you want it to be numerical?

I can see the possibility to earn a few bytes on each record but other than that?

And I can see a risk in loosing information. Account no might have leading zeros and not always have same length...?

I can see the possibility to earn a few bytes on each record but other than that?

And I can see a risk in loosing information. Account no might have leading zeros and not always have same length...?

Data never sleeps

Solution

05-07-2016
03:05 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to podarum

05-07-2016 11:26 AM

Hi @podarum,

It's true that SAS can "safely" store integers in numeric variables only up to a certain size. (But this does *not* mean that it cannot deal at all with larger integers.) This maximum size is platform-dependent. For Windows/Unix systems the limit for a numeric variable of length 8 bytes (i.e. the maximum and default length) is 2**53 = 9,007,199,254,740,992, i.e. 9.007...E**15** (see the table in @Reeza's linked document).

The number you mention is clearly smaller than that (by a factor of approx. 13). So, SAS would have no issues with this particular number. (Numbers with **17** digits, as could be contained in your Char-17 variable, would be a different story, though.)

However, if I understand your post correctly, the *character *value ACCT_ID "looks like ... 6.8347285748+014". Maybe this is a typo and you missed the "E" before "+014", because otherwise the INPUT function with informat 17. would result in a missing value and an "Invalid argument" note in the log.

In any case, if the *character* value contains scientific notation such as "...E+14" (or "...E+014" for that matter), your alarm bells should ring, as this would indicate potential (and even probable) loss of information at the time when the exact integer, say, 6834728574** 75432**, was converted to scientific notation. This may very well have happened outside SAS, before variable ACCT_ID was even created.

The permanent format and informat specification $17. of ACCT_ID indicate automated raw data import (e.g. using PROC IMPORT; was Excel involved??). The raw data value must have been already in scientific notation if it was read as such with informat $17.!

So, you have no chance to retrieve the "actual real digits" from a character string in ACCT_ID which doesn't contain them. Instead, you have to trace the value back and prevent whatever application it was from "displaying" the account number in scientific notation (e.g. format Excel cells as "Text"). After that you'll have to repeat the data import. If the account numbers have only 15 digits (like 683472857475432) and no (informative) leading zeros, you can safely store them in a numeric variable of length 8 (if you like), but I wouldn't recommend this. (The scientific notation might then crop up again ...)

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to FreelanceReinhard

05-07-2016 01:44 PM

If you take a 18 digit character and use input with best32 or even 32. does it convert correctly? It doesn't for me in SAS UE.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

05-07-2016 02:58 PM

No it didn't for me either..

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

05-07-2016 04:21 PM - edited 05-07-2016 04:42 PM

Reeza wrote:

Hi @Reeza,

With a standard SAS 9.4 installation (Windows) it depends on the integer and on the definition of a "correct" conversion. An integer with 18 decimal digits would require between (up to) 56 and 59 mantissa bits for an exact internal representation (depending on the size of the number). Given the 52 bits available, it's clear *a priori* that 4 - 7 bits will be missing, so that differences between the intended and the actual result of up to 2**4 - 1 = 15 in the case of 4 (lost) bits and up to 2**7 - 1 = 127 in the case of 7 bits could be anticipated.

In reality it's even worse. It seems that an additional error of 1 in the least significant bit is quite common. (I don't know if it can get worse than that.* ) Last year I observed this with longer integers, but I've just checked that the same effect occurs with some (but not all) 18-digit integers as well.

Example:

```
data _null_;
c='123456789012345678';
n=123456789012345678;
x=input(c,32.);
y=input(c,best32.);
put c=;
put (n x y) (=best24./);
put (n x y) (=hex16./);
run;
```

Result:

c=123456789012345678 n=123456789012345696 x=123456789012345696 y=123456789012345696 n=437B69B4BA630F36 x=437B69B4BA630F36 y=437B69B4BA630F36

The exact (mathematical) binary representation of this integer ends in ...**0011**01001110. The **0011** corresponds to the hexadecimal **3** in ...F**3**6. The last four bits do not fit into the 52-bit mantissa. So, naive expectation would be that they are either truncated or, at best, rounded. However, truncation would result in a 4 as the last hexadecimal digit and rounding in a 5, but in fact we observe a 6. This leads to an absolute error of 96-78=18 (decimal), i.e. greater than the anticipated maximum of 15.

So, even in this somewhat "elaborate" sense, the conversion is not quite correct.

In the above example the character-to-numeric conversion using the INPUT function gives the same result as direct numeric assignment (n=...), which I think is not surprising.

It would be interesting to learn why the *additional* error in the last bit occurs.

Do you get different results in SAS UE with the above example or do you have an example where the absolute error cannot be "explained" as above? Then I would like to add it to my collection of strange phenomena about numeric representation issues. :-)

Edit: * I *do* know examples of *non*-integers whose internal representation changes by more than 1 in the least significant bit if a number of zeros are appended to their decimal representation.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to FreelanceReinhard

05-07-2016 06:59 PM

@FreelanceReinhard I get different results, probably within the margin of error I assume?

```
c=123456789012345678
n=123456789012345680
x=123456789012345680
y=123456789012345680
n=437B69B4BA630F35
x=437B69B4BA630F35
y=437B69B4BA630F35
```

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Reeza

05-08-2016 05:26 AM

Thanks @Reeza, this is interesting. So, in this particular case, SAS UE yields the best possible result, given the limited number of 52 mantissa bits. The theoretical exact result (with a 56-bit mantissa) would be 437B69B4BA630F3**4E** in hexadecimal representation.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to FreelanceReinhard

05-07-2016 03:04 PM

Thank you FreelanceReinhard... the issue was with the excel import. I made the field numeric in Excel and imported it again.. it worked fine. Thanks again.

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to podarum

05-09-2016 11:21 AM

I am curious as to what arithmetic you intend to perform on your ID variable.