Solved: Re: Numeric Representation on Windows Platforms

Doug____ · Posted 01-08-2019 12:45 PM

I have an integer numeric value xxxxxxxxxxxxxxx4 (16 digits) that needs to have the last digit recoded to a 3. SAS will not do this in a data step - it leaves the original value. However, when I set the last digit to a 2, this works. Is this a limitation of SAS on Windows platforms? How can I get around it?

The data were originally imported from MS XLSX and the observations described above were imported incorrectly as well.

FreelanceReinh · Posted 01-08-2019 02:06 PM

Hi @Doug____,

Not only SAS under Windows (and Unix) does have this limitation. Excel is even worse: When I enter the 16-digit integer 1234567890123453 into a cell (Excel 2013 on Windows 7) without any precautions, Excel rounds it off to 1234567890123450, whereas SAS can store it unchanged in a numeric variable because it does not exceed the limit 2**53=9.007...E15 (see ballardw's post or the table "Largest Integer That Can Be Safely Stored in a Given Length" in the documentation).

Therefore, I suspect that in your Excel file this value was already stored as a character string (e.g. with a leading single quote) and I second ballardw's recommendation to read such values as character strings in the first place (and don't let PROC IMPORT guess what type they are).

The limiting factor for handling long integers (as numeric values) in SAS under Windows (and Unix) is the 52 mantissa bits (plus the "implied bit") used for the internal representation of numeric values. SAS can very well store some integers beyond the limit mentioned above without loss of precision, namely those whose 54th, 55th, ... binary digits (if any) are zero. Obviously, all of these are even numbers and the more binary digits they have, the larger is the power of 2 by which they are divisible. That's why the last digit 3 is problematic in your example while 2 as the last digit worked (in the case of 16 decimal digits).

View solution in original post

ballardw · Posted 01-08-2019 01:02 PM

How are you "recoding" the value? Can you post the code?

It is likely depending on your approach that you may be exceeding precision.. Documentation shows the largest integer represented under Windows exactly is 9,007,199,254,740,992 and generally 15 significant digits are retained for larger. If your approach does something to cause the value to cross that value then you could have issues.

Is that variable actually intended to be numeric, as in are you doing arithmetic with the values? If not I would suggest going back in your process and creating the variable as character which will have no precis

Doug____ · Posted 01-08-2019 01:47 PM

As far as changing the format, the system is designed around numeric identifiers. Second, the code is just a data step where I look for the identifier values in question (which SAS finds) and say if this then that. No other code is used. But this does not work in all cases as it leaves the original value and appears to ignore the if-then syntax in the data step.

ballardw · Posted 01-08-2019 06:34 PM

@Doug____ wrote:

As far as changing the format, the system is designed around numeric identifiers. Second, the code is just a data step where I look for the identifier values in question (which SAS finds) and say if this then that. No other code is used. But this does not work in all cases as it leaves the original value and appears to ignore the if-then syntax in the data step.

Actual code statements of your data step would help. Providing several example values where it works and does not work would also help.

FreelanceReinh · Posted 01-08-2019 02:06 PM

Hi @Doug____,

Not only SAS under Windows (and Unix) does have this limitation. Excel is even worse: When I enter the 16-digit integer 1234567890123453 into a cell (Excel 2013 on Windows 7) without any precautions, Excel rounds it off to 1234567890123450, whereas SAS can store it unchanged in a numeric variable because it does not exceed the limit 2**53=9.007...E15 (see ballardw's post or the table "Largest Integer That Can Be Safely Stored in a Given Length" in the documentation).

Therefore, I suspect that in your Excel file this value was already stored as a character string (e.g. with a leading single quote) and I second ballardw's recommendation to read such values as character strings in the first place (and don't let PROC IMPORT guess what type they are).

The limiting factor for handling long integers (as numeric values) in SAS under Windows (and Unix) is the 52 mantissa bits (plus the "implied bit") used for the internal representation of numeric values. SAS can very well store some integers beyond the limit mentioned above without loss of precision, namely those whose 54th, 55th, ... binary digits (if any) are zero. Obviously, all of these are even numbers and the more binary digits they have, the larger is the power of 2 by which they are divisible. That's why the last digit 3 is problematic in your example while 2 as the last digit worked (in the case of 16 decimal digits).

Kurt_Bremser · Posted 01-08-2019 02:10 PM

16 decimal digits exceeds the maximum precision achievable with 8-byte real storage, so you will have such issues. Since such precision is not needed with regards to statistics, so I assume you have something which is not really a number and should be stored as a string.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

mkeintz · Posted 01-08-2019 03:48 PM

The largest consecutive integer in sas windows is 9,007,199,254,740,992, which you can determine by using the CONSTANT function as here:

data _null_;
  x=constant('exactint',8);
  put x=comma21.;
run;

This means that you can NOT get all the integers above that value (call it EXACTINT8). But you can get these larger integers:

All the even integers from EXACTINT8 to 2*EXACTINT8 (that's why your number can't end in 3, because it's larger than EXACTINT8, but it's not an even number).
All the 0mod4 (no remainder when divided by 4) from 2*EXACTINT8 to 4*EXACTINT8
All the 0mod8 from 4*EXACTINT8 to 8*EXACTINT8
All the 0mod16 from 8*EXACTINT8 to 16*EXACTINT8
All the 0mod32 from 16*EXACTINT8 to 32*EXACTINT8
etc.

Also it means the following:

You can only have integers from EXACTINT8 down to EXACTINT8/2.
You can only have integers and halves from EXACTINT8/2 down to EXACTINT8/4.
You can only have integers, halves, and quarters from EXACTINT8/4 down to EXACTINT8/8.
etc.

Actually all SAS numbers (but see below) use floating point storage representation, which reserves bits for an exponent. Because there is no actual integer numeric TYPE in SAS (where no bits would be wasted on exponents) you don't get all the integers that 8-bytes of storage would otherwise provide.

However, you CAN have integer types (and therefore larger consecutive integers) in PROC DS2, but that it another topic. Just note that PROC DS2 will USE integer types, but it won't store integer types in a sas data set.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

SASKiwi · Posted 01-08-2019 10:25 PM

If you are not doing any mathematical calculations on this integer value then import it as a character variable and you then avoid numeric precision problems entirely. Using the "Text to Column" feature on this column in Excel prior to import will ensure SAS treats it as character.

In the industry I work in, banking, I strike this sort of issue all the time. For example credit card numbers are 16 digits long so have to be stored as character columns to avoid any precision problems.

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!