proc sql sum is inaccurate

pinkyc · Posted 02-06-2015 01:09 PM

I have data like this:

Pat_ID Discharge Admit Paid_Amount

A 2/1/2012 -20.48

A 2/1/2012 -10.18

A 2/1/2012 -1300.44

A 2/1/2012 1300.44

A 2/1/2012 10.18

A 2/1/2012 20.48

A 1/1/2012 2/1/2012 50.50

A 1/1/2012 2/1/2012 60.48

I need to sum up the paid_amounts by Pat_ID, discharge and admit so I used this:

proc sql;

create table sumPaid as

select pat_id, admit, discharge, sum(paid_amount) as paid_amount

from pat_det

by pat_id, admit, discharge;

quit;

Which gives me something like this:

pat_ID Admit Discharge paid_amount

A 2/1/2012 4.54747474E-13

A 2/1/2012 1/1/2012 110.98

You can tell that the observations with blank discharge but admit 2/1/2012 have equal paid_amounts that should add up to 0 yet for similar situations, I get extremely small E-13 sums and I have no idea why that could be.

Haikuo · Posted 02-06-2015 02:39 PM

One possibility is that you are tricked by the format. See below:

data pat_det;

input Pat_ID :$10. (Discharge Admit) (:mmddyy10.) Paid_Amount /*default format is best12.*/;

cards;

A . 2/1/2012 -20.48

A . 2/1/2012 -10.18

A . 2/1/2012 -1300.44

A . 2/1/2012 1300.4400000001

A . 2/1/2012 10.18

A . 2/1/2012 20.48

A 1/1/2012 2/1/2012 50.50

A 1/1/2012 2/1/2012 60.48

;

proc sql;

create table sumPaid as

select pat_id, admit, discharge, sum(paid_amount) as paid_amount

from pat_det

group by pat_id, admit, discharge;

quit;

Haikuo · Posted 02-06-2015 02:48 PM

One way to reconcile is to use ROUND(), before SUM.

ballardw · Posted 02-06-2015 03:05 PM

Actually I think this might be a case of round After sum as the first example would like to be 0 but I think the binary representation of negative and positive is part of the issue.

Haikuo · Posted 02-06-2015 03:13 PM

Good point. However, I was worry about the small discrepancies may add up big if not using ROUND() at early stage.

ballardw · Posted 02-06-2015 04:34 PM

I don't have to deal with negative numbers in my work often. This example tells me that if you're dealing with such then you may need to consider rounding at both ends. I did a quick trial with the data as shown by the OP and created a paid amount rounded to 0.01. The sums for the group that should total 0 were both -8.17E14 (different systems different precision likely).

pinkyc · Posted 02-11-2015 09:27 AM

After investigating, it seems that the issue is caused by sql which stores numbers as floating numbers and supposedly, I have to specify the sum as numeric(8,2) with a cast function. I'm not sure how to do that, so instead of using proc sql I used proc summary and the issue went away.

Tom · Posted 02-11-2015 10:07 AM

Good that you got the answer you needed, but your explanation is off.

PROC SUMMARY worked because by default it applies the format from the source variable to the derived statistic. So if you had 8.2 format attached to your PAID_AMOUNT variable then the SUM statistic would also be displayed using the 8.2 format. In SQL syntax you just need to add the FORMAT option to your derived variable to see the same results.

select pat_id, admit, discharge, sum(paid_amount) as paid_amount format=8.2

SAS stores numbers as floating point. There is no way to represent decimal values (like dollars and cents) exact using binary floating point numbers. Rounding in advance won't really help. Rounding after summing can help a little. If you really want to work with dollars and cents then you might consider storing the values as # of CENTS instead of # of DOLLARS because then they will be store as integers and the inability of binary floating point format to exactly represent decimal fractions will not apply.

pinkyc · Posted 02-11-2015 03:51 PM

Thanks for clarifying - my non explanation was just what I thought I gathered from trying to find a solution. I don't know much if anything about how SAS stores numbers so this helps!