BookmarkSubscribeRSS Feed
pinkyc
Calcite | Level 5

I have data like this:

Pat_ID     Discharge     Admit          Paid_Amount

A                                   2/1/2012     -20.48

A                                   2/1/2012     -10.18

A                                   2/1/2012     -1300.44

A                                   2/1/2012      1300.44

A                                   2/1/2012     10.18

A                                   2/1/2012     20.48

A               1/1/2012      2/1/2012     50.50

A               1/1/2012      2/1/2012     60.48

I need to sum up the paid_amounts by Pat_ID, discharge and admit so I used this:

proc sql;

create table sumPaid as

select pat_id, admit, discharge, sum(paid_amount) as paid_amount

from pat_det

by pat_id, admit, discharge;

quit;

Which gives me  something like this:

pat_ID     Admit     Discharge     paid_amount

A          2/1/2012                4.54747474E-13

A          2/1/2012  1/1/2012         110.98

You can tell that the observations with blank discharge but admit 2/1/2012 have equal paid_amounts that should add up to 0 yet for similar situations, I get extremely small E-13 sums and I have no idea why that could be.

9 REPLIES 9
Haikuo
Onyx | Level 15

One possibility is that you are tricked by the format.  See below:

data pat_det;

     input Pat_ID :$10.    (Discharge     Admit) (:mmddyy10.)          Paid_Amount /*default format is best12.*/;

     cards;

A                      .             2/1/2012     -20.48

A .                                  2/1/2012     -10.18

A .                                  2/1/2012     -1300.44

A . 2/1/2012      1300.4400000001

A   .                                2/1/2012     10.18

A    .                               2/1/2012     20.48

A              1/1/2012      2/1/2012     50.50

A              1/1/2012      2/1/2012     60.48

;

Capture02051.PNG

proc sql;

     create table sumPaid as

           select pat_id, admit, discharge, sum(paid_amount) as paid_amount

                from pat_det

                     group by pat_id, admit, discharge;

quit;

Capture0205.PNG

Haikuo
Onyx | Level 15

One way to reconcile is to use ROUND(), before SUM.

ballardw
Super User

Actually I think this might be a case of round After sum as  the first example would like to be 0 but I think the binary representation of negative and positive is part of the issue.

Haikuo
Onyx | Level 15

Good point. However, I was worry about the small discrepancies may add up big if not using ROUND() at early stage.

ballardw
Super User

I don't have to deal with negative numbers in my work often. This example tells me that if you're dealing with such then you may need to consider rounding at both ends. I did a quick trial with the data as shown by the OP and created a paid amount rounded to 0.01. The sums for the group that should total 0 were both -8.17E14 (different systems different precision likely).

pinkyc
Calcite | Level 5


After investigating, it seems that the issue is caused by sql which stores numbers as floating numbers and supposedly, I have to specify the sum as numeric(8,2) with a cast function.  I'm not sure how to do that, so instead of using proc sql I used proc summary and the issue went away.

Tom
Super User Tom
Super User

Good that you got the answer you needed, but your explanation is off.

PROC SUMMARY worked because by default it applies the format from the source variable to the derived statistic. So if you had 8.2 format attached to your PAID_AMOUNT variable then the SUM statistic would also be displayed using the 8.2 format.  In SQL syntax you just need to add the FORMAT option to your derived variable to see the same results.

select pat_id, admit, discharge, sum(paid_amount) as paid_amount format=8.2

SAS stores numbers as floating point.  There is no way to represent decimal values (like dollars and cents) exact using binary floating point numbers. Rounding in advance won't really help. Rounding after summing can help a little.  If you really want to work with dollars and cents then you might consider storing the values as # of CENTS instead of # of DOLLARS because then they will be store as integers and the inability of binary floating point format to exactly represent decimal fractions will not apply.

pinkyc
Calcite | Level 5

Thanks for clarifying - my non explanation was just what I thought I gathered from trying to find a solution.  I don't know much if anything about how SAS stores numbers so this helps!

ChrisHemedinger
Community Manager

An exhaustive explanation about floating point numbers in SAS (and in software in general):

http://blogs.sas.com/content/sasdummy/2012/03/01/precision-in-sas-numbers/

There you will learn more than you ever wanted to know!

Chris

It's time to register for SAS Innovate! Join your SAS user peers in Las Vegas on April 16-19 2024.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 5148 views
  • 3 likes
  • 5 in conversation