BookmarkSubscribeRSS Feed
jdmarino
Fluorite | Level 6

Joining on floating point numbers is a bad idea, yet SAS does not offer an integer type.  What precision datetime (milliseconds?  microseconds?) can I safely do an equi-join on (or test for equality)?  I realize that with a time field, I should get higher precision.

I'm thinking that because the FUZZ functions define fuzz as 1e-12, I should be safe going down to nanoseconds (1e-9) with a time field, and milliseconds (1e-3, maybe even microseconds?) with a datetime field. 

Can anyone point me to a definitive reference?  Google hasn't turned up anything fo me.

10 REPLIES 10
snoopy369
Barite | Level 11

Depends on your data and how it's produced.

First off, SAS's precision is 2^53, aside from the floating point.  However, DATETIME is a number in the billions usually (# of seconds since 1/1/1960 is about 1.7 billion), which is about 2^31.  So you really have about 2^22 more precision available in the decimal; that works out to about microseconds in a datetime.  (This is because SAS stores in a 64 bit floating point number 1 sign bit, 11 exponent bits, and 52 bits for the mantissa, which can store up to 2^53 accurately.)  This is all assuming you're on an IEEE system, ie, Intel, Windows/Unix, etc.  If you're on a mainframe it's a bit different.

This example shows about how the precision works:

data test;

format dtval dtval_test DATETIME23.6;

dtval=datetime();

do _pow = -1 to -15 by -1;

  dtval_test = dtval - 10**_pow;

  diff = dtval-dtval_test;

    put "Difference:" dtval= dtval_test= diff= 18.16 _pow=;

end;

run;

It stops working (stops having a difference) at 10^-7, which would be tens of millions.  That's where you completely lose precision, ie, SAS won't even consider a difference of 1x10-7 to be a difference.

Before that, the difference seems good to about that spot: for 10^-1, diff is 0.0999999046 for example (this is for datetime when I ran this, actual difference may vary some).  Further on that is approximately consistent; around the 7th or 8th digit it diverges from where it should be.  The first six digits after the decimal are 'right', the seventh is often but not always right.

This is all contingent on using datetimes approximately like this year's datetimes.  If you are using time or datetime intervals, ie, values that are pretty small on the left side of the decimal, you can be much more precise.  Basically consider 15 significant digits on either side of the decimal and you're okay.  Nanoseconds in times should be fine for example (left has at most 5 digits, right can have 10 safely).   If you're solely looking at intervals, and there isn't any left side of the decimal, you can have up to 39 digits after the decimal in theory according to the datetime format (though I don't highly recommend doing that).

Reference: SAS(R) 9.2 Language Reference: Concepts, Second Edition

jdmarino
Fluorite | Level 6

I used your technique for generating differences, and tested sql joins.  With datetimes I get exact matches up to 1e-6 (milliseconds); once the differences get smaller than that sql "mistakenly" joins values that it should not.  (That is, it agrees with your statement "SAS won't even consider a deff of 10-7 to be a difference".)

With times (no date part), I get accurate joining down to 1e-11.  To play it safe, I won't go beyond nonoseconds.

jakarman
Barite | Level 11

Good question and a very good answer.

My question: you are sure you need a sql join/merge?

Perhaps an other coding is a better fit for your (not described) problem.

I am thinking on a sas data step merge approach using observations in an ordered way.

---->-- ja karman --<-----
snoopy369
Barite | Level 11

SQL Join allows you to do approximate merge (like PGStats shows).  SAS data step merge would require rounding or something else, if your datetimes are very precise (ie, have microseconds or beyond in them).

jakarman
Barite | Level 11

Snoopy. You do not need an exact merge by approach.

With an ordered set check you can use the same logic as in a dow loop is used.

3 gl thinking and forcing sql are different world's in algorithmes. Nosql also is a good one.

---->-- ja karman --<-----
jdmarino
Fluorite | Level 6

I try not to be a one-trick-pony, but I do like sql and make good use of it in SAS.  If is correct (and I plan to do some join tests) and sql joining will work with microsecond times, then I can stick to what I know well.  If not, then it will be time for a new tool in the toolbox.

snoopy369
Barite | Level 11

I think the 'how do you join' is pretty irrelevant, basically.  SQL join, datastep set/merge, whatever, you have to either:

* Round your data to microseconds or more precise

or

* Do your test on a range

All of this aside, if you're doing math on your datetimes, don't forget that you still can have slight differences unless you round immediately before you do the join (since 0.1 etc. are not perfectly representable).

jakarman
Barite | Level 11

Snoopy, The basics of SQL are limited to some algorithms with a coding approach.

Theses are failing in some areas or causing a lot of performance overhead. That is why other approaches exist of which some are hying (hadoop nosql) at the moment. Depending  on the problem not a one-trick-pony approach. Is it sufficient to solve with what you know well, do not bother. jdmarino argument holds. 

---->-- ja karman --<-----
PGStats
Opal | Level 21

Joining on floating point numbers is never a good idea. Joining on rounded floating point numbers will work better but is still imperfect. There are always pairs of numbers very close apart that will not round to the same value. That leaves you with join conditions like

on a.dt between b.dt-0.0001 and b.dt+0.0001

which may or may not be optimized in your particular situation (DBMS, indexes).

PG

PG
jdmarino
Fluorite | Level 6

I generally use

where abs(a.dt-b.dt) < 0.0001

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 2206 views
  • 3 likes
  • 4 in conversation