DATA Step, Macro, Functions and more

Is it safe to sql join on datetimes?

Reply
Frequent Contributor
Posts: 78

Is it safe to sql join on datetimes?

Joining on floating point numbers is a bad idea, yet SAS does not offer an integer type.  What precision datetime (milliseconds?  microseconds?) can I safely do an equi-join on (or test for equality)?  I realize that with a time field, I should get higher precision.

I'm thinking that because the FUZZ functions define fuzz as 1e-12, I should be safe going down to nanoseconds (1e-9) with a time field, and milliseconds (1e-3, maybe even microseconds?) with a datetime field. 

Can anyone point me to a definitive reference?  Google hasn't turned up anything fo me.

Super Contributor
Posts: 253

Re: Is it safe to sql join on datetimes?

Depends on your data and how it's produced.

First off, SAS's precision is 2^53, aside from the floating point.  However, DATETIME is a number in the billions usually (# of seconds since 1/1/1960 is about 1.7 billion), which is about 2^31.  So you really have about 2^22 more precision available in the decimal; that works out to about microseconds in a datetime.  (This is because SAS stores in a 64 bit floating point number 1 sign bit, 11 exponent bits, and 52 bits for the mantissa, which can store up to 2^53 accurately.)  This is all assuming you're on an IEEE system, ie, Intel, Windows/Unix, etc.  If you're on a mainframe it's a bit different.

This example shows about how the precision works:

data test;

format dtval dtval_test DATETIME23.6;

dtval=datetime();

do _pow = -1 to -15 by -1;

  dtval_test = dtval - 10**_pow;

  diff = dtval-dtval_test;

    put "Difference:" dtval= dtval_test= diff= 18.16 _pow=;

end;

run;

It stops working (stops having a difference) at 10^-7, which would be tens of millions.  That's where you completely lose precision, ie, SAS won't even consider a difference of 1x10-7 to be a difference.

Before that, the difference seems good to about that spot: for 10^-1, diff is 0.0999999046 for example (this is for datetime when I ran this, actual difference may vary some).  Further on that is approximately consistent; around the 7th or 8th digit it diverges from where it should be.  The first six digits after the decimal are 'right', the seventh is often but not always right.

This is all contingent on using datetimes approximately like this year's datetimes.  If you are using time or datetime intervals, ie, values that are pretty small on the left side of the decimal, you can be much more precise.  Basically consider 15 significant digits on either side of the decimal and you're okay.  Nanoseconds in times should be fine for example (left has at most 5 digits, right can have 10 safely).   If you're solely looking at intervals, and there isn't any left side of the decimal, you can have up to 39 digits after the decimal in theory according to the datetime format (though I don't highly recommend doing that).

Reference: SAS(R) 9.2 Language Reference: Concepts, Second Edition

Frequent Contributor
Posts: 78

Re: Is it safe to sql join on datetimes?

Posted in reply to snoopy369

I used your technique for generating differences, and tested sql joins.  With datetimes I get exact matches up to 1e-6 (milliseconds); once the differences get smaller than that sql "mistakenly" joins values that it should not.  (That is, it agrees with your statement "SAS won't even consider a deff of 10-7 to be a difference".)

With times (no date part), I get accurate joining down to 1e-11.  To play it safe, I won't go beyond nonoseconds.

Trusted Advisor
Posts: 3,212

Re: Is it safe to sql join on datetimes?

Good question and a very good answer.

My question: you are sure you need a sql join/merge?

Perhaps an other coding is a better fit for your (not described) problem.

I am thinking on a sas data step merge approach using observations in an ordered way.

---->-- ja karman --<-----
Super Contributor
Posts: 253

Re: Is it safe to sql join on datetimes?

SQL Join allows you to do approximate merge (like PGStats shows).  SAS data step merge would require rounding or something else, if your datetimes are very precise (ie, have microseconds or beyond in them).

Trusted Advisor
Posts: 3,212

Re: Is it safe to sql join on datetimes?

Posted in reply to snoopy369

Snoopy. You do not need an exact merge by approach.

With an ordered set check you can use the same logic as in a dow loop is used.

3 gl thinking and forcing sql are different world's in algorithmes. Nosql also is a good one.

---->-- ja karman --<-----
Frequent Contributor
Posts: 78

Re: Is it safe to sql join on datetimes?

I try not to be a one-trick-pony, but I do like sql and make good use of it in SAS.  If is correct (and I plan to do some join tests) and sql joining will work with microsecond times, then I can stick to what I know well.  If not, then it will be time for a new tool in the toolbox.

Super Contributor
Posts: 253

Re: Is it safe to sql join on datetimes?

I think the 'how do you join' is pretty irrelevant, basically.  SQL join, datastep set/merge, whatever, you have to either:

* Round your data to microseconds or more precise

or

* Do your test on a range

All of this aside, if you're doing math on your datetimes, don't forget that you still can have slight differences unless you round immediately before you do the join (since 0.1 etc. are not perfectly representable).

Trusted Advisor
Posts: 3,212

Re: Is it safe to sql join on datetimes?

Posted in reply to snoopy369

Snoopy, The basics of SQL are limited to some algorithms with a coding approach.

Theses are failing in some areas or causing a lot of performance overhead. That is why other approaches exist of which some are hying (hadoop nosql) at the moment. Depending  on the problem not a one-trick-pony approach. Is it sufficient to solve with what you know well, do not bother. jdmarino argument holds. 

---->-- ja karman --<-----
Respected Advisor
Posts: 4,920

Re: Is it safe to sql join on datetimes?

Joining on floating point numbers is never a good idea. Joining on rounded floating point numbers will work better but is still imperfect. There are always pairs of numbers very close apart that will not round to the same value. That leaves you with join conditions like

on a.dt between b.dt-0.0001 and b.dt+0.0001

which may or may not be optimized in your particular situation (DBMS, indexes).

PG

PG
Frequent Contributor
Posts: 78

Re: Is it safe to sql join on datetimes?

I generally use

where abs(a.dt-b.dt) < 0.0001

Ask a Question
Discussion stats
  • 10 replies
  • 749 views
  • 3 likes
  • 4 in conversation