BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
phil_de_choc
Calcite | Level 5

Hello,

When I do two times the same left join including a sum, I get slightly different output datasets.  Differences of smnt of max E-8 for numbers in thousands.

 

proc sql;
    create table tab3_time1 as 
    select
        t1.num_dman_finm,
        sum(t2.mnt) as smnt
    from local.tab1 as t1
    left join local.tab2 as t2
    on t1.num_dman_finm = t2.num_dman_finm
        and t1.num_pers_dfe = t2.num_pers_dfe
    group by t1.num_dman_finm
    order by t1.num_dman_finm
    ;
quit;
proc sql;
    create table tab3_time2 as 
    select
        t1.num_dman_finm,
        sum(t2.mnt) as smnt
    from local.tab1 as t1
    left join local.tab2 as t2
    on t1.num_dman_finm = t2.num_dman_finm
        and t1.num_pers_dfe = t2.num_pers_dfe
    group by t1.num_dman_finm
    order by t1.num_dman_finm
    ;
quit;
proc compare data=tab3_time1 compare=tab3_time2;
run;

The above proc sql are identical, except for the output name.

tab1 contains only num_dman_finm and num_pers_dfe.  Without duplicates.

tab2 contains num_dman_finmnum_pers_dfe and mnt.  With key duplicates.

 num_dman_finm and num_pers_dfe are integer keys.  mnt are moneys amounts, with 2 decimals.  

 

 

Shouldn't the same request produce the same output?

There are ~5000 differences in the output datasets of ~900 000.

Working on version 9.04.

1 ACCEPTED SOLUTION

Accepted Solutions
ChrisNZ
Tourmaline | Level 20

Unlike a data step, SQL does not have to process data in order.; it can also split a query and join the results at the end. Because the results may not be joined in the same order, the calculation (the sum here) performed is slightly different. And because computers have issues with decimal numerical precision, the order matters.

I'd expect the difference to be lower though, around 1e-15 rather than 1e-8. That's many orders of magnitude higher, and that's where my explanation falls shorter than I'd like. 

View solution in original post

3 REPLIES 3
PGStats
Opal | Level 21

This is odd. Are you saying that the results vary every time you run this query? In a random or systematic way?

PG
ChrisNZ
Tourmaline | Level 20

Unlike a data step, SQL does not have to process data in order.; it can also split a query and join the results at the end. Because the results may not be joined in the same order, the calculation (the sum here) performed is slightly different. And because computers have issues with decimal numerical precision, the order matters.

I'd expect the difference to be lower though, around 1e-15 rather than 1e-8. That's many orders of magnitude higher, and that's where my explanation falls shorter than I'd like. 

phil_de_choc
Calcite | Level 5

1e-8 is the maximum reported by proc compare.  There were many differences around 1e-14.  And there were many amounts in the 100k or greater, not just 1k.  I'll verify at work tomorrow that the greatest differences are for the greatest amounts, and if so, mark your response as accepted answer.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 1257 views
  • 1 like
  • 3 in conversation