BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
whymath
Lapis Lazuli | Level 10

I happened to  find the following issue which confused me for several hours.

 

data test;
	input RandNo$ Trt$ Tmax;
	cards;
	K64	R	0.5
	K64	T	0.15
	K64	R	0.15
	K64	T	0.5
	K65	T	0.5
	K65	R	0.33
	K65	T	0.17
	K65	R	0.5
;
run;

proc sql noprint;
	create table SQL as 
	select RandNo, TRT, avg(Tmax) as Tmax_Mean
	from test
	group by RandNo, TRT
	;
quit;

ods output Summary = Means;
proc means data = test n mean;
	class RandNo TRT;
	var Tmax;
run;
ods output;

proc sql;
	select a.RandNo, a.TRT, a.Tmax_Mean as SQL,
		b.Tmax_Mean as Means,
		SQL - Means as Dif
	from SQL as a
	left join Means as b
	on a.RandNo = b.RandNo and a.TRT = b.TRT
	;
quit;

Output:

RandNo Trt SQL Mean Dif 
K64 R 0.325 0.325 0 
K64 T 0.325 0.325 -555E-19 
K65 R 0.415 0.415 0 
K65 T 0.335 0.335 -555E-19 

 

So why dose the results from proc means and proc sql differs from the other?

PS: I have tried deleting the observations of 'K64' or 'K65' and the difference just disappear this time. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

@whymath wrote:
Oh, that is very nice. I used to ods output statement because I can not remember how to write different output statement in different procedure. Do they differs in performance?

I would be very surprised if the direct output would not outperform the ODS method.

 

One of the examples for Maxim 14 I will bring up in my upcoming presentation at SAS GF 2019 will be an example where using ODS CSV instead of a proc export to csv took about 100 times longer.

 

And your other question: yes, slight changes in your logic will change the order of calculations; since some of the intermediate results will be afflicted by precision loss, and others not, you will get precision differences in the end result. That's why it is always sound technique to use round() before doing numerical comparisons in SAS where you test for equality.

 

This is true for all software that uses "real" (mantissa an exponent) storage. Integers don't have that problem, but you can't handle really big numbers with them.

View solution in original post

9 REPLIES 9
SASKiwi
PROC Star

SAS stores numbers in 8 bytes of memory which means it can hold a maximum of 15 digits accurately. The difference you are reporting is in the 16th digit and beyond, so it is meaningless. Apply the ROUND function if you want to get rid of the meaningless noise of tiny differences:

round(SQL - Means, 0.000001)

 

whymath
Lapis Lazuli | Level 10

Thx, but I really want to know why.

Kurt_Bremser
Super User

Note that the difference is very small. It is an artifact caused by how numbers are stored (8-byte real), the resulting limits in precision, and the fact that some (in fact quite a lot) of decimal fractions can't be represented exactly in binary at all.

So a slightly different order of calculations can cause this slight imprecision in the result values.

whymath
Lapis Lazuli | Level 10

Thank you, kind man.
What I really want is comment on the difference between proc sql and proc means, do you think they process the numbers differently? And why this difference disappear when there is only one possible value of 'RandNo'?

Reeza
Super User
In general, this isn't really a SAS problem, it's a computer problem and Excel, Python, R all have similar issues.

Here's the full documentation and explanation of why this occurs.

https://documentation.sas.com/?docsetId=lrcon&docsetTarget=p0ji1unv6thm0dn1gp4t01a1u0g6.htm&docsetVe...
Kurt_Bremser
Super User

PS you don't have to go through ODS to create an output dataset from proc summary/means:

proc means data=test nway noprint;
class RandNo TRT;
var Tmax;
output
  out=means (drop=_type_ _freq_)
  mean(tmax)=tmax_mean
;
run;
whymath
Lapis Lazuli | Level 10
Oh, that is very nice. I used to ods output statement because I can not remember how to write different output statement in different procedure. Do they differs in performance?
Kurt_Bremser
Super User

@whymath wrote:
Oh, that is very nice. I used to ods output statement because I can not remember how to write different output statement in different procedure. Do they differs in performance?

I would be very surprised if the direct output would not outperform the ODS method.

 

One of the examples for Maxim 14 I will bring up in my upcoming presentation at SAS GF 2019 will be an example where using ODS CSV instead of a proc export to csv took about 100 times longer.

 

And your other question: yes, slight changes in your logic will change the order of calculations; since some of the intermediate results will be afflicted by precision loss, and others not, you will get precision differences in the end result. That's why it is always sound technique to use round() before doing numerical comparisons in SAS where you test for equality.

 

This is true for all software that uses "real" (mantissa an exponent) storage. Integers don't have that problem, but you can't handle really big numbers with them.

whymath
Lapis Lazuli | Level 10
Really detailed explanation.
I will follow you on GF.

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 2635 views
  • 0 likes
  • 4 in conversation