I happened to find the following issue which confused me for several hours.
data test;
input RandNo$ Trt$ Tmax;
cards;
K64 R 0.5
K64 T 0.15
K64 R 0.15
K64 T 0.5
K65 T 0.5
K65 R 0.33
K65 T 0.17
K65 R 0.5
;
run;
proc sql noprint;
create table SQL as
select RandNo, TRT, avg(Tmax) as Tmax_Mean
from test
group by RandNo, TRT
;
quit;
ods output Summary = Means;
proc means data = test n mean;
class RandNo TRT;
var Tmax;
run;
ods output;
proc sql;
select a.RandNo, a.TRT, a.Tmax_Mean as SQL,
b.Tmax_Mean as Means,
SQL - Means as Dif
from SQL as a
left join Means as b
on a.RandNo = b.RandNo and a.TRT = b.TRT
;
quit;
Output:
RandNo Trt SQL Mean Dif K64 R 0.325 0.325 0 K64 T 0.325 0.325 -555E-19 K65 R 0.415 0.415 0 K65 T 0.335 0.335 -555E-19
So why dose the results from proc means and proc sql differs from the other?
PS: I have tried deleting the observations of 'K64' or 'K65' and the difference just disappear this time.
@whymath wrote:
Oh, that is very nice. I used to ods output statement because I can not remember how to write different output statement in different procedure. Do they differs in performance?
I would be very surprised if the direct output would not outperform the ODS method.
One of the examples for Maxim 14 I will bring up in my upcoming presentation at SAS GF 2019 will be an example where using ODS CSV instead of a proc export to csv took about 100 times longer.
And your other question: yes, slight changes in your logic will change the order of calculations; since some of the intermediate results will be afflicted by precision loss, and others not, you will get precision differences in the end result. That's why it is always sound technique to use round() before doing numerical comparisons in SAS where you test for equality.
This is true for all software that uses "real" (mantissa an exponent) storage. Integers don't have that problem, but you can't handle really big numbers with them.
SAS stores numbers in 8 bytes of memory which means it can hold a maximum of 15 digits accurately. The difference you are reporting is in the 16th digit and beyond, so it is meaningless. Apply the ROUND function if you want to get rid of the meaningless noise of tiny differences:
round(SQL - Means, 0.000001)
Thx, but I really want to know why.
Note that the difference is very small. It is an artifact caused by how numbers are stored (8-byte real), the resulting limits in precision, and the fact that some (in fact quite a lot) of decimal fractions can't be represented exactly in binary at all.
So a slightly different order of calculations can cause this slight imprecision in the result values.
Thank you, kind man.
What I really want is comment on the difference between proc sql and proc means, do you think they process the numbers differently? And why this difference disappear when there is only one possible value of 'RandNo'?
PS you don't have to go through ODS to create an output dataset from proc summary/means:
proc means data=test nway noprint;
class RandNo TRT;
var Tmax;
output
out=means (drop=_type_ _freq_)
mean(tmax)=tmax_mean
;
run;
@whymath wrote:
Oh, that is very nice. I used to ods output statement because I can not remember how to write different output statement in different procedure. Do they differs in performance?
I would be very surprised if the direct output would not outperform the ODS method.
One of the examples for Maxim 14 I will bring up in my upcoming presentation at SAS GF 2019 will be an example where using ODS CSV instead of a proc export to csv took about 100 times longer.
And your other question: yes, slight changes in your logic will change the order of calculations; since some of the intermediate results will be afflicted by precision loss, and others not, you will get precision differences in the end result. That's why it is always sound technique to use round() before doing numerical comparisons in SAS where you test for equality.
This is true for all software that uses "real" (mantissa an exponent) storage. Integers don't have that problem, but you can't handle really big numbers with them.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.