I want to compare two variables and return 'true' if they are equal within a specified tolerance, otherwsie 'false', and return 'false' if one of them is missing but 'true' if they are both missing. Is there an easy way to do this in a data step?
x | y | f(x,y) |
0.5000000 | 0.5000001 | TRUE |
0.5000000 | . | FALSE |
. | . | TRUE |
Use Round() :
data test;
input x y;
format x y best12.;
datalines;
0.5000000 0.5000001
0.5000000 .
. .
;
%let tolerance=1e-6;
data want;
set test;
f = round(x,&tolerance.) = round(y,&tolerance.);
run;
proc print data=want noobs; run;
Try this:
data have;
infile cards missover;
input x y;
format x 10.7 y 10.7;
comp_flag=ifc(round(x,0.000001)=round(y,0.000001),'Ture','Fail');
cards;
0.5000000 0.5000001
0.5000000 .
. .
;
proc print;
run;
I would advise you to stay away from the ROUND function on each of the values. It will not always give you the result you want. For example:
tolerance = 0.01;
x = 2.004;
y = 1.993;
If you round each variable separately, you'll get 2 - 1.99 which is TRUE. But the difference between them is 0.011 which is FALSE. Instead, take the absolute value of the difference and compare that to your tolerance level. It's a little clumsy, but you could try:
length result $ 5;
if x = y = . then result = 'TRUE';
else if x=. then result = 'FALSE';
else if y=. then result = 'FALSE';
else do;
if abs(x - y) <= tolerance then result='TRUE';
else result='FALSE';
end;
There may be slicker ways to code this, but stay away from ROUND.
@Astounding wrote:
I would advise you to stay away from the ROUND function on each of the values. It will not always give you the result you want. For example:
tolerance = 0.01;
x = 2.004;
y = 1.993;
If you round each variable separately, you'll get 2 - 1.99 which is TRUE. But the difference between them is 0.011 which is FALSE. Instead, take the absolute value of the difference and compare that to your tolerance level. It's a little clumsy, but you could try:
length result $ 5;
if x = y = . then result = 'TRUE';
else if x=. then result = 'FALSE';
else if y=. then result = 'FALSE';
else do;
if abs(x - y) <= tolerance then result='TRUE';
else result='FALSE';
end;
There may be slicker ways to code this, but stay away from ROUND.
Nice thing abou this is if you have a changing tolerance it is easier to use by supplying the tolerance value as part of the incoming data.
I might suggest a 1/0 coding instead of "True"/"False" as if you take the mean of the 1/0 then you get a percentage of in tolerance or sum a count of within tolerance not to mention use in some modeling approaches.
With either approach I might suggest that x=. or y=. is missing instead of coded as "False" as someone could mistake that with the result of an actual comparison between two values.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.