New User
Posts: 1

How to check variables for equality, within tolerance, niether missing

I want to compare two variables and return 'true' if they are equal within a specified tolerance, otherwsie 'false', and return 'false' if one of them is missing but 'true' if they are both missing.  Is there an easy way to do this in a data step?

 x y f(x,y) 0.5000000 0.5000001 TRUE 0.5000000 . FALSE . . TRUE
Posts: 5,535

Re: How to check variables for equality, within tolerance, niether missing

[ Edited ]

Use Round() :

``````
data test;
input x y;
format x y best12.;
datalines;
0.5000000	0.5000001
0.5000000	.
.	.
;

%let tolerance=1e-6;

data want;
set test;
f = round(x,&tolerance.) = round(y,&tolerance.);
run;

proc print data=want noobs; run;``````

PG
Super Contributor
Posts: 275

Re: How to check variables for equality, within tolerance, niether missing

Try this:

``````data have;
infile cards missover;
input x	y;
format x 10.7 y 10.7;
comp_flag=ifc(round(x,0.000001)=round(y,0.000001),'Ture','Fail');
cards;
0.5000000	0.5000001
0.5000000     .
.             .
;
proc print;
run;``````
Super User
Posts: 6,782

Re: How to check variables for equality, within tolerance, niether missing

I would advise you to stay away from the ROUND function on each of the values.  It will not always give you the result you want.  For example:

tolerance = 0.01;

x = 2.004;

y = 1.993;

If you round each variable separately, you'll get 2 - 1.99 which is TRUE.  But the difference between them is 0.011 which is FALSE.  Instead, take the absolute value of the difference and compare that to your tolerance level.  It's a little clumsy, but you could try:

length result \$ 5;

if x = y = . then result = 'TRUE';

else if x=. then result = 'FALSE';

else if y=. then result = 'FALSE';

else do;

if abs(x - y) <= tolerance then result='TRUE';

else result='FALSE';

end;

There may be slicker ways to code this, but stay away from ROUND.

Super User
Posts: 13,574

Re: How to check variables for equality, within tolerance, niether missing

Astounding wrote:

I would advise you to stay away from the ROUND function on each of the values.  It will not always give you the result you want.  For example:

tolerance = 0.01;

x = 2.004;

y = 1.993;

If you round each variable separately, you'll get 2 - 1.99 which is TRUE.  But the difference between them is 0.011 which is FALSE.  Instead, take the absolute value of the difference and compare that to your tolerance level.  It's a little clumsy, but you could try:

length result \$ 5;

if x = y = . then result = 'TRUE';

else if x=. then result = 'FALSE';

else if y=. then result = 'FALSE';

else do;

if abs(x - y) <= tolerance then result='TRUE';

else result='FALSE';

end;

There may be slicker ways to code this, but stay away from ROUND.

Nice thing abou this is if you have a changing tolerance it is easier to use by supplying the tolerance value as part of the incoming data.

I might suggest a 1/0 coding instead of "True"/"False" as if you take the mean of the 1/0 then you get a percentage of in tolerance or sum a count of within tolerance not to mention use in some modeling approaches.

With either approach I might suggest that x=. or y=. is missing instead of coded as "False" as someone could mistake that with the result of an actual comparison between two values.

Discussion stats
• 4 replies
• 357 views
• 1 like
• 5 in conversation