BookmarkSubscribeRSS Feed
David_Whitaker
Calcite | Level 5

I want to compare two variables and return 'true' if they are equal within a specified tolerance, otherwsie 'false', and return 'false' if one of them is missing but 'true' if they are both missing.  Is there an easy way to do this in a data step?

 

xyf(x,y)
0.50000000.5000001TRUE
0.5000000.FALSE
..TRUE
4 REPLIES 4
PGStats
Opal | Level 21

Use Round() :

 


data test;
input x y;
format x y best12.;
datalines;
0.5000000	0.5000001
0.5000000	.
.	.	
;

%let tolerance=1e-6;

data want;
set test;
f = round(x,&tolerance.) = round(y,&tolerance.);
run;

proc print data=want noobs; run;

 

PG
slchen
Lapis Lazuli | Level 10

Try this:

data have;
infile cards missover;
input x	y;	
format x 10.7 y 10.7;
comp_flag=ifc(round(x,0.000001)=round(y,0.000001),'Ture','Fail');
cards;
0.5000000	0.5000001	
0.5000000     .
.             .
;
proc print;
run;
Astounding
PROC Star

I would advise you to stay away from the ROUND function on each of the values.  It will not always give you the result you want.  For example:

 

tolerance = 0.01;

x = 2.004;

y = 1.993;

 

If you round each variable separately, you'll get 2 - 1.99 which is TRUE.  But the difference between them is 0.011 which is FALSE.  Instead, take the absolute value of the difference and compare that to your tolerance level.  It's a little clumsy, but you could try:

 

length result $ 5;

if x = y = . then result = 'TRUE';

else if x=. then result = 'FALSE';

else if y=. then result = 'FALSE';

else do;

   if abs(x - y) <= tolerance then result='TRUE';

   else result='FALSE';

end;

 

There may be slicker ways to code this, but stay away from ROUND.

ballardw
Super User

@Astounding wrote:

I would advise you to stay away from the ROUND function on each of the values.  It will not always give you the result you want.  For example:

 

tolerance = 0.01;

x = 2.004;

y = 1.993;

 

If you round each variable separately, you'll get 2 - 1.99 which is TRUE.  But the difference between them is 0.011 which is FALSE.  Instead, take the absolute value of the difference and compare that to your tolerance level.  It's a little clumsy, but you could try:

 

length result $ 5;

if x = y = . then result = 'TRUE';

else if x=. then result = 'FALSE';

else if y=. then result = 'FALSE';

else do;

   if abs(x - y) <= tolerance then result='TRUE';

   else result='FALSE';

end;

 

There may be slicker ways to code this, but stay away from ROUND.


Nice thing abou this is if you have a changing tolerance it is easier to use by supplying the tolerance value as part of the incoming data.

 

I might suggest a 1/0 coding instead of "True"/"False" as if you take the mean of the 1/0 then you get a percentage of in tolerance or sum a count of within tolerance not to mention use in some modeling approaches.

With either approach I might suggest that x=. or y=. is missing instead of coded as "False" as someone could mistake that with the result of an actual comparison between two values.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2019 views
  • 1 like
  • 5 in conversation