BookmarkSubscribeRSS Feed
David_Whitaker
Calcite | Level 5

I want to compare two variables and return 'true' if they are equal within a specified tolerance, otherwsie 'false', and return 'false' if one of them is missing but 'true' if they are both missing.  Is there an easy way to do this in a data step?

 

xyf(x,y)
0.50000000.5000001TRUE
0.5000000.FALSE
..TRUE
4 REPLIES 4
PGStats
Opal | Level 21

Use Round() :

 


data test;
input x y;
format x y best12.;
datalines;
0.5000000	0.5000001
0.5000000	.
.	.	
;

%let tolerance=1e-6;

data want;
set test;
f = round(x,&tolerance.) = round(y,&tolerance.);
run;

proc print data=want noobs; run;

 

PG
slchen
Lapis Lazuli | Level 10

Try this:

data have;
infile cards missover;
input x	y;	
format x 10.7 y 10.7;
comp_flag=ifc(round(x,0.000001)=round(y,0.000001),'Ture','Fail');
cards;
0.5000000	0.5000001	
0.5000000     .
.             .
;
proc print;
run;
Astounding
PROC Star

I would advise you to stay away from the ROUND function on each of the values.  It will not always give you the result you want.  For example:

 

tolerance = 0.01;

x = 2.004;

y = 1.993;

 

If you round each variable separately, you'll get 2 - 1.99 which is TRUE.  But the difference between them is 0.011 which is FALSE.  Instead, take the absolute value of the difference and compare that to your tolerance level.  It's a little clumsy, but you could try:

 

length result $ 5;

if x = y = . then result = 'TRUE';

else if x=. then result = 'FALSE';

else if y=. then result = 'FALSE';

else do;

   if abs(x - y) <= tolerance then result='TRUE';

   else result='FALSE';

end;

 

There may be slicker ways to code this, but stay away from ROUND.

ballardw
Super User

@Astounding wrote:

I would advise you to stay away from the ROUND function on each of the values.  It will not always give you the result you want.  For example:

 

tolerance = 0.01;

x = 2.004;

y = 1.993;

 

If you round each variable separately, you'll get 2 - 1.99 which is TRUE.  But the difference between them is 0.011 which is FALSE.  Instead, take the absolute value of the difference and compare that to your tolerance level.  It's a little clumsy, but you could try:

 

length result $ 5;

if x = y = . then result = 'TRUE';

else if x=. then result = 'FALSE';

else if y=. then result = 'FALSE';

else do;

   if abs(x - y) <= tolerance then result='TRUE';

   else result='FALSE';

end;

 

There may be slicker ways to code this, but stay away from ROUND.


Nice thing abou this is if you have a changing tolerance it is easier to use by supplying the tolerance value as part of the incoming data.

 

I might suggest a 1/0 coding instead of "True"/"False" as if you take the mean of the 1/0 then you get a percentage of in tolerance or sum a count of within tolerance not to mention use in some modeling approaches.

With either approach I might suggest that x=. or y=. is missing instead of coded as "False" as someone could mistake that with the result of an actual comparison between two values.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 1126 views
  • 1 like
  • 5 in conversation