Solved: Re: Bizarre Issue with subsetting where var= macro var

sm4 · Posted 08-28-2018 10:43 AM

Hi,

I am trying to set a dataset only where the value of a particular column (let's call it col1) is equal to the value of a macro var (let's call it test1) . This works perfectly until, bizarrely, I multiply col1 by 100 to form col2 so that now I am setting only where col2= &test1.. With this change, it only works for certain rows, not others.

I know this sounds really confusing, here's the full code:

*create test dataset;
proc sql;                  
          create table test                 
                   (hello char(40), col1 num);             

          insert into test                  
                   values('a',0.059)                
                   values('b',0.064)                
                   values('c',0.044)                
                   values('d',0.082)
                   values('e',0.061)                
                   values('f',0.082)                
                   values('g',0.066)                
                   values('h',0.042)
                   values('i',0.055)
;                 
quit;             

*multiply by 100, create a new column:;
data test0 (drop = col1);
         set test;
         col2 = col1*100;
run;

*create macro vars out of all values of col2 above;
%let test1 = 5.9;*;
%let test2 = 6.4;
%let test3 = 4.4;*;
%let test4 = 8.2;*;
%let test5 = 6.1;
%let test6 = 6.6;*;
%let test7 = 4.2;
%let test8 = 5.5;

*try to subset to each of these;
data test1;
         set test0;
         where col2 = &test1.;
run;
proc print;
run;

data test2;
         set test0;
         where col2 = &test2.;
run;
proc print;
run;

data test3;
         set test0;
         where col2 = &test3.;
run;
proc print;
run;

data test4;
         set test0;
         where col2 = &test4.;
run;
proc print;
run;

data test5;
         set test0;
         where col2 = &test5.;
run;
proc print;
run;

data test6;
         set test0;
         where col2 = &test6.;
run;
proc print;
run;

data test7;
         set test0;
         where col2 = &test7.;
run;
proc print;
run;

data test8;
         set test0;
         where col2 = &test8.;
run;
proc print;
run;

When I run it (I also had a friend run it and she got the same issue), I only get 4 datasets w/ observations, those whose macro var does not have a commented star (to the right of the let statement). Here's the log statement I get for one of the ones that did not print:

NOTE: There were 0 observations read from the data set WORK.TEST0.
WHERE col2=5.9;
NOTE: The data set WORK.TEST1 has 0 observations and 2 variables.

I originally had tried something like:

data test1;
 set test;
 where col1 = &test1./100;
run;
proc print;
run;

Here's the log statement I get for that:

NOTE: There were 0 observations read from the data set WORK.TEST.
WHERE col1=0.059;
NOTE: The data set WORK.TEST1 has 0 observations and 2 variables.

So then I tried switching to this col2 thing.

I must be missing something obvious, but I'm at my wit's end - would really appreciate if someone could point it out.

Thanks very much,

Reeza · Posted 08-28-2018 10:51 AM

It's a numerical precision issue.

data test1;
         set test0;
         where round(col2, 0.1) = round(&test1., 0.1);
run;

View solution in original post

Reeza · Posted 08-28-2018 10:51 AM

It's a numerical precision issue.

data test1;
         set test0;
         where round(col2, 0.1) = round(&test1., 0.1);
run;

sm4 · Posted 08-28-2018 11:22 AM

Thanks very much!!!!! So is the general rule that I need to explicitly round macro vars when comparing them to values in a dataset?

FreelanceReinh · Posted 08-28-2018 03:57 PM

@sm4 wrote:

So is the general rule that I need to explicitly round macro vars when comparing them to values in a dataset?

No, it has nothing to do with macro variables. On the contrary, macro variables contain text. Hence, they are not affected by numeric representation issues. (However, when they resolve to numeric values, such as 5.9, these values are, of course, subject to the same rules as any other numeric literal.)

The "general rule" is: Whenever SAS performs a calculation involving at least one non-integer number (such as 5.9), there is a substantial risk that, due to rounding error, the result differs slightly (by a very small number, e.g. 1E-15) from what it should be mathematically. If you apply the ROUND function to such a result (with a suitable rounding unit as the second argument), you'll adjust it to a multiple of the rounding unit. Thus, if the original result differs slightly from this value just due to rounding error, you will now obtain the correct result.

The tiny differences mentioned above are harmful in all sorts of comparisons. This includes explicit comparisons using a comparison operator (=, >, <=, ~=, etc.) like in your example, but also comparisons made internally, e.g., by MERGE statements, PROC SORT, PROC COMPARE or when a numeric format is applied.

So, you could have avoided all the issues if you had rounded the result of your calculation immediately, e.g.:

col2 = round(col1*100, 1e-9);

Otherwise, you'd have to apply the ROUND function to col2 in all comparisons (see @Reeza's suggestion -- where it was actually redundant to round the macro variable value &test1, see first paragraph).

The rounding unit has to be chosen small enough not to influence results, but big enough to correct rounding errors. In your example, where you know in advance that the exact result will have (at most) one decimal, 0.1 is a suitable rounding unit, but many smaller values, e.g. 0.01=1e-2 or 0.001=1e-3 would be suitable as well. I chose 1e-9 in my suggestion because you could routinely use the same value if your exact result had two, three, ..., eight or even nine decimals. And it's still much greater than typical rounding errors incurred due to numerical accuracy issues (unless your results have very large absolute values, e.g. > 1 million).

To see why some but not all of your comparisons failed, look at the results of the following data step (which shows the internal binary representation of the respective numbers in hexadecimal format) in the log window:

data _null_;
input x;
a=x*100;
b=round(a, 0.1);
if a=b then put 'No problem with ' x= +(-1) '. a and b have the same internal representation:';
else put x= 'is problematic. a and b look like ' b +(-1) ', but their internal representations differ:';
put (a b) (=hex16. /) /;
cards;
0.059
0.064
0.044
0.082
0.061
0.066
0.042
0.055
;

Here are some more seemingly "bizarre" examples:

data _null_;
if 0.1+0.7~=0.8   then put 'This';
if 1.2*1.5<1.8    then put 'looks';
if 9.9/3>3.3      then put 'quite';
if 1e-5 ne 1.0e-5 then put 'bizarre!';
run;

data _null_;
if round(0.1+0.7, 1e-9)~=0.8   then put 'This';
if round(1.2*1.5, 1e-9)<1.8    then put 'will';
if round(9.9/3, 1e-9)>3.3      then put 'not';
if 1e-5 ne round(1.0e-5, 1e-9) then put 'appear!';
run;

The first of the above two data steps is likely to execute some if not all four PUT statements (this is platform dependent) because of numeric representation issues. The second step avoids these issues by using the ROUND function. None of its PUT statements will be executed. The example involving 1.0e-5 is one of the rare cases where these issues occur even without performing a calculation.

Please see the ROUND function documentation for further information. Still more details can be found in Numerical Accuracy in SAS Software.

sm4 · Posted 08-29-2018 12:14 PM

Thanks very much for the detailed answer!!

It's something I will be sure to keep in mind from now on.

Bizarre Issue with subsetting where var= macro var

Re: Bizarre Issue with subsetting where var= macro var

Re: Bizarre Issue with subsetting where var= macro var

Re: Bizarre Issue with subsetting where var= macro var

Re: Bizarre Issue with subsetting where var= macro var

Re: Bizarre Issue with subsetting where var= macro var

Click image to register for webinar

Classroom Training Available!