Solved: Re: Unexpected rounding using randomly generated values in DS2 method ...

CurtisMackWSIPP · Posted 02-17-2021 12:21 PM

Hi all,

Right now I think this could be a bug. However, it could be me not understanding type conversion in the FedSQL system or limitation of the RAND function. Please consider this code. I have condensed this down from something more complicated, so there is no obvious use case for this example:

proc ds2;
   package mytools / overwrite =yes;

    method mymethod(Double Value,Double rand) returns double;
      declare double returnVal;
      returnVal = Value;
      if (rand > .5) then returnVal = 0;
      return (returnVal);
    end;
  run;
run;
quit;

data have(drop=y);
  do a = 1 to 50; 
    r = rand('UNIFORM'); 
    x = (a / 5000);
    output; 
  end;
  x = 0;
  do y = 1 to 5; 
    r = rand('UNIFORM'); 
    output; 
  end;
run;

proc fedsql;
  drop table want_rand force;
  drop table want_fixed force;

  create table want_rand as
  select * from
  (select x,r,mytools.mymethod(x,rand('UNIFORM')) as t
     from have) as w
   where t > 0;

  create table want_fixed as
  select * from
  (select x,r,mytools.mymethod(x,r) as t
     from have) as w
   where t > 0;
quit;

data want_rand;
  set want_rand;
  format t 10.7;
run;

What I can't explain is the rows where t = 0 in the want_rand table. The query is supposed to only select values where it is > 0. The want_fixed table does not have these t = 0 rows even though the only difference between these queries is passing the value of the variable "r" which is predefined as opposed to passing a randomly generated number directly. The method should be setting about 50% of the values to 0. Both of the queries returns about the same amount of rows, even though the want_rand table has several 0 values for t. I interpret this to mean that the query is subsetting correctly, but the conversion from FedSql output to a SAS dataset is rounding them to 0 after the fact. However there seems to be no correlation to the size of x and those with t=0, so rounding due to type conversion doesn't explain it very well. Notice that the method does not use the value of "rand" to do any calculation, it is only used as a condition. So how would the different values passed change the datatype or formatting of the output variable at all? The final data set id just to assign a more detailed numeric format to "t" and rule out this being a display format issue.

I tried moving the call to RAND inside the DS2 method and got the same result as when passing it as a parameter. So could this have something to do with calling RAND in the FedSQL environment? The same exact logic inside a FCMP function called in PROC SQL works as expected.

So, like I said, I'm about to submit this to tech support. I am hoping one of you will educate me as to why this is expected behavior.

Thanks!

CurtisMackWSIPP · Posted 03-08-2021 02:26 PM

Tech support has found that this bug is limited in scope and there is a work around (generating the random value outside of the FedSQL query). They have therefore decided "the fix to be included in a future release cycle".

View solution in original post

ballardw · Posted 02-17-2021 12:43 PM

Try using a format that shows more decimals.

When you use a format that specifies the number of decimals such as

 format t 10.7;

then any value that has zeroes to 7 places gets rounded and if the 8th decimal place is not 5 or greater you see 0 as a result.

Run this an look in the log for the result. Different format, different appearance.

data example;
    x= 0.000000009;
    put x=10.7;
    put x=10.8;
    put x=11.9;
    put x=best16.;
run;

Moral of the story: if you a concerned about a specific level of precision then either control the values with explicit rounding or use a format long enough to display the digits of interest. Also keep in mind the precision stored in the variables. As soon as you move to the data step you are back to 8 bit precision.

CurtisMackWSIPP · Posted 02-17-2021 01:32 PM

Where would I put that format in FedSQL?

CurtisMackWSIPP · Posted 02-17-2021 01:35 PM

And the method works fine if a value is passed from a variable instead of directly from the RAND function. If this were a truncation issue, it would be the opposite. If I put the RAND function directly into the method the same issue arises.

CurtisMackWSIPP · Posted 02-17-2021 01:49 PM

To further rule out this being a formatting, rounding, or type conversion issue, consider the following code where the value is converted to a string using the 10.7 format before it is returned. The query still returns rows where the value of "t" = '0' even though the query says "where t != '0'" when the value of RAND is passes but not when the fixed value is passed.

proc ds2;
   package mytools / overwrite =yes;

    method mymethod(Double Value,Double rnd) returns char(32);
      declare double returnVal r z;
      declare char(32) returnStr;
      returnVal = Value;
      if (rnd > .5) then do;
        returnVal = 0;
      end;
      returnStr = put(returnVal,10.7);
      return (returnVal);
    end;
  run;
run;
quit;

data have(drop=y);
  do a = 1 to 50; 
    r = rand('UNIFORM'); 
    x = (a / 5000);
    output; 
  end;
  x = 0;
  do y = 1 to 5; 
    r = rand('UNIFORM'); 
    output; 
  end;
run;

proc fedsql;
  drop table want_rand force;
  drop table want_fixed force;

  create table want_rand as
  select * from
  (select x,r,mytools.mymethod(x,rand('UNIFORM')) as t
     from have) as w
   where t != '0';

  create table want_fixed as
  select * from
  (select x,r,mytools.mymethod(x,r) as t
     from have) as w
   where t != '0';
quit;

Tom · Posted 02-17-2021 02:43 PM

Since your function is not returning the string variable its value will just disappear when the function call ends.

      returnStr = put(returnVal,10.7);
      return (returnStr);

CurtisMackWSIPP · Posted 02-17-2021 04:44 PM

Good catch, I should have seen that. It does not address my original problem however. Here is a new version of the code that uses whole numbers so formatting and rounding are not as likely to be the culprits. The WANT_RAND table still has the unexplained values of 0.

proc ds2;
   package mytools / overwrite =yes;

    method mymethod(Double Value,Double rand) returns double;
      declare double returnVal;
      returnVal = Value;
      if (rand > .5) then returnVal = 0;
      return (returnVal);
    end;
  run;
run;
quit;

data have(drop=y);
  do a = 1 to 50; 
    r = rand('UNIFORM'); 
    x = a ;
    output; 
  end;
  x = 0;
  do y = 1 to 5; 
    r = rand('UNIFORM'); 
    output; 
  end;
run;

proc fedsql;
  drop table want_rand force;
  drop table want_fixed force;

  create table want_rand as
  select * from
  (select x,r,mytools.mymethod(x,rand('UNIFORM')) as t
     from have) as w
   where t > 0;

  create table want_fixed as
  select * from
  (select x,r,mytools.mymethod(x,r) as t
     from have) as w
   where t > 0;
quit;

CurtisMackWSIPP · Posted 02-22-2021 02:08 PM

I'm following up on my own thread for documentation purposes.

I submited this to SAS tech support. After some back and forth, they have come to the conclusion that this is likely a bug.

This is very concerning given that no errors are generated and the results are only partially incorrect. I could have easily missed this behavior if I hadn’t been subsetting for rows where the value was > 0. If I hadn’t been, I may not have noticed that too many values were being set to 0.

In testing this they told me about the importance of including the IPTRACE option on any FedSQL calls. This function gives more information on how the query performs. I will report errors that aren't being propagated to the log. When using this option, any call to a DS2 method from a FedSQL query returns the error "ERROR: [42000] Syntax error or access violation". This error happened even in the very simple method below.

proc ds2;
   package mytools / overwrite =yes;
    method mymethod(Double Value) returns double;
      return (Value);
    end;
  run;
run;
quit;

CurtisMackWSIPP · Posted 03-02-2021 09:49 AM

An update from tech support.

... there is a bug how PROC FedSQL is handling the DS2 code which is causing you to see output that is incorrect (T=0). R&D is looking into it ...

CurtisMackWSIPP · Posted 03-08-2021 02:26 PM

Tech support has found that this bug is limited in scope and there is a work around (generating the random value outside of the FedSQL query). They have therefore decided "the fix to be included in a future release cycle".

Unexpected rounding using randomly generated values in DS2 method called from FedSQL

Re: Unexpected rounding using randomly generated values in DS2 method called from FedSQL

Re: Unexpected rounding using randomly generated values in DS2 method called from FedSQL

Re: Unexpected rounding using randomly generated values in DS2 method called from FedSQL

Re: Unexpected rounding using randomly generated values in DS2 method called from FedSQL

Re: Unexpected rounding using randomly generated values in DS2 method called from FedSQL

Re: Unexpected rounding using randomly generated values in DS2 method called from FedSQL

Re: Unexpected rounding using randomly generated values in DS2 method called from FedSQL

Re: Unexpected rounding using randomly generated values in DS2 method called from FedSQL

Re: Unexpected rounding using randomly generated values in DS2 method called from FedSQL

Re: Unexpected rounding using randomly generated values in DS2 method called from FedSQL

Classroom Training Available!