Re: Solve non-linear equation system

invalid · Posted 03-13-2019 09:23 AM

Hi!

I have a data set with a column f with many rows

I want to find a value b so that when taking sum over all i

c1 = SUM_i(MAX(b*f_i,c2))

c1 and c2 are constants.

Here is example data (in reality I have much more than 100 rows)

%let c1 = 40000;
%let c2 = 100;

data mydata;
call streaminit(123);       /* set random number seed */
do i = 1 to 100;
   f=rand("Lognormal");
   output;
end;
run;

By exporting mydata to excel and using goal seek i've found that b=245 solves my example (see attachment).

But I cant figure out how to find b using SAS. I think it should be doable using proc model?

I think i am on version 9.4

Rick_SAS · Posted 03-13-2019 01:49 PM

You want to find the root (zero) of the equation

F(b) = SUM_i(MAX(b*f_i,c2)) - c1

If you have SAS/IML, I recommend using the FROOT function to find the root.

proc iml;
use mydata;  read all var "f";  close;

/* find the zero of this function */
start Func(b) global(f);
   v = (b#f <> &c2);  /* v[i] = max(b*f[i], &c2) */
   return &c1 - sum(v);
finish;

b = froot("Func", {0 1000});  
print b;

If you don't have SAS/IML, there are other ways to solve for the root of a nonlinear equation. Which of the following do you have licensed?

SAS/ETS (PROC MODEL)

SAS/OR (PROC OPTMODEL)

SAS/STAT (PROC NLIN)

invalid · Posted 03-14-2019 05:22 AM

I have

SAS/ETS (PROC MODEL)
SAS/STAT (PROC NLIN)

The other two I don't have.

invalid · Posted 03-18-2019 01:11 PM

I've tried solving this using PROC MODEL. But I can't figure it out.

Any pointers?

FreelanceReinh · Posted 03-24-2019 03:54 PM

Hello @invalid,

Sorry for the late reply. I don't have a SAS/ETS license, so can't help you with PROC MODEL. But your problem can be solved with Base SAS alone, at least if c2>=0. Here is how:

data want(keep=b);
if &c2<0 then do;
  put "WAR" "NING: This algorithm doesn't handle the case c2<0!";
  stop;
end;
else if &c1<n*&c2 then do;
  put 'No solution exists because c1 < n*c2.';
  stop;
end;
if _n_=1 then do;
  if 0 then set mydata nobs=n;
  dcl hash h(dataset: 'mydata(keep=f)', multidata: 'y', ordered: 'y');
  h.definekey('f');
  h.definedone();
  dcl hiter hi('h');
  hi.first();
  minf=f;
  hi.last();
  maxf=f;
  rc=hi.next();
end;
if &c1=n*&c2 then do;
  if minf then lb=&c2/minf;
  if maxf then ub=&c2/maxf;
  if .z<maxf<0 | .z<minf<0=maxf then do;
    b=lb; output;
    put 'All b >= ' lb :best16. '(and only these) are solutions.';
  end;
  else if maxf=0 then do; /* In this case all f_i=0. */
    b=0; output;
    put 'All real numbers b are solutions.';
  end;
  else if maxf>0 then do;
    if minf>=0 then do;
      b=ub; output;
      put 'All b <= ' ub :best16. '(and only these) are solutions.';
    end;
    else if &c2>0 then do;
      b=lb; output; b=ub; output;
      put 'All b in the interval [' lb :best16. +(-1) ', ' ub :best16. +(-1) '] (and only these) are solutions.';
    end;
    else do; /* i.e. minf<&c2=0 */
      b=0; output;
      put 'b=0 is the only solution.';
    end;
  end;
end;
/* Main case: &c1>n*&c2 */
else if minf=maxf=0 then put 'No solution exists because c1 > n*c2 >= 0, but all f_i are zero.';
else do;
  if .z<minf<0<maxf then put 'There are exactly two solutions:';
  else put 'There is only one solution:';
  if .z<minf<0 then do;
    /* Determine a solution b<0 */
    hi.first();
    s=f;
    do m=1 to n;
      b=(&c1-(n-m)*&c2)/s;
      if hi.next()=0 & b*f>&c2 then s+f;
      else leave;
    end;
    put +3 b= best16.;
    output;
  end;
  if maxf>0 then do;
    /* Determine a solution b>0 */
    hi.last();
    s=f;
    do m=n to 1 by -1;
      b=(&c1-(m-1)*&c2)/s;
      if hi.prev()=0 & b*f>&c2 then s+f;
      else leave;
    end;
    put +3 b= best16.;
    output;
  end;
end;
stop;
run;

Again, the above solution requires c2>=0 -- I hope that this is consistent with your real c2 values -- and assumes that all f_i, i.e. the values of F in dataset MYDATA, are non-missing. (I haven't spent much time on the case c2<0, but I know that it is substantially different.) Run times were less than 3 seconds for n=10,000,000 observations on my machine.

The algorithm is based on a sorted sequence of the F values (sorted ascending or descending). To avoid a PROC SORT step I use a sorted hash table and a hash iterator, which is also convenient because the latter can traverse the table in both directions. But the algorithm could also be implemented without a hash object, especially if MYDATA was already sorted by F.

I've tested the program with integer values of c1 and c2. Please note that in case of non-integer values the tests of &c1<n*&c2 etc. could be affected by numerical accuracy issues. In this case appropriate rounding should be applied.

In some situations there are infinitely many solutions, e.g. an entire interval (which is then indicated in the log). In this case the output dataset WANT contains only the finite interval boundaries (or 0 if all real numbers are solutions).

Of course, it's easy to verify solutions in a simple data step and I would recommend doing so. (Note, for example, the potential risk of numerical accuracy issues in the tests of b*f>&c2.)

Feel free to ask if you have questions about the program.

Solve non-linear equation system