BookmarkSubscribeRSS Feed
invalid
Fluorite | Level 6

Hi!

 

I have a data set with a column f with many rows

 

 

I want to find a value so that when taking sum over all

 

c1 = SUM_i(MAX(b*f_i,c2))

 

c1 and c2 are constants.

 

Here is example data (in reality I have much more than 100 rows)

%let c1 = 40000;
%let c2 = 100;

data mydata;
call streaminit(123);       /* set random number seed */
do i = 1 to 100;
   f=rand("Lognormal");
   output;
end;
run;

By exporting mydata to excel and using goal seek i've found that b=245 solves my example (see attachment). 

 

But I cant figure out how to find b using SAS. I think it should be doable using proc model?

 

I think i am on version 9.4

4 REPLIES 4
Rick_SAS
SAS Super FREQ

You want to find the root (zero) of the equation

F(b) = SUM_i(MAX(b*f_i,c2)) - c1

If you have SAS/IML, I recommend using the FROOT function to find the root.

 

proc iml;
use mydata;  read all var "f";  close;

/* find the zero of this function */
start Func(b) global(f);
   v = (b#f <> &c2);  /* v[i] = max(b*f[i], &c2) */
   return &c1 - sum(v);
finish;

b = froot("Func", {0 1000});  
print b;

Region Capture.png

 

If you don't have SAS/IML, there are other ways to solve for the root of a nonlinear equation. Which of the following do you have licensed?

 

SAS/ETS (PROC MODEL)

SAS/OR (PROC OPTMODEL)

SAS/STAT (PROC NLIN)

invalid
Fluorite | Level 6

I have 

  • SAS/ETS (PROC MODEL)
  • SAS/STAT (PROC NLIN)

The other two I don't have.

invalid
Fluorite | Level 6
I've tried solving this using PROC MODEL. But I can't figure it out.

Any pointers?
FreelanceReinh
Jade | Level 19

Hello @invalid,

 

Sorry for the late reply. I don't have a SAS/ETS license, so can't help you with PROC MODEL. But your problem can be solved with Base SAS alone, at least if c2>=0. Here is how:

 

data want(keep=b);
if &c2<0 then do;
  put "WAR" "NING: This algorithm doesn't handle the case c2<0!";
  stop;
end;
else if &c1<n*&c2 then do;
  put 'No solution exists because c1 < n*c2.';
  stop;
end;
if _n_=1 then do;
  if 0 then set mydata nobs=n;
  dcl hash h(dataset: 'mydata(keep=f)', multidata: 'y', ordered: 'y');
  h.definekey('f');
  h.definedone();
  dcl hiter hi('h');
  hi.first();
  minf=f;
  hi.last();
  maxf=f;
  rc=hi.next();
end;
if &c1=n*&c2 then do;
  if minf then lb=&c2/minf;
  if maxf then ub=&c2/maxf;
  if .z<maxf<0 | .z<minf<0=maxf then do;
    b=lb; output;
    put 'All b >= ' lb :best16. '(and only these) are solutions.';
  end;
  else if maxf=0 then do; /* In this case all f_i=0. */
    b=0; output;
    put 'All real numbers b are solutions.';
  end;
  else if maxf>0 then do;
    if minf>=0 then do;
      b=ub; output;
      put 'All b <= ' ub :best16. '(and only these) are solutions.';
    end;
    else if &c2>0 then do;
      b=lb; output; b=ub; output;
      put 'All b in the interval [' lb :best16. +(-1) ', ' ub :best16. +(-1) '] (and only these) are solutions.';
    end;
    else do; /* i.e. minf<&c2=0 */
      b=0; output;
      put 'b=0 is the only solution.';
    end;
  end;
end;
/* Main case: &c1>n*&c2 */
else if minf=maxf=0 then put 'No solution exists because c1 > n*c2 >= 0, but all f_i are zero.';
else do;
  if .z<minf<0<maxf then put 'There are exactly two solutions:';
  else put 'There is only one solution:';
  if .z<minf<0 then do;
    /* Determine a solution b<0 */
    hi.first();
    s=f;
    do m=1 to n;
      b=(&c1-(n-m)*&c2)/s;
      if hi.next()=0 & b*f>&c2 then s+f;
      else leave;
    end;
    put +3 b= best16.;
    output;
  end;
  if maxf>0 then do;
    /* Determine a solution b>0 */
    hi.last();
    s=f;
    do m=n to 1 by -1;
      b=(&c1-(m-1)*&c2)/s;
      if hi.prev()=0 & b*f>&c2 then s+f;
      else leave;
    end;
    put +3 b= best16.;
    output;
  end;
end;
stop;
run;

Again, the above solution requires c2>=0 -- I hope that this is consistent with your real c2 values -- and assumes that all f_i, i.e. the values of F in dataset MYDATA, are non-missing. (I haven't spent much time on the case c2<0, but I know that it is substantially different.) Run times were less than 3 seconds for n=10,000,000 observations on my machine.

 

The algorithm is based on a sorted sequence of the F values (sorted ascending or descending). To avoid a PROC SORT step I use a sorted hash table and a hash iterator, which is also convenient because the latter can traverse the table in both directions. But the algorithm could also be implemented without a hash object, especially if MYDATA was already sorted by F.

 

I've tested the program with integer values of c1 and c2. Please note that in case of non-integer values the tests of &c1<n*&c2 etc. could be affected by numerical accuracy issues. In this case appropriate rounding should be applied.

 

In some situations there are infinitely many solutions, e.g. an entire interval (which is then indicated in the log). In this case the output dataset WANT contains only the finite interval boundaries (or 0 if all real numbers are solutions).

 

Of course, it's easy to verify solutions in a simple data step and I would recommend doing so. (Note, for example, the potential risk of numerical accuracy issues in the tests of b*f>&c2.)

 

Feel free to ask if you have questions about the program.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 769 views
  • 0 likes
  • 3 in conversation