Solved: adjust variables to match totals

ciro · Posted 06-06-2022 06:09 AM

Hi,

I would like to adjust the two variables y1 and y2 of dataset have_det so that the total over idgroup is equal to the values in have tot.

I would like to do it with proc optmodel.

the constrains are:
y1sol >=0 ,y2sol >=0
sum(i in idgroup) y1sol=y1tot
sum(i in idgroup) y2sol=y2tot

and the objective is
min sum(i in idgroup) (y1-y1sol)*2
min sum(i in idgroup) (y2-y2sol)*2

basically they are two independent problems: one for y1 and one for y2.

I would appreciate very much for the code.

I would also appreciate any suggestions on the objective function so that the solution is one more similar to the one obtained by scaling the variable(s) to the total(s), that is a raking procedure.

thank you in advance
C

RobPratt · Posted 06-08-2022 05:39 PM

I think the following does what you want, with a separate QP solve for each of the two problems.

proc optmodel;
   set PROBS = 1..2;

   set IDS;
   num y {PROBS, IDS};
   str idgroup {IDS};
   read data have_det into IDS=[id] {p in PROBS} <y[p,id]=col('y'||p)> idgroup;

   set <str> GROUPS;
   num ytot {PROBS, GROUPS};
   read data have_tot into GROUPS=[g=idgroup] {p in PROBS} <ytot[p,g]=col('y'||p||'tot')>;

   num pThis;

   /* declare optimization model for one problem at a time */
   var yvar {IDS} >= 0;

   con SumPerGroup {g in GROUPS}:
      sum {i in IDS: idgroup[i] = g} yvar[i] = ytot[pThis,g];

   min Objective = sum {i in IDS} (y[pThis,i] - yvar[i])^2;

   num ysol {PROBS, IDS};

   /* call solver once for each problem */
   do pThis = PROBS;
      put pThis=;
      solve;
      for {i in IDS} ysol[pThis,i] = yvar[i];
   end;

   /* create SAS output data set */
   create data want from [id]
      {p in PROBS} <col('y'||p)=y[p,id]>
      idgroup
      {p in PROBS} <col('y'||p||'sol')=ysol[p,id]>;
quit;

View solution in original post

ciro · Posted 06-08-2022 06:00 AM

Hi,
please, is there anyone that can help on this?
thank you again for all the support

Patrick · Posted 06-08-2022 07:04 AM

I don't have the subject matter expertise to address your question but I can tell from reading what you provide that your question is certainly "underspecified".

If you're asking for support then you need to help your helpers by trying to formulate as clearly as possible what you have and what you need. It's normally also really appreciated if you already share some of your code (whether already fully working or not) as this demonstrates a) the level of your SAS skills so people can gear their answers toward it and b) that you've already spent time solving the problem and are not just fishing for free work from others.

And last but not least: Provide the fully working SAS code that creates the sample "Have" data, provide the desired result and how you want to get there. This again is "helping your helpers" so they can spend the time on the bits you actually need help with.

ciro · Posted 06-08-2022 09:33 AM

Thanks Patrick for the answer and the time spent in addressing the problems you saw in my message.
I did not share the code with my attempts because at the moment is a total mess. I tried to adapt something I found online, but there is more than one thing that I do not understand well.
As for the Want dataset I did not post it because I could not solve the problem to get there. As for the code to get the Have datasets it does not seem relevant to me since they are just subsets of bigger data I am working on.
I am just asking for help in this forum. I understand that everybody is very busy and not receiving an answer is a possible outcome for a question. However if you or anybody else felt I am "just fishing for free work" I apologize for this,
I take the chance to thank you and all the people that take the time to help me and anybody else that asks questions.

Tom · Posted 06-08-2022 09:37 AM

Sample data does not need to be the real data. Make an example that is small enough that you can calculate the result by hand so that you can demonstrate what you are trying to do.

ciro · Posted 06-08-2022 10:47 AM

The want dataset should be similar to the one obtained as follows:

proc sql;
create table want(drop=y1totdet y2totdet) as select a.*,
y1*(y1tot/y1totdet) as y1sol,
y2*(y2tot/y2totdet) as y2sol
from (select *,sum(y1) as y1totdet,
sum(y2) as y2totdet from have_det group by idgroup) as a
left join
have_tot as b
on a.idgroup=b.idgroup
order by id;
quit;

In this case the scaling obtained via a simple sql is enough. in fact the sum over idgroup are equal to the ones in have_tot dataset:
proc means data=want sum;
class idgroup;
var y1sol y2sol;
run;

but in other cases, due to rounding the totals do not match perfectly.

moreover there are cases when, for instance, all y1 in have_det dataset are 0 and cannot be scaled to an y1tot >0.

that's why I would like to solve the problem via an optimization procedure.

third, but only optional, it would be a plus to have the solutions to be integer numbers.

Probably I have to run the scaling first and then the optmodel procedure with the problem defined on the on the scaled variables.

However my main problem here is how to write the optmodel code to work across records within idgroups. That is what I am unable to do.

Hope this is clearer.

In any case, thank you again.

RobPratt · Posted 06-08-2022 05:39 PM

I think the following does what you want, with a separate QP solve for each of the two problems.

proc optmodel;
   set PROBS = 1..2;

   set IDS;
   num y {PROBS, IDS};
   str idgroup {IDS};
   read data have_det into IDS=[id] {p in PROBS} <y[p,id]=col('y'||p)> idgroup;

   set <str> GROUPS;
   num ytot {PROBS, GROUPS};
   read data have_tot into GROUPS=[g=idgroup] {p in PROBS} <ytot[p,g]=col('y'||p||'tot')>;

   num pThis;

   /* declare optimization model for one problem at a time */
   var yvar {IDS} >= 0;

   con SumPerGroup {g in GROUPS}:
      sum {i in IDS: idgroup[i] = g} yvar[i] = ytot[pThis,g];

   min Objective = sum {i in IDS} (y[pThis,i] - yvar[i])^2;

   num ysol {PROBS, IDS};

   /* call solver once for each problem */
   do pThis = PROBS;
      put pThis=;
      solve;
      for {i in IDS} ysol[pThis,i] = yvar[i];
   end;

   /* create SAS output data set */
   create data want from [id]
      {p in PROBS} <col('y'||p)=y[p,id]>
      idgroup
      {p in PROBS} <col('y'||p||'sol')=ysol[p,id]>;
quit;

ciro · Posted 06-09-2022 04:10 AM

Thank you so much Rob. It will help me a lot to learn the procedure by studying your code.

adjust variables to match totals

Re: adjust variables to match totals

Re: adjust variables to match totals

Re: adjust variables to match totals

Re: adjust variables to match totals

Re: adjust variables to match totals

Re: adjust variables to match totals

Re: adjust variables to match totals

Re: adjust variables to match totals

SAS Innovate 2025: Register Today!