Solved: Re: Macro for computing summation variable

anisfiu · Posted 02-07-2017 10:26 PM

Hi

I will sincerely appreciate if someone can help with the below issue:

I have the attached dataset snaphot of which is as below. I want to compute a variable that for each row first takes the value of cumafd_h, divides this into current and all the lagged values of afd_h, multiplies by the correspding value of mb_h (current when takes the current afd_h and lagged when takes lagged afd_h), and then sums the entire series. I want to do this for alll the rows and by gvkey. For example:

- for the 3rd row (gvkey 1000, fyear 1972), the variable needs to be compuated as 1.38*(-7.05/-6.86)+1.30*(.19/-6.86)+1.42*((.)/(-6.86))

- for the 4th row (gvkey 1000, fyear 1973), the variable needs to be compuated as 0.82*(-0.37/-7.23))+1.38*(-7.05/-7.23)+1.30*(.19/-7.23)+1.42*((.)/(-7.23))

I also want to impose the restriction that whenever the value within the parrenthesis is <0 , the it is resert as 0. For example - for the 4th row, (.19/-7.23) needs to be reset to 0.

This variable also needs to be computed by group gvkey.

My apologies if it is a lot of work.

gvkey	fyear	mb_h	afd_h	cumafd_h
1000	1970	1.42
1000	1971	1.30	0.19	0.19
1000	1972	1.38	-7.05	-6.86
1000	1973	0.82	-0.37	-7.23
1000	1974	0.79	-0.48	-7.71
1000	1975	0.92	-1.08	-8.79
1000	1976	0.93	9.19	0.40
1000	1977	1.07	5.16	5.56
1001	1983	2.28
1001	1984	1.25	0.27	0.27
1001	1985	1.68	17.24	17.51
1002	1972	0.75
1003	1983	1.94
1003	1984	1.11	-0.25	-0.25

mkeintz · Posted 02-08-2017 12:37 AM

Assuming I have identified an error in your numbers, then this should work (10feb2017: deprecated, please see second version below)

data want (drop=neg_sum pos_sum);
  set have;
  by gvkey;

  retain pos_sum neg_sum;

  if first.gvkey then call missing(pos_sum,neg_sum);

  if sign(afd_h)=-1 then do;
  	neg_sum=sum(neg_sum,afd_h*mb_h);
	result=neg_sum/cumafd_h;
  end;
  else if sign(afd_h)= 1 then do;
	pos_sum=sum(pos_sum,afd_h*mb_h);
	result=pos_sum/cumafd_h;
  end;
run;

I leave the INCORRECT code above for those who saw this response previously. The correct code, below, now properly identifies when the POS_SUM vs NEG_SUM should be used. It produces the same results as @art297's elsewhere in this topic discussion.

data want (drop=neg_sum pos_sum);
  set have;
  by gvkey;

  retain pos_sum neg_sum;

  if first.gvkey then call missing(pos_sum,neg_sum);

  if sign(afd_h)=-1 then neg_sum=sum(neg_sum,afd_h*mb_h); else
  if sign(afd_h)= 1 then pos_sum=sum(pos_sum,afd_h*mb_h);

  if sign(cumafd_h)=-1 then result=neg_sum/cumafd_h; else
  if sign(cumafd_h)= 1 then result=pos_sum/cumafd_h;
  end;
run;

The neg_sum is the sum of products of AFD_H*MB_H to be used when CUMAFD_H is negative. pOS_SUM is the same to be used when CUMAFD_H is positive. This implicitly sets the relevant parenthetical expressions (historic AFD_H divided by current CUMAFD_H) to zero when they would be negative.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

art297 · Posted 02-07-2017 10:42 PM

It would help if you showed the values you want computed for each row. Also, what do you want to call the new computed variable?

Art, CEO, AnalystFinder.com

anisfiu · Posted 02-07-2017 11:09 PM

Hey Art

Thank you for your quick response.

I have computed values of the first few rows and called the variable wtdmb_h.

Revised excel attached

Appreciate this.

mkeintz · Posted 02-07-2017 11:53 PM

Please tell me how you get 1.38215 for 1972 and 1.35329 for 1973.

I ask because 1972 includes the expression (.19/-6.86) which is negative and therefore should be reset to zero. But you do not appear to do so.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

mkeintz · Posted 02-08-2017 12:37 AM

Assuming I have identified an error in your numbers, then this should work (10feb2017: deprecated, please see second version below)

data want (drop=neg_sum pos_sum);
  set have;
  by gvkey;

  retain pos_sum neg_sum;

  if first.gvkey then call missing(pos_sum,neg_sum);

  if sign(afd_h)=-1 then do;
  	neg_sum=sum(neg_sum,afd_h*mb_h);
	result=neg_sum/cumafd_h;
  end;
  else if sign(afd_h)= 1 then do;
	pos_sum=sum(pos_sum,afd_h*mb_h);
	result=pos_sum/cumafd_h;
  end;
run;

I leave the INCORRECT code above for those who saw this response previously. The correct code, below, now properly identifies when the POS_SUM vs NEG_SUM should be used. It produces the same results as @art297's elsewhere in this topic discussion.

data want (drop=neg_sum pos_sum);
  set have;
  by gvkey;

  retain pos_sum neg_sum;

  if first.gvkey then call missing(pos_sum,neg_sum);

  if sign(afd_h)=-1 then neg_sum=sum(neg_sum,afd_h*mb_h); else
  if sign(afd_h)= 1 then pos_sum=sum(pos_sum,afd_h*mb_h);

  if sign(cumafd_h)=-1 then result=neg_sum/cumafd_h; else
  if sign(cumafd_h)= 1 then result=pos_sum/cumafd_h;
  end;
run;

The neg_sum is the sum of products of AFD_H*MB_H to be used when CUMAFD_H is negative. pOS_SUM is the same to be used when CUMAFD_H is positive. This implicitly sets the relevant parenthetical expressions (historic AFD_H divided by current CUMAFD_H) to zero when they would be negative.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

anisfiu · Posted 02-08-2017 03:30 PM

Hi mkeintz

Sorry for my late reply. Yes are absolutely right. My computation for 1972 was wrong. I ran your codes and it seems to work perfectly. Thank you very much.

anisfiu · Posted 02-10-2017 10:11 AM

Hi mkeinthz

I was looking at the output randomly and also went over the logic of your codes again. I want to reset negative value of (afd_h/cumafd_h) to zero. It seems you are separating the neg and pos based on only afd_h. It seems not to address the cases when afd_h and cumafd_h are individually negative leading to a pos value for (afd_h/cumafd_h). I am looking into this again.

I do not expect you to fix this for me. For now I am good. If I fix this I will let you know. But for now it seems art's codes address those cases I mentioned.

Thanks

mkeintz · Posted 02-10-2017 12:27 PM

@anisfiu wrote:

Hi mkeinthz

I was looking at the output randomly and also went over the logic of your codes again. I want to reset negative value of (afd_h/cumafd_h) to zero. It seems you are separating the neg and pos based on only afd_h. It seems not to address the cases when afd_h and cumafd_h are individually negative leading to a pos value for (afd_h/cumafd_h). I am looking into this again.

I do not expect you to fix this for me. For now I am good. If I fix this I will let you know. But for now it seems art's codes address those cases I mentioned.

Thanks

@anisfiu

I believe you are correct, and that my code submitted earlier will produce erroneous results when sign(cumafd_h) ^= sign(afd_h). Don't know how I missed that. I'll edit my other post to include what I believe should be the correct code. It now completely matches what @art297's code produces. However, it does still offer some efficiencies in that it keeps 2 running totals instead of recalculating them with each new record.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

anisfiu · Posted 02-14-2017 12:13 PM

Thank you mkeintz.

art297 · Posted 02-08-2017 12:57 AM

This gets the same results as @mkeintz's code, thus you should mark his as being the correct answer. My version runs slower, but might save you the time for checking whether his calculations follow your logic.

data want (drop=counter i);
  set t2;
  array mb(100) _temporary_;
  array af(100) _temporary_;
  by gvkey;
  if first.gvkey then counter=0;
  else do;
    counter+1;
    mb(counter)=mb_h;
    af(counter)=afd_h;
    wtdmb_h=0;
    do i=1 to counter;
      wtdmb_h=wtdmb_h+mb(i)*max(0,af(i)/cumafd_h);
    end;
  end;
run;

HTH,

Art, CEO, AnalystFinder.com

anisfiu · Posted 02-08-2017 03:31 PM

Hi art

Thank you very much for your solution to the problem. I ran your code and it seems to work perfectly.

Thanks

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away