BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
anisfiu
Calcite | Level 5

 

Hi

I will sincerely appreciate if someone can help with the below issue:

I have the attached dataset snaphot of which is as below. I want to compute a variable that for each row first takes the value of  cumafd_h, divides this into current and all the lagged values of afd_h, multiplies by the correspding value of mb_h (current when takes the current afd_h and lagged when takes lagged afd_h), and then sums the entire series. I want to do this for alll the rows and by gvkey. For example:

- for the 3rd row (gvkey 1000, fyear 1972), the variable needs to be compuated as 1.38*(-7.05/-6.86)+1.30*(.19/-6.86)+1.42*((.)/(-6.86))

- for the 4th row (gvkey 1000, fyear 1973), the variable needs to be compuated as  0.82*(-0.37/-7.23))+1.38*(-7.05/-7.23)+1.30*(.19/-7.23)+1.42*((.)/(-7.23))

 

I also want to impose the restriction that whenever the value within the parrenthesis is <0 , the it is resert as 0. For example - for the 4th row, (.19/-7.23) needs to be reset to 0.

 

This variable also needs to be computed by group gvkey.

 

My apologies if it is a lot of work.

 

gvkeyfyearmb_hafd_hcumafd_h
100019701.42  
100019711.300.190.19
100019721.38-7.05-6.86
100019730.82-0.37-7.23
100019740.79-0.48-7.71
100019750.92-1.08-8.79
100019760.939.190.40
100019771.075.165.56
100119832.28  
100119841.250.270.27
100119851.6817.2417.51
100219720.75  
100319831.94  
100319841.11-0.25-0.25
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Assuming I have identified an error in your numbers, then this should work (10feb2017: deprecated, please see second version below)

 

data want (drop=neg_sum pos_sum);
  set have;
  by gvkey;

  retain pos_sum neg_sum;

  if first.gvkey then call missing(pos_sum,neg_sum);

  if sign(afd_h)=-1 then do;
  	neg_sum=sum(neg_sum,afd_h*mb_h);
	result=neg_sum/cumafd_h;
  end;
  else if sign(afd_h)= 1 then do;
	pos_sum=sum(pos_sum,afd_h*mb_h);
	result=pos_sum/cumafd_h;
  end;
run;

 

I leave the INCORRECT code above for those who saw this response previously.  The correct code, below, now properly identifies when the POS_SUM vs NEG_SUM should be used.  It produces the same results as @art297's elsewhere in this topic discussion.

 

data want (drop=neg_sum pos_sum);
  set have;
  by gvkey;

  retain pos_sum neg_sum;

  if first.gvkey then call missing(pos_sum,neg_sum);

  if sign(afd_h)=-1 then neg_sum=sum(neg_sum,afd_h*mb_h); else
  if sign(afd_h)= 1 then pos_sum=sum(pos_sum,afd_h*mb_h);

  if sign(cumafd_h)=-1 then result=neg_sum/cumafd_h; else
  if sign(cumafd_h)= 1 then result=pos_sum/cumafd_h;
  end;
run;

The neg_sum is the sum of products  of AFD_H*MB_H to be used when CUMAFD_H is negative.  pOS_SUM is the same to be used when  CUMAFD_H is positive. This implicitly sets the relevant parenthetical expressions (historic AFD_H divided by current CUMAFD_H) to zero when they would be negative.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

10 REPLIES 10
art297
Opal | Level 21

It would help if you showed the values you want computed for each row. Also, what do you want to call the new computed variable?

 

Art, CEO, AnalystFinder.com

 

anisfiu
Calcite | Level 5

Hey Art

 

Thank you for your quick response.

I have computed values of the first few rows and called the variable wtdmb_h. 

Revised excel attached

 

Appreciate this.

mkeintz
PROC Star

Please tell me how you get 1.38215 for 1972 and 1.35329 for 1973.

 

I ask because 1972 includes the expression (.19/-6.86) which is negative and therefore should be reset to zero. But you do not appear to do so.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
mkeintz
PROC Star

Assuming I have identified an error in your numbers, then this should work (10feb2017: deprecated, please see second version below)

 

data want (drop=neg_sum pos_sum);
  set have;
  by gvkey;

  retain pos_sum neg_sum;

  if first.gvkey then call missing(pos_sum,neg_sum);

  if sign(afd_h)=-1 then do;
  	neg_sum=sum(neg_sum,afd_h*mb_h);
	result=neg_sum/cumafd_h;
  end;
  else if sign(afd_h)= 1 then do;
	pos_sum=sum(pos_sum,afd_h*mb_h);
	result=pos_sum/cumafd_h;
  end;
run;

 

I leave the INCORRECT code above for those who saw this response previously.  The correct code, below, now properly identifies when the POS_SUM vs NEG_SUM should be used.  It produces the same results as @art297's elsewhere in this topic discussion.

 

data want (drop=neg_sum pos_sum);
  set have;
  by gvkey;

  retain pos_sum neg_sum;

  if first.gvkey then call missing(pos_sum,neg_sum);

  if sign(afd_h)=-1 then neg_sum=sum(neg_sum,afd_h*mb_h); else
  if sign(afd_h)= 1 then pos_sum=sum(pos_sum,afd_h*mb_h);

  if sign(cumafd_h)=-1 then result=neg_sum/cumafd_h; else
  if sign(cumafd_h)= 1 then result=pos_sum/cumafd_h;
  end;
run;

The neg_sum is the sum of products  of AFD_H*MB_H to be used when CUMAFD_H is negative.  pOS_SUM is the same to be used when  CUMAFD_H is positive. This implicitly sets the relevant parenthetical expressions (historic AFD_H divided by current CUMAFD_H) to zero when they would be negative.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
anisfiu
Calcite | Level 5

Hi mkeintz

 

Sorry for my late reply. Yes are absolutely right. My computation for 1972 was wrong. I ran your codes and it seems to work perfectly. Thank you very much.

anisfiu
Calcite | Level 5

Hi mkeinthz

 

I was looking at the output randomly and also went over the logic of your codes again. I want to reset negative value of (afd_h/cumafd_h) to zero. It seems you are separating the neg and pos based on only afd_h. It seems not to address the cases when afd_h and cumafd_h are individually negative leading to a pos value for (afd_h/cumafd_h). I am looking into this again.

 

I do not expect you to fix this for me. For now I am good. If I fix this I will let you know. But for now it seems art's codes address those cases I mentioned.

 

Thanks

mkeintz
PROC Star

@anisfiu wrote:

Hi mkeinthz

 

I was looking at the output randomly and also went over the logic of your codes again. I want to reset negative value of (afd_h/cumafd_h) to zero. It seems you are separating the neg and pos based on only afd_h. It seems not to address the cases when afd_h and cumafd_h are individually negative leading to a pos value for (afd_h/cumafd_h). I am looking into this again.

 

I do not expect you to fix this for me. For now I am good. If I fix this I will let you know. But for now it seems art's codes address those cases I mentioned.

 

Thanks


@anisfiu

 

I believe you are correct, and that my code submitted earlier will produce erroneous results when sign(cumafd_h) ^= sign(afd_h).  Don't know how I missed that.  I'll edit my other post to include what I believe should be the correct code.  It now completely matches what @art297's code produces.  However, it does still offer some efficiencies in that it keeps 2 running totals instead of recalculating them with each new record.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
anisfiu
Calcite | Level 5

Thank you mkeintz.

art297
Opal | Level 21

This gets the same results as @mkeintz's code, thus you should mark his as being the correct answer. My version runs slower, but might save you the time for checking whether his calculations follow your logic.

 

data want (drop=counter i);
  set t2;
  array mb(100) _temporary_;
  array af(100) _temporary_;
  by gvkey;
  if first.gvkey then counter=0;
  else do;
    counter+1;
    mb(counter)=mb_h;
    af(counter)=afd_h;
    wtdmb_h=0;
    do i=1 to counter;
      wtdmb_h=wtdmb_h+mb(i)*max(0,af(i)/cumafd_h);
    end;
  end;
run;

HTH,

Art, CEO, AnalystFinder.com

anisfiu
Calcite | Level 5

Hi art

 

Thank you very much for your solution to the problem. I ran your code and it seems to work perfectly.

 

Thanks

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 10 replies
  • 1654 views
  • 1 like
  • 3 in conversation