turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- Macro for computing summation variable

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

02-07-2017 10:26 PM

Hi

I will sincerely appreciate if someone can help with the below issue:

I have the attached dataset snaphot of which is as below. I want to compute a variable that for each row first takes the value of cumafd_h, divides this into current and all the lagged values of afd_h, multiplies by the correspding value of mb_h (current when takes the current afd_h and lagged when takes lagged afd_h), and then sums the entire series. I want to do this for alll the rows and by gvkey. For example:

- for the 3rd row (gvkey 1000, fyear 1972), the variable needs to be compuated as 1.38*(-7.05/**-6.86**)+1.30*(.19/**-6.86**)+1.42*((.)/(-**6.86**))

- for the 4th row (gvkey 1000, fyear 1973), the variable needs to be compuated as 0.82*(-0.37/-7.23))+1.38*(-7.05/-7.23)+1.30*(.19/-7.23)+1.42*((.)/(-7.23))

I also want to impose the restriction that whenever the value within the parrenthesis is <0 , the it is resert as 0. For example - for the 4th row, (.19/-7.23) needs to be reset to 0.

This variable also needs to be computed by group gvkey.

My apologies if it is a lot of work.

gvkey | fyear | mb_h | afd_h | cumafd_h |

1000 | 1970 | 1.42 | ||

1000 | 1971 | 1.30 | 0.19 | 0.19 |

1000 | 1972 | 1.38 | -7.05 | -6.86 |

1000 | 1973 | 0.82 | -0.37 | -7.23 |

1000 | 1974 | 0.79 | -0.48 | -7.71 |

1000 | 1975 | 0.92 | -1.08 | -8.79 |

1000 | 1976 | 0.93 | 9.19 | 0.40 |

1000 | 1977 | 1.07 | 5.16 | 5.56 |

1001 | 1983 | 2.28 | ||

1001 | 1984 | 1.25 | 0.27 | 0.27 |

1001 | 1985 | 1.68 | 17.24 | 17.51 |

1002 | 1972 | 0.75 | ||

1003 | 1983 | 1.94 | ||

1003 | 1984 | 1.11 | -0.25 | -0.25 |

Accepted Solutions

Solution

02-08-2017
03:30 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mkeintz

02-08-2017 12:37 AM - edited 02-10-2017 12:32 PM

Assuming I have identified an error in your numbers, then this should work (10feb2017: deprecated, please see second version below)

```
data want (drop=neg_sum pos_sum);
set have;
by gvkey;
retain pos_sum neg_sum;
if first.gvkey then call missing(pos_sum,neg_sum);
if sign(afd_h)=-1 then do;
neg_sum=sum(neg_sum,afd_h*mb_h);
result=neg_sum/cumafd_h;
end;
else if sign(afd_h)= 1 then do;
pos_sum=sum(pos_sum,afd_h*mb_h);
result=pos_sum/cumafd_h;
end;
run;
```

I leave the INCORRECT code above for those who saw this response previously. The correct code, below, now properly identifies when the POS_SUM vs NEG_SUM should be used. It produces the same results as @art297's elsewhere in this topic discussion.

```
data want (drop=neg_sum pos_sum);
set have;
by gvkey;
retain pos_sum neg_sum;
if first.gvkey then call missing(pos_sum,neg_sum);
if sign(afd_h)=-1 then neg_sum=sum(neg_sum,afd_h*mb_h); else
if sign(afd_h)= 1 then pos_sum=sum(pos_sum,afd_h*mb_h);
if sign(cumafd_h)=-1 then result=neg_sum/cumafd_h; else
if sign(cumafd_h)= 1 then result=pos_sum/cumafd_h;
end;
run;
```

The neg_sum is the sum of products of AFD_H*MB_H to be used when CUMAFD_H is negative. pOS_SUM is the same to be used when CUMAFD_H is positive. This implicitly sets the relevant parenthetical expressions (historic AFD_H divided by current CUMAFD_H) to zero when they would be negative.

All Replies

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to anisfiu

02-07-2017 10:42 PM

It would help if you showed the values you want computed for each row. Also, what do you want to call the new computed variable?

Art, CEO, AnalystFinder.com

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to art297

02-07-2017 11:09 PM

Hey Art

Thank you for your quick response.

I have computed values of the first few rows and called the variable wtdmb_h.

Revised excel attached

Appreciate this.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to anisfiu

02-07-2017 11:53 PM - edited 02-08-2017 12:25 AM

Please tell me how you get 1.38215 for 1972 and 1.35329 for 1973.

I ask because 1972 includes the expression (.19/-6.86) which is negative and therefore should be reset to zero. But you do not appear to do so.

Solution

02-08-2017
03:30 PM

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mkeintz

02-08-2017 12:37 AM - edited 02-10-2017 12:32 PM

Assuming I have identified an error in your numbers, then this should work (10feb2017: deprecated, please see second version below)

```
data want (drop=neg_sum pos_sum);
set have;
by gvkey;
retain pos_sum neg_sum;
if first.gvkey then call missing(pos_sum,neg_sum);
if sign(afd_h)=-1 then do;
neg_sum=sum(neg_sum,afd_h*mb_h);
result=neg_sum/cumafd_h;
end;
else if sign(afd_h)= 1 then do;
pos_sum=sum(pos_sum,afd_h*mb_h);
result=pos_sum/cumafd_h;
end;
run;
```

I leave the INCORRECT code above for those who saw this response previously. The correct code, below, now properly identifies when the POS_SUM vs NEG_SUM should be used. It produces the same results as @art297's elsewhere in this topic discussion.

```
data want (drop=neg_sum pos_sum);
set have;
by gvkey;
retain pos_sum neg_sum;
if first.gvkey then call missing(pos_sum,neg_sum);
if sign(afd_h)=-1 then neg_sum=sum(neg_sum,afd_h*mb_h); else
if sign(afd_h)= 1 then pos_sum=sum(pos_sum,afd_h*mb_h);
if sign(cumafd_h)=-1 then result=neg_sum/cumafd_h; else
if sign(cumafd_h)= 1 then result=pos_sum/cumafd_h;
end;
run;
```

The neg_sum is the sum of products of AFD_H*MB_H to be used when CUMAFD_H is negative. pOS_SUM is the same to be used when CUMAFD_H is positive. This implicitly sets the relevant parenthetical expressions (historic AFD_H divided by current CUMAFD_H) to zero when they would be negative.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mkeintz

02-08-2017 03:30 PM

Hi mkeintz

Sorry for my late reply. Yes are absolutely right. My computation for 1972 was wrong. I ran your codes and it seems to work perfectly. Thank you very much.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mkeintz

02-10-2017 10:11 AM

Hi mkeinthz

I was looking at the output randomly and also went over the logic of your codes again. I want to reset negative value of (afd_h/cumafd_h) to zero. It seems you are separating the neg and pos based on only afd_h. It seems not to address the cases when afd_h and cumafd_h are individually negative leading to a pos value for (afd_h/cumafd_h). I am looking into this again.

I do not expect you to fix this for me. For now I am good. If I fix this I will let you know. But for now it seems art's codes address those cases I mentioned.

Thanks

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to anisfiu

02-10-2017 12:27 PM

anisfiu wrote:

Hi mkeinthz

I was looking at the output randomly and also went over the logic of your codes again. I want to reset negative value of (afd_h/cumafd_h) to zero. It seems you are separating the neg and pos based on only afd_h. It seems not to address the cases when afd_h and cumafd_h are individually negative leading to a pos value for (afd_h/cumafd_h). I am looking into this again.

I do not expect you to fix this for me. For now I am good. If I fix this I will let you know. But for now it seems art's codes address those cases I mentioned.

Thanks

I believe you are correct, and that my code submitted earlier will produce erroneous results when sign(cumafd_h) ^= sign(afd_h). Don't know how I missed that. I'll edit my other post to include what I believe should be the correct code. It now completely matches what @art297's code produces. However, it does still offer some efficiencies in that it keeps 2 running totals instead of recalculating them with each new record.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to mkeintz

02-14-2017 12:13 PM

Thank you mkeintz.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to anisfiu

02-08-2017 12:57 AM

This gets the same results as @mkeintz's code, thus you should mark his as being the correct answer. My version runs slower, but might save you the time for checking whether his calculations follow your logic.

data want (drop=counter i); set t2; array mb(100) _temporary_; array af(100) _temporary_; by gvkey; if first.gvkey then counter=0; else do; counter+1; mb(counter)=mb_h; af(counter)=afd_h; wtdmb_h=0; do i=1 to counter; wtdmb_h=wtdmb_h+mb(i)*max(0,af(i)/cumafd_h); end; end; run;

HTH,

Art, CEO, AnalystFinder.com

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to art297

02-08-2017 03:31 PM

Hi art

Thank you very much for your solution to the problem. I ran your code and it seems to work perfectly.

Thanks