Hi
I will sincerely appreciate if someone can help with the below issue:
I have the attached dataset snaphot of which is as below. I want to compute a variable that for each row first takes the value of cumafd_h, divides this into current and all the lagged values of afd_h, multiplies by the correspding value of mb_h (current when takes the current afd_h and lagged when takes lagged afd_h), and then sums the entire series. I want to do this for alll the rows and by gvkey. For example:
- for the 3rd row (gvkey 1000, fyear 1972), the variable needs to be compuated as 1.38*(-7.05/-6.86)+1.30*(.19/-6.86)+1.42*((.)/(-6.86))
- for the 4th row (gvkey 1000, fyear 1973), the variable needs to be compuated as 0.82*(-0.37/-7.23))+1.38*(-7.05/-7.23)+1.30*(.19/-7.23)+1.42*((.)/(-7.23))
I also want to impose the restriction that whenever the value within the parrenthesis is <0 , the it is resert as 0. For example - for the 4th row, (.19/-7.23) needs to be reset to 0.
This variable also needs to be computed by group gvkey.
My apologies if it is a lot of work.
gvkey | fyear | mb_h | afd_h | cumafd_h |
1000 | 1970 | 1.42 | ||
1000 | 1971 | 1.30 | 0.19 | 0.19 |
1000 | 1972 | 1.38 | -7.05 | -6.86 |
1000 | 1973 | 0.82 | -0.37 | -7.23 |
1000 | 1974 | 0.79 | -0.48 | -7.71 |
1000 | 1975 | 0.92 | -1.08 | -8.79 |
1000 | 1976 | 0.93 | 9.19 | 0.40 |
1000 | 1977 | 1.07 | 5.16 | 5.56 |
1001 | 1983 | 2.28 | ||
1001 | 1984 | 1.25 | 0.27 | 0.27 |
1001 | 1985 | 1.68 | 17.24 | 17.51 |
1002 | 1972 | 0.75 | ||
1003 | 1983 | 1.94 | ||
1003 | 1984 | 1.11 | -0.25 | -0.25 |
Assuming I have identified an error in your numbers, then this should work (10feb2017: deprecated, please see second version below)
data want (drop=neg_sum pos_sum);
set have;
by gvkey;
retain pos_sum neg_sum;
if first.gvkey then call missing(pos_sum,neg_sum);
if sign(afd_h)=-1 then do;
neg_sum=sum(neg_sum,afd_h*mb_h);
result=neg_sum/cumafd_h;
end;
else if sign(afd_h)= 1 then do;
pos_sum=sum(pos_sum,afd_h*mb_h);
result=pos_sum/cumafd_h;
end;
run;
I leave the INCORRECT code above for those who saw this response previously. The correct code, below, now properly identifies when the POS_SUM vs NEG_SUM should be used. It produces the same results as @art297's elsewhere in this topic discussion.
data want (drop=neg_sum pos_sum);
set have;
by gvkey;
retain pos_sum neg_sum;
if first.gvkey then call missing(pos_sum,neg_sum);
if sign(afd_h)=-1 then neg_sum=sum(neg_sum,afd_h*mb_h); else
if sign(afd_h)= 1 then pos_sum=sum(pos_sum,afd_h*mb_h);
if sign(cumafd_h)=-1 then result=neg_sum/cumafd_h; else
if sign(cumafd_h)= 1 then result=pos_sum/cumafd_h;
end;
run;
The neg_sum is the sum of products of AFD_H*MB_H to be used when CUMAFD_H is negative. pOS_SUM is the same to be used when CUMAFD_H is positive. This implicitly sets the relevant parenthetical expressions (historic AFD_H divided by current CUMAFD_H) to zero when they would be negative.
It would help if you showed the values you want computed for each row. Also, what do you want to call the new computed variable?
Art, CEO, AnalystFinder.com
Hey Art
Thank you for your quick response.
I have computed values of the first few rows and called the variable wtdmb_h.
Revised excel attached
Appreciate this.
Please tell me how you get 1.38215 for 1972 and 1.35329 for 1973.
I ask because 1972 includes the expression (.19/-6.86) which is negative and therefore should be reset to zero. But you do not appear to do so.
Assuming I have identified an error in your numbers, then this should work (10feb2017: deprecated, please see second version below)
data want (drop=neg_sum pos_sum);
set have;
by gvkey;
retain pos_sum neg_sum;
if first.gvkey then call missing(pos_sum,neg_sum);
if sign(afd_h)=-1 then do;
neg_sum=sum(neg_sum,afd_h*mb_h);
result=neg_sum/cumafd_h;
end;
else if sign(afd_h)= 1 then do;
pos_sum=sum(pos_sum,afd_h*mb_h);
result=pos_sum/cumafd_h;
end;
run;
I leave the INCORRECT code above for those who saw this response previously. The correct code, below, now properly identifies when the POS_SUM vs NEG_SUM should be used. It produces the same results as @art297's elsewhere in this topic discussion.
data want (drop=neg_sum pos_sum);
set have;
by gvkey;
retain pos_sum neg_sum;
if first.gvkey then call missing(pos_sum,neg_sum);
if sign(afd_h)=-1 then neg_sum=sum(neg_sum,afd_h*mb_h); else
if sign(afd_h)= 1 then pos_sum=sum(pos_sum,afd_h*mb_h);
if sign(cumafd_h)=-1 then result=neg_sum/cumafd_h; else
if sign(cumafd_h)= 1 then result=pos_sum/cumafd_h;
end;
run;
The neg_sum is the sum of products of AFD_H*MB_H to be used when CUMAFD_H is negative. pOS_SUM is the same to be used when CUMAFD_H is positive. This implicitly sets the relevant parenthetical expressions (historic AFD_H divided by current CUMAFD_H) to zero when they would be negative.
Hi mkeintz
Sorry for my late reply. Yes are absolutely right. My computation for 1972 was wrong. I ran your codes and it seems to work perfectly. Thank you very much.
Hi mkeinthz
I was looking at the output randomly and also went over the logic of your codes again. I want to reset negative value of (afd_h/cumafd_h) to zero. It seems you are separating the neg and pos based on only afd_h. It seems not to address the cases when afd_h and cumafd_h are individually negative leading to a pos value for (afd_h/cumafd_h). I am looking into this again.
I do not expect you to fix this for me. For now I am good. If I fix this I will let you know. But for now it seems art's codes address those cases I mentioned.
Thanks
@anisfiu wrote:
Hi mkeinthz
I was looking at the output randomly and also went over the logic of your codes again. I want to reset negative value of (afd_h/cumafd_h) to zero. It seems you are separating the neg and pos based on only afd_h. It seems not to address the cases when afd_h and cumafd_h are individually negative leading to a pos value for (afd_h/cumafd_h). I am looking into this again.
I do not expect you to fix this for me. For now I am good. If I fix this I will let you know. But for now it seems art's codes address those cases I mentioned.
Thanks
I believe you are correct, and that my code submitted earlier will produce erroneous results when sign(cumafd_h) ^= sign(afd_h). Don't know how I missed that. I'll edit my other post to include what I believe should be the correct code. It now completely matches what @art297's code produces. However, it does still offer some efficiencies in that it keeps 2 running totals instead of recalculating them with each new record.
Thank you mkeintz.
This gets the same results as @mkeintz's code, thus you should mark his as being the correct answer. My version runs slower, but might save you the time for checking whether his calculations follow your logic.
data want (drop=counter i); set t2; array mb(100) _temporary_; array af(100) _temporary_; by gvkey; if first.gvkey then counter=0; else do; counter+1; mb(counter)=mb_h; af(counter)=afd_h; wtdmb_h=0; do i=1 to counter; wtdmb_h=wtdmb_h+mb(i)*max(0,af(i)/cumafd_h); end; end; run;
HTH,
Art, CEO, AnalystFinder.com
Hi art
Thank you very much for your solution to the problem. I ran your code and it seems to work perfectly.
Thanks
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.