BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
art297
Opal | Level 21

I still don't know if I fully understand the problem you are trying to solve, but the following matches your test data results:

proc sql noprint;

  create table want as

    select distinct a.portid,a.firmid,a.myear,a.return,b.sumreturn, b.cntreturn,bportid

      from return as a

        left join

          (select portid as bportid, myear, sum(return) as sumreturn, count(return) as cntreturn

               from return group by myear,bportid) as b

             on a.myear=b.myear

               where a.portid ne bportid

                 order by firmid,myear,portid

  ;

quit;

data want (drop=next_measure prev_measure measure);

  set want;

  by firmid myear;

  set want ( firstobs = 2 keep = return rename = (return = Next_Measure) )

      want (      obs = 1 drop = _all_                                     );

  Prev_Measure = ifn( first.myear, (.), lag(return) );

  Next_Measure = ifn(  last.myear, (.), Next_Measure );

  if not (first.myear and last.myear) then do;

    if first.myear then measure=next_measure;

    else measure=prev_measure;

 

    sumreturn=sumreturn-measure;

    cntreturn=cntreturn-1;

  end;

  average=sumreturn/cntreturn;

run;

proc sort data=want;

  by myear portid  firmid;

run;

art297
Opal | Level 21

You apparently updated your post at the same time as I sent you a response.  Take a look at what I had sent.

finans_sas
Quartz | Level 8

Thank you Arthur (art297) for spending your valuable time helping me out. Your updated version indeed generates the desirable output. However, it will take me a while to digest what exactly is going on in the code and to make sure that I can comfortably apply to my large data set. I am really sorry that I have not made my inquiry very clear. Maybe the story below may shed some light:

Imagine that you have two portfolios (portid 1 and 2) and company A is in both portid 1 and portid2. Portid 1 also contains Companies B and C. Portid 2 contains Companies D and E. In short,

Portid 1: A, B, C

Portid 2: A, D, E

I am interested in calculating an average value for each company by averaging returns of companies in other portfolios. For example,

average for C: (A+D+E)/3, all of which come from portid 2 (A is common)

average for E: (A+B+C)/3, all of which come from portid 1

average for A in portid 1= (D+E)/2

average for A in portid 2=(B+C)/2

Overall, I have two rules: a company's average return must exclude its own return (as in A) and must only include returns from other portfolios.

I hope my objective is clearer now.

art297
Opal | Level 21

As long as your real data is limited to two portfolios I think the code will work.  In the case where there is only one portfolio, it simply takes the average.  In the case where there are two portfolios, it gets the revenue to removed from the other portfolio.

finans_sas
Quartz | Level 8

Thank you for your further explanations. In this project, I have 50 portfolios, but sometimes it gets as high as 400. I should then be careful about the portfolio size.

art297
Opal | Level 21

If you post an example with 3 portfolios, and the results you want, we can test whether the methodology holds.

finans_sas
Quartz | Level 8

I just realized that my program also does not work when I have more than two portfolios. See the data set below. For example, for portid=6 and firmid=620, the average should be calculated averaging returns all of companies in portid 1 through portid 5 (where myear=200302), but all of the suggestions seem to calculate 4 or 5 different average for portid=6 and firmid=620.

DATA return;

INPUT portid firmid myear return;

DATALINES;

1 100 200301  2.11

1 120 200301  3.12

1 130 200301  2.13

1 140 200301  3.67

1 100 200302  5.23

1 120 200302  6.81

1 130 200302  2.55

1 140 200302  4.21

1 150 200302  3.21

2 300 200301  1.11

2 320 200301  0.12

2 330 200301 -2.13

2 340 200301 -1.67

2 300 200302  4.23

2 320 200302  3.81

2 330 200302  1.55

2 340 200302  2.21

2 350 200302  4.23

2 100 200302  5.23

3 56 200301  3.11

3 100 200301  2.11

3 763 200301  5.13

3 412 200301  4.67

3 124 200302  3.23

3 97 200302  3.81

3 195 200302  1.55

4 740 200302  0.21

4 250 200302  -1.21

4 812 200301  -1.11

4 823 200301  7.12

5 835 200301 -3.33

5 947 200301 -4.67

5 980 200302  4.23

6 620 200302  3.81

;

run;

ghastly_kitten
Fluorite | Level 6

:smileygrin: funny.

There was a mistake in my decision, but it still works for your problem.

Here step by step (so it'll be easier to check the correctness of all steps):

Just copypaste and run it.

DATA return;

INPUT portid firmid myear return;

DATALINES;

1 100 200301  2.11

1 120 200301  3.12

1 130 200301  2.13

1 140 200301  3.67

1 100 200302  5.23

1 120 200302  6.81

1 130 200302  2.55

1 140 200302  4.21

1 150 200302  3.21

2 300 200301  1.11

2 320 200301  0.12

2 330 200301 -2.13

2 340 200301 -1.67

2 300 200302  4.23

2 320 200302  3.81

2 330 200302  1.55

2 340 200302  2.21

2 350 200302  4.23

2 100 200302  5.23

3 56 200301  3.11

3 100 200301  2.11

3 763 200301  5.13

3 412 200301  4.67

3 124 200302  3.23

3 97 200302  3.81

3 195 200302  1.55

4 740 200302  0.21

4 250 200302  -1.21

4 812 200301  -1.11

4 823 200301  7.12

5 835 200301 -3.33

5 947 200301 -4.67

5 980 200302  4.23

6 620 200302  3.81

;

run;

proc sql;

  create table portfolio_averages as

  select

  portid,

  myear,

  ( sum(S_portid) - S_portid ) as SUM_ret_other_portfolios,

  ( sum(N_portid) - N_portid ) as N_other_portid

  from

  (

    select

         portid,

         myear,

         sum(return) as S_portid,

         count(return) as N_portid

     from

      return

     group by

      portid, myear

  )

  group by

  myear

    order by

    myear, portid;

quit;

proc sql;

create table Firms_averages as

     select

          firmid,

   myear,

          SUM(return) as SUM_firmid,

          count(return) as N_firmid

     from

          return

     group by

          myear, firmid

  order by

   myear, firmid;

quit;

/* here we add the return to the sum, because we extract it (it's used in SUM_firmid, while it shouldn't be used at all) */

proc sql;

  create table RETURN as

  select

  r.firmid,

  r.portid,

  r.myear,

  r.return,

  (p.SUM_ret_other_portfolios - f.SUM_firmid + r.return) / (p.N_other_portid - f.N_firmid + 1) as return_estim

  from

  return r

  left join Firms_averages f on

  r.firmid = f.firmid and r.myear = f.myear

  left join portfolio_averages p on

  p.myear = r.myear and p.portid = r.portid

  order by

  portid, myear, firmid;

quit;

If you wish so, as proposed by DanielSantos, you can integrate everything in one sql query.

Just substitute the ( sql code ) of a certain table instead of it's alias in a final query, but as I mentioned above - it will be hard to check the correctness, while maximum momentary hard-disk space consumption would be the same.

I checked it manually for several firmids, portfolios and years and still think - that this will solve your task.

DanielSantos
Barite | Level 11

OK, now your last sample is a good one.

I've updated my proposal (in blue) to suit the multiplicity of the data. It's actually just a matter of grouping the data. Again, as along as there is no dups by portid, firmid and myear, I think it will do what you need in a efficiently manner.

Logic is the same from my previous post, average the cross join (with the convenient clause) aggregates by portid/myear and firmid/myear subtracting the current return, which is not to be considered.

proc sql noprint;

create table want (drop = _:) as

select a.portid, a.firmid, a.myear, a.return,

       sum(b.sumreturn)-c.sumreturn+a.return as _sum, sum(b.cntreturn)-c.cntreturn+1 as _cnt,

       calculated _sum / calculated _cnt as avgreturn

       from

       return as a,

       (select portid, myear, sum(return) as sumreturn, count(return) as cntreturn from return group by portid, myear) as b,

       (select firmid, myear, sum(return) as sumreturn, count(return) as cntreturn from return group by firmid, myear) as c

where a.portid ne b.portid and a.myear eq b.myear and a.firmid eq c.firmid and b.myear eq c.myear

group by a.portid, a.firmid,  a.myear, a.return;

quit;


Cheers from Portugal.

Daniel Santos @ www.cgd.pt


ghastly_kitten
Fluorite | Level 6

Cool. I tested it and result on a sample coincides with mine. And it's shorter!

The best decision here Smiley Happy

finans_sas
Quartz | Level 8

Thank you all for your invaluable help. This is really a great community dedicated to sharing and fostering our knowledge.

Best,

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 25 replies
  • 3910 views
  • 14 likes
  • 7 in conversation