Good evening SAS community,
I have once again run into a problem which seems pretty complex to me and I have no idea how to solve it. I have also not found anything on this forum that comes close to my problem but it is really hard to put in words so I might've missed it. If this is the case I'm extremely sorry. Once again I'm pretty new to SAS and thus do not have a good understanding of how to properly display my problem here.
I have the following data:
ID; DATE; PERMNO; RETURN; CAPITALIZATION;
1 19990202 a 0.01 400
1 19990202 b 0.04 100
1 19990203 a 0.01 400
1 19990203 b 0.04 100
2 20000101 c 0.01 800
2 20000102 c 0.02 800
3 19300812 d 0.03 750
3 19300813 d 0.03 750
The main problem is that some of my IDs have multiple permnos for the same date (see observations 1+2 & 3+4).
What I want to happen is that the observations where this is the case are combined to one in the following way:
-ID: stays the same
-Date: stays the same
-Permno: it doesn't really matter which one is taken, either the first or the second one, whatever is easier I guess
-Return: Now here comes the funny part... 😄 what I want to happen is: A new return is calculated from the old ones in the following way (a.return*(a.capitalization/(a.capitalization+b.capitalization))+b.return*(b.capitalization/(a.capitalization+b.capitalization)))
-Capitalization: take the sum of the two capitalization (a.capitalization+b.capitalization)
I have a basic idea on how to identify the specific observations where this problem is apparent. I thought of something like:
if ID=lag(ID) AND date=lag(date) AND permno^=lag(permno)
then...
and now I do not know how to go on. I have no idea how to properly combine two observations, especially including the calculation of a new return.
Thank you very much in advance for helping me out. SAS can be so hard to understand as a newbie sometimes..
Have a great evening.
Best regards
Nici
PS. I'm running on SAS 9.4
This:
data have;
input ID DATE:yymmdd8. PERMNO $ RETURN CAPITALIZATION;
format date yymmdd10.;
datalines;
1 19990202 a 0.01 400
1 19990202 b 0.04 100
1 19990203 a 0.01 400
1 19990203 b 0.04 100
2 20000101 c 0.01 800
2 20000102 c 0.02 800
3 19300812 d 0.03 750
3 19300813 d 0.03 750
;
proc sql;
create table want as
select
id,
date,
min(permNo) as permNo,
sum(return*capitalization)/ sum(capitalization) as return,
sum(capitalization) as capitalization
from have
group by id, date;
select * from want;
quit;
ID DATE permNo return capitalization -------------------------------------------------------- 1 1999-02-02 a 0.016 500 1 1999-02-03 a 0.016 500 2 2000-01-01 c 0.01 800 2 2000-01-02 c 0.02 800 3 1930-08-12 d 0.03 750 3 1930-08-13 d 0.03 750
would multiple permnos be sets of two or
can it be any number of distinct values a,b,c, and so on ?
This:
data have;
input ID DATE:yymmdd8. PERMNO $ RETURN CAPITALIZATION;
format date yymmdd10.;
datalines;
1 19990202 a 0.01 400
1 19990202 b 0.04 100
1 19990203 a 0.01 400
1 19990203 b 0.04 100
2 20000101 c 0.01 800
2 20000102 c 0.02 800
3 19300812 d 0.03 750
3 19300813 d 0.03 750
;
proc sql;
create table want as
select
id,
date,
min(permNo) as permNo,
sum(return*capitalization)/ sum(capitalization) as return,
sum(capitalization) as capitalization
from have
group by id, date;
select * from want;
quit;
ID DATE permNo return capitalization -------------------------------------------------------- 1 1999-02-02 a 0.016 500 1 1999-02-03 a 0.016 500 2 2000-01-01 c 0.01 800 2 2000-01-02 c 0.02 800 3 1930-08-12 d 0.03 750 3 1930-08-13 d 0.03 750
Hey @PGStats ,
I've been using your code successfully. Somehow I'm having problems now however. Before, as shown in your example, I received only one combined observation. Now the observations are still combined, however, I receive duplicates. If 2 observations where combined, I get the combined observation twice. Why is this the case? I'm really confused because the code worked perfectly before.
I also noticed the following message in the log: "The query requires remerging summary statistics back with the original data"
instead of this output
ID DATE permNo return capitalization -------------------------------------------------------- 1 1999-02-02 a 0.016 500 1 1999-02-03 a 0.016 500 2 2000-01-01 c 0.01 800 2 2000-01-02 c 0.02 800 3 1930-08-12 d 0.03 750 3 1930-08-13 d 0.03 750
I get the following one:
ID DATE permNo return capitalization -------------------------------------------------------- 1 1999-02-02 a 0.016 500
1 1999-02-02 a 0.016 500 1 1999-02-03 a 0.016 500
1 1999-02-03 a 0.016 500 2 2000-01-01 c 0.01 800 2 2000-01-02 c 0.02 800 3 1930-08-12 d 0.03 750 3 1930-08-13 d 0.03 750
Thank you in advance for your help.
BR
Nici
The code I suggested will not generate that message. Remerging will occur when columns are named in the select clause that are not summarized or mentioned in the group by clause. Can't say more without seeing your code.
Hey,
I think I might know what the problem is then. The shown here was only a simplification of the actual data. This is how my data looks:
data have;
input pcusip date permno ret cap anncmt code shrout prc;
datalines;
1 19990202 a 0.01 400 19990209 3 100 4
1 19990202 b 0.04 100 19990209 3 100 1
1 19990203 a 0.01 400 19990209 3 100 4
1 19990203 b 0.04 100 19990209 3 100 1
2 20000101 c 0.01 800 20000310 3 400 2
2 20000102 c 0.02 800 20000310 3 400 2
3 19300812 d 0.03 750 19300810 3 750 1
3 19300813 d 0.03 750 19300810 3 750 1
;
My Code:
proc sql; /*Capital weight the returns of same CUSIP with multiple permnos on same date and combine the permnos to one*/
create table want as
select
pcusip,
date,
code,
anncmt,
sum(shrout) as shrout,
sum(prc) as prc,
min(permno) as permno, /*for the two cases in this sample the smaller permno is the one that existed first*/
sum(ret*cap)/ sum(cap) as ret,
sum(cap) as cap
from have
group by pcusip, date;
select * from want;
quit;
From your comment I guess the problem are the variables anncmt & code right? Since they are not part of any calculations but also are not in the group by statement. Is there any way around this problem?
Thank you for your help once again.
BR
nici
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.