Hi everyone, I have a dataset that looks like below.
permno date ret
A 01/Jan/1985 0.14
A 01/Feb/1985 0.43
A 01/Mar/1985 -0.59
. ................... .....
. ..................... .......
. .................... .......
A 01/Apr/2011 0.42
B 01/Feb/1985 0.452
B 01/Mar/1985 -0.52
B
B
B
.
.
Several features of the dataset:
- Each permno doesn't necessarily have the same observations; So, maybe the first permno starts with January, but the second permo starts with March because of missing values for Jan + Feb
The final output I want to have:
- a cumulative sum of using an arbitrary rolling window, say, 3 months or so
- For example, for 3-month window, the cumulative sum works as follows: sum of 1st, 2nd, and 3rd observatoins and output the cumulative sum. Then sum of 2nd, 3rd, and 4th obs and output the cumulative sum
What I have done and checked and made sure it computes correctly: I have written a macro to compute such cumulative rolling sum, i.e. sum of each 3 obs, output a dataset, then proc append each of these datasets.
What is my current problem that I can't figure out how to tackle:
- It might happen that the LAST 2 obs of 1st permno is used along with the 1st obs of the 2nd permno.
- In other words, I don't know how to recognize that the cumulative sum runs through each of the permnos and stops at the N-(window-1) of each permno, where N is the # of observations for each permno, instead of through to the first obs of the next permno.
What I currently have in mind:
- Create each of the dataset for each of the permnos
- Then apply the rolling macro I wrote to each of thems.
But then, I'm faced with a problem: What is the best way I create each of the dataset for each permno? proc sort by permno NODUPKEY. Then I use the following
data first_permno;
set original_ds;
where permno="A";
run;
This way, I have to create tens of thousands of datasets corresponding to each permno! Also, I need to create a macro variable containing all unique permnos and using %scan function to scan through such a list and extract each of them and put them in WHERE statement in the SAS code just above? It seems complicated.
Can you please suggest some way out of this?
Thanks a lot for your help! I really appreciate it.
Message was edited by: smilingmelbourne