BookmarkSubscribeRSS Feed
srikanthyadav44
Quartz | Level 8

dear all,

i have the dataset of stock returns for 163 stocks in daily frequency, i have to compute cumulative return for 50 preceding days and 200 preceding days for all the 163 stock returns.  I need 50 days cumulative return in one file and 200 days cumulative returns in another file.

my present data file is in this format 

DateC1C2C3………………C163
5/4/20100.034732-1.305271.414732 3.014732
6/4/20100.2047320.124732-0.32527 2.304732
7/4/20102.3947320.1747321.314732 -0.72527
8/4/2010-1.26527-2.06527-0.22527 -0.83527
9/4/2010-0.485270.384732-0.28527 1.584732
########-0.74641-1.14641-0.91641 3.173589
########-0.906413.4435890.973589 -0.46641
########-0.476410.373589-0.59641 1.833589
########-0.53641-1.36641-0.73641 -0.82641
########-1.46617-2.066171.773835 3.353835
########0.1338350.7438356.183835 -2.59617
########1.1438350.0838353.933835 2.473835
########-1.316170.3238351.743835 3.173835
########-1.816170.5938351.713835 -0.57617
########0.5338350.6938351.893835 -0.22617
########-0.526170.8838351.043835 -1.14617
########0.543835-1.99617-2.63617 -0.79617
########-1.19617-0.86617-0.35617 -0.75617
########0.973835-1.026170.103835 1.453835
3/5/2010-0.356011.943990.35399 -1.64601
4/5/2010-2.726010.48399-0.33601 -3.02601
5/5/20100.73399-0.806010.25399 1.29399
6/5/2010-1.126010.293990.02399 -2.11601
7/5/2010-1.33601-0.30601-0.65601 -2.90601
########2.8138350.8238353.373835 2.243835
########-0.40617-0.056170.223835 -2.31617
########-0.836170.253835-0.16617 3.283835
########1.4438352.9538350.343835 4.833835
########0.213835-2.056172.253835 -2.18617
########-3.32632-1.35632-2.28632 3.863679
########0.1636790.3536790.723679 -3.56632
########-2.236320.043679-1.69632 -3.91632
########-0.066320.2436792.323679 3.513679
########1.753679-0.96632-0.64632 -2.73632
########-0.469420.1505780.390578 -1.56942
########-1.60942-4.41942-1.93942 -2.16942
########-1.359422.8305780.410578 -0.08942
########-1.659421.300578-6.48942 0.780578
########1.1505782.230578-0.80942 1.870578
########-1.22004-1.900040.509959 0.019959
1/6/20100.1699591.199959-1.32004 -2.67004
2/6/20100.8899590.7299591.199959 0.069959
3/6/20101.959959-1.630041.529959 5.539959
4/6/2010-0.20004-0.330040.309959 1.639959
7/6/20101.039804-1.9902-2.6102 -2.2102

 

required output format for 50 days cumulative returns is as follows 

DateC1_CR50C2_CR50C3_CR50………………C163_CR50
16/6/20100.2104120.189752-1.72066 -0.04066
17/6/20100.7898630.769204-0.19066 0.31934
18/6/2010-0.2325-0.25316-0.07066 -0.70066
21/6/20101.7087991.688142.44934 -0.44066
22/6/2010-0.68886-0.70952-2.45066 -0.35066
23/6/20100.1240640.1034041.03934 -0.19066
24/6/2010-0.04792-0.06857-0.41066 -0.27066
25/6/2010-0.9736-0.99426-0.62066 -0.21066
28/6/20101.215761.195410.859649 0.169649
29/6/2010-1.46089-1.48124-0.40035 0.089649
30/6/20101.0663711.0460210.999649 0.649649
1/7/2010-1.15678-1.17713-0.89035 -0.46035
2/7/2010-0.27268-0.29303-0.86035 -0.17035
5/7/2010-0.02292-0.043570.409342 0.009342
6/7/20101.009990.989331-0.98066 -0.72066
7/7/2010-0.91072-0.93138-0.32066 -0.28066
8/7/20101.058091.037432-1.01066 0.689342
9/7/20101.044211.023551-0.96066 -0.27066
12/7/20100.5691440.548332-1.21081 -0.09081
13/7/20100.3273480.306536-0.84081 0.139188
14/7/2010-0.26885-0.28966-1.77081 0.779188
15/7/2010-0.13748-0.15830.689188 -0.03081
16/7/20100.2812680.2604560.149188 0.519188
19/7/2010-0.13821-0.16026-0.05205 0.607952
20/7/2010-0.34311-0.36516-0.38205 -0.19205
21/7/20100.5823180.560271.917952 0.927952
22/7/20100.7858880.76384-1.08205 1.737952
23/7/20100.13130.1092530.417952 -0.69205
26/7/2010-0.5613-0.583350.457952 2.447952
27/7/20100.2212150.1991673.017952 3.057952
28/7/2010-0.61045-0.63250.137952 0.657952
29/7/20100.210060.1880120.067952 -1.44205
30/7/2010-0.76649-0.78853-1.11205 -0.02205
2/8/20101.1862071.1632330.347025 4.717025
3/8/20100.1453380.122364-0.58297 2.087025
4/8/20100.5189150.49594-1.21297 -1.63297
5/8/2010-0.38021-0.40319-0.40297 -0.46297
6/8/2010-0.14422-0.167190.947025 0.447025

 

required output format for 200 days cumulative returns is 

DateC1_CR200C2_CR200C3_CR200………………C163_CR200
16/6/20100.064732433-0.6252675672.274732433-0.545267567-0.545267567
17/6/2010-1.2352675670.034732433-1.095267567-0.985267567-0.985267567
18/6/20102.254732433-0.77526756711.27473243-0.635267567-0.635267567
21/6/2010-0.1852675670.034732433-4.445267567-1.545267567-1.545267567
22/6/20100.484732433-0.9152675670.5647324332.1747324332.174732433
23/6/2010-1.2264112460.963588754-1.556411246-0.466411246-0.466411246
24/6/20100.193588754-0.6264112460.953588754-0.376411246-0.376411246
25/6/20100.6135887545.763588754-3.166411246-1.456411246-1.456411246
28/6/2010-0.9764112460.123588754-2.7364112460.7135887540.713588754
29/6/2010-0.986165446-1.8061654461.353834554-1.646165446-1.646165446
30/6/20101.4338345543.4238345540.5238345542.7438345542.743834554
1/7/20101.6238345540.6138345541.593834554-0.646165446-0.646165446
2/7/2010-0.756165446-0.126165446-1.1361654461.9638345541.963834554
5/7/20100.0638345540.293834554-2.2161654460.2538345540.253834554
6/7/20101.103834554-2.4761654461.213834554-0.366165446-0.366165446
7/7/20101.7538345540.583834554-1.696165446-0.366165446-0.366165446
8/7/20100.1038345540.103834554-2.486165446-0.996165446-0.996165446
9/7/20100.263834554-0.4361654460.473834554-1.096165446-1.096165446
12/7/2010-2.3961654461.583834554-2.7461654468.6838345548.683834554
13/7/2010-0.516010208-1.196010208-1.8160102081.0639897921.063989792
14/7/2010-3.786010208-1.616010208-3.0860102080.1539897920.153989792
15/7/20100.323989792-0.5260102084.8939897922.9539897922.953989792
16/7/2010-3.706010208-0.726010208-2.3960102084.4639897924.463989792
19/7/2010-0.506010208-2.286010208-3.556010208-3.306010208-3.306010208
20/7/20102.803834554-0.1561654465.4138345541.5638345541.563834554
21/7/2010-3.246165446-1.346165446-1.9061654462.0138345542.013834554
22/7/2010-1.416165446-1.0861654462.7938345542.2038345542.203834554
23/7/2010-0.0661654462.6938345541.5638345541.8538345541.853834554
26/7/2010-3.356165446-1.836165446-3.056165446-1.336165446-1.336165446
27/7/2010-0.256321002-1.2463210021.803678998-0.906321002-0.906321002
28/7/20102.213678998-0.336321002-0.876321002-2.116321002-2.116321002
29/7/2010-3.126321002-2.156321002-5.226321002-3.156321002-3.156321002
30/7/2010-0.016321002-2.966321002-2.536321002-3.736321002-3.736321002
2/8/20101.1336789982.203678998-0.7963210020.4736789980.473678998
3/8/20101.160577594-0.889422406-0.489422406-2.229422406-2.229422406
4/8/2010-1.799422406-2.4794224060.850577594-4.619422406-4.619422406
5/8/20100.9805775945.6105775942.9705775943.8405775943.840577594
6/8/20102.3405775948.0005775940.4405775941.2505775941.250577594

 

please guide in writing the SAS code 

6 REPLIES 6
PaigeMiller
Diamond | Level 26

Again, we request you provide the data as SAS data step code (instructions), as we can't work from the pasted data you have provided. You will get faster answers when you provide the data in this form. We should not have to repeatedly ask for this, we are trying to help you but you have to help us as well.

--
Paige Miller
Reeza
Super User
If you're using WRDS system I believe it supports PROC EXPAND.

This paper explains how to answer your question:https://lexjansen.com/wuss/2003/Tutorials/i-using_proc_expand.pdf

However your data is structured badly for it, ideally you'd have it a long format, not a wide format with 163 columns. Then you can just add a BY statement for your multiple different categories.
Otherwise you could technically use an array or macro variables. But that's more cumbersome.

A data step approach:
https://gist.github.com/statgeek/27e23c015eae7953eff2

Quick proc expand example - paper is way more detailed:
https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52
mkeintz
PROC Star

WRDS does support proc expand, and you could use it by mean of 163 statements in the proc.

 

But if you don't have PROC EXPAND (from the SAS/ETS module), then the data step below would work for the data structure you describe.

 

BTW, I would agree with Reeza's observation that having a single column of returns, sorted by stock and date would be much better for proc expand (one statement instead of 163).  But the structure you have might actually be better if at some point you are  looking for returns on multi-stock portfolios, when there is periodic rebalancing.  

 

In the absence of sample data in the form of a working data step, this code is untested:

 

data want200 (drop=c1-c163 s);
  set have;

  array current_ret{163}  c1-c163 ;
  array logsum200  {163}  _temporary_;              /*Array of 200-day sums of log(1+daily return) */
  array roll200    {163}  roll200_c1-roll200_c163 ; /*Desired result variables */

  do S=1 to 163;                                    /*For each stock ... */
    logsum200{S} + log(1+current_ret{s}) + (-coalesece(lag200(log(1+current_ret{s})),0); /*... update logsum200 */
    roll200{S}=exp(logsum200{S}) - 1;                                                    /* get coresponding 200-day return*/
  end;
  if _n_>=200 ;
run;

Notes:

 

  1. I assume you are looking for 200-day trading day windows, not 200 calendar days.
  2. You replace all the "200"s with "50"s to get fifty-day windows.
  3. This program assumes no missing data.
  4. The "if _n_>=200" permits the data step automatic output action to occur only once the 200th incoming observation has been read.
  5. I am naming the resulting variables as roll200_c1-roll200_c163, which allows a convenient shorthand for generating a list of 163 variables.  Naming them as you want (C1_CR200 C2_CR200 ... C163_CR200) would require spelling out 163 names (unless you choose to learn some sas macro coding).
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
Assuming I understood your question and there are no gap between two date.



data have;
call streaminit(123);
do stock=1 to 160;
 do date='01jan2015'd to '01jan2020'd;
  c1=rand('uniform');
  c2=rand('uniform');
  output;
 end;
end;
format date date9.;
run;

data want;
 set have;
 by stock;
array x1{50} _temporary_;
array x2{50} _temporary_;

if first.stock then do;n=0;call missing(of x1{*} x2{*});end;
n+1;
i=mod(n,50);

x1{i+1}=c1;
x2{i+1}=c2;

C1_CR50=sum(of x1{*});
C2_CR50=sum(of x2{*});

drop n i;
run;
s_lassen
Meteorite | Level 14

I am not quite sure I understand your output data: both tables with cumulative returns start 50 days after the first date, but the values are different.

 

One way to calculate the cumulative returns for many variables could be to use a two-dimensional array:

data want50;
  set have;
  array returns50 (0:50;1:163) 8 _temporary_;
  array returns(*) c1-c163;
  array cumulative(*) 8 cr50_c1-cr50_c163;
  retain cr50_c1-cr50_C163;
  do _I_=1 to dim(returns);
    returns50(mod(_N_-1,51),_I_)=returns(_I_);
    cumulative(_I_)+returns(_I_);
cumulative(_I_)+-returns50(mod(_N_,51),_I_)); end; if _N_>=50; drop _I_ c1-c163; run;

So, apparently we are adding the day's returns and then subtracting the next day's returns from the cumulative return. How does that work? It works because the next day has not been read yet, so if there is anything there, it is the the return 50 days before that is subtracted. Note that first index of the two-dimensional array is zero-based, so that actual number of elements is 51*163 - as the return that we need to subtract is before the 50-day period.

 

I renamed your output variables so that they could be named for an array. 

 

To calculate the 200 days cumulative return, change the number 50 to 200, and the number 51 to 201.

mkeintz
PROC Star

@s_lassen :

 

Although the OP didn't define what is meant by "cumulative returns", it's almost certainly not a simple sum of returns, but rather the compound return from

     compound_return = (1+return1) *(1+return2) *...*(1+return50) - 1

    or

     compound_return = exp{log(1+return1) +...(log(return50)} - 1

 

So you'll probably have to store log(1+return{s,d}) in you rolling history array    - where s indexes stock and d indexes day.

 

However, the approach of using a rolling history array, to be summed with every daily advance, does a lot more computing than using a rolling sum, updated by adding the most recent log(1+return) and subtracting the 50th preceding log(1+return).   The bigger the rolling window, the more computing required by the rolling history approach.  After all, returns for day number i will added 50 separate times, for day i through day i+49 - for a 50 day window.    The rolling sum approach uses day i returns only twice - once to add it for day i and once to subtract it for day i+50 - no matter what size the window is.

 

Of course the rolling sum approach can introduce more computer precision error than the rolling history approach. 

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2442 views
  • 3 likes
  • 6 in conversation