- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hi everyone,
I am new to SAS and doing some research. For my research, I need to make a 12-month window of my data and output it to a new dataset so that I can use this new dataset to carry on the rest of the processing. I have attached an Excel sheet with four columns: ID, YEAR, MONTH, and RET. The basic algorithm of what I need to achieve is as follows:
1. Take the first window of 12 months from the original dataset (e.g., here, it would be January 1994 to December 1994)
2. output it to a new dataset, say, temp.
3. sort temp as per RET.
4. append temp to an already existing dataset (this dataset is calculated independently and has 12 rows and 6 columns of predefined values)
5. perform the calculations necessary. (here after the calculations, it gives output as a single value.)
6. append that output to the original dataset in the row of Demeber 1994 (basically the last row of every window)
7. check whether the next row ID in the original dataset is same as the previous, if not take availabe values for the window.
8. slide the window by one month in the original dataset. (so, now, it will be from February 1994 to January 1995.)
9. Go to step 2.
This is a basic algo of what I want to do. Here the code should also check if the ID is changing or not, so that there is no overlap of different IDs in a window.
Any help and guidance is appreciated. Thank you so much !!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
"make a 12-month window of my data" for what ? mean or standard deviation ? It is foreward 12-month or backward 12-month ?
The following code is for a rolling window of 30 days ahead to calculated STD .
You can modify it according to your request.
data data2;
call streaminit(123);
do permno=1 to 10;
do date='01jan1980'd to '01jan1990'd;
return=rand('uniform');
if weekday(date) not in ( 1 7 ) then output;
end;
end;
format date yymmdd10.;
run;
data want2;
array r{%sysevalf('01jan1980'd):%sysevalf('06mar2025'd)} _temporary_;
array x{0:29} _temporary_;
call missing(of r{*});
do until(last.permno);
set data2;
by permno;
r{date}=return;
end;
do until(last.permno);
set data2;
by permno;
do i=date to date+29;
x{mod(i,date)}=r{i};
end;
std=std(of x{*});
output;
end;
drop i;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Depending on what you need to calculate there could be other approaches to what you have in mind. If you search the communities here with keywords like rolling window you might find discussions that are close to your use case and could give you some ideas.
I found after a quick search this discussion and then applied one of the referenced approaches to your sample data.
No clue if that comes even close to what you need but below code should demonstrate that you can eventually find a simpler coding approach if you search a bit more through old discussions and/or tell us a bit more how the desired result using your sample data should look like.
data have;
infile datalines truncover;
input ID $ YEAR MONTH RET;
date=mdy(month,1,year);
format date yymm8.;
drop year month;
datalines;
1 1994 1 3.79
1 1994 2 0.75
1 1994 3 -1.52
1 1994 4 4.8
1 1994 5 -1.64
1 1994 6 -2.77
1 1994 7 3.87
1 1994 8 1.81
1 1994 9 -5.42
1 1994 10 2.54
1 1994 11 -1.51
1 1994 12 -0.49
1 1995 1 -1.05
1 1995 2 -0.23
1 1995 3 3
1 1995 4 9.01
1 1995 5 0.57
1 1995 6 0.22
1 1995 7 3.99
1 1995 8 -5.54
1 1995 9 1.66
1 1995 10 -2.41
1 1995 11 2.41
1 1995 12 5.12
1 1996 1 4.43
1 1996 2 1.91
1 1996 3 -3.06
1 1996 4 2.44
1 1996 5 2.52
1 1996 6 0.29
1 1996 7 -0.17
1 1996 8 0.91
1 1996 9 1.19
1 1996 10 2.85
1 1996 11 4.15
1 1996 12 0.06
1 1997 1 2.13
1 1997 2 1.67
1 1997 3 1.54
1 1997 4 1.26
1 1997 5 1.32
1 1997 6 4.62
1 1997 7 3.2
1 1997 8 -8.56
1 1997 9 6.85
1 1997 10 -1.16
1 1997 11 0.51
1 1997 12 -0.98
1 1998 1 4.63
1 1998 2 6.13
1 1998 3 8.6
1 1998 4 5.68
1 1998 5 5.9
1 1998 6 5.14
1 1998 7 2.72
1 1998 8 -8.61
1 1998 9 -1.74
1 1998 10 7.63
1 1998 11 1.4
1 1998 12 8.52
1 1999 1 -6.12
1 1999 2 -4.75
1 1999 3 -4.88
1 1999 4 -2.72
1 1999 5 -5.28
2 1994 1 -0.04
2 1994 2 0.19
2 1994 3 4.58
2 1994 4 -3.34
2 1994 5 -0.96
2 1994 6 -2.51
2 1994 7 5.89
2 1994 8 -17.99
2 1994 9 -3.71
2 1994 10 0.53
2 1994 11 -3.55
2 1994 12 3.08
2 1995 1 7.13
2 1995 2 0.44
2 1995 3 -2.06
2 1995 4 -3.17
2 1995 5 -2.24
2 1995 6 -1.32
2 1995 7 9.15
2 1995 8 -2.78
2 1995 9 -0.08
2 1995 10 -9.03
2 1995 11 0.42
2 1995 12 -6.27
2 1996 1 3.42
2 1996 2 4.93
2 1996 3 2.55
2 1996 4 -8.43
2 1996 5 -2.5
2 1996 6 0.5
2 1996 7 5.03
2 1996 8 -3.93
2 1996 9 -0.09
2 1996 10 1.61
2 1996 11 4.85
2 1996 12 0.59
2 1997 1 -1.71
2 1997 2 -11.69
2 1997 3 -0.4
2 1997 4 -16.15
2 1997 5 5.74
2 1997 6 5.65
2 1997 7 2.98
2 1997 8 -4.9
2 1997 9 -10.83
2 1997 10 -0.98
2 1997 11 17.07
2 1997 12 8.42
2 1998 1 -1.65
2 1998 2 2.23
2 1998 3 -1.46
2 1998 4 6.48
2 1998 5 5.12
2 1998 6 3.48
2 1998 7 8.91
2 1998 8 4.62
2 1998 9 4.18
2 1998 10 -5.19
2 1998 11 -0.02
2 1998 12 0.86
2 1999 1 1.91
2 1999 2 -1.27
2 1999 3 4.75
2 1999 4 7.6
2 1999 5 5.92
2 1999 6 6.94
2 1999 7 5.2
2 1999 8 -1.97
2 1999 9 5.53
2 1999 10 -1.48
2 1999 11 -3.4
2 1999 12 -3.98
2 2000 1 0.01
2 2000 2 4.05
2 2000 3 1.12
2 2000 4 1.22
2 2000 5 -2.58
2 2000 6 2.35
2 2000 7 4.63
2 2000 8 8.61
2 2000 9 0.66
2 2000 10 3.43
2 2000 11 4.06
2 2000 12 -4.16
2 2001 1 0.29
2 2001 2 2.3
2 2001 3 3.22
2 2001 4 3.14
2 2001 5 2.58
2 2001 6 3.75
2 2001 7 5.44
2 2001 8 -0.45
2 2001 9 -1.28
2 2001 10 4.66
2 2001 11 5.81
2 2001 12 0.38
2 2002 1 -0.88
2 2002 2 -4.21
2 2002 3 -1.17
2 2002 4 3.7
2 2002 5 4.47
2 2002 6 -4.92
2 2002 7 -1.23
2 2002 8 -8.67
2 2002 9 3.5
2 2002 10 3.33
;
proc sort data=have out=inter;
by id date;
run;
proc expand data=inter out=want method=none;
by id;
id date;
convert ret = ret_avg12 / transout=(movave 12);
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Instead of separate data sets, perhaps this idea of using BY groups is what you want
https://blogs.sas.com/content/iml/2017/02/13/run-1000-regressions.html
By the way, you focus tightly on an algorithm that you want. We call that the XY problem and it is not a good way to get the help you need and move forward. Better you should tell us the goal of doing this, once you have the data arranged however you want, what will you do with it? Regression? Medians? Reverse double-porcupine method? Please tell us where this is going. Maybe SAS has already programmed what you want. Maybe someone else has already programmed what you want. Maybe there are easier ways to get there than the algorithm you are trying to implement.
Paige Miller