- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm trying to understand how PROC EXPAND works and why one calculation isn't working the way I expect.
I don't understand why the third one doesn't work the way as expected.
The first two calculations are correct, but LEAD doesn't generate the data expected.
From this original discussion:
http://stackoverflow.com/q/42395126/1919583
*create sample data to work with;
data random;
call streaminit(25);
do date='01Jan2016'd to '31Dec2016'd;
x=round(Rand('normal', 100, 15), 0.01);
output;
format date date9.;
end;
run;
*Modified calculation;
proc expand data=random out=want;
id date;
convert x= x_movSum /transformout = (movsum 10 trimleft 9); /*This is the correct calculation expected*/
convert x = x_movSumFIRST / transformout = (reverse movsum 10 trimleft 9 reverse);
convert x = x_movSumWRONG /transformout = (lead 10 reverse movesum 10 reverse); /*suggest via Tom*/
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello -
If I understand your question correctly you are mostly wondering about the effect of "lead 10", correct?
proc expand data=random out=want;
id date;
convert x = x_movSumWRONG /transformout = (lead 10 reverse movesum 10 reverse); /*suggest via Tom*/
run;
Your data (random) looks like this - leaving out some data in the middle:
date | x |
1-Jan-16 | 114.68 |
2-Jan-16 | 83.68 |
3-Jan-16 | 111.5 |
4-Jan-16 | 86.82 |
5-Jan-16 | 103.46 |
6-Jan-16 | 92.85 |
7-Jan-16 | 88.22 |
8-Jan-16 | 98.48 |
9-Jan-16 | 99.5 |
10-Jan-16 | 132.27 |
11-Jan-16 | 87.05 |
12-Jan-16 | 99.98 |
13-Jan-16 | 75.69 |
… | … |
20-Dec-16 | 93.67 |
21-Dec-16 | 114.5 |
22-Dec-16 | 94.1 |
23-Dec-16 | 96.01 |
24-Dec-16 | 104.5 |
25-Dec-16 | 104.69 |
26-Dec-16 | 116.58 |
27-Dec-16 | 98.41 |
28-Dec-16 | 103.41 |
29-Dec-16 | 110.42 |
30-Dec-16 | 90.44 |
31-Dec-16 | 88.62 |
- Lead 10 will tell EXPAND to forget about the first 10 observations (obs) - so your time series now starts at 87.05 - you will introduce missing values are the end of your series (unless you would add SETMISS 0, which would append 0s instead).
- Reverse will tell EXPAND to reverse the order, the missing values you introduced will be your new first obs, 87.05 your new last obs.
- Movesum 10 will create the backward moving sum of 10 obs.
- Finally Reserve will put the series into its original order again.
If you decompose your "wrong" statement into:
proc expand data=random out=want;
id date;
convert x = x_lead10 /transformout = (lead 10);
convert x = x_reverse /transformout = (lead 10 reverse);
convert x = x_movsum10 /transformout = (lead 10 reverse movsum 10);
convert x = x_reverseII /transformout = (lead 10 reverse movsum 10 reverse);
run;
you will see these columns in WANT.
In my mind this is the expected behaviour - it is just not the same what you'd like to archive with:
convert x= x_movSum /transformout = (movsum 10 trimleft 9);
Hope this makes sense,
Udo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello -
If I understand your question correctly you are mostly wondering about the effect of "lead 10", correct?
proc expand data=random out=want;
id date;
convert x = x_movSumWRONG /transformout = (lead 10 reverse movesum 10 reverse); /*suggest via Tom*/
run;
Your data (random) looks like this - leaving out some data in the middle:
date | x |
1-Jan-16 | 114.68 |
2-Jan-16 | 83.68 |
3-Jan-16 | 111.5 |
4-Jan-16 | 86.82 |
5-Jan-16 | 103.46 |
6-Jan-16 | 92.85 |
7-Jan-16 | 88.22 |
8-Jan-16 | 98.48 |
9-Jan-16 | 99.5 |
10-Jan-16 | 132.27 |
11-Jan-16 | 87.05 |
12-Jan-16 | 99.98 |
13-Jan-16 | 75.69 |
… | … |
20-Dec-16 | 93.67 |
21-Dec-16 | 114.5 |
22-Dec-16 | 94.1 |
23-Dec-16 | 96.01 |
24-Dec-16 | 104.5 |
25-Dec-16 | 104.69 |
26-Dec-16 | 116.58 |
27-Dec-16 | 98.41 |
28-Dec-16 | 103.41 |
29-Dec-16 | 110.42 |
30-Dec-16 | 90.44 |
31-Dec-16 | 88.62 |
- Lead 10 will tell EXPAND to forget about the first 10 observations (obs) - so your time series now starts at 87.05 - you will introduce missing values are the end of your series (unless you would add SETMISS 0, which would append 0s instead).
- Reverse will tell EXPAND to reverse the order, the missing values you introduced will be your new first obs, 87.05 your new last obs.
- Movesum 10 will create the backward moving sum of 10 obs.
- Finally Reserve will put the series into its original order again.
If you decompose your "wrong" statement into:
proc expand data=random out=want;
id date;
convert x = x_lead10 /transformout = (lead 10);
convert x = x_reverse /transformout = (lead 10 reverse);
convert x = x_movsum10 /transformout = (lead 10 reverse movsum 10);
convert x = x_reverseII /transformout = (lead 10 reverse movsum 10 reverse);
run;
you will see these columns in WANT.
In my mind this is the expected behaviour - it is just not the same what you'd like to archive with:
convert x= x_movSum /transformout = (movsum 10 trimleft 9);
Hope this makes sense,
Udo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Makes sense, thanks Udo 🙂