BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.

I'm trying to understand how PROC EXPAND works and why one calculation isn't working the way I expect.

 

I don't understand why the third one doesn't work the way as expected.

 

The first two calculations are correct, but LEAD doesn't generate the data expected.

From this original discussion:

http://stackoverflow.com/q/42395126/1919583

 

*create sample data to work with;
data random;
    call streaminit(25);

    do date='01Jan2016'd to '31Dec2016'd;
        x=round(Rand('normal', 100, 15), 0.01);
        output;
        format date date9.;
    end;
run;

*Modified calculation;
proc expand data=random out=want;
    id date;
    convert x= x_movSum /transformout = (movsum 10 trimleft 9); /*This is the correct calculation expected*/
    convert x = x_movSumFIRST / transformout = (reverse movsum 10 trimleft 9 reverse);
    convert x = x_movSumWRONG /transformout = (lead 10 reverse movesum 10 reverse); /*suggest via Tom*/
run;
1 ACCEPTED SOLUTION

Accepted Solutions
udo_sas
SAS Employee

Hello -

If I understand your question correctly you are mostly wondering about the effect of "lead 10", correct?

 

proc expand data=random out=want;
    id date;
    convert x = x_movSumWRONG /transformout = (lead 10 reverse movesum 10 reverse); /*suggest via Tom*/
run;

 

Your data (random) looks like this - leaving out some data in the middle:

date x
1-Jan-16 114.68
2-Jan-16 83.68
3-Jan-16 111.5
4-Jan-16 86.82
5-Jan-16 103.46
6-Jan-16 92.85
7-Jan-16 88.22
8-Jan-16 98.48
9-Jan-16 99.5
10-Jan-16 132.27
11-Jan-16 87.05
12-Jan-16 99.98
13-Jan-16 75.69
20-Dec-16 93.67
21-Dec-16 114.5
22-Dec-16 94.1
23-Dec-16 96.01
24-Dec-16 104.5
25-Dec-16 104.69
26-Dec-16 116.58
27-Dec-16 98.41
28-Dec-16 103.41
29-Dec-16 110.42
30-Dec-16 90.44
31-Dec-16 88.62

 

  • Lead 10 will tell EXPAND to forget about the first 10 observations (obs) - so your time series now starts at 87.05 - you will introduce missing values are the end of your series (unless you would add SETMISS 0, which would append 0s instead).
  • Reverse will tell EXPAND to reverse the order, the missing values you introduced will be your new first obs, 87.05 your new last obs.
  • Movesum 10 will create the backward moving sum of 10 obs.
  • Finally Reserve will put the series into its original order again.

 

If you decompose your "wrong" statement into:

proc expand data=random out=want;
    id date;
    convert x = x_lead10 /transformout = (lead 10);
    convert x = x_reverse /transformout = (lead 10 reverse);
    convert x = x_movsum10 /transformout = (lead 10 reverse movsum 10);
    convert x = x_reverseII /transformout = (lead 10 reverse movsum 10 reverse);
run;

 

you will see these columns in WANT.

snapshot.JPG

In my mind this is the expected behaviour - it is just not the same what you'd like to archive with:

 convert x= x_movSum /transformout = (movsum 10 trimleft 9);

 

Hope this makes sense,

Udo

View solution in original post

2 REPLIES 2
udo_sas
SAS Employee

Hello -

If I understand your question correctly you are mostly wondering about the effect of "lead 10", correct?

 

proc expand data=random out=want;
    id date;
    convert x = x_movSumWRONG /transformout = (lead 10 reverse movesum 10 reverse); /*suggest via Tom*/
run;

 

Your data (random) looks like this - leaving out some data in the middle:

date x
1-Jan-16 114.68
2-Jan-16 83.68
3-Jan-16 111.5
4-Jan-16 86.82
5-Jan-16 103.46
6-Jan-16 92.85
7-Jan-16 88.22
8-Jan-16 98.48
9-Jan-16 99.5
10-Jan-16 132.27
11-Jan-16 87.05
12-Jan-16 99.98
13-Jan-16 75.69
20-Dec-16 93.67
21-Dec-16 114.5
22-Dec-16 94.1
23-Dec-16 96.01
24-Dec-16 104.5
25-Dec-16 104.69
26-Dec-16 116.58
27-Dec-16 98.41
28-Dec-16 103.41
29-Dec-16 110.42
30-Dec-16 90.44
31-Dec-16 88.62

 

  • Lead 10 will tell EXPAND to forget about the first 10 observations (obs) - so your time series now starts at 87.05 - you will introduce missing values are the end of your series (unless you would add SETMISS 0, which would append 0s instead).
  • Reverse will tell EXPAND to reverse the order, the missing values you introduced will be your new first obs, 87.05 your new last obs.
  • Movesum 10 will create the backward moving sum of 10 obs.
  • Finally Reserve will put the series into its original order again.

 

If you decompose your "wrong" statement into:

proc expand data=random out=want;
    id date;
    convert x = x_lead10 /transformout = (lead 10);
    convert x = x_reverse /transformout = (lead 10 reverse);
    convert x = x_movsum10 /transformout = (lead 10 reverse movsum 10);
    convert x = x_reverseII /transformout = (lead 10 reverse movsum 10 reverse);
run;

 

you will see these columns in WANT.

snapshot.JPG

In my mind this is the expected behaviour - it is just not the same what you'd like to archive with:

 convert x= x_movSum /transformout = (movsum 10 trimleft 9);

 

Hope this makes sense,

Udo

Reeza
Super User

Makes sense, thanks Udo 🙂

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

Discussion stats
  • 2 replies
  • 2593 views
  • 0 likes
  • 2 in conversation