Solved
Super Contributor
Posts: 506

I have the following data.

data have;
input name_ID var1;
datalines;

1 1
2 3
3 6
4 2
5 4
6 5
7 2
8 1
9 5
10 8
;

I try to replicate the Excel calculation of a new variable following the below rule:
skip _n_<4
_n_= 4 then var2=sum var1 from record1 to record 4 or var1+lag1+lag2+lag3

_n_>4 then var2=var1+ lag(var2)-lag(var2)/4

The last step is something really new to me.

I play around with what I know but it only calculate 1 record and then turn missing value.
what make it difficult for me is that the value of var2 in record say 10 is still affected by the very first record having var2 value.

Many thanks if you can help me with it.

HHC

P.S.

The output file should look like that

1 1

2 3

3 6

4 2 12

5 4 13

6 5 14.75

7 2 13.0625

8 1 10.796875

9 5 13.09765625

10 8 17.82324219

Accepted Solutions
Solution
‎09-09-2014 12:39 AM
Posts: 4,736

data have;

input name_ID var1;

datalines;

1 1

2 3

3 6

4 2

5 4

6 5

7 2

8 1

9 5

10 8

;

run;

data want;

set have;

if _n_>4 then _var2=sum(var1,.75*_var2);

else _var2+var1;

if _n_>=4 then var2=_var2;

run;

All Replies
Super User
Posts: 23,771

EDITED:

data have;

input name_ID var1;

datalines;

1 1

2 3

3 6

4 2

5 4

6 5

7 2

8 1

9 5

10 8

;

run;

data want;

set have;

retain cum_sum 0 var2;

var3=var2;

cum_sum=cum_sum+var1;

if _n_ <4 then var2=.;

else if _n_=4 then var2=cum_sum;

else if _n_>4 then var2=var1+ var3 - var3/4;

run;

Solution
‎09-09-2014 12:39 AM
Posts: 4,736

data have;

input name_ID var1;

datalines;

1 1

2 3

3 6

4 2

5 4

6 5

7 2

8 1

9 5

10 8

;

run;

data want;

set have;

if _n_>4 then _var2=sum(var1,.75*_var2);

else _var2+var1;

if _n_>=4 then var2=_var2;

run;

Super User
Posts: 8,120

You are probably trying to call the LAG() functions conditionally.  That will not work because you need to call them on every observations so that it can build up the stack of values it uses to return the lagged value.

data have ;

input id var1 expected ;

cards;

1 1 .

2 3 .

3 6 .

4 2 12

5 4 13

6 5 14.75

7 2 13.0625

8 1 10.796875

9 5 13.09765625

10 8 17.82324219

run;

data want ;

set have ;

lag1=lag1(var1);

lag2=lag2(var1);

lag3=lag3(var1);

retain var2;

if _n_=4 then var2=sum(of var1 lag1-lag3);

if _n_>4 then var2=sum(var1,.75*var2);

put (id var1 var2 expected) (;

run;

1 1 . .

2 3 . .

3 6 . .

4 2 12 12

5 4 13 13

6 5 14.75 14.75

7 2 13.0625 13.0625

8 1 10.796875 10.796875

9 5 13.09765625 13.09765625

10 8 17.823242188 17.82324219

Super Contributor
Posts: 506

I appreciate that you 3 experts stop by to help me.

I can understand Tom's code.

However, I have difficulty understand Patrick code. It's interesting to see how you handle it.

I have 2 questions:

How SAS process this line?

else _var2+var1;

I guess for _n_=1-4, it will add var1 value up.

when I replace it with _var2=_var2+var1; it doesn't work.

Also, why suddenly, this below code works even though there is no indicator of such LAG?

_var2=sum(var1,.75*_var2);

Thank you so much.

HHC

Posts: 3,215