09-08-2014 11:24 PM
I have the following data.
data have;
input name_ID var1;
datalines;
1 1
2 3
3 6
4 2
5 4
6 5
7 2
8 1
9 5
10 8
;
I try to replicate the Excel calculation of a new variable following the below rule:
skip _n_<4
_n_= 4 then var2=sum var1 from record1 to record 4 or var1+lag1+lag2+lag3
_n_>4 then var2=var1+ lag(var2)-lag(var2)/4
The last step is something really new to me.
I play around with what I know but it only calculate 1 record and then turn missing value.
what make it difficult for me is that the value of var2 in record say 10 is still affected by the very first record having var2 value.
Many thanks if you can help me with it.
HHC
P.S.
The output file should look like that
1 1
2 3
3 6
4 2 12
5 4 13
6 5 14.75
7 2 13.0625
8 1 10.796875
9 5 13.09765625
10 8 17.82324219
09-09-2014 12:39 AM
data have;
input name_ID var1;
datalines;
1 1
2 3
3 6
4 2
5 4
6 5
7 2
8 1
9 5
10 8
;
run;
data want;
set have;
if _n_>4 then _var2=sum(var1,.75*_var2);
else _var2+var1;
if _n_>=4 then var2=_var2;
run;
09-08-2014 11:45 PM
EDITED:
data have;
input name_ID var1;
datalines;
1 1
2 3
3 6
4 2
5 4
6 5
7 2
8 1
9 5
10 8
;
run;
data want;
set have;
retain cum_sum 0 var2;
var3=var2;
cum_sum=cum_sum+var1;
if _n_ <4 then var2=.;
else if _n_=4 then var2=cum_sum;
else if _n_>4 then var2=var1+ var3 - var3/4;
run;
09-09-2014 12:39 AM
data have;
input name_ID var1;
datalines;
1 1
2 3
3 6
4 2
5 4
6 5
7 2
8 1
9 5
10 8
;
run;
data want;
set have;
if _n_>4 then _var2=sum(var1,.75*_var2);
else _var2+var1;
if _n_>=4 then var2=_var2;
run;
09-09-2014 08:04 AM
You are probably trying to call the LAG() functions conditionally. That will not work because you need to call them on every observations so that it can build up the stack of values it uses to return the lagged value.
data have ;
input id var1 expected ;
cards;
1 1 .
2 3 .
3 6 .
4 2 12
5 4 13
6 5 14.75
7 2 13.0625
8 1 10.796875
9 5 13.09765625
10 8 17.82324219
run;
data want ;
set have ;
lag1=lag1(var1);
lag2=lag2(var1);
lag3=lag3(var1);
retain var2;
if _n_=4 then var2=sum(of var1 lag1-lag3);
if _n_>4 then var2=sum(var1,.75*var2);
put (id var1 var2 expected) (;
run;
1 1 . .
2 3 . .
3 6 . .
4 2 12 12
5 4 13 13
6 5 14.75 14.75
7 2 13.0625 13.0625
8 1 10.796875 10.796875
9 5 13.09765625 13.09765625
10 8 17.823242188 17.82324219
09-09-2014 10:35 AM
I appreciate that you 3 experts stop by to help me.
I can understand Tom's code.
However, I have difficulty understand Patrick code. It's interesting to see how you handle it.
I have 2 questions:
How SAS process this line?
else _var2+var1;
I guess for _n_=1-4, it will add var1 value up.
when I replace it with _var2=_var2+var1; it doesn't work.
Also, why suddenly, this below code works even though there is no indicator of such LAG?
_var2=sum(var1,.75*_var2);
Thank you so much.
HHC
09-09-2014 11:50 AM
That weird line of Patrick is the use of the sum-statement SAS(R) 9.4 Statements: Reference, Third Edition A very smart approach it is including the earlier mentioned retain and a calculation.
Need further help from the community? Please ask a new question.