I have data consists ID, V1, and V2 for around 11000 observations. The sample format is given below for six observations.
From this data, I wanna create V3, V4, and V5 as follows:
V3 is the inverse order of V2. Example for ID=1, V2= (2,1) then V3=(1,2).
V4 Is the cumulative greater type of V1 (in order of V3). Example for ID=1, V1=(1, 7) then V4=(7,8).
V5 is always equal to 0 if V3=1, else V5 is equal to lag1 of V4.
The expected output that I did manually is given below:
This gives you v3 and v4.
Regarding v5. You say you want the lag of v4. Though your desired results suggest you want the lead of v4?
data have;
input ID v1 v2;
datalines;
1 1 1
1 7 2
2 16 1
2 3 2
3 11 1
4 13 1
4 3 2
5 4 1
5 3 2
5 2 3
5 2 4
6 2 1
6 4 2
6 3 3
;
data want(drop = c n);
do _N_ = 1 by 1 until (last.ID);
set have;
by ID;
if first.ID then c = 0;
array vv1{999} _temporary_;
array vv2{999} _temporary_;
c + v1;
vv1 [_N_] = c;
vv2 [_N_] = v2;
end;
n = _N_;
v4 = c;
do _N_ = 1 to _N_;
set have;
v3 = vv2[1 + n - _N_];
v4 = v4 - ifn(_N_ > 1, lag(v1), 0);
output;
end;
call missing(of vv1[*], of vv2[*]);
run;
Result:
ID v1 v2 v4 v3 1 1 1 8 2 1 7 2 7 1 2 16 1 19 2 2 3 2 3 1 3 11 1 11 1 4 13 1 16 2 4 3 2 3 1 5 4 1 11 4 5 3 2 7 3 5 2 3 4 2 5 2 4 2 1 6 2 1 9 3 6 4 2 7 2 6 3 3 3 1
This gives you v3 and v4.
Regarding v5. You say you want the lag of v4. Though your desired results suggest you want the lead of v4?
data have;
input ID v1 v2;
datalines;
1 1 1
1 7 2
2 16 1
2 3 2
3 11 1
4 13 1
4 3 2
5 4 1
5 3 2
5 2 3
5 2 4
6 2 1
6 4 2
6 3 3
;
data want(drop = c n);
do _N_ = 1 by 1 until (last.ID);
set have;
by ID;
if first.ID then c = 0;
array vv1{999} _temporary_;
array vv2{999} _temporary_;
c + v1;
vv1 [_N_] = c;
vv2 [_N_] = v2;
end;
n = _N_;
v4 = c;
do _N_ = 1 to _N_;
set have;
v3 = vv2[1 + n - _N_];
v4 = v4 - ifn(_N_ > 1, lag(v1), 0);
output;
end;
call missing(of vv1[*], of vv2[*]);
run;
Result:
ID v1 v2 v4 v3 1 1 1 8 2 1 7 2 7 1 2 16 1 19 2 2 3 2 3 1 3 11 1 11 1 4 13 1 16 2 4 3 2 3 1 5 4 1 11 4 5 3 2 7 3 5 2 3 4 2 5 2 4 2 1 6 2 1 9 3 6 4 2 7 2 6 3 3 3 1
Ok. Try this then
data have;
input ID v1 v2;
datalines;
1 1 1
1 7 2
2 16 1
2 3 2
3 11 1
4 13 1
4 3 2
5 4 1
5 3 2
5 2 3
5 2 4
6 2 1
6 4 2
6 3 3
;
data want(drop = c n);
do _N_ = 1 by 1 until (last.ID);
set have;
by ID;
if first.ID then c = 0;
array vv1{999} _temporary_;
array vv2{999} _temporary_;
c + v1;
vv1 [_N_] = c;
vv2 [_N_] = v2;
end;
n = _N_;
v4 = c;
do _N_ = 1 to _N_;
set have;
v3 = vv2[1 + n - _N_];
v4 = v4 - ifn(_N_ > 1, lag(v1), 0);
v5 = (v4 - v1) * (v3 ne 1);
output;
end;
call missing(of vv1[*], of vv2[*]);
run;
Result:
Obs ID v1 v2 v4 v3 v5 1 1 1 1 8 2 7 2 1 7 2 7 1 0 3 2 16 1 19 2 3 4 2 3 2 3 1 0 5 3 11 1 11 1 0 6 4 13 1 16 2 3 7 4 3 2 3 1 0 8 5 4 1 11 4 7 9 5 3 2 7 3 4 10 5 2 3 4 2 2 11 5 2 4 2 1 0 12 6 2 1 9 3 7 13 6 4 2 7 2 3 14 6 3 3 3 1 0
Anytime 🙂
data have; input ID v1 v2; datalines; 1 1 1 1 7 2 2 16 1 2 3 2 3 11 1 4 13 1 4 3 2 5 4 1 5 3 2 5 2 3 5 2 4 6 2 1 6 4 2 6 3 3 ; data temp; set have; by id; if first.id then n=0; n+1; run; proc sort data=temp;by id descending n;run; data temp1(keep=v3) temp2(keep=id n v4); set temp; by id; v3=v2; if first.id then v4=0; v4+v1; run; proc sort data=temp2;by id n;run; proc sort data=temp2 out=temp3;by id descending n;run; data temp3; set temp3; by id; v5=lag(v4); if first.id then v5=.; run; proc sort data=temp3;by id n;run; data want; merge have temp1 temp2(keep=v4) temp3(keep=v5); if v3=1 then v5=0; run;
Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.
Register today!ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.