Dear all,
I have the following data.
I would like to linearly interpolate vertically the 0 values for each variable V1, V2, V3 etc. except the initial obs starting from id 1 until they reach a number different than zero.
For example for V1 id 1 & 2, I would like to replace 0's with the first non-zero obs = 170,463
for V2 id 1 , I would like to replace 0 with the first non-zero obs = 587,061 etc.
For the rest of obs I would like to get the interpolated values between two consecutive non-zero values.
Finally if after the last non-zero observation only 0's follow, the interpolation ends leaving remaining zeroes unchanged.
I would very much appreciate your help.
Best regards
Nik
Id | V1 | V2 | V3 | V4 | V5 | V6 | V7 |
1 | 0 | 0 | 313,297 | 0 | 0 | 191,998 | 0 |
2 | 0 | 587,061 | 0 | 0 | 574,356 | 0 | 0 |
3 | 170,463 | 0 | 0 | 157,126 | 0 | 0 | 453,651 |
4 | 0 | 0 | 178,991 | 0 | 0 | 375,225 | 0 |
5 | 0 | 49,953 | 0 | 0 | 354,196 | 0 | 0 |
6 | 73,251 | 0 | 0 | 196,670 | 0 | 0 | 365,625 |
7 | 0 | 0 | 142,764 | 0 | 0 | 397,749 | 0 |
8 | 0 | 43,863 | 0 | 0 | 104,785 | 0 | 0 |
9 | 72,576 | 0 | 0 | 208,359 | 0 | 0 | 440,703 |
10 | 0 | 0 | 139,264 | 0 | 0 | 275,076 | 0 |
11 | 0 | 32,148 | 0 | 0 | 99,595 | 0 | 0 |
12 | 69,456 | 0 | 0 | 185,833 | 0 | 0 | 256,350 |
13 | 0 | 0 | 120,520 | 0 | 0 | 275,076 | 0 |
14 | 0 | 204,020 | 0 | 0 | 49,595 | 0 | 0 |
15 | 322,905 | 0 | 0 | 156,963 | 0 | 0 | 251,390 |
16 | 0 | 0 | 20,520 | 0 | 0 | 275,076 | 0 |
17 | 0 | 204,020 | 0 | 0 | 49,595 | 0 | 0 |
18 | 213,405 | 0 | 0 | 156,963 | 0 | 0 | 236,590 |
19 | 0 | 0 | 20,520 | 0 | 0 | 172,976 | 0 |
20 | 0 | 204,020 | 0 | 0 | 49,595 | 0 | 0 |
21 | 213,405 | 0 | 0 | 142,172 | 0 | 0 | 416,590 |
22 | 0 | 0 | 20,520 | 0 | 0 | 172,976 | 0 |
23 | 0 | 293,820 | 0 | 0 | 49,595 | 0 | 0 |
24 | 213,405 | 0 | 0 | 142,172 | 0 | 0 | 100,000 |
25 | 0 | 0 | 20,520 | 0 | 0 | 150,896 | 0 |
26 | 0 | 18,820 | 0 | 0 | 10,000 | 0 | 0 |
27 | 213,405 | 0 | 0 | 90,200 | 0 | 0 | 100,000 |
28 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
29 | 0 | 1,980 | 0 | 0 | 0 | 0 | 0 |
30 | 241,500 | 0 | 0 | 75,000 | 0 | 0 | 200,000 |
31 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
32 | 0 | 1,980 | 0 | 0 | 0 | 0 | 0 |
33 | 241,500 | 0 | 0 | 75,000 | 0 | 0 | 200,000 |
34 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
35 | 0 | 1,980 | 0 | 0 | 0 | 0 | 0 |
36 | 241,500 | 0 | 0 | 75,000 | 0 | 0 | 200,000 |
37 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
38 | 0 | 1,980 | 0 | 0 | 0 | 0 | 0 |
39 | 241,500 | 0 | 0 | 75,000 | 0 | 0 | 200,000 |
40 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
41 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
42 | 107,000 | 0 | 0 | 75,000 | 0 | 0 | 200,000 |
43 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
44 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
45 | 107,000 | 0 | 0 | 75,000 | 0 | 0 | 200,000 |
46 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
47 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
48 | 107,000 | 0 | 0 | 0 | 0 | 0 | 200,000 |
Nikos: The following produces the same results as Data_null_'s and KSharp's code:
proc transpose data=have out=wide;
var v1-v7;
run;
data need (drop=value stopit i j last start finish first inc);
set wide;
array v_array(*) col:;
do i=1 to dim(v_array);
if v_array(i) gt 0 then do;
first=i;
value=v_array(i);
leave;
end;
end;
if first gt 1 then do i=1 to first-1;
v_array(i)=value;
end;
last=0;
do i=dim(v_array) to 1 by -1;
if v_array(i) gt 0 then do;
last=i;
leave;
end;
end;
do i=1 to last-1;
if v_array(i) gt 0 and v_array(i+1) eq 0 then do;
start=i;
finish=start;
stopit=0;
do until(stopit);
finish+1;
if finish eq last or v_array(finish) gt 0 then do;
inc=(v_array(finish)-v_array(start))/(finish-start);
do j=start+1 to finish-1;
v_array(j)=v_array(j-1)+inc;
end;
stopit=1;
end;
end;
end;
end;
run;
proc transpose data=need out=need2;
var col:;
id _name_;
run;
data want;
set have (keep=id);
set need2 (drop=_name_);
run;
Is it worth exploring solutions based on PROC EXPAND? Do you have access to SAS/ETS?
PG
Unfortunately, I do not have access to SAS/ETS.
Thank you.
Then a custom solution would be (I hope I got it right!) :
proc transpose data=have out=havelist;
by id;
var v:;
run;
proc sort data=havelist out=haveValue; by _name_ id; run;
data havePairs;
set haveValue(where=(col1>0)); by _name_;
idFrom = lag(id); valueFrom = lag(col1);
idTo = id; valueTo = col1;
if first._name_ then call missing(idFrom, valueFrom);
keep _name_ idFrom valueFrom idTo valueTo;
run;
proc sql;
create table wantList as
select
v.id,
v._Name_ as v,
case
when idTo is missing then 0
when idFrom is missing then valueTo
else (valueFrom * (idTo-Id) + valueTo * (Id-idFrom)) / (idTo-idFrom)
end as value format=comma10.1
from
haveValue as v left join
havePairs as p on
v._NAME_ = p._NAME_ and
v.id > idFrom and
v.Id <= idTo
order by id, v;
quit;
proc transpose data=wantList out=want(drop=_:);
by id;
id v;
var value;
run;
PG
I fixed my code above to give the proper interpolated values. - PG
Nikos: Given your example, what would you like the result to look like?
Arthur, to produce a new data set with the new calculated values.
Thank you.
Nikos: I'm not sure what new values you want.
For v1, you currently have:
Obs Id v1
1 1 0
2 2 0
3 3 170463
4 4 0
5 5 0
6 6 73251
7 7 0
8 8 0
9 9 72576
10 10 0
11 11 0
12 12 69456
13 13 0
14 14 0
15 15 322905
16 16 0
17 17 0
18 18 213405
19 19 0
20 20 0
21 21 213405
22 22 0
23 23 0
24 24 213405
25 25 0
26 26 0
27 27 213405
28 28 0
29 29 0
30 30 241500
31 31 0
32 32 0
33 33 241500
34 34 0
35 35 0
36 36 241500
37 37 0
38 38 0
39 39 241500
40 40 0
41 41 0
42 42 107000
43 43 0
44 44 0
45 45 107000
46 46 0
47 47 0
48 48 107000
What do you want for each of those 48 values?
Arthur, please see V1new and V2 new
Thank you
Id | V1 | V1new | V2 | V2new | ||
1 | 0 | 170,463 | 0 | 587,061 | ||
2 | 0 | 170,463 | 587,061 | 587,061 | ||
3 | 170,463 | 170,463 | 0 | interpolated value | ||
4 | 0 | interpolated value | 0 | interpolated value | ||
5 | 0 | interpolated value | 49,953 | 49,953 | ||
6 | 73,251 | 73,251 | 0 | interpolated value | ||
7 | 0 | interpolated value | 0 | interpolated value | ||
8 | 0 | interpolated value | 43,863 | 43,863 | ||
9 | 72,576 | 72,576 | 0 | interpolated value | ||
10 | 0 | interpolated value | 0 | interpolated value | ||
11 | 0 | interpolated value | 32,148 | 32,148 | ||
12 | 69,456 | 69,456 | 0 | interpolated value | ||
13 | 0 | interpolated value | 0 | interpolated value | ||
14 | 0 | interpolated value | 204,020 | 204,020 | ||
15 | 322,905 | 322,905 | 0 | interpolated value | ||
16 | 0 | interpolated value | 0 | interpolated value | ||
17 | 0 | interpolated value | 204,020 | 204,020 | ||
18 | 213,405 | 213,405 | 0 | interpolated value | ||
19 | 0 | interpolated value | 0 | interpolated value | ||
20 | 0 | interpolated value | 204,020 | 204,020 | ||
21 | 213,405 | 213,405 | 0 | interpolated value | ||
22 | 0 | interpolated value | 0 | interpolated value | ||
23 | 0 | interpolated value | 293,820 | 293,820 | ||
24 | 213,405 | 213,405 | 0 | interpolated value | ||
25 | 0 | interpolated value | 0 | interpolated value | ||
26 | 0 | interpolated value | 18,820 | 18,820 | ||
27 | 213,405 | 213,405 | 0 | interpolated value | ||
28 | 0 | interpolated value | 0 | interpolated value | ||
29 | 0 | interpolated value | 1,980 | 1,980 | ||
30 | 241,500 | 241,500 | 0 | interpolated value | ||
31 | 0 | interpolated value | 0 | interpolated value | ||
32 | 0 | interpolated value | 1,980 | 1,980 | ||
33 | 241,500 | 241,500 | 0 | interpolated value | ||
34 | 0 | interpolated value | 0 | interpolated value | ||
35 | 0 | interpolated value | 1,980 | 1,980 | ||
36 | 241,500 | 241,500 | 0 | interpolated value | ||
37 | 0 | interpolated value | 0 | interpolated value | ||
38 | 0 | interpolated value | 1,980 | 1,980 | ||
39 | 241,500 | 241,500 | 0 | 0 | ||
40 | 0 | interpolated value | 0 | 0 | ||
41 | 0 | interpolated value | 0 | 0 | ||
42 | 107,000 | 107,000 | 0 | 0 | ||
43 | 0 | interpolated value | 0 | 0 | ||
44 | 0 | interpolated value | 0 | 0 | ||
45 | 107,000 | 107,000 | 0 | 0 | ||
46 | 0 | interpolated value | 0 | 0 | ||
47 | 0 | interpolated value | 0 | 0 | ||
48 | 107,000 | 107,000 | 0 | 0 |
Message was edited by: data _null_
Take V2 as an example.
data have; input Id (V1 V2 V3 V4 V5 V6 V7) (: comma20.) ; cards; 1 0 0 313,297 0 0 191,998 0 2 0 587,061 0 0 574,356 0 0 3 170,463 0 0 157,126 0 0 453,651 4 0 0 178,991 0 0 375,225 0 5 0 49,953 0 0 354,196 0 0 6 73,251 0 0 196,670 0 0 365,625 7 0 0 142,764 0 0 397,749 0 8 0 43,863 0 0 104,785 0 0 9 72,576 0 0 208,359 0 0 440,703 10 0 0 139,264 0 0 275,076 0 11 0 32,148 0 0 99,595 0 0 12 69,456 0 0 185,833 0 0 256,350 13 0 0 120,520 0 0 275,076 0 14 0 204,020 0 0 49,595 0 0 15 322,905 0 0 156,963 0 0 251,390 16 0 0 20,520 0 0 275,076 0 17 0 204,020 0 0 49,595 0 0 18 213,405 0 0 156,963 0 0 236,590 19 0 0 20,520 0 0 172,976 0 20 0 204,020 0 0 49,595 0 0 21 213,405 0 0 142,172 0 0 416,590 22 0 0 20,520 0 0 172,976 0 23 0 293,820 0 0 49,595 0 0 24 213,405 0 0 142,172 0 0 100,000 25 0 0 20,520 0 0 150,896 0 26 0 18,820 0 0 10,000 0 0 27 213,405 0 0 90,200 0 0 100,000 28 0 0 0 0 0 104,800 0 29 0 1,980 0 0 0 0 0 30 241,500 0 0 75,000 0 0 200,000 31 0 0 0 0 0 104,800 0 32 0 1,980 0 0 0 0 0 33 241,500 0 0 75,000 0 0 200,000 34 0 0 0 0 0 104,800 0 35 0 1,980 0 0 0 0 0 36 241,500 0 0 75,000 0 0 200,000 37 0 0 0 0 0 104,800 0 38 0 1,980 0 0 0 0 0 39 241,500 0 0 75,000 0 0 200,000 40 0 0 0 0 0 104,800 0 41 0 0 0 0 0 0 0 42 107,000 0 0 75,000 0 0 200,000 43 0 0 0 0 0 104,800 0 44 0 0 0 0 0 0 0 45 107,000 0 0 75,000 0 0 200,000 46 0 0 0 0 0 104,800 0 47 0 0 0 0 0 0 0 48 107,000 0 0 0 0 0 200,000 ; run; data temp; retain found; do until(v2 ne 0 or end ) ; set have end=end; end; _v=v2; do until(v2 ne 0 or last ) ; set have end=last; new_v2=ifn(found,v2,_v); if new_v2 ne 0 or (new_v2 eq 0 and _v eq 0 ) then output; end; found=1; keep id new_v2; run; data want; merge temp temp(firstobs=2 rename=(id=_id new_v2=_new_v2)); output; inter=divide((_new_v2-new_v2),(_id-id)) ; if missing(_id) then _id=0; do i=id+1 to _id-1; id=i;new_v2=new_v2+inter;output; end; keep id new_v2; run;
Xia Keshan
Message was edited by: xia keshan
Message was edited by: xia keshan
Dear Xia,
I need to apply it across all variables not anyone in particular (V2 was an example how I wish all variables to end up V1-V7)
Thank you
(Unfortunately proc iml is not included in my modules)
You can make a maro and go through all of these variables.
data have;
input Id (V1 V2 V3 V4 V5 V6 V7) (: comma20.) ;
cards;
1 0 0 313,297 0 0 191,998 0
2 0 587,061 0 0 574,356 0 0
3 170,463 0 0 157,126 0 0 453,651
4 0 0 178,991 0 0 375,225 0
5 0 49,953 0 0 354,196 0 0
6 73,251 0 0 196,670 0 0 365,625
7 0 0 142,764 0 0 397,749 0
8 0 43,863 0 0 104,785 0 0
9 72,576 0 0 208,359 0 0 440,703
10 0 0 139,264 0 0 275,076 0
11 0 32,148 0 0 99,595 0 0
12 69,456 0 0 185,833 0 0 256,350
13 0 0 120,520 0 0 275,076 0
14 0 204,020 0 0 49,595 0 0
15 322,905 0 0 156,963 0 0 251,390
16 0 0 20,520 0 0 275,076 0
17 0 204,020 0 0 49,595 0 0
18 213,405 0 0 156,963 0 0 236,590
19 0 0 20,520 0 0 172,976 0
20 0 204,020 0 0 49,595 0 0
21 213,405 0 0 142,172 0 0 416,590
22 0 0 20,520 0 0 172,976 0
23 0 293,820 0 0 49,595 0 0
24 213,405 0 0 142,172 0 0 100,000
25 0 0 20,520 0 0 150,896 0
26 0 18,820 0 0 10,000 0 0
27 213,405 0 0 90,200 0 0 100,000
28 0 0 0 0 0 104,800 0
29 0 1,980 0 0 0 0 0
30 241,500 0 0 75,000 0 0 200,000
31 0 0 0 0 0 104,800 0
32 0 1,980 0 0 0 0 0
33 241,500 0 0 75,000 0 0 200,000
34 0 0 0 0 0 104,800 0
35 0 1,980 0 0 0 0 0
36 241,500 0 0 75,000 0 0 200,000
37 0 0 0 0 0 104,800 0
38 0 1,980 0 0 0 0 0
39 241,500 0 0 75,000 0 0 200,000
40 0 0 0 0 0 104,800 0
41 0 0 0 0 0 0 0
42 107,000 0 0 75,000 0 0 200,000
43 0 0 0 0 0 104,800 0
44 0 0 0 0 0 0 0
45 107,000 0 0 75,000 0 0 200,000
46 0 0 0 0 0 104,800 0
47 0 0 0 0 0 0 0
48 107,000 0 0 0 0 0 200,000
;
run;
%macro inter(vname=);
data temp ;
retain found;
do until(&vname ne 0 or end ) ;
set have end=end;
end;
_v=&vname ;
do until(&vname ne 0 or last ) ;
set have end=last;
new_&vname =ifn(found,&vname ,_v);
if new_&vname ne 0 or (new_&vname eq 0 and _v eq 0) then output;
end;
found=1;
keep id new_&vname ;
run;
data want_&vname ;
merge temp temp(firstobs=2 rename=(id=_id new_&vname =_new_&vname ));
output;
inter=divide((_new_&vname - new_&vname),(_id-id)) ;
if missing(_id) then _id=0;
do i=id+1 to _id-1;
id=i;new_&vname =new_&vname +inter;output;
end;
keep id new_&vname ;
run;
%mend inter;
%inter(vname=v1)
%inter(vname=v2)
%inter(vname=v3)
%inter(vname=v4)
%inter(vname=v5)
%inter(vname=v6)
%inter(vname=v7)
data want;
merge want_:;
by id;
run;
Xia Keshan
Message was edited by: xia keshan
Nikos: The following produces the same results as Data_null_'s and KSharp's code:
proc transpose data=have out=wide;
var v1-v7;
run;
data need (drop=value stopit i j last start finish first inc);
set wide;
array v_array(*) col:;
do i=1 to dim(v_array);
if v_array(i) gt 0 then do;
first=i;
value=v_array(i);
leave;
end;
end;
if first gt 1 then do i=1 to first-1;
v_array(i)=value;
end;
last=0;
do i=dim(v_array) to 1 by -1;
if v_array(i) gt 0 then do;
last=i;
leave;
end;
end;
do i=1 to last-1;
if v_array(i) gt 0 and v_array(i+1) eq 0 then do;
start=i;
finish=start;
stopit=0;
do until(stopit);
finish+1;
if finish eq last or v_array(finish) gt 0 then do;
inc=(v_array(finish)-v_array(start))/(finish-start);
do j=start+1 to finish-1;
v_array(j)=v_array(j-1)+inc;
end;
stopit=1;
end;
end;
end;
end;
run;
proc transpose data=need out=need2;
var col:;
id _name_;
run;
data want;
set have (keep=id);
set need2 (drop=_name_);
run;
Without SAS/ETS you can use PROC TRANSREG:
%macro foo;
data v_foo/view=v_foo;
set foo;
array v[7];
do _n_=1 to 7;
if v[_n_]=0 then call missing(v[_n_]);
end;
run;
proc means data=v_foo;
var v1-v7;
output out=nknots n= /autoname;
run;
data _null_;
set nknots;
array v[7] v1_n--v7_n;
do i=1 to 7;
call symputx(cats('nknots', i), v-2);
end;
stop;
run;
data foospline;
set foo;
call missing(of pv1-pv7);
run;
%do i=1 %to 7;
proc transreg data=v_foo;
model identity(v&i.)=spline(id / degree=1 nknots=&&nknots&i..);
output out=fooLin&i.(keep=id pv&i.) predicted;
run;
data foospline;
update foospline foolin&i.;
by id;
run;
%end;
%mend;
%foo
%macro foo;
data v_foo/view=v_foo;
set foo;
array v[7];
do _n_=1 to 7;
if v[_n_]=0 then call missing(v[_n_]);
end;
run;
proc means data=v_foo;
var v1-v7;
output out=nknots n= /autoname;
run;
data _null_;
set nknots;
array v[7] v1_n--v7_n;
do i=1 to 7;
call symputx(cats('nknots', i), v-2);
end;
stop;
run;
data foospline;
set foo;
call missing(of pv1-pv7);
run;
%do i=1 %to 7;
proc transreg data=v_foo;
model identity(v&i.)=spline(id / degree=1 nknots=&&nknots&i..);
output out=fooLin&i.(keep=id pv&i.) predicted;
run;
data foospline;
update foospline foolin&i.;
by id;
run;
%end;
%mend;
%foo
With SAS/ETS:
data foo;
set foo;
array v[7];
do _n_=1 to 7;
if v[_n_]=0 then call missing(v[_n_]);
end;
run;
*remove comments to forward copy over beginning missing values in series;
/*
data a;
set foo;
a=1;
run;
proc sort data=a; by descending id; run;
data a(drop=a);
update a(obs=0) a;
by a;
run;
data foo;
update foo a;
by id;
run;
*/
%macro foo;
proc expand data=foo out=fooLin;
%do i=1 %to 7;
convert v&i.=pv&i. /method=join; *or could be convert v1-v7 /method=join --without macro, difference is no copy of input values;
%end;
run;
%mend;
%foo
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.