Dear all,
I have the following data.
I would like to linearly interpolate vertically the 0 values for each variable V1, V2, V3 etc. except the initial obs starting from id 1 until they reach a number different than zero.
For example for V1 id 1 & 2, I would like to replace 0's with the first non-zero obs = 170,463
for V2 id 1 , I would like to replace 0 with the first non-zero obs = 587,061 etc.
For the rest of obs I would like to get the interpolated values between two consecutive non-zero values.
Finally if after the last non-zero observation only 0's follow, the interpolation ends leaving remaining zeroes unchanged.
I would very much appreciate your help.
Best regards
Nik
Id | V1 | V2 | V3 | V4 | V5 | V6 | V7 |
1 | 0 | 0 | 313,297 | 0 | 0 | 191,998 | 0 |
2 | 0 | 587,061 | 0 | 0 | 574,356 | 0 | 0 |
3 | 170,463 | 0 | 0 | 157,126 | 0 | 0 | 453,651 |
4 | 0 | 0 | 178,991 | 0 | 0 | 375,225 | 0 |
5 | 0 | 49,953 | 0 | 0 | 354,196 | 0 | 0 |
6 | 73,251 | 0 | 0 | 196,670 | 0 | 0 | 365,625 |
7 | 0 | 0 | 142,764 | 0 | 0 | 397,749 | 0 |
8 | 0 | 43,863 | 0 | 0 | 104,785 | 0 | 0 |
9 | 72,576 | 0 | 0 | 208,359 | 0 | 0 | 440,703 |
10 | 0 | 0 | 139,264 | 0 | 0 | 275,076 | 0 |
11 | 0 | 32,148 | 0 | 0 | 99,595 | 0 | 0 |
12 | 69,456 | 0 | 0 | 185,833 | 0 | 0 | 256,350 |
13 | 0 | 0 | 120,520 | 0 | 0 | 275,076 | 0 |
14 | 0 | 204,020 | 0 | 0 | 49,595 | 0 | 0 |
15 | 322,905 | 0 | 0 | 156,963 | 0 | 0 | 251,390 |
16 | 0 | 0 | 20,520 | 0 | 0 | 275,076 | 0 |
17 | 0 | 204,020 | 0 | 0 | 49,595 | 0 | 0 |
18 | 213,405 | 0 | 0 | 156,963 | 0 | 0 | 236,590 |
19 | 0 | 0 | 20,520 | 0 | 0 | 172,976 | 0 |
20 | 0 | 204,020 | 0 | 0 | 49,595 | 0 | 0 |
21 | 213,405 | 0 | 0 | 142,172 | 0 | 0 | 416,590 |
22 | 0 | 0 | 20,520 | 0 | 0 | 172,976 | 0 |
23 | 0 | 293,820 | 0 | 0 | 49,595 | 0 | 0 |
24 | 213,405 | 0 | 0 | 142,172 | 0 | 0 | 100,000 |
25 | 0 | 0 | 20,520 | 0 | 0 | 150,896 | 0 |
26 | 0 | 18,820 | 0 | 0 | 10,000 | 0 | 0 |
27 | 213,405 | 0 | 0 | 90,200 | 0 | 0 | 100,000 |
28 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
29 | 0 | 1,980 | 0 | 0 | 0 | 0 | 0 |
30 | 241,500 | 0 | 0 | 75,000 | 0 | 0 | 200,000 |
31 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
32 | 0 | 1,980 | 0 | 0 | 0 | 0 | 0 |
33 | 241,500 | 0 | 0 | 75,000 | 0 | 0 | 200,000 |
34 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
35 | 0 | 1,980 | 0 | 0 | 0 | 0 | 0 |
36 | 241,500 | 0 | 0 | 75,000 | 0 | 0 | 200,000 |
37 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
38 | 0 | 1,980 | 0 | 0 | 0 | 0 | 0 |
39 | 241,500 | 0 | 0 | 75,000 | 0 | 0 | 200,000 |
40 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
41 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
42 | 107,000 | 0 | 0 | 75,000 | 0 | 0 | 200,000 |
43 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
44 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
45 | 107,000 | 0 | 0 | 75,000 | 0 | 0 | 200,000 |
46 | 0 | 0 | 0 | 0 | 0 | 104,800 | 0 |
47 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
48 | 107,000 | 0 | 0 | 0 | 0 | 0 | 200,000 |
Nikos: The following produces the same results as Data_null_'s and KSharp's code:
proc transpose data=have out=wide;
var v1-v7;
run;
data need (drop=value stopit i j last start finish first inc);
set wide;
array v_array(*) col:;
do i=1 to dim(v_array);
if v_array(i) gt 0 then do;
first=i;
value=v_array(i);
leave;
end;
end;
if first gt 1 then do i=1 to first-1;
v_array(i)=value;
end;
last=0;
do i=dim(v_array) to 1 by -1;
if v_array(i) gt 0 then do;
last=i;
leave;
end;
end;
do i=1 to last-1;
if v_array(i) gt 0 and v_array(i+1) eq 0 then do;
start=i;
finish=start;
stopit=0;
do until(stopit);
finish+1;
if finish eq last or v_array(finish) gt 0 then do;
inc=(v_array(finish)-v_array(start))/(finish-start);
do j=start+1 to finish-1;
v_array(j)=v_array(j-1)+inc;
end;
stopit=1;
end;
end;
end;
end;
run;
proc transpose data=need out=need2;
var col:;
id _name_;
run;
data want;
set have (keep=id);
set need2 (drop=_name_);
run;
Is it worth exploring solutions based on PROC EXPAND? Do you have access to SAS/ETS?
PG
Unfortunately, I do not have access to SAS/ETS.
Thank you.
Then a custom solution would be (I hope I got it right!) :
proc transpose data=have out=havelist;
by id;
var v:;
run;
proc sort data=havelist out=haveValue; by _name_ id; run;
data havePairs;
set haveValue(where=(col1>0)); by _name_;
idFrom = lag(id); valueFrom = lag(col1);
idTo = id; valueTo = col1;
if first._name_ then call missing(idFrom, valueFrom);
keep _name_ idFrom valueFrom idTo valueTo;
run;
proc sql;
create table wantList as
select
v.id,
v._Name_ as v,
case
when idTo is missing then 0
when idFrom is missing then valueTo
else (valueFrom * (idTo-Id) + valueTo * (Id-idFrom)) / (idTo-idFrom)
end as value format=comma10.1
from
haveValue as v left join
havePairs as p on
v._NAME_ = p._NAME_ and
v.id > idFrom and
v.Id <= idTo
order by id, v;
quit;
proc transpose data=wantList out=want(drop=_:);
by id;
id v;
var value;
run;
PG
I fixed my code above to give the proper interpolated values. - PG
Nikos: Given your example, what would you like the result to look like?
Arthur, to produce a new data set with the new calculated values.
Thank you.
Nikos: I'm not sure what new values you want.
For v1, you currently have:
Obs Id v1
1 1 0
2 2 0
3 3 170463
4 4 0
5 5 0
6 6 73251
7 7 0
8 8 0
9 9 72576
10 10 0
11 11 0
12 12 69456
13 13 0
14 14 0
15 15 322905
16 16 0
17 17 0
18 18 213405
19 19 0
20 20 0
21 21 213405
22 22 0
23 23 0
24 24 213405
25 25 0
26 26 0
27 27 213405
28 28 0
29 29 0
30 30 241500
31 31 0
32 32 0
33 33 241500
34 34 0
35 35 0
36 36 241500
37 37 0
38 38 0
39 39 241500
40 40 0
41 41 0
42 42 107000
43 43 0
44 44 0
45 45 107000
46 46 0
47 47 0
48 48 107000
What do you want for each of those 48 values?
Arthur, please see V1new and V2 new
Thank you
Id | V1 | V1new | V2 | V2new | ||
1 | 0 | 170,463 | 0 | 587,061 | ||
2 | 0 | 170,463 | 587,061 | 587,061 | ||
3 | 170,463 | 170,463 | 0 | interpolated value | ||
4 | 0 | interpolated value | 0 | interpolated value | ||
5 | 0 | interpolated value | 49,953 | 49,953 | ||
6 | 73,251 | 73,251 | 0 | interpolated value | ||
7 | 0 | interpolated value | 0 | interpolated value | ||
8 | 0 | interpolated value | 43,863 | 43,863 | ||
9 | 72,576 | 72,576 | 0 | interpolated value | ||
10 | 0 | interpolated value | 0 | interpolated value | ||
11 | 0 | interpolated value | 32,148 | 32,148 | ||
12 | 69,456 | 69,456 | 0 | interpolated value | ||
13 | 0 | interpolated value | 0 | interpolated value | ||
14 | 0 | interpolated value | 204,020 | 204,020 | ||
15 | 322,905 | 322,905 | 0 | interpolated value | ||
16 | 0 | interpolated value | 0 | interpolated value | ||
17 | 0 | interpolated value | 204,020 | 204,020 | ||
18 | 213,405 | 213,405 | 0 | interpolated value | ||
19 | 0 | interpolated value | 0 | interpolated value | ||
20 | 0 | interpolated value | 204,020 | 204,020 | ||
21 | 213,405 | 213,405 | 0 | interpolated value | ||
22 | 0 | interpolated value | 0 | interpolated value | ||
23 | 0 | interpolated value | 293,820 | 293,820 | ||
24 | 213,405 | 213,405 | 0 | interpolated value | ||
25 | 0 | interpolated value | 0 | interpolated value | ||
26 | 0 | interpolated value | 18,820 | 18,820 | ||
27 | 213,405 | 213,405 | 0 | interpolated value | ||
28 | 0 | interpolated value | 0 | interpolated value | ||
29 | 0 | interpolated value | 1,980 | 1,980 | ||
30 | 241,500 | 241,500 | 0 | interpolated value | ||
31 | 0 | interpolated value | 0 | interpolated value | ||
32 | 0 | interpolated value | 1,980 | 1,980 | ||
33 | 241,500 | 241,500 | 0 | interpolated value | ||
34 | 0 | interpolated value | 0 | interpolated value | ||
35 | 0 | interpolated value | 1,980 | 1,980 | ||
36 | 241,500 | 241,500 | 0 | interpolated value | ||
37 | 0 | interpolated value | 0 | interpolated value | ||
38 | 0 | interpolated value | 1,980 | 1,980 | ||
39 | 241,500 | 241,500 | 0 | 0 | ||
40 | 0 | interpolated value | 0 | 0 | ||
41 | 0 | interpolated value | 0 | 0 | ||
42 | 107,000 | 107,000 | 0 | 0 | ||
43 | 0 | interpolated value | 0 | 0 | ||
44 | 0 | interpolated value | 0 | 0 | ||
45 | 107,000 | 107,000 | 0 | 0 | ||
46 | 0 | interpolated value | 0 | 0 | ||
47 | 0 | interpolated value | 0 | 0 | ||
48 | 107,000 | 107,000 | 0 | 0 |
Message was edited by: data _null_
Take V2 as an example.
data have; input Id (V1 V2 V3 V4 V5 V6 V7) (: comma20.) ; cards; 1 0 0 313,297 0 0 191,998 0 2 0 587,061 0 0 574,356 0 0 3 170,463 0 0 157,126 0 0 453,651 4 0 0 178,991 0 0 375,225 0 5 0 49,953 0 0 354,196 0 0 6 73,251 0 0 196,670 0 0 365,625 7 0 0 142,764 0 0 397,749 0 8 0 43,863 0 0 104,785 0 0 9 72,576 0 0 208,359 0 0 440,703 10 0 0 139,264 0 0 275,076 0 11 0 32,148 0 0 99,595 0 0 12 69,456 0 0 185,833 0 0 256,350 13 0 0 120,520 0 0 275,076 0 14 0 204,020 0 0 49,595 0 0 15 322,905 0 0 156,963 0 0 251,390 16 0 0 20,520 0 0 275,076 0 17 0 204,020 0 0 49,595 0 0 18 213,405 0 0 156,963 0 0 236,590 19 0 0 20,520 0 0 172,976 0 20 0 204,020 0 0 49,595 0 0 21 213,405 0 0 142,172 0 0 416,590 22 0 0 20,520 0 0 172,976 0 23 0 293,820 0 0 49,595 0 0 24 213,405 0 0 142,172 0 0 100,000 25 0 0 20,520 0 0 150,896 0 26 0 18,820 0 0 10,000 0 0 27 213,405 0 0 90,200 0 0 100,000 28 0 0 0 0 0 104,800 0 29 0 1,980 0 0 0 0 0 30 241,500 0 0 75,000 0 0 200,000 31 0 0 0 0 0 104,800 0 32 0 1,980 0 0 0 0 0 33 241,500 0 0 75,000 0 0 200,000 34 0 0 0 0 0 104,800 0 35 0 1,980 0 0 0 0 0 36 241,500 0 0 75,000 0 0 200,000 37 0 0 0 0 0 104,800 0 38 0 1,980 0 0 0 0 0 39 241,500 0 0 75,000 0 0 200,000 40 0 0 0 0 0 104,800 0 41 0 0 0 0 0 0 0 42 107,000 0 0 75,000 0 0 200,000 43 0 0 0 0 0 104,800 0 44 0 0 0 0 0 0 0 45 107,000 0 0 75,000 0 0 200,000 46 0 0 0 0 0 104,800 0 47 0 0 0 0 0 0 0 48 107,000 0 0 0 0 0 200,000 ; run; data temp; retain found; do until(v2 ne 0 or end ) ; set have end=end; end; _v=v2; do until(v2 ne 0 or last ) ; set have end=last; new_v2=ifn(found,v2,_v); if new_v2 ne 0 or (new_v2 eq 0 and _v eq 0 ) then output; end; found=1; keep id new_v2; run; data want; merge temp temp(firstobs=2 rename=(id=_id new_v2=_new_v2)); output; inter=divide((_new_v2-new_v2),(_id-id)) ; if missing(_id) then _id=0; do i=id+1 to _id-1; id=i;new_v2=new_v2+inter;output; end; keep id new_v2; run;
Xia Keshan
Message was edited by: xia keshan
Message was edited by: xia keshan
Dear Xia,
I need to apply it across all variables not anyone in particular (V2 was an example how I wish all variables to end up V1-V7)
Thank you
(Unfortunately proc iml is not included in my modules)
You can make a maro and go through all of these variables.
data have;
input Id (V1 V2 V3 V4 V5 V6 V7) (: comma20.) ;
cards;
1 0 0 313,297 0 0 191,998 0
2 0 587,061 0 0 574,356 0 0
3 170,463 0 0 157,126 0 0 453,651
4 0 0 178,991 0 0 375,225 0
5 0 49,953 0 0 354,196 0 0
6 73,251 0 0 196,670 0 0 365,625
7 0 0 142,764 0 0 397,749 0
8 0 43,863 0 0 104,785 0 0
9 72,576 0 0 208,359 0 0 440,703
10 0 0 139,264 0 0 275,076 0
11 0 32,148 0 0 99,595 0 0
12 69,456 0 0 185,833 0 0 256,350
13 0 0 120,520 0 0 275,076 0
14 0 204,020 0 0 49,595 0 0
15 322,905 0 0 156,963 0 0 251,390
16 0 0 20,520 0 0 275,076 0
17 0 204,020 0 0 49,595 0 0
18 213,405 0 0 156,963 0 0 236,590
19 0 0 20,520 0 0 172,976 0
20 0 204,020 0 0 49,595 0 0
21 213,405 0 0 142,172 0 0 416,590
22 0 0 20,520 0 0 172,976 0
23 0 293,820 0 0 49,595 0 0
24 213,405 0 0 142,172 0 0 100,000
25 0 0 20,520 0 0 150,896 0
26 0 18,820 0 0 10,000 0 0
27 213,405 0 0 90,200 0 0 100,000
28 0 0 0 0 0 104,800 0
29 0 1,980 0 0 0 0 0
30 241,500 0 0 75,000 0 0 200,000
31 0 0 0 0 0 104,800 0
32 0 1,980 0 0 0 0 0
33 241,500 0 0 75,000 0 0 200,000
34 0 0 0 0 0 104,800 0
35 0 1,980 0 0 0 0 0
36 241,500 0 0 75,000 0 0 200,000
37 0 0 0 0 0 104,800 0
38 0 1,980 0 0 0 0 0
39 241,500 0 0 75,000 0 0 200,000
40 0 0 0 0 0 104,800 0
41 0 0 0 0 0 0 0
42 107,000 0 0 75,000 0 0 200,000
43 0 0 0 0 0 104,800 0
44 0 0 0 0 0 0 0
45 107,000 0 0 75,000 0 0 200,000
46 0 0 0 0 0 104,800 0
47 0 0 0 0 0 0 0
48 107,000 0 0 0 0 0 200,000
;
run;
%macro inter(vname=);
data temp ;
retain found;
do until(&vname ne 0 or end ) ;
set have end=end;
end;
_v=&vname ;
do until(&vname ne 0 or last ) ;
set have end=last;
new_&vname =ifn(found,&vname ,_v);
if new_&vname ne 0 or (new_&vname eq 0 and _v eq 0) then output;
end;
found=1;
keep id new_&vname ;
run;
data want_&vname ;
merge temp temp(firstobs=2 rename=(id=_id new_&vname =_new_&vname ));
output;
inter=divide((_new_&vname - new_&vname),(_id-id)) ;
if missing(_id) then _id=0;
do i=id+1 to _id-1;
id=i;new_&vname =new_&vname +inter;output;
end;
keep id new_&vname ;
run;
%mend inter;
%inter(vname=v1)
%inter(vname=v2)
%inter(vname=v3)
%inter(vname=v4)
%inter(vname=v5)
%inter(vname=v6)
%inter(vname=v7)
data want;
merge want_:;
by id;
run;
Xia Keshan
Message was edited by: xia keshan
Nikos: The following produces the same results as Data_null_'s and KSharp's code:
proc transpose data=have out=wide;
var v1-v7;
run;
data need (drop=value stopit i j last start finish first inc);
set wide;
array v_array(*) col:;
do i=1 to dim(v_array);
if v_array(i) gt 0 then do;
first=i;
value=v_array(i);
leave;
end;
end;
if first gt 1 then do i=1 to first-1;
v_array(i)=value;
end;
last=0;
do i=dim(v_array) to 1 by -1;
if v_array(i) gt 0 then do;
last=i;
leave;
end;
end;
do i=1 to last-1;
if v_array(i) gt 0 and v_array(i+1) eq 0 then do;
start=i;
finish=start;
stopit=0;
do until(stopit);
finish+1;
if finish eq last or v_array(finish) gt 0 then do;
inc=(v_array(finish)-v_array(start))/(finish-start);
do j=start+1 to finish-1;
v_array(j)=v_array(j-1)+inc;
end;
stopit=1;
end;
end;
end;
end;
run;
proc transpose data=need out=need2;
var col:;
id _name_;
run;
data want;
set have (keep=id);
set need2 (drop=_name_);
run;
Without SAS/ETS you can use PROC TRANSREG:
%macro foo;
data v_foo/view=v_foo;
set foo;
array v[7];
do _n_=1 to 7;
if v[_n_]=0 then call missing(v[_n_]);
end;
run;
proc means data=v_foo;
var v1-v7;
output out=nknots n= /autoname;
run;
data _null_;
set nknots;
array v[7] v1_n--v7_n;
do i=1 to 7;
call symputx(cats('nknots', i), v-2);
end;
stop;
run;
data foospline;
set foo;
call missing(of pv1-pv7);
run;
%do i=1 %to 7;
proc transreg data=v_foo;
model identity(v&i.)=spline(id / degree=1 nknots=&&nknots&i..);
output out=fooLin&i.(keep=id pv&i.) predicted;
run;
data foospline;
update foospline foolin&i.;
by id;
run;
%end;
%mend;
%foo
%macro foo;
data v_foo/view=v_foo;
set foo;
array v[7];
do _n_=1 to 7;
if v[_n_]=0 then call missing(v[_n_]);
end;
run;
proc means data=v_foo;
var v1-v7;
output out=nknots n= /autoname;
run;
data _null_;
set nknots;
array v[7] v1_n--v7_n;
do i=1 to 7;
call symputx(cats('nknots', i), v-2);
end;
stop;
run;
data foospline;
set foo;
call missing(of pv1-pv7);
run;
%do i=1 %to 7;
proc transreg data=v_foo;
model identity(v&i.)=spline(id / degree=1 nknots=&&nknots&i..);
output out=fooLin&i.(keep=id pv&i.) predicted;
run;
data foospline;
update foospline foolin&i.;
by id;
run;
%end;
%mend;
%foo
With SAS/ETS:
data foo;
set foo;
array v[7];
do _n_=1 to 7;
if v[_n_]=0 then call missing(v[_n_]);
end;
run;
*remove comments to forward copy over beginning missing values in series;
/*
data a;
set foo;
a=1;
run;
proc sort data=a; by descending id; run;
data a(drop=a);
update a(obs=0) a;
by a;
run;
data foo;
update foo a;
by id;
run;
*/
%macro foo;
proc expand data=foo out=fooLin;
%do i=1 %to 7;
convert v&i.=pv&i. /method=join; *or could be convert v1-v7 /method=join --without macro, difference is no copy of input values;
%end;
run;
%mend;
%foo
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.