Solved: filling missing values

bkoksal · Posted 02-08-2012 02:59 AM

Dear Stata Users,

I have a data set similar to the one below. For each missing value of y, I want two new variables: the first holding the nearest previous nonmissing value of y, and the second holding the nearest following nonmissing value of y. Ultimately, I want to fill each missing value of y with the average of those two values. Thanks.

Have

t	y
1	54
2	66
3	.
4	87
5	92
6	.
7	.
8	8
9	21
10	53
11	89
12	.
13	.
14	.
15	75
16	80
17	79
18	45
19	94
20	76

Want:

t	y	lag	for	new_y
1	54	.	.	54
2	66	.	.	66
3	.	66	87	76.5
4	87	.	.	87
5	92	.	.	92
6	.	92	8	50
7	.	92	8	50
8	8	.	.	8
9	21	.	.	21
10	53	.	.	53
11	89	.	.	89
12	.	89	75	82
13	.	89	75	82
14	.	89	75	82
15	75	.	.	75
16	80	.	.	80
17	79	.	.	79
18	45	.	.	45
19	94	.	.	94
20	76	.	.	76

data_null__ · Posted 02-08-2012 09:07 AM

In SAS/Graph this is called STEP interpolation both LEFT and RIGHT. Here is how I think it is done with PROC EXPAND. I don't know if PROC EXPAND can be made to transform Y1 and Y2 into a mean. You could do that in a data step.

data Have;
   input t  y @@;
   cards;
1     54 2  66 3  .
4     87 5  92 6  .
7     .  8  8  9  21
10    53 11 89 12 .
13    .  14 .  15 75
16    80 17 79 18 45
19    94 20 76
;;;;
   run;
 
proc expand data=have out=test;
   id t;
   convert y=y1 / method=step;
   convert y=y2 / tin=(reverse) tout=(reverse) method=step;
   run;
proc print;
   run;

Obs	t	y1	y2	y
1	1	54	54	54
2	2	66	66	66
3	3	66	87	.
4	4	87	87	87
5	5	92	92	92
6	6	92	8	.
7	7	92	8	.
8	8	8	8	8
9	9	21	21	21
10	10	53	53	53
11	11	89	89	89
12	12	89	75	.
13	13	89	75	.
14	14	89	75	.
15	15	75	75	75
16	16	80	80	80
17	17	79	79	79
18	18	45	45	45
19	19	94	94	94
20	20	76	76	76
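
The post leaves the averaging to a data step. A minimal sketch of that follow-up, assuming the TEST data set produced by the PROC EXPAND step above (columns t, y, y1, y2); the output name WANT and the variables lag_y, lead_y and new_y are illustrative and not from the original post:

```sas
/* Sketch only: average the left-step (y1) and right-step (y2) values. */
/* Assumes TEST from the PROC EXPAND step above. WANT, lag_y, lead_y   */
/* and new_y are illustrative names.                                   */
data want;
   set test;
   if missing(y) then do;
      lag_y  = y1;             /* previous nonmissing value of y      */
      lead_y = y2;             /* next nonmissing value of y          */
      new_y  = mean(y1, y2);   /* MEAN ignores a missing argument     */
   end;
   else new_y = y;             /* nonmissing rows keep their own y    */
run;
```

With the sample data, t=3 would get (66 + 87) / 2 = 76.5, t=6 and t=7 would get (92 + 8) / 2 = 50, and t=12 to t=14 would get (89 + 75) / 2 = 82, matching the Want table. Because MEAN skips missing arguments, a gap at the very start or end of the series would be filled from the one available side rather than left missing.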

Haikuo · Posted 02-08-2012 08:19 AM

Well, I could not figure out how to do it in one data step. And I am sure someone will come up with more elegant approaches.

data have;
   infile cards;
   input Havet y;
   cards;
1 54
2 66
3 .
4 87
5 92
6 .
7 .
8 8
9 21
10 53
11 89
12 .
13 .
14 .
15 75
16 80
17 79
18 45
19 94
20 76
;

/* Forward pass: on missing rows, lag_y holds the last nonmissing y */
data want1 (drop=_:);
   do _n_=1 by 1 until (eof);
      set have end=eof;
      _lag_y=ifn(y ne ., y,_lag_y);
      lag_y=ifn(y ne ., .,_lag_y);
      output;
   end;
run;

/* Reverse the order so the same logic finds the next nonmissing y */
proc sort data=want1;
   by descending havet;
run;

/* Backward pass: lead_y holds the next nonmissing y; new_y is the fill */
data want2 (drop=_:);
   do _n_=1 by 1 until (eof);
      set want1 end=eof;
      _lead_y=ifn(y ne ., y,_lead_y);
      lead_y=ifn(y ne ., .,_lead_y);
      new_y=ifn(y ne ., y,mean(lag_y,lead_y));
      output;
   end;
run;

/* Restore the original order */
proc sort data=want2;
   by havet;
run;

Regards,

Haikuo
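
On the single data step question: one possible approach is a double DOW loop that reads each run of rows twice through two SET statements. This is a minimal sketch, not a tested production solution. It assumes HAVE is already sorted by the time variable; the output name WANT_ONE and the underscore-prefixed work variables are illustrative.

```sas
/* Sketch only: fill gaps in one data step with a double DOW loop.    */
/* Assumes HAVE is sorted by time. WANT_ONE, _prev, _next, _n and _i  */
/* are illustrative names.                                            */
data want_one (drop=_:);
   retain _prev;                       /* last nonmissing y seen so far */

   /* Loop 1: read ahead to the next nonmissing y (or end of file) */
   do _n = 1 by 1 until (not missing(y) or _eof);
      set have end=_eof;
   end;
   _next = y;                          /* missing if the file ends in a gap */

   /* Loop 2: re-read the same _n rows and fill the missing ones */
   do _i = 1 to _n;
      set have;
      if missing(y) then do;
         lag_y  = _prev;
         lead_y = _next;
         new_y  = mean(_prev, _next);
      end;
      else do;
         lag_y  = .;
         lead_y = .;
         new_y  = y;
      end;
      output;
   end;

   _prev = _next;                      /* carry forward for the next gap */
run;
```

Each SET statement keeps its own read position, so the first loop looks ahead while the second loop lags behind and writes the output. This avoids both sorts at the cost of reading HAVE twice.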

bkoksal · Posted 02-08-2012 09:54 AM

Excellent. Many thanks.
