## missing values using surrounding 5 year period

Occasional Contributor
Posts: 5

# missing values using surrounding 5 year period

I am trying to find a way to write a code to an average  of the surrounding 5 year period for a missing data point.

For example, this is how my data looks like:

Years                76          77             78              79          80             81             82              83            84              85             85

 Tennessee Eastern 0 4 0 2 1 0 5 15 5 3 5 Tennessee Middle 1 1 2 3 0 8 5 2 1 10 5 Tennessee Western 2 7 3 5 7 7 4 85 12 28 7 Tennessee 3 12 5 10 8 15 14 102 18 41 17 Texas Eastern 0 1 3 . 3 19 11 8 4 5 3 Texas Northern 6 4 4 7 5 5 15 9 7 2 11 Texas Southern 8 3 6 6 1 0 1 11 12 2 14 Texas Western 4 2 0 . 3 6 8 11 21 8 0 Texas 18 10 13 13 12 30 35 39 44 17 28 Utah 0 0 2 1 . 4 0 5 0 7 2 Vermont 0 0 1 0 . 0 1 0 0 0 0 Virginia Eastern 4 4 1 1 1 13 13 . 3 0 25 Virginia Western 0 1 1 0 0 5 0 3 3 0 0 Virginia 4 5 2 1 1 18 13 3 6 0 25 Washington Eastern 0 0 0 0 0 0 0 0 0 0 0 Washington Western 0 1 0 0 0 5 0 3 3 0 0 Washington 0 1 0 0 0 5 0 3 3 0 0 WV Northern 0 1 0 3 1 0 0 0 2 2 1 WV Southern 2 0 6 3 . 0 3 2 12 6 7 West_Virginia 2 1 6 6 1 0 3 2 14 8 8 Wisconsin Eastern 1 4 2 0 1 2 11 13 10 7 1 Wisconsin Western 0 3 0 1 1 0 0 5 0 1 2 Wisconsin 1 7 2 1 2 2 11 18 10 8 3 Wyoming 0 0 0 0 0 0 0 2 1 0 1

So, in essence, I am trying to find a formula that will calculate the average of the missing data set within a 5 year period. But, the years the data are missing for are different.  Any help is appreciated.

Thanks.

Super Contributor
Posts: 644

## Re: missing values using surrounding 5 year period

Posted in reply to Valentine

There is a problem with your data

Alaska has 7 values, including the missing ones

Solving for Alaska, ignoring the extra value, would need to use the remaining 4 non-null values; is that what you want?

I suspect your dataset has more years than shown.  Please update your question.

Richard

Occasional Contributor
Posts: 5

## Re: missing values using surrounding 5 year period

Posted in reply to RichardinOz

Just updated the data set. thanks.

Occasional Contributor
Posts: 5

## Re: missing values using surrounding 5 year period

Posted in reply to RichardinOz

The dataset ranges from 1976-2012. Thanks

Posts: 5,535

## Re: missing values using surrounding 5 year period

Posted in reply to Valentine

If you have only that small number of years, you can simply do :

proc sql;

create table want as

select

state,

coalesce(_90, mean(_91,_92,_93,_94)) as _90,

coalesce(_91, mean(_90,_92,_93,_94)) as _91,

coalesce(_92, mean(_90,_91,_93,_94)) as _92,

coalesce(_93, mean(_91,_92,_94,_95)) as _93,

coalesce(_94, mean(_92,_93,_95,_96)) as _94,

coalesce(_95, mean(_92,_93,_94,_96)) as _95,

coalesce(_96, mean(_92,_93,_94,_95)) as _96

from have;

quit;

PG

PG
Posts: 5,535

## Re: missing values using surrounding 5 year period

Posted in reply to Valentine

If you have more years, it is more practical to use an array:

data want;

set have;

array y{*} _1976-_2012;

do i = 1 to dim(y);

if missing(y{i}) then do;

s = min(max(i-2, 1),dim(y)-4);

y{i} = mean(y{s},y{s+1},y{s+2},y{s+3},y{s+4});

end;

end;

drop i s;

run;

PG

PG
Posts: 5,535

## Re: missing values using surrounding 5 year period

Or you might feel more comfortable with not reusing the previously calculated values :

data want;

set have;

array y{*} _1976-_2012;

array _y{*} __1976-__2012;

do i = 1 to dim(y);

if missing(y{i}) then do;

s = min(max(i-2, 1),dim(y)-4);

_y{i} = mean(y{s},y{s+1},y{s+2},y{s+3},y{s+4});

end;

else _y{i} =y{i};

end;

do i = 1 to dim(y);

y{i}=_y{i};

end;

drop i s __:;

run;

PG

PG
Occasional Contributor
Posts: 5

## Re: missing values using surrounding 5 year period

I will have a look. Thanks.

Super Contributor
Posts: 644

## Re: missing values using surrounding 5 year period

Posted in reply to Valentine

Valentine

I think you still need to think more carefully about defining your problem.  I think PGStats has interpreted your

"surrounding 5 year period" to mean 2 years before and 2 years after a missing value (4 non missing values), except near the start and end of each series where the closest 5 years (including the missing value) are used.  It's a good solution, but is this what you want?  What do you want to do in the case of 2 missing values adjacent or separated by a non missing value, in which case only 3 years would be used to calculate the mean?

Richard

Posts: 5,535

## Re: missing values using surrounding 5 year period

Posted in reply to RichardinOz

Thank you Richard for describing my assumptions so well. - PG

PG
Occasional Contributor
Posts: 5

## Re: missing values using surrounding 5 year period

Posted in reply to RichardinOz

Thanks for the reply PG and Richard. I think I would like to calculate the mean of 7 year period even when 2 periods are missing.I am still not a 100% sure about this yet. And, SAS is not working at my uni right now, so I can't even run the above written program yet. Having said that, I am really start to just learn SAS, so it will take time for me to try this out.

Maybe it is easier to do it manually in excel. I have 47 data  missing altogether.

Discussion stats
• 10 replies
• 364 views
• 1 like
• 3 in conversation