missing values using surrounding 5 year period

Reply
Occasional Contributor
Posts: 5

missing values using surrounding 5 year period

I am trying to find a way to write a code to an average  of the surrounding 5 year period for a missing data point.

For example, this is how my data looks like:

Years                76          77             78              79          80             81             82              83            84              85             85

Tennessee Eastern040210515535
Tennessee Middle112308521105
Tennessee Western27357748512287
Tennessee31251081514102184117
Texas Eastern013.319118453
Texas Northern6447551597211
Texas Southern83661011112214
Texas Western420.368112180
Texas1810131312303539441728
Utah0021.405072
Vermont0010.010000
Virginia Eastern441111313.3025
Virginia Western01100503300
Virginia45211181336025
Washington Eastern00000000000
Washington Western01000503300
Washington01000503300
WV Northern01031000221
WV Southern2063.0321267
West_Virginia216610321488
Wisconsin Eastern14201211131071
Wisconsin Western03011005012
Wisconsin17212211181083
Wyoming00000002101

So, in essence, I am trying to find a formula that will calculate the average of the missing data set within a 5 year period. But, the years the data are missing for are different.  Any help is appreciated.

Thanks.

Super Contributor
Posts: 644

Re: missing values using surrounding 5 year period

There is a problem with your data

Alaska has 7 values, including the missing ones

Solving for Alaska, ignoring the extra value, would need to use the remaining 4 non-null values; is that what you want?

I suspect your dataset has more years than shown.  Please update your question.

Richard

Occasional Contributor
Posts: 5

Re: missing values using surrounding 5 year period

Just updated the data set. thanks.

Occasional Contributor
Posts: 5

Re: missing values using surrounding 5 year period

The dataset ranges from 1976-2012. Thanks

Respected Advisor
Posts: 4,651

Re: missing values using surrounding 5 year period

If you have only that small number of years, you can simply do :

proc sql;

create table want as

select

     state,

     coalesce(_90, mean(_91,_92,_93,_94)) as _90,

     coalesce(_91, mean(_90,_92,_93,_94)) as _91,

     coalesce(_92, mean(_90,_91,_93,_94)) as _92,

     coalesce(_93, mean(_91,_92,_94,_95)) as _93,

     coalesce(_94, mean(_92,_93,_95,_96)) as _94,

     coalesce(_95, mean(_92,_93,_94,_96)) as _95,

     coalesce(_96, mean(_92,_93,_94,_95)) as _96

from have;

quit;

PG

PG
Respected Advisor
Posts: 4,651

Re: missing values using surrounding 5 year period

If you have more years, it is more practical to use an array:

data want;

set have;

array y{*} _1976-_2012;

do i = 1 to dim(y);

     if missing(y{i}) then do;

          s = min(max(i-2, 1),dim(y)-4);

          y{i} = mean(y{s},y{s+1},y{s+2},y{s+3},y{s+4});

          end;

     end;

drop i s;

run;

PG

PG
Respected Advisor
Posts: 4,651

Re: missing values using surrounding 5 year period

Or you might feel more comfortable with not reusing the previously calculated values :

data want;

set have;

array y{*} _1976-_2012;

array _y{*} __1976-__2012;

do i = 1 to dim(y);

     if missing(y{i}) then do;

          s = min(max(i-2, 1),dim(y)-4);

          _y{i} = mean(y{s},y{s+1},y{s+2},y{s+3},y{s+4});

          end;

     else _y{i} =y{i};

     end;

do i = 1 to dim(y);

     y{i}=_y{i};

     end;

drop i s __:;

run;

PG

PG
Occasional Contributor
Posts: 5

Re: missing values using surrounding 5 year period

I will have a look. Thanks.

Super Contributor
Posts: 644

Re: missing values using surrounding 5 year period

Valentine

I think you still need to think more carefully about defining your problem.  I think PGStats has interpreted your

"surrounding 5 year period" to mean 2 years before and 2 years after a missing value (4 non missing values), except near the start and end of each series where the closest 5 years (including the missing value) are used.  It's a good solution, but is this what you want?  What do you want to do in the case of 2 missing values adjacent or separated by a non missing value, in which case only 3 years would be used to calculate the mean? 


Richard

Respected Advisor
Posts: 4,651

Re: missing values using surrounding 5 year period

Thank you Richard for describing my assumptions so well. - PG

PG
Occasional Contributor
Posts: 5

Re: missing values using surrounding 5 year period

Thanks for the reply PG and Richard. I think I would like to calculate the mean of 7 year period even when 2 periods are missing.I am still not a 100% sure about this yet. And, SAS is not working at my uni right now, so I can't even run the above written program yet. Having said that, I am really start to just learn SAS, so it will take time for me to try this out.

Maybe it is easier to do it manually in excel. I have 47 data  missing altogether.

Ask a Question
Discussion stats
  • 10 replies
  • 342 views
  • 1 like
  • 3 in conversation