BookmarkSubscribeRSS Feed
EpiMoby
Calcite | Level 5

I have a long data set with 6 years of data for every state. There are 7 variables that have missing data. I want to replace data by state and year where if a variable is missing for a year, replace it with the average of the data value for the year directly above and below, else replace it with data from the nearest year. I have been working on this array for a long time and cannot make it work - with this code I am receiving the following error: ERROR: Array subscript out of range at line 2276 column 49.

 

data new;
set old;

 

/* Set missing values to the average of the previous and next years */
array indicators[7] indicator1 indicator2 indicator3 indicator4 indicator5 indicator6 indicator7;

 

/* Handle missing values for the first year separately */
by state year;

 

if first.state then do;
do i = 1 to dim(indicators);
if missing(indicators[i]) and not missing(indicators[i+1]) then
indicators[i] = indicators[i+1];
end;
end;

/* Handle missing values for the last year separately */
if last.state then do;
do i = 1 to dim(indicators);
if missing(indicators[i]) and not missing(indicators[i-1]) then
indicators[i] = indicators[i-1];
end;
end;
/*
/* Handle missing values for the intermediate years */
if not first.state and not last.state then do;
do i = 1 to dim(indicators);
if missing(indicators[i]) then do;
/* Calculate the average of the previous and next years */
if not missing(indicators[i-1]) and not missing(indicators[i+1]) then
indicators[i] = (indicators[i-1] + indicators[i+1]) / 2;
/* Look for the closest non-missing year */
else if not missing(indicators[i-1]) then
indicators[i] = indicators[i-1];
else if not missing(indicators[i+1]) then
indicators[i] = indicators[i+1];
end;
end;
end;

drop i;
run;

 

 

Any help would be GREATLY appreciated!

14 REPLIES 14
PaigeMiller
Diamond | Level 26

Whenever there are errors in the log, please show us the ENTIRE log for the data step (or PROC) that has the error. We don't know what line 2276 is without the log. Please copy the log as text, click on the </> icon and paste it into the window that appears. 

PaigeMiller_0-1663012019648.png

 

It would also help if you show us a portion of the actual data, following these examples and instructions and not via any other method.

--
Paige Miller
ballardw
Super User

Please share which bits of your code you believe are getting the value from the previous year's data? And the following?

 

I am asking because I see nothing in the posted code looking "above" or "below".

 

If you want anything resembling a working solution you likely will need to provide some example data in the form of data step code so that we can actually see what you are working with. Instructions here: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... will show how to turn an existing SAS data set into data step code that can be pasted into a forum code box using the </> icon or attached as text to show exactly what you have and that we can test code against.

 


@EpiMoby wrote:

I have a long data set with 6 years of data for every state. There are 7 variables that have missing data. I want to replace data by state and year where if a variable is missing for a year, replace it with the average of the data value for the year directly above and below, else replace it with data from the nearest year. I have been working on this array for a long time and cannot make it work - with this code I am receiving the following error: ERROR: Array subscript out of range at line 2276 column 49.

 

data new;
set old;

 

/* Set missing values to the average of the previous and next years */
array indicators[7] indicator1 indicator2 indicator3 indicator4 indicator5 indicator6 indicator7;

 

/* Handle missing values for the first year separately */
by state year;

 

if first.state then do;
do i = 1 to dim(indicators);
if missing(indicators[i]) and not missing(indicators[i+1]) then
indicators[i] = indicators[i+1];
end;
end;

/* Handle missing values for the last year separately */
if last.state then do;
do i = 1 to dim(indicators);
if missing(indicators[i]) and not missing(indicators[i-1]) then
indicators[i] = indicators[i-1];
end;
end;
/*
/* Handle missing values for the intermediate years */
if not first.state and not last.state then do;
do i = 1 to dim(indicators);
if missing(indicators[i]) then do;
/* Calculate the average of the previous and next years */
if not missing(indicators[i-1]) and not missing(indicators[i+1]) then
indicators[i] = (indicators[i-1] + indicators[i+1]) / 2;
/* Look for the closest non-missing year */
else if not missing(indicators[i-1]) then
indicators[i] = indicators[i-1];
else if not missing(indicators[i+1]) then
indicators[i] = indicators[i+1];
end;
end;
end;

drop i;
run;

 

 

Any help would be GREATLY appreciated!


 

Reeza
Super User
I don't think arrays are the correct approach here. Arrays are used for working along a single row technically or you can cross rows, but looking forward is difficult.

Assuming you have SAS ETS, using PROC EXPAND with interpolate is the best option. You say the 'nearest' time point, does it matter if it's before or after? What if there's a tie for before/after?

https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/etsug/etsug_expand_gettingstarted04.htm#:~:te....

EpiMoby
Calcite | Level 5
Year State Variable 1 Variable 2Variable 3
201410.2.0.6
20151...
201610.50.6.
201710.50.60.3
201810.6..
201910.3.0.7
20142  2
20152   
20162   
20172   
20182   
20192   
20143   
20153   
20163   
20173   
20183   
20193   

This is essentially the format of my dataset. 

For variable 1, I want the 2015 value replaced with an average of the 2014 and 2016 values. 

For variable 2, I want 2014 and 2015 values replaced with 2016 value and 2018 and 2019 values replaced with 2017 value 

For variable 3, I want 2015 replaced with 2014, 2016 replaced with 2017, and 2018 replaced with an average of 2017 and 2019

Reeza
Super User
Do you have SAS/ETS?
EpiMoby
Calcite | Level 5

No only SAS 9.4

Reeza
Super User
You can check with proc product_status;run;


EpiMoby
Calcite | Level 5

Here was the ouput:

 

3403 proc product_status;run;

For Base SAS Software ...
Custom version information: 9.4_M3
Image version information: 9.04.01M3P062415
For SAS/STAT ...
Custom version information: 14.1
For SAS/GRAPH ...
Custom version information: 9.4_M3
For SAS/ETS ...
Custom version information: 14.1
For SAS/FSP ...
Custom version information: 9.4_M3
For SAS/OR ...
Custom version information: 14.1
For SAS/AF ...
Custom version information: 9.4_M3
For SAS/IML ...
Custom version information: 14.1
For SAS/QC ...
Custom version information: 14.1
For SAS/ASSIST ...
Custom version information: 9.4
Image version information: 9.04.01M0P061913
For SAS/CONNECT ...
Custom version information: 9.4_M3
For SAS/TOOLKIT ...
Custom version information: 9.4
For SAS/GIS ...
Custom version information: 9.4_M3
For SAS/ACCESS to Hadoop ...
Custom version information: 9.43
For SAS/ACCESS to Postgres ...
Custom version information: 9.4_M3
For SAS Integration Technologies ...
Custom version information: 9.4_M3
For High Performance Suite ...
Custom version information: 2.2_M4
For SAS/ACCESS Interface to Oracle ...
Custom version information: 9.4_M3
For SAS/ACCESS Interface to PC Files ...
Custom version information: 9.4_M3
For SAS/ACCESS Interface to ODBC ...
Custom version information: 9.4_M3
For SAS/ACCESS Interface to OLE DB ...
Custom version information: 9.4_M3
For SAS/ACCESS Interface to MySQL ...
Custom version information: 9.4_M3
NOTE: PROCEDURE PRODUCT_STATUS used (Total process time):
real time 0.11 seconds
cpu time 0.07 seconds

 

Reeza
Super User

@EpiMoby wrote:

Here was the ouput:

 

3403 proc product_status;run;

For Base SAS Software ...
Custom version information: 9.4_M3
Image version information: 9.04.01M3P062415
For SAS/STAT ...
Custom version information: 14.1
For SAS/GRAPH ...
Custom version information: 9.4_M3
For SAS/ETS ...
Custom version information: 14.1
For SAS/FSP ...
Custom version information: 9.4_M3
For SAS/OR ...
Custom version information: 14.1
For SAS/AF ...
Custom version information: 9.4_M3
For SAS/IML ...
Custom version information: 14.1
For SAS/QC ...
Custom version information: 14.1
For SAS/ASSIST ...
Custom version information: 9.4
Image version information: 9.04.01M0P061913
For SAS/CONNECT ...
Custom version information: 9.4_M3
For SAS/TOOLKIT ...
Custom version information: 9.4
For SAS/GIS ...
Custom version information: 9.4_M3
For SAS/ACCESS to Hadoop ...
Custom version information: 9.43
For SAS/ACCESS to Postgres ...
Custom version information: 9.4_M3
For SAS Integration Technologies ...
Custom version information: 9.4_M3
For High Performance Suite ...
Custom version information: 2.2_M4
For SAS/ACCESS Interface to Oracle ...
Custom version information: 9.4_M3
For SAS/ACCESS Interface to PC Files ...
Custom version information: 9.4_M3
For SAS/ACCESS Interface to ODBC ...
Custom version information: 9.4_M3
For SAS/ACCESS Interface to OLE DB ...
Custom version information: 9.4_M3
For SAS/ACCESS Interface to MySQL ...
Custom version information: 9.4_M3
NOTE: PROCEDURE PRODUCT_STATUS used (Total process time):
real time 0.11 seconds
cpu time 0.07 seconds

 


You do have SAS ETS licensed. If you have flex in your methodology I'd suggest trying the methods in my initial response. If you're 100% set on methodology, you'll likely need a data step.

EpiMoby
Calcite | Level 5

Okay - I am very open to this and would like to learn.

 

Would you be able to suggest draft code for this?

Reeza
Super User
There is example code in the link, it really is that simple with ETS.

Slightly different example.
https://gist.github.com/statgeek/07a3708dee1225ceb9d4aa75daab2c52
EpiMoby
Calcite | Level 5

Thank you - unfortunately the code isn't that simple to me. Ideally I would like to impute the data based on the following rules:

 

1) if the variable is missing data for a year, such as 2015, then use the average of 2015 and 2016 for the imputed data 

2) if the variable is missing data for a year without data available for the year prior and after, then use the closest year available regardless of whether that is above or below the missing year 

 

Do you have any experience writing something like this for proc expand?

Reeza
Super User

As I mentioned above, if you're fixed on the methodology, then I don't believe ETS will work for you, you will need to write your own rules in a data step. 

 

If your methodology is flexible then ETS is an option. 

 

data have;
infile cards truncover;
input Year   State   Variable1  Variable2  Variable3;
date=mdy(1, 1, year);
format date year4.;
cards;
2014   1   0.2 .   0.6
2015   1   .   .   .
2016   1   0.5 0.6 .
2017   1   0.5 0.6 0.3
2018   1   0.6 .   .
2019   1   0.3 .   0.7
2014   2    .  .    2
2015   2            
2016   2            
2017   2            
2018   2            
2019   2            
2014   3            
2015   3            
2016   3            
2017   3            
2018   3            
2019   3 
;
run;

proc expand data=have out=want from=year extrapolate;
by state;
id date;
convert variable1=var1 variable2=var2 variable3=var3/ method = join ;
run;


title 'Interpolated';
proc print data=want;
run;
           
mkeintz
PROC Star

 

If a var is missing for a given year (say var{2016} is missing), then the mean(var{2015},var{2017}) would generate either

  1. The actual mean if both var{2015} and var{2017} are valid values
    or
  2. The closest value if only one of var{2015} or var{2017} is a valid value.
    or
  3. Otherwise miss.

If you have case 3 (otherwise missing), then take the mean of var{2014} and var{2018}, i.e. expand the range by one year in each direction.

 

Using that principle, then this code could work:

 

%let min_year=2014;
%let max_year=2019;
%let span=%eval(&max_year-&min_year);
%let array_lower_bound=%eval(&min_year-&span);
%let array_upper_bound=%eval(&max_year+&span);
%put _user_;

data have;
infile cards truncover;
input Year   State   a b c;
date=mdy(1, 1, year);
format date year4.;
cards;
2014   1   0.2 .   0.6
2015   1   .   .   .
2016   1   0.5 0.6 .
2017   1   0.5 0.6 0.3
2018   1   0.6 .   .
2019   1   0.3 .   0.7
2014   2    .  .    2
2015   2            
2016   2            
2017   2   . 0.4 .
2018   2   . 0.5 .  
2019   2            
2014   3            
2015   3            
2016   3            
2017   3            
2018   3            
2019   3  0.9 0.8 0.7
run;

data want (drop=v span);
  set have (in=firstpass)   have (in=secondpass) ;
  by state;

  array suspect_var {*}  a b c ;
  array ah{&array_lower_bound:&array_upper_bound,3} _temporary_;  /* Actual history */

  if first.state then call missing(of ah{*});

  if firstpass then do v=1 to dim(suspect_var);  /* Build the actual history array*/
    ah{year,v}=suspect_var{v};
  end;

  if secondpass;

  do v=1 to dim(suspect_var);
    do span=0 to &span while (suspect_var{v}=.);
      if n(ah{year-span,v},ah{year+span,v})>0 then suspect_var{v}=mean(ah{year-span,v},ah{year+span,v});
    end;
  end;
run;

Notice the array lower and upper bounds go beyond the actual earliest and latest years.  All those pre-study and post-study years will simply have missing values for each variable.  This allows a simplistic expansion of the span used to generate means without looking beyond the array bounds.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 14 replies
  • 3204 views
  • 0 likes
  • 5 in conversation