BookmarkSubscribeRSS Feed
eliz1
Calcite | Level 5

Hi users,

 

I am using an array to impute the average  value of the next two non-missing variables in a 5 column sorted by year dataset.

 

I am getting the error "ERROR: Array subscript out of range at line 107 column 21" .  My code looks like this:

 

data oud2;
set oud1;

Array oud oud2011-oud2015;
Array imp imp2011-imp2015;

retain oud;
retain imp;

do i=1 to 5;
if 0< oud(i)>5 then oud{i}=.;
if 0< imp(i)>5 then imp(i)=.;

If missing(oud(i))
and not missing (oud(i+1))
and not missing(oud(i+2)) then do;

imp(i) =(oud(i+1)+oud(i+2))/2;
end;
end;

drop i;
run;

 

I am looking for insight into how to streamlilne this process and set the bounds properly - thank you in advance! 

9 REPLIES 9
PaigeMiller
Diamond | Level 26
and not missing (oud(i+1))

When i is 5, this looks for the sixth element of the array, which doesn't exist. so you can't do this.

--
Paige Miller
eliz1
Calcite | Level 5

Thanks! intended the if then statement to set the value to missing once i exceeded 5. 

PaigeMiller
Diamond | Level 26

If you want missings, then the loop can only go 3 (not 5) and then you won't get the error, and imp(i) (for i=1,2,3) will be assigned a value. imp(i) for i=4,5 doesn't exist and will be missing.

--
Paige Miller
Kurt_Bremser
Super User

The crash actually happens here

and not missing(oud(i+2)) then do;

when i is 4 and you're looking for a non-existent 6th element of your array.

You need to find logic for the end-cases of your algorithm. How should your calculation for imp2014 and imp2015 look like?

eliz1
Calcite | Level 5

ok, thanks. Was hoping there was a way to prevent the i from getting to 6 but can set it to 3 and create separate loop.

 

Appreciate your feedback!

Patrick
Opal | Level 21

@eliz1

We can't know how your logic must look like to do what you're after and you haven't really told us. Based on the code posted and if it's simply about avoiding the out of range condition, adding a simple check could do. Something like: If i+2<=dim(oud)

 

data oud2;
  set oud1;
  Array oud oud2011-oud2015;
  Array imp imp2011-imp2015;
  retain oud;
  retain imp;

  do i=1 to 5;
    if 0< oud(i)>5 then
      oud{i}=.;

    if 0< imp(i)>5 then
      imp(i)=.;

    If i+2<=dim(oud) 
      and missing(oud(i))
      and not missing (oud(i+1))
      and not missing(oud(i+2)) then
      do;
        imp(i) =(oud(i+1)+oud(i+2))/2;
      end;
  end;

  drop i;
run;

 

Oh, and this condition in your code looks not right.

if 0< oud(i) >5 the oud{i}=.;

Can you describe in words when oud(I) should get set to missing? Your current logic is the same like: if oud(I)>5;

eliz1
Calcite | Level 5
Thanks Patrick,

MY goal is to impute missing values in a table with five columns each
representing a year - the intent is to substitute either the average of the
previous and last, or of the next two years. in the case of the first and
last year the criteria changes to moving only forward or backward.

I made it work using do loops with separate code for the first, middle and
last years - your stop idea may be the way to avoid this, I'll give it a
try, thank you!
Patrick
Opal | Level 21

@eliz1

Just as an idea: If you always select two array elements for your calculation and it's only about selecting the right ones based on the value of I then using an informat could make things quite simple.

data have(keep=invar_:);
  array invar_ {5} 8.;
  do obs=1 to 3;
    do val=1 to 5;
      invar_[val]=obs*val;
      if ceil(ranuni(1)*4)=1 then call missing(invar_[val]);
    end;
    output;
  end;
run;

proc format;
  invalue cycle
    0=3
    6=3
  ;
run;
data want;
  set have;
  array invars  {*} invar_:;
  array outvars {*} 8. outvar_1 - outvar_5;
  do i=1 to dim(invars);
    outvars[i]=mean(invars[input(i-1,cycle.)],invars[input(i+1,cycle.)]);
  end;
run;
Patrick
Opal | Level 21

@eliz1

Or actually: You can use the same variable multiple times in an array definition and though you could just add the variables you need for the first and the last iteration of the do loop to the array. Something like below:

data have(keep=invar_:);
  array invar_ {5} 8.;
  do obs=1 to 3;
    do val=1 to 5;
      invar_[val]=obs*val;
      if ceil(ranuni(1)*4)=1 then call missing(invar_[val]);
    end;
    output;
  end;
run;


data want;
  set have;
  array invars  {0:6} invar_3 invar_: invar_3;
  array outvars {*} 8. outvar_1 - outvar_5;
  do i=1 to dim(invars)-2;
    outvars[i]=mean(invars[i-1],invars[i+1]);
  end;
run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2011 views
  • 0 likes
  • 4 in conversation