Re: array subscript out of range

eliz1 · Posted 05-11-2018 01:36 PM

Hi users,

I am using an array to impute a missing value as the average of the next two non-missing values, in a dataset with five columns, one per year, in year order.

I am getting the error "ERROR: Array subscript out of range at line 107 column 21". My code looks like this:

data oud2;
set oud1;

Array oud oud2011-oud2015;
Array imp imp2011-imp2015;

retain oud;
retain imp;

do i=1 to 5;
if 0< oud(i)>5 then oud{i}=.;
if 0< imp(i)>5 then imp(i)=.;

If missing(oud(i))
and not missing (oud(i+1))
and not missing(oud(i+2)) then do;

imp(i) =(oud(i+1)+oud(i+2))/2;
end;
end;

drop i;
run;

I am looking for insight into how to streamline this process and set the bounds properly - thank you in advance!

PaigeMiller · Posted 05-11-2018 01:43 PM

and not missing (oud(i+1))

When i is 5, this looks for the sixth element of the array, which doesn't exist, so you can't do this.

--
Paige Miller

eliz1 · Posted 05-11-2018 01:51 PM

Thanks! I intended the IF-THEN statement to set the value to missing once i exceeded 5.

PaigeMiller · Posted 05-11-2018 01:57 PM

If you want missings, then the loop can only go to 3 (not 5). Then you won't get the error, and imp(i) (for i=1,2,3) will be assigned a value whenever your condition is met. imp(i) for i=4,5 is never assigned and will be missing.
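
A minimal sketch of the shortened loop described above, assuming the same oud1 layout as in the original post. The two-row test dataset is hypothetical, and the range checks and RETAIN statements from the original code are left out to keep the focus on the loop bound:

```sas
/* Hypothetical test data, not from the thread */
data oud1;
  input oud2011-oud2015;
  datalines;
. 2 4 . 1
1 . . 3 5
;
run;

data oud2;
  set oud1;
  array oud {*} oud2011-oud2015;
  array imp {*} imp2011-imp2015;

  /* Stop two short of the end so oud[i+2] is always a valid subscript */
  do i = 1 to dim(oud) - 2;
    if missing(oud[i])
      and not missing(oud[i+1])
      and not missing(oud[i+2]) then
      imp[i] = (oud[i+1] + oud[i+2]) / 2;
  end;

  drop i;
run;
```

Using dim(oud)-2 instead of a literal 3 keeps the bound correct if more years are added to the array later.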

--
Paige Miller

Kurt_Bremser · Posted 05-11-2018 01:52 PM

The crash actually happens here

and not missing(oud(i+2)) then do;

when i is 4 and you're looking for a non-existent 6th element of your array.

You need to find logic for the end cases of your algorithm. What should your calculation for imp2014 and imp2015 look like?

eliz1 · Posted 05-11-2018 02:09 PM

OK, thanks. I was hoping there was a way to prevent the subscript from getting to 6, but I can set the loop limit to 3 and create a separate loop.

Appreciate your feedback!

Patrick · Posted 05-11-2018 11:47 PM

@eliz1

We can't know what your logic needs to look like, because you haven't really told us what you're after. Based on the code posted, and if it's simply about avoiding the out-of-range condition, adding a simple check could do. Something like: If i+2<=dim(oud)

data oud2;
  set oud1;
  Array oud oud2011-oud2015;
  Array imp imp2011-imp2015;
  retain oud;
  retain imp;

  do i=1 to 5;
    if 0< oud(i)>5 then
      oud{i}=.;

    if 0< imp(i)>5 then
      imp(i)=.;

    If i+2<=dim(oud) 
      and missing(oud(i))
      and not missing (oud(i+1))
      and not missing(oud(i+2)) then
      do;
        imp(i) =(oud(i+1)+oud(i+2))/2;
      end;
  end;

  drop i;
run;

Oh, and this condition in your code does not look right.

if 0< oud(i) >5 then oud{i}=.;

Can you describe in words when oud(i) should get set to missing? Your current logic is the same as: if oud(i)>5 then oud{i}=.;
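
That equivalence can be checked with a short DATA step. This is only a sketch with made-up test values; SAS evaluates a chained comparison such as 0 < x > 5 as (0 < x) and (x > 5), so the first part adds nothing once x > 5 holds:

```sas
data _null_;
  /* Hypothetical test values, including a missing value */
  do x = ., -1, 0, 3, 5, 6;
    chained = (0 < x > 5);  /* condition as written in the original code */
    simple  = (x > 5);      /* the simpler condition it reduces to */
    put x= chained= simple=;
  end;
run;
```

The log should show the same flag in both columns for every value of x.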

eliz1 · Posted 05-12-2018 12:06 AM

Thanks Patrick,

My goal is to impute missing values in a table with five columns, each representing a year. The intent is to substitute either the average of the previous and next years, or of the next two years. For the first and last years, the criterion changes to looking only forward or only backward.

I made it work using do loops with separate code for the first, middle and last years. Your stop idea may be the way to avoid this. I'll give it a try, thank you!
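
A rough sketch of how those three cases could sit in a single loop. The rule coded here is an assumption read from the description above (first year: next two years; last year: previous two years; middle years: previous and next), the test data is hypothetical, and the RETAIN statements and range checks from the original code are left out:

```sas
/* Hypothetical test data, not from the thread */
data oud1;
  input oud2011-oud2015;
  datalines;
. 2 4 3 1
1 . 3 4 .
;
run;

data oud2;
  set oud1;
  array oud {*} oud2011-oud2015;
  array imp {*} imp2011-imp2015;

  do i = 1 to dim(oud);
    /* Pick the two source positions a and b; both always stay in range */
    if i = 1 then do;              /* first year: look forward only */
      a = i + 1;
      b = i + 2;
    end;
    else if i = dim(oud) then do;  /* last year: look backward only */
      a = i - 2;
      b = i - 1;
    end;
    else do;                       /* middle years: previous and next */
      a = i - 1;
      b = i + 1;
    end;

    /* Impute only when the value is missing and both sources are present */
    if missing(oud[i]) and n(oud[a], oud[b]) = 2 then
      imp[i] = (oud[a] + oud[b]) / 2;
  end;

  drop i a b;
run;
```

Because a and b are computed before any array reference, no subscript can fall outside 1 to dim(oud), whatever order SAS evaluates the conditions in.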

Patrick · Posted 05-12-2018 01:23 AM

@eliz1

Just as an idea: if you always select two array elements for your calculation, and it's only about selecting the right ones based on the value of i, then using an informat could make things quite simple.

data have(keep=invar_:);
  array invar_ {5} 8.;
  do obs=1 to 3;
    do val=1 to 5;
      invar_[val]=obs*val;
      if ceil(ranuni(1)*4)=1 then call missing(invar_[val]);
    end;
    output;
  end;
run;

proc format;
  invalue cycle
    0=3
    6=3
  ;
run;
data want;
  set have;
  array invars  {*} invar_:;
  array outvars {*} 8. outvar_1 - outvar_5;
  do i=1 to dim(invars);
    outvars[i]=mean(invars[input(i-1,cycle.)],invars[input(i+1,cycle.)]);
  end;
run;

Patrick · Posted 05-12-2018 01:34 AM

@eliz1

Or actually: you can use the same variable multiple times in an array definition, so you could just add the variables you need for the first and the last iteration of the do loop to the array. Something like below:

data have(keep=invar_:);
  array invar_ {5} 8.;
  do obs=1 to 3;
    do val=1 to 5;
      invar_[val]=obs*val;
      if ceil(ranuni(1)*4)=1 then call missing(invar_[val]);
    end;
    output;
  end;
run;


data want;
  set have;
  array invars  {0:6} invar_3 invar_: invar_3;
  array outvars {*} 8. outvar_1 - outvar_5;
  do i=1 to dim(invars)-2;
    outvars[i]=mean(invars[i-1],invars[i+1]);
  end;
run;
