Hi users,
I am using an array to impute a missing value as the average of the next two non-missing variables in a dataset with five columns sorted by year.
I am getting the error "ERROR: Array subscript out of range at line 107 column 21". My code looks like this:
data oud2;
set oud1;
Array oud oud2011-oud2015;
Array imp imp2011-imp2015;
retain oud;
retain imp;
do i=1 to 5;
if 0< oud(i)>5 then oud{i}=.;
if 0< imp(i)>5 then imp(i)=.;
If missing(oud(i))
and not missing (oud(i+1))
and not missing(oud(i+2)) then do;
imp(i) =(oud(i+1)+oud(i+2))/2;
end;
end;
drop i;
run;
I am looking for insight into how to streamline this process and set the bounds properly. Thank you in advance!
and not missing (oud(i+1))
When i is 5, this looks for the sixth element of the array, which doesn't exist, so you can't do this.
Thanks! I intended the IF-THEN statement to set the value to missing once i exceeded 5.
If you want missing values there, then the loop can only go to 3 (not 5). Then you won't get the error, and imp(i) will be assigned a value for i=1,2,3 whenever the condition is met. imp(i) for i=4,5 is never assigned, so it stays missing unless it already holds a value.
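A minimal sketch of that shorter loop, using the dataset and variable names from the question. Writing the upper bound as dim(oud)-2 instead of a hard-coded 3 is my own choice, and I have left out the RETAIN statements and the range checks to keep the example focused on the bounds:

```sas
data oud2;
  set oud1;
  array oud {*} oud2011-oud2015;
  array imp {*} imp2011-imp2015;
  /* Stop two elements early so that oud(i+2) never exceeds dim(oud). */
  do i = 1 to dim(oud) - 2;
    if missing(oud(i))
       and not missing(oud(i+1))
       and not missing(oud(i+2)) then
      imp(i) = (oud(i+1) + oud(i+2)) / 2;
  end;
  drop i;
run;
```

With five elements the loop runs for i=1,2,3, so imp2014 and imp2015 are never touched by it.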
The crash actually happens here
and not missing(oud(i+2)) then do;
when i is 4 and you're looking for a non-existent 6th element of your array.
You need to work out the logic for the end cases of your algorithm. What should your calculation for imp2014 and imp2015 look like?
OK, thanks. I was hoping there was a way to prevent i+2 from getting to 6, but I can set the upper bound to 3 and create a separate loop.
Appreciate your feedback!
We can't know what your logic needs to look like to do what you're after, because you haven't really told us. Based on the code posted, and if it's simply about avoiding the out-of-range condition, adding a simple check could do. Something like: If i+2<=dim(oud)
data oud2;
  set oud1;
  Array oud oud2011-oud2015;
  Array imp imp2011-imp2015;
  retain oud;
  retain imp;
  do i=1 to 5;
    if 0< oud(i)>5 then
      oud{i}=.;
    if 0< imp(i)>5 then
      imp(i)=.;
    If i+2<=dim(oud)
      and missing(oud(i))
      and not missing (oud(i+1))
      and not missing(oud(i+2)) then
      do;
        imp(i) =(oud(i+1)+oud(i+2))/2;
      end;
  end;
  drop i;
run;
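If you would rather not depend on how SAS evaluates the operands of AND, the same guard can be written as a nested IF. This is only a sketch with the names from the question. I have left out the RETAIN statements and the range checks here, and the nesting is my own variation, not something the original code requires:

```sas
data oud2;
  set oud1;
  array oud {*} oud2011-oud2015;
  array imp {*} imp2011-imp2015;
  do i = 1 to dim(oud);
    /* Outer test: only look ahead when elements i+1 and i+2 both exist. */
    if i + 2 <= dim(oud) then do;
      if missing(oud(i))
         and not missing(oud(i+1))
         and not missing(oud(i+2)) then
        imp(i) = (oud(i+1) + oud(i+2)) / 2;
    end;
  end;
  drop i;
run;
```

The inner IF is only reached when i+2 is a valid subscript, so oud(i+2) is never referenced for i=4 or i=5.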
Oh, and this condition in your code doesn't look right.
if 0< oud(i) >5 then oud{i}=.;
Can you describe in words when oud(i) should get set to missing? SAS reads 0< oud(i) >5 as (0< oud(i)) and (oud(i) >5), so your current logic is the same as: if oud(i)>5
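To see this for yourself, here is a small test step that writes the result of each comparison to the log. The variable names x, chained, simple and outside are made up for the demonstration, and the third comparison assumes that what you meant was "outside the range 0 to 5", which is only my guess:

```sas
data _null_;
  do x = -1, 0, 3, 5, 7;
    chained = (0 < x > 5);        /* evaluated as (0 < x) and (x > 5) */
    simple  = (x > 5);            /* the equivalent single comparison */
    outside = not (0 <= x <= 5);  /* assumed intent: value lies outside 0..5 */
    put x= chained= simple= outside=;
  end;
run;
```

The chained and simple flags agree for every x, while the outside flag also catches negative values. If that is the rule you want, the statement in the loop would be along the lines of: if not (0 <= oud(i) <= 5) then oud(i)=.;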
Just as an idea: if you always select two array elements for your calculation, and it's only about selecting the right ones based on the value of i, then using an informat could make things quite simple.
data have(keep=invar_:);
array invar_ {5} 8.;
do obs=1 to 3;
do val=1 to 5;
invar_[val]=obs*val;
if ceil(ranuni(1)*4)=1 then call missing(invar_[val]);
end;
output;
end;
run;
proc format;
invalue cycle
0=3
6=3
;
run;
data want;
set have;
array invars {*} invar_:;
array outvars {*} 8. outvar_1 - outvar_5;
do i=1 to dim(invars);
outvars[i]=mean(invars[input(i-1,cycle.)],invars[input(i+1,cycle.)]);
end;
run;
Or actually: you can use the same variable multiple times in an array definition, so you could just add the variables you need for the first and the last iteration of the do loop to the array. Something like below:
data have(keep=invar_:);
array invar_ {5} 8.;
do obs=1 to 3;
do val=1 to 5;
invar_[val]=obs*val;
if ceil(ranuni(1)*4)=1 then call missing(invar_[val]);
end;
output;
end;
run;
data want;
set have;
array invars {0:6} invar_3 invar_: invar_3;
array outvars {*} 8. outvar_1 - outvar_5;
do i=1 to dim(invars)-2;
outvars[i]=mean(invars[i-1],invars[i+1]);
end;
run;