Can anyone tell me what the variable i does in this do loop? Does it count the times the loop has run horizontally rather than vertically, and line feed after it gets to i? I have attached the output as well.
/***************************************/
data temp1;
input x y;
*if x=. then x=0;
datalines;
3 .
. 4
. .
2 6
;
run;
/***************************************/
proc print data=temp1;
title 'temp1';
run;
/***************************************/
data temp2;
set temp1;
array z[2] x y;
do i = 1 to 2;
if z[i]=. then z[i]=0;
end;
run;
/***************************************/
proc print data=temp2;
title 'temp2';
run;
/***************************************/
quit;
This is my output:
Obs    x    y    i
  1    3    0    3
  2    0    4    3
  3    0    0    3
  4    2    6    3
In other words, why does i = 3 for every iteration of the do loop in this array?
i is the index variable of your do loop. The loop effectively executes the line
if z[i]=. then z[i]=0;
twice, once with i=1 and once with i=2. Here z[1] is the first element of the array z (the variable x) and z[2] is the second element (the variable y).
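To make the "twice" concrete, here is a minimal sketch of the same step with the loop written out by hand. The dataset name temp2_unrolled is only illustrative and does not appear in the original program; temp1 is assumed to exist as created in the question.

```sas
/* Unrolled equivalent of the do loop: one statement per array element */
data temp2_unrolled;      /* illustrative dataset name, not from the thread */
  set temp1;
  if x = . then x = 0;    /* what the loop does for i=1, since z[1] is x */
  if y = . then y = 0;    /* what the loop does for i=2, since z[2] is y */
run;
```

Because no index variable is involved here, the resulting dataset should contain only x and y.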
To answer your question: why does i = 3 for every iteration of the do loop in this array?
It does not equal 3 for every iteration of the do loop. After the line above has been executed for i=2, the do loop increments i to 3 and checks whether 3 is still within the loop bounds. It is not, so execution continues with the statement after the end statement. That is why i=3 appears in every record written to temp2.
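If the index variable is not wanted in the output at all, one common option is a drop statement. The following is a minimal sketch, assuming temp1 from the question; the dataset name temp2_noindex is made up for illustration.

```sas
data temp2_noindex;       /* illustrative dataset name, not from the thread */
  set temp1;
  array z[2] x y;
  do i = 1 to 2;
    if z[i] = . then z[i] = 0;
  end;
  drop i;                 /* keep the loop index out of the output dataset */
run;
```

The loop behaves exactly as before; i still ends at 3 in each data step iteration, it is just not written to the dataset.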
A useful exercise to see the development of i is to put an output statement inside the do loop and force an output statement at the bottom of the data step (faking the implicit output statement), like this:
data temp2;
set temp1;
array z[2] x y;
do i = 1 to 2;
if z[i]=. then z[i]=0;
output;
end;
output; /* Explicit stand-in for the implicit output statement */
run;
proc print data=temp2;
title 'temp2';
run;
You are mixing data step iterations and do loop iterations in your mind.
Do loop iterations are controlled by the corresponding do and end statements, while data step iterations go from the data statement to the next step boundary (usually the run statement) and are controlled by the input into the data step (in your case, the number of observations in the dataset named in the set statement).
The do loop runs its course for every data step iteration, and since it ends when the iteration variable goes past the end value of the do statement, you get the 3 every time the do loop finishes. The implicit output at the end of the data step iteration then writes that value to the new dataset.
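To tell the two kinds of iteration apart, a small sketch like the following may help. It assumes temp1 from the question and writes only to the log (data _null_ creates no dataset). The automatic variable _n_ counts data step iterations, while i is the do loop index.

```sas
data _null_;
  set temp1;
  do i = 1 to 2;
    /* runs twice per observation read from temp1 */
    put 'Data step iteration: ' _n_= 'do loop iteration: ' i=;
  end;
  /* runs once per observation, after the loop has finished */
  put 'After the loop: ' _n_= i=;
run;
```

With the four observations in temp1, the log should show _n_ advancing from 1 to 4, with i taking the values 1 and 2 inside the loop and 3 after it in each data step iteration.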
I suggest you take the time and work through the documentation for how a data step works, which can be found here.
Hint: a very nice tool to see what's going on is the put statement, which allows you to write values to the log. Putting one inside the do loop and one after will nicely show the development of values.
You will learn the most, the fastest, if you construct your own program to provide some insight. For this case, you could try:
data temp2;
set temp1;
array z{2} x y;
do i = 1 to 2;
if z{i} = . then z{i} = 0;
put 'Inside the loop: ' i=;
end;
put 'Outside the loop: ' i=;
run;