How does the code below work?
QUESTION
The SAS data set BANKS is listed below:
BANKS
name           rate
FirstCapital   0.0718
DirectBank     0.0721
VirtualDirect  0.0728
The following SAS program is submitted:
data newbank;
do year = 1 to 3;
set banks;
capital + 5000;
end;
run;
Which one of the following represents how many observations and variables will
exist in the SAS data set NEWBANK?
A. 0 observations and 0 variables
B. 1 observation and 4 variables
C. 3 observations and 3 variables
D. 9 observations and 2 variables
Answer: B
The data step you presented involves a coding structure called 'DOW', which has a rich history and broad use, and it is hard to explain in a short post like this without causing more misunderstanding. You will need to read a paper to really understand the ins and outs of it. Yes, the correct answer is B. In a very short (and not very informative) version of the explanation: the code creates two new variables (YEAR and CAPITAL) and outputs only once, after the DO loop has finished. Please read the paper to figure out why.
This little piece of SAS code, despite its small size, touches on several of the most fundamental concepts of the data step, and Paul explains them much better than I can.
Hi @Bulleride,
The paper linked by @Haikuo is great, but it's fairly advanced, I think. Here is a more elementary one:
Hi,
First of all, thanks for a wonderful document (http://www2.sas.com/proceedings/sugi28/099-28.pdf). However, I have not been able to decipher exactly what's happening in the question above. In the PDF, the DO loop is controlled by "last.pt", so all the data for a particular patient is iterated within the loop, and when control returns to the top of the data step we have another patient. In this particular example, however,
I don't think the same thing applies. It IS a DOW, just of a different sort. Could you please help me understand its iterations?
Thanks in advance!
Hi Alisha93,
Reading the old question from Bulleride again (now, after almost 9 months), I would hesitate to call this DO loop a DOW loop, because the typical UNTIL condition is missing here. (Of course, one could add such a condition pro forma: until(year>3) should fit.)
Apparently, the purpose of the code shown is just to test the reader's understanding of how the SAS data step works. It is not suitable for demonstrating the benefits of a DOW loop.
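For contrast, a loop in the more typical DOW form looks roughly like the sketch below. It assumes BANKS is recreated from the listing in the question (reading NAME as a 13-character variable is my assumption, since the listing doesn't show attributes); NEWBANK_DOW and LAST are made-up names. The loop is controlled by an UNTIL condition on the END= flag, so one pass of the data step reads every row and no YEAR counter is needed.

```sas
/* Recreate BANKS from the listing (NAME length 13 is an assumption) */
data banks;
  input name :$13. rate;
  datalines;
FirstCapital 0.0718
DirectBank 0.0721
VirtualDirect 0.0728
;
run;

/* Typical DOW form: the loop is controlled by an UNTIL condition */
/* on the END= flag instead of a fixed 1-to-3 index range.        */
data newbank_dow;
  do until (last);        /* body runs at least once; condition tested at the bottom */
    set banks end=last;   /* LAST is set to 1 when the final row has been read */
    capital + 5000;       /* Sum statement: CAPITAL starts at 0 and is retained */
  end;
  /* implicit OUTPUT at RUN: one row per data step iteration */
run;
```

Run this way, NEWBANK_DOW should again have one observation (NAME=VirtualDirect, CAPITAL=15000), but only three variables, since there is no YEAR index and the END= variable is not written to the output data set.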
You can see more easily how the code works if you insert diagnostic PUT statements:
data newbank;
put 'Before loop: ' _all_;
do year = 1 to 3;
put 'In loop, before SET: ' _all_;
set banks;
put 'In loop, after SET: ' _all_;
capital + 5000;
end;
put 'After loop: ' _all_;
run;
The above data step will write the following lines to the log:
Before loop: year=. name= rate=. capital=0 _ERROR_=0 _N_=1
In loop, before SET: year=1 name= rate=. capital=0 _ERROR_=0 _N_=1
In loop, after SET: year=1 name=FirstCapital rate=0.0718 capital=0 _ERROR_=0 _N_=1
In loop, before SET: year=2 name=FirstCapital rate=0.0718 capital=5000 _ERROR_=0 _N_=1
In loop, after SET: year=2 name=DirectBank rate=0.0721 capital=5000 _ERROR_=0 _N_=1
In loop, before SET: year=3 name=DirectBank rate=0.0721 capital=10000 _ERROR_=0 _N_=1
In loop, after SET: year=3 name=VirtualDirect rate=0.0728 capital=10000 _ERROR_=0 _N_=1
After loop: year=4 name=VirtualDirect rate=0.0728 capital=15000 _ERROR_=0 _N_=1
Before loop: year=. name=VirtualDirect rate=0.0728 capital=15000 _ERROR_=0 _N_=2
In loop, before SET: year=1 name=VirtualDirect rate=0.0728 capital=15000 _ERROR_=0 _N_=2
Within the first iteration of the data step (see the lines with _N_=1 written by the PUT statements), the SET statement is executed three times as part of the body of the DO loop. It reads the first, second and third observation of dataset BANKS, respectively, into the program data vector (PDV), each time overwriting the previous values of NAME and RATE. Thus, after the loop (YEAR=4), only the NAME and RATE values from the last observation are in the PDV, and when the RUN statement is reached, these values, together with YEAR=4 and CAPITAL=15000, are written to dataset NEWBANK by the implicit OUTPUT. (Variable CAPITAL had been initialized to 0 and incremented by 5000 in each iteration of the DO loop by the Sum statement.)
At the beginning of the second iteration of the data step (_N_=2), variable YEAR is reinitialized to missing, whereas the values of NAME, RATE and CAPITAL are retained. (For NAME and RATE this is because they are read with a SET statement; for CAPITAL it's a feature of the Sum statement.)
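To see that CAPITAL's retention really comes from the Sum statement, the sketch below spells out what "capital + 5000;" is documented to be equivalent to: a RETAIN statement with an initial value of 0, plus the SUM function. BANKS is recreated from the listing (NAME length 13 is my assumption) and NEWBANK_RETAIN is a made-up name.

```sas
/* Recreate BANKS from the listing (NAME length 13 is an assumption) */
data banks;
  input name :$13. rate;
  datalines;
FirstCapital 0.0718
DirectBank 0.0721
VirtualDirect 0.0728
;
run;

/* Same logic as the question, with the Sum statement written out */
data newbank_retain;
  retain capital 0;               /* keep CAPITAL across data step iterations, start at 0 */
  do year = 1 to 3;
    set banks;                    /* NAME and RATE are retained because they come from SET */
    capital = sum(capital, 5000); /* SUM() ignores missing values, like the Sum statement */
  end;
run;
```

The result should match NEWBANK from the question (one observation, YEAR=4, CAPITAL=15000). The only visible difference is that CAPITAL becomes the first variable, because the RETAIN statement is the first place the compiler encounters it.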
Now the DO loop starts again with year=1 (as shown by the last line written by the PUT statements). But as soon as the SET statement attempts to read the nonexistent fourth observation of dataset BANKS, the data step terminates. So NEWBANK contains one observation with four variables (year=4 name=VirtualDirect rate=0.0728 capital=15000), resulting from the first iteration of the data step (which in turn included three iterations of the DO loop).
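The termination rule is easier to see if the loop runs fewer times than BANKS has rows. In the hypothetical variation below (loop 1 to 2; NEWBANK_TWO is a made-up name; BANKS is recreated from the listing with an assumed NAME length of 13), the second data step iteration does read VirtualDirect, but the next SET hits end of file before the implicit OUTPUT is reached, so that row never makes it into the output.

```sas
/* Recreate BANKS from the listing (NAME length 13 is an assumption) */
data banks;
  input name :$13. rate;
  datalines;
FirstCapital 0.0718
DirectBank 0.0721
VirtualDirect 0.0728
;
run;

/* Variation: only two SET executions per data step iteration */
data newbank_two;
  do year = 1 to 2;
    set banks;        /* iteration 2: reads VirtualDirect, then stops the step on the next call */
    capital + 5000;
  end;
  put 'Reached implicit OUTPUT: ' _all_;  /* only reached in iteration 1 */
run;
```

The PUT line should appear only once in the log, and NEWBANK_TWO should hold a single observation: year=3 name=DirectBank rate=0.0721 capital=10000.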
A SET statement reads the next row from an input SAS table every single time it executes. You've got set banks; within a DO loop that iterates 3 times, so you are reading all 3 input rows in one pass of the data step.
You don't have an OUTPUT statement within the DO loop, and if a data step contains no explicit OUTPUT statement, SAS executes an implicit one at the very end, just before the RUN. In your case that's outside of the DO loop, and therefore only one row gets written to the target table NEWBANK.
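Conversely, moving an explicit OUTPUT into the loop writes one row per SET. A minimal sketch, assuming BANKS is recreated from the listing (NAME length 13 is my assumption) and using the made-up name NEWBANK_OUT:

```sas
/* Recreate BANKS from the listing (NAME length 13 is an assumption) */
data banks;
  input name :$13. rate;
  datalines;
FirstCapital 0.0718
DirectBank 0.0721
VirtualDirect 0.0728
;
run;

data newbank_out;
  do year = 1 to 3;
    set banks;
    capital + 5000;
    output;   /* explicit OUTPUT: one row per loop iteration; no implicit OUTPUT at RUN */
  end;
run;
```

This should give 3 observations and 4 variables (YEAR 1 to 3, CAPITAL 5000, 10000 and 15000). Because the step now contains an explicit OUTPUT, SAS no longer adds the implicit one at the RUN, so there is no row with YEAR=4.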
As for the number of variables:
Every data step has a compilation phase and an execution phase. In the compilation phase, all variables get created in the PDV (all variables coming from the source table plus all variables you create within the data step). That sums up to 4 variables in your case: NAME and RATE from BANKS, plus YEAR and CAPITAL.
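One way to see that the variables come from compilation rather than execution is to stop the step before anything executes. In this sketch (BANKS recreated from the listing, NAME length 13 assumed; NEWBANK_COMPILE is a made-up name), STOP ends the step immediately, yet the output data set should still have all 4 variables, because the compiler has already processed the DO loop, SET and Sum statements.

```sas
/* Recreate BANKS from the listing (NAME length 13 is an assumption) */
data banks;
  input name :$13. rate;
  datalines;
FirstCapital 0.0718
DirectBank 0.0721
VirtualDirect 0.0728
;
run;

data newbank_compile;
  stop;              /* execution ends here: nothing is read and nothing is output */
  do year = 1 to 3;  /* ...but the compiler still adds YEAR, NAME, RATE, CAPITAL to the PDV */
    set banks;
    capital + 5000;
  end;
run;

/* Reports 0 observations and 4 variables */
proc contents data=newbank_compile;
run;
```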
In the execution phase, if your SET statement weren't in a DO loop, SAS would read a row from the source table and execute all the code in your data step until it reaches the RUN (also executing the implicit OUTPUT, thus writing a row to the output table for every row coming from the input table). Because your SET statement is within a DO loop, SAS completes only one iteration of the data step (the second iteration stops as soon as SET finds no more rows), and that's why you're only getting a single row in the output data set (= row 3 from the input data set plus the modifications).
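For comparison, here is the version without a DO loop that the paragraph above describes, as a sketch (BANKS recreated from the listing, NAME length 13 assumed; NEWBANK_NOLOOP is a made-up name):

```sas
/* Recreate BANKS from the listing (NAME length 13 is an assumption) */
data banks;
  input name :$13. rate;
  datalines;
FirstCapital 0.0718
DirectBank 0.0721
VirtualDirect 0.0728
;
run;

data newbank_noloop;
  set banks;        /* one input row per data step iteration */
  capital + 5000;   /* retained running total: 5000, 10000, 15000 */
run;                /* implicit OUTPUT: one output row per input row */
```

This should produce 3 observations and 3 variables (NAME, RATE and CAPITAL), i.e. the pattern of answer C, since there is no YEAR variable without the loop.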
You can find the detailed documentation of how this works here. It's very worthwhile to spend the time and understand this.
The reasons why some of the variables are automatically retained here can be found in the first two bullet points of section "Redundancy" in the documentation of the RETAIN statement:
Fair enough! Thanks a ton 🙂
In your code sample there is only a single complete iteration through the data step (because the SET statement is inside the DO loop).