Bulleride
Obsidian | Level 7

How does the below code work?

QUESTION  
The SAS data set BANKS is listed below:
BANKS
name            rate
FirstCapital    0.0718
DirectBank      0.0721
VirtualDirect   0.0728


The following SAS program is submitted:


data newbank;
do year = 1 to 3;
set banks;
capital + 5000;
end;
run;
Which one of the following represents how many observations and variables will
exist in the SAS data set NEWBANK?

A. 0 observations and 0 variables
B. 1 observation and 4 variables
C. 3 observations and 3 variables
D. 9 observations and 2 variables

 

Answer: B

14 REPLIES
Haikuo
Onyx | Level 15

DOW

Bulleride
Obsidian | Level 7
Sorry! I didn't get you.
Haikuo
Onyx | Level 15

The data step you presented involves a coding structure called 'DOW', which has a rich history and broad use, and it is hard to explain in a short post like this without causing more misunderstanding. You will need to read some papers to really understand the ins and outs of it. Yes, the correct answer is B. In a very short version of the explanation (without much detail): the code presented creates two new variables and outputs only once, after the DO loop ends. Please read the paper to figure out why.

Bulleride
Obsidian | Level 7
Oh! Thanks a lot 🙂
Haikuo
Onyx | Level 15

This little piece of SAS code, despite its small size, touches several of the most fundamental concepts of the data step, and Paul explains them much better than I can.

FreelanceReinh
Jade | Level 19

Hi @Bulleride,

 

The paper linked by @Haikuo is great, but it's fairly advanced, I think. Here is a more elementary one:

http://www2.sas.com/proceedings/sugi28/099-28.pdf

Bulleride
Obsidian | Level 7
Thanks @FreelanceReinhard 🙂
Alisha93
Fluorite | Level 6

Hi,

Thanks for a wonderful document (http://www2.sas.com/proceedings/sugi28/099-28.pdf), first of all. However, I have not been able to work out exactly what's happening in the question above. In the PDF, the DO loop is controlled by "last.pt", so all the data for a particular pt is iterated within the loop, and when control returns to the top of the data step we have another patient. In this particular example, however,

The SAS data set BANKS is listed below:
BANKS
name            rate
FirstCapital    0.0718
DirectBank      0.0721
VirtualDirect   0.0728

 

The following SAS program is submitted:

 

data newbank;
do year = 1 to 3;
set banks;
capital + 5000;
end;
run;


I don't think the same thing applies. It IS a DOW, just of a different sort. Could you please help me understand its iterations?

Thanks in advance!

FreelanceReinh
Jade | Level 19

Hi Alisha93,

 

Reading the old question from Bulleride again (now, after almost 9 months), I would hesitate to call this DO loop a DOW loop, because the typical UNTIL condition is missing here. (Of course, one could add such a condition pro forma: until(year>3) should fit.)

 

Apparently, the purpose of the code shown is just to test the reader's understanding of how the SAS data step works. It is not suitable for demonstrating the benefits of a DOW loop.

 

You can see more easily how the code works if you insert diagnostic PUT statements:

data newbank; 
put 'Before loop: ' _all_;
do year = 1 to 3; 
  put 'In loop, before SET: ' _all_;
  set banks; 
  put 'In loop, after SET: ' _all_;
  capital + 5000; 
end;
put 'After loop: ' _all_;
run;

The above data step will write the following lines to the log:

Before loop: year=. name=  rate=. capital=0 _ERROR_=0 _N_=1
In loop, before SET: year=1 name=  rate=. capital=0 _ERROR_=0 _N_=1
In loop, after SET: year=1 name=FirstCapital rate=0.0718 capital=0 _ERROR_=0 _N_=1
In loop, before SET: year=2 name=FirstCapital rate=0.0718 capital=5000 _ERROR_=0 _N_=1
In loop, after SET: year=2 name=DirectBank rate=0.0721 capital=5000 _ERROR_=0 _N_=1
In loop, before SET: year=3 name=DirectBank rate=0.0721 capital=10000 _ERROR_=0 _N_=1
In loop, after SET: year=3 name=VirtualDirect rate=0.0728 capital=10000 _ERROR_=0 _N_=1
After loop: year=4 name=VirtualDirect rate=0.0728 capital=15000 _ERROR_=0 _N_=1
Before loop: year=. name=VirtualDirect rate=0.0728 capital=15000 _ERROR_=0 _N_=2
In loop, before SET: year=1 name=VirtualDirect rate=0.0728 capital=15000 _ERROR_=0 _N_=2

Within the first iteration of the data step (see the lines with _N_=1 written by the PUT statements) the SET statement is executed three times as part of the body of the DO loop. It writes the first, second and third observation, resp., of dataset BANKS to the program data vector (PDV), each time overwriting the previous contents of the PDV. Thus, after the loop (YEAR=4), NAME and RATE only from the last observation are present in the PDV and when the RUN statement is reached, these values, together with YEAR=4 and CAPITAL=15000, are automatically written to dataset NEWBANK. (Variable CAPITAL had been initialized to 0 and incremented by 5000 in each iteration of the DO loop by means of the Sum statement.)

 

At the beginning of the second iteration of the data step (_N_=2) variable YEAR is reinitialized to missing, whereas the values of NAME, RATE and CAPITAL are retained. (For NAME and RATE this is because of the SET statement, for CAPITAL it's a feature of the Sum statement.)

 

Now the DO loop starts again with year=1 (as is shown by the last line written by the PUT statements). But as soon as the SET statement attempts to read the non-existing fourth observation of dataset BANKS, the data step terminates. So, NEWBANK contains one observation with four variables (year=4 name=VirtualDirect rate=0.0728 capital=15000), resulting from the first iteration of the data step (which in turn included three iterations of the DO loop).
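To check this result yourself, here is a minimal sketch. The DATALINES step that recreates BANKS is an assumption based on the listing in the question (in particular the length of 13 for NAME). The second step is the program as posted, and PROC CONTENTS and PROC PRINT are added only to inspect NEWBANK.

```sas
/* Assumed reconstruction of BANKS from the listing in the question. */
data banks;
  length name $ 13;          /* long enough for 'VirtualDirect' */
  input name $ rate;
  datalines;
FirstCapital 0.0718
DirectBank 0.0721
VirtualDirect 0.0728
;
run;

/* The program from the question, unchanged. */
data newbank;
  do year = 1 to 3;
    set banks;               /* reads one observation per loop iteration */
    capital + 5000;          /* Sum statement: starts at 0 and is retained */
  end;
run;                         /* implicit OUTPUT happens here, after the loop */

/* Inspect the number of observations and variables, then the values. */
proc contents data=newbank;
run;

proc print data=newbank;
run;
```

If the explanation above holds, PROC CONTENTS should report 1 observation and 4 variables, and PROC PRINT should show year=4, name=VirtualDirect, rate=0.0728 and capital=15000.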

Patrick
Opal | Level 21

A SET statement reads a new row from an input SAS table every single time it's called. You've got set banks; within a DO loop which iterates 3 times, and thus you are reading 3 input rows in total.

 

You don't have an OUTPUT statement within the do loop and with SAS if there is no explicit output statement in the code then SAS executes one at the very end before the RUN. In your case that's outside of the DO LOOP and therefore only one row gets written to the target table NEWBANK.
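For contrast, here is a small sketch of what happens when an explicit OUTPUT statement is placed inside the loop. The data set name NEWBANK_OUT is hypothetical, and the BANKS step is an assumed reconstruction of the listing in the question.

```sas
/* Assumed reconstruction of BANKS from the listing in the question. */
data banks;
  length name $ 13;
  input name $ rate;
  datalines;
FirstCapital 0.0718
DirectBank 0.0721
VirtualDirect 0.0728
;
run;

/* Same logic as the question, but with an explicit OUTPUT in the loop. */
data newbank_out;            /* hypothetical name, for illustration only */
  do year = 1 to 3;
    set banks;
    capital + 5000;
    output;                  /* writes one row per loop iteration and     */
                             /* switches off the implicit OUTPUT at RUN   */
  end;
run;

proc print data=newbank_out;
run;
```

With the explicit OUTPUT, NEWBANK_OUT should contain 3 observations (year = 1, 2, 3 with capital = 5000, 10000, 15000) and the same 4 variables.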

 

As for the number of variables:

Every single data step has a compilation and an execution phase. In the compilation phase all variables get created in the PDV (= all variables coming from the source table and all variables you create within the data step). This sums up to 4 variables in your case (year, name, rate and capital).

 

In the execution phase, if your SET statement weren't in a DO loop, SAS would read a row from the source table and execute all the code in your data step until it reaches the RUN (also executing the implicit OUTPUT, so writing a row to the output table for every row coming from the input table). Because your SET statement is within a DO loop, SAS completes only one iteration of the data step, and that's why you're only getting a single row in the output data set (= row 3 from the input data set plus the modifications).
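The "SET without a DO loop" case described above can be sketched as follows. The data set name NEWBANK_PLAIN is hypothetical, and the BANKS step is an assumed reconstruction of the listing in the question.

```sas
/* Assumed reconstruction of BANKS from the listing in the question. */
data banks;
  length name $ 13;
  input name $ rate;
  datalines;
FirstCapital 0.0718
DirectBank 0.0721
VirtualDirect 0.0728
;
run;

/* No DO loop: one input row is read per data step iteration. */
data newbank_plain;          /* hypothetical name, for illustration only */
  set banks;
  capital + 5000;            /* retained across iterations */
run;                         /* implicit OUTPUT once per iteration */

proc print data=newbank_plain;
run;
```

Here the data step iterates once per input row, so NEWBANK_PLAIN should have 3 observations and 3 variables (name, rate, capital), with no YEAR variable because no loop index is created.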

 

You can find the detailed documentation of how this works here. It's very worthwhile to spend the time and understand this.

http://support.sas.com/documentation/cdl/en/lrcon/68089/HTML/default/viewer.htm#p08a4x7h9mkwqvn16jg3...

 

 

 

 

Alisha93
Fluorite | Level 6
Thanks for an elaborate explanation. It cleared up quite a few concepts for me, except for the second iteration, where the values of Name and Rate are retained. Aren't all the non-retained variables set to missing in the PDV at the top of the DATA step? Could you also tell me what the exception is here?
FreelanceReinh
Jade | Level 19

The reasons why some of the variables are automatically retained here can be found in the first two bullet points of section "Redundancy" in the documentation of the RETAIN statement:

http://support.sas.com/documentation/cdl/en/syntaxidx/68719/HTML/default/index.htm#/documentation/cd...
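As a rough illustration of the automatic retention discussed in this thread, the sketch below prints the PDV at the top of each data step iteration. The data set name DEMO and the variables TOTAL and FLAG are hypothetical, and the BANKS step is an assumed reconstruction of the listing in the question.

```sas
/* Assumed reconstruction of BANKS from the listing in the question. */
data banks;
  length name $ 13;
  input name $ rate;
  datalines;
FirstCapital 0.0718
DirectBank 0.0721
VirtualDirect 0.0728
;
run;

data demo;                   /* hypothetical name, for illustration only */
  put 'Top of step: ' _all_; /* show the PDV before anything is read */
  set banks;                 /* NAME and RATE come from SET: retained */
  total + 1;                 /* Sum statement variable: retained */
  flag = 1;                  /* plain assignment: not retained */
run;
```

In the log, NAME, RATE and TOTAL should still carry their values from the previous iteration at the top of each step, while FLAG should be back to missing each time.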

Alisha93
Fluorite | Level 6

Fair enough! Thanks a ton 🙂

Patrick
Opal | Level 21

@alisha

In your code sample there is only a single iteration through the data step (because the SET statement is inside the Do Loop).
