data d;
do until(age in(16,15,11));
set sashelp.class;
end;
put _ALL_;
run;
Is this a correct method to retrieve the observations with age 16, 15, or 11?
I am practicing DO UNTIL and DO WHILE loops. Is the method above also correct?
I would say yes, it's one possible method.
If I saw this code in production, I would be confused, because the simple approach of using a subsetting IF or a WHERE statement is easier and clearer.
That said, since you said you are practicing, code like this is a GREAT way to practice and really learn SAS. Playing with code like this allows you to develop a deep understanding of how the DATA step works, how the Program Data Vector works, etc. For example, consider how you would write the same step with DO WHILE, and whether DO WHILE is riskier than DO UNTIL in this setting.
Also, the code structure of placing a SET statement inside a DO UNTIL() loop is so useful that it has been given a colloquial name ("DoW-loop") and there are several papers written about why it is a useful coding pattern. A good starting point would be Paul Dorfman's paper: https://support.sas.com/resources/papers/proceedings12/156-2012.pdf .
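For readers who haven't met the pattern, a common textbook form of the DoW-loop processes one BY group per DATA step iteration. The following is a minimal sketch of that idea, not code taken from the paper; the data set names CLASS_SORTED and SEX_TOTALS and the variables N, TOTAL_WEIGHT and MEAN_WEIGHT are hypothetical choices for illustration.

```sas
/* DoW-loop sketch: one DATA step iteration per BY group.           */
/* CLASS_SORTED, SEX_TOTALS, N, TOTAL_WEIGHT, MEAN_WEIGHT are        */
/* hypothetical names chosen for this illustration.                  */
proc sort data=sashelp.class out=class_sorted;
   by sex;
run;

data sex_totals;
   /* These statements run once per group, before the group is read */
   n = 0;
   total_weight = 0;
   do until(last.sex);        /* loop ends after the last record of the group */
      set class_sorted;
      by sex;
      n + 1;
      total_weight + weight;
   end;
   /* The whole group has been read here, so the implicit OUTPUT fires once per group */
   mean_weight = total_weight / n;
   keep sex n total_weight mean_weight;
run;
```

As in the original question, the SET statement sits inside the loop, so a single DATA step iteration consumes several records.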
If you came up with this idea independently of seeing someone else use this approach, then I congratulate you. For me, learning approaches like this (mostly from SAS-L) was what helped move me from simply 'using' the DATA step to programming in the DATA step.
Maxim 4. Run it and look at the results and the log.
Hint: a SET statement in the DATA step means that the data step will run until either the SET tries to read past the last observation, or a STOP statement is executed.
And the pointer of the SET will not be reset when a new data step iteration begins (and the next DO loop starts).
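In the spirit of Maxim 4, one way to watch the SET pointer carry over between iterations is to count how many records each iteration consumes. This is a sketch; the data set name D_TRACE and the variable NREAD are hypothetical additions, not part of the original step.

```sas
/* Trace of the original DO UNTIL step (sketch).                     */
/* D_TRACE and NREAD are hypothetical names added for illustration.  */
data d_trace;
   nread = 0;                    /* reset at the top of every DATA step iteration */
   do until(age in (16,15,11));
      set sashelp.class;         /* continues from where the previous iteration stopped */
      nread + 1;                 /* records consumed in this iteration */
   end;
   put _n_= name= age= nread=;
run;
```

Because the pointer is not reset, the NREAD values printed in the log should add up to the 19 observations that SAS reports as read.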
@Kurt_Bremser wrote (in part):
Hint: a SET statement in the DATA step means that the data step will run until either the SET tries to read past the last observation, or a STOP statement is executed.
Or until the data step iterates and does not execute the SET statement, which is what makes the difference in DO WHILE vs DO UNTIL interesting for this example.
As a quiz question, if you gave a SAS programmer a printout of sashelp.class:
Name     Age
Alfred   14
Alice    13
Barbara  13
Carol    14
Henry    14
James    12
Jane     12
Janet    15
Jeffrey  13
John     12
Joyce    11
Judy     14
Louise   12
Mary     15
Philip   16
Robert   12
Ronald   15
Thomas   11
William  15
And asked them which records are output by the DO UNTIL step vs DO WHILE:
data want1;
do until(age in(16,15,11));
set sashelp.class;
end;
run;
data want2;
do while(age NOT in(16,15,11));
set sashelp.class;
end;
run;
I suspect the success rate would be low. It could be an interesting interview question for an intermediate/advanced SAS programmer. Even if a candidate couldn't answer correctly, it would be insightful to see how they worked through the problem.
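To work through the quiz in the spirit of Maxim 4 rather than guess, one option is to flag whether the SET statement actually executed in each iteration of the DO WHILE step. A sketch; WANT2_TRACE and DIDREAD are hypothetical names added for illustration.

```sas
/* Trace of the DO WHILE step (sketch).                                   */
/* WANT2_TRACE and DIDREAD are hypothetical names added for illustration. */
data want2_trace;
   didread = 0;                         /* assume no read in this iteration */
   do while(age NOT in (16,15,11));     /* condition is tested BEFORE each pass */
      set sashelp.class;
      didread = 1;                      /* the SET statement ran at least once */
   end;
   put _n_= name= age= didread=;
run;
```

Compare the PUT lines and the NOTEs in the log with those from the DO UNTIL step. Keep in mind that variables read by SET are retained, so AGE still holds the previous iteration's value when the WHILE condition is first tested.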
Definitely for _advanced_ SAS programmer. 😉
Bart
Yeah, I would probably get this question wrong. :) But even for the intermediate candidate, the discussion is useful. I had a great boss once, and we talked about how to interview SAS folks. He said that when he interviewed beginner or even intermediate programmers, he didn't care so much whether they got the questions right. He wanted to see whether they got interested / excited when he introduced them to new approaches, and what sort of questions they asked.
When he interviewed me, he asked whether I preferred positional parameters or keyword parameters in my macros, and why. My answer was good enough to show that I knew what a macro parameter was, but it wasn't great. When he explained his preferences and reasoning, I was interested / excited enough to demonstrate that I wanted to learn.
Great approach! Working with such a manager must have been a pleasure.
Bart
P.S. Side note: I can see in my mind's eye how this thread would bloom into a SAS-L discussion 🙂
( @DonH @hashman @data_null__ @mkeintz @RichardDeVen @rogerjdeangelis )
If "correct" means "gives the same result as other (more popular) approaches", then I would say yes, it is a correct method. But I dare say it is quite "nonstandard".
I'd bet $5 that in 90% of cases people would use a subsetting IF or a WHERE statement instead:
data d2;
set sashelp.class;
where age in (16,15,11);
put _ALL_;
run;
data d3;
set sashelp.class;
if age in (16,15,11);
put _ALL_;
run;
The result of your code is "somewhere between" those two classic approaches: in the log you will see that 19 observations were read (as with the subsetting IF), but the maximum value of _N_ is 7 (as with WHERE):
1 data d;
2 do until(age in (16,15,11));
3 set sashelp.class;
4 end;
5 put _ALL_;
6 run;
age=15 Name=Janet Sex=F Height=62.5 Weight=112.5 _ERROR_=0 _N_=1
age=11 Name=Joyce Sex=F Height=51.3 Weight=50.5 _ERROR_=0 _N_=2
age=15 Name=Mary Sex=F Height=66.5 Weight=112 _ERROR_=0 _N_=3
age=16 Name=Philip Sex=M Height=72 Weight=150 _ERROR_=0 _N_=4
age=15 Name=Ronald Sex=M Height=67 Weight=133 _ERROR_=0 _N_=5
age=11 Name=Thomas Sex=M Height=57.5 Weight=85 _ERROR_=0 _N_=6
age=15 Name=William Sex=M Height=66.5 Weight=112 _ERROR_=0 _N_=7
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.D has 7 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
7
8 data d2;
9 set sashelp.class;
10 where age in (16,15,11);
11 put _ALL_;
12 run;
Name=Janet Sex=F Age=15 Height=62.5 Weight=112.5 _ERROR_=0 _N_=1
Name=Joyce Sex=F Age=11 Height=51.3 Weight=50.5 _ERROR_=0 _N_=2
Name=Mary Sex=F Age=15 Height=66.5 Weight=112 _ERROR_=0 _N_=3
Name=Philip Sex=M Age=16 Height=72 Weight=150 _ERROR_=0 _N_=4
Name=Ronald Sex=M Age=15 Height=67 Weight=133 _ERROR_=0 _N_=5
Name=Thomas Sex=M Age=11 Height=57.5 Weight=85 _ERROR_=0 _N_=6
Name=William Sex=M Age=15 Height=66.5 Weight=112 _ERROR_=0 _N_=7
NOTE: There were 7 observations read from the data set SASHELP.CLASS.
WHERE age in (11, 15, 16);
NOTE: The data set WORK.D2 has 7 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
13
14 data d3;
15 set sashelp.class;
16 if age in (16,15,11);
17 put _ALL_;
18 run;
Name=Janet Sex=F Age=15 Height=62.5 Weight=112.5 _ERROR_=0 _N_=8
Name=Joyce Sex=F Age=11 Height=51.3 Weight=50.5 _ERROR_=0 _N_=11
Name=Mary Sex=F Age=15 Height=66.5 Weight=112 _ERROR_=0 _N_=14
Name=Philip Sex=M Age=16 Height=72 Weight=150 _ERROR_=0 _N_=15
Name=Ronald Sex=M Age=15 Height=67 Weight=133 _ERROR_=0 _N_=17
Name=Thomas Sex=M Age=11 Height=57.5 Weight=85 _ERROR_=0 _N_=18
Name=William Sex=M Age=15 Height=66.5 Weight=112 _ERROR_=0 _N_=19
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set WORK.D3 has 7 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
My question/comment: it's an interesting approach, but what are its advantages?
The first thing that came to my mind: with this approach we can calculate the distance between the listed values in the data set (and from the beginning of the data set to the first listed value):
data test;
input x;
cards;
1
2
3
4
5
6
7
8
9
;
run;
data d;
DISTANCE_BETWEEN=0;
do until(x in (2,6,9));
set test;
DISTANCE_BETWEEN + 1;
end;
put _ALL_;
run;
Log:
DISTANCE_BETWEEN=2 x=2 _ERROR_=0 _N_=1
DISTANCE_BETWEEN=4 x=6 _ERROR_=0 _N_=2
DISTANCE_BETWEEN=3 x=9 _ERROR_=0 _N_=3
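One caveat with this pattern: if the data do not end on one of the listed values, the SET statement hits end-of-file inside the loop, the DATA step stops, and the trailing records are never output. A common guard is the SET statement's END= option. The sketch below assumes a hypothetical data set TEST2 (like TEST, but with an extra value 10 appended); D_GUARDED and LAST are likewise names chosen for illustration.

```sas
/* Sketch: TEST2, D_GUARDED and LAST are hypothetical names. */
data test2;
   do x = 1 to 10;          /* like TEST, but 10 is not one of the listed values */
      output;
   end;
run;

data d_guarded;
   DISTANCE_BETWEEN = 0;
   do until(x in (2,6,9) or last);   /* also leave the loop on the final record */
      set test2 end=last;            /* LAST = 1 while reading the final record */
      DISTANCE_BETWEEN + 1;
   end;
   put _ALL_;
run;
```

Without the `or last` condition, the record with x=10 would be read but never written; with it, that record is output as a final, incomplete group.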
What can I say, interesting use of conditional DoW-loop!
Bart