- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I am in a SAS coding workgroup and I am trying to figure out some practical uses of DO-WHILE and DO-UNTIL for demonstrating while presenting on DO loops. These are two functions of the loop that I have never found a use for. We work in a health department, so I would especially be interested in an example applicable to health data if anyone can come up with something. Here is what I have so far: (Data note: the values are all randomly generated numbers I threw together in Excel, these are not reflective of a real program)
DATA sample;
INPUT coach $ patid $ intake lbs_s1-lbs_s10 @@;
DATALINES;
Alex 6914 265 265 266 265 266 264 261 262 264 263 265
Blake 7506 234 236 235 232 233 233 236 236 239 241 243
Cindy 2172 202 199 202 203 200 201 999 200 198 199 202
Alex 4100 191 193 192 191 194 195 193 191 193 195 196
Blake 9348 216 215 216 999 216 213 210 211 209 212 999
Cindy 8537 274 272 272 274 272 271 269 270 999 267 265
Alex 8659 248 245 999 248 247 250 251 254 253 252 999
Blake 1986 232 231 231 230 231 228 231 234 234 237 238
Cindy 6340 287 286 289 286 287 288 291 293 291 999 292
Alex 4590 285 288 291 293 294 295 294 291 289 289 286
Blake 7197 275 276 276 277 276 999 273 272 273 999 999
Cindy 1738 317 318 315 314 314 314 315 317 316 313 314
Alex 8250 280 277 276 276 275 999 274 277 276 274 273
Blake 4230 228 228 226 999 224 225 224 225 226 223 222
Cindy 7662 281 282 279 279 276 273 272 271 270 272 273
Alex 3675 297 999 999 293 293 290 288 289 290 293 296
Blake 3576 211 213 216 218 215 212 210 213 214 214 212
Cindy 7782 225 227 229 231 234 234 237 236 233 233 236
Alex 1941 226 225 226 228 225 223 226 223 225 227 225
Blake 1586 260 258 255 257 260 259 262 261 259 257 258
Cindy 4466 233 232 231 228 226 999 222 222 223 222 223
Alex 8180 211 211 213 216 213 211 212 209 207 205 203
Blake 6577 202 201 204 201 204 205 205 206 206 209 210
Cindy 3164 208 208 209 210 999 211 210 208 210 211 999
Alex 8028 345 345 346 343 999 342 999 339 341 344 346
Blake 9577 265 265 267 264 261 264 261 263 263 264 263
Cindy 1767 187 188 189 187 186 189 192 191 999 999 190
Alex 2709 246 999 243 245 242 239 237 239 240 240 242
Blake 5686 271 268 270 268 269 269 999 272 273 273 273
Cindy 1285 268 268 999 266 265 267 269 267 269 270 272
;
RUN;
PROC PRINT DATA = sample NOOBS;
RUN;
PROC MEANS DATA = sample N MIN MAX MEAN MEDIAN;
TITLE "Variation of weight for each session";
VAR lbs_s1-lbs_s10;
RUN;
*PROBLEM: Missing weights were coded as 999, which is messing up our results.
SOLUTION: Recode all 999 values a missing.;
DATA sample;
SET sample;
ARRAY mothman (10) lbs_s1-lbs_s10;
DO i = 1 to 10;
IF mothman(i) = 999 then mothman(i) = .;
END; *NOTE: All do-loops require and end.;
DROP i;
RUN;
PROC PRINT DATA = sample NOOBS;
RUN;
PROC MEANS DATA = sample N MIN MAX MEAN MEDIAN;
TITLE "Variation of weight for each session";
VAR lbs_s1-lbs_s10;
RUN;
*PROBLEM: We want to include data on the location of each of these participants before sending a copy to the CDC.
SOLUTION: We know where each of these coaches works and so we can add columns for the hospital, city, and zipcode.;
*** This is what it would look like without a do-loop ***;
DATA sample_x;
SET sample;
LENGTH hospital $ 25 city $ 8 zipcode $ 5;
IF coach = "Alex" THEN hospital = "St. Peter's Health";
ELSE IF coach = "Blake" THEN hospital = "Intermountain St. Vincent";
ELSE IF coach = "Cindy" THEN hospital = "Providence St. Patrick";
IF coach = "Alex" THEN city = "Helena";
ELSE IF coach = "Blake" THEN city = "Billings";
ELSE IF coach = "Cindy" THEN city = "Missoula";
IF coach = "Alex" THEN zipcode = "59601";
ELSE IF coach = "Blake" THEN zipcode = "59101";
ELSE IF coach = "Cindy" THEN zipcode = "59802";
RUN;
PROC SORT DATA = sample_x;
BY coach;
RUN;
PROC PRINT DATA = sample_x;
Title "Sample output 1";
RUN;
*This works, but can be cumbersome if there are a lot of actions to execute and a lot of sites.
Even moreso if a staffing change occurs and we need to update the coding for future datasets;
****This is what it looks like with a do-loop executing multiple actions***;
DATA sample;
SET sample;
LENGTH hospital $ 25 city $ 8 zipcode $ 5;
IF coach = "Alex" THEN DO;
hospital = "St. Peter's Health";
city = "Helena";
zipcode = "59601";
END; *Notice that each loop has a paired end.;
IF coach = "Blake" THEN DO;
hospital = "Intermountain St. Vincent";
city = "Billings";
zipcode = "59101";
END;
IF coach = "Cindy" THEN DO;
hospital = "Providence St. Patrick";
city = "Missoula";
zipcode = "59802";
END;
RUN;
PROC SORT DATA = sample;
BY coach;
RUN;
PROC PRINT DATA = sample;
Title "Sample output 2";
RUN;
*DO-UNTIL and DO-WHILE: These are conditional loops that checks for a criteria
for when to stop advancing. DO-UNTIL checks for the condition to be met at the
bottom of the loop (after completing all steps), and DO-WHILE checks for the
condition to be met at the top of the loop.;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
WHILE and UNTIL are useful when the underlying data are going to be used to determine when to stop the iteration (looping) rather than having it continue until a fixed point (as you would generally do in a DO loop). The following is a little bit manufactured because I don't really know anything about your data or objective:
data test;
set sample;
array lb {*} lbs_:;
* lets say we want to find the first point at which the person
has had 3 consecutive seasons (?) with an average weight<250 ;
s=3;
under250_3seas=0; * initialize this binary variable to 0 for this person ;
do while (under250_3seas=0 and s<dim(lb));
if mean(lb[s-2], lb[s-1], lb[s])<250 then under250_3seas=1;
else s+1;
end;
if under250_3seas=0 then s=.;
run;
So the resulting dataset will have a 0/1 indicator of whether the person met the condition, and if they did meet the condition, then the value of S will be the first point at which the condition was reached.
There are definitely more complicated things you can do with WHILE - that's just a simple, contrived example.
I would say that in general, UNTIL is used less than WHILE, at least in my experience, but it's just in the case that you want to run through the steps once even if the conditions are not met.
A lot of times (not always), you could do the same thing with a regular DO loop, but it might be slightly more awkward:
data test;
set sample;
array lb {*} lbs_:;
* lets say we want to find the first point at which the person
has had 3 consecutive seasons (?) with an average weight<250 ;
under250_3seas=0; * initialize this binary variable to 0 for this person ;
do s=3 to dim(lb);
if mean(lb[s-2], lb[s-1], lb[s])<250 then do;
under250_3seas=1;
LEAVE;
end;
end;
if under250_3seas=0 then s=.;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Suppose you have an iterative procedure, it keeps going until some threshold is crossed.
do while (quantity >= threshold);
or
do until (quantity <= threshold);
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
WHILE and UNTIL are useful when the underlying data are going to be used to determine when to stop the iteration (looping) rather than having it continue until a fixed point (as you would generally do in a DO loop). The following is a little bit manufactured because I don't really know anything about your data or objective:
data test;
set sample;
array lb {*} lbs_:;
* lets say we want to find the first point at which the person
has had 3 consecutive seasons (?) with an average weight<250 ;
s=3;
under250_3seas=0; * initialize this binary variable to 0 for this person ;
do while (under250_3seas=0 and s<dim(lb));
if mean(lb[s-2], lb[s-1], lb[s])<250 then under250_3seas=1;
else s+1;
end;
if under250_3seas=0 then s=.;
run;
So the resulting dataset will have a 0/1 indicator of whether the person met the condition, and if they did meet the condition, then the value of S will be the first point at which the condition was reached.
There are definitely more complicated things you can do with WHILE - that's just a simple, contrived example.
I would say that in general, UNTIL is used less than WHILE, at least in my experience, but it's just in the case that you want to run through the steps once even if the conditions are not met.
A lot of times (not always), you could do the same thing with a regular DO loop, but it might be slightly more awkward:
data test;
set sample;
array lb {*} lbs_:;
* lets say we want to find the first point at which the person
has had 3 consecutive seasons (?) with an average weight<250 ;
under250_3seas=0; * initialize this binary variable to 0 for this person ;
do s=3 to dim(lb);
if mean(lb[s-2], lb[s-1], lb[s])<250 then do;
under250_3seas=1;
LEAVE;
end;
end;
if under250_3seas=0 then s=.;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
...s <= dim(lb)
...rather than...
...s < dim(lb)
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
With that data you might want to loop across all of those replicated variables. Using WHILE() or UNTIL() will let you stop once you have found the answer.
For example you might want to make a flag that detects if the subject ever exceeded 250 pounds.
data want;
set sample;
array lbs_s[10];
do i=1 to dim(lbs_s) until(over250);
if (999 ne lbs_s[i]) then over250 = lbs_s[i]>250;
end;
drop i;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
DO WHILE and DO UNTIL are often useful when there is also a counter involved like - DO I = 1 TO 100 UNTIL - where the counter maximum is set for safety's sake but you also want to stop when the UNTIL condition is met.