I need to end a loop (it can be a DO loop, but that's not too important) when the input dataset has zero observations. A complex series of calculations will be performed on an input dataset, and the algorithm will be repeated until all observations exceed a critical value. At the end of each pass the dataset is split into passing observations (those that exceed the critical value) and failing observations (those below it). The dataset with the failing observations is sent back through the loop until every observation exceeds the critical value.
Because the number of iterations is not predetermined but depends on the data, I do not know how to end this loop. Please see the attached example. Thanks for any help that you can give me.
Hi. I'd like to help but I never download attachments from public forums. So, could you please include your code in your reply as text, pasted into the window that appears when you click on the "little running man" icon? Thanks.
There is no reason to attach code files unless they are very long. Yours isn't. Open a text box and paste the code:
/* Example code */

/* I need to perform a complex series of calculations on a dataset.
   The algorithm will be repeated until all observations exceed a critical value.
   At the end of each loop the dataset is split into pass and fail observations.
   The dataset with the failing obs is sent back through the loop.
   What I need help with is knowing when to stop the algorithm.
   Specifically, I need you to show me how to close a loop when the input dataset will have zero observations.
   For simplicity I coded an example with a trivial mathematical operation: multiply by 3 until the value is greater than a threshold.
   In practice my algorithm will be significantly more complex.
   Again, please let me know how to end a loop when my input dataset grinds down to zero observations.
   Thanks so much! */

/* Initial dataset */
Data ds_0;
input x $ y;
datalines;
a 2
b 7
c 17
;
Run;

/* Not too sure if this step is needed */
/* I only inserted this data step because I wanted to preserve the original input data and not have it overwritten */
/* Observations in ds_1 will be subject to repeated loops until achieving a passing value */
Data ds_1;
set ds_0;
Run;

/* ********************************* */
/* ********************************* */
/* *** loop will start here *** */
/* note: didn't specify type of loop but it can be a simple do loop */

/* perform algorithm */
Data ds_2;
set ds_1;
z=3*y;
Run;

/* split the dataset based upon results */
Data ds_1 ds_3;
set ds_2;
if z < 50 then output ds_1;
if z > 49 then output ds_3;
Run;

Proc append base=ds_finished data=ds_3 force;
Run;

/* Using this simplified example there would be 3 loops */
/* Dataset composition by loop would be:
   LOOP 1
     ds_1: a 6, b 21
     ds_finished: c 51
   LOOP 2
     ds_1: a 18
     ds_finished: c 51, b 63
   LOOP 3
     ds_1: 0 observations
     ds_finished: c 51, b 63, a 54
*/

/* how to end when at zero obs??? */
The SET statement option NOBS will return the number of observations.
data _null_;
set sashelp.class nobs=numobs;
if _n_=1 then Put "number of observations is " numobs;
run;
It creates a temporary variable (one that is not written to the output data set) with the name you provide after the NOBS= option.
See the SET statement documentation for the details.
I would recommend a slight variation for this DATA step:
data _null_;
set sashelp.class nobs=numobs;
if _n_=1 then Put "number of observations is " numobs;
run;
As it stands, there are two defects. First, if there really are zero observations here, you will not get a message written to the log. The SET statement ends the DATA step when it finds zero observations to read in, so the PUT statement never executes. Second, and less important here, the DATA step will continue until it runs out of observations. So if you have 100M observations coming in, the DATA step reads all 100M. You only get one message, but you do run up the bill reading all the observations.
Here's a better version:
data _null_;
put "number of observations is " numobs;
stop;
set sashelp.class nobs=numobs;
run;
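To tie this back to the loop question: the same pattern can store the count in a macro variable, which macro logic such as %IF or %DO %UNTIL can then test. This is only a minimal sketch. It assumes the failing observations sit in ds_1, as in the example code posted above, and the macro variable name num is just an illustrative choice.

```sas
data _null_;
  /* numobs is filled in when the step is compiled,       */
  /* so it already has a value before SET ever executes   */
  call symputx('num', numobs);
  stop;                        /* do not read any observations */
  set ds_1 nobs=numobs;
run;

/* write the count to the log so the result can be checked */
%put NOTE: ds_1 has &num observation(s).;
```

Because CALL SYMPUTX and STOP run before the SET statement, the macro variable is assigned even when ds_1 is empty, and no observations are actually read.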
Thanks for all the help. I probably should have given somewhat clearer sample code to illustrate what I needed. Here's the solution that I went with:
/* End a loop when a dataset no longer has any obs */
/* Initial dataset */
Data ds_0;
input x $ y;
datalines;
a 2
b 7
c 17
;
Run;
/* Observations in ds_1 will be subject to repeated loops until achieving a passing value */
Data ds_1;
set ds_0;
Run;
/* ********************************* */
/* ********************************* */
%macro looper;
%do %until(&num = 0);
/* perform algorithm */
/* note: the algorithm will actually be a complex series of data steps and sql statements */
Data ds_2;
set ds_1;
z=3*y;
Run;
/* split the dataset based upon results */
Data ds_3 ds_4;
set ds_2;
if z < 50 then output ds_4;
if z > 49 then output ds_3;
Run;
proc append base=ds_complete data=ds_3 force;
run;
%let dsid=%sysfunc(open(ds_4));
%let num=%sysfunc(attrn(&dsid, nobs));
%let rc=%sysfunc(close(&dsid));
%if &num = 0 %then %return;
Data ds_1;
set ds_4;
y=z;
drop z;
Run;
%end;
%mend;
%looper;
Why bother counting how many observations are in the less-than-50 dataset? You are already testing whether Z is less than 50, so you can set a flag at that point.
%let num=0;
data ds_3 ds_4;
set ds_2;
if z < 50 then do;
output ds_4;
call symputx('num','1');
end;
if z > 49 then output ds_3;
run;
proc append base=ds_complete data=ds_3 force;
run;
%if &num = 0 %then %return;
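For context, here is one way the flag might slot into the complete loop. Treat it as a sketch rather than a definitive version: the macro name looper_flag is made up for illustration, the z=3*y step is only the stand-in algorithm from the original post, and it assumes ds_1 has already been created from ds_0 and that ds_complete does not hold rows left over from an earlier run.

```sas
%macro looper_flag;              /* hypothetical name, for illustration only */
  %local num;
  %do %until(&num = 0);

    /* stand-in for the real, more complex algorithm */
    data ds_2;
      set ds_1;
      z = 3*y;
    run;

    /* assume nothing fails until the split below proves otherwise */
    %let num = 0;

    /* split the results and raise the flag on any failing observation */
    data ds_3 ds_4;
      set ds_2;
      if z < 50 then do;
        output ds_4;
        call symputx('num', '1');
      end;
      if z > 49 then output ds_3;
    run;

    /* accumulate the passing observations */
    proc append base=ds_complete data=ds_3 force;
    run;

    /* only rebuild the loop input when something still fails */
    %if &num ne 0 %then %do;
      data ds_1;
        set ds_4;
        y = z;
        drop z;
      run;
    %end;

  %end;
%mend looper_flag;

%looper_flag;
```

Here the %DO %UNTIL condition ends the loop on its own once the flag stays at 0, so a separate %RETURN is not needed, and the flag costs nothing extra because it is set in the same DATA step that performs the split.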
Very nice! Thank you.