My program generates a single observation and, as the last step, appends it to a results file (RESULTS1). The results file starts empty. I want to save every individual result but limit the size of the results file to 100,000 observations. The observation generated by the 100,001st iteration would automatically be written to a new (empty) results file (RESULTS2), again up to 100,000 in size. The 200,001st observation would go to another new results file (let's call it RESULTS3), etc. So, in the end, if I run 350,000 iterations, I will end up with RESULTS1 (100,000 obs), RESULTS2 (100,000 obs), RESULTS3 (100,000 obs), and RESULTS4 (50,000 obs).
Think of it as: I have five gallons of results that pour into a container one drop at a time, but I have pint-sized containers, so when one is full, the conveyor belt moves to the next container and starts to fill it (and never misses a drop).
How can I limit the size of the results file, keep it, and then start a new one with a sequential name?
Thanks
proc append base=RESULTS1 data=newestresult force;
run;
I modified Astounding's suggestion and it works well.
proc append FORCE base=RESULT&result_set data=newestresult;
run;
%if %eval(&i/(&result_set * 50000))>1 %then %let result_set = %eval(&result_set + 1);
I also added a subsequent macro to join all of the pieces:
%let num_files = &result_set;
%macro join;
%do i=1 %to &num_files;
%let padded_number = %sysfunc(putn(&i, 4.0));
proc append base=full_set
data=RESULT&padded_number;
run;
proc datasets lib=work nolist;
delete RESULT&padded_number;
quit;
run;
%end;
%mend join;
%join
Tom, I didn't get to test yours, but I am sure it works similarly.
Much thanks, once again, to SAS Communities!
Are you saying your program will append one observation at a time, with the expectation that you will end up with multiple datasets of size 100,000? That is the opposite of efficient. It may be much better to think about how to append in larger batches than to worry immediately about dividing the result into datasets with a fixed maximum size.
Perhaps you can tell us about the program that generates one result observation at a time.
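For example, if the per-iteration computation can be expressed inside a DATA step, all 100,000 observations for one file can be generated in a single step rather than with 100,000 PROC APPEND calls. This is only a sketch; the RAND('UNIFORM') call is a stand-in for whatever your program actually computes:

```sas
/* Sketch: generate a whole batch in one DATA step instead of     */
/* appending one observation at a time. RAND('UNIFORM') is a      */
/* placeholder for the real per-iteration computation.            */
data results1;
   do iteration = 1 to 100000;
      result = rand('uniform');
      output;
   end;
run;
```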
But as to your question: capping each data set at 100,000 observations is a much easier task. Let's say you have data set HAVE with some large number of obs, but you know there are no more than 600,000 obs (i.e., you need no more than 6 WANT datasets). Then:
data want1 want2 want3 want4 want5 want6;
   set have end=end_of_have nobs=n_have;
   if _n_ <= 100000 then output want1;
   else if _n_ <= 200000 then output want2;
   else if _n_ <= 300000 then output want3;
   else if _n_ <= 400000 then output want4;
   else if _n_ <= 500000 then output want5;
   else output want6;
   if end_of_have;
   if n_have <= 500000 then call execute('proc delete data=want6; run;');
   if n_have <= 400000 then call execute('proc delete data=want5; run;');
   if n_have <= 300000 then call execute('proc delete data=want4; run;');
   if n_have <= 200000 then call execute('proc delete data=want3; run;');
   if n_have <= 100000 then call execute('proc delete data=want2; run;');
run;
The above is a fairly straightforward program. But as I said, appending one at a time - at the scale you contemplate - is an invitation to lots and lots of waiting.
Actually, some of the details of your program matter.
Presumably, you are using macro language to iterate hundreds of thousands of times. If that's incorrect, this approach may not apply.
Within the macro program, you might have (and could certainly add) a macro variable to keep track of which iteration you are working on.
And presumably you could add a small amount of code to this section:
proc append base=RESULTS1 data=newestresult force;
run;
Change it to look like this:
proc append base=RESULTS&result_set data=newestresult force;
run;
Then all you would need to do is to use your "number of iterations" macro variable. Very early, outside the loop, add:
%let result_set = 1;
Then just after PROC APPEND, add some logic to increment &RESULT_SET. One possibility:
%if %eval(&n_iterations / 100000 * 100000) = &n_iterations then
%let result_set = %eval(&result_set + 1);
Yes, it's a macro that is running the iterations. Your suggestion makes sense. I just need to figure out how to create an iteration counter (which I believe you have called n_iterations). Thanks
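For instance, if the iterations run in a %DO loop, the loop index itself can serve as the counter. This is only a sketch; the loop bounds, macro name, and dataset names are illustrative:

```sas
%macro simulate;
   %let result_set = 1;
   %do n_iterations = 1 %to 350000;
      /* ... code that generates the one-observation NEWESTRESULT ... */
      proc append base=RESULTS&result_set data=newestresult force;
      run;
      /* integer division drops the remainder, so this test is true */
      /* only when the counter is an exact multiple of 100,000      */
      %if %eval(&n_iterations / 100000 * 100000) = &n_iterations %then
         %let result_set = %eval(&result_set + 1);
   %end;
%mend simulate;
%simulate
```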
That logic would need to switch from %EVAL to %SYSEVALF. The %EVAL function performs integer arithmetic, so it drops any remainders. In fact, that's the reason why the original logic that I posted works:
%if %eval(&n_iterations / 100000 * 100000) = &n_iterations then
%let result_set = %eval(&result_set + 1);
When &n_iterations is 100001, the division:
&n_iterations / 100000
produces 1 as the result. The remainder is dropped. Multiplying by 100000 generates 100000 as the result, which is not equal to 100001.
There are a few ways to get working logic out of this. Just make sure the logic increments whenever &N_ITERATIONS is a multiple of 100000.
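One option is the MOD function via %SYSFUNC, which makes the multiple-of-100,000 test explicit and is equivalent to the divide-and-multiply version above:

```sas
/* Increment the file counter whenever the iteration count is an */
/* exact multiple of 100,000.                                    */
%if %sysfunc(mod(&n_iterations, 100000)) = 0 %then
   %let result_set = %eval(&result_set + 1);
```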
Just keep count.
Not sure how you are currently ending your iterations, but here is one way to use two loops: one that just increments the target dataset name, and another that counts how many iterations you have done for the current set.
%let set=1 ;
%let done=0;
%do %until(&done);
%let inner_loop=1;
%do %until(&done or &inner_loop>100000);
/* your processing here,
   including logic to set macro variable DONE=1 when it is time to stop looping
*/
proc append base=RESULTS&set data=newestresult force;
run;
%let inner_loop=%eval(&inner_loop+1);
%end; %* inner loop ;
%let set=%eval(&set+1);
%end; %* outer loop;