07-27-2016 09:30 PM - edited 07-27-2016 11:26 PM
I have a SAS macro part that will run the entire dataset and do some analysis, and during the analysis part, each instance is dealt with once a time.so that the entire dataset can be run and we could keep an eye on the 'log' file.
However, I would like to split the entire dataset into several parts. (Ex, first 500 observations, 501~1000 observations, etc). Which means the engine will stop after running 500 instances, and then run again. And in the end they can be generated in one table as before. How can I add this 'split' part into my prior code?
Part of Initial Code:
OPTIONS NOTES SOURCE SOURCE2 MPRINT MLOGIC MERROR SYMBOLGEN;
set WORK.LOCATION end=last;
if last then call symput('nfiles',_n_);
%do i=1 %to &nfiles;
if &i=_n_ then call symput('code',LOCATION_ID);
create table WORK.pt as select
%if %sysfunc(exist(WORK.result)) %then %do;
set WORK.result WORK.pt;
Where 'WORK.LOCATION' is the dataset that I call in 'proc sql' procedure that contains all the 'LOCATION_ID" information that I need.
I want to add split part somewhere around '%do i=1 %to &nfiles;' (in bold) And work.result is the result gonna stay.
07-27-2016 11:10 PM
You don't show how/where you want the split to occur so it's hard to answer your question.
This usually isn't a good way to work with SAS.
To to accomplish this look at FIRSTOBS and OBS dataset options that allow you to control the first observation read and the number of observations read. You'll want to pay attention to how they interact.
07-28-2016 04:46 AM
I would agree with @Reeza, SAS is designed to work with by groups of values. In your dataset, set a column which identifies what group of data it is associated with, in your case if you want 1-500, 501-1000 then something simple like:
data want; set have; group=(_n_ / 500) + 1; run;
Then you can use the group variable as a by clause:
proc means data=have; by group; ... run;
This has two benefits, the primary one being that your code is far simpler, easier to read, and more manageable. The second is that execution time will be faster - anything you write will be slower than the out of the box functionality.
07-28-2016 09:10 AM
Thank you! In this way, do I still run the entire dataset altogether, and cannot make it auto break (for example, every 500 observations, etc) ?
07-28-2016 09:37 AM
Yes, in this way you run the code once over all the data, the by group breaks the means calculations up into your by group values, so say you had 1 = 1-500, and 2=501-1000, you would end up with means as:
GROUP N MEAN ...
So grouped by the group value. There is no need to split a dataset, that will generally just make your code harder to read/maintain and run slower.