Hi,
I have created list of datasets in my library (called block) using below.
proc contents data=block._all_ noprint out=contents (keep=libname memname);
run;
proc sort data=contents nodup;
by libname memname;
run;
I like to loop through all of these datasets and apply the split macro below to each dataset.
Since I have >200 datasets, how do I this without having to do it write split macro individually for each dataset. I'm having quite a bit trouble with this.
Thanks.
%macro split (SRC_DATASET=, OUT_PREFIX=, SPLIT_NUM=, SPLIT_DEF=); /* Parameters: /* SRC_DATASET - name of the source data set */ /* OUT_PREFIX - prefix of the output data sets */ /* SPLIT_NUM - split number */ /* SPLIT_DEF - split definition (=SETS or =NOBS) */ %local I K S TLIST; /* number of observations &K, number of smaller datasets &S */ data _null_; if 0 then set &SRC_DATASET nobs=N; if upcase("&SPLIT_DEF")='NOBS' then do; call symputx('K',&SPLIT_NUM); call symputx('S',ceil(N/&SPLIT_NUM)); put "***MACRO SPLIT: Splitting into datasets of no more than &SPLIT_NUM observations"; end; else if upcase("&SPLIT_DEF")='SETS' then do; call symputx('S',&SPLIT_NUM); call symputx('K',ceil(N/&SPLIT_NUM)); put "***MACRO SPLIT: Splitting into &SPLIT_NUM datasets"; end; else put "***MACRO SPLIT: Incorrect SPLIT_DEF=&SPLIT_DEF value. Must be either SETS or NOBS."; stop; run; /* terminate macro if nothing to split */ %if (&K le 0) or (&S le 0) %then %return; /* generate list of smaller dataset names */ %do I=1 %to &S; %let TLIST = &TLIST &OUT_PREFIX._&I; %end; /* split source dataset into smaller datasets */ data &TLIST; set &SRC_DATASET; select; %do I=1 %to &S; when(_n_ <= &K * &I) output &OUT_PREFIX._&I; %end; end; run; %mend split;
How about dosubl?
data _null_;
set contents;
rc=dosubl(cats('%split(SRC_DATASET=BLOCK.',MEMNAME,',out_prefix=BLOCK.',MEMNAME,',split_num=200,SPLIT_DEF=NOBS)'));
run;
Like this?
data _null_;
set CONTENTS;
call execute(catt('%split(ds=BLOCK.',MEMNAME,',out_prefix=_a,split_num=200);'));
run;
Hi Chris,
Thanks for your quick reply.
I tried yours method, but it is not working for me. I don't get any new datasets output to my work library.
Basically, I have about 237 dataset labeled as below
block.block_A, block.block_B, block.block_C.....
Each dataset have roughly about 600 observations.
I want to split each dataset into smaller dataset (let's just say 200 observations each).
So if block.block_A has 630 observations and I use
%split (block.block_A, want.block_A, 200, nobs) then I will get
want.block_A_1 with 200 obs
want.block_A_2 with 200 obs
want.block_A_3 with 200 obs
want.block_A_4 with 30 obs
So I like to do this for each of my 237 datasets without having use write the %split macro above for each dataset. If each of my dataset has 600 observations, then I expect to have
237*3=711 new datasets (200 obs each) after applying the split macro.
It seems that Chris forgot the SPLIT_DEF parameter in the macro-call, adding it should solve the issue.
> I don't get any new datasets output to my work library.
What's in the log?
More importantly, what's the point of creating all these tables rather than using data set options?
Hi Chris,
If you have an easier way of splitting all of these datasets using data set, please share. I would love to try it. May be I am overlooking something that can be solved quite easily.
I ran the following:
data _null_;
set contents;
by memname;
call execute (catt('%split(SRC_DATASET=block.',memname,', OUT_PREFIX=_a, SPLIT_NUM=200, SPLIT_DEF=NOBS);'));
run;
The logs are below - I am getting this for each of the dataset.
NOTE: CALL EXECUTE generated line.
1 + data _null_; if 0 then set block.BLOCK_10 nobs=N; if upcase("NOBS")='NOBS' then do;
call symputx('K',200); call symputx('S',ceil(N/200)); put "***MACRO SPLIT: Splitting into datasets of no
more
2 + than 200 observations"; end; else if upcase("NOBS")='SETS' then do; call
symputx('S',200); call symputx('K',ceil(N/200)); put "***MACRO SPLIT: Splitting into 200 datasets";
end;
3 + else put "***MACRO SPLIT: Incorrect SPLIT_DEF=NOBS value. Must be either SETS or NOBS."; stop; run;
***MACRO SPLIT: Splitting into datasets of no more than 200 observations
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
After doing some research, I thought it may be the order in which the variables are being executed. So I also tried with %nrstr, but I'm don't think I am doing this right since it is giving errors.
data _null_;
set contents;
by memname;
if first.memname;
call execute ('%nrstr(cat('%split(SRC_DATASET=block.',memname,', OUT_PREFIX=_a, SPLIT_NUM=200, SPLIT_DEF=NOBS)'))');
run;
Log is saying:
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24
25 GOPTIONS ACCESSIBLE;
26
27 data _null_;
28 set contents;
29 by memname;
30
31 call execute ('%nrstr(catt('%split(SRC_DATASET=block.',memname,', OUT_PREFIX=_a, SPLIT_NUM=200, SPLIT_DEF=NOBS);'))');
NOTE: Line generated by the invoked macro "SPLIT".
31 data _null_; if 0 then
___
388
76
ERROR: Invalid date/time/datetime constant '%nrstr(catt('d.
ERROR: File ,memname, does not exist.
______________
77
NOTE: Line generated by the macro variable "SRC_DATASET".
31 block.',memname,'
______
22
201
ERROR 388-185: Expecting an arithmetic operator.
ERROR 76-322: Syntax error, statement will be ignored.
ERROR 77-185: Invalid number conversion on '%nrstr(catt('d.
ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, ;, CUROBS, END, INDSNAME, KEY, KEYRESET, KEYS,
NOBS, OPEN, POINT, _DATA_, _LAST_, _NULL_.
ERROR 201-322: The option is not recognized and will be ignored.
2 The SAS System 13:12 Tuesday, September 8, 2020
NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
31 call execute ('%nrstr(catt('%split(SRC_DATASET=block.',memname,', OUT_PREFIX=_a, SPLIT_NUM=200, SPLIT_DEF=NOBS);'))');
____
180
ERROR 180-322: Statement is not valid or it is used out of proper order.
1.
> If you have an easier way of splitting all of these datasets using data set, please share.
My point was: Why split? Why not use options on the original data set? Like
proc whateveryouaredoinglater data=ORIGINALTABLE(firstobs=200 obs=400);
2. This statement cannot work, the syntax is invalid for several reasons.
call execute ('%nrstr(cat('%split(SRC_DATASET=block.',memname,', OUT_PREFIX=_a, SPLIT_NUM=200, SPLIT_DEF=NOBS)'))');
Try the syntax I provided.
How about dosubl?
data _null_;
set contents;
rc=dosubl(cats('%split(SRC_DATASET=BLOCK.',MEMNAME,',out_prefix=BLOCK.',MEMNAME,',split_num=200,SPLIT_DEF=NOBS)'));
run;
> How about dosubl?
There is no benefit here over call execute, and there is the overhead of starting new SAS threads.
Hi Chris,
Believe it or not. I tried hhinohar's dosubl It actually works.
Well, at least one thing is for sure after this, I need to read up on how to better use SAS macro.
Thanks for your help working through this. My technique is ugly. I wish it was more elegant.
Thanks!
According to documentation it says,
"The DOSUBL function enables the immediate execution of SAS code after a text string is passed. The function imports macro variables from the calling environment, and macro variables that are created or updated during the execution of the submitted code are exported back to the calling environment."
"However, when you use a DATA step to set the value of a global macro variable, and then you use the CALL EXECUTE routine to call the macro, the DATA step code executes after the current DATA step completes. Example 1 demonstrates execution of the DATA step code."
I think this makes sense and I learned a lot from your SAS macro which is a great program!
@hhinohar wrote:
How about dosubl?
data _null_; set contents; rc=dosubl(cats('%split(SRC_DATASET=BLOCK.',MEMNAME,',out_prefix=BLOCK.',MEMNAME,',split_num=200,SPLIT_DEF=NOBS)')); run;
It does not matter if the macro runs after the data step so there is no reason to use DOSUBL() instead of CALL EXECUTE(), you are just wasting computer time. But if the macro is complicated then you want it to run after the data step and not during the CALL EXECUTE() statement so wrap the %XXX inside of %NRSTR() so the macro call is pushed to run after the data step. Not only will this eliminate timing issues it will also make your SAS log easier to read.
Also you do not want to run the macro for every variable in the dataset, just once for each dataset.
proc contents data=block._all_ noprint out=contents;
run;
data _null_;
set contents;
by memname;
if first.memname;
call execute(cats('%nrstr(%split)'
,'(SRC_DATASET=',catx('.',libname,memname)
,',out_prefix=',catx('.',libname,memname)
,',split_num=200,SPLIT_DEF=NOBS)'
));
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.