BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
edhuang
Obsidian | Level 7

Hi,

 

I have created list of datasets in my library (called block) using below.

 

proc contents data=block._all_ noprint out=contents (keep=libname memname);
run;
proc sort data=contents nodup;
by libname memname;
run;

 

I like to loop through all of these datasets and apply the split macro below to each dataset.

Since I have >200 datasets, how do I this without having to do it write split macro individually for each dataset.  I'm having quite a bit trouble with this.

Thanks.

 

%macro split (SRC_DATASET=, OUT_PREFIX=, SPLIT_NUM=, SPLIT_DEF=);
/* Parameters:
/*   SRC_DATASET - name of the source data set     */
/*   OUT_PREFIX - prefix of the output data sets   */
/*   SPLIT_NUM - split number                      */
/*   SPLIT_DEF - split definition (=SETS or =NOBS) */
 
   %local I K S TLIST;
 
   /* number of observations &K, number of smaller datasets &S */
   data _null_;
      if 0 then set &SRC_DATASET nobs=N;
      if upcase("&SPLIT_DEF")='NOBS' then
         do;
            call symputx('K',&SPLIT_NUM); 
            call symputx('S',ceil(N/&SPLIT_NUM));
            put "***MACRO SPLIT: Splitting into datasets of no more than &SPLIT_NUM observations";
         end;
         else if upcase("&SPLIT_DEF")='SETS' then
         do;
            call symputx('S',&SPLIT_NUM); 
            call symputx('K',ceil(N/&SPLIT_NUM));
            put "***MACRO SPLIT: Splitting into &SPLIT_NUM datasets";
        end;
         else put "***MACRO SPLIT: Incorrect SPLIT_DEF=&SPLIT_DEF value. Must be either SETS or NOBS.";
      stop; 
   run;
 
   /* terminate macro if nothing to split */
   %if (&K le 0) or (&S le 0) %then %return;
 
    /* generate list of smaller dataset names */
   %do I=1 %to &S;
      %let TLIST = &TLIST &OUT_PREFIX._&I;
   %end;
 
   /* split source dataset into smaller datasets */
   data &TLIST;
      set &SRC_DATASET;
      select;
         %do I=1 %to &S;
            when(_n_ <= &K * &I) output &OUT_PREFIX._&I; 
         %end;
      end;
   run;
 
%mend split;   

 

1 ACCEPTED SOLUTION

Accepted Solutions
hhinohar
Quartz | Level 8

How about dosubl?

data _null_;
	set contents;
	rc=dosubl(cats('%split(SRC_DATASET=BLOCK.',MEMNAME,',out_prefix=BLOCK.',MEMNAME,',split_num=200,SPLIT_DEF=NOBS)'));
run;

 

View solution in original post

11 REPLIES 11
ChrisNZ
Tourmaline | Level 20

Like this?

data _null_;
  set CONTENTS;
  call execute(catt('%split(ds=BLOCK.',MEMNAME,',out_prefix=_a,split_num=200);'));
run;

 

edhuang
Obsidian | Level 7

Hi Chris,

 

Thanks for your quick reply.

I tried yours method, but it is not working for me. I don't get any new datasets output to my work library.

 

Basically, I have about 237 dataset labeled as below

block.block_A, block.block_B, block.block_C.....

Each dataset have roughly about 600 observations.

 

I want to split each dataset into smaller dataset (let's just say 200 observations each).

So if block.block_A has 630 observations and I use

%split (block.block_A, want.block_A, 200, nobs) then I will get

want.block_A_1 with 200 obs

want.block_A_2 with 200 obs

want.block_A_3 with 200 obs

want.block_A_4 with 30 obs

 

So I like to do this for each of my 237 datasets without having use write the %split macro above for each dataset.  If each of my dataset has 600 observations, then I expect to have

237*3=711 new datasets (200 obs each) after applying the split macro.

 

 

 

andreas_lds
Jade | Level 19

It seems that Chris forgot the SPLIT_DEF parameter in the macro-call, adding it should solve the issue.

ChrisNZ
Tourmaline | Level 20

>  I don't get any new datasets output to my work library.

What's in the log?

 

More importantly, what's the point of creating all these tables rather than using data set options?

edhuang
Obsidian | Level 7

Hi Chris,

 

If you have an easier way of splitting all of these datasets using data set, please share.  I would love to try it.  May be I am overlooking something that can be solved quite easily.

 

I ran the following:

 

data _null_;
set contents;
by memname;
call execute (catt('%split(SRC_DATASET=block.',memname,', OUT_PREFIX=_a, SPLIT_NUM=200, SPLIT_DEF=NOBS);'));
run;

 

The logs are below - I am getting this for each of the dataset.

NOTE: CALL EXECUTE generated line.
1 + data _null_; if 0 then set block.BLOCK_10 nobs=N; if upcase("NOBS")='NOBS' then do;
call symputx('K',200); call symputx('S',ceil(N/200)); put "***MACRO SPLIT: Splitting into datasets of no
more
2 + than 200 observations"; end; else if upcase("NOBS")='SETS' then do; call
symputx('S',200); call symputx('K',ceil(N/200)); put "***MACRO SPLIT: Splitting into 200 datasets";
end;
3 + else put "***MACRO SPLIT: Incorrect SPLIT_DEF=NOBS value. Must be either SETS or NOBS."; stop; run;

***MACRO SPLIT: Splitting into datasets of no more than 200 observations
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

 

After doing some research, I thought it may be the order in which the variables are being executed.  So I also tried with %nrstr, but I'm don't think I am doing this right since it is giving errors.

 

data _null_;
set contents;
by memname;
if first.memname;
call execute ('%nrstr(cat('%split(SRC_DATASET=block.',memname,', OUT_PREFIX=_a, SPLIT_NUM=200, SPLIT_DEF=NOBS)'))');
run;

 

Log is saying:

NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
24
25 GOPTIONS ACCESSIBLE;
26
27 data _null_;
28 set contents;
29 by memname;
30
31 call execute ('%nrstr(catt('%split(SRC_DATASET=block.',memname,', OUT_PREFIX=_a, SPLIT_NUM=200, SPLIT_DEF=NOBS);'))');
NOTE: Line generated by the invoked macro "SPLIT".
31 data _null_; if 0 then
___
388
76
ERROR: Invalid date/time/datetime constant '%nrstr(catt('d.
ERROR: File ,memname, does not exist.

______________
77
NOTE: Line generated by the macro variable "SRC_DATASET".
31 block.',memname,'
______
22
201

ERROR 388-185: Expecting an arithmetic operator.

ERROR 76-322: Syntax error, statement will be ignored.

ERROR 77-185: Invalid number conversion on '%nrstr(catt('d.

ERROR 22-322: Syntax error, expecting one of the following: a name, a quoted string, ;, CUROBS, END, INDSNAME, KEY, KEYRESET, KEYS,
NOBS, OPEN, POINT, _DATA_, _LAST_, _NULL_.

ERROR 201-322: The option is not recognized and will be ignored.
2 The SAS System 13:12 Tuesday, September 8, 2020


NOTE: The SAS System stopped processing this step because of errors.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

31 call execute ('%nrstr(catt('%split(SRC_DATASET=block.',memname,', OUT_PREFIX=_a, SPLIT_NUM=200, SPLIT_DEF=NOBS);'))');
____
180
ERROR 180-322: Statement is not valid or it is used out of proper order.

ChrisNZ
Tourmaline | Level 20

1.

> If you have an easier way of splitting all of these datasets using data set, please share. 

My point was: Why split?  Why not use options on the original data set? Like

proc whateveryouaredoinglater data=ORIGINALTABLE(firstobs=200 obs=400);

 

2. This statement cannot work, the syntax is invalid for several reasons.

call execute ('%nrstr(cat('%split(SRC_DATASET=block.',memname,', OUT_PREFIX=_a, SPLIT_NUM=200, SPLIT_DEF=NOBS)'))');

Try the syntax I provided.

 

hhinohar
Quartz | Level 8

How about dosubl?

data _null_;
	set contents;
	rc=dosubl(cats('%split(SRC_DATASET=BLOCK.',MEMNAME,',out_prefix=BLOCK.',MEMNAME,',split_num=200,SPLIT_DEF=NOBS)'));
run;

 

ChrisNZ
Tourmaline | Level 20

> How about dosubl?

There is no benefit here over call execute, and there is the overhead of starting new SAS threads.

edhuang
Obsidian | Level 7

Hi Chris,

 

Believe it or not.  I tried hhinohar's dosubl  It actually works.

Well, at least one thing is for sure after this, I need to read up on how to better use SAS macro.

Thanks for your help working through this.  My technique is ugly.  I wish it was more elegant.

 

hhinohar
Quartz | Level 8

Thanks!

According to documentation it says,

"The DOSUBL function enables the immediate execution of SAS code after a text string is passed. The function imports macro variables from the calling environment, and macro variables that are created or updated during the execution of the submitted code are exported back to the calling environment."

"However, when you use a DATA step to set the value of a global macro variable, and then you use the CALL EXECUTE routine to call the macro, the DATA step code executes after the current DATA step completes. Example 1 demonstrates execution of the DATA step code."

 

https://documentation.sas.com/?docsetId=lefunctionsref&docsetTarget=p09dcftd1xxg1kn1brnjyc0q93yk.htm... 

 

I think this makes sense and I learned a lot from your SAS macro which is a great program!

 

Tom
Super User Tom
Super User

@hhinohar wrote:

How about dosubl?

data _null_;
	set contents;
	rc=dosubl(cats('%split(SRC_DATASET=BLOCK.',MEMNAME,',out_prefix=BLOCK.',MEMNAME,',split_num=200,SPLIT_DEF=NOBS)'));
run;

 


It does not matter if the macro runs after the data step so there is no reason to use DOSUBL() instead of CALL EXECUTE(), you are just wasting computer time.  But if the macro is complicated then you want it to run after the data step and not during the CALL EXECUTE() statement so wrap the %XXX inside of %NRSTR() so the macro call is pushed to run after the data step.  Not only will this eliminate timing issues it will also make your SAS log easier to read.

 

Also you do not want to run the macro for every variable in the dataset, just once for each dataset.

proc contents data=block._all_ noprint out=contents;
run;
data _null_;
  set contents;
  by memname;
  if first.memname;
  call execute(cats('%nrstr(%split)'
    ,'(SRC_DATASET=',catx('.',libname,memname)
    ,',out_prefix=',catx('.',libname,memname)
    ,',split_num=200,SPLIT_DEF=NOBS)'
  ));
run;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1369 views
  • 3 likes
  • 5 in conversation