BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kiteulf
Quartz | Level 8

Hi

 

I am getting an error due to the number of files that I need to split. The macro worked fine with <100 obs.

53 DATA DK_PANDE2_SME;
54 set DK_Pande END=EOT;
55 RECNUM = STRIP(PUT(_n_, BEST.));
56 IF EOT THEN DO;
57 CALL SYMPUTX( 'nobs', RECNUM);
58 END;
59 run;

NOTE: There were 17217 observations read from the data set WORK.DK_PANDE.
NOTE: The data set WORK.DK_PANDE2_SME has 17217 observations and 12 variables.
NOTE: DATA statement used (Total process time):
real time 5.62 seconds
cpu time 5.56 seconds

 

%put &nobs;
17217
62
63 proc datasets library=output kill;
NOTE: Deleting OUTPUT.ADDRESSHISTORY (memtype=DATA).
64 run;

65 quit;

NOTE: PROCEDURE DATASETS used (Total process time):
real time 0.04 seconds
cpu time 0.01 seconds

66
67 %macro split (SRC_DATASET=, OUT_PREFIX=, SPLIT_NUM=, SPLIT_DEF=);
68 /* Parameters:
69 /* SRC_DATASET - name of the source data set */
70 /* OUT_PREFIX - prefix of the output data sets */
71 /* SPLIT_NUM - split number */
72 /* SPLIT_DEF - split definition (=SETS or =NOBS) */
73
74 %local I K S TLIST;
75
76 /* number of observations &K, number of smaller datasets &S */
77 data _null_;
78 if 0 then set &SRC_DATASET nobs=N;
79 if upcase("&SPLIT_DEF")='NOBS' then
80 do;
81 call symputx('K',&SPLIT_NUM);
82 call symputx('S',ceil(N/&SPLIT_NUM));
83 put "***MACRO SPLIT: Splitting into datasets of no more than &SPLIT_NUM observations";
84 end;
85 else if upcase("&SPLIT_DEF")='SETS' then
86 do;
87 call symputx('S',&SPLIT_NUM);
88 call symputx('K',ceil(N/&SPLIT_NUM));
89 put "***MACRO SPLIT: Splitting into &SPLIT_NUM datasets";
90 end;
91 else put "***MACRO SPLIT: Incorrect SPLIT_DEF=&SPLIT_DEF value. Must be either SETS or NOBS.";
3 The SAS System 09:23 Friday, February 26, 2021

92 stop;
93 run;
94
95 /* terminate macro if nothing to split */
96 %if (&K le 0) or (&S le 0) %then %return;
97
98 /* generate list of smaller dataset names */
99 %do I=1 %to &S;
100 %let TLIST = &TLIST &OUT_PREFIX._&I;
101 %end;
102
103 /* split source dataset into smaller datasets */
104 data &TLIST;
105 set &SRC_DATASET;
106 select;
107 %do I=1 %to &S;
108 when(_n_ <= &K * &I) output &OUT_PREFIX._&I;
109 %end;
110 end;
111 run;
112
113 %mend split;
114
115
116 %split(SRC_DATASET=WORK.DK_PANDE2_SME, OUT_PREFIX=WORK.DK_PANDE3_SME, SPLIT_NUM=1, SPLIT_DEF=NOBS);

***MACRO SPLIT: Splitting into datasets of no more than 1 observations
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds

ERROR: The text expression length (65540) exceeds maximum length (65534). The text expression has been truncated to 65534
characters.

1 ACCEPTED SOLUTION

Accepted Solutions
Kiteulf
Quartz | Level 8

Thanks

@Kurt_Bremser and @Astounding  and everybody else;

 

I took the macro I had and split into approx 100 in each data set, then I could split each of those datasets into 1 obs per dataset through looping through the 100 dataset (another macro). Works very well!

 

The reason I had to have 1 obs per dataset is that the xml in each obs is about 200 000 char long. So instead of opening the file everytime in a subsequent macro I can pick up just one observation/dataset. 

 

Time spent per operation fell with a factor of 30 🙂

View solution in original post

6 REPLIES 6
Kurt_Bremser
Super User

In about 99 % of cases, splitting a dataset is not needed, solves no purpose, makes further processing harder, and is therefore an exercise in stupidity.

And in your particular case, there is NOTHING gained by putting each observation into its own dataset. You can always restrict any procedure or data step with WHERE conditions, and you can use BY processing to do an action for subsets of the dataset, all in one step.

 

Saving large data into macro variables is not possible, as they are restricted to a length of 65534 (as you have seen). Always store data in datasets, that's where it belongs.

 

Finally, what are you trying to achieve with this at all?

Astounding
PROC Star

While I agree with everything that @Kurt_Bremser said, you might need to appreciate the magnitude of the problem.  This is not a simple case where you need a few extra characters to fit.  Consider what happens when you use:

%let TLIST = &TLIST &OUT_PREFIX._&I;

That's clearly the point where you overwhelm the length of a macro variable.  Suppose you cut it down as short as possible:

%let TLIST = &TLIST _&I;

After all, you could always add the prefix back in later.  However, even in this case where you only add _&I, and even in the narrow range of looping from 1000 to 9999, the length of &TLIST comes out to 53,999.  Clearly, this program will not be feasible using your current approach when you have so many files.

FreelanceReinh
Jade | Level 19

Hi @Kiteulf,

 

I also would strongly prefer one 17217-observations dataset to 17217 one-observation datasets. But if this is really what you want and the macro approach fails, then use a different approach.

 

For example, create ODS OUTPUT datasets from PROC REPORT:

 

data _tmp / view=_tmp;
set DK_PANDE2_SME;
_seqno=_n_;
run;

ods select none;
ods noresults;
options nonotes;
ods output report(match_all)=DK_PANDE3_SME_1(drop=_seqno _break_);
proc report data=_tmp;
by _seqno;
run;
ods select all;
ods results;
options notes;

proc delete data=_tmp (mt=view);
run;

Datasets DK_PANDE3_SME_1-DK_PANDE3_SME_17217 will contain the 1st, ..., 17217th observation of dataset DK_PANDE2_SME.

 

 

Kiteulf
Quartz | Level 8

Thanks

@Kurt_Bremser and @Astounding  and everybody else;

 

I took the macro I had and split into approx 100 in each data set, then I could split each of those datasets into 1 obs per dataset through looping through the 100 dataset (another macro). Works very well!

 

The reason I had to have 1 obs per dataset is that the xml in each obs is about 200 000 char long. So instead of opening the file everytime in a subsequent macro I can pick up just one observation/dataset. 

 

Time spent per operation fell with a factor of 30 🙂

Tom
Super User Tom
Super User

There is no reason to try to create the monster macro variable. 

%do I=1 %to &S;
  %let TLIST = &TLIST &OUT_PREFIX._&I;
%end;

Instead just expand the individual macro variables where you need them.

data 
%do I=1 %to &S;
 &TLIST &OUT_PREFIX._&I
%end;
;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1842 views
  • 0 likes
  • 6 in conversation