Hi all,
I have a huge volumetry table I need to use to do a lot of operations. In order to improve the performance we conclude the best way is to divide the table in 1000 rows part, then to call the macroA where all operation are done, and then to make an append (macro union) with the result of the previous parts on which we already have the result.
data _null_; set TABLE1 nobs=nobs ; file code ; do part=0 to int(nobs/1000); put '%macroA(inputtable=TABLE1, outputtable = TABLE2, size=1000,part=' part')' ; put '%union(inputtable = TABLE2, part = ' part')' ; end; stop; run; %include code / source2 ;
When running macroA on few parts, it works. When running union on few parts, it works. But I want to run it on the all parts and I have the following issue:
ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, or invalid.
The issue pop up in the middle of the loop number 442, when my input table (TABLE1) has 442,167 rows.
The loop number 442 should concern rows from 442,000 to 443,000.
I suppose the issue is due to the fact that there are not any rows from 442,168 to 443,000.
But I don't know what I should do to ask my process to stop if there are no rows remaining, can you help me?
Thanks,
Hi all,
I have a huge volumetry table I need to use to do a lot of operations. In order to improve the performance we conclude the best way is to divide the table in 1000 rows part, then to call the macroA where all operation are done, and then to make an append (macro union) with the result of the previous treated parts.
data _null_; set TABLE1 nobs=nobs ; file code ; do part=0 to int(nobs/1000); put '%macroA(inputtable=TABLE1, outputtable = TABLE2, size=1000,part=' part')' ; put '%union(inputtable = TABLE2, part = ' part')' ; end; end; stop; run; %include code / source2 ;
When running macroA on few parts, it works. When running union on few parts, it works. But I want to run it on the all parts and I have the following issue:
ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, or invalid.
The issue pop up in the middle of the loop number 442, when my input table (TABLE1) has 442,167 rows.
The loop number 442 should concern rows from 442,000 to 443,000.
I suppose the issue is due to the fact that there are not any rows from 442,168 to 443,000.
But I don't know what I should do to ask my process to stop if there are no rows remaining, can you help me?
Thanks,
What @RW9 said.
What does your macro do? I havent seen many programs that cannot be speeded up.
To answer your question, I hope you do something like this to read the subsets:
data T;
set SASHELP.CLASS(firstobs=1 obs=999);
run;
in which case a number that's too large doesn't matter.
But that's a bad idea for the reasons explained.
I think it is depending on how you limit the observations in macroA.
If you use "if _n_ > part and _n_ < part+size" it should work ?!
I would have used a macro instead of a data step, think it is a bit easier....
proc sql noprint;
select int(count(*)/1000) into :nobs from TABLE1;
quit;
%macro loop;
%do part = 0 %to &nobs;
%macroA(inputtable=TABLE1, outputtable = TABLE2, size=1000, part=&part);
%union(inputtable = TABLE2, part=&part);
%end;
%mend loop;
%loop;
//Fredrik
And how is that going to speed things up? What you are doing is this:
Creating lots of files, each with their own copy of the header information - thus using more disk space
You are then starting and stopping procedures over and over again - using more ram and cpu
Then you are writing lots of messy code which loops over lots of data - uses more programmer resource to build an maintain
Finally you then have to write code to put all this back together again - this adds to disk space use, programmer resource etc.
Thus the only logical conclusion is to not split similar data up as that provides no benefit at all. If it takes a long time, develop on a subset, then leave the program on the server or a spare desktop running, or assign more resources to it.
While I agree entirely with @RW9's observation that your strategy is likely to make your project more inefficient, there is still a diagnosis to be made on the symptom you report. Without seeing the macro code, I suspect that the SIZE parameter in macroA has to be less than 1000 for the last iteration of the loop whenever dataset TABLE1 does not have NOBS in the exact 1,000's. If so, you can do this:
data _null_;
set TABLE1 nobs=nobs ;
file code ;
nloops=int(nobs/1000);
do part=0 to nloops;
if part=nloops then size=mod(nobs,1000);
else size=1000;
put '%macroA(inputtable=TABLE1, outputtable = TABLE2, size=' size ',part=' part')' ;
put '%union(inputtable = TABLE2, part = ' part')' ;
end;
stop;
run;
%include code / source2 ;
You can test this on a dataset with, say, 1011 observations. No need to wait for 442 iterations.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.