DATA Step, Macro, Functions and more

Loop stop issue

Reply
Contributor
Posts: 41

Loop stop issue

[ Edited ]

Hi all,

 

I have a huge volumetry table I need to use to do a lot of operations. In order to improve the performance we conclude the best way is to divide the table in 1000 rows part, then to call the macroA where all operation are done, and then to make an append (macro union) with the result of the previous parts on which we already have the result.

 

 

 

data _null_;
  set TABLE1 nobs=nobs ;
  file code ;
  do part=0 to int(nobs/1000);
     put '%macroA(inputtable=TABLE1, outputtable = TABLE2, size=1000,part=' part')' ;
     put '%union(inputtable = TABLE2, part = ' part')' ;
  end;
  stop;
run;
%include code / source2 ;

 

 

When running macroA on few parts, it works. When running union on few parts, it works. But I want to run it on the all parts and I have the following issue:

ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, 
       or invalid.

 

The issue pop up in the middle of the loop number 442, when my input table (TABLE1) has 442,167 rows.

The loop number 442 should concern rows from 442,000 to  443,000.

I suppose the issue is due to the fact that there are not any rows from 442,168 to 443,000.

 

But I don't know what I should do to ask my process to stop if there are no rows remaining, can you help me?

Thanks,

 

Frequent Contributor
Posts: 109

Re: Loop stop issue

Please provide the macro scripts.
Super Contributor
Posts: 391

Re: Loop stop issue

I think it is depending on how you limit the observations in macroA.

If you use "if _n_ > part and _n_ < part+size" it should work ?!

 

I would have used a macro instead of a data step, think it is a bit easier....

 

proc sql noprint;
    select int(count(*)/1000) into :nobs from TABLE1;
quit;

 

%macro loop;
  %do part = 0 %to &nobs;
    %macroA(inputtable=TABLE1, outputtable = TABLE2, size=1000, part=&part); 
    %union(inputtable = TABLE2, part=&part);
  %end;
%mend loop;
%loop;

 

//Fredrik

Highlighted
Contributor
Posts: 41

Loop issue

Hi all,

 

I have a huge volumetry table I need to use to do a lot of operations. In order to improve the performance we conclude the best way is to divide the table in 1000 rows part, then to call the macroA where all operation are done, and then to make an append (macro union) with the result of the previous treated parts.

 

 

 

data _null_;
  set TABLE1 nobs=nobs ;
  file code ;
  do part=0 to int(nobs/1000);
     put '%macroA(inputtable=TABLE1, outputtable = TABLE2, size=1000,part=' part')' ;
     put '%union(inputtable = TABLE2, part = ' part')' ;
	end;
  end;
  stop;
run;
%include code / source2 ;

 

 

When running macroA on few parts, it works. When running union on few parts, it works. But I want to run it on the all parts and I have the following issue:

ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, 
       or invalid.

 

The issue pop up in the middle of the loop number 442, when my input table (TABLE1) has 442,167 rows.

The loop number 442 should concern rows from 442,000 to  443,000.

I suppose the issue is due to the fact that there are not any rows from 442,168 to 443,000.

 

But I don't know what I should do to ask my process to stop if there are no rows remaining, can you help me?

Thanks,

 

PROC Star
Posts: 2,353

Re: Loop issue

What @RW9 said.

What does your macro do? I havent seen many programs that cannot be speeded up. 

 

To answer your question, I hope you do something like this to read the subsets:

data T;
  set SASHELP.CLASS(firstobs=1 obs=999);
run;

in which case a number that's too large doesn't matter.

 

But that's a bad idea for the reasons explained. 

Super User
Super User
Posts: 9,599

Re: Loop stop issue

And how is that going to speed things up?  What you are doing is this:

Creating lots of files, each with their own copy of the header information - thus using more disk space

You are then starting and stopping procedures over and over again - using more ram and cpu

Then you are writing lots of messy code which loops over lots of data - uses more programmer resource to build an maintain

Finally you then have to write code to put all this back together again - this adds to disk space use, programmer resource etc.

 

Thus the only logical conclusion is to not split similar data up as that provides no benefit at all.  If it takes a long time, develop on a subset, then leave the program on the server or a spare desktop running, or assign more resources to it.

Trusted Advisor
Posts: 1,337

Re: Loop stop issue

While I agree entirely  with @RW9's observation that your  strategy is likely to make your project more inefficient, there is still a diagnosis to be made on the symptom you report.  Without seeing the macro code, I suspect that the SIZE parameter in macroA has to be less than 1000 for the last iteration of the loop whenever dataset TABLE1 does not have NOBS in the exact 1,000's.  If so, you can do this:

 

data _null_;
  set TABLE1 nobs=nobs ;
  file code ;

  nloops=int(nobs/1000);
  do part=0 to nloops;
     if part=nloops then size=mod(nobs,1000);
	 else size=1000;

     put '%macroA(inputtable=TABLE1, outputtable = TABLE2, size=' size ',part=' part')' ;
     put '%union(inputtable = TABLE2, part = ' part')' ;
  end;
  stop;
run;
%include code / source2 ;

You can test this on a  dataset with, say, 1011 observations. No need to wait for 442 iterations.

Ask a Question
Discussion stats
  • 6 replies
  • 353 views
  • 5 likes
  • 6 in conversation