Re: Loop stop issue

FP12 · Posted 02-01-2018 06:59 AM

Hi all,

I have a huge volumetry table I need to use to do a lot of operations. In order to improve the performance we conclude the best way is to divide the table in 1000 rows part, then to call the macroA where all operation are done, and then to make an append (macro union) with the result of the previous parts on which we already have the result.

data _null_;
  set TABLE1 nobs=nobs ;
  file code ;
  do part=0 to int(nobs/1000);
     put '%macroA(inputtable=TABLE1, outputtable = TABLE2, size=1000,part=' part')' ;
     put '%union(inputtable = TABLE2, part = ' part')' ;
  end;
  stop;
run;
%include code / source2 ;

When running macroA on few parts, it works. When running union on few parts, it works. But I want to run it on the all parts and I have the following issue:

ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, 
       or invalid.

The issue pop up in the middle of the loop number 442, when my input table (TABLE1) has 442,167 rows.

The loop number 442 should concern rows from 442,000 to 443,000.

I suppose the issue is due to the fact that there are not any rows from 442,168 to 443,000.

But I don't know what I should do to ask my process to stop if there are no rows remaining, can you help me?

Thanks,

FP12 · Posted 02-01-2018 06:56 AM

Hi all,

I have a huge volumetry table I need to use to do a lot of operations. In order to improve the performance we conclude the best way is to divide the table in 1000 rows part, then to call the macroA where all operation are done, and then to make an append (macro union) with the result of the previous treated parts.

data _null_;
  set TABLE1 nobs=nobs ;
  file code ;
  do part=0 to int(nobs/1000);
     put '%macroA(inputtable=TABLE1, outputtable = TABLE2, size=1000,part=' part')' ;
     put '%union(inputtable = TABLE2, part = ' part')' ;
	end;
  end;
  stop;
run;
%include code / source2 ;

When running macroA on few parts, it works. When running union on few parts, it works. But I want to run it on the all parts and I have the following issue:

ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, 
       or invalid.

The issue pop up in the middle of the loop number 442, when my input table (TABLE1) has 442,167 rows.

The loop number 442 should concern rows from 442,000 to 443,000.

I suppose the issue is due to the fact that there are not any rows from 442,168 to 443,000.

But I don't know what I should do to ask my process to stop if there are no rows remaining, can you help me?

Thanks,

ChrisNZ · Posted 02-01-2018 05:26 PM

What @RW9 said.

What does your macro do? I havent seen many programs that cannot be speeded up.

To answer your question, I hope you do something like this to read the subsets:

data T;
  set SASHELP.CLASS(firstobs=1 obs=999);
run;

in which case a number that's too large doesn't matter.

But that's a bad idea for the reasons explained.

High-Performance SAS Coding - Third Edition

Satish_Parida · Posted 02-01-2018 07:10 AM

Please provide the macro scripts.

FredrikE · Posted 02-01-2018 07:23 AM

I think it is depending on how you limit the observations in macroA.

If you use "if _n_ > part and _n_ < part+size" it should work ?!

I would have used a macro instead of a data step, think it is a bit easier....

proc sql noprint;
select int(count(*)/1000) into :nobs from TABLE1;
quit;

%macro loop;
%do part = 0 %to &nobs;
%macroA(inputtable=TABLE1, outputtable = TABLE2, size=1000, part=&part);
%union(inputtable = TABLE2, part=&part);
%end;
%mend loop;
%loop;

//Fredrik

RW9 · Posted 02-01-2018 07:56 AM

And how is that going to speed things up? What you are doing is this:

Creating lots of files, each with their own copy of the header information - thus using more disk space

You are then starting and stopping procedures over and over again - using more ram and cpu

Then you are writing lots of messy code which loops over lots of data - uses more programmer resource to build an maintain

Finally you then have to write code to put all this back together again - this adds to disk space use, programmer resource etc.

Thus the only logical conclusion is to not split similar data up as that provides no benefit at all. If it takes a long time, develop on a subset, then leave the program on the server or a spare desktop running, or assign more resources to it.

mkeintz · Posted 02-03-2018 09:07 AM

While I agree entirely with @RW9's observation that your strategy is likely to make your project more inefficient, there is still a diagnosis to be made on the symptom you report. Without seeing the macro code, I suspect that the SIZE parameter in macroA has to be less than 1000 for the last iteration of the loop whenever dataset TABLE1 does not have NOBS in the exact 1,000's. If so, you can do this:

data _null_;
  set TABLE1 nobs=nobs ;
  file code ;

  nloops=int(nobs/1000);
  do part=0 to nloops;
     if part=nloops then size=mod(nobs,1000);
	 else size=1000;

     put '%macroA(inputtable=TABLE1, outputtable = TABLE2, size=' size ',part=' part')' ;
     put '%union(inputtable = TABLE2, part = ' part')' ;
  end;
  stop;
run;
%include code / source2 ;

You can test this on a dataset with, say, 1011 observations. No need to wait for 442 iterations.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Loop stop issue

Loop issue

Re: Loop issue