BookmarkSubscribeRSS Feed
FP12
Obsidian | Level 7

Hi all,

 

I have a huge volumetry table I need to use to do a lot of operations. In order to improve the performance we conclude the best way is to divide the table in 1000 rows part, then to call the macroA where all operation are done, and then to make an append (macro union) with the result of the previous parts on which we already have the result.

 

 

 

data _null_;
  set TABLE1 nobs=nobs ;
  file code ;
  do part=0 to int(nobs/1000);
     put '%macroA(inputtable=TABLE1, outputtable = TABLE2, size=1000,part=' part')' ;
     put '%union(inputtable = TABLE2, part = ' part')' ;
  end;
  stop;
run;
%include code / source2 ;

 

 

When running macroA on few parts, it works. When running union on few parts, it works. But I want to run it on the all parts and I have the following issue:

ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, 
       or invalid.

 

The issue pop up in the middle of the loop number 442, when my input table (TABLE1) has 442,167 rows.

The loop number 442 should concern rows from 442,000 to  443,000.

I suppose the issue is due to the fact that there are not any rows from 442,168 to 443,000.

 

But I don't know what I should do to ask my process to stop if there are no rows remaining, can you help me?

Thanks,

 

6 REPLIES 6
FP12
Obsidian | Level 7

Hi all,

 

I have a huge volumetry table I need to use to do a lot of operations. In order to improve the performance we conclude the best way is to divide the table in 1000 rows part, then to call the macroA where all operation are done, and then to make an append (macro union) with the result of the previous treated parts.

 

 

 

data _null_;
  set TABLE1 nobs=nobs ;
  file code ;
  do part=0 to int(nobs/1000);
     put '%macroA(inputtable=TABLE1, outputtable = TABLE2, size=1000,part=' part')' ;
     put '%union(inputtable = TABLE2, part = ' part')' ;
	end;
  end;
  stop;
run;
%include code / source2 ;

 

 

When running macroA on few parts, it works. When running union on few parts, it works. But I want to run it on the all parts and I have the following issue:

ERROR: Invalid DO loop control information, either the INITIAL or TO expression is missing or the BY expression is missing, zero, 
       or invalid.

 

The issue pop up in the middle of the loop number 442, when my input table (TABLE1) has 442,167 rows.

The loop number 442 should concern rows from 442,000 to  443,000.

I suppose the issue is due to the fact that there are not any rows from 442,168 to 443,000.

 

But I don't know what I should do to ask my process to stop if there are no rows remaining, can you help me?

Thanks,

 

ChrisNZ
Tourmaline | Level 20

What @RW9 said.

What does your macro do? I havent seen many programs that cannot be speeded up. 

 

To answer your question, I hope you do something like this to read the subsets:

data T;
  set SASHELP.CLASS(firstobs=1 obs=999);
run;

in which case a number that's too large doesn't matter.

 

But that's a bad idea for the reasons explained. 

Satish_Parida
Lapis Lazuli | Level 10
Please provide the macro scripts.
FredrikE
Rhodochrosite | Level 12

I think it is depending on how you limit the observations in macroA.

If you use "if _n_ > part and _n_ < part+size" it should work ?!

 

I would have used a macro instead of a data step, think it is a bit easier....

 

proc sql noprint;
    select int(count(*)/1000) into :nobs from TABLE1;
quit;

 

%macro loop;
  %do part = 0 %to &nobs;
    %macroA(inputtable=TABLE1, outputtable = TABLE2, size=1000, part=&part); 
    %union(inputtable = TABLE2, part=&part);
  %end;
%mend loop;
%loop;

 

//Fredrik

RW9
Diamond | Level 26 RW9
Diamond | Level 26

And how is that going to speed things up?  What you are doing is this:

Creating lots of files, each with their own copy of the header information - thus using more disk space

You are then starting and stopping procedures over and over again - using more ram and cpu

Then you are writing lots of messy code which loops over lots of data - uses more programmer resource to build an maintain

Finally you then have to write code to put all this back together again - this adds to disk space use, programmer resource etc.

 

Thus the only logical conclusion is to not split similar data up as that provides no benefit at all.  If it takes a long time, develop on a subset, then leave the program on the server or a spare desktop running, or assign more resources to it.

mkeintz
PROC Star

While I agree entirely  with @RW9's observation that your  strategy is likely to make your project more inefficient, there is still a diagnosis to be made on the symptom you report.  Without seeing the macro code, I suspect that the SIZE parameter in macroA has to be less than 1000 for the last iteration of the loop whenever dataset TABLE1 does not have NOBS in the exact 1,000's.  If so, you can do this:

 

data _null_;
  set TABLE1 nobs=nobs ;
  file code ;

  nloops=int(nobs/1000);
  do part=0 to nloops;
     if part=nloops then size=mod(nobs,1000);
	 else size=1000;

     put '%macroA(inputtable=TABLE1, outputtable = TABLE2, size=' size ',part=' part')' ;
     put '%union(inputtable = TABLE2, part = ' part')' ;
  end;
  stop;
run;
%include code / source2 ;

You can test this on a  dataset with, say, 1011 observations. No need to wait for 442 iterations.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 2546 views
  • 5 likes
  • 6 in conversation