<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>How to improve performance copying a huge table while also creating additional indexes in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983487#M379494</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm dealing with a case where analysts/modellers work with Base SAS analytical tables that can reach up to 2.5TB in volume (compressed) with up to 1,000 variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Obviously such volumes are not ideal for ad hoc analysis, especially because there aren't many indexes on these tables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I can't do anything about these source tables, so here is what I'm thinking:&lt;/P&gt;
&lt;P&gt;Write a macro that allows these users to create a copy of such a table under their user folder, ideally with only a subset of the columns and with added indexes of their choice.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Because even just reading a 2.5TB table takes a long time, I'm trying to come up with a table copy approach that performs as well as possible.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The environment is a SAS Grid, SAS 9.4M5, on RHEL with 32 physical CPUs per node. The default for the EG sessions the users spawn is 4 nodes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The demo code below is the best I could come up with, and I'd love your advice on anything else I could do to improve performance, both for creating the copy and for querying it with WHERE clauses.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I assume that for the copy and index creation process it's all about minimizing I/O and the number of passes through the data.&lt;/P&gt;
&lt;P&gt;For this reason, I've added the creation of the new indexes as a data set option on the APPEND base table... but I don't really know whether SAS will actually build these indexes in a single pass through the data together with the copy, or whether creating them afterwards via PROC DATASETS INDEX CREATE would perform the same.&lt;/P&gt;
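&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Just to be explicit, the post-copy alternative I have in mind would be roughly the following sketch (re-using the index names from the demo code further down):&lt;/P&gt;
&lt;PRE&gt;/* alternative: build the indexes in a second pass after the copy */
proc datasets lib=trg nolist;
  modify &amp;amp;tbl.;
  index create var_1 var_2 idx_01=(var_1 var_2);
  run;
quit;&lt;/PRE&gt;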
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm creating the target table using the SPDE engine. I don't have a lot of hands-on experience with this engine, but because the copy needs to live under a user folder, I believe there is no point in using more than one data or index path.&lt;/P&gt;
&lt;P&gt;About the partition size: based on what I've read in the documentation, I'm thinking of defining it as "size of source table / number of physical CPUs". Not sure I got that right, though.&lt;/P&gt;
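&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As a sketch of that computation (the 2500GB table size is just an assumed example value, not a measurement):&lt;/P&gt;
&lt;PRE&gt;/* partition size so that number of partitions ~ number of physical CPUs */
%let tbl_size_gb =2500;  /* assumed compressed size of the source table in GB */
%let n_cpu       =32;    /* physical CPUs per node */
%let gb_part_size=%sysevalf(&amp;amp;tbl_size_gb. / &amp;amp;n_cpu., ceil);
%put NOTE: would use partsize=&amp;amp;gb_part_size.G;&lt;/PRE&gt;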
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And here is the sample code (self-contained and fully executable as is).&lt;/P&gt;
&lt;PRE&gt;options fullstimer msglevel=i dlcreatedir;
%let root_path=%sysfunc(pathname(work));
%let tbl      =analytical_table;

/*-- source table: existing analytical table with up to 2.5TB (compressed) and up to 1000 variables --*/
libname src "&amp;amp;root_path./src";
data src.&amp;amp;tbl.;
  set sashelp.class;  
  array var_ {20} 8 ;
  do k=1 to 10000;
    do i=1 to dim(var_);
      var_[i]=i*_n_;
    end;
    output;
  end;
run;
proc sort data=src.&amp;amp;tbl. out=src.&amp;amp;tbl.(index=(name age n_a=(name age)));
  by name;
run;


/*-- target table: copy of analytical table with added indexes and reduced number of columns --*/

/* create required folders */
libname trg "&amp;amp;root_path./trg"; 
libname trg "&amp;amp;root_path./trg/spde_&amp;amp;tbl.";

/* assign target library using spde engine */ 
libname trg spde "&amp;amp;root_path./trg" datapath=("&amp;amp;root_path./trg/spde_&amp;amp;tbl.") indexpath=("&amp;amp;root_path./trg/spde_&amp;amp;tbl.") compress=char /*partsize=&amp;amp;gb_part_size.G*/;

/* create target table with added indexes and less columns */
proc datasets lib=trg nolist nowarn;
  delete &amp;amp;tbl.;
  run;
  append
    base=trg.&amp;amp;tbl.(index=(var_1 var_2 idx_01=(var_1 var_2)))
    data=src.&amp;amp;tbl.(keep=name--var_4)
    getsort
    ;
  run;
quit;

/* list indexes on target table */
proc sql;
  select *
  from dictionary.indexes
  where libname='TRG' and memname=%upcase("&amp;amp;tbl.")
  ;
quit;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks, Patrick&lt;/P&gt;</description>
    <pubDate>Sat, 14 Feb 2026 06:01:01 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2026-02-14T06:01:01Z</dc:date>
    <item>
      <title>How to improve performance copying a huge table while also creating additional indexes</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983487#M379494</link>
      <pubDate>Sat, 14 Feb 2026 06:01:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983487#M379494</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2026-02-14T06:01:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance copying a huge table while also creating additional indexes</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983503#M379501</link>
      <description>Patrick,&lt;BR /&gt;Since you are already using the SPDE engine, I think you could also check these options:&lt;BR /&gt;options bufsize=128k bufno=100;&lt;BR /&gt;&lt;BR /&gt;These options might enhance the performance of copying the SAS dataset.&lt;BR /&gt;</description>
      <pubDate>Sun, 15 Feb 2026 02:51:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983503#M379501</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2026-02-15T02:51:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance copying a huge table while also creating additional indexes</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983517#M379503</link>
      <description>&lt;P&gt;Generally speaking, SPDE does not usually speed up writing, because the data is still fed in a single thread.&lt;/P&gt;
&lt;P&gt;But there is an upside when it comes to updating indexes, which are maintained in parallel while the data is inserted.&lt;/P&gt;
&lt;P&gt;And of course, querying the data...&lt;/P&gt;</description>
      <pubDate>Mon, 16 Feb 2026 08:30:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983517#M379503</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2026-02-16T08:30:44Z</dc:date>
    </item>
  </channel>
</rss>

