<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>How to improve performance copying a huge table while also creating additional indexes in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983487#M379494</link>
    <description>&lt;P&gt;Hi all,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm dealing with a case where analysts/modellers work with Base SAS analytical tables that can reach up to 2.5TB in volume (compressed) with up to 1,000 variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Obviously such volumes are not ideal for ad hoc analysis, especially because there aren't many indexes on these tables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I can't do anything about these source tables, so here is what I'm thinking:&lt;/P&gt;
&lt;P&gt;Write a macro that allows these users to create a copy of such a table under their user folder, ideally with only a subset of the columns and with added indexes of their choice.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Because even just reading a 2.5TB table takes a long time, I'm trying to come up with a table copy approach that performs as well as possible.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The environment is a SAS Grid, SAS 9.4M5, on RHEL with 32 physical CPUs per node. The default for the EG sessions the users spawn is 4 nodes.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The demo code below is the best I could come up with, and I'd love your advice on anything else I could do to improve performance, both for creating the copy and for querying it with WHERE clauses.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I assume that for the copy and index creation process it's all about minimizing I/O and the number of passes through the data.&lt;/P&gt;
&lt;P&gt;For this reason, I've added the creation of the new indexes as a data set option on the APPEND base table... but I don't really know whether SAS will actually build these indexes in a single pass through the data together with the copy, or whether creating them afterwards via PROC DATASETS INDEX CREATE would perform the same.&lt;/P&gt;
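&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Just to be explicit, the post-copy alternative I have in mind would be roughly the following sketch (re-using the index names from the demo code further down):&lt;/P&gt;
&lt;PRE&gt;/* alternative: build the indexes in a second pass after the copy */
proc datasets lib=trg nolist;
  modify &amp;amp;tbl.;
  index create var_1 var_2 idx_01=(var_1 var_2);
  run;
quit;&lt;/PRE&gt;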
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm creating the target table using the SPDE engine. I don't have a lot of hands-on experience with this engine, but because the copy needs to live under a user folder, I believe there is no point in using more than one data or index path.&lt;/P&gt;
&lt;P&gt;About the partition size: based on what I've read in the documentation, I'm thinking of defining it as "size of source table / number of physical CPUs". Not sure I got that right, though.&lt;/P&gt;
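&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As a sketch of that computation (the 2500GB table size is just an assumed example value, not a measurement):&lt;/P&gt;
&lt;PRE&gt;/* partition size so that number of partitions ~ number of physical CPUs */
%let tbl_size_gb =2500;  /* assumed compressed size of the source table in GB */
%let n_cpu       =32;    /* physical CPUs per node */
%let gb_part_size=%sysevalf(&amp;amp;tbl_size_gb. / &amp;amp;n_cpu., ceil);
%put NOTE: would use partsize=&amp;amp;gb_part_size.G;&lt;/PRE&gt;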
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;And here is the sample code (self-contained and fully executable as is).&lt;/P&gt;
&lt;PRE&gt;options fullstimer msglevel=i dlcreatedir;
%let root_path=%sysfunc(pathname(work));
%let tbl      =analytical_table;

/*-- source table: existing analytical table with up to 2.5TB (compressed) and up to 1000 variables --*/
libname src "&amp;amp;root_path./src";
data src.&amp;amp;tbl.;
  set sashelp.class;  
  array var_ {20} 8 ;
  do k=1 to 10000;
    do i=1 to dim(var_);
      var_[i]=i*_n_;
    end;
    output;
  end;
run;
proc sort data=src.&amp;amp;tbl. out=src.&amp;amp;tbl.(index=(name age n_a=(name age)));
  by name;
run;


/*-- target table: copy of analytical table with added indexes and reduced number of columns --*/

/* create required folders */
libname trg "&amp;amp;root_path./trg"; 
libname trg "&amp;amp;root_path./trg/spde_&amp;amp;tbl.";

/* assign target library using spde engine */ 
libname trg spde "&amp;amp;root_path./trg" datapath=("&amp;amp;root_path./trg/spde_&amp;amp;tbl.") indexpath=("&amp;amp;root_path./trg/spde_&amp;amp;tbl.") compress=char /*partsize=&amp;amp;gb_part_size.G*/;

/* create target table with added indexes and less columns */
proc datasets lib=trg nolist nowarn;
  delete &amp;amp;tbl.;
  run;
  append
    base=trg.&amp;amp;tbl.(index=(var_1 var_2 idx_01=(var_1 var_2)))
    data=src.&amp;amp;tbl.(keep=name--var_4)
    getsort
    ;
  run;
quit;

/* list indexes on target table */
proc sql;
  select *
  from dictionary.indexes
  where libname='TRG' and memname=%upcase("&amp;amp;tbl.")
  ;
quit;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks, Patrick&lt;/P&gt;</description>
    <pubDate>Sat, 14 Feb 2026 06:01:01 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2026-02-14T06:01:01Z</dc:date>
    <item>
      <title>How to improve performance copying a huge table while also creating additional indexes</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983487#M379494</link>
      <pubDate>Sat, 14 Feb 2026 06:01:01 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983487#M379494</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2026-02-14T06:01:01Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance copying a huge table while also creating additional indexes</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983503#M379501</link>
      <description>Patrick,&lt;BR /&gt;Since you are already using the SPDE engine, I think you could also check these options:&lt;BR /&gt;options bufsize=128k bufno=100;&lt;BR /&gt;&lt;BR /&gt;These options might enhance the performance of copying the SAS dataset.&lt;BR /&gt;</description>
      <pubDate>Sun, 15 Feb 2026 02:51:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983503#M379501</guid>
      <dc:creator>Ksharp</dc:creator>
      <dc:date>2026-02-15T02:51:03Z</dc:date>
    </item>
    <item>
      <title>Re: How to improve performance copying a huge table while also creating additional indexes</title>
      <link>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983517#M379503</link>
      <description>&lt;P&gt;Generally speaking, SPDE does not usually speed up writing, because the data is still fed in a single thread.&lt;/P&gt;
&lt;P&gt;But there is an upside when it comes to updating indexes, which are maintained in parallel while the data is inserted.&lt;/P&gt;
&lt;P&gt;And of course, querying the data...&lt;/P&gt;</description>
      <pubDate>Mon, 16 Feb 2026 08:30:44 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/How-to-improve-performance-copying-a-huge-table-while-also/m-p/983517#M379503</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2026-02-16T08:30:44Z</dc:date>
    </item>
  </channel>
</rss>

