BookmarkSubscribeRSS Feed
Patrick
Opal | Level 21

Hi all,

 

I'm dealing with a case where analysts/modellers are dealing with Base SAS analytical tables that can get up-to 2.5TB in volume (compressed) with up-to 1000 variables.

 

Obviously such volumes are not ideal for adhoc analysis especially because there aren't many indexes on these tables.

 

I can't do anything about these source tables and though what I'm thinking:

Write a macro that allows these users to create a copy of such tables under their user folder with ideally only a sub-set of columns and added indexes of their choice.

 

Because even reading a 2.5TB table also takes really long, I'm trying to come-up with a table copy approach that performs as good as possible.

 

The environment is a SAS Grid, SAS 9.4M5, rhel with 32 cpu's per node (actual). The default for an EG session the users spawn are 4 nodes.

 

Below demo code is the best I could come up with and I'd love your advise if there is anything else I could do to improve performance both for creating the copy as well as query performance using where clauses.

 

I assume for the copy and index creation process it's all about minimizing i/o and passes through the data.

For this reason, I've added creation of new indexes as data set option to the append base table ...but I don't really know if SAS will actually create these indexes via a single pass through the data together with the copy process or if creating the indexes later on via a proc datasets create index would perform the same.

 

I'm creating the target table using the spde engine. I don't have a lot of hands-on experience with this engine but I believe because the data copy needs to be under a user folder, there is no use for more than one data or index folder.

About the number of partitions: Based on what I've read in the docu, I'm thinking about defining them as "size of source tables/number of actual cpu". Not sure if I got that right though.

 

And here the sample code (self contained and fully executable as is).

options fullstimer msglevel=i dlcreatedir;
%let root_path=%sysfunc(pathname(work));
%let tbl      =analytical_table;

/*-- source table: existing analytical table with up to 2.5TB (compressed) and up to 1000 variables --*/
libname src "&root_path.\src";
data src.&tbl.;
  set sashelp.class;  
  array var_ {20} 8 ;
  do k=1 to 10000;
    do i=1 to dim(var_);
      var_[i]=i*_n_;
    end;
    output;
  end;
run;
proc sort data=src.&tbl. out=src.&tbl.(index=(name age n_a=(name age)));
  by name;
run;


/*-- target table: copy of analytical table with added indexes and reduced number of columns --*/

/* create required folders */
libname trg "&root_path.\trg"; 
libname trg "&root_path.\trg\spde_&tbl";

/* assign target library using spde engine */ 
libname trg spde "&root_path.\trg" datapath=("&root_path.\trg\spde_&tbl") indexpath=("&root_path.\trg\spde_&tbl") compress=char /*partsize=&gb_part_size.G*/;

/* create target table with added indexes and less columns */
proc datasets lib=trg nolist nowarn;
  delete &tbl.;
  run;
  append
    base=trg.&tbl.(index=(var_1 var_2 idx_01=(var_1 var_2)))
    data=src.&tbl.(keep=name--var_4)
    getsort
    ;
  run;
quit;

/* list indexes on target table */
proc sql;
  select *
  from dictionary.indexes
  where libname='TRG' and memname=%upcase("&tbl.")
  ;
quit;

 

Thanks, Patrick

2 REPLIES 2
Ksharp
Super User
Patrick,
Since you have already used SPDE engine, I think you could check options too :
option bufsize=128k bufno=100;

these options might enhance your performance of copying sas dataset.
LinusH
Tourmaline | Level 20

Generally speaking, SPDE does not usually speed up writing, because the data is still fed in single thread.

But it has a flip side when it comes to updating indexes, which is done in parallel while inserting data.

And of course, querying the data...

Data never sleeps

Catch up on SAS Innovate 2026

Dive into keynotes, announcements and breakthroughs on demand.

Explore Now →
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 237 views
  • 2 likes
  • 3 in conversation