Thank you @rogerjdeangelis and @LinusH for your comments.
Unfortunately, the sample table used here is not publicly available at the moment. The SAS code used is very basic. For example, for PROC HPSUMMARY it is something like:
/* Run PROC HPSUMMARY across the nodes */
proc hpsummary data=hivelib.megacorp2;
   performance nodes=all details;
   var expenses;
   output out=work.expenses_by_products;
   class productbrand y;
run;
These tests are only indicative. They are a way to show that the choice of file format for Hadoop storage can matter, depending on your use case. They should not be considered a reference, unlike the official benchmarks that are frequently published by our EEC service. Regarding SPDE, please note that for the comparison I used the SPDE format on HDFS, not the traditional SPDE format on a local file system.
I agree with LinusH's final comments: Hadoop does not necessarily mean better performance, especially if your SAS server is well tuned and has good I/O, and the table is not that big. Hadoop was really designed to provide scalability for huge amounts of data that either cannot fit on a single machine or cannot be processed in time or efficiently on a single SMP server.
Thanks
Raphael