poumy


SAS Employee


Re: SAS with Hadoop: Performance considerations and monitoring strateg...

10-04-2016 03:59 AM

SAS with Hadoop: Performance considerations and monitoring strategies

10-03-2016 03:57 AM

Activity Feed for poumy

Posted Re: SAS with Hadoop: Performance considerations and monitoring strategies on SAS Data Management. 10-04-2016 03:59 AM
Posted SAS with Hadoop: Performance considerations and monitoring strategies on SAS Data Management. 10-03-2016 03:57 AM


Thank you @rogerjdeangelis and @LinusH for your comments. Unfortunately, the sample table used here is not publicly available at the moment.

The SAS code used is very basic. For example, the PROC HPSUMMARY step is something like:

    /* Run PROC HPSUMMARY across the nodes.                         */
    /* hivelib is the library pointing to the Hive table; its       */
    /* LIBNAME statement is not shown here.                         */
    proc hpsummary data=hivelib.megacorp2;
        /* NODES=ALL uses all available nodes; DETAILS prints a     */
        /* timing breakdown of the procedure steps.                 */
        performance nodes=all details;
        var expenses;
        output out=work.expenses_by_products;
        class productbrand y;
    run;

These tests are only indicative: they are a way to show that the choice of file format for Hadoop storage can matter, depending on your use case. They should not be considered a reference (unlike the official benchmarks that are frequently published by our EEC service).

Regarding SPDE, please note that for the comparison I used the SPDE format on HDFS, not the traditional SPDE format on a local file system.

I agree with LinusH's final comment: Hadoop does not necessarily mean better performance, especially if your SAS server is well tuned, has good I/O, and the table is not that big. Hadoop was really designed to provide scalability for huge amounts of data that could not fit on a single machine, or that could not be processed in time or efficiently using SMP.

Thanks,
Raphael


Date Last Visited: 10-04-2018 06:47 AM