12-01-2014 02:23 AM
I am trying to work out how much disk space to allocate for the sasdata and saswork locations on my Linux server.
I have around 50 GB of raw data to start with, from which I need to build a data mart for the analysis team.
The team will use this data mart for the various analyses they run.
Any suggestions on how I can calculate the space requirement?
12-01-2014 02:48 AM
I can give you an initial estimate (based on nothing but your raw data): about 100 GB for data and 200 GB for work and temp. You should also keep in mind the sizing for your users' SAS folders.
By the way, everything will depend on:
- the number of users
- the predicted growth of your data
- other data sources beyond the raw data
- other variables specific to your business needs
Keep the second point in mind: the future. I would never advise sizing only for the present; plan for growth so you don't run into trouble with your disks in the near future. SAS data and requirements grow pretty fast!
12-01-2014 04:14 AM
- Size of data. Depending on the format of the raw data, the SAS tables will be between 0.5 and 2.0 times the size of your source data (using compress=yes); tables with wide character columns will shrink considerably. Compression also reduces I/O load during processing.
- Number of users, especially concurrent users. Size your SASWORK along the lines of ((size of biggest dataset, uncompressed!) * 3 + (size of biggest dataset, probably merged) * 2) * (number of concurrent users). SASUTIL should be (size of biggest dataset, uncompressed!) * (number of concurrent users).
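To make the two rules of thumb above easier to play with, here is a minimal Python sketch. The function names and the example figures (a 20 GB biggest dataset, a 30 GB merged dataset, 5 concurrent users) are made up for illustration; only the 50 GB of raw data, the 0.5 to 2.0 factor and the formulas themselves come from this thread.

```python
def sas_data_range_gb(raw_gb, low=0.5, high=2.0):
    """Rough range for SAS table size: 0.5x to 2.0x the raw data (compress=yes)."""
    return raw_gb * low, raw_gb * high


def saswork_gb(biggest_uncompressed_gb, biggest_merged_gb, concurrent_users):
    """SASWORK: (biggest uncompressed * 3 + biggest merged * 2) * concurrent users."""
    return (biggest_uncompressed_gb * 3 + biggest_merged_gb * 2) * concurrent_users


def sasutil_gb(biggest_uncompressed_gb, concurrent_users):
    """SASUTIL: biggest uncompressed dataset * concurrent users."""
    return biggest_uncompressed_gb * concurrent_users


# Hypothetical inputs, for illustration only (not measured values).
raw_gb = 50            # raw data size from the original question
biggest_gb = 20        # assumed size of the biggest dataset, uncompressed
merged_gb = 30         # assumed size of that dataset after a typical merge
users = 5              # assumed number of concurrent users

print("SAS data range (GB):", sas_data_range_gb(raw_gb))
print("SASWORK (GB):", saswork_gb(biggest_gb, merged_gb, users))
print("SASUTIL (GB):", sasutil_gb(biggest_gb, users))
```

Plug in your own measurements for the biggest dataset and the expected number of concurrent users; the result is a starting point to refine against observed usage, not a guarantee.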
Wherever users have write access, set up a quota system so that a single user can't cause a loss of service for everyone else. This means SASWORK, SASUTIL and the volume for the home directories (where the SASUSER libs will reside). If the quotas are sized correctly, a quota overrun will signal bad coding practice (like a piece of SQL that causes a Cartesian join, where 10,000 * 10,000 suddenly results in 100,000,000 records).
As Juan said, keep a wary eye on data growth.