Another avenue that hasn't been mentioned yet is the SPDE (SPD Engine) format. We had a set of very large files and processing streams that took several days to run. Converting the masters to SPDE datasets resulted in substantial disk savings, often 90%.
Not all applications will benefit (random I/O being one that won't), but it is certainly something to consider. SPDE also uses whatever memory and CPU are available, which is both good and bad. For the largest monthly job, I was politely asked if I would run it on the weekend. Oops.
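
For anyone who wants to try it, here is a rough sketch of what a conversion could look like. The paths, the librefs (master, spdelib) and the table name (claims) are made up for illustration, and COMPRESS=BINARY is my assumption about how to get the disk savings, so check it against your own data before relying on it.

```sas
/* Hypothetical paths and names; adjust for your site. */
libname master '/data/master';                  /* existing Base SAS library */

libname spdelib spde '/spde/meta'               /* primary path, holds the metadata */
        datapath=('/spde/data1' '/spde/data2')  /* data partitions spread across file systems */
        indexpath=('/spde/index');              /* index files */

/* Copy one master table into the SPDE library with compression on. */
data spdelib.claims (compress=binary);
    set master.claims;
run;
```

Comparing the size of the files under the data paths with the original dataset shows how much space a given table actually saves.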
--Ben