03-12-2014 01:56 PM
I have a flat file of about 247 GB. From this flat file I create 8 different datasets, and the job reads the file sequentially for each dataset, one after the other, so overall it takes around 23 hrs to complete. The file lives in a Unix environment, so initially I thought of using the split command to divide it, but the issue is that this would take a lot of extra space on the server. I would really appreciate it if anyone could suggest a better approach to reduce the run time. I am thinking of a solution where I can reduce the time by reading the data in parallel...
Thanks in advance
03-12-2014 02:18 PM
Check with the admin of your system; they will probably be able to give more practical advice.
Reading a file that large will take a long time, but writing is most likely the bottleneck. The biggest improvement will probably come from using as many different physical disks as possible. So replicate your program 8 times, with each copy creating one of the output files, and run the copies in parallel, making sure each one writes to a different output disk.
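A minimal sketch of that parallel approach in plain shell, assuming the per-dataset extract logic can be wrapped in one command (here a stand-in function `extract_one`, and the `/diskN/output` paths are hypothetical; both would be replaced by your real program and mount points):

```shell
#!/bin/sh
# extract_one is a stand-in for the real program that reads the
# flat file and writes one of the 8 datasets. Replace its body
# with a call to your actual extract job.
extract_one() {
    printf 'dataset %s -> %s\n' "$1" "$2"
}

# Launch all 8 extract jobs in the background so they run in
# parallel, each writing to a different physical disk.
for i in 1 2 3 4 5 6 7 8; do
    extract_one "$i" "/disk$i/output" &
done

wait    # block until all 8 background jobs have finished
echo "all extracts done"
```

Note that the 8 jobs will still each read the same 247 GB input, so this only helps if the read side can keep up; the point of the separate output disks is to stop the writes from competing with each other.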