08-28-2012 08:53 AM
I recently found this link about rsubmit and piping, which got me quite excited. It is a technique I wasn't aware of until now, and it opened up a whole new world of possibilities for performance tuning.
I still feel that I may not fully understand how this works. Take "Piping Between Data Steps", for example: the processes run asynchronously. Does this mean that (on a multiprocessor machine) the second task starts reading from the pipe as soon as the first task writes data into it? Or does the first task write everything into the pipe before the second task starts reading, so that the main benefit is "only" the avoided disk I/O?
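For a plain operating-system pipe, the first case is what happens: the reader can consume data while the writer is still running. A minimal Python sketch (not SAS, and not a model of how SAS/CONNECT implements its piping internally) makes this checkable. The writer sends one record and then refuses to continue until the reader confirms it has already received that record. If the reader had to wait for the writer to finish, the wait would time out. The function name `reader_starts_before_writer_finishes` and the `timeout` parameter are hypothetical, chosen only for this illustration; whether SAS piping overlaps the two tasks the same way is an assumption to check against the SAS/CONNECT documentation.

```python
import os
import threading


def reader_starts_before_writer_finishes(timeout: float = 5.0) -> bool:
    """Hypothetical demo (plain OS pipe, not SAS): True if the reader consumes
    record 1 while the writer is still running and has not written record 2."""
    read_fd, write_fd = os.pipe()
    first_record_seen = threading.Event()
    result = {"overlap": False}

    def writer() -> None:
        os.write(write_fd, b"record 1\n")
        # Keep the writer "busy": it neither writes more nor closes the pipe
        # until the reader confirms it already received record 1 (or timeout).
        result["overlap"] = first_record_seen.wait(timeout)
        os.write(write_fd, b"record 2\n")
        os.close(write_fd)  # closing the write end gives the reader EOF

    def reader() -> None:
        received = b""
        while True:
            chunk = os.read(read_fd, 4096)  # returns as soon as any bytes are available
            if not chunk:  # an empty read means EOF: the writer has finished
                break
            received += chunk
            if b"record 1\n" in received:
                first_record_seen.set()
        os.close(read_fd)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["overlap"]


if __name__ == "__main__":
    print(reader_starts_before_writer_finishes())  # True
```

On an ordinary OS pipe this prints True: the reader received record 1 while the writer was still blocked waiting, so the two sides genuinely overlap rather than running one after the other.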
And what does this mean for memory usage? Does data read from the pipe stay in memory, or is it removed as soon as it has been read? How long does something written to a pipe remain in memory, and when is that memory freed (e.g. when the task ends with signoff, or earlier)?
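For a plain operating-system pipe, memory use is bounded by the pipe's buffer, not by the total amount of data: bytes leave the buffer once the reader has read them, and a writer that gets too far ahead simply blocks in its write call. The sketch below, again Python rather than SAS, pushes about 2 MB through a pipe while tracking the largest number of records that were ever written but not yet read. The helper names `stream_through_pipe` and `_read_exact` and the 1 KiB record size are hypothetical. On Linux the default pipe capacity is 64 KiB, so the peak should stay near 64 records; whether SAS/CONNECT piping buffers data in a comparable way, and when it releases that memory, is an assumption this sketch does not settle.

```python
import os
import threading


def _read_exact(fd: int, n: int) -> bytes:
    """Read exactly n bytes from fd, or fewer only at end-of-file."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = os.read(fd, remaining)
        if not chunk:  # writer closed its end: EOF
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def stream_through_pipe(n_records: int, record_size: int = 1024):
    """Hypothetical demo: writer and reader threads share one OS pipe.

    Returns (records_read, max_in_flight), where max_in_flight is the largest
    number of records written but not yet read at any observed moment.
    """
    read_fd, write_fd = os.pipe()
    lock = threading.Lock()
    counts = {"written": 0, "read": 0, "max_in_flight": 0}

    def note(key: str) -> None:
        with lock:
            counts[key] += 1
            in_flight = counts["written"] - counts["read"]
            counts["max_in_flight"] = max(counts["max_in_flight"], in_flight)

    def writer() -> None:
        record = b"x" * record_size
        try:
            for _ in range(n_records):
                view = memoryview(record)
                while view:  # os.write blocks while the pipe buffer is full
                    written = os.write(write_fd, view)
                    view = view[written:]
                note("written")
        finally:
            os.close(write_fd)  # signals EOF to the reader

    def reader() -> None:
        try:
            # Each successful read removes the record's bytes from the pipe buffer.
            while len(_read_exact(read_fd, record_size)) == record_size:
                note("read")
        finally:
            os.close(read_fd)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts["read"], counts["max_in_flight"]


if __name__ == "__main__":
    read, peak = stream_through_pipe(2000)
    print(f"records read: {read}, peak records held in the pipe: {peak}")
```

All 2000 records arrive, yet the peak number held at once stays far below 2000, because the pipe never holds more than its buffer's worth of data at a time.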
I would also be really interested to hear from people who have used some of these techniques: what experiences they have had, where this approach was really advantageous, and where the traps, hurdles and "don'ts" are.