I recently stumbled across the FCOPY function and, quite frankly, its performance appears to be too good to be true.
I'm running on 64-Bit AIX, SAS 9.4 M5. We have a permanent SAS data set created daily that's almost a gigabyte in size, about 65 million long records that we copy to a user-accessible directory so users stay out of the production directory.
PROC COPY takes about 40 minutes to copy the data set and the index in one step. I know that FCOPY copies files as binary images, and it takes only about two minutes to do the same thing in two separate steps: the data set, then the index. That seemed almost unbelievable, so I repeated it a few times at different times of the day with the same performance. I've never tried copying the files with the system mv command, so I don't know how long that takes.
I pulled a million records from the PROC COPY copy and compared them to the same records in the FCOPY copy (thank you, PROC COMPARE), and no unequal values were found.
I'm not looking a gift horse in the mouth, but I'd like to understand why FCOPY is sooooo much faster than PROC COPY. Is it that a binary copy process uses a bigger block size, or that PROC COPY has a validation/verification step?
Thanks in advance.
John
This is a conjecture:
PROC COPY has to support user requests to change encryption keys, change the data set engine, and other differences that I suspect require converting the input data variable by variable, which in turn would mean reading the variables into something like the program data vector in a DATA step.
Or, consider this example that requests a new index be created in the output data set that wasn't in the input, which certainly requires actual processing of the data:
proc copy in=sashelp out=work override=(index=(name));
select class ;
quit;
But a binary copy with FCOPY is a "dumb" data transcription - no data parsing required.
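A minimal sketch of the two-step binary copy described above (the file paths are hypothetical; RECFM=N makes each fileref a raw byte stream, so FCOPY transcribes the files byte for byte with no parsing):

```sas
/* Binary filerefs: RECFM=N treats the files as raw byte streams */
filename src '/prod/lib/mydata.sas7bdat'  recfm=n;
filename dst '/users/lib/mydata.sas7bdat' recfm=n;

data _null_;
   rc = fcopy('src', 'dst');   /* 0 = success */
   if rc ne 0 then put 'Data set copy failed: ' rc= sysmsg();
run;

/* Second step: copy the index file the same way */
filename srcx '/prod/lib/mydata.sas7bndx'  recfm=n;
filename dstx '/users/lib/mydata.sas7bndx' recfm=n;

data _null_;
   rc = fcopy('srcx', 'dstx');
   if rc ne 0 then put 'Index copy failed: ' rc= sysmsg();
run;
```

Because the copy is byte-for-byte, the target must stay on an OS with the same file format (as in the poster's AIX-to-AIX case); FCOPY cannot re-encode, re-engine, or rebuild anything the way PROC COPY can.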
2 minutes for one gigabyte still looks quite slow to me. That should be finished in ~20 seconds on any decent storage nowadays. Are you copying over the network?
Actually, 2 minutes is fast here... the system is totally I/O bound.
I'm copying the file to a different directory on the same file system.
John
2 GB / 120 seconds ≈ 17 MB/s. That's not slow, that's pathetic. Get a system that was built after the millennium 😉
Or is so little value placed on data warehousing where you work?