bentleyj1
Quartz | Level 8

I recently stumbled across the FCOPY function and, quite frankly, its performance appears to be too good to be true.

 

I'm running on 64-Bit AIX, SAS 9.4 M5.  We have a permanent SAS data set created daily that's almost a gigabyte in size, about 65 million long records that we copy to a user-accessible directory so users stay out of the production directory. 

 

Proc Copy takes about 40 minutes to copy the data set and the index in one step.  I know that Fcopy copies files as binary images, but it takes only about two minutes to do the same thing in two separate steps, data set then index.  That's almost unbelievable, so I did it a few times at different times of the day.  Same performance.  I've never tried copying the files with the system mv command, so I don't know how long that takes.
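
For anyone who wants to try the same thing, here is a minimal sketch of the two-step binary copy described above. The paths and the member name bigtable are made up for illustration; substitute your own. It assumes the data set file has the .sas7bdat extension and the index file has .sas7bndx, and it uses RECFM=N on both filerefs so that FCOPY treats the files as byte streams.

```sas
/* Hypothetical paths and member name; adjust to your environment. */
/* RECFM=N asks for a binary (byte stream) copy.                    */
filename src_dat '/prod/data/bigtable.sas7bdat' recfm=n;
filename dst_dat '/user/data/bigtable.sas7bdat' recfm=n;
filename src_ndx '/prod/data/bigtable.sas7bndx' recfm=n;
filename dst_ndx '/user/data/bigtable.sas7bndx' recfm=n;

data _null_;
  length msg $ 384;

  /* Step 1: copy the data set file. FCOPY returns 0 on success. */
  rc = fcopy('src_dat', 'dst_dat');
  if rc ne 0 then do;
    msg = sysmsg();
    put 'ERROR: data set copy failed. ' rc= msg=;
    stop;
  end;

  /* Step 2: copy the index file only if the data set copy worked. */
  rc = fcopy('src_ndx', 'dst_ndx');
  if rc ne 0 then do;
    msg = sysmsg();
    put 'ERROR: index copy failed. ' rc= msg=;
  end;
  else put 'NOTE: data set and index files copied.';
run;

filename src_dat clear;
filename dst_dat clear;
filename src_ndx clear;
filename dst_ndx clear;
```

Because this is a plain file copy, nothing stops it from running while the source data set is still being written, so it is probably safest to run it only after the daily build has finished.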

 

I pulled a million records from the Proc Copy copy and compared them to the same records in the Fcopy copy (thank you Proc Compare) and no unequal values were found.
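
A rough sketch of that kind of check follows. The librefs, paths, and member name are hypothetical, and it simply compares the first million observations of each copy, which may not be how the sample above was actually drawn.

```sas
/* Hypothetical librefs and paths for the two copies. */
libname pcopy '/user/data/proc_copy';
libname fcp   '/user/data/fcopy';

/* Compare the first 1,000,000 observations of each copy. */
proc compare base=pcopy.bigtable(obs=1000000)
             compare=fcp.bigtable(obs=1000000);
run;

/* PROC COMPARE sets SYSINFO; 0 means no differences were found. */
/* Read it immediately, before another step resets it.           */
%put NOTE: PROC COMPARE return code SYSINFO=&sysinfo;
```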

 

I'm not looking a gift horse in the mouth, but I'd like to understand why Fcopy is sooooo much faster than Proc Copy. Is it that a binary copy process uses a bigger block size, or that Proc Copy has a validation/verification step?

 

Thanks in advance.

 

John

 

 

Accepted Solutions
mkeintz
PROC Star

This is a conjecture:

 

PROC COPY has to support user requests to change encryption keys, change the data engine, and make other changes that I suspect require converting the input data variable by variable, which in turn would mean reading the variables into something like the program data vector in a DATA step.

 

Or, consider this example that requests a new index be created in the output data set that wasn't in the input, which certainly requires actual processing of the data:

 

proc copy in=sashelp out=work override=(index=(name));
  select class;
run;

 

 

But, for binary copy, FCOPY is a "dumb" data transcription - no data parsing required.

 


Replies
bentleyj1
Quartz | Level 8

Actually, 2 minutes is fast here... the system is totally I/O bound.  

 

I'm copying the file to a different directory on the same file system.

 

John

Kurt_Bremser
Super User

2GB / 120 seconds ~ 17 MB/s. That's not slow, that's pathetic. Get a system that was built after the millennium 😉

 

Or is so little value placed on data warehousing where you work?
