Proc COPY : provide an option to run a 'bit-perfect' copy command at system level

ronan · 11-07-2016

Hello,

This idea supersedes, if not outright deprecates, this earlier suggestion:

https://communities.sas.com/t5/SASware-Ballot-Ideas/Dataset-Digital-Signature-how-to-identify-unique...

Basically, we (or at least I) need some way to uniquely identify a SAS table (V7). This kind of checksum identification is already available at the operating system level (think of the md5sum or sha1sum Linux command-line tools, for instance), at the filesystem level (see next-gen filesystems such as https://en.wikipedia.org/wiki/Btrfs ), or even deeper in the storage stack: some modern mass storage systems compute file hashes on the fly in order to optimize their cache memory.

When the V7 engine writes or recopies data, the target file differs slightly (randomly?) from the source. Computing a hash of the corresponding files is therefore useless: false negatives are likely, with exact copies (SAS-wise) being misjudged as different (system-wise).
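To make the comparison concrete, here is a minimal Python sketch of the kind of OS-level check I have in mind, equivalent in spirit to running md5sum or sha1sum on both files. The function names and the file paths in the usage note are hypothetical; only the standard hashlib module is used.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file's raw bytes, reading it in chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tables never need to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_bytes(path_a, path_b, algorithm="sha256"):
    """True only if both files are byte-for-byte identical (up to hash collisions)."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

With a source table and its Proc COPY output, something like same_bytes("lib1/mytable.sas7bdat", "lib2/mytable.sas7bdat") is what I would expect to return False today, even though the two tables are identical SAS-wise.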

Wouldn't it be useful to be able to make bit-perfect copies of SAS datasets with Proc COPY, exactly like a 'blind' copy command at the OS level (cp, copy, TSO COPY, etc.)?
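For illustration only, this is roughly the behavior I would expect from such an option, sketched in Python rather than SAS. The function name bit_perfect_copy is made up, and the sketch assumes no SAS session has the file open while it is being copied.

```python
import filecmp
import shutil


def bit_perfect_copy(src, dst):
    """Copy src to dst at the OS level and verify the result byte for byte."""
    # copyfile duplicates the raw bytes without interpreting the file format.
    shutil.copyfile(src, dst)
    # shallow=False forces a real content comparison, not just a stat() check.
    if not filecmp.cmp(src, dst, shallow=False):
        raise OSError(f"copy of {src} to {dst} is not byte-identical")
    return dst
```

A copy made this way hashes to the same value as its source, which is exactly the property Proc COPY does not offer today.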

Moreover, this kind of feature could be enabled by default through a corresponding system option (say COPYPGM, by analogy with SORTPGM for Proc Sort/SyncSort).

This would align SAS tables exactly with their *.sas7bdat files on the filesystem, and could possibly even speed up copying and the identification of duplicates; storage could also be optimized somehow, since identical tables would then be detectable by their file hashes.
