Hi,
I'm not sure this is the right place to ask, but I'll try.
If you have a sas7bdat file in a secure location and pull it down (from one server to another) with a file transfer tool, what is the chance that a variable's content changes without the user noticing?
For example, a numeric value changes from 5 to 99.234.
Or a character variable changes from 'ABC' to 'mike'.
Note that the data is straightforward ASCII.
One way to ensure the file is unchanged is to compare the md5 sum before and after the transfer. But I would argue that if the file were somehow corrupted, you would pick it up just by looking at it, i.e. it would be immediately apparent from strange characters and so on, because of the encoding and all the metadata in the format.
Am I wrong? If I am, what is the likelihood of this occurring?
BR
Jan
As long as you use a copy method which keeps the file as is (e.g. cp on UNIX, a proper backup/restore, binary method when using SFTP), there will not be changes to the data. Improper methods will most likely cause the file to be unusable in the first place.
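To illustrate why the transfer mode matters, here is a minimal Python sketch (not SAS). It assumes a text-mode transfer that rewrites every LF byte as CR LF, which is one common form of line-ending translation; the byte string is made up for the example and is not a real sas7bdat file.

```python
import hashlib

def simulate_text_mode_transfer(payload: bytes) -> bytes:
    """Mimic a text-mode transfer that rewrites each LF as CR LF (assumed behaviour)."""
    return payload.replace(b"\n", b"\r\n")

# Made-up binary content; the byte 0x0A appears in it as data, not as a line ending.
original = bytes([0x00, 0x05, 0x0A, 0x41, 0x42, 0x43, 0x0A, 0xFF])

binary_copy = bytes(original)                      # a binary transfer copies bytes unchanged
text_copy = simulate_text_mode_transfer(original)  # a text transfer inserts extra bytes

# Lengths of the original, the binary copy, and the text-mode copy.
print(len(original), len(binary_copy), len(text_copy))
# Does the md5 digest still match after each kind of transfer?
print(hashlib.md5(original).hexdigest() == hashlib.md5(binary_copy).hexdigest())
print(hashlib.md5(original).hexdigest() == hashlib.md5(text_copy).hexdigest())
```

In this sketch the text-mode copy grows by one byte for every 0x0A in the data, so every offset after the first inserted byte shifts. That is damage to the file's structure rather than a quiet change of one value into another plausible one.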
You are correct that an md5 (or equivalent) hash checksum is the right way to check if a file has been corrupted or in any way modified.
Since SAS 9.4M6 (thanks @yabwon for bringing this to my attention) there is also a native SAS function that can compute a full-file checksum. It is called hashing_file() and can even be used in pure macro code, e.g.:
%put %sysfunc(hashing_file(md5,/path/to/file.blob,0));
If you would like to hash an entire directory (of directories) of datasets, you are welcome to use this macro: https://core.sasjs.io/mp__hashdirectory_8sas.html
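If SAS is not available on one side of the transfer, the same before-and-after comparison can be sketched in Python. This is only an illustration under stated assumptions: `file_md5` and `matches_source` are hypothetical helper names, and the source digest is assumed to have been recorded on the source server by whatever tool is at hand there.

```python
import hashlib

def file_md5(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the md5 hex digest of a file, read in binary mode chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large datasets do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_source(source_digest: str, target_path: str) -> bool:
    """Compare a digest recorded before the transfer with the transferred file."""
    # Normalise whitespace and letter case, since tools may print hex digests differently.
    return file_md5(target_path) == source_digest.strip().lower()
```

A single changed byte anywhere in the file produces a different digest, so a match is strong evidence that the transfer did not alter the file. Note that md5 is suited to detecting accidental corruption; it is not designed to resist deliberate tampering.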
It would help if you explained your actual use case in more detail. Are you moving SAS datasets from one SAS installation to another or are you restoring them to the same SAS installation they were created on originally?
If it is the former, then I'm assuming these installations use the same OS, as otherwise the SAS datasets won't be compatible or usable unless copied in transport format. Care might also be needed if the SAS encoding settings differ between installations. If it is the latter, then of course there would be no compatibility problems, as long as the file transfer process didn't change the files in any way.
BTW this type of post belongs in the Administration and Deployment Community rather than SAS Risk Management.