☑ This topic is solved.
janpeter
Fluorite | Level 6

Hi,

Not sure this is the right place to ask, but I'll try.

If you have a sas7bdat file in a secure location and pull it down (from one server to another) with some file transfer tool, what is the chance that a variable changes its content without the user noticing?

For example, if it's a numeric variable, its value changes from 5 to 99.234.

If it's a character variable, its value changes from 'ABC' to 'mike'.

Note that the data is straightforward ASCII.
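One way to test the question is to look at what a single flipped bit does to a stored value. The Python sketch below makes two assumptions. First, the numeric value is stored as an 8-byte little-endian IEEE 754 double, which is the usual representation for SAS numerics on Windows and UNIX x86 hosts. Second, the character value is plain ASCII. `flip_bit` is a hypothetical helper that simulates corruption. Nothing here models the actual sas7bdat page layout.

```python
import struct

def flip_bit(raw: bytes, byte_index: int, bit: int) -> bytes:
    """Hypothetical helper: return a copy of raw with one bit inverted (simulated corruption)."""
    buf = bytearray(raw)
    buf[byte_index] ^= 1 << bit
    return bytes(buf)

# Assumption: a numeric value stored as an 8-byte little-endian IEEE 754 double.
stored = struct.pack("<d", 5.0)               # 00 00 00 00 00 00 14 40
corrupted = flip_bit(stored, 6, 0)            # byte 6: 0x14 -> 0x15
print(struct.unpack("<d", corrupted)[0])      # 5.25

# Assumption: a character value stored as plain ASCII bytes.
text = b"ABC"
print(flip_bit(text, 0, 1).decode("ascii"))   # CBC  ('A' 0x41 -> 'C' 0x43)
```

In both cases the corrupted value still looks plausible. A single-bit change of this kind therefore would not necessarily show up as strange characters.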

One way of ensuring the file is the same is, for example, to compare the MD5 checksums before and after the transfer. But I would argue that if the file were somehow corrupted, you would notice just by looking at it, i.e. it would be immediately apparent from strange characters etc. This would be due to the encoding and all the metadata in the format.
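Here is a minimal Python sketch of the before/after checksum comparison. It assumes both copies are reachable as local paths; in practice you would compute the digest on each server and compare the two strings. The file names and contents are placeholders, not a real SAS dataset.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_file_contents(before: Path, after: Path) -> bool:
    """True when both files have the same MD5 digest."""
    return file_md5(before) == file_md5(after)

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source.sas7bdat"  # placeholder bytes, not a real SAS dataset
    dst = Path(tmp) / "target.sas7bdat"
    src.write_bytes(b"placeholder dataset bytes " * 1000)
    shutil.copyfile(src, dst)            # byte-for-byte copy
    print(same_file_contents(src, dst))  # True

    data = bytearray(dst.read_bytes())
    data[100] ^= 0x01                    # flip one bit in the copy
    dst.write_bytes(bytes(data))
    print(same_file_contents(src, dst))  # False
```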

Am I wrong? If I am, what is the likelihood of this occurring?

BR

Jan

Accepted Solutions
Kurt_Bremser
Super User

As long as you use a copy method that keeps the file as is (e.g. cp on UNIX, a proper backup/restore, or binary mode when using SFTP), there will be no changes to the data. Improper methods will most likely make the file unusable in the first place.
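To make "keeps the file as is" concrete, here is a Python sketch contrasting a byte-for-byte copy with a simulated text-mode (ASCII) transfer. The conversion rule is an assumption: it models a UNIX-to-Windows text transfer that rewrites every LF byte as CR LF. `simulate_ascii_transfer` is a hypothetical helper and does not reproduce any particular tool's behaviour.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def simulate_ascii_transfer(data: bytes) -> bytes:
    # Hypothetical model of a UNIX-to-Windows text-mode transfer:
    # every LF byte (0x0A) becomes CR LF (0x0D 0x0A), even inside binary data.
    return data.replace(b"\n", b"\r\n")

def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()

# Stand-in for a binary file: all 256 byte values, so it certainly contains 0x0A.
original = bytes(range(256)) * 16

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "source.bin"
    dst = Path(tmp) / "copy.bin"
    src.write_bytes(original)
    shutil.copyfile(src, dst)      # byte-for-byte, like cp or a binary-mode transfer
    binary_copy = dst.read_bytes()

ascii_copy = simulate_ascii_transfer(original)

print(md5_hex(binary_copy) == md5_hex(original))  # True
print(md5_hex(ascii_copy) == md5_hex(original))   # False
print(len(original), len(ascii_copy))             # 4096 4112
```

Each inserted byte shifts everything after it. This kind of damage therefore tends to disturb the whole layout of the file rather than a single value.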

AllanBowe
Barite | Level 11

You are correct that an md5 (or equivalent) hash checksum is the right way to check if a file has been corrupted or in any way modified.

Since SAS 9.4M6 (thanks @yabwon for bringing this to my attention) there is also a native SAS function that can compute a full-file checksum. It is called hashing_file() and can even be used in pure macro code, e.g.:

%put %sysfunc(hashing_file(md5,/path/to/file.blob,0)); 

If you would like to hash an entire directory (of directories) of datasets, you are welcome to use this macro:  https://core.sasjs.io/mp__hashdirectory_8sas.html
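For readers working outside SAS, here is a rough Python analogue of hashing a whole directory tree. It builds a manifest of relative path to MD5 digest for each side, then lists the paths that are missing or differ. This is not a port of the macro linked above. `hash_directory` and `diff_manifests` are hypothetical names used only for this sketch.

```python
import hashlib
import tempfile
from pathlib import Path

def hash_directory(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its MD5 hex digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.md5()
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[path.relative_to(root).as_posix()] = digest.hexdigest()
    return manifest

def diff_manifests(source: dict[str, str], target: dict[str, str]) -> list[str]:
    """Paths missing on either side, or present on both with different digests."""
    return sorted(p for p in source.keys() | target.keys() if source.get(p) != target.get(p))

# Usage: two small placeholder trees, where one file differs in the second tree.
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    for root in (Path(a), Path(b)):
        (root / "lib").mkdir()
        (root / "lib" / "one.sas7bdat").write_bytes(b"first placeholder")
        (root / "lib" / "two.sas7bdat").write_bytes(b"second placeholder")
    (Path(b) / "lib" / "two.sas7bdat").write_bytes(b"second placeholdeR")
    changed = diff_manifests(hash_directory(Path(a)), hash_directory(Path(b)))

print(changed)  # ['lib/two.sas7bdat']
```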

 

/Allan
janpeter
Fluorite | Level 6
Hi Allan,

Thanks for the valuable information. I will use this going forward.

My question was more about whether you can pull data from one server to another and demonstrate that it has kept its integrity. The general question applies to any file format; this specific case is about sas7bdat.
I would argue that you would spot fairly quickly if something is off when you work with the file, since it would probably display "weird" characters or not import at all.

Appreciate your quick answer though.

BR
jan
SASKiwi
PROC Star

It would help if you explained your actual use case in more detail. Are you moving SAS datasets from one SAS installation to another or are you restoring them to the same SAS installation they were created on originally?

 

If it is the former, then I'm assuming these installations use the same OS, as otherwise the SAS datasets won't be compatible or usable unless copied in transport format. Care might also be needed if the SAS encoding settings differ between installations. If it is the latter, then of course there would be no compatibility problems as long as the file transfer process didn't change the files in any way.

 

BTW this type of post belongs in the Administration and Deployment Community rather than SAS Risk Management.     

janpeter
Fluorite | Level 6
Hi SASKiwi,

Thanks for letting me know about the community. I will probably follow up there.

The issue is not compatibility.

Regardless of platform, let's say you have a repository on one server:
1. You pull down a sas7bdat file from it.
2. You open it on your platform on a different server.
Can you be 100% certain that the information you see is the same as in the repository? For more background: the same question would apply to xpt files. If this were an xpt file, could you be certain that you have the same data?

I would argue that sas7bdat is more robust than xpt, given the encoding and multiple levels of metadata.

Anyway thanks for your reply. Appreciated.

BR
Jan
JayKyleFCC
Calcite | Level 5
Or, the very simple way is to zip it up and then unzip it on the other side. The ZIP format stores a CRC-32 checksum for each file and verifies it as part of the unzip process.
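The ZIP format records a CRC-32 checksum per entry, rather than an MD5 hash, and checks it when the entry is read back. Below is a minimal Python sketch using the standard `zipfile` module, with placeholder bytes standing in for a dataset. `ZipFile.testzip()` returns `None` when every CRC matches, and otherwise the name of the first bad entry.

```python
import io
import zipfile
import zlib

payload = bytes(range(256)) * 4  # placeholder bytes standing in for a dataset

# Build an uncompressed (ZIP_STORED) archive in memory so the payload bytes
# appear verbatim inside it and can be corrupted directly.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("data.sas7bdat", payload)  # ZIP records a CRC-32 for this entry

with zipfile.ZipFile(io.BytesIO(buf.getvalue())) as zf:
    print(zf.getinfo("data.sas7bdat").CRC == zlib.crc32(payload))  # True
    print(zf.testzip())  # None: every entry's CRC-32 matches

# Invert all bits of one byte inside the stored entry data.
archive = bytearray(buf.getvalue())
offset = archive.find(payload)
archive[offset + 10] ^= 0xFF

with zipfile.ZipFile(io.BytesIO(bytes(archive))) as zf:
    print(zf.testzip())  # data.sas7bdat: CRC-32 mismatch detected
```

CRC-32 is designed to catch accidental corruption. It is not a cryptographic hash, so it does not prove that a file was not deliberately altered.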
