When it comes to archiving SAS datasets as text files using SAS, what file format would you recommend?
JSON with PROC HTTP, CSV files, XML using ???, OWL (I'm not familiar with that one; it could be completely irrelevant)...
Thanks
Why would you ever archive a SAS data set as text? You could lose metadata about the dataset and about the variables, and the text will most likely be larger than the actual SAS data set.
Because in 40 years' time, who knows what the technology will be like, and you still need to be able to read the data when, for example, the authorities request it.
I suggest you talk to the IT storage experts in your company. Companies typically have storage standards that dictate how long and in what form data must be kept. Is there any need to go beyond these standards, which in my experience don't go beyond 7 to 10 years?
@xxformat_com wrote:
Clinical research goes beyond 10 years.
All the more reason to align with your company's standards in terms of what you need to keep and in what form.
@SASKiwi wrote:
I suggest you talk to the IT storage experts in your company. Companies typically have storage standards that dictate how long and in what form data must be kept. Is there any need to go beyond these standards, which in my experience don't go beyond 7 to 10 years?
Social Services had a 100-year record retention requirement.
We had paper in storage, then tapes, and then a database for anything from the past 70 years.
@xxformat_com wrote:
Because in 40 years' time, who knows what the technology will be like, and you still need to be able to read the data when, for example, the authorities request it.
As others have mentioned, data storage experts ought to be consulted rather than you trying to come up with your own solution. JSON and XML would probably work better than plain text, as they let you store the metadata in the file.
First, get to Know Your Data (Maxim 3). This will determine whether you can safely use a delimiter or any line separator (character variables that contain line separators may force you to use a fixed-length record structure).
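To make the "know your data" check concrete, here is a minimal sketch in Python, used only as a language-neutral illustration (in SAS you would run the same test in a DATA step). It assumes the rows have already been read as a list of dicts; the function name `find_unsafe_columns` is hypothetical. It reports which character columns contain the delimiter, the quote character, or a line separator.

```python
# Sketch only: assumes rows are a list of dicts (column name -> value).
LINE_SEPARATORS = ("\n", "\r")


def find_unsafe_columns(rows, delimiter=",", quote='"'):
    """Return {column: set of problems} for string columns that contain the
    delimiter, the quote character, or a line separator."""
    problems = {}
    for row in rows:
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # numeric values cannot contain separators
            found = problems.setdefault(column, set())
            if delimiter in value:
                found.add("delimiter")
            if quote in value:
                found.add("quote")
            if any(sep in value for sep in LINE_SEPARATORS):
                found.add("line separator")
    # Keep only the columns where something was actually found
    return {col: issues for col, issues in problems.items() if issues}


if __name__ == "__main__":
    sample = [
        {"id": 1, "name": "Smith, John", "note": "ok"},
        {"id": 2, "name": "Doe", "note": "first line\nsecond line"},
    ]
    print(find_unsafe_columns(sample))
```

If only the delimiter shows up, quoting or a different delimiter is enough; a "line separator" hit is the case where fixed-length records become attractive.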
I would combine the text file and a result of PROC CONTENTS in a ZIP archive, so you keep all data and metadata and also optimize storage consumption.
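The data-plus-metadata ZIP idea can be sketched as follows, again in Python purely for illustration. The `variables` list is a hypothetical stand-in for the PROC CONTENTS output (in SAS you could capture it with `PROC CONTENTS ... OUT=`), and `archive_with_contents` is a made-up helper name. It writes the data and the variable descriptions as two CSV members of one compressed archive.

```python
import csv
import io
import zipfile


def archive_with_contents(zip_path, name, variables, rows):
    """Write <name>.csv (data) and <name>_contents.csv (metadata) into one ZIP.

    variables: list of dicts with keys "name", "type", "length", "label"
               (hypothetical layout mimicking PROC CONTENTS output).
    rows: list of tuples in the same order as variables.
    """
    data_buf = io.StringIO()
    writer = csv.writer(data_buf)  # quotes fields containing commas, quotes, newlines
    writer.writerow([v["name"] for v in variables])
    writer.writerows(rows)

    meta_buf = io.StringIO()
    meta_writer = csv.DictWriter(meta_buf, fieldnames=["name", "type", "length", "label"])
    meta_writer.writeheader()
    meta_writer.writerows(variables)

    # ZIP_DEFLATED compresses the text members, which usually shrinks them a lot
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(f"{name}.csv", data_buf.getvalue())
        zf.writestr(f"{name}_contents.csv", meta_buf.getvalue())
```

Both members remain plain text once unzipped, and ZIP itself is a widely documented format, so the archive does not depend on any one vendor's software.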
If you really must store the data as text files, then I'd go for either XML or JSON, as these are text-based formats that future software should be able to read, AND they also store metadata such as the data types.
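As a rough sketch of what "JSON that stores its own metadata" might look like, here is a Python illustration. The document layout (`dataset`, `variables`, `rows`) and the helper name are assumptions, not a standard; note that JSON by itself only distinguishes strings, numbers, and null, so SAS lengths, labels, and formats survive only because they are written into the `variables` section.

```python
import json


def to_self_describing_json(name, variables, rows):
    """Serialize a rectangular dataset as JSON carrying its own metadata.

    variables: list of dicts (name, type, length, label); hypothetical layout.
    rows: list of tuples in variable order; None is written as JSON null.
    """
    names = [v["name"] for v in variables]
    document = {
        "dataset": name,
        "variables": variables,  # the metadata travels inside the same file
        "rows": [dict(zip(names, row)) for row in rows],
    }
    return json.dumps(document, indent=2, ensure_ascii=False)
```

Repeating the column names in every row makes the file larger than a delimited file, which is part of the size trade-off raised earlier in the thread.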
@Patrick wrote:
If you really must store the data as text files, then I'd go for either XML or JSON, as these are text-based formats that future software should be able to read, AND they also store metadata such as the data types.
I would probably prefer a more portable storage method rather than the latest fad structure.
So, as someone suggested, write the data as plain text, perhaps as delimited files or as fixed-length records.
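A fixed-length record layout can be sketched like this (Python, for illustration only; `to_fixed_width` is a hypothetical helper). The column widths are assumed to come from the variable lengths. Every record comes out the same length, so a reader can locate each field by position alone.

```python
def to_fixed_width(rows, widths):
    """Format rows as fixed-length records.

    widths: column widths in characters (e.g. taken from the variable lengths).
    Numbers are right-aligned, text is left-aligned, None becomes blanks.
    Values too long for their column raise an error instead of being cut.
    """
    records = []
    for row in rows:
        fields = []
        for value, width in zip(row, widths):
            text = "" if value is None else str(value)
            if len(text) > width:
                raise ValueError(f"value {text!r} does not fit in {width} columns")
            if isinstance(value, (int, float)):
                fields.append(text.rjust(width))
            else:
                fields.append(text.ljust(width))
        records.append("".join(fields))
    return records
```

Raising on overflow is a deliberate choice here: silently truncating values in an archive would only be discovered decades later. The column positions still have to be written down somewhere, which is the documentation point below.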
But also include the documentation of what the data is.
You will need that documentation even if you did jump on the JSON or XML bandwagon, since they really do not add that much more information. In theory you could write more XML that defines the data structure, and perhaps that is better since it could be tested. But since you are starting with simple rectangular SAS dataset structures, there is no really complex structure that would require that effort.
So just write a clear description of the dataset.
You could probably automate some of it. See https://github.com/sasutils/macros/blob/master/ds2post.sas as a simple example. But that will not include the real business knowledge about what the data is, which is probably the most important part.
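The mechanical part of that description can be generated from the variable metadata, as in this Python sketch (the helper name and the metadata layout are hypothetical; the 32-character name column matches the maximum length of a SAS variable name). The business meaning still has to be typed in by a person, which is why the sketch leaves an explicit placeholder for it.

```python
def render_codebook(name, variables, notes=None):
    """Render a plain-text description of a dataset from its variable metadata.

    variables: list of dicts with "name", "type", "length", "label".
    notes: free text on what the data means; this must be written by a person.
    """
    lines = [f"Dataset: {name}", f"Variables: {len(variables)}", ""]
    header = f"{'#':>3}  {'Name':<32} {'Type':<4} {'Len':>4}  Label"
    lines.append(header)
    lines.append("-" * len(header))
    for i, var in enumerate(variables, start=1):
        lines.append(
            f"{i:>3}  {var['name']:<32} {var['type']:<4} {var['length']:>4}  {var['label']}"
        )
    lines.append("")
    lines.append("Notes:")
    # Placeholder reminds the archivist that the business context is still missing
    lines.append(notes if notes else "(describe what the data means and how it was collected)")
    return "\n".join(lines)
```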