xxformat_com
Barite | Level 11

When it comes to archiving SAS datasets as text files using SAS, what file format would you recommend?

JSON with PROC HTTP, CSV files, XML using ???, OWL (I'm not familiar with that one; it may be entirely irrelevant)...

Thanks

14 replies
PaigeMiller
Diamond | Level 26

Why would you ever archive a SAS data set as text? You will potentially lose metadata about the dataset and about the variables, and text will undoubtedly be larger than the actual SAS data set.

--
Paige Miller
xxformat_com
Barite | Level 11

Because 40 years from now, who knows what the technology will be like, and you still need to be able to read the data when, for example, the authorities request it.

Reeza
Super User
IME, if archiving for audits is required, text files are typically not the main storage system. You usually set up a database that is managed and audited to ensure there is no corruption and that the data is tracked over time. It's too easy to lose track of a folder or file here and there when it isn't in a managed system of some sort.

If you have no other choice, I'd create text files (CSV or another delimited format), the program to read them back into SAS, and a data dictionary for each file containing the metadata. The data dictionary would include the business and technical definitions, not just the metadata.
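
A minimal sketch of that package, written in Python rather than SAS so it stays tool-neutral, and assuming hypothetical column names and hand-written definitions: the data go to one CSV file, the data dictionary to another, and a small reader uses the dictionary to restore the numeric types that plain CSV does not record.

```python
import csv
import io

# Hypothetical data dictionary: one entry per column with its technical type
# ("char" or "num") and a business definition written by someone who knows the data.
DICTIONARY = [
    {"name": "subjid", "type": "char", "definition": "Subject identifier assigned at enrollment"},
    {"name": "visit", "type": "num", "definition": "Visit number, 1 = baseline"},
    {"name": "weight_kg", "type": "num", "definition": "Body weight in kilograms"},
]


def write_archive(rows, data_out, dict_out):
    """Write the data rows and the data dictionary as two CSV files."""
    names = [col["name"] for col in DICTIONARY]
    data_writer = csv.DictWriter(data_out, fieldnames=names)
    data_writer.writeheader()
    data_writer.writerows(rows)  # None is written as an empty field
    dict_writer = csv.DictWriter(dict_out, fieldnames=["name", "type", "definition"])
    dict_writer.writeheader()
    dict_writer.writerows(DICTIONARY)


def read_archive(data_in, dict_in):
    """Read the data back, converting columns the dictionary marks as numeric."""
    types = {entry["name"]: entry["type"] for entry in csv.DictReader(dict_in)}
    restored = []
    for row in csv.DictReader(data_in):
        restored.append({
            # Numeric columns: empty field means missing; character columns stay as text.
            name: (float(value) if value != "" else None) if types[name] == "num" else value
            for name, value in row.items()
        })
    return restored


# Usage: round-trip two rows through in-memory "files".
rows = [
    {"subjid": "001", "visit": 1, "weight_kg": 72.5},
    {"subjid": "002", "visit": 1, "weight_kg": None},
]
data_buf, dict_buf = io.StringIO(), io.StringIO()
write_archive(rows, data_buf, dict_buf)
data_buf.seek(0)
dict_buf.seek(0)
restored = read_archive(data_buf, dict_buf)
```

Because the dictionary marks `subjid` as character, the leading zeros in `001` survive the round trip; a reader that guesses types from the CSV alone could turn it into the number 1.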

xxformat_com
Barite | Level 11
Thank you @Reeza for your input. IME=Input Method Editor?
SASKiwi
PROC Star

I suggest you talk to the IT storage experts in your company. Companies typically have storage standards that dictate how long and in what form data must be kept. Is there any need to go beyond these standards, which in my experience don't go beyond 7 to 10 years?

xxformat_com
Barite | Level 11
Clinical research goes beyond 10 years.
SASKiwi
PROC Star

@xxformat_com wrote:
Clinical research goes beyond 10 years.

All the more reason to align with your company's standards in terms of what you need to keep and in what form.

Reeza
Super User

@SASKiwi wrote:

I suggest you talk to the IT storage experts in your company. Companies typically have storage standards that dictate how long and in what form data must be kept. Is there any need to go beyond these standards, which in my experience don't go beyond 7 to 10 years?


In Social Services, record retention was 100 years.

We had paper in storage, tapes, and then a database for anything from the past 70 years.

PaigeMiller
Diamond | Level 26

@xxformat_com wrote:

Because 40 years from now, who knows what the technology will be like, and you still need to be able to read the data when, for example, the authorities request it.


As others have mentioned, data storage experts ought to be consulted rather than you trying to come up with your own solution. JSON and XML would probably work better than plain text, as they let you store the metadata in the file itself.

--
Paige Miller
Kurt_Bremser
Super User

First, Know Your Data (Maxim 3). This will determine whether you can safely use a delimiter or a line separator (character variables that contain line separators may force you to use a fixed-length record structure).
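
One way to act on that check, sketched in Python with a deliberately simplistic rule and hypothetical column names and field widths: if any character value contains the delimiter or a line break, fall back to fixed-length records, which can be split by position instead of by separator characters.

```python
def choose_layout(rows, delimiter=","):
    """Simplistic rule: any delimiter or line break inside a value rules out delimited text."""
    for row in rows:
        for value in row.values():
            if isinstance(value, str) and (delimiter in value or "\n" in value or "\r" in value):
                return "fixed"
    return "delimited"


def write_fixed(rows, widths):
    """Concatenate each field padded to its fixed width; no record separators are used."""
    parts = []
    for row in rows:
        for name, width in widths.items():
            text = "" if row[name] is None else str(row[name])
            if len(text) > width:
                raise ValueError(f"value of {name!r} is longer than {width} characters")
            parts.append(text.ljust(width))
    return "".join(parts)


def read_fixed(text, widths):
    """Cut the text into records of sum(widths) characters, then into fields by position."""
    record_len = sum(widths.values())
    rows = []
    for start in range(0, len(text), record_len):
        record = text[start:start + record_len]
        row, pos = {}, 0
        for name, width in widths.items():
            row[name] = record[pos:pos + width].rstrip(" ")  # drop space padding only
            pos += width
        rows.append(row)
    return rows


# Usage with hypothetical columns: the first comment contains a line break.
rows = [
    {"id": "A1", "comment": "line one\nline two"},
    {"id": "B2", "comment": "plain, with comma"},
]
widths = {"id": 4, "comment": 20}
layout = choose_layout(rows)  # returns "fixed" here
text = write_fixed(rows, widths)
restored = read_fixed(text, widths)
```

The widths here count characters, not bytes, so a real archive in a multi-byte encoding such as UTF-8 would need byte-based widths. Stripping the space padding on read means trailing blanks are not preserved, which is usually acceptable for SAS character variables since SAS pads them with blanks anyway.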

I would combine the text file and the output of PROC CONTENTS in a ZIP archive, so you keep all data and metadata together and also reduce storage consumption.
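
The bundling step could look like this as a Python sketch; the member names `data.csv` and `contents.txt` are hypothetical, and `contents.txt` simply stands in for PROC CONTENTS output saved as text:

```python
import io
import zipfile


def build_archive(data_text, contents_text):
    """Bundle the data file and its metadata listing into one compressed ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.csv", data_text)          # hypothetical member names
        zf.writestr("contents.txt", contents_text)  # stands in for PROC CONTENTS output
    return buffer.getvalue()


# Usage: 1,000 rows of made-up numeric data plus a short metadata listing.
data_text = "id,value\n" + "".join(f"{i},{i * 2}\n" for i in range(1000))
contents_text = "Variable  Type  Len\nid        Num   8\nvalue     Num   8\n"
archive = build_archive(data_text, contents_text)

# Reading it back: both members are present and the data is unchanged.
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    members = zf.namelist()
    data_back = zf.read("data.csv").decode("utf-8")
```

Keeping data and metadata in one archive means they cannot drift apart, and the deflate compression recovers much of the size penalty of storing values as text.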

xxformat_com
Barite | Level 11
Thanks for your input.
Patrick
Opal | Level 21

If you really must store the data as text files, then I'd go for either XML or JSON, as these are text-based formats that future software should also be able to read, AND they can store metadata such as the data types.
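
To illustrate what that embedded metadata might look like, here is a Python sketch that writes a JSON document with a `columns` section next to the rows. The attribute names (`type`, `length`, `format`, `label`) are assumptions modelled on SAS variable attributes, not a standard schema:

```python
import json

# Hypothetical column metadata modelled on SAS variable attributes.
columns = [
    {"name": "subjid", "type": "char", "length": 3, "label": "Subject ID"},
    {"name": "visitdt", "type": "num", "length": 8, "format": "DATE9.", "label": "Visit date"},
]
# 23376 is a SAS date value (days since 1960-01-01), i.e. 2024-01-01.
rows = [{"subjid": "001", "visitdt": 23376}]


def to_json(columns, rows):
    """Serialize metadata and data together into one self-describing JSON text."""
    return json.dumps({"columns": columns, "rows": rows}, indent=2)


def from_json(text):
    """Return (columns, rows) from a document written by to_json."""
    doc = json.loads(text)
    return doc["columns"], doc["rows"]


text = to_json(columns, rows)
cols_back, rows_back = from_json(text)
```

JSON itself only distinguishes strings, numbers, booleans and null, so the date column still arrives as a plain number; it is the `format` entry, plus someone who knows what `DATE9.` means, that tells a future reader it counts days since 1960-01-01.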

Tom
Super User Tom
Super User

@Patrick wrote:

If you really must store the data as text files, then I'd go for either XML or JSON, as these are text-based formats that future software should also be able to read, AND they can store metadata such as the data types.


I would probably prefer to have a more portable storage method than use the latest fad structure.

So, as someone suggested, write the data as plain text: perhaps delimited files, perhaps fixed-length records.

But also include documentation of what the data is.

You will need that documentation even if you did jump on the JSON or XML bandwagon, since they really do not add that much more information. In theory you could write additional XML that defines the data structure, and perhaps that is better since it could be tested. But since you are starting with simple rectangular SAS dataset structures, there is no really complex structure that would require that effort.

So just write a clear description of the dataset.

You could probably automate some of it. See https://github.com/sasutils/macros/blob/master/ds2post.sas as a simple example. But that will not include the real business knowledge about what the data is, which is probably the most important part.
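
As an illustration of the part that can be automated, a Python sketch that derives technical attributes (type, maximum length, missing count) from the rows themselves and leaves a placeholder for the business definition, which still has to be written by someone who knows the data. The attribute names are hypothetical, and this is not a port of the linked macro:

```python
def describe(rows):
    """Derive technical attributes per column; the business definition is left as a placeholder."""
    names = list(rows[0].keys()) if rows else []
    dictionary = []
    for name in names:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        # A column is "num" only if it has values and all of them are numbers;
        # all-missing columns default to "char".
        is_num = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
        )
        dictionary.append({
            "name": name,
            "type": "num" if is_num else "char",
            "max_length": max((len(str(v)) for v in present), default=0),
            "n_missing": len(values) - len(present),
            "definition": "TODO: business definition",  # needs a person who knows the data
        })
    return dictionary


# Usage with hypothetical columns.
rows = [
    {"subjid": "001", "weight_kg": 72.5},
    {"subjid": "002", "weight_kg": None},
]
dictionary = describe(rows)
```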

Discussion stats
  • 14 replies
  • 2415 views
  • 6 likes
  • 7 in conversation