Ruminare
Fluorite | Level 6

As the title asks, I'm wondering if there are any plans for SAS to expand the kinds of data formats that can be read into SAS. Specifically, are there any plans to support the .fst, .feather, or .sqlite formats? These are remarkably powerful formats that our organization has begun using in R, and we would like our SAS programmers to be able to work with them too.

RW9
Diamond | Level 26

What is "modern" about these? SQLite databases are SQL databases in a file, so you can access them through a DBMS connection:

https://communities.sas.com/t5/Administration-and-Deployment/Is-there-a-way-to-read-a-SQLite-databas...
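To illustrate the "SQL database in a file" point outside SAS, here is a minimal Python sketch. It is Python rather than SAS and uses only the standard-library `sqlite3` module. The helper name `query_sqlite` and the `claims` table are hypothetical. From SAS, the same file would be reached through a DBMS connection like the one in the linked thread, which isn't reproduced here.

```python
import sqlite3


def query_sqlite(db_path, sql, params=()):
    """Run a SQL query against a SQLite database file and return all rows.

    Hypothetical helper for illustration only.
    """
    # A SQLite "database" is a single file; connecting simply opens it.
    con = sqlite3.connect(db_path)
    try:
        return con.execute(sql, params).fetchall()
    finally:
        # sqlite3's context manager only commits/rolls back, so close explicitly.
        con.close()


if __name__ == "__main__":
    import os
    import tempfile

    # Build a small throwaway database file (hypothetical table "claims").
    path = os.path.join(tempfile.mkdtemp(), "example.sqlite")
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE claims (id INTEGER, amount REAL)")
    con.executemany("INSERT INTO claims VALUES (?, ?)", [(1, 10.5), (2, 20.0)])
    con.commit()
    con.close()

    print(query_sqlite(path, "SELECT id, amount FROM claims WHERE amount > ?", (15,)))
```

Any client with a SQLite driver can issue the same SQL against that one file. That is why no special "file format" support is needed, only a database connection.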

 

As for the other two, binary files, eeeurrrgh! If you want modern, then move away from proprietary binary file formats and get your data into CSV, JSON, XML etc. Plain text is portable and ultimately readable by any system. I believe R and Python both have great libraries for processing JSON, XML and CSV, which gives you a cross-platform/cross-system setup.
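As one example of moving data into plain text, here is a minimal Python sketch (not SAS) that streams a SQLite table into a CSV file with a header row. SAS could then read that file with PROC IMPORT or a DATA step. The function name `sqlite_table_to_csv` is hypothetical. The sketch assumes the table fits SQLite's normal `SELECT *` semantics and that UTF-8 output is acceptable.

```python
import csv
import sqlite3


def sqlite_table_to_csv(db_path, table, csv_path):
    """Stream every row of `table` into a plain-text CSV file with a header.

    Hypothetical helper for illustration. Returns the number of data rows written.
    """
    con = sqlite3.connect(db_path)
    try:
        # Identifiers can't be bound as parameters, so quote the table name.
        quoted = '"' + table.replace('"', '""') + '"'
        cur = con.execute(f"SELECT * FROM {quoted}")
        header = [col[0] for col in cur.description]
        n = 0
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)  # handles quoting of commas/quotes in values
            writer.writerow(header)
            for row in cur:  # the cursor yields rows lazily, not all at once
                writer.writerow(row)  # None is written as an empty field
                n += 1
        return n
    finally:
        con.close()
```

Iterating the cursor row by row avoids loading the whole table into memory. That matters more than the choice of language once tables get large.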

For Python, it's possible to go either way:

https://blogs.sas.com/content/sasdummy/2017/04/08/python-to-sas-saspy/

https://blogs.sas.com/content/sgf/2018/01/10/come-on-in-were-open-the-openness-of-sas-94/

 

I can't answer whether there are any plans; they would be unlikely to put that information out on a Q&A board.

Ruminare
Fluorite | Level 6

Well, .fst and .feather are particularly useful for our organization because they let us read relatively large data sets quickly and easily. We're working with terabyte-scale data, and .fst works particularly well for speed here. I'm not aware of a way to do this with .csv in SAS other than using Hadoop or a database system like Oracle or Teradata. Is there something I'm missing there?

RW9
Diamond | Level 26

You would just read in the CSV to a dataset, then do your processing.

Ruminare
Fluorite | Level 6

Thanks @RW9 - when we use .csv (and some of our large datasets are already stored as .sas7bdat datasets), we run into slow I/O with large datasets. Reading datasets on the order of 500 GB can sometimes take a long time, especially if we are also processing them through temporary datasets. Is there some mechanism you use to speed things up? For reference, we use SAS 9.4 TS1M5 on Linux.

Edit: our SAS datasets do use indexes as well, which speeds things up. We've also experimented with SAS views but saw no performance improvement.

RW9
Diamond | Level 26

"we run into slow I/O" - I'm not sure how I can provide any information on that; there are thousands of hardware/software setups. Consult your IT group to analyse the reasons behind it.

500 GB doesn't sound like that much.

"especially if we are processing it with temporary datasets" - what do you mean? Is SAS code you have written writing to temporary datasets? If so, look at the code you are writing. There will always be a certain amount of resources needed to process that data (and do note that R loads data directly into memory, so it is limited by your memory but will be quicker). However, the way the data is being processed is more likely what impacts speed. Excessive use of macro, ignoring Base SAS and BY-group processing, and Excel thinking (you can see plenty of examples of all of these on this forum) often cause lots of extra processing, reads/writes etc., which consume resources. Start by looking at the log, see which procedures/steps take up the most time/resources, then refactor them.
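To show why BY-group processing tends to beat patterns like a macro loop that re-reads the data once per group, here is a minimal Python analogy (not SAS code). It computes per-group totals in a single sequential pass over a CSV. The function names and the `key`/`value` column arguments are hypothetical. Like a SAS BY statement, which expects sorted or indexed input, it assumes rows are already sorted by the key. On unsorted input, `itertools.groupby` would split a key into several groups.

```python
import csv
from itertools import groupby
from operator import itemgetter


def totals_by_group(rows, key, value):
    """Sum the `value` column within each run of equal `key` values.

    Hypothetical helper. Assumes `rows` are sorted by `key`; rows are consumed
    lazily, so the full input is never held in memory at once.
    """
    for k, group in groupby(rows, key=itemgetter(key)):
        yield k, sum(float(r[value]) for r in group)


def totals_from_csv(path, key, value):
    """Read a CSV once, top to bottom, and return (key, total) pairs."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(totals_by_group(csv.DictReader(f), key, value))
```

The file is read exactly once regardless of how many groups it contains. A loop that filters the file once per group reads it N times, and that kind of I/O multiplication is usually what the log reveals.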

Kurt_Bremser
Super User

Put that binary **** where the sun don't shine. Use them for storage internal to the software, but not to move data. The only thing that works reliably over time and gives you control over what is transferred how is text.

 

Just look at the myriad of problems caused by Excel.

 

In 20 years, those files you mentioned will either be out of use or have undergone changes that break programs.

TomKari
Onyx | Level 15

While I don't work for SAS, I think it's safe to assume that SAS wants to protect its reputation as one of the best platforms for dealing with other data formats. Based on that, I'm sure that they monitor the usage of all of these data formats, and once one of them becomes heavily used, they'll invest the resources needed to deal with it natively. Until then, using the file formats suggested in these posts is your only option.

 

You can certainly discuss your need for native access to these formats with your local SAS representatives. I'm sure those requests would get passed up the chain. The problem is, there are so many different file formats out there it's impractical to support all of them, so SAS has to pick the most heavily used to support.

 

My 2 cents!
Tom

Ruminare
Fluorite | Level 6

Thanks Tom. This was helpful. We'll reach out to our SAS reps. 
