08-09-2016 10:04 AM
Looking for options/guidance on best ways to load a SAS dataset into a SQL server database, having the load process be executed by SQL server (SSIS or whatever) rather than as a SAS job.
The person loading the data does have a PC SAS license (and access to a SAS BI server), so it looks like the SAS ODBC driver may be an option. I also saw this post on using OLE http://tech.lanesnotes.com/2012/06/importing-sas7bdat-file-into-sql-server.html .
I know we could write a SAS job to load the data. And may eventually go that way.
But currently this DBA manages all the data loads with non-SAS tools, and we only connect to the database from SAS to read data. Now the DBA has a project where a SAS dataset will arrive weekly and need to be loaded automatically. I'm hoping to recommend to him a way he could load the data that he would be comfortable with, and be able to manage the load using tools he is already familiar with (SQL server management studio, etc), rather than introduce the execution of SAS jobs into his processes.
Run-time efficiency is not a critical issue, as will only be loading two tables, and will be less than a million records.
08-09-2016 10:23 AM
08-09-2016 11:13 AM
Agree. In my brief reading everything suggests you need a SAS session (local or remote) for both ODBC and OLE, e.g.
In my current use-case I'm oaky with using a SAS session, as long as the SAS session is mostly transparent to the DBA. That is, I don't want them to be writing SAS code or feel like they are managing a SAS job.
But I absolutely agree, it would be great to have a way to allow ODBC access (or similar) to SAS datasets without needing a SAS session. Since SAS gives away the SAS Universal Viewer for reading SAS data sets, doesn't seem like it should be that hard to make the SAS data available to non-SAS applications, without requiring a full SAS session.
Googling turns up lots of recommendations to dump the SAS dataset to CSV, or play with scripting some mix of third-party solutions (cozyroc, stat transfer, R etc).
08-09-2016 03:37 PM
A couple of suggestions:
08-10-2016 10:27 AM
08-09-2016 01:32 PM
08-10-2016 10:22 AM
08-10-2016 10:44 AM
You could get a utilty tool such as STAT/Transfer for the DBA to use. They could use that to then deal with many sources of data, not just SAS datasets.
You could also look at R and Python utilities that can read SAS datasets.
08-10-2016 10:58 AM
If I changed the question to how to load a SAS Transport file (instead of a native SAS dataset) into MS SQL Server, without SAS code or a third-party application, would that make life easier for the DBA?
My vague memory is that the format of SAS Transport files is not proprietary, so it's possible there may be free ODBC drivers provided by SAS (or others) for reading data from them??
08-11-2016 10:49 AM
08-11-2016 12:09 PM
Thanks @LinusH. If changing to tranpsort files would make our DBA's life easier on the load side, my thought was to ask the data provisioner to change to send tranpsort files instead of .sas7bdat.
But I hadn't realized that only the v6 xport files were open. I'm not going to ask the provisioner to go back to 8 character var names etc.
08-11-2016 03:43 PM
08-11-2016 04:18 PM
Sure, but with csv you lose metadata and have to think about maintaining precision. So maybe XML to keep the metadata with the file...
So yeah, there are alternatives to delivering SAS datasets, but the core question remains, if a non-SASsy DBA receives a SAS dataset, what is the best/easiest way for them to read it into their database?
I'm casually aware that there are R /python / etc folks who have tried to read SAS datasets with varying success (and updates with SAS versions etc), and I would guess cozyroc has written their own ODBC diver to do similar (?). Then as mentioned, stat/transfer and formerly DBMS copy are probably scriptable. But having outsiders reverse-engineer this can only work so well. I see a lot of posts online where a DBA is asking "what the #$?!! can I do with this SAS dataset that was dropped in my hands."
I'm fine with SAS having a closed/proprietary format for storing data. But would be nice to have an open transport format as well that maintained variable names, labels, formats, and numeric precision. Even if it was less efficient, and perhaps didn't have all the bells and whistles of a native SAS dataset (integrity constraints, audit trails, indexes, etc.).
08-11-2016 04:47 PM