After testing local hosting of our SDD environment, we're now considering hosting at SAS Cary. Our biggest concern (and I assume it's the same for most of us) is uploading our clinical data (Oracle databases) to the SDD instance in the US.
Currently, we have a set of SAS programs that extract the information from Oracle (libname statement) to create the SAS datasets.
I thought about the following options:
- keep running these SAS programs using the same libname statements. We would need to open a B2B connection with SAS US, but I have concerns about performance. We run these programs many times a day (for data cleaning) and some studies are pretty large (up to 2-3 Gb).
- variation of the first option: we keep running the same SAS programs but only once a day. That means we lose the "live" access to our EDC, which is not really an option for the cleaning...
- we run the SAS programs locally to generate SAS datasets and we then transfer the datasets to SDD (webdav? sdddc? proc copy?). The only advantage I see here is that there is less data transferred over the WAN. But again we lose the direct access, might have compatibility issues, would need a dedicated server within the company, etc.
- the last option is to replicate our Oracle db in Cary using Oracle's own tools, but I'm not familiar with these synchronization technologies (so I don't even know if it's possible) and I'm not even sure SAS or my company would be willing to do so...
In other words, how do you manage to have your Oracle data in SDD?
Re the dedicated line to Cary:
Have you run tests simulating the live data transfers? I would suggest trying this on a SAS instance with a similar amount of data and a similar number of users working at the same time.
Re the variation of the first option:
I would say that it's a business decision. The question is whether you can afford such a setup: can you live without direct access to the "live" data in the db?
Re "we run the SAS programs locally to generate SAS datasets and we then transfer the datasets to SDD":
Sounds similar to what we do, using Xythos ;-)
The question once again is whether you want all your data to be in SDD or somewhere else. Again, a business decision. For us, we need traceability for regulatory compliance, so everything goes into SDD and all activities are done within SDD as well.
Re the last option is to replicate our oracle db in Cary using oracle stuff but I'm not familiar with these synchronization technologies
This is indeed an expensive option, and even then, what is SAS's experience with it? It also depends on where your company is located.
I wrote a SAS program that uses the SDD command facility macros and the newer SDD API macros. It scripts some typical actions and interactions between a local SAS session (client) and an SDD instance (server):
- create folder
- upload/download datasets of different sizes
- create/compile/delete SAS programs (stored in the repository and accessed via a WebDAV filename)
I measured the time it takes to perform each action, from different locations within GSK (LAN, WAN, VPN) against two SDD instances: the one we're hosting in-house and one kindly provided by SAS Cary for testing. The purpose was to estimate the impact of higher latency / smaller bandwidth on the SDD end-user experience.
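For anyone who wants to repeat this kind of comparison, here is a minimal sketch of the measurement idea: run each action several times, keep every duration, and compare medians so that one bad attempt does not dominate. It is written in Python rather than SAS and does not call any SDD macro. The two `fake_*` functions are hypothetical stand-ins that only sleep, so they would need to be replaced by whatever actually performs the upload or download.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_action(action: Callable[[], None], repeats: int = 5) -> List[float]:
    """Run `action` `repeats` times; return each wall-clock duration in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        action()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations: List[float]) -> Dict[str, float]:
    """Median plus min/max, to show how much attempts vary."""
    return {
        "median": statistics.median(durations),
        "min": min(durations),
        "max": max(durations),
    }


def slowdown(reference: List[float], candidate: List[float]) -> float:
    """How many times slower the candidate is, comparing median durations."""
    return statistics.median(candidate) / statistics.median(reference)


# Hypothetical stand-ins for a real action (e.g. uploading one dataset).
def fake_lan_upload() -> None:
    time.sleep(0.01)


def fake_wan_upload() -> None:
    time.sleep(0.05)


if __name__ == "__main__":
    lan = time_action(fake_lan_upload)
    wan = time_action(fake_wan_upload)
    print("LAN:", summarize(lan))
    print("WAN:", summarize(wan))
    print(f"WAN/LAN slowdown: {slowdown(lan, wan):.1f}x")
```

Reporting min and max next to the median is deliberate: on an unreliable connection the spread between attempts is as informative as the typical value.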
For the Oracle link, I've not been able to test it from Cary, but by comparing the other measures with similar WAN connections, I'm confident. It might be 2 or 3 times slower but, as I said in my previous post, we still have room to change some business processes and put a SAS server on each side to reduce the amount of data actually transferred over the B2B line. Now, if we change nothing in our way of working and use a "simple" B2B line, it could cost us up to 5 FTE a day!
On the other hand, higher latency / lower bandwidth does significantly impact the transfer of small/large files! The test instance is easily 20-30 times slower and, since our internet connection is not the most reliable one in the world, measurements vary greatly from one attempt to another. I'm now more concerned about the end-user experience and the lags that might occur if we use our internet connection (shared by all employees, around 40 Mbit/s).
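To put the bandwidth figure in perspective, here is a back-of-envelope sketch of the best-case time to push one large study over that line. It assumes the 2-3 "Gb" study size means gigabytes in decimal units, and it ignores latency and protocol overhead. The `efficiency` parameter is my own assumption, a crude way to model the line being shared with other traffic.

```python
def transfer_seconds(size_gigabytes: float,
                     link_mbit_per_s: float,
                     efficiency: float = 1.0) -> float:
    """Best-case seconds to move `size_gigabytes` over a `link_mbit_per_s` link.

    `efficiency` is the assumed fraction of the link actually available
    (1.0 = the whole line, 0.5 = half of it).
    """
    if link_mbit_per_s <= 0:
        raise ValueError("link_mbit_per_s must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Decimal units: 1 GB = 1000 MB, 1 byte = 8 bits.
    size_megabits = size_gigabytes * 1000 * 8
    return size_megabits / (link_mbit_per_s * efficiency)


if __name__ == "__main__":
    for size in (2, 3):
        full = transfer_seconds(size, 40)
        half = transfer_seconds(size, 40, efficiency=0.5)
        print(f"{size} GB at 40 Mbit/s: {full:.0f} s with the full line, "
              f"{half:.0f} s with half of it")
```

With the whole line available this gives 400 s for 2 GB and 600 s for 3 GB, so each full extraction of a large study would take several minutes even in the ideal case, before counting the other employees sharing the connection.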
So, again, I'm asking the community for experiences regarding the connection to Cary. Internet, Leased line, VPN? What bandwidth? Are users complaining about lags?
And BTW, did you notice that the new SDD API macros are systematically slower than the older command facility?!