01-19-2016 01:38 AM
As part of a requirement, we need to copy tables from the LASR server (in-memory data) to the SAS server and store them as physical tables. We need to do this several times a day, and the data size is around 900 GB.
What is the most efficient way to perform this operation?
01-19-2016 02:01 AM
AFAIK, the LASR server is represented to Base SAS as a library with its own special engine. So a simple DATA step should suffice.
The bottleneck will most probably be the write performance of your standard SAS server and the connection between the LASR and the standard server.
Depending on the data structure, compressing the target data set might (even dramatically) improve your performance.
01-19-2016 03:17 AM
01-19-2016 03:53 AM
We are trying to leverage the reload-at-start functionality and hence want to copy/back up data from LASR to the SAS server, i.e. co-located storage, so that it can be loaded back into LASR memory when the server restarts.
A few facts for your reference:
a) The LASR table is updated multiple times a day with the security/auth rules.
b) Because of infrastructure challenges, we decided at the start of the project to use the flow Data Server --> SAS ETL --> Upload to LASR.
c) We are now close to completion, and redesigning the solution would involve a lot of effort, so we are planning to back up data from LASR to the SAS server, i.e. co-located storage, to leverage reload at start.
Using a DATA step might be slower.
And I suppose FTP would not work, since we are copying data from memory to a physical server?
01-19-2016 04:39 AM
A DATA step is usually the fastest method of moving data within SAS, especially when it involves a transformation (here from LASR in-memory storage back to a Base SAS data set). Your limits will not be found within SAS, but in the I/O subsystem of your server(s) and the network.
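To put rough numbers on that, here is a minimal Python sketch of the arithmetic. The throughput figures are assumptions picked for illustration, not measurements from any real LASR or SAS server; only the 900 GB volume comes from this thread.

```python
# Rough estimate: hours needed to move a data volume at a sustained throughput.
# All throughput figures below are assumed for illustration, not measured.

def transfer_hours(size_gb: float, throughput_mb_s: float) -> float:
    """Hours to move size_gb gigabytes at throughput_mb_s megabytes per second."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = size_gb * 1024 / throughput_mb_s
    return seconds / 3600

SIZE_GB = 900  # data volume mentioned in the thread

# Hypothetical sustained rates for the slowest link in the chain.
ASSUMED_RATES_MB_S = [
    ("1 GbE network", 110),
    ("disk array", 400),
    ("10 GbE network", 1100),
]

for label, mb_s in ASSUMED_RATES_MB_S:
    print(f"{label:>15}: {transfer_hours(SIZE_GB, mb_s):.1f} h")
```

Under these assumed rates, a link sustaining about 110 MB/s needs more than two hours per copy, so running the copy several times a day only looks practical if the slowest component is well above that.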
What did you already try, and what was the result?
01-19-2016 07:16 AM
Maybe I'm missing a piece here, but wouldn't a synchronization between LASR and the co-located storage be by far the most efficient routine?
Also, I'm a bit concerned that the LASR data is being updated frequently. Usually, LASR data is a basis for analytics, not a database that should host frequent updates. What is your master data store? What kind of data and updates are we talking about?
The DATA step may be one of the fastest ways of processing data (in a single-threaded environment, outside LASR/VA). But the trick is to minimize data movement between MPP and SMP environments. There is a reason why SAS has invested in MPP: the huge data volumes that it can host.
01-20-2016 12:30 AM
Our master data store is SAS. On this data, account-level user authorization information is processed, and the result is hosted in LASR. Row-level security is applied in LASR for use by SAS VA dashboards. The LASR updates we mentioned are changes in access/authorization information.
After applying the user authorization information, the size of the data increases exponentially. As mentioned earlier, we opted for this routine because of the infrastructure challenges, and unfortunately we cannot use the out-of-the-box synchronization unless this data is copied back to a physical location in SAS.
Hence we are looking for suggestions on the most efficient way to do this. We have not tried a DATA step yet, so we are not sure about its performance.
01-20-2016 06:31 AM
Sorry, my brain is kinda slow sometimes, so just to see if I understand you correctly:
So far, it seems that co-located storage is out of the picture.
But still, if the reason is just to quickly reload the latest version of all data, saving the current data in LASR to the co-located storage should be the fastest way to get things up and running.
Can't you use the built-in record-level authorization within VA, in conjunction with star schemas?
If so, you would just reload the authorization table. The analysis data, which would be much smaller than in the current structure, can be loaded quickly and separately (by the co-located storage :-) )
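To illustrate why keeping the authorization table separate shrinks the data, here is a minimal Python sketch of the row counts. Every figure in it is hypothetical, and it assumes fact rows are spread evenly across accounts; it says nothing about query performance, which would have to be tested separately.

```python
# Compare row counts: one copy of each fact row per authorized user
# versus a fact table plus a separate authorization table.
# All figures are hypothetical and chosen only to show the arithmetic.

def denormalized_rows(fact_rows: int, avg_users_per_account: float) -> int:
    """Rows when every fact row is repeated once per user allowed to see it."""
    return round(fact_rows * avg_users_per_account)

def star_rows(fact_rows: int, auth_rows: int) -> int:
    """Rows when facts and user-to-account authorizations are kept separate."""
    return fact_rows + auth_rows

FACT_ROWS = 10_000_000   # hypothetical analysis rows
ACCOUNTS = 50_000        # hypothetical number of accounts
AUTH_ROWS = 1_000_000    # hypothetical user-to-account pairs

# Average number of users authorized per account under these assumptions.
avg_users = AUTH_ROWS / ACCOUNTS

print("denormalized:", denormalized_rows(FACT_ROWS, avg_users))
print("star schema :", star_rows(FACT_ROWS, AUTH_ROWS))
```

With these made-up numbers, a change in access rights touches only the authorization rows in the separated layout, whereas the denormalized layout has to rebuild every repeated fact row.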
01-20-2016 11:17 PM
That was a joke, right? If your brain is slow (even sometimes) in the SAS DWH/BI area, then God knows if someone like me even has a brain.
Your understanding is correct in the 3 bullet points you mentioned. Just one correction, we are not using Data builder. We are using SAS DI/ETL for doing this processing.
In a small POC we did, we noticed that the star schema approach hurt end-user performance, so we did not use it.
We are using the built-in record-level authorization within VA. That's the reason the data size in LASR has increased: we ended up storing one record per user based on the access rights.
One question: I am assuming that by co-located storage you meant a Base SAS location? If so, we are planning to store/back up the LASR data to this location, and are looking for suggestions on the most efficient way to do this.
Or am I missing your point here?
01-21-2016 03:26 AM
Co-located storage is only relevant if you have a distributed VA/LASR environment. So I take it from your answers that you are on a single node, right?
If so, yes, "SAS" is your data storage.
I have no experience in how/when to synchronize this kind of set-up. If you are using Base SAS for your main "off line" storage, make sure that it has I/O that is as fast as possible. You could use SPDE, which could speed up read access (and so speed up the load to LASR).