deleted_user
Not applicable
I have problems getting one SAS server to read data hosted on another SAS server. I was therefore thinking of getting shared storage that is directly connected to both SAS servers. In that case, is it possible for both SAS servers to use the same workspace and even read the same data sets concurrently (since the directory would be connected directly to both SAS servers via SAN or DAS)? Thanks in advance for your advice.
13 REPLIES
deleted_user
Not applicable
That is a non-trivial configuration, requiring special clustered-filesystem software, and it may still not work. Oracle requires this kind of setup and is written to work with it, but SAS probably isn't, especially Base SAS.

Are you using SAS/CONNECT or SAS/SHARE?
deleted_user
Not applicable
I was thinking that if it can't be done, perhaps we can manipulate the SAN so that the data sets and flat files used by one server live in one LUN and are then copied over (by a batch job) to another LUN on the same storage, which is visible to the second SAS server. That way both SAS servers can use the same data sets with minimal effort spent copying data sets over for sharing. The problem is that we have neither SAS/SHARE nor SAS/CONNECT, though I am making noise to procure them (and in this case the data is in data sets and flat files, not in databases).
Doc_Duke
Rhodochrosite | Level 12
We are doing it with two Unix (Solaris) servers. The share on the SAN is directly connected to one server and we use NFS mounting to allow the other server to access the data. It's transparent to the users, though there is a bit of a performance hit for large datasets.

(We did it because the SAN controller requires Solaris 10 and the older server is still on Solaris 9.x; I'm not sure whether it is a general requirement to do it that way in Unix.)
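
On the NFS-client server the library assignment is just a normal path reference; roughly like this (the mount point is only an example, and we use ACCESS=READONLY so the second server can't update files the owning server may be writing):

libname sandata '/sanshare/proddata' access=readonly;   /* NFS-mounted SAN share; path is hypothetical */

proc contents data=sandata._all_ nods;   /* quick check that the members are visible */
run;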
deleted_user
Not applicable
Are you using SAS/SHARE?

Are you really allowing simultaneous access to the very same file?

----------------------------------------------

Joshua, copying from one to another, that's not the same thing that I thought you were asking.


I prefer the idea of using a regular RDBMS to provide access to shared data.

But if the data is on one server and you want to pick it up from there and process it on a different server, using NFS to give the processing server access is a slick way to go. A lot of admins use NFS mounts to simplify installing applications, removing the need to have space on the box for the install files or direct access to the box's CD/DVD drive. It also saves on the cleanup.
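
To give you an idea of what I mean by the RDBMS route: with a SAS/ACCESS engine licensed (Oracle here, purely as an illustration; the connection details are made up), both servers can point a libref at the same schema and let the database worry about concurrent readers:

/* Hypothetical: requires SAS/ACCESS Interface to Oracle; credentials are placeholders */
libname shared oracle user=sasusr password="xxxxxxxx" path="proddb" schema=shared;

proc sql;
   select count(*) from shared.customers;   /* 'customers' is a made-up table */
quit;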
deleted_user
Not applicable
I would hope it could be done concurrently, so I don't need to mirror all that data for each connecting server (where I would need one LUN for one server and another LUN with the same data for the other server). NFS is probably a good way to go, but mounting that much data to be read by another server via NFS is bound to cause some network latency issues. I was actually hoping to use one workspace for all my users (who might access either box A or B at different times, and different users on box A and box B might want to access the same data), but I suppose that's only possible if the metadata is identical on both boxes, or if I have one common metadata repository.

Another problem with NFS is that my two boxes are Windows and Sun, so it's down to Samba, which isn't as reliable as I'd like it to be.

I wish I could test all of this out, but we can't afford a testing environment (or test licenses), so I am trying to see if all of this works conceptually before we move to the next step of firming up what we need to procure. But I am worried that in actual practice you cannot have two different SAS servers (or instances) access the same data concurrently (even just for reading).

Has anyone done any usability or performance tests to see whether reading the data through mounted directories or accessing the same data through SAS/SHARE yields better results (in terms of performance and reliability)?
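
To make the comparison concrete, this is roughly what I picture on the reading server; the paths and server name are placeholders, and the second form assumes we eventually license SAS/SHARE:

/* Option 1: read straight through the mounted directory */
libname direct '/sanshare/proddata' access=readonly;

/* Option 2: read through a SAS/SHARE server running on the host that owns the data */
/* 'unxhost' and the service name 'shr1' are hypothetical                            */
libname viashr '/sanshare/proddata' server=unxhost.shr1;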
deleted_user
Not applicable
The SAS WORK is going to be on the physical server that SAS will run on. It is most unwise (foolish) to attempt to do otherwise.

If you need to bring data from one server to another, it usually can be selected (reduced) first and then that piece brought over. SAS can do that itself, even from within EG.
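
For example, something like this (library names, paths, and variables are made up; here the other server's data happens to be reachable through a mount, but the same idea applies however you get at it):

libname remote '/mnt/otherserver/sasdata' access=readonly;   /* data that lives on the other box */
libname local  '/sasdata/project1';                          /* this server's own space          */

data local.claims_2008;
   set remote.claims (keep=claim_id member_id paid_amt svc_date);
   where year(svc_date) = 2008;   /* only the slice you need crosses the network */
run;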

I strongly recommend a single metadata repository for EG, which defines for the user community logical servers on the two physical servers.

I would also highly recommend you start with a simple setup and then directly experiment using EG to see what you can do, and how it is done.
deleted_user
Not applicable
I was hoping that with a common metadata repository we could also use a common workspace for all 10 users (so each user won't take up one more workspace directory per server, but rather one consolidated directory per user, which should logically save us some space). Is it possible to deploy the metadata server on one of the two SAS servers, or must it be deployed on a stand-alone box?

Granted that EG is able to see the data on both servers (via defined libraries), but in actual practice, won't the transfer or connection of the data go over the Ethernet, which will be considerably slower than doing so via the SAN infrastructure (or even a shared DAS)?

I am trying to do some usability tests too, but it's quite difficult with a modest budget that does not cater for a testing environment.

Given that each server might not have more than 400 GB of local disk space, I was thinking that it would make more sense to mount the workspace, SORTDEV and all temp space on the storage instead, and for each user to use the same workspace (even if this one user traverses more than one SAS server).
deleted_user
Not applicable
Joshua, you are not paying attention to the way SAS works.

Each SAS session/instance creates a temporary WORK directory and files. The base location for this is defined in the sasv9.cfg configuration file, so each physical server is going to have at least one place where this "workspace" lives. You have no choice in that matter. Minimally, if SORTDEV is not defined, WORK is used. What you have to be careful of is SASUSER, which is persistent, and EG tends to want to use this space, which will get confused and clogged up. This is regardless of where the metadata repository sits; the metadata repository has nothing to do with "workspace".
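
For illustration, the relevant part of a sasv9.cfg on each physical server looks something like this (the paths are examples only; the point is that WORK is per-server and per-session):

/* sasv9.cfg excerpt -- hypothetical paths                                  */
-WORK    /saswork        /* each SAS session creates, then deletes, its own subdirectory here */
-SASUSER ~/sasuser       /* persistent per-user area; watch how EG uses it                    */
/* if SORTDEV is not set, sort utility files land in WORK as well           */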

It is my recommendation to set a policy to use "Assign Library..." to create EGTASK for each project, and to keep each EG project and its related files (query results) in that specific directory/folder for that project. It'll make for easier maintenance.
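
The library assignment EG generates for that is nothing exotic; roughly this, with the path being just an example:

libname egtask '/sasdata/projects/claims_review';   /* per-project results library */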

If each server has only 400 GB of space, and that is all local disk (no SAN space), then you may run into space and performance problems, especially as EG work accumulates and if there are any significant data sets. You and management cannot be chintzy here, or you're asking for a lot of extra work and disgruntled users.

That is my opinion.
deleted_user
Not applicable
You are right on that. I didn't think SAS was suited to our needs, given our modest budget and lack of enterprise architects who are familiar with SAS and enterprise-scale deployments. But given that I have inherited some SAS servers and need to get the job done within existing constraints, I have to think outside the box. The SAS licenses aren't cheap, and neither are the components. The way I see it, there's better ROI in buying more disk space (SAN) and more memory (so you can keep more transactions in memory instead of running on disk). I am under the impression that SAS 9.1.3 SP4 might not scale well (or linearly) with multiple CPUs, or at least that it might not be worth our while to procure more than a four-core CPU. Even for 32-bit versus 64-bit operating systems, I seem to find a lot of online material on support.sas.com that talks about 32-bit and mentions little about 64-bit. And I am still peeved that some BI components don't run on 64-bit.

But I would assume that 64-bit is probably the way to go if your best hope of performance improvement is keeping everything in physical memory without resorting to paging. With /PAE, you'd probably feel some performance degradation once you're above 6 or 8 GB. I'm on a steep learning curve, so do bear with me as I sound all these ideas off the forum while also catching up on all the white papers and online documentation, which is a lot to cover in the six weeks I've been exposed to SAS.
deleted_user
Not applicable
I am under the impression that having SAS IT on both the Wintel and Unix servers should allow the EG user to access the data sets hosted on both the Wintel and Unix servers at the same time, in the same workspace session. But I wonder whether using SAS/CONNECT, SAS/SHARE, or SAS/ACCESS might be a more effective way for an EG user to make use of the data sets (and flat files) hosted respectively on the Wintel and Unix SAS servers.
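
For instance, if we did procure SAS/CONNECT, my understanding is the EG user's Windows session could do something along these lines; 'unxhost' and the paths are made up, and I haven't been able to test any of it:

options comamid=tcp remote=unxhost;   /* hypothetical UNIX server */
signon unxhost;

rsubmit;
   libname src '/sasdata/warehouse';            /* library on the UNIX box                   */
   proc summary data=src.transactions nway;     /* do the heavy lifting where the data lives */
      class region;
      var amount;
      output out=work.sums sum=total_amt;
   run;
   proc download data=work.sums out=work.sums;  /* bring back only the summary */
   run;
endrsubmit;

signoff unxhost;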

Again, I'll have to read through the documentation for each of the various products, but it would be helpful to know how other administrators work around the problem of hosting and sharing data files on different SAS servers and address the problems of network latency and performance.

-Joshua
deleted_user
Not applicable
Has anyone managed to use CEDA to share data sets across heterogeneous SAS servers? If so, it might be more effective than having SAS IT + SAS/SHARE on each SAS server.
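
What I was picturing is something like this on the Windows side, with CEDA translating the UNIX-format data sets on the fly (the UNC path is hypothetical, and again I can't test it):

libname unxdata '\\unxhost\sasdata' access=readonly;   /* UNIX server's data sets via an SMB share */

proc contents data=unxdata.claims;    /* Data Representation shows the foreign format */
run;

data work.claims_local;               /* make a native-format copy if we need to reuse it */
   set unxdata.claims;
run;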

- Joshua
Doc_Duke
Rhodochrosite | Level 12
We have found that putting the WORK directories local to the server enhances performance because it minimizes network traffic (even though we are using a dedicated gigabit pipe, the network writes take longer than bus-based writes). At the end of the server activity, the disk space is released for other users. Because it is temporary space, it is not backed up and not RAID, which also helps with network management.
deleted_user
Not applicable
That makes perfect sense. But if you had various users and multiple servers, is there a practical way to consolidate the workspace onto the SAN such that one user would use the same workspace regardless of which SAS server he accesses from time to time? For the data tier, I am assuming that each SAS server needs a dedicated LUN for SASUSER and that you can't have two SAS servers read the same files concurrently. So one way would probably be to have a big shared directory that each SAS server periodically dumps a copy of its data sets to. It would be like a common data mart that is accessible to every SAS server, while each SAS server keeps the most up-to-date copy of its own data sets on its own dedicated LUN.
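
Something like this as a nightly batch step on each server, say (library names and paths are placeholders):

libname mydata '/sasdata/local';              /* this server's own LUN              */
libname mart   '/sanshare/datamart/serverA';  /* shared area both servers can read  */

proc copy in=mydata out=mart memtype=data;    /* publish current copies of the data sets */
run;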
