BookmarkSubscribeRSS Feed
Ullsokk
Pyrite | Level 9

SAS is the only datawarehousing tool I have come across where just viewing a table will cause a lock and hinder the ETL job. I have always found this very puzzling. I mightsee the argument for not updating a table while another process is running an active query, but I cannot see any reason why someone simply having looked at a table and not closed the view should cause a lock, especially users with read only access. 

 

 

Does anyone know why this feautre\bug exists? And why seemingly no other database tool has chosen the same apporach?

9 REPLIES 9
LinusH
Tourmaline | Level 20

Call it a feature is almost mean. Base SAS tables were created in far past where the typical use case was single user access.
The benefit is no overhead and relatively fast vs traditional RDBMS.
But it if you are in a large heavily updated/used data warehouse Base SAS tables wouldn't be my recommendation. If you look in the SAS stack Scalable Performance Data Server deals with locking (and scale) in much better way. For smaller tables that might be used for data entry you could look at SAS/SHARE.

Data never sleeps
TomKari
Onyx | Level 15

I apologize in advance for the long answer, but there is some history involved.

 

First of all, you're hitting these table locks when your data storage is on SAS datasets. That's only one of the many data storage mechanisms accomodated by the SAS environment; there are trade-offs between the different mechanisms. If your data storage was on SQL Server or Oracle, you wouldn't have table locks, but you'd have the overhead of the software needed to manage concurrency. I've used SAS against a database that was optimized for analytics; performance was amazing, 10 to 100 times faster than SQL Server or Oracle for analytical read operations. Part of the price of this was it only allowed a single updater. I feel it's critically important to find the tools that produce the results you need for your requirements; if your requirements can't handle table locks, use something else.

 

As @LinusH mentions, the table locks are used because SAS datasets are an old storage mechanism. It's unreasonable to call them a database; if you need database features, use one. As with almost all sequential files, creating or updating one prevents other users from accessing them, to prevent data corruption problems.

 

Tom

SASKiwi
PROC Star

SAS datasets are OS files formatted in a proprietary way. As such SAS is constrained by the OS file-locking behaviour.

 

You can prove this for yourself. Try opening a SAS dataset using a non-SAS tool like MS Notepad or Word and just leave it open (yes, I know it won't read it properly). Now try updating the same dataset in SAS. You will get the same file lock error that you would get using a second SAS session to open the dataset. What I'm trying to say is it is not SAS controlling the file lock behaviour it is the OS.

 

If you want a better way of controlling file locks then explore the options mentioned by @LinusH. Also if your file locking problems are caused by other SAS jobs processing the same datasets for a short time then the FILELOCKWAIT option may help - https://documentation.sas.com/?docsetId=hostwin&docsetTarget=n0fh4oaq31gvqkn1hlh75lvv9396.htm&docset...

 

However I too wish there was a better way to handle this out of the box without the complications of extra software.

 

 

Patrick
Opal | Level 21

@Ullsokk wrote:

SAS is the only datawarehousing tool I have come across where just viewing a table will cause a lock and hinder the ETL job. I have always found this very puzzling. I mightsee the argument for not updating a table while another process is running an active query, but I cannot see any reason why someone simply having looked at a table and not closed the view should cause a lock, especially users with read only access. 

 

 

Does anyone know why this feautre\bug exists? And why seemingly no other database tool has chosen the same apporach?


@Ullsokk

SAS is not a datawarehousing tool and SAS tables are not tables in a DBMS. SAS can be used for data warehousing tasks and SAS offers a specialized tool for ETL (SAS DI Studio; and new ones with SAS Viya) which allows you to define ETL processes metadata driven. But you can also code your ETL processes without SAS.

Using SAS you can interface with most DBMS and it's then up to the DBMS if it allows read/write concurrency or not. If using SAS tables then these are just files on a file system and it's the OS level file locks which prevent concurrency.

SAS honors the locks but at least under Unix/Linux (and also depending on the file system used I believe) you can use OS commands which don't honor such file locks.

It depends what you need to do but let's say it's about a full replace of a table. Your ETL could first load into a newly created table with another name and then at the very end use OS commands (i.e: mv -f <newly created table> <target table> ) to replace the old table (file) with the new table.

Ullsokk
Pyrite | Level 9

Thank you for all the in depth replies. Makes much more sense now. The next question would be why someone would choose to use SAS as a stand in for a DBMS, and not just have a SQL database which is accessed by SAS, if that is your chosen BI toolkit. 

TomKari
Onyx | Level 15

1. SQL databases cost a LOT of money!

 

2. Given that most statistical operations are table scans, in many cases you'll get better performance from a flat file than from a SQL database. SQL databases specialize in use cases where a lot of random records need to be accessed by a lot of different users.

 

 

AndrewHowell
Moderator

@Ullsokk,

 

SAS will (generally) perform faster tasks on SAS-proprietary data formats. ("In-database" processing is a separate discussion.)

 

If you are concerned about file locks on SAS tables, then why not emulate in SAS what is almost standard in relational databases - don't access the data tables directly(which then lock), instead VIEW the data through a data-step or proc-sql view.

 

 

 

 

Ullsokk
Pyrite | Level 9

Sure, I tend to view data through data steps, but in an organization with dozens or hundreds of users, a few are bound to not be aware that simply opening what appears to be a table view in EG, actually locks the table from beeing updated by the datawarehouse.

 

It would seem like a no brainer to me to code this so that a view of a table (as in clicking on a table in a libname to have a look) does not physically open the actual file, but instead does the equivalent of a datastep with a 100 row limitation, or something to that effect. Seems like somehing that could be set to to by default be an operation that does not cause a table lock. 

 

AngusLooney
SAS Employee

I may not have exactly grasped the context of the original post, but one spin on this is that the path you use to update tables doesn't have to the same one that an end-user might use to view data, so a RW library for ETL and RO library for exploitation.

 

That or end-users access to VIEWs onto tables, rather than the real tables themselves. Using the VIEW approach offers a buffer between end-users and backend changes that might crop up over time, I've seen it turn out to be very useful in the long run, after everyone thought it was overkill originally, a bit like putting detailed logging into your ETL jobs/flows.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 4484 views
  • 13 likes
  • 7 in conversation