10-18-2017 10:17 AM
I was hoping to find this somewhere in the community, but after spending some time searching, I haven't found a similar question.
I am looking for some feedback from the community about the pro / cons of using proc SQL vs pass-through SQL to delete a large number of observations ( minimum 15 million ) from a very large SQL table. We've noticed that if we us pass-through with ODBC not all records are deleted, but I am concerned that using proc SQL will blow up the SQL log or present other, new and different, issues. We've worked around this by having large deletes executed though SQL Mgmt Studio (smaller number of records work just fine); however, it is far more desirable to have the entire process executed within the overnight processing by SAS. We have current versions of SAS and SQL and the table is indexed. I am wondering if it is just to much for the ODBC and we really needs to just take the plunge for the SQL access engine, or if there are tips / tricks that might help.
Any feedback / guidance is appreciated!
10-18-2017 10:22 AM
Bit confused by your terminology. Pass through is a mechanism for passing SQL to the database's SQL parser for execution. Unless you are transferring the data to SAS, then you would be using pass-through - and with that much data I would not expect you to transfer it to SAS just for some admin tasks. What database is it, can you not use the interface with the database directly, most big ones will have an SQL window where you can execute commands, this will be much faster than any other process
10-18-2017 10:51 AM
10-18-2017 01:02 PM
I believe that in many SQL dialects there's a statement that will remove all of the data in a table by simply resetting metadata indicators (in Oracle it's TRUNCATE TABLE). Doing this using pass-though would seem to be your best option, based on what you're describing. It should run extremely quickly, as it doesn't process any data.
10-18-2017 11:40 AM
10-18-2017 01:13 PM
10-19-2017 05:11 PM
Deleting 15m+ rows in SQLServer shouldn't be a big deal.
There's loads of ways to attack this but I would start of by looking at your indexes, they can slow things down a lot in some circumstances. Sometimes it can be better to drop indexes then recreate after the operation.
Have you run the estimated execution plan in sql management studio when doing the delete there? might give some tips
I'd also consider looking at using transactions so things roll back rather than end up with a random amount of remaining data
10-20-2017 10:43 PM - edited 10-20-2017 10:43 PM
It sounds very much like a timeout issue. SAS sends a command to the DB and then waits for the response which it eventually only gets once the DB process has finished. Besides of timeout settings it's sometimes also security software/virus scanners which close connections that appear to be inactive and it can take quite a bit of investigation and tweaking to resolve such issues.
I would expect that an incomplete/aborted operation on the DB issue a roll-back and you should either end-up with no rows deleted or all rows deleted. If this is not the case then I guess there are commits after a certain number of rows deleted. Make sure DBCOMMIT is set to zero.
SAS will always try and send the SQL directly to the database for execution. With explicit pass-through that happens always, with implicit pass-through (SQL statements defined in SAS SQL syntax) SAS will still try to convert the SQL to the database flavor and send as much as possible to the DB for execution. You can see in the SAS log what SAS is trying to do and what code gets sent by using options:
options sastrace=',,,d' sastraceloc=saslog nostsuffix;
For the time it takes to actually delete the data in SQL Server:
If that's a regular process and you can change the table in SQL Server then I'd go for a monthly partitioned table as this would then allow you to simply truncate the data in a specific partition.
I personally would also implement using explicit pass-through SQL as this allows me to develop and unit test the code directly on the database and then use this code 1:1 within my SAS program with full control of what's going on and with all database specific commands available to me (i.e. truncation of a specific partition).
10-23-2017 09:08 AM
Thanks to everyone for all of the great feedback. I am obviously not a SQL gal and your explanations not only helped me trouble-shoot the problem, they will help me when I communicate the varying roles of SAS v SQL in this process with the DBA.
@itchyeyeballs yes, we have run the execution plan several times directly in SQL and it works like a charm; however, during the course of my follow tests, I did find out that there were some changes that were made in the trouble shooting that mislead me to believe that the number of records changes, it does not. @Patrick hit the nail that there are aborted operation on the DB when executing the package. It looks like it will get through about six iterations before timing out. Applying this logic in testing, I can predict with great accuracy how many records will be left. While that doesn't solve the problem, it is a great stride in verifying what exactly is going on. As I said, I can now take the various proposed solutions to the team for discussion (I really like the partitioned table idea...).
Thanks again to this great community for giving me some good insights!