SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Proc SQL vs Pass-through SQL

Reply
Occasional Contributor
Posts: 6

Proc SQL vs Pass-through SQL

 

I was hoping to find this somewhere in the community, but after spending some time searching, I haven't found a similar question.

 

I am looking for some feedback from the community about the pro / cons of using proc SQL vs pass-through SQL to delete a large number of observations ( minimum 15 million ) from a very large SQL table. We've noticed that if we us pass-through with ODBC not all records are deleted, but I am concerned that using proc SQL will blow up the SQL log or present other, new and different, issues. We've worked around this by having large deletes executed though SQL Mgmt Studio (smaller number of records work just fine); however, it is far more desirable to have the entire process executed within the overnight processing by SAS. We have current versions of SAS and SQL and the table is indexed. I am wondering if it is just to much for the ODBC and we really needs to just take the plunge for the SQL access engine, or if there are tips / tricks that might help.

 

Any feedback / guidance is appreciated!

Super User
Super User
Posts: 9,397

Re: Proc SQL vs Pass-through SQL

Posted in reply to CharynFaenza

Bit confused by your terminology.  Pass through is a mechanism for passing SQL to the database's SQL parser for execution.  Unless you are transferring the data to SAS, then you would be using pass-through - and with that much data I would not expect you to transfer it to SAS just for some admin tasks.  What database is it, can you not use the interface with the database directly, most big ones will have an SQL window where you can execute commands, this will be much faster than any other process

Occasional Contributor
Posts: 6

Re: Proc SQL vs Pass-through SQL

I am using the term pass-through SQL as you stated. We are submitting a SQL statement to the database to execute through an ODBC connection in a SAS macro. The problem is that the delete does not complete. The exact same code submitted directly in SQL performs as expected. For example: recently we attempted to delete 17M records using a pass-though SQL statement and it left behind > 3M records. The number of records deleted and the number of records remaining vary.

The reason we want to do this is SAS is that it is part of a larger process that we would like to automate. Essentially, if we want to reprocess a subset of data for a particular month, it looks at the target SQL data set and, if the month exists, it deletes all records, and then replaces them with the reprocessed data. This is much faster than an update query due to the nature of the changes when data is reprocessed.

Yes, we can do this right in SQL, but I would much prefer to just run the sas code in batch than jump between applications.
PROC Star
Posts: 1,288

Re: Proc SQL vs Pass-through SQL

Posted in reply to CharynFaenza

I believe that in many SQL dialects there's a statement that will remove all of the data in a table by simply resetting metadata indicators (in Oracle it's TRUNCATE TABLE). Doing this using pass-though would seem to be your best option, based on what you're describing. It should run extremely quickly, as it doesn't process any data.

Occasional Contributor
Posts: 6

Re: Proc SQL vs Pass-through SQL

@Tom - thanks for the tip. I'll see what I can dig up on the truncate function in SQL!
Super User
Posts: 5,849

Re: Proc SQL vs Pass-through SQL

Posted in reply to CharynFaenza
If the DELETE doesn't complete it's an issue that you want to address.
Again terminology: pass through means that the operation takes place in the data base. For syntax perspective you can either use SAS SQL tjat acta om a libname (implicit) or use EXECUTE (explicit).
If you are using implicit you can use options to feed back RDBMS logging to SAS.
Either way try to enable logging in the target.
To me it might be some kind of time out or other system level constraint issue?
Data never sleeps
Occasional Contributor
Posts: 6

Re: Proc SQL vs Pass-through SQL

Yes, currently the operation takes place in the database. Here is a generic form of the code:

proc sql;
connect to odbc as MyDb ( DSN="MyDSN" AuthDomain="MyAuthDomain" );
execute
(
execute dbo.uspMyDeleteProcess
@tablename = '[dbo].[MyTable]',
@datecolumn = 'MyDate',
@increment = 2000000,
@date = &MyDateVar.;
)
by MyDb;
disconnect from MyDb;
quit;

Right now - the time-out theory is our leading theory as the code works correctly in SQL. I am hoping there is some hints / tips for us to either do this a different way or to prevent the timeout.
Frequent Contributor
Posts: 130

Re: Proc SQL vs Pass-through SQL

Posted in reply to CharynFaenza

Deleting 15m+ rows in SQLServer shouldn't be a big deal.

 

There's loads of ways to attack this but I would start of by looking at your indexes, they can slow things down a lot in some circumstances. Sometimes it can be better to drop indexes then recreate after the operation. 

 

Have you run the estimated execution plan in sql management studio when doing the delete  there? might give some tips

 

I'd also consider looking at using transactions so things roll back rather than end up with a random amount of remaining data

Respected Advisor
Posts: 4,665

Re: Proc SQL vs Pass-through SQL

[ Edited ]
Posted in reply to CharynFaenza

@CharynFaenza

It sounds very much like a timeout issue. SAS sends a command to the DB and then waits for the response which it eventually only gets once the DB process has finished. Besides of timeout settings it's sometimes also security software/virus scanners which close connections that appear to be inactive and it can take quite a bit of investigation and tweaking to resolve such issues.

 

I would expect that an incomplete/aborted operation on the DB issue a roll-back and you should either end-up with no rows deleted or all rows deleted. If this is not the case then I guess there are commits after a certain number of rows deleted. Make sure DBCOMMIT is set to zero.

https://support.sas.com/documentation/cdl/en/acreldb/65247/HTML/default/viewer.htm#p00lgy3xwh61b8n16...

 

SAS will always try and send the SQL directly to the database for execution. With explicit pass-through that happens always, with implicit pass-through (SQL statements defined in SAS SQL syntax) SAS will still try to convert the SQL to the database flavor and send as much as possible to the DB for execution. You can see in the SAS log what SAS is trying to do and what code gets sent by using options: 

options sastrace=',,,d' sastraceloc=saslog nostsuffix;

 

For the time it takes to actually delete the data in SQL Server:

If that's a regular process and you can change the table in SQL Server then I'd go for a monthly partitioned table as this would then allow you to simply truncate the data in a specific partition. 

https://docs.microsoft.com/en-us/sql/t-sql/statements/truncate-table-transact-sql

 

I personally would also implement using explicit pass-through SQL as this allows me to develop and unit test the code directly on the  database and then use this code 1:1 within my SAS program with full control of what's going on and with all database specific commands available to me (i.e. truncation of a specific partition).

 

Occasional Contributor
Posts: 6

Re: Proc SQL vs Pass-through SQL

Posted in reply to CharynFaenza

Thanks to everyone for all of the great feedback.  I am obviously not a SQL gal and your explanations not only helped me trouble-shoot the problem, they will help me when I communicate the varying roles of SAS v SQL in this process with the DBA. 

 

@itchyeyeballs yes, we have run the execution plan several times directly in SQL and it works like a charm; however, during the course of my follow tests, I did find out that there were some changes that were made in the trouble shooting that mislead me to believe that the number of records changes, it does not.  @Patrick hit the nail that there are aborted operation on the DB when executing the package.  It looks like it will get through about six iterations before timing out.  Applying this logic in testing, I can predict with great accuracy how many records will be left.  While that doesn't solve the problem, it is a great stride in verifying what exactly is going on.  As I said, I can now take the various proposed solutions to the team for discussion (I really like the partitioned table idea...).

 

Thanks again to this great community for giving me some good insights!

 

--Charyn

 

 

 

Ask a Question
Discussion stats
  • 9 replies
  • 873 views
  • 0 likes
  • 6 in conversation