Solved: SAS optimizations with MS T-SQL and robust systems/servers suggestions

jroe · Posted 12-30-2015 03:21 PM

Hello, I am primarily a T-SQL user while using SAS to compare data sets to T-SQL tables at times. Are there any other SAS to Microsoft SQL tricks/parallel processing/thread tips to be implemented? SAS seems to work just as good with 2gb of ram as it does with 64gb of ram along with 1 thread vs 8 +. These are larger data sets (2mil vs 11mil records or 10mil vs 10mil etc) for comparison purposes. SGIO flag helps with large vs large but not when its < 2gb vs 10gb then it slows down the query by 25%.

a. PC running sas: My system has 64gb of ram and 8 cores, I rarely see SAS use 20% cpu usage and it doesn't seem to improve performance to a large extent when allocating more memory to SAS.

b. server running sql: similar specs but 32 cores, rarely see more than 1-2 cores be used while large sas queries are communicating via procsql and libname to sql.

questions:

a. I have altered the sas9v.cfg file to various settings and performance is within 1-2 % regardless of settings. Rarely does sas go over 20% cpu usage and even if it does, the real time results are within a few seconds of each other.

b. It seems that if I give more ram to SAS it doesn't speed things up and in some cases slows it down (seems to take a while to cache the tables to ram which offsets any gains).

c. when proc sql talks to sql it doesn't appear to use more than 1 or 2 cores or much ram. So my guess is the slow down is pulling the data from sql single threaded.

changes in sas9v.cfg: (please note buffsize 32k, 64k, 128 have been used, buffno 10-500, mem/sum/sort/realmem sizes 2gb-32gb and the results all appear similar)

-MEMSIZE 16G
-realmemsize 8G
-SORTSIZE 4G
-sumsize 4G
-BUFSIZE = 64k
-UBUFSIZE = 64k
-IBUFSIZE = 32767
-UBUFNO = 20
-IBUFNO = 20
-BUFNO = 40
-CPUCOUNT = actual
-THREADS
-dbsliceparm=all

-sascmd="sas"
-ALIGNSASIOFILES

in sas top of code/script query:

OPTIONS sascmd="sas";
OPTIONS THREADS = yes;
OPTIONS CPUCOUNT = ACTUAL;
OPTIONS dbsliceparm = all;
options threads = yes;

options within the two libname libraries:

dbsliceparm=all;
options threads = yes;
OPTIONS THREADS CPUCOUNT=ACTUAL;

example code:

proc sql;

create table /*local.*/test123 as select

a,b,c,d.... from test1 t1 full outer join test2

on test1.a= test2.a test1.b=test2.b......

where test1.a is not null or test2.b is not null

SASKiwi · Posted 01-10-2016 02:56 PM

This reference is useful when understanding how READBUFF works with SQL Server ODBC:

http://support.sas.com/documentation/cdl/en/acreldb/68028/HTML/default/viewer.htm#p0u61lysy7qgzmn1bd...

I've found that leaving READBUFF default value for SQL Server of 0 works the best in most situations. Ultimately it is the bandwidth of the network connection between SQL Server and SAS that dictates overall performance.

I strongly suggest you explore SQL PASSTHRU as a means of improving performance as has already been discussed in this thread.

Here is an example of what you could try:

proc sql noprint;
  connect to odbc (noprompt = "server=tester;DRIVER=SQL Server Native Client 10.0;Trusted Connection=yes;DATABASE=test1;" );
  create table test as 
  select * from connection to odbc
  (SELECT *
   FROM [tester].[test1].[tablename]
  )
  ;
  disconnect from odbc;
quit;

View solution in original post

LinusH · Posted 12-30-2015 06:02 PM

Quick question: is test a SQL Server library?
If yes, a major part of the processing will take part there, not in SAS. Use
Options sastrace=',,,d' sastraceloc=saslog nostsuffix;
to find out what is executed where.
When it comes to SAS and performance:
- memory is utilised espacially when sorting is involved (which will include many join operations)
- threads: again sorting and aggregating. Also I/O (external databases and the SAS SPDE engine).
There are quite a few papers on the subject, search on support.Sas.com.

Data never sleeps

Ksharp · Posted 12-30-2015 08:00 PM

Yeah. SAS has similar function. Check PROC FEDSQL and PROC DS2 ;

jroe · Posted 12-30-2015 09:23 PM

I read about 40 articles this week ... and about 100 threads, trying to crash course myself a bit. Fedsql will be what I try next, thanks for the feedback. 🙂 Similar code in sql works, I may beed to rewrite it a bit if fedsql is literal since tsql and procsql are a little different. Once concern is sas treats '' and nulls as unknowns while sql... nulls are nulls and blanks '' are not nulls although similar. If a column has nulls or '' in sql... sas treats them similary. However sql will treat them differently unless isnull or replacement is performed. I am not sure if fedsql will process nulls and blanks '' (sas style or tsql). The work around is to make sure the tables are consistent with similar values (all nulls or '', not mixed or different for the same columns from two tables). Sas proc sql handles the mix similary on joins.

jroe · Posted 12-30-2015 09:26 PM

I will also add Options sastrace=',,,d' sastraceloc=saslog nostsuffix; to config file. I had removed it previously due to libname warnings of being over 260chars ... I had not tried it in the config file.

jakarman · Posted 12-31-2015 10:20 AM

It looks to me you are getting lost in technical topics but misssing the start with: why what who when where questions as "line".

Why ...There must be some code that is running and you to process that faster. (wall-clock)
Where is your data? local in sas-datasets (that is wiht all those sas options) or in a DBMS (sastrace)
What is your bottleneck. a/ the external RDBMS network traffic b/ Cpu (don't believe that one) c/ your way of coding (that one hurts) d/ IO speed of the data. e/ ?? You need to collect statistics logs on what is going one first.
When going into timing issues is good to know what other impact can be arround. Shared resources are everywhere.
who is processing what not only you as the intitator of the work but are theres some other services involved?

Your background is https://technet.microsoft.com/en-us/library/ms189826(v=sql.90).aspx ?
Than there is on big difference SQL DBMS are assumed not to be ordered. For OLTP only hitting a small portion of the data.
With SAS-datasets access is assumed to be sequential (ordred) hitting all data. You can access both of them in the othere way but that is less known.

---->-- ja karman --<-----

jroe · Posted 01-04-2016 10:16 AM

The bottleneck appears to be when using libname and procsql to compare to tables stored in t-sql (microsoft) dbms. When the queries reach out to SQL they are single threaded. You then see sas when working locally use 2 cores (although never goes over 20% cpu usage) and will utilize the ram settings in the config files. After turning on the logging, the issue appears to be due to using proc sql to compare two libnames that are odbc sql tables. I have not had success yet running "pass through procsql" or "fedsql", due to time and "still learning".

Often SAS will be used against other sas tables or temp tables or may communicate with sql directly. I wanted to leverage as many simple optimizations possible to be flexible for both types of data sources. Thanks for the feedback.

sas messages when comparing the two odbc sql tables:

"ERROR: This SQL statement will not be passed to the DBMS for processing because it involves a
join across librefs with different connection properties."

http://support.sas.com/kb/33/417.html

https://support.sas.com/documentation/cdl/en/acreldb/63647/HTML/default/viewer.htm#a002679005.htm

and this is most likely when I see locally on my pc cpu jumps when SAS sorts:

NOTE: SAS threaded sort was used.

fedsql generates this error when re-using the libnames and from the articles I read there were no related solutions that I could find as of last week:

"ERROR: TKTS initialization failed."

jroe · Posted 01-04-2016 01:18 PM

attempting to utiliz sql passthrough for proc sql and I continue to recieve the message "
ERROR: This SQL statement will not be passed to the DBMS for processing because it involves a
join across librefs with different connection properties."

These are two difference databases/schemas on the same server that are being joined (ODBC TO SQL)

jakarman · Posted 01-04-2016 03:30 PM

The message is exactly telling you what is happening. You have set up libnames to different part of the SQl database.

As a result all data is processed by downloading it to sas and do all logic there. What you wanted is executing in the database.
Did you check: https://support.sas.com/documentation/cdl/en/acreldb/63647/HTML/default/viewer.htm#a003228411.htm ?

---->-- ja karman --<-----

jroe · Posted 01-04-2016 04:15 PM

Thanks for the reply! I have tried the options command without success, I receive the same message.

Is the same error true when only comparing two seperate databases in the odbc or is it also true if I was comparing libname (sas) vs libname (odbc) or two sas libnames databases?

options dbidirectexec sastraceloc=saslog;

jakarman · Posted 01-05-2016 01:20 AM

When comparing do you use "proc compare" ir just programmed SQL statements?
Proc compare is very sofisticated but dedicated to sas. Needing to run at SAS datasets.
SQL statements are very generic but less sofisticated.

It is finding the balance what can be expoited in SQL (and by that in a RDBMS) and what is happening outside.

Missings.... o my: the true/false binary logic (SAS) and the 3 value logic (true/false/unknown-null) of a rdbms.
You cannot convert those two easilye when using a reversed logic (binary) as of ease of coding.
When using RDBMS 3 value logic specify the wanted records as and never use a double denial.

SAS claims the null to be translated to SAS as "". Yes that works fine in SAS datasets but not in real rdbms systems.

---->-- ja karman --<-----

jakarman · Posted 01-04-2016 04:31 PM

It can only work when all data is in the rdbms sql server and you could run if there.
In all other cases there is always a download to Sas.
As you read those notes there are many small reasons why data is downloaded to Sas.

---->-- ja karman --<-----

jroe · Posted 01-04-2016 05:30 PM

It helped to find some goof errors by running the query in sql. I had double called the table like:

select mydb.table.name from mydb.table...

fixed it to: select name as name_one from mydbtable

I then added: options dbidirectexec;

Afterwards I modified the query to use the passthrough connection logic. I believe it is working since I saw sql use multiple cores and my local task manager use less ram/cores (allowing sql to process).

I'm stuck with how sql and sas differ on handling nulls. In t-sql a null = null and a '' = blank (not a null). In sas a '' blank can represent nulls. So when comparing "a column name from table A vs column name from table B"...:

a. if using sas and not sql pass through: sas handles nulls and '' no problem

b. when you run the same queries in sql, if a.name = null and b.name= '', then they are not the same (a full outerjoin would display both results).

So this is why sas is handy (if one programm keeps nulls as nulls and another casts nulls as blanks...) sas can handle it when doing a comparison. Sql you need to cast them all to be nulls or blanks (the same).

I now need to figure out how to have SAS handle the nulls/blanks but pass the processing power to sql.

http://analytics.ncsu.edu/sesug/2001/P-109.pdf

I appologize if this doesn't make sense, it is difficult to describe in writing.

jroe · Posted 01-04-2016 05:58 PM

I am going to guess the gains in the select directly from sql and make the joins there are neglible in this case because it takes "40 seconds" to create one of the tables in sas on its own, and another 40 seconds to do the other (80 sec total).... while the time it takes to process evertyhing using the current code in SAS is only 2 minutes. From what I have read is that when it tries to join the two tables from different libnames, it then backs out and builds the tables in SAS anyway.

It would be neat to see how I could take two "work/temp" tables from SAS and then post it to sql with passthrough rather than SAS. I'll have to learn some more.

update: I also hoped to use DIRECT_SQL=NO; and other various direct_Sql options so that the queries would be multithreaded in sql but sas would perform the joins which does not seem to be accurate (still slow)

jroe · Posted 01-04-2016 07:27 PM

I am giving up for now but if anyone has ideas on how proc sql can kick off multithreading in sql with just using libname please pass it on. It almost seems like whatever SAS does tells MS SQL to "not use multithreading".

SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions

Re: SAS optimizations with MS T-SQL and robust systems/servers suggestions