Hello, I am primarily a T-SQL user while using SAS to compare data sets to T-SQL tables at times. Are there any other SAS to Microsoft SQL tricks/parallel processing/thread tips to be implemented? SAS seems to work just as good with 2gb of ram as it does with 64gb of ram along with 1 thread vs 8 +. These are larger data sets (2mil vs 11mil records or 10mil vs 10mil etc) for comparison purposes. SGIO flag helps with large vs large but not when its < 2gb vs 10gb then it slows down the query by 25%.
a. PC running sas: My system has 64gb of ram and 8 cores, I rarely see SAS use 20% cpu usage and it doesn't seem to improve performance to a large extent when allocating more memory to SAS.
b. server running sql: similar specs but 32 cores, rarely see more than 1-2 cores be used while large sas queries are communicating via procsql and libname to sql.
questions:
a. I have altered the sas9v.cfg file to various settings and performance is within 1-2 % regardless of settings. Rarely does sas go over 20% cpu usage and even if it does, the real time results are within a few seconds of each other.
b. It seems that if I give more ram to SAS it doesn't speed things up and in some cases slows it down (seems to take a while to cache the tables to ram which offsets any gains).
c. when proc sql talks to sql it doesn't appear to use more than 1 or 2 cores or much ram. So my guess is the slow down is pulling the data from sql single threaded.
changes in sas9v.cfg: (please note buffsize 32k, 64k, 128 have been used, buffno 10-500, mem/sum/sort/realmem sizes 2gb-32gb and the results all appear similar)
-MEMSIZE 16G
-realmemsize 8G
-SORTSIZE 4G
-sumsize 4G
-BUFSIZE = 64k
-UBUFSIZE = 64k
-IBUFSIZE = 32767
-UBUFNO = 20
-IBUFNO = 20
-BUFNO = 40
-CPUCOUNT = actual
-THREADS
-dbsliceparm=all
-sascmd="sas"
-ALIGNSASIOFILES
in sas top of code/script query:
OPTIONS sascmd="sas";
OPTIONS THREADS = yes;
OPTIONS CPUCOUNT = ACTUAL;
OPTIONS dbsliceparm = all;
options threads = yes;
options within the two libname libraries:
dbsliceparm=all;
options threads = yes;
OPTIONS THREADS CPUCOUNT=ACTUAL;
example code:
proc sql;
create table /*local.*/test123 as select
a,b,c,d.... from test1 t1 full outer join test2
on test1.a= test2.a test1.b=test2.b......
where test1.a is not null or test2.b is not null
This reference is useful when understanding how READBUFF works with SQL Server ODBC:
I've found that leaving READBUFF default value for SQL Server of 0 works the best in most situations. Ultimately it is the bandwidth of the network connection between SQL Server and SAS that dictates overall performance.
I strongly suggest you explore SQL PASSTHRU as a means of improving performance as has already been discussed in this thread.
Here is an example of what you could try:
proc sql noprint;
connect to odbc (noprompt = "server=tester;DRIVER=SQL Server Native Client 10.0;Trusted Connection=yes;DATABASE=test1;" );
create table test as
select * from connection to odbc
(SELECT *
FROM [tester].[test1].[tablename]
)
;
disconnect from odbc;
quit;
It looks to me you are getting lost in technical topics but misssing the start with: why what who when where questions as "line".
Your background is https://technet.microsoft.com/en-us/library/ms189826(v=sql.90).aspx ?
Than there is on big difference SQL DBMS are assumed not to be ordered. For OLTP only hitting a small portion of the data.
With SAS-datasets access is assumed to be sequential (ordred) hitting all data. You can access both of them in the othere way but that is less known.
The bottleneck appears to be when using libname and procsql to compare to tables stored in t-sql (microsoft) dbms. When the queries reach out to SQL they are single threaded. You then see sas when working locally use 2 cores (although never goes over 20% cpu usage) and will utilize the ram settings in the config files. After turning on the logging, the issue appears to be due to using proc sql to compare two libnames that are odbc sql tables. I have not had success yet running "pass through procsql" or "fedsql", due to time and "still learning".
Often SAS will be used against other sas tables or temp tables or may communicate with sql directly. I wanted to leverage as many simple optimizations possible to be flexible for both types of data sources. Thanks for the feedback.
sas messages when comparing the two odbc sql tables:
"ERROR: This SQL statement will not be passed to the DBMS for processing because it involves a
join across librefs with different connection properties."
http://support.sas.com/kb/33/417.html
https://support.sas.com/documentation/cdl/en/acreldb/63647/HTML/default/viewer.htm#a002679005.htm
and this is most likely when I see locally on my pc cpu jumps when SAS sorts:
NOTE: SAS threaded sort was used.
fedsql generates this error when re-using the libnames and from the articles I read there were no related solutions that I could find as of last week:
"ERROR: TKTS initialization failed."
attempting to utiliz sql passthrough for proc sql and I continue to recieve the message "
ERROR: This SQL statement will not be passed to the DBMS for processing because it involves a
join across librefs with different connection properties."
These are two difference databases/schemas on the same server that are being joined (ODBC TO SQL)
The message is exactly telling you what is happening. You have set up libnames to different part of the SQl database.
As a result all data is processed by downloading it to sas and do all logic there. What you wanted is executing in the database.
Did you check: https://support.sas.com/documentation/cdl/en/acreldb/63647/HTML/default/viewer.htm#a003228411.htm ?
Thanks for the reply! I have tried the options command without success, I receive the same message.
Is the same error true when only comparing two seperate databases in the odbc or is it also true if I was comparing libname (sas) vs libname (odbc) or two sas libnames databases?
options dbidirectexec sastraceloc=saslog;
When comparing do you use "proc compare" ir just programmed SQL statements?
Proc compare is very sofisticated but dedicated to sas. Needing to run at SAS datasets.
SQL statements are very generic but less sofisticated.
It is finding the balance what can be expoited in SQL (and by that in a RDBMS) and what is happening outside.
Missings.... o my: the true/false binary logic (SAS) and the 3 value logic (true/false/unknown-null) of a rdbms.
You cannot convert those two easilye when using a reversed logic (binary) as of ease of coding.
When using RDBMS 3 value logic specify the wanted records as and never use a double denial.
SAS claims the null to be translated to SAS as "". Yes that works fine in SAS datasets but not in real rdbms systems.
It helped to find some goof errors by running the query in sql. I had double called the table like:
select mydb.table.name from mydb.table...
fixed it to: select name as name_one from mydbtable
I then added: options dbidirectexec;
Afterwards I modified the query to use the passthrough connection logic. I believe it is working since I saw sql use multiple cores and my local task manager use less ram/cores (allowing sql to process).
I'm stuck with how sql and sas differ on handling nulls. In t-sql a null = null and a '' = blank (not a null). In sas a '' blank can represent nulls. So when comparing "a column name from table A vs column name from table B"...:
a. if using sas and not sql pass through: sas handles nulls and '' no problem
b. when you run the same queries in sql, if a.name = null and b.name= '', then they are not the same (a full outerjoin would display both results).
So this is why sas is handy (if one programm keeps nulls as nulls and another casts nulls as blanks...) sas can handle it when doing a comparison. Sql you need to cast them all to be nulls or blanks (the same).
I now need to figure out how to have SAS handle the nulls/blanks but pass the processing power to sql.
http://analytics.ncsu.edu/sesug/2001/P-109.pdf
I appologize if this doesn't make sense, it is difficult to describe in writing.
I am going to guess the gains in the select directly from sql and make the joins there are neglible in this case because it takes "40 seconds" to create one of the tables in sas on its own, and another 40 seconds to do the other (80 sec total).... while the time it takes to process evertyhing using the current code in SAS is only 2 minutes. From what I have read is that when it tries to join the two tables from different libnames, it then backs out and builds the tables in SAS anyway.
It would be neat to see how I could take two "work/temp" tables from SAS and then post it to sql with passthrough rather than SAS. I'll have to learn some more.
update: I also hoped to use DIRECT_SQL=NO; and other various direct_Sql options so that the queries would be multithreaded in sql but sas would perform the joins which does not seem to be accurate (still slow)
I am giving up for now but if anyone has ideas on how proc sql can kick off multithreading in sql with just using libname please pass it on. It almost seems like whatever SAS does tells MS SQL to "not use multithreading".
The SAS Users Group for Administrators (SUGA) is open to all SAS administrators and architects who install, update, manage or maintain a SAS deployment.
SAS technical trainer Erin Winters shows you how to explore assets, create new data discovery agents, schedule data discovery agents, and much more.
Find more tutorials on the SAS Users YouTube channel.