<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: SAS Sql subquery performance problem in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41891#M10869</link>
    <description>Ok, case of right tool for the job. &lt;BR /&gt;
Thanks for your time and assistance Patrick.</description>
    <pubDate>Thu, 28 Aug 2008 21:41:32 GMT</pubDate>
    <dc:creator>deleted_user</dc:creator>
    <dc:date>2008-08-28T21:41:32Z</dc:date>
    <item>
      <title>SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41877#M10855</link>
      <description>Hi, I am having problemms with the use of subqueries in Proc Sql using EG.&lt;BR /&gt;
The data I want is in table b, the key for table B is in table A. Table B is a large table (20+ mill rows) The key IS indexed&lt;BR /&gt;
The query to table A to get the right key takes only seconds.&lt;BR /&gt;
If I query table B directly using the key I get a return in seconds, If I use a subquery (A) to get the key however the return takes an unnaceptable amount of time (20+ minutes). Using in-line or rewriting as a join does not improve performance.&lt;BR /&gt;
The data returned by both queries is only a handful of rows. The same type of queries in SQLServer return in the expected several seconds.&lt;BR /&gt;
&lt;BR /&gt;
Any idea what the issue with Proc Sql is?&lt;BR /&gt;
&lt;BR /&gt;
QUERIES&lt;BR /&gt;
&lt;BR /&gt;
/*Query to Table A */&lt;BR /&gt;
&lt;BR /&gt;
PROC SQL;&lt;BR /&gt;
CREATE TABLE IPJOIN AS&lt;BR /&gt;
SELECT acct_no&lt;BR /&gt;
, client_key&lt;BR /&gt;
, IP_Address&lt;BR /&gt;
FROM tableA &lt;BR /&gt;
WHERE acct_no = &amp;amp;acctno&lt;BR /&gt;
;&lt;BR /&gt;
&lt;BR /&gt;
/*Query to Table B */&lt;BR /&gt;
&lt;BR /&gt;
PROC SQL  _METHOD;&lt;BR /&gt;
&lt;BR /&gt;
CREATE TABLE IPBET AS &lt;BR /&gt;
SELECT client_key&lt;BR /&gt;
, ip_addr&lt;BR /&gt;
FROM tableB&lt;BR /&gt;
WHERE client_key IN (SELECT DISTINCT client_key&lt;BR /&gt;
		FROM IPJOIN)&lt;BR /&gt;
&lt;BR /&gt;
;</description>
      <pubDate>Sun, 24 Aug 2008 23:37:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41877#M10855</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2008-08-24T23:37:17Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41878#M10856</link>
      <description>The log for _METHOD for the query above is below if that sheds any light.&lt;BR /&gt;
&lt;BR /&gt;
NOTE: SQL execution methods chosen are:&lt;BR /&gt;
&lt;BR /&gt;
      sqxcrta&lt;BR /&gt;
          sqxuniq&lt;BR /&gt;
              sqxsrc( OB_SAND.OB_FACT_BETSELECTION )&lt;BR /&gt;
&lt;BR /&gt;
NOTE: SQL subquery execution methods chosen are:&lt;BR /&gt;
&lt;BR /&gt;
          sqxsubq&lt;BR /&gt;
              sqxuniq&lt;BR /&gt;
                  sqxsrc( WORK.IPJOIN )&lt;BR /&gt;
NOTE: Compressing data set WORK.IPBET increased size by 100.00 percent. &lt;BR /&gt;
      Compressed is 2 pages; un-compressed would require 1 pages.&lt;BR /&gt;
NOTE: Table WORK.IPBET created, with 6 rows and 2 columns.&lt;BR /&gt;
&lt;BR /&gt;
24         &lt;BR /&gt;
25         %LET _CLIENTTASKLABEL=;&lt;BR /&gt;
26         %LET _EGTASKLABEL=;&lt;BR /&gt;
2                                        The SAS System            08:54 Monday, August 25, 2008&lt;BR /&gt;
&lt;BR /&gt;
27         %LET _CLIENTPROJECTNAME=;&lt;BR /&gt;
28         %LET _SASPROGRAMFILE=;&lt;BR /&gt;
29         &lt;BR /&gt;
30         ;*';*";*/;quit;&lt;BR /&gt;
NOTE: PROCEDURE SQL used (Total process time):&lt;BR /&gt;
      real time           17:28.25&lt;BR /&gt;
      cpu time            2:12.49</description>
      <pubDate>Sun, 24 Aug 2008 23:54:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41878#M10856</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2008-08-24T23:54:03Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41879#M10857</link>
      <description>Can't you do a Select distinct on tableA, the do a ordinary inner join? Then the SQL optimizer may use an indexed join strategy, even a hash join if the result set from tableA is small enough.&lt;BR /&gt;
&lt;BR /&gt;
Regards,&lt;BR /&gt;
Linus</description>
      <pubDate>Mon, 25 Aug 2008 09:20:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41879#M10857</guid>
      <dc:creator>LinusH</dc:creator>
      <dc:date>2008-08-25T09:20:13Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41880#M10858</link>
      <description>If I'm reading the two queries correctly, it looks to me like query A is looking for a single account numbers while query B is looking for all account numbers.  In query A, is the macro variable &amp;amp;acctno a single number (I think it has to be)?  It doesn't look like query B has any filters that would result in a single account number being returned from the subquery.&lt;BR /&gt;
&lt;BR /&gt;
Depending on the number of accounts, this could be the difference between returning 2,000 rows and 20,000,000 rows.&lt;BR /&gt;
&lt;BR /&gt;
If this isn't the issue, I've always found it useful to try the same query in another application.  This can sometimes narrow down whether the difference is due to the query or due to the application.  I've run into situations where queries run much quicker in another application than in SAS (and vice versa).</description>
      <pubDate>Mon, 25 Aug 2008 20:30:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41880#M10858</guid>
      <dc:creator>1162</dc:creator>
      <dc:date>2008-08-25T20:30:00Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41881#M10859</link>
      <description>Yes the result of query A is a single client_key which I am then using to gather multiple rows in query B (multiple instances of ip_addr), usually a few hundred.&lt;BR /&gt;
As I posted I am used to running this in SQLServer and it runs in seconds.&lt;BR /&gt;
Running queryB with the actual key value in the WHERE clause rather than the subquery returns in seconds. It would seem therefore that the problem is the use of the subquery. &lt;BR /&gt;
What I am trying to find out is why, and if there is a solution. Being forced to use joins rather than subqueries, esp for such simple queries seems like a very crippled sql implimentation (if  that is the cause).&lt;BR /&gt;
I am new to SAS and Proc Sql and find that the performance of subqueries seems poor in general.</description>
      <pubDate>Mon, 25 Aug 2008 21:08:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41881#M10859</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2008-08-25T21:08:41Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41882#M10860</link>
      <description>I could, but the query show is the smallest part of what is eventually going to be a much more complicated query. The use of subqueries is one of the best features of sql, what I am trying to understand is the reason for the performance problem, I haven't come across it in other sql implementations. Is it an inherent feature of SAS sql that subqueries perform poorly? My own Db suggest so, but they are SAS coders and have little use for sql.</description>
      <pubDate>Mon, 25 Aug 2008 21:14:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41882#M10860</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2008-08-25T21:14:57Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41883#M10861</link>
      <description>Thanks for the clarification.  I think I understand the issue better.&lt;BR /&gt;
&lt;BR /&gt;
I think your problem is similar to one I ran into and has to do with whether the query is actually being processed on the database server or on your SAS machine.  &lt;BR /&gt;
&lt;BR /&gt;
When you run the query with the actual key value, it runs quickly because the entire query is passed to the database where it is processed and only the relevant records are passed back to SAS which then puts the results into a table.&lt;BR /&gt;
&lt;BR /&gt;
When you run the query with a subquery that uses an existing SAS dataset, that query can't be entirely passed to the database because of the reference to the SAS dataset.  In this case, the database sends back the entire table B (all 20 million rows) and SAS does the processing of subsetting this table with the SAS dataset created in query A.  One way you can see this is by watching the temporary file that is being created while the query is running.  You'll probably see it grow to some very large size, and then near the end it will be replaced with a smaller file representing the merge of table B with IPJOIN.&lt;BR /&gt;
&lt;BR /&gt;
Subqueries work efficiently when they can be passed entirely to the database for processing.  Whenever possible, I don't recommend using SAS datasets in subqueries to databases unless the database tables are small.  On the other hand, I've found that using macro variables in the query doesn't hurt performance.&lt;BR /&gt;
&lt;BR /&gt;
Can you try this query?  It uses the query you used to build IPJOIN as the subquery for IPBET, but doesn't use any SAS datasets.  The side benefit is that your programming will be more efficient: one less query to the database and one less SAS dataset created.  I have a feeling this will run faster.  Let us know how it works.&lt;BR /&gt;
&lt;BR /&gt;
[pre]&lt;BR /&gt;
PROC SQL;&lt;BR /&gt;
  CREATE TABLE IPBET AS &lt;BR /&gt;
  SELECT client_key, ip_addr&lt;BR /&gt;
  FROM tableB&lt;BR /&gt;
  WHERE client_key IN (SELECT DISTINCT client_key&lt;BR /&gt;
    FROM tableA &lt;BR /&gt;
    WHERE acct_no = &amp;amp;acctno);&lt;BR /&gt;
QUIT;&lt;BR /&gt;
[/pre]</description>
      <pubDate>Wed, 27 Aug 2008 16:51:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41883#M10861</guid>
      <dc:creator>1162</dc:creator>
      <dc:date>2008-08-27T16:51:32Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41884#M10862</link>
      <description>Consider using the SASTRACE= System Option to better see what's going on:&lt;BR /&gt;
&lt;A href="http://support.sas.com/onlinedoc/913/getDoc/en/acreldb.hlp/a000433982.htm" target="_blank"&gt;http://support.sas.com/onlinedoc/913/getDoc/en/acreldb.hlp/a000433982.htm&lt;/A&gt;&lt;BR /&gt;
&lt;BR /&gt;
There is also a whole section in the SAS Online Doc about optimizing SQL queries:&lt;BR /&gt;
&lt;A href="http://support.sas.com/onlinedoc/913/getDoc/en/acreldb.hlp/a002254657.htm" target="_blank"&gt;http://support.sas.com/onlinedoc/913/getDoc/en/acreldb.hlp/a002254657.htm&lt;/A&gt;&lt;BR /&gt;
&lt;BR /&gt;
And if it's "only" a table lookup then consider using the hash object:&lt;BR /&gt;
&lt;A href="http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a002585310.htm" target="_blank"&gt;http://support.sas.com/onlinedoc/913/getDoc/en/lrcon.hlp/a002585310.htm&lt;/A&gt;&lt;BR /&gt;
&lt;BR /&gt;
HTH&lt;BR /&gt;
Patrick</description>
      <pubDate>Wed, 27 Aug 2008 23:47:46 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41884#M10862</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2008-08-27T23:47:46Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41885#M10863</link>
      <description>No joy unfortunately execution time was the same, using the query below&lt;BR /&gt;
Is using the pass through ability the answer?&lt;BR /&gt;
&lt;BR /&gt;
   PROC SQL  _METHOD;&lt;BR /&gt;
&lt;BR /&gt;
   CREATE TABLE IPBET AS &lt;BR /&gt;
   SELECT DISTINCT client_key&lt;BR /&gt;
   ,ip_addr&lt;BR /&gt;
&lt;BR /&gt;
   FROM tableB&lt;BR /&gt;
   WHERE client_key IN (SELECT DISTINCT client_key&lt;BR /&gt;
		     FROM tableA&lt;BR /&gt;
		     WHERE acct_no = &amp;amp;acctno) &lt;BR /&gt;
;&lt;BR /&gt;
QUIT;&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
The log produced was&lt;BR /&gt;
&lt;BR /&gt;
15         PROC SQL  _METHOD;&lt;BR /&gt;
16         &lt;BR /&gt;
17         CREATE TABLE IPBET AS&lt;BR /&gt;
18         SELECT DISTINCT b.client_key&lt;BR /&gt;
19         , b.ip_addr&lt;BR /&gt;
20         &lt;BR /&gt;
21         FROm ob_sand.ob_fact_betselection b&lt;BR /&gt;
22         WHERE b.CLIENT_KEY IN (SELECT client_key&lt;BR /&gt;
23         						FROM ob_sand.ob_dim_client_current&lt;BR /&gt;
24         						WHERE acct_no = &amp;amp;acctno)&lt;BR /&gt;
25         &lt;BR /&gt;
26         &lt;BR /&gt;
27         ;&lt;BR /&gt;
&lt;BR /&gt;
NOTE: SQL execution methods chosen are:&lt;BR /&gt;
&lt;BR /&gt;
      sqxcrta&lt;BR /&gt;
          sqxuniq&lt;BR /&gt;
              sqxsrc( OB_SAND.OB_FACT_BETSELECTION(alias = B) )&lt;BR /&gt;
&lt;BR /&gt;
NOTE: SQL subquery execution methods chosen are:&lt;BR /&gt;
&lt;BR /&gt;
          sqxsubq&lt;BR /&gt;
              sqxsrc( OB_SAND.OB_DIM_CLIENT_CURRENT )&lt;BR /&gt;
NOTE: Compressing data set WORK.IPBET increased size by 100.00 percent. &lt;BR /&gt;
      Compressed is 2 pages; un-compressed would require 1 pages.&lt;BR /&gt;
2                                        The SAS System          09:46 Thursday, August 28, 2008&lt;BR /&gt;
&lt;BR /&gt;
NOTE: Table WORK.IPBET created, with 2 rows and 2 columns.&lt;BR /&gt;
&lt;BR /&gt;
28         QUIT;&lt;BR /&gt;
NOTE: PROCEDURE SQL used (Total process time):&lt;BR /&gt;
      real time           18:52.92&lt;BR /&gt;
      cpu time            2:18.12&lt;BR /&gt;
      &lt;BR /&gt;
&lt;BR /&gt;
29         &lt;BR /&gt;
30         %LET _CLIENTTASKLABEL=;&lt;BR /&gt;
31         %LET _EGTASKLABEL=;&lt;BR /&gt;
32         %LET _CLIENTPROJECTNAME=;&lt;BR /&gt;
33         %LET _SASPROGRAMFILE=;&lt;BR /&gt;
34         &lt;BR /&gt;
35         ;*';*";*/;quit;run;&lt;BR /&gt;
36         ODS _ALL_ CLOSE;&lt;BR /&gt;
37         &lt;BR /&gt;
38         &lt;BR /&gt;
39         QUIT; RUN;</description>
      <pubDate>Wed, 27 Aug 2008 23:51:14 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41885#M10863</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2008-08-27T23:51:14Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41886#M10864</link>
      <description>Both tables are SAS tables. Do I get that right?&lt;BR /&gt;
How many observations are in table A?&lt;BR /&gt;
&lt;BR /&gt;
Have a look at the example code below. This runs (even not indexed) in reasonable time.&lt;BR /&gt;
What's the difference to your real code?&lt;BR /&gt;
&lt;BR /&gt;
%let acctno=1;&lt;BR /&gt;
data tableA;&lt;BR /&gt;
  do client_key=1 to 1000;&lt;BR /&gt;
    acct_no=int(round(ranuni(0)));&lt;BR /&gt;
    output;&lt;BR /&gt;
  end;&lt;BR /&gt;
run;&lt;BR /&gt;
data tableB;&lt;BR /&gt;
  do ip_addr=1 to 3*10**6;  /* -&amp;gt; 3M obs */&lt;BR /&gt;
    client_key=ceil(ranuni(0)*10000);&lt;BR /&gt;
    output;&lt;BR /&gt;
  end;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
PROC SQL _METHOD;&lt;BR /&gt;
  CREATE TABLE IPBET AS &lt;BR /&gt;
    SELECT DISTINCT client_key, ip_addr&lt;BR /&gt;
      FROM tableB&lt;BR /&gt;
      WHERE client_key IN &lt;BR /&gt;
        (SELECT DISTINCT client_key&lt;BR /&gt;
          FROM tableA&lt;BR /&gt;
          WHERE acct_no = &amp;amp;acctno) &lt;BR /&gt;
    ;&lt;BR /&gt;
QUIT;

Message was edited by: Patrick</description>
      <pubDate>Thu, 28 Aug 2008 00:21:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41886#M10864</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2008-08-28T00:21:09Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41887#M10865</link>
      <description>Sorry Patrick, yes both are existing SAS tables (learn somethig new everyday)&lt;BR /&gt;
I don't know base sas, just sql.</description>
      <pubDate>Thu, 28 Aug 2008 03:53:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41887#M10865</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2008-08-28T03:53:37Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41888#M10866</link>
      <description>The example code I've posted does nothing else than creating two SAS tables tableA and tableB. TableA has 1000 records and 2 variables (acct_no and client_key), tableB 3million (with variables client_key and ip_addr).&lt;BR /&gt;
&lt;BR /&gt;
After that I'm doing nothing else than applying the SQL code you've provided - and everything runs in reasonable time on my laptop (Windows XP, SAS9.1.3).&lt;BR /&gt;
&lt;BR /&gt;
The SQL creates a table with around 150000 rows and needs around 3.5 seconds CPU (most of it will be I/O writing the rows).  Even by increasing tableA to 1M rows my job takes less than 30 seconds CPU&lt;BR /&gt;
&lt;BR /&gt;
You're talking about 20 minutes. Looking at the log it's:&lt;BR /&gt;
real time 18:52.92&lt;BR /&gt;
cpu time 2:18.12&lt;BR /&gt;
&lt;BR /&gt;
This huge gap between cpu time (2.18 min CPU still seems too long) and real time looks as if this SAS server is quite busy (SAS EG is only a client, the code is processed on the server). The long CPU time could also be due to I/O. How big is the result table - and how fast is the disc (most certainly the server disc) with you SAS work space?&lt;BR /&gt;
&lt;BR /&gt;
As you see: I assume it's less a SAS SQL issue than an infrastructure/installation issue. &lt;BR /&gt;
&lt;BR /&gt;
What you could do is to run the code I've posted and check how much time it takes. If your close to your real job then you know that it's about the SAS Server and you should contact your SAS admin, if the CPU time is close to what I get here on my laptop then I would look where the datasets you're accessing are stored (may be on another server with a slow connection to the SAS server).&lt;BR /&gt;
If both cases above don't provide an explanation then you should check what's different between the test code and the real code. May be one of your tables is a bit bigger than you think?&lt;BR /&gt;
&lt;BR /&gt;
HTH&lt;BR /&gt;
Patrick&lt;BR /&gt;
&lt;BR /&gt;
P.S: Pass-through SQL is not a solution as this is used to pass SQL code from SAS directly to a DBMS (eg. ORACLE) and have it executed there.</description>
      <pubDate>Thu, 28 Aug 2008 05:28:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41888#M10866</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2008-08-28T05:28:43Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41889#M10867</link>
      <description>Thanks for all the replies so far guys.&lt;BR /&gt;
The SAS server is an old box (single cpu) and we are long over due for a hardware upgrade, not sure of the disk speed.&lt;BR /&gt;
&lt;BR /&gt;
Table A is 441,267 obs 115 variables&lt;BR /&gt;
Table B is 22,983,750 obs 57 variables.&lt;BR /&gt;
&lt;BR /&gt;
My typical return from the query is 100 rows or so.&lt;BR /&gt;
&lt;BR /&gt;
Your code with a 146,000 row return ran in the times below:&lt;BR /&gt;
&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           3.64 seconds&lt;BR /&gt;
      cpu time            3.57 seconds&lt;BR /&gt;
&lt;BR /&gt;
NOTE: PROCEDURE SQL used (Total process time):&lt;BR /&gt;
      real time           5.92 seconds&lt;BR /&gt;
      cpu time            5.85 seconds</description>
      <pubDate>Thu, 28 Aug 2008 05:59:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41889#M10867</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2008-08-28T05:59:49Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41890#M10868</link>
      <description>O.K., it's now pretty obvious:&lt;BR /&gt;
I adopted the example code to the size of the tables you’re fighting with (number of columns and rows) and get now similar results.&lt;BR /&gt;
NOTE: PROCEDURE SQL used (Total process time):&lt;BR /&gt;
      real time           22:22.71&lt;BR /&gt;
      cpu time            3:51.10&lt;BR /&gt;
&lt;BR /&gt;
The reason is I/0! It just takes a long time to read tableB.&lt;BR /&gt;
&lt;BR /&gt;
I’m using an external USB drive (slow) and the data step for writing tableB takes already a LONG time.&lt;BR /&gt;
NOTE: The data set OB_SAND.TABLEB has 23000000 observations and 57 variables.&lt;BR /&gt;
NOTE: DATA statement used (Total process time):&lt;BR /&gt;
      real time           25:11.86&lt;BR /&gt;
      cpu time            4:49.20&lt;BR /&gt;
&lt;BR /&gt;
So: Everything o.k. with the SQL, it’s not a processing problem. I think this case is a good example for when data should be stored in a DBMS and not in SAS files.&lt;BR /&gt;
It seems that SAS has to scan the whole SAS file in order to pick the few relevant fields (running the same code with fewer variables in the table decreases time immensely).&lt;BR /&gt;
&lt;BR /&gt;
Conclusion&lt;BR /&gt;
A faster server disk and motherboard will decrease the time needed but best would be to store the data in a DBMS (eg. Teradata, SQL server, ORACLE or also SAS SPD Server) as a DBMS can accesses data a bit more intelligent.&lt;BR /&gt;
&lt;BR /&gt;
HTH&lt;BR /&gt;
Patrick&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
P.S: And here the code with the long run-time:&lt;BR /&gt;
&lt;BR /&gt;
libname OB_SAND 'e:\tests';&lt;BR /&gt;
%let acctno=1;&lt;BR /&gt;
data OB_SAND.tableA;&lt;BR /&gt;
  drop i;&lt;BR /&gt;
  array VarsA {113} $ 20;&lt;BR /&gt;
  do i=1 to dim(VarsA);&lt;BR /&gt;
    VarsA{i}='Just some content';&lt;BR /&gt;
  end;&lt;BR /&gt;
  do client_key=1 to 441267;&lt;BR /&gt;
    acct_no=int(round(ranuni(0)));&lt;BR /&gt;
    output;&lt;BR /&gt;
  end;&lt;BR /&gt;
run;&lt;BR /&gt;
data OB_SAND.tableB;&lt;BR /&gt;
  drop i;&lt;BR /&gt;
  array VarsB {55} $ 20;&lt;BR /&gt;
  do i=1 to dim(VarsB);&lt;BR /&gt;
    VarsB{i}='Just some content';&lt;BR /&gt;
  end;&lt;BR /&gt;
  do ip_addr=1 to 23*10**6;&lt;BR /&gt;
    client_key=ceil(ranuni(0)*5*10**10);&lt;BR /&gt;
    output;&lt;BR /&gt;
  end;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
PROC SQL _METHOD;&lt;BR /&gt;
  CREATE TABLE work.IPBET AS&lt;BR /&gt;
    SELECT DISTINCT client_key, ip_addr&lt;BR /&gt;
      FROM OB_SAND.tableB&lt;BR /&gt;
      WHERE client_key IN&lt;BR /&gt;
        (SELECT DISTINCT client_key&lt;BR /&gt;
          FROM OB_SAND.tableA&lt;BR /&gt;
          WHERE acct_no = &amp;amp;acctno)&lt;BR /&gt;
    ;&lt;BR /&gt;
QUIT;</description>
      <pubDate>Thu, 28 Aug 2008 19:20:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41890#M10868</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2008-08-28T19:20:09Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41891#M10869</link>
      <description>Ok, case of right tool for the job. &lt;BR /&gt;
Thanks for your time and assistance Patrick.</description>
      <pubDate>Thu, 28 Aug 2008 21:41:32 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41891#M10869</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2008-08-28T21:41:32Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41892#M10870</link>
      <description>I just joined a project with a similar problem: A lot of observations and variables and performance problems (also a Star model).&lt;BR /&gt;
&lt;BR /&gt;
The solution was: Horizontal and vertical fragmentation.&lt;BR /&gt;
&lt;BR /&gt;
So: Split the biggest SAS tables in several tables and create views. For your case take a view with as little as possible fields (or access the appropriate SAS table directly).&lt;BR /&gt;
&lt;BR /&gt;
I'm aware that this is a design change as the ETL jobs have to be adopted - but it might be worth it if the problems persist.&lt;BR /&gt;
&lt;BR /&gt;
Regards&lt;BR /&gt;
Patrick</description>
      <pubDate>Tue, 02 Sep 2008 10:48:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41892#M10870</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2008-09-02T10:48:19Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41893#M10871</link>
      <description>Thanks Patrick, that is exactly what I ended up asking for.</description>
      <pubDate>Tue, 02 Sep 2008 21:21:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41893#M10871</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2008-09-02T21:21:51Z</dc:date>
    </item>
    <item>
      <title>Re: SAS Sql subquery performance problem</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41894#M10872</link>
      <description>"It seems that SAS has to scan the whole SAS file in order to pick the few relevant fields (running the same code with fewer variables in the table decreases time immensely)."&lt;BR /&gt;
&lt;BR /&gt;
Have you tried creating an index on your SAS dataset?  That might speed things up.&lt;BR /&gt;
&lt;BR /&gt;
PROC DATASETS LIBRARY=libname NOLIST;&lt;BR /&gt;
MODIFY dataset;&lt;BR /&gt;
INDEX CREATE varname;&lt;BR /&gt;
RUN;&lt;BR /&gt;
QUIT;</description>
      <pubDate>Fri, 26 Sep 2008 13:56:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/SAS-Sql-subquery-performance-problem/m-p/41894#M10872</guid>
      <dc:creator>1162</dc:creator>
      <dc:date>2008-09-26T13:56:52Z</dc:date>
    </item>
  </channel>
</rss>

