BookmarkSubscribeRSS Feed
Karen
Calcite | Level 5

I have a list of of about 20,000 ids (length $20.). There is a SQL Server database table on a server somewhere with about 2 billion records. I need to get all the records from that table for my list of ids. I work on a 64-bit PC and connect to the database with an OLEDB libname A. I tried the following code:

proc sql;

create table NEW as select * from id_list i, A.bigTable t where i.id = t.id;

quit;

This took forever to run and failed because my server connection timed out at 8 hours. So I added a variable to id_list (obsnum=_n_;) and modified the syntax to:

create table NEW as select * from id_list (where=(1 le obsnum le 1000)) i, A.bigTable t where i.id = t.id;

This took 12 seconds (real time) to run. So I wrote a macro to do this and ran it 20 times, giving it a new thousand-id range each time.

Then I thought, why not try 2000 ids at a time, instead of 1000. 30 minutes later, this query was not complete so I stopped it.

My question is: what accounts for the huge increase in time when I try to get records for more than 1000 ids at a time? Is there a better method to go against a very large table to find records?

6 REPLIES 6
ballardw
Super User

How you connect may be an issue but one of the first things is only select the variables you need.

The selection may be quicker as pass-through as well.

Haikuo
Onyx | Level 15

Is it possible that you have missing IDs after first 1000 IDs?

Tom
Super User Tom
Super User

Can you create a table in the SQL database to store the list?

Otherwise you might be better off using the SAS dataset to generate WHERE clauses against the large table.

Here is an example. You might have to adjust if your ID variable is character or if it is too long to fit 1000 values into a single macro variable.

proc sql noprint ;

  select distinct id into :idlist1 separated by ',' from id_list (firstobs=1 obs=1000) ;

  select distinct id into :idlist2 separated by ',' from id_list (firstobs=1001 obs=2000) ;

  create table NEW as select * from A.BIGTABLE

  where id in (&idlist1,&idlist2)

  ;

quit;



Ksharp
Super User

You'd better create a INDEX for that prime key variable for a large table .and try to use PASS-Through SQL ,let SQL Server to do such heavy work.

Patrick
Opal | Level 21

Is in your libname pointing to the SQL Server option "MULTI_DATASRC_OPT=IN_CLAUSE" defined?

Use below option in your code to see in the SAS log what code actually gets sent to the dbms

OPTIONS SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;

Is the code executed for 1000 and 2000 obs at a time the same? Just thinking that may-be you're exceeding with 2000 id's some threshold where SAS needs to generate different SQL code.

LinusH
Tourmaline | Level 20

My guess is that you are moving the whole source table to SAS for the join.

Try the DBKEY= ds option for the cross platform join.

If that doesn't work for you, consider to use a temporary table in SQLS so you are sure that the whole join will take place where the data is.

As said, be sure that the id column is indexed.

Data never sleeps

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1213 views
  • 0 likes
  • 7 in conversation