Help using Base SAS procedures

How to get a small set of records from a very large table

Reply
Occasional Contributor
Posts: 5

How to get a small set of records from a very large table

I have a list of of about 20,000 ids (length $20.). There is a SQL Server database table on a server somewhere with about 2 billion records. I need to get all the records from that table for my list of ids. I work on a 64-bit PC and connect to the database with an OLEDB libname A. I tried the following code:

proc sql;

create table NEW as select * from id_list i, A.bigTable t where i.id = t.id;

quit;

This took forever to run and failed because my server connection timed out at 8 hours. So I added a variable to id_list (obsnum=_n_Smiley Wink and modified the syntax to:

create table NEW as select * from id_list (where=(1 le obsnum le 1000)) i, A.bigTable t where i.id = t.id;

This took 12 seconds (real time) to run. So I wrote a macro to do this and ran it 20 times, giving it a new thousand-id range each time.

Then I thought, why not try 2000 ids at a time, instead of 1000. 30 minutes later, this query was not complete so I stopped it.

My question is: what accounts for the huge increase in time when I try to get records for more than 1000 ids at a time? Is there a better method to go against a very large table to find records?

Super User
Posts: 11,343

Re: How to get a small set of records from a very large table

How you connect may be an issue but one of the first things is only select the variables you need.

The selection may be quicker as pass-through as well.

Respected Advisor
Posts: 3,156

Re: How to get a small set of records from a very large table

Is it possible that you have missing IDs after first 1000 IDs?

Super User
Super User
Posts: 7,055

Re: How to get a small set of records from a very large table

Can you create a table in the SQL database to store the list?

Otherwise you might be better off using the SAS dataset to generate WHERE clauses against the large table.

Here is an example. You might have to adjust if your ID variable is character or if it is too long to fit 1000 values into a single macro variable.

proc sql noprint ;

  select distinct id into :idlist1 separated by ',' from id_list (firstobs=1 obs=1000) ;

  select distinct id into :idlist2 separated by ',' from id_list (firstobs=1001 obs=2000) ;

  create table NEW as select * from A.BIGTABLE

  where id in (&idlist1,&idlist2)

  ;

quit;



Super User
Posts: 10,035

Re: How to get a small set of records from a very large table

You'd better create a INDEX for that prime key variable for a large table .and try to use PASS-Through SQL ,let SQL Server to do such heavy work.

Respected Advisor
Posts: 4,173

Re: How to get a small set of records from a very large table

Is in your libname pointing to the SQL Server option "MULTI_DATASRC_OPT=IN_CLAUSE" defined?

Use below option in your code to see in the SAS log what code actually gets sent to the dbms

OPTIONS SASTRACE=',,,d' SASTRACELOC=SASLOG NOSTSUFFIX;

Is the code executed for 1000 and 2000 obs at a time the same? Just thinking that may-be you're exceeding with 2000 id's some threshold where SAS needs to generate different SQL code.

Super User
Posts: 5,430

Re: How to get a small set of records from a very large table

My guess is that you are moving the whole source table to SAS for the join.

Try the DBKEY= ds option for the cross platform join.

If that doesn't work for you, consider to use a temporary table in SQLS so you are sure that the whole join will take place where the data is.

As said, be sure that the id column is indexed.

Data never sleeps
Ask a Question
Discussion stats
  • 6 replies
  • 273 views
  • 0 likes
  • 7 in conversation