DATA Step, Macro, Functions and more

Exploit Index with Large List

Reply
Contributor
Posts: 38

Exploit Index with Large List

Is there an efficient way to exploit an index (45k unique) using a list of ~1k values without entering them into the WHERE? Is there some method of loading the list into a hash and pulling it into the where instead of into a subsetting if?

Super User
Posts: 5,260

Re: Exploit Index with Large List

Not sure what you are asking. What is your "problem"?
How do you select your 1k values?
If you have your 1k values in a lookup table you could either fill them into a macro variable to be used in a WHERE statement.
Or a simple inner join, if you are lucky SQL will use hash join as it's internal plan.

Using a data step hash makes no sense (if I ubders5your inquiry correctly ), it will require a table scan, and will be able to use the index.
Data never sleeps
Contributor
Posts: 38

Re: Exploit Index with Large List

I guess another way to state "Is there a way to exploit an index outside of a WHERE"?

 

I'm not finding anything.

 

Background is that we need 6 tables (and counting...) attached to a table with 1.9 billion rows on a persistent basis. A full table scan is not an option, and SQL gets messy really quickly. There is an indexed field in common with one of the attached tables that needs to be filtered on.

 

The best I can think of is what you said, to populate a macro variable and load that into the WHERE.

Super User
Posts: 3,115

Re: Exploit Index with Large List

It would help if you could post your SQL so what you are trying to achieve is clearer to everyone.

Super User
Posts: 5,260

Re: Exploit Index with Large List

Still understand the "outside where" part.
Sounds you need to optimize joins between lookup tables and a large master table. So the best you can come with is either a where in () or SQL with indexed or hash join.

1.9 billion rows, in a Base SAS libname?
You could exploit SPDE, it may not optimize joins, but deals with indexes (update, where evaluation) better than Base.

If this is a really important use case, consider invest in SPD Server which has an index type specific for join optimization, and a star schema planner (which builds this in () clause automatically behind the scenes).
Data never sleeps
Ask a Question
Discussion stats
  • 4 replies
  • 194 views
  • 0 likes
  • 3 in conversation