The SAS Output Delivery System and reporting techniques

Best Approach to querying LARGE datasets

Occasional Contributor
Posts: 10

Hi there, I have an issue with some large datasets that I am currently working on.
They contain so much data that they are a nightmare to query.

I just wanted to find out the best way of querying such datasets, e.g. DATA steps, PROC SQL, or some other way.
Any tips for working with such datasets would also be much appreciated.
Super Contributor
Posts: 3,174

Re: Best Approach to querying LARGE datasets

Nearly always, the response will be something like: "It depends.", "Your mileage may vary." and "What do you consider to be large?"

SAS performance is influenced not only by the data but, mostly, by the operating environment (client, server, or both).

However, some SAS features to consider exploiting are listed below:

- SAS view
- SAS index
- WHERE statement / clause, instead of IF.
- COMPRESS= option, on or off: whether it helps efficiency is data dependent, so test both.
- PROC FORMAT for user-defined formatted display, rather than denormalized SAS data (where extra descriptive variables are carried along unnecessarily).
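
As a rough illustration of the first four items, here is a minimal sketch. The table WORK.SALES and its columns (id, region, amount) are made up for the example, and whether the index is actually chosen depends on how selective the WHERE condition is, so check the log rather than assuming.

```sas
/* Hypothetical test table: 1,000,000 rows in WORK. */
data work.sales;
  length region $5;
  do id = 1 to 1000000;
    region = scan('NORTH SOUTH EAST WEST', mod(id, 4) + 1);
    amount = round(ranuni(1) * 1000, 0.01);
    output;
  end;
run;

/* SAS index: build a simple index on the column used most in filters. */
proc datasets library=work nolist;
  modify sales;
  index create region;
quit;

/* MSGLEVEL=I adds log notes about index usage. */
options msglevel=i;

/* WHERE filters rows as they are read and is eligible to use the index.
   A subsetting IF is evaluated only after each row has been read
   into the program data vector. */
data work.west;
  set work.sales(keep=id region amount);
  where region = 'WEST';
run;

/* SAS view: stores the query logic, not a second copy of the data. */
data work.west_v / view=work.west_v;
  set work.sales(keep=id region amount);
  where region = 'WEST';
run;

/* COMPRESS=: trades CPU time for less I/O. Compare file size and
   run time against the uncompressed table before adopting it. */
data work.sales_c(compress=yes);
  set work.sales;
run;
```

The KEEP= data set option is not in the list above, but reading only the columns you need is usually worth doing alongside WHERE.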

Do take advantage of the SAS.COM support website where there are topic-related papers, technical reference material, as well as SAS-hosted documentation, such as a "companion" guide for each supported Operating System (OS) environment where SAS runs.

I suggest you get your "query" code defined and tested, then come back to the forum with a specific performance or efficiency question, so that forum subscribers can give focused feedback.

Scott Barry
SBBWorks, Inc.
Super User
Posts: 9,671

Re: Best Approach to querying LARGE datasets

As far as I know, PROC FORMAT and hash tables are the fastest ways to execute a query, especially for a large table.
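
To make that concrete, here is a minimal sketch of both lookup techniques, assuming a small lookup table and a large table joined on a numeric key. All table, variable and format names (WORK.REGIONS, WORK.BIG, region_id, region_name, regfmt) are invented for the example, and the hash object needs SAS 9.1 or later.

```sas
/* Hypothetical data: a small lookup table and a large table. */
data work.regions;
  length region_id 8 region_name $10;
  input region_id region_name $;
  datalines;
1 North
2 South
3 East
4 West
;
run;

data work.big;
  do id = 1 to 1000000;
    region_id = mod(id, 4) + 1;
    output;
  end;
run;

/* Approach 1: turn the lookup table into a numeric format via CNTLIN=. */
data work.fmt;
  set work.regions(rename=(region_id=start region_name=label));
  retain fmtname 'regfmt' type 'N';
run;

proc format cntlin=work.fmt;
run;

data work.out_fmt;
  set work.big;
  length region_name $10;
  region_name = put(region_id, regfmt.);  /* lookup without a sort or join */
run;

/* Approach 2: load the lookup table into an in-memory hash object. */
data work.out_hash;
  length region_name $10;
  if _n_ = 1 then do;
    declare hash h(dataset: 'work.regions');
    h.defineKey('region_id');
    h.defineData('region_name');
    h.defineDone();
  end;
  set work.big;
  /* find() returns 0 on a match and fills region_name. */
  if h.find() ne 0 then region_name = ' ';
run;
```

Neither approach requires the large table to be sorted. The hash table must fit in memory, so it suits a small lookup against a large table rather than two large tables.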



Ksharp
Valued Guide
Posts: 2,174

Re: Best Approach to querying LARGE datasets

SAS Scalable Performance Data Server (SPDS) provides an engine to handle large data. As it is not always available, you may find its smaller brother SPDE (a SAS library engine) helpful. SPDE achieves performance in several ways; I think the main two are partitioning and index optimisation. It is just great the way multi-gigabyte tables perform when partitioned and indexed well. Use the system option MSGLEVEL=I to see which indexes are used.
The great thing about the (big brother) SPDS is that it is a separate server, which increases the capacity of the service to solve your query, and it provides further index handling optimisation. It is a bit like a database server optimised for SAS queries.
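
For the SPDE part, a minimal sketch might look like the following. The libref, table and column names are made up, the partition size is an arbitrary choice, and pointing the library at the WORK directory is only a convenience so the example runs anywhere; real data would normally sit on dedicated data and index paths.

```sas
/* Assumption: use the WORK directory as a throwaway SPDE location.
   TEMP=YES removes the library's files at the end of the session. */
libname spdelib spde "%sysfunc(pathname(work))" temp=yes partsize=128m;

/* Log notes will show which index, if any, a WHERE clause uses. */
options msglevel=i;

/* Hypothetical table, created with an index on the filter column. */
data spdelib.sales(index=(region));
  length region $5;
  do id = 1 to 1000000;
    region = scan('NORTH SOUTH EAST WEST', mod(id, 4) + 1);
    amount = round(ranuni(1) * 1000, 0.01);
    output;
  end;
run;

/* A query that is eligible for index use; check the log to confirm. */
proc sql;
  select count(*) as n_west
    from spdelib.sales
    where region = 'WEST';
quit;
```

PARTSIZE= controls how the data component is split into partition files; a sensible value depends on table size and the number of data paths, so treat 128M here as a placeholder.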
Good luck
peterC