chuckdee4
Calcite | Level 5
Hi there, I have an issue with certain large datasets that I am currently working on.
They contain far too much data and are a nightmare to query.

I would like to find out the best way to query such datasets, i.e. DATA steps, PROC SQL, or some other approach.
Any tips for working with datasets like these would also be much appreciated.
sbb
Lapis Lazuli | Level 10
Nearly always, the response will be something like "It depends.", "Your mileage may vary." or "What do you consider to be large?"

SAS performance is influenced not only by the data but mostly by the operating environment (client, server, or both).

However, some SAS features to consider exploiting are listed below:

- SAS views
- SAS indexes
- the WHERE statement / clause, instead of IF
- using, or not using, the COMPRESS= option for efficiency (data dependent)
- PROC FORMAT for user-defined formatted display, rather than un-normalized SAS data (where extra data variables are carried along unnecessarily)
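A minimal sketch of the view, index, WHERE-versus-IF and COMPRESS= items above, assuming SASHELP.CLASS (the sample table shipped with SAS) as a stand-in for a large table. The names WORK.BIG, WORK.TEENS_WHERE, WORK.TEENS_IF and WORK.TEENS_V are hypothetical and chosen only for illustration.

```sas
/* Sketch only: SASHELP.CLASS stands in for a large table; all WORK  */
/* names below are hypothetical.                                     */
options msglevel=i;   /* log notes report when an index is used      */

/* COMPRESS=YES may or may not save space (it is data dependent);    */
/* the log reports the effect. INDEX= builds a simple index on AGE.  */
data work.big (compress=yes index=(age));
  set sashelp.class;
run;

/* WHERE: rows are filtered as they are read, and an index can be used */
data work.teens_where;
  set work.big;
  where age >= 15;
run;

/* Subsetting IF: every row is read into the PDV, then dropped if it fails */
data work.teens_if;
  set work.big;
  if age >= 15;
run;

/* DATA step view: stores the program, not the rows; the rows are */
/* produced each time the view is read                            */
data work.teens_v / view=work.teens_v;
  set work.big;
  where age >= 15;
run;

proc means data=work.teens_v n mean;
  var height weight;
run;
```

Both subsets return the same rows; the difference is how much work is done to get them. On a table this small the optimizer may well ignore the index, so check the MSGLEVEL=I notes against your real data.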

Do take advantage of the SAS.COM support website, which has topic-related papers, technical reference material and SAS-hosted documentation, such as a "companion" guide for each supported Operating System (OS) environment where SAS runs.

I suggest you get your query code defined and tested, then come back to the forum with a specific performance or efficiency question, so that forum subscribers can give you focused feedback.

Scott Barry
SBBWorks, Inc.
Ksharp
Super User
As far as I know, PROC FORMAT and hash tables are the fastest ways to execute a query, especially against large tables.
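A minimal sketch of both lookups, assuming a small hypothetical lookup table WORK.SEX_LOOKUP keyed on SEX, with SASHELP.CLASS standing in for the large table. Neither approach needs a sort or an SQL join: the format is built from the lookup table with PROC FORMAT CNTLIN=, and the hash object is loaded into memory once, on the first iteration of the DATA step. The format name $SEXLBL and the variable SEX_DESC are illustrative.

```sas
/* Hypothetical lookup table: one row per key (SEX) */
data work.sex_lookup;
  length sex $1 sex_desc $6;
  input sex $ sex_desc $;
  datalines;
F Female
M Male
;
run;

/* Approach 1: user-defined format built from the lookup table.     */
/* CNTLIN= expects FMTNAME, START and LABEL (TYPE='C' = character). */
data work.fmt;
  set work.sex_lookup (rename=(sex=start sex_desc=label));
  retain fmtname '$sexlbl' type 'C';
run;

proc format cntlin=work.fmt;
run;

data work.with_fmt;
  set sashelp.class;
  length sex_desc $6;
  sex_desc = put(sex, $sexlbl.);
run;

/* Approach 2: hash object loaded once, then probed row by row */
data work.with_hash;
  if 0 then set work.sex_lookup;   /* defines SEX_DESC in the PDV, reads no rows */
  if _n_ = 1 then do;
    declare hash h(dataset: 'work.sex_lookup');
    h.defineKey('sex');
    h.defineData('sex_desc');
    h.defineDone();
  end;
  set sashelp.class;
  /* FIND() returns 0 on a hit and copies SEX_DESC into the PDV */
  if h.find() ne 0 then call missing(sex_desc);
run;
```

The hash approach is limited by available memory, since the whole lookup table is held in RAM; the format approach stores the lookup in a format catalog instead.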



Ksharp
Peter_C
Rhodochrosite | Level 12
SAS Scalable Performance Data Server provides an engine to handle large data. As it is not always available, you may find its smaller brother, SPDE (a SAS library engine), helpful. SPDE achieves performance in several ways; I think the main two are partitioning and index optimisation. It is just great the way multi-gigabyte tables perform when partitioned and indexed well. Use the system option MSGLEVEL=I to see which indexes are used.
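A minimal SPDE sketch, assuming the WORK directory can double as the SPDE primary path so the example runs anywhere. The libref BIGSPDE and the PARTSIZE= value are illustrative; on a real system you would typically also give DATAPATH= (and INDEXPATH=) directories on separate disks so partitions can be read in parallel.

```sas
options msglevel=i;   /* log notes report index usage */

/* Assumption: WORK's directory is reused so the sketch is self-contained.  */
/* On a real system, spread partitions over several disks, e.g.             */
/*   datapath=('/disk1/spde' '/disk2/spde')   (hypothetical paths)          */
%let spde_root = %sysfunc(pathname(work));

libname bigspde spde "&spde_root"
  partsize=256;       /* target size of each data partition, in megabytes */

/* Create the table through the SPDE engine with an index on AGE */
data bigspde.class (index=(age));
  set sashelp.class;
run;

/* WHERE processing against the SPDE table can use the index on AGE */
proc print data=bigspde.class;
  where age = 16;
run;
```

With MSGLEVEL=I in effect, look in the log for an INFO note about index usage on the WHERE clause; if none appears, the query is being resolved without the index.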
The great thing about the big brother, SPDS, is that it is a separate server. That increases the capacity available to solve your query, and it provides further index-handling optimisation. It is a bit like a database server optimised for SAS queries.
Good luck
peterC

