chuckdee4
Calcite | Level 5
Hi there, I have an issue with some large datasets I am currently working on.
They contain a great deal of data and are a nightmare to query.

I just wanted to find out the best way of querying such datasets, i.e. DATA steps, PROC SQL or some other way.
Also, any tips for working with such datasets would be much appreciated.
sbb
Lapis Lazuli | Level 10
Nearly always, the response will be something like "It depends.", "Your mileage may vary." or "What do you consider to be large?"

SAS performance depends not only on the data but, mostly, on the operating environment (client, server or both).

However, some SAS features to consider exploiting are listed below:

- SAS view
- SAS index
- WHERE statement / clause, instead of IF.
- the COMPRESS= option: whether compression helps is data dependent, so test with and without it.
- PROC FORMAT for user-defined formatted display, rather than un-normalized SAS data (where extra variables are carried along unnecessarily).
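The first four features in the list above can be sketched as follows. This is a minimal illustration, not a tuned solution: the table WORK.BIG and its variables CUST_ID, REGION and AMOUNT are hypothetical and are generated here so the code runs on its own. Compression results, index selection and timings will differ with your own data and environment.

```sas
/* Hypothetical test table WORK.BIG: 1,000,000 rows */
data work.big;
   length region $5;
   do cust_id = 1 to 1000000;
      region = choosec(mod(cust_id, 4) + 1, 'EAST', 'WEST', 'NORTH', 'SOUTH');
      amount = mod(cust_id * 7, 1000) / 10;
      output;
   end;
run;

/* COMPRESS=: the log notes whether compression decreased or increased the size */
data work.big_c (compress=binary);
   set work.big;
run;

/* SAS index on the column used most often in WHERE filters */
proc datasets library=work nolist;
   modify big;
   index create region;
quit;

/* MSGLEVEL=I: the log reports whether an index was used */
options msglevel=i;

/* WHERE filters rows before they reach the program data vector
   (and can use the index); a subsetting IF would read every row first */
data work.east;
   set work.big (keep=cust_id region amount);
   where region = 'EAST';
run;

/* SAS view: stores the program, not the rows; the data is read
   each time the view is used */
data work.east_v / view=work.east_v;
   set work.big (keep=cust_id region amount);
   where region = 'EAST';
run;

proc means data=work.east_v sum;
   var amount;
run;
```

Compare the real time and CPU time notes in the log for each step. With MSGLEVEL=I the log also states whether the index on REGION was selected for the WHERE clause; SAS may decide a sequential read is cheaper when the WHERE clause selects a large share of the rows.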

Do take advantage of the SAS.COM support website, which has topic-related papers, technical reference material and SAS-hosted documentation, such as the "companion" guide for each operating system (OS) environment where SAS runs.

I suggest you get your query code defined and tested, then come back to the forum with a specific performance, efficiency or effectiveness question, for focused feedback from the forum subscribers.

Scott Barry
SBBWorks, Inc.
Ksharp
Super User
As far as I know, PROC FORMAT and hash tables are the fastest ways to execute a query, especially for a large table.
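A sketch of both lookup techniques, assuming a hypothetical lookup table WORK.LOOKUP (CUST_ID to SEGMENT) and a hypothetical large table WORK.TRANS. The format is built from the lookup table through PROC FORMAT CNTLIN=, and the hash object is loaded once, on the first iteration of the DATA step. Both add SEGMENT to every row without sorting or merging the large table.

```sas
/* Hypothetical lookup table: CUST_ID -> SEGMENT */
data work.lookup;
   length cust_id 8 segment $10;
   input cust_id segment $;
   datalines;
1 GOLD
2 SILVER
3 BRONZE
;
run;

/* Hypothetical large transaction table */
data work.trans;
   do cust_id = 1 to 100000;
      amount = mod(cust_id, 500);
      output;
   end;
run;

/* 1) PROC FORMAT: turn the lookup table into a CNTLIN data set */
data work.fmt;
   set work.lookup (rename=(cust_id=start segment=label)) end=last;
   retain fmtname 'segfmt' type 'N';
   length hlo $1;
   hlo = ' ';
   output;
   if last then do;   /* OTHER range for keys not in the lookup table */
      hlo = 'O';
      label = 'UNKNOWN';
      output;
   end;
run;

proc format cntlin=work.fmt;
run;

data work.with_fmt;
   set work.trans;
   length segment $10;
   segment = put(cust_id, segfmt.);
run;

/* 2) Hash object: load the lookup table into memory once */
data work.with_hash;
   if 0 then set work.lookup;   /* adds CUST_ID and SEGMENT to the PDV */
   if _n_ = 1 then do;
      declare hash h(dataset: 'work.lookup');
      h.defineKey('cust_id');
      h.defineData('segment');
      h.defineDone();
   end;
   set work.trans;
   if h.find() ne 0 then segment = 'UNKNOWN';
run;
```

Both approaches assume the lookup table is small enough to hold in memory; the hash object can also return several data columns in one FIND call, where a format returns a single label.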



Ksharp
Peter_C
Rhodochrosite | Level 12
SAS Scalable Performance Data Server (SPDS) provides an engine to handle large data. As it is not always available, you may find its smaller brother, SPDE (the SAS Scalable Performance Data Engine, a SAS library engine), helpful. SPDE achieves performance in several ways; I think the main two are partitioning and index optimisation. It is just great how well multi-gigabyte tables perform when partitioned and indexed well. Use the system option MSGLEVEL=I to see which indexes are used.
The great thing about the (big brother) SPDS is that it runs as a separate server, which increases the capacity available to solve your query, and it provides further index-handling optimisation. It is a bit like a database server optimised for SAS queries.
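A minimal SPDE sketch, assuming the directories below already exist (all paths are hypothetical; ideally the DATAPATH= entries sit on separate physical disks). DATAPATH= lists where the data partitions are spread, INDEXPATH= where the index files go, and PARTSIZE= sets the size of each partition. The table BIG.SALES and its variables are also hypothetical.

```sas
/* Hypothetical paths: replace with directories that exist on your system */
libname big spde '/sasdata/spde/meta'
   datapath=('/sasdata/spde/data1' '/sasdata/spde/data2')
   indexpath=('/sasdata/spde/index')
   partsize=512m;   /* size of each data partition */

/* MSGLEVEL=I writes index usage notes to the log */
options msglevel=i;

/* Hypothetical table, indexed on the column used for filtering */
data big.sales (index=(region));
   length region $5;
   do cust_id = 1 to 1000000;
      region = choosec(mod(cust_id, 4) + 1, 'EAST', 'WEST', 'NORTH', 'SOUTH');
      amount = mod(cust_id * 7, 1000) / 10;
      output;
   end;
run;

/* The WHERE clause lets SPDE consider the REGION index */
proc sql;
   create table work.east_total as
   select region, sum(amount) as total
   from big.sales
   where region = 'EAST'
   group by region;
quit;
```

After running the query, check the log for the index usage notes that MSGLEVEL=I produces; if none appear, the engine read the table sequentially.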
Good luck
peterC

Discussion stats
  • 3 replies
  • 960 views
  • 0 likes
  • 4 in conversation