I've been working with large datasets, and I'm looking for tips and techniques to optimize performance. Specifically, I'm interested in strategies for improving processing speed and efficiency.
Any insights or best practices from your experiences would be greatly appreciated.
How large is "large"? How many rows? How many columns?
Different types of processing may require different optimization strategies. What processing are you doing?
Some things are always a good idea. For example:
- Drop columns that are not needed
- Pass through the data as few times as possible (I often see a PROC SQL step that extracts data, followed by a second PROC SQL step on the extracted data just to add one new column computed from existing columns, such as a difference or a sum; all of that can be done in a single SQL query, as shown in the first sketch after this list)
- Use SAS procedures whenever possible, rather than writing your own code to do something (I see people writing their own code to compute an average, and sometimes getting it wrong)
- Use BY processing whenever possible rather than explicit loops (see the second sketch below)
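Here's a minimal sketch of the one-pass idea; the library, table, and column names (mylib.big_table, sales, returns, year) are made up for illustration:

```sas
/* Two passes: extract, then derive a column in a separate step */
proc sql;
   create table work.extract as
   select id, sales, returns
   from mylib.big_table
   where year = 2023;

   create table work.final as
   select id, sales, returns, sales - returns as net_sales
   from work.extract;
quit;

/* One pass: extract and derive the column in a single query */
proc sql;
   create table work.final as
   select id, sales, returns, sales - returns as net_sales
   from mylib.big_table
   where year = 2023;
quit;
```

On a large table, the second version reads the data once instead of twice and skips writing an intermediate table.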
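And a sketch of the BY-processing point: computing group means with a procedure instead of looping over groups yourself (again, the dataset and variable names are placeholders):

```sas
/* Sort once so the BY statement can be used */
proc sort data=mylib.big_table out=work.sorted;
   by region;
run;

/* One pass over the data; the procedure handles each BY group */
proc means data=work.sorted noprint;
   by region;
   var sales;
   output out=work.region_means mean=avg_sales;
run;
```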
I'm sure there are other good ideas to improve the speed at which your program executes.
And do your best to learn and follow these rules: https://communities.sas.com/t5/SAS-Communities-Library/Maxims-of-Maximally-Efficient-SAS-Programmers...
Paige Miller
Other tips:
- Store the data on high-speed storage
- Create indexes on columns that are commonly used for filtering (see the sketch below)
- Move the data to an SPDE library (also sketched below)
- Move the data to an external database that is fast to query
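A minimal sketch of the index and SPDE steps; the library names, table name, column, and path are all placeholders:

```sas
/* Index a column that is frequently used in WHERE clauses */
proc datasets library=mylib nolist;
   modify big_table;
   index create year;
quit;

/* Assign an SPDE library and copy the table into it */
libname fastlib spde '/fast/storage/path';

proc copy in=mylib out=fastlib;
   select big_table;
run;
```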
Unfortunately, the acronym SPDE means nothing to me.
Paige Miller