Indexes

Accepted Solution Solved
Reply
Super Contributor
Super Contributor
Posts: 444
Accepted Solution

Indexes

this below is a part of an article i found online but i dont understand the bolded one. What they mean by page count? Some clarification please Smiley Happy

Here are a few things to keep in mind before

creating an index:

- If the page count as displayed in the CONTENTS procedure is less than 3 pages, avoid creating an index

- If the percentage of matches is approximately 15% or less then an index should be used


Accepted Solutions
Solution
‎04-29-2013 11:52 AM
Super Contributor
Posts: 418

Re: Indexes

When you do proc contents it gives you information about your datset. One of them is "number of Data set pages". All they're saying is that if this number is less than 3, do not add an index as it will actually slow down your data joins.

In order to fully understand why this is true, you would need to do some research on how data pages work, what they are, how indexes change them, etc... In short there are quite a few documents online that could help with this goal.

View solution in original post


All Replies
Solution
‎04-29-2013 11:52 AM
Super Contributor
Posts: 418

Re: Indexes

When you do proc contents it gives you information about your datset. One of them is "number of Data set pages". All they're saying is that if this number is less than 3, do not add an index as it will actually slow down your data joins.

In order to fully understand why this is true, you would need to do some research on how data pages work, what they are, how indexes change them, etc... In short there are quite a few documents online that could help with this goal.

Super Contributor
Super Contributor
Posts: 444

Re: Indexes

Posted in reply to Anotherdream

thank you a lot

Super User
Posts: 5,513

Re: Indexes

Loosely defined, a page is a chunk of storage space.  More pages = a bigger data set.

Keep in mind that indexes were created before hashing was implemented in SAS.  The article you are reading may be outdated.

Good luck.

Super Contributor
Super Contributor
Posts: 444

Re: Indexes

Posted in reply to Astounding

thanks .

is hashing replacing indexes now?

Super User
Posts: 5,513

Re: Indexes

It's not necessarily a replacement, but it can perform very similar functions.  The major differences (in my mind of course) ...

Indexes last until the data set changes.  So they can used over and over but need to be recreated whenever the data set changes.  But hash definitions last only for the duration of the current DATA step.  So every DATA step that utilizes hashing will have to redefine the hash table.

On the other hand, indexing is one of the slower forms of look-up.  (Rick Langston has written a short paper about this.)  It has to retrieve data from where it is stored on disk (that's a simplified description), so the I/O can add up.  But hash tables are stored in memory and are extremely fast.

Super Contributor
Super Contributor
Posts: 444

Re: Indexes

Posted in reply to Astounding

thanks for the explanation Astounding.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 216 views
  • 3 likes
  • 3 in conversation