BookmarkSubscribeRSS Feed
Haris
Lapis Lazuli | Level 10

I can't find any information on this: will index benefit PROC SQL code that uses ORDER BY and, especially, GROUP BY?

8 REPLIES 8
Haris
Lapis Lazuli | Level 10

... and while we're on the topic, if anyone knows if INDEX would speed up procedures that run random effects models such as MIXED and GLIMMIX that block the dataset by GROUPID, please let us know.

mkeintz
PROC Star

@Haris wrote:

... and while we're on the topic, if anyone knows if INDEX would speed up procedures that run random effects models such as MIXED and GLIMMIX that block the dataset by GROUPID, please let us know.


If the data are indexed by GROUPID but not sorted by GROUPID, I can imagine beneficial usage of indexes if you use a "BY GROUPID" statement. 

 

But I don't think that is what you mean by "block the dataset".  I presume you want an analysis that includes some sort of between-block contrasts.  In that case I don't see how indexing helps, since the entire dataset (not a subset) needs to be read in.  If so, it would be faster to read the whole dataset sequentially (i.e. not in block order), and accumulate the necessary statistics in memory for each block.

 

Maybe there are some types of iterative maximum-likelihood estimation techniques that benefit from re-ordering the data, but I assume the iterations that requiring re-reading data use a dynamically created (and organized) utility file - at least for large data collections that can't be efficiently managed in memory.

 

Or am I missing the question entirely?

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Haris
Lapis Lazuli | Level 10
You got it right. Since a categorical random effects model processes the data by levels of the random variable, I am wondering if index of that random factor gets used in computation.
mkeintz
PROC Star

@Haris wrote:
You got it right. Since a categorical random effects model processes the data by levels of the random variable, I am wondering if index of that random factor gets used in computation.

Well, I can't  say I know the answer to that question.

 

But I can say that I don't see any advantage in this case to indexing the random variable.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
jvdl
Obsidian | Level 7

According to the fourth edition of the Advanced Prep Guide for SAS 9, pages 229 and 230, the following queries can be optimised by an index:

 

  • a WHERE clause expression that contains a comparison operator, the TRIM or SUBSTR function, the CONTAINS operator, the LIKE operator
  • a subquery returning values to the IN operator
  • a correlated subquery in which the column being compared with the correlated reference is indexed
  • a join in which the join expression contains the “=” operator and/or all the columns in the join expression are indexed in one of the tables being joined
Haris
Lapis Lazuli | Level 10

When I run PROC SQL with GROUP BY, SAS log does not contain any mention of INDEX use but a specific program runs in just under 3 minutes instead of just under 4 minutes prior to the implementation of the INDEX.  Not a rigorous test but that's all I got.

yabwon
Onyx | Level 15

Hi,

 

did you try `options msglevel=i;` ?

that should display info if an index was used.

 

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Haris
Lapis Lazuli | Level 10
I did. There was no note about INDEX but 25% of the execution time was shaved off.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 8 replies
  • 2075 views
  • 0 likes
  • 4 in conversation