I can't find any information on this: will index benefit PROC SQL code that uses ORDER BY and, especially, GROUP BY?
... and while we're on the topic, if anyone knows if INDEX would speed up procedures that run random effects models such as MIXED and GLIMMIX that block the dataset by GROUPID, please let us know.
@Haris wrote:
... and while we're on the topic, if anyone knows if INDEX would speed up procedures that run random effects models such as MIXED and GLIMMIX that block the dataset by GROUPID, please let us know.
If the data are indexed by GROUPID but not sorted by GROUPID, I can imagine beneficial usage of indexes if you use a "BY GROUPID" statement.
But I don't think that is what you mean by "block the dataset". I presume you want an analysis that includes some sort of between-block contrasts. In that case I don't see how indexing helps, since the entire dataset (not a subset) needs to be read in. If so, it would be faster to read the whole dataset sequentially (i.e. not in block order), and accumulate the necessary statistics in memory for each block.
Maybe there are some types of iterative maximum-likelihood estimation techniques that benefit from re-ordering the data, but I assume the iterations that requiring re-reading data use a dynamically created (and organized) utility file - at least for large data collections that can't be efficiently managed in memory.
Or am I missing the question entirely?
@Haris wrote:
You got it right. Since a categorical random effects model processes the data by levels of the random variable, I am wondering if index of that random factor gets used in computation.
Well, I can't say I know the answer to that question.
But I can say that I don't see any advantage in this case to indexing the random variable.
According to the fourth edition of the Advanced Prep Guide for SAS 9, pages 229 and 230, the following queries can be optimised by an index:
When I run PROC SQL with GROUP BY, SAS log does not contain any mention of INDEX use but a specific program runs in just under 3 minutes instead of just under 4 minutes prior to the implementation of the INDEX. Not a rigorous test but that's all I got.
Hi,
did you try `options msglevel=i;` ?
that should display info if an index was used.
All the best
Bart
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.