Does the order of the data in a composite index matter? I have a composite index of an 8 character key + 4 character year. The year can only be one of 20 past years, the key can be any combination of letters and numbers. If my data has 100 million rows, approximately 5 million for each year, does it matter if I create a composite index as key+year or year+key?
I am wondering if the subsetting of the data by year would make the index (year+key) a bit faster than the other way around.
Is your data sorted? Usually you would try to mimic the sorted order of the data as the primary key. (It also depends on the patterns of records that you want to retrieve.) Here's why.
When retrieving a record using an index, SAS doesn't retrieve data directly from the source data set. Rather, it moves a block of records (that includes the next one you want) into memory. From that block, it retrieves the next record that you want.
The major cost is moving the block of data that contains the next desired record. Retrieving an individual record is a minimal cost. So minimize the movement of blocks of records.
Is your data sorted? Usually you would try to mimic the sorted order of the data as the primary key. (It also depends on the patterns of records that you want to retrieve.) Here's why.
When retrieving a record using an index, SAS doesn't retrieve data directly from the source data set. Rather, it moves a block of records (that includes the next one you want) into memory. From that block, it retrieves the next record that you want.
The major cost is moving the block of data that contains the next desired record. Retrieving an individual record is a minimal cost. So minimize the movement of blocks of records.
The data is not sorted. It could possibly take too much time to sort the 100 million records before I need to kick off the next process in a short monthly production window. However, I am testing this out. I am also going to check if separating this dataset into multiple datasets (by year/multiple years) would make it any faster. Thanks for the suggestion!
Why not make 2 composite indices, and run some tests?
I suspect, as @Astounding suggested, that you should order the components of a composite index that most closely approximate the physical record order of the indexed data set.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.