BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
set_all__
Fluorite | Level 6

Does the order of the data in a composite index matter? I have a composite index of an 8 character key + 4 character year. The year can only be one of 20 past years, the key can be any combination of letters and numbers. If my data has 100 million rows, approximately 5 million for each year, does it matter if I create a composite index as key+year or year+key?

 

I am wondering if the subsetting of the data by year would make the index (year+key) a bit faster than the other way around.

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

Is your data sorted?  Usually you would try to mimic the sorted order of the data as the primary key.  (It also depends on the patterns of records that you want to retrieve.)  Here's why.

 

When retrieving a record using an index, SAS doesn't retrieve data directly from the source data set.  Rather, it moves a block of records (that includes the next one you want) into memory.  From that block, it retrieves the next record that you want.

 

The major cost is moving the block of data that contains the next desired record.  Retrieving an individual record is a minimal cost.  So minimize the movement of blocks of records.

View solution in original post

4 REPLIES 4
Astounding
PROC Star

Is your data sorted?  Usually you would try to mimic the sorted order of the data as the primary key.  (It also depends on the patterns of records that you want to retrieve.)  Here's why.

 

When retrieving a record using an index, SAS doesn't retrieve data directly from the source data set.  Rather, it moves a block of records (that includes the next one you want) into memory.  From that block, it retrieves the next record that you want.

 

The major cost is moving the block of data that contains the next desired record.  Retrieving an individual record is a minimal cost.  So minimize the movement of blocks of records.

set_all__
Fluorite | Level 6

The data is not sorted. It could possibly take too much time to sort the 100 million records before I need to kick off the next process in a short monthly production window. However, I am testing this out. I am also going to check if separating this dataset into multiple datasets (by year/multiple years) would make it any faster. Thanks for the suggestion!

mkeintz
PROC Star

Why not make 2 composite indices, and run some tests?

 

I suspect, as @Astounding suggested, that you should order the components of a composite index that most closely approximate the physical record order of the indexed data set.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
LinusH
Tourmaline | Level 20
If your key is unique within a year, you could even be fine with a single index.
What is your application? End users asking for a one key value at the time? And sub second response times?
If you are in a Base SAS libref now, consider SPDE instead - parallel IO and better index management and query plans.
Data never sleeps

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 783 views
  • 0 likes
  • 4 in conversation