BookmarkSubscribeRSS Feed

Leveraging Database Indexes to Improve Performance

Started yesterday by
Modified yesterday by
Views 65

The SAS Communities Blog Efficiency Matters: Using Indexes in SAS provides an excellent review of what indexes are, how to create them, and how they can improve SAS query performance.

Its summary captures the essence well:

 

“Indexes are a powerful tool for improving SAS performance, especially when working with large data. By creating simple or composite indexes, you allow SAS to locate observations efficiently without scanning the entire data set. Whether you are filtering with a WHERE statement, joining tables, or performing BY-

group processing, indexes can reduce run time dramatically.”

 

Since many SAS users query large datasets in databases such as Microsoft SQL Server via SAS/ACCESS interfaces, I wanted to validate whether similar performance gains could be expected when indexes exist on database tables. Using scenarios similar to those in the SAS Communities blog, I found that querying SQL Server tables via SAS/ACCESS was significantly faster when indexes were in place:

  • Filtering with a WHERE statement: 106× faster
  • Joining tables with a WHERE statement: 19× faster
  • BY-group processing: 6× faster

This blog outlines the validation steps taken to arrive at these findings.

_______________________________________________________________________________________________________________________

Test Specifications

To test the three scenarios, I created duplicate tables in Microsoft SQL Server—one set with indexes and one without. I downloaded data from the Efficiency Matters blog on GitHub: link

  • Employee_Master: 308 rows
  • Orders: 10,000 rows replicated 1,000 times for a total of 10 million rows

For testing, I chose non-clustered indexes because they can be created on single or multiple columns, don’t require sorting, and allow multiple indexes per table. By contrast, a clustered index forces rows to be stored in the order of the index key, limiting each table to one clustered index.

 

Creating Indexes via SQL Passthrough

proc sql;
    connect using MSSQLSVR /*previously defined LIBNAME*/ as SQLDB;
    EXECUTE(CREATE NONCLUSTERED INDEX SPINDEX ON orders_10m_indx (State_Province)) BY SQLDB;
    EXECUTE(CREATE NONCLUSTERED INDEX CUSTINDEX ON orders_10m_indx (Customer_ID)) BY SQLDB;
    EXECUTE(CREATE NONCLUSTERED INDEX EMPINDEX ON orders_10m_indx (Employee_ID)) BY SQLDB;
    EXECUTE(CREATE NONCLUSTERED INDEX EMPINDEXO ON emp_master_indx (Employee_ID)) BY SQLDB;
    disconnect from SQLDB;
quit;

To check or validate that indexes exist

proc sql;
    connect using MSSQLSVR /*previously defined LIBNAME*/ as SQLDB;
    create table indexed_fields as select * from connection to SQLDB 
        (select t.name as table_name, i.name as index_name, c.name as column_name,
        i.type_desc as index_type from sys.tables t join sys.indexes i on
        t.object_id=i.object_id join sys.index_columns ic on i.object_id=
        ic.object_id and i.index_id=ic.index_id join sys.columns c on
        ic.object_id=c.object_id and ic.column_id=c.column_id
        where t.name in  ('emp_master_indx','orders_10m_indx'));
    disconnect from SQLDB;
quit;

Example output:

pechin_0-1765814283997.png

If you don’t have permission to create indexes, work with your SQL Server administrator.

_______________________________________________________________________________________________________________________

Environment Specifications

  • SAS Viya 4 compute server (Version Stable 2025.08)
  • SAS/ACCESS Interface to Microsoft SQL Server

_______________________________________________________________________________________________________________________

Example 1: Effect of Index on WHERE Clause

data ex_1;
    set MSSQLSVR.orders_10M;
    where Customer_ID=10860;
run;

Results:

  • Produces 1,000 observations & 24 variables
  • Without index: completes in 37.28 seconds
  • With index on Customer_ID: completes in 0.25 seconds → 106× faster

_______________________________________________________________________________________________________________________

 Example 2: Joining Tables with WHERE Clause

proc sql;
    create table ex10 as
    select distinct(m.Employee_Name) as Name,
           count(p.Order_ID) as orders
    from mssqlsvr.emp_master as m
    inner join mssqlsvr.orders_10M as p
    on m.Employee_ID=p.Employee_ID
    where m.Employee_ID=120178;
quit;

Results:

  • Produces 1 observation & 2 variables
  • Without index: completes in 37.81 seconds
  • With index on Employee_ID (both tables): completes in 1.95 seconds → 19× faster

 ______________________________________________________________________________________________________________________

 Example 3: BY-Group Processing

proc means data=mssqlsvr.orders_10M mean;
    var quantity;
    by State_Province;
run;

Results:

  • Produces report for 228 State_Province values
  • Without index: completes in 38.35 seconds
  • With index on State_Province: completes in 38.92 seconds → No improvement

Why no Improvement? Ordering by a non-clustered index isn’t always faster because queries often need extra columns not in the index, forcing expensive lookups. If the index covers the query, performance improves.

 

To test this, I created a new covering index that included both variables in the Proc Means example (State_Province & Quantity). 

proc sql;
    connect using MSSQLSVR /*previously defined LIBNAME*/ as SQLDB;
    EXECUTE(CREATE NONCLUSTERED INDEX COMBO_INDEX
            ON orders_10m_indx (State_Province, Quantity)) BY SQLDB;
    disconnect from SQLDB;
quit;

Results after rerunning Proc Means:

  • Without index: completes in 38.35 seconds
  • With covering index: completes in 6.5 seconds → 6× faster

_______________________________________________________________________________________________________________________

Summary

Indexes in Microsoft SQL Server deliver substantial performance improvements when accessed through SAS/ACCESS, mirroring the benefits seen in native SAS datasets.

  • WHERE clause filtering can be over 100× faster with appropriate indexes.
  • Joins benefit significantly when both tables are indexed on join keys.
  • BY-group processing benefited from covering indexes that included all the columns necessary to deliver results.

For SAS users working with large database tables, collaborating with database administrators to ensure proper indexing can dramatically reduce query times and improve overall efficiency.

_______________________________________________________________________________________________________________________

Resources

 

Contributors
Version history
Last update:
yesterday
Updated by:

sas-innovate-2026-white.png



April 27 – 30 | Gaylord Texan | Grapevine, Texas

Registration is open

Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!

Register now

SAS AI and Machine Learning Courses

The rapid growth of AI technologies is driving an AI skills gap and demand for AI talent. Ready to grow your AI literacy? SAS offers free ways to get started for beginners, business leaders, and analytics professionals of all skill levels. Your future self will thank you.

Get started

Article Tags