
Modernize your SAS Data Estate with SAS Viya

Data is the foundation of analytics. Every SAS environment relies on analytical data stored across diverse locations and formats, including:

 

  • Raw text
  • Spreadsheets and delimited text files (e.g., .csv)
  • SAS datasets (.sas7bdat)
  • Relational and NoSQL databases
  • Other formats (e.g., JSON, HTML)

This article assumes you are planning modernization alongside a migration from SAS 9.4 to SAS Viya. It examines modern options for data formats and cloud storage technologies, with the following considerations:

 

  • SAS datasets (.sas7bdat) are likely central to your current data estate. This proprietary binary format is difficult for non-SAS tools to process.
  • Moving workloads to SAS Viya and the cloud often requires storing SAS datasets in cloud file systems rather than object storage (e.g., S3, ADLS), which can increase costs.

To address these challenges, we explore two modern data management technologies available in SAS Viya: the Parquet file format and SAS SpeedyStore.

 

Article Structure:

 

  1.  Overview and comparison of Parquet and SAS SpeedyStore

  2.  Recommended steps for data modernization

 

Technology Candidates

Two technologies were chosen for consideration, reflecting customer interest and marketplace activity.

 

Parquet

Apache Parquet is a free and open-source, column-oriented file format that offers efficient compression and encoding for improved performance. SAS Viya introduced LIBNAME support for Parquet in release 2021.2.6.

 

The importance of Parquet in SAS environments grew with the release of a SAS Viya LIBNAME engine for DuckDB in 2025.07. DuckDB is an open-source, column-oriented OLAP database that runs within SAS Viya, enabling SAS Viya to consume Parquet data and open table formats such as Apache Iceberg and Delta Lake more effectively.

 

DuckDB can access this data from a file system or from low-cost cloud object storage.

SAS SpeedyStore

SAS SpeedyStore is a modern, cloud-native SAS storage solution built on SingleStore technology. Its universal database design offers row, column and vector tables, with advanced compression, elastic scalability and virtually unlimited capacity using cloud object storage.


SpeedyStore unifies transactional and analytical workloads in a secure, high-performance environment. Pre-integrated with SAS Viya AI and analytics (including the SAS Embedded Process), it enables real-time insights, cost efficiency and accelerated decision-making. SpeedyStore enhances SAS Visual Analytics workloads by offloading SQL processing, and it is the natural successor to SAS Scalable Performance Data Server (SPD Server).

 

Comparison

Here's a brief comparison of these technologies, with the existing .sas7bdat format included as a baseline:

Parquet file format
  • Advantages: Open source and open format; improved compression and performance.
  • Considerations: Likely requires code changes to existing SAS programs; emerging adoption, with limited enterprise-wide deployment.

SAS SpeedyStore
  • Advantages: Improved compression; SQL interface, which is a de facto open standard; high-performance RDBMS closely integrated with SAS through the SAS Embedded Process, with acceleration of SAS Visual Analytics.
  • Considerations: Likely requires code changes to existing SAS programs; licensed SAS product.

.sas7bdat file format
  • Advantages: No code changes; proven; no migration risk.
  • Considerations: Proprietary, so effectively unavailable to non-SAS tooling; cloud-based POSIX file systems (e.g., Lustre, NetApp) tend to be more expensive than object storage (e.g., S3, ADLS).

 

Diversifying the Data Estate

When developing a modernization strategy, it’s important to recognize that a single data format or technology is unlikely to suit every use case. Modernization efforts may not encompass the entire data landscape, and some legacy formats will likely remain after the transition (see adjacent diagram).

 

Picture1.png

 

While most data is currently stored in SAS datasets, future environments will likely incorporate a mix of .sas7bdat files, Parquet files, and SAS SpeedyStore tables.

 

 

Steps to Modernize

 


 

  1. Define objectives for data modernization.

  2. Assess your current SAS data landscape.

  3. Based on this assessment, determine strategies for managing existing and new data.

We’ll examine each step in detail.

 

1. Understand your objectives

Modernization is driven by several key factors:

 

  • Cloud Migration: Moving to the cloud and adopting SAS Viya offers an opportunity to reassess your data landscape and take advantage of cloud-based storage.
  • Data Accessibility: Expanding access to analytic data and reducing reliance on proprietary formats.
  • Cost Optimization: Lowering storage costs, which are often the largest component of SAS platform total cost of ownership (TCO).
  • Innovation: Exploring new technologies that improve processing efficiency, enable modern workflows, and deliver cost savings.

If you are modernizing alongside a transition from SAS 9.4 to SAS Viya, consider:

 

  • Data Formats: .sas7bdat datasets likely form a significant part of your current SAS environment. These binary files are difficult for non-SAS tools to consume.
  • Cloud Storage Implications: Migrating workloads to SAS Viya typically requires storing .sas7bdat files in cloud-based file systems rather than object storage (e.g., S3, ADLS), which generally costs more.
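To put rough numbers on this trade-off, the following minimal Python sketch compares a monthly storage bill with all data on a cloud file system against a tiered layout in which inactive data moves to object storage. It is a back-of-the-envelope model, not a pricing tool: the function name is hypothetical, and the prices are inputs you must supply from your own provider's rates, together with the cold-data share you measure in step 2.

```python
def monthly_storage_cost(total_tb: float, cold_fraction: float,
                         fs_price_per_tb: float,
                         object_price_per_tb: float) -> dict:
    """Compare an all-file-system layout with a tiered layout.

    total_tb            -- total size of the data estate in TB
    cold_fraction       -- share of data (0 to 1) that is inactive and can be tiered
    fs_price_per_tb     -- monthly price per TB on the cloud file system (your rate)
    object_price_per_tb -- monthly price per TB on object storage (your rate)
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    hot_tb = total_tb * (1.0 - cold_fraction)
    cold_tb = total_tb * cold_fraction
    all_file_system = total_tb * fs_price_per_tb
    # Hot data stays on the file system; cold data moves to object storage.
    tiered = hot_tb * fs_price_per_tb + cold_tb * object_price_per_tb
    return {
        "all_file_system": all_file_system,
        "tiered": tiered,
        "monthly_saving": all_file_system - tiered,
    }
```

For example, with purely hypothetical inputs of 100 TB, 60% cold data, 200 per TB-month on the file system and 20 per TB-month on object storage, `monthly_storage_cost(100, 0.6, 200, 20)` compares 100 × 200 = 20,000 against 40 × 200 + 60 × 20 = 9,200. The model deliberately ignores request, retrieval and egress charges, which can matter for data that is recalled often.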

 

2. Understand your SAS data landscape

Before making decisions, assess your current SAS environment by reviewing:

 

  • Storage: Total capacity (TB) and data locations (volumes, folders, directories).
  • Inventory: Number of datasets and their sizes.
  • Usage Patterns: Identify active (recently accessed) vs. inactive datasets.
  • Access Details: For active datasets, determine which SAS users or programs access them.
  • Dataset Feature Dependency: For selected datasets, understand which .sas7bdat features they rely on (labels, formats, etc.) and whether processing is sequential (e.g., FIRST./LAST. BY-group processing).
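CA and ESM, described next, provide this information in depth. For a quick, tool-independent first look at storage and inventory, a minimal Python sketch like the one below can walk a directory tree and total .sas7bdat files by folder. It assumes the data volumes are mounted and readable from where the script runs; the function name is hypothetical, and it reports only sizes and counts, not usage.

```python
from collections import defaultdict
from pathlib import Path


def inventory_sas_datasets(root: str) -> dict:
    """Count and size .sas7bdat files under root, per folder and in total."""
    per_folder = defaultdict(lambda: {"count": 0, "bytes": 0})
    total = {"count": 0, "bytes": 0}
    for path in Path(root).rglob("*"):
        # Compare the suffix case-insensitively so .SAS7BDAT is also counted.
        if path.is_file() and path.suffix.lower() == ".sas7bdat":
            size = path.stat().st_size
            folder = per_folder[str(path.parent)]
            folder["count"] += 1
            folder["bytes"] += size
            total["count"] += 1
            total["bytes"] += size
    return {"folders": dict(per_folder), "total": total}
```

Calling it on a data mount point (for example, a hypothetical `/sasdata`) returns a per-folder breakdown plus an overall total that you can compare against the CA Inventory report.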

Tools to help gather this information include:

  • SAS Content Assessment (CA)
    A suite of applications for analyzing your SAS 9.4 deployment and data landscape. Available at support.sas.com.
  • SAS Enterprise Session Monitor (ESM)
    A licensed product that scans SAS log files for capacity planning and provides insights into data usage and consumption.

The summary below presents key methods for gathering each insight using these two products.

Hot/cold data
  • CA: Inventory report or DataMart. See example below:
    [Screenshot: PaulGittins_0-1765098839435.png]
    Note that many IT environments disable timestamp maintenance (e.g., last-access times) to minimize file system overhead, making timestamps unreliable for hot/cold analysis.
  • ESM: Event monitoring can detect active datasets even when file system or OS-level timestamps are disabled. A separate article explains how to capture data and dataset access details, including all actions such as read and write.
    [Screenshot: PaulGittins_1-1765098839437.png]

Total datasets and size
  • CA: Available from the Inventory report or DataMart.
  • ESM: Captures detailed data access events. Integrate these with output from the sas-data-mon utility to cover the entire file system, including dormant non-SAS files. Overlaying ESM activity data on this comprehensive view enables precise hot/cold analysis.

SAS programs accessing individual datasets
  • ESM: Captures details of user sessions and associated events, and data-related events can be retrieved at the session level. For batch sessions, the .sas program name is recorded, enabling direct tracking of usage.
    For interactive sessions, identifying the program requires additional steps: matching process IDs (PIDs) to log files and extracting the program or Enterprise Guide project name from the log. This process can be automated with code; an example is provided in the SAS Support Communities blog "Accelerating SAS9 to Viya Migration with Log Intelligence". Linking usage back to specific code or Enterprise Guide projects falls outside the scope of Eco-Diagnostics.

Use of specific .sas7bdat capabilities
  • CA: Profile Content DataMart, all_datasets. See excerpt below:
    [Screenshot: PaulGittins_2-1765098839440.png]
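Because file timestamps may be unreliable, the hot/cold split is best computed by overlaying access evidence (such as ESM events) on a complete inventory. The minimal Python sketch below shows only that overlay logic. It assumes you have already reduced your inventory to a `{path: size_bytes}` mapping and your access data to a `{path: last_access_epoch}` mapping; exporting those from CA, ESM or sas-data-mon is not shown. The names, the 90-day default cutoff and the rule that datasets with no recorded access count as cold are illustrative assumptions.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class DatasetRecord:
    path: str
    size_bytes: int
    last_access: Optional[float]  # epoch seconds, or None if no access was recorded


def merge_access(inventory: Dict[str, int],
                 last_access: Dict[str, float]) -> List[DatasetRecord]:
    """Overlay access evidence on the full inventory (every file is kept)."""
    return [DatasetRecord(p, size, last_access.get(p)) for p, size in inventory.items()]


def classify_hot_cold(records: List[DatasetRecord], cutoff_days: int = 90,
                      now: Optional[float] = None
                      ) -> Tuple[List[DatasetRecord], List[DatasetRecord], float]:
    """Split records into hot and cold; also return the hot share of total bytes."""
    now = time.time() if now is None else now
    cutoff = now - cutoff_days * 86400
    hot = [r for r in records if r.last_access is not None and r.last_access >= cutoff]
    cold = [r for r in records if r.last_access is None or r.last_access < cutoff]
    total_bytes = sum(r.size_bytes for r in records)
    hot_bytes = sum(r.size_bytes for r in hot)
    hot_share = hot_bytes / total_bytes if total_bytes else 0.0
    return hot, cold, hot_share
```

Measuring the hot share by bytes rather than by dataset count matters for cost planning, since a handful of large, inactive datasets can dominate storage spend.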

 

At a minimum, you need a clear understanding of the active-to-inactive data ratio, the total size of your data estate, and which SAS programs interact with each category.

 

3. Determine strategies for existing and new data

 

With a clearer understanding of your current data landscape, you are better positioned to make informed modernization decisions. At a high level, two key considerations emerge:

Picture5.png

 

 

1.  Modernize Existing Data

  • Should the underlying technology be changed?
  • What are the costs of maintaining the status quo?
  • What benefits would conversion deliver?
  • If conversion is pursued, what is the effort required—including retesting?
  • Is a hybrid approach viable, such as modernizing only high-value subsets?

2.  Managing New Data

  • What impact will the change have on current working practices?
  • What benefits will new storage technologies provide?
  • Where should implementation begin, for example with a pilot group or early adopters?

 

Impact of Code Changes

Converting datasets to new formats can require significant code refactoring. For each candidate dataset, identify all .sas programs that reference it to assess the scope of changes. Key considerations include:

 

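  • SQL Usage: If your code primarily uses PROC SQL, the impact is likely to be minimal. However, review any platform-specific SQL syntax that may need adjustment.
  • DATA Step Logic: Complex operations (e.g., FIRST./LAST. processing, which depends on the row order within BY groups) can be more challenging to migrate.
  • Sort Order: SQL engines and databases do not guarantee row order without an explicit ORDER BY, so ensure any sort dependencies are preserved.
  • Formats, Labels, and Missing Values: Validate how SAS formats, variable labels, and special missing values (.A through .Z and ._) would be handled in the target technology.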
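To scope the refactoring for one candidate dataset, a rough first pass is to search program source for the dataset's two-level name and flag the features listed above. The Python sketch below does this with regular expressions. It is a hypothetical helper, not a SAS parser: it will miss references built with macro variables or a different libref, and it may report matches inside comments, so treat its output as a starting list for manual review.

```python
import re
from pathlib import Path

# Heuristic patterns for migration-sensitive features; plain regexes over
# source text, so expect some false positives and negatives.
FEATURE_PATTERNS = {
    "proc_sql": re.compile(r"\bproc\s+sql\b", re.IGNORECASE),
    "first_last": re.compile(r"\b(first|last)\.\w+", re.IGNORECASE),
    "sort_or_by": re.compile(r"\bproc\s+sort\b|\bby\s+\w+", re.IGNORECASE),
    "formats_labels": re.compile(r"\b(format|label)\s+\w+", re.IGNORECASE),
}


def scan_programs(root: str, dataset: str) -> dict:
    """Find .sas programs under root that mention dataset (e.g. 'sales.orders')
    and list which migration-sensitive features each program uses."""
    # Word boundaries stop 'sales.orders' from matching 'sales.orders2'.
    ref = re.compile(r"\b" + re.escape(dataset) + r"\b", re.IGNORECASE)
    results = {}
    for path in sorted(Path(root).rglob("*.sas")):
        text = path.read_text(errors="ignore")
        if ref.search(text):
            results[str(path)] = sorted(
                name for name, pattern in FEATURE_PATTERNS.items()
                if pattern.search(text)
            )
    return results
```

Programs flagged only with `proc_sql` are usually the cheapest to move; those flagged with `first_last` or `sort_or_by` are the ones to test most carefully.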

 

General Considerations

When implementing a policy for adopting new technologies, address the following:

 

  • Usage Scope: Should the technology be mandated for all use cases, or can users choose among approved options?
  • Data Segmentation: Does it make sense to classify datasets (e.g., by size, access frequency, or other attributes) and recommend specific technologies for each category?
  • Training Requirements: What level of training will users need to effectively adopt the new technology?
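If you do segment datasets, writing the rules down explicitly helps apply them consistently. The following Python sketch assigns each dataset a (size, activity) segment and counts datasets per segment. The thresholds and labels are hypothetical placeholders, and the access counts are assumed to come from your usage analysis; which technology to recommend for each segment remains a policy decision for your organization.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

GIB = 1024 ** 3


def size_class(size_bytes: int) -> str:
    # Hypothetical cut-offs: under 1 GiB is small, under 100 GiB is medium.
    if size_bytes < 1 * GIB:
        return "small"
    if size_bytes < 100 * GIB:
        return "medium"
    return "large"


def activity_class(accesses_per_month: int) -> str:
    # Hypothetical cut-offs based on monthly access counts.
    if accesses_per_month == 0:
        return "dormant"
    if accesses_per_month < 10:
        return "occasional"
    return "frequent"


def segment(datasets: Iterable[Tuple[str, int, int]]
            ) -> Tuple[Dict[str, Tuple[str, str]], Counter]:
    """datasets: (name, size_bytes, accesses_per_month) tuples.

    Returns each dataset's (size, activity) segment and a count per segment."""
    assignments = {name: (size_class(size), activity_class(hits))
                   for name, size, hits in datasets}
    return assignments, Counter(assignments.values())
```

The segment counts give a quick sense of how many datasets a policy for each segment would affect.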

 

Considerations for staying with .sas7bdat

While new data formats and storage technologies are emerging, the .sas7bdat file remains the default for SAS and is still widely used. Despite being over 30 years old, it continues to provide flexibility and robust functionality.


To reduce storage costs for .sas7bdat datasets, consider the following strategies:

 

  • Compression: Use built-in SAS dataset compression (the COMPRESS= data set option or system option) or external methods, such as saving datasets in compressed folders.
  • Storage Tiering: Identify infrequently accessed datasets and move them to lower-cost storage, such as cloud object storage. Ensure you have a retrieval mechanism, as SAS does not provide this natively (though some cloud solutions do).
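Since SAS does not provide a native retrieval mechanism for tiered datasets, any tiering process needs one. The minimal Python sketch below illustrates the idea: it gzip-compresses a cold .sas7bdat file into an archive location, records the move in a JSON manifest, and can restore the file on request. It assumes the archive location is a directory on lower-cost storage (for example, a mounted object storage bucket); writing directly to S3 or ADLS would require the provider's SDK and is not shown. Function names and the manifest layout are hypothetical, and an archived dataset must be restored before SAS can read it.

```python
import gzip
import json
import shutil
from pathlib import Path


def _load_manifest(manifest_path: str) -> dict:
    p = Path(manifest_path)
    return json.loads(p.read_text()) if p.exists() else {}


def _save_manifest(manifest_path: str, manifest: dict) -> None:
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))


def archive_dataset(src: str, data_root: str, archive_root: str,
                    manifest_path: str) -> str:
    """Compress src into archive_root (mirroring its folder under data_root),
    record original -> archived path in the manifest, then remove src."""
    src_path = Path(src)
    dest = Path(archive_root) / src_path.relative_to(data_root)
    dest = dest.with_name(dest.name + ".gz")
    dest.parent.mkdir(parents=True, exist_ok=True)
    with src_path.open("rb") as fin, gzip.open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    manifest = _load_manifest(manifest_path)
    manifest[str(src_path)] = str(dest)
    _save_manifest(manifest_path, manifest)
    src_path.unlink()  # free the expensive tier only after the copy succeeded
    return str(dest)


def restore_dataset(original: str, manifest_path: str) -> str:
    """Decompress an archived dataset back to its original location."""
    manifest = _load_manifest(manifest_path)
    archived = Path(manifest[original])  # KeyError if it was never archived
    Path(original).parent.mkdir(parents=True, exist_ok=True)
    with gzip.open(archived, "rb") as fin, open(original, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    del manifest[original]
    _save_manifest(manifest_path, manifest)
    archived.unlink()
    return original
```

Mirroring the original folder structure under the archive root avoids name collisions between datasets with the same name in different libraries. In practice you would feed this the cold list from your hot/cold analysis and keep the manifest somewhere durable.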

 

Final Thoughts

Modernizing your data environment can significantly transform your SAS landscape, especially when paired with a migration to SAS Viya, typically deployed in the cloud.

 

Over the past decade, data ecosystems have evolved rapidly: Hadoop once dominated but has since given way to data lakes, new file formats, and open table technologies. SQL remains a critical standard, extending beyond traditional RDBMS to certain file-based systems. Change is inevitable; the challenge lies in adopting technologies and strategies that endure while avoiding short-lived trends.

 

With over 50 years of expertise, SAS recognizes the importance of consistent, accurate data. I hope this article has provided valuable insight into data modernization. For further guidance, connect with your SAS team for expert advice and practical experience.

 

Acknowledgment

Thanks to the following colleagues for their contributions and feedback on this article: James Ochiai-Brown, George Beevers, Neil Griffin and Mike Johansson.

 
