Data is the foundation of analytics. Every SAS environment relies on analytical data stored across diverse locations and formats, including:
Raw text
Spreadsheets (.csv)
SAS datasets (.sas7bdat)
Relational and NoSQL databases
Other formats (e.g., JSON, HTML)
This article assumes you are planning modernization alongside a migration from SAS 9.4 to SAS Viya. It examines modern options for data formats and cloud storage technologies, with the following considerations:
SAS datasets (.sas7bdat) are likely central to your current data estate. This proprietary binary format is difficult for non-SAS tools to process.
Moving workloads to SAS Viya and the cloud often requires storing SAS datasets in cloud file systems rather than object storage (e.g., S3, ADLS), which can increase costs.
To address these challenges, we explore two modern data management technologies available in SAS Viya: the Parquet file format and SAS SpeedyStore.
Article Structure:
Overview and comparison of Parquet and SAS SpeedyStore
Recommended steps for data modernization
Technology Candidates
Two technologies have been selected for consideration, reflecting customer interest and marketplace activity.
Parquet
Apache Parquet is a free and open-source column-oriented file format offering efficient compression and encoding for improved performance. SAS Viya introduced Libname support for Parquet in 2021.2.6.
The importance of Parquet in SAS environments grew with the release of a SAS Viya Libname for DuckDB in 2025.07. DuckDB is an open-source, column-oriented OLAP database that runs in Viya, enabling it to better consume Parquet data and open table formats such as Iceberg and Delta Lake.
DuckDB can access this data from a file system or from low-cost cloud object storage.
SAS SpeedyStore
SAS SpeedyStore is a modern cloud-native SAS storage solution built on SingleStore technology. Its universal database design offers row, column and vector tables, with advanced compression, elastic scalability and virtually unlimited capacity using cloud object storage.
SpeedyStore unifies transactional and analytical workloads in a secure, high-performance environment. Pre-integrated with SAS Viya AI and analytics (including SAS Embedded Process), it enables real-time insights, cost efficiency and accelerated decision-making. SpeedyStore enhances SAS Visual Analytics workloads by offloading SQL processing and is the natural successor to SAS Scalable Performance Data (SPD) Server.
Comparison
Here's a brief comparison of these technologies:
| Technology | Advantages | Considerations |
| --- | --- | --- |
| Parquet file format | Open source and open format; improved compression and performance | Likely requires code changes to existing SAS programs; emerging adoption, with limited enterprise-wide deployment |
| SAS SpeedyStore | Improved compression; SQL interface is a de facto open standard; high-performance RDBMS with close integration to SAS through the SAS Embedded Process and acceleration of SAS Visual Analytics | Likely requires code changes to existing SAS programs; licensed SAS product |
| .sas7bdat file format | No code changes; proven; no risk | Proprietary and effectively unavailable to non-SAS tooling; cloud-based POSIX file systems (e.g., Lustre, NetApp) tend to be more expensive than object storage (S3, ADLS) |
Diversifying the Data Estate
When developing a modernization strategy, it is important to recognize that a single data format or technology is unlikely to suit every use case. Modernization efforts may not encompass the entire data landscape, and some legacy formats will likely remain post-transition (see adjacent diagram).
While most data is currently stored in SAS datasets, future environments will likely incorporate a mix of .sas7bdat files, Parquet files, and SAS SpeedyStore tables.
Steps to Modernize
The recommended steps for modernization are:
Define objectives for data modernization.
Assess your current SAS data landscape.
Based on this assessment, determine strategies for managing existing and new data.
We’ll examine each step in detail.
1. Understand your objectives
Modernization is driven by several key factors:
Cloud Migration: Moving to the cloud and adopting SAS Viya offers an opportunity to reassess your data landscape and leverage cloud-based storage.
Data Accessibility: Expanding access to analytic data and reducing reliance on proprietary formats.
Cost Optimization: Lowering storage costs, often the largest component of SAS platform TCO.
Innovation: Exploring new technologies that improve processing efficiency, enable modern workflows, and deliver cost savings.
If you are modernizing alongside a transition from SAS 9.4 to SAS Viya, consider:
Data Formats: .sas7bdat datasets likely form a significant part of your current SAS environment. These binary files are difficult for non-SAS tools to consume.
Cloud Storage Implications: Migrating workloads to SAS Viya typically requires storing .sas7bdat files in cloud-based file systems rather than object storage (e.g., S3, ADLS). This approach is more costly to implement.
2. Understand your SAS data landscape
Before making decisions, assess your current SAS environment by reviewing:
Storage: Total capacity (TB) and data locations (volumes, folders, directories).
Inventory: Number of datasets and their sizes.
Usage Patterns: Identify active (recently accessed) vs. inactive datasets.
Access Details: For active datasets, determine which SAS users or programs access them.
Dataset Feature Dependency: For selected datasets, understand which .sas7bdat features are used (labels, formats, etc.) and whether processing depends on row order (e.g., FIRST./LAST. BY-group processing).
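For a rough first pass at the inventory and usage-pattern items above, a short script can stand in until fuller tooling is in place. The following is a minimal sketch in Python and is not part of any SAS product. It walks a directory tree, records the size of each .sas7bdat file, and labels it hot or cold from its modification time. The function names, the 90-day threshold, and the use of modification time as a proxy for activity are all illustrative assumptions, and filesystem timestamps may be unreliable where timestamp maintenance is disabled.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class DatasetInfo:
    path: str
    size_bytes: int
    days_since_modified: float
    hot: bool


def inventory_sas_datasets(root, hot_days=90, now=None):
    """Walk `root` and describe every .sas7bdat file found.

    hot_days is a hypothetical threshold: files modified within that many
    days are labelled hot, everything else cold.
    """
    now = time.time() if now is None else now
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Case-insensitive match on the SAS dataset extension.
            if not name.lower().endswith(".sas7bdat"):
                continue
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            age_days = (now - st.st_mtime) / 86400.0
            results.append(
                DatasetInfo(full, st.st_size, age_days, age_days <= hot_days)
            )
    return results


def summarize(datasets):
    """Return the count and total bytes of hot and cold datasets."""
    summary = {"hot": {"count": 0, "bytes": 0}, "cold": {"count": 0, "bytes": 0}}
    for d in datasets:
        bucket = summary["hot" if d.hot else "cold"]
        bucket["count"] += 1
        bucket["bytes"] += d.size_bytes
    return summary
```

Calling `summarize(inventory_sas_datasets("/path/to/sasdata"))` (the path is a placeholder) gives the dataset count and total size for each category, which is enough to estimate an active-to-inactive ratio before a more rigorous assessment.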
Tools to help gather this information include:
SAS Content Assessment (CA): A suite of applications for analyzing your SAS 9.4 deployment and data landscape. Available at support.sas.com.
SAS Enterprise Session Monitor (ESM): A licensed product that scans SAS log files for capacity planning and provides insights into data usage and consumption.
The table below presents key methods for gathering insights using these two products.
Insight
Tools
Hot/cold data
CA Inventory report or DataMart (see example below). Note that many IT environments disable timestamp maintenance to minimize file system overhead, which makes timestamps unreliable for hot/cold analysis.
ESM event monitoring can detect active datasets even when filesystem or OS level timestamps are disabled. This article explains how to capture data and dataset access details, including all actions such as read and write.
Total datasets & size
Available from the CA Inventory report or DataMart.
ESM captures detailed data access events, which should be integrated with output from the sas-data-mon utility to include the entire filesystem, including dormant non-SAS files. Overlaying ESM activity data on this comprehensive view enables precise hot-to-cold analysis.
SAS programs accessing individual datasets
ESM captures details of user sessions and associated events. Data-related events can be retrieved at the session level. For batch sessions, the .sas program name is recorded, enabling direct tracking of usage. For interactive sessions, identifying the program name requires additional steps: matching process IDs (PIDs) to log files and extracting the program or Enterprise Guide project name from the log. This process can be automated using code; an example is provided in the SAS Support Communities blog post "Accelerating SAS9 to Viya Migration with Log Intelligence". Linking usage back to specific code or Enterprise Guide projects falls outside the scope of Eco-Diagnostics.
Use of specific .sas7bdat capabilities
CA Profile Content DataMart: all_datasets. See excerpt below:
At a minimum, you need a clear understanding of the active-to-inactive data ratio, the total size of your data estate, and which SAS programs interact with each category.
3. Assess your modernization options
With a clearer understanding of your current data landscape, you are better positioned to make informed modernization decisions. At a high level, two key considerations emerge:
1. Modernize Existing Data
Should the underlying technology be changed?
What are the costs of maintaining the status quo?
What benefits would conversion deliver?
If conversion is pursued, what is the effort required—including retesting?
Is a hybrid approach viable, such as modernizing only high-value subsets?
2. Manage New Data
What impact will adopting new storage technologies have on current practices?
What benefits will new storage technologies provide?
Where should implementation begin—perhaps with a pilot group or early adopters?
Impact of Code Changes
Converting datasets to new formats can require significant refactoring. For each candidate dataset, identify all .sas programs that reference it to assess the scope of changes. Key considerations include:
SQL Usage: If your code primarily uses PROC SQL, impact is likely minimal. However, review any platform-specific SQL syntax that may need adjustment.
DATA Step Logic: Complex operations (e.g., FIRST./LAST. processing) can be more challenging to migrate.
Sort Order: Ensure sort dependencies are preserved.
Formats, Labels, and Missing Values: Validate how these would be handled.
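The scoping exercise described above can be partly automated. The sketch below, in Python, scans the text of a SAS program for references to a given dataset and flags constructs that tend to affect migration effort. It is a text-matching heuristic under stated assumptions, not a SAS parser: it does not resolve macro variables, ignores comments versus code, and the function name and returned keys are illustrative only.

```python
import re

# FIRST.var / LAST.var automatic variables used in BY-group processing.
FIRST_LAST = re.compile(r"\b(?:first|last)\.\w+", re.IGNORECASE)
PROC_SQL = re.compile(r"\bproc\s+sql\b", re.IGNORECASE)
PROC_SORT = re.compile(r"\bproc\s+sort\b", re.IGNORECASE)


def assess_program(source, libref, dataset):
    """Report whether SAS source text references libref.dataset and which
    migration-sensitive constructs appear anywhere in the program.

    Heuristic only: plain pattern matching on the program text.
    """
    ref = re.compile(
        r"\b" + re.escape(libref) + r"\." + re.escape(dataset) + r"\b",
        re.IGNORECASE,
    )
    return {
        "references_dataset": bool(ref.search(source)),
        "uses_proc_sql": bool(PROC_SQL.search(source)),
        "uses_proc_sort": bool(PROC_SORT.search(source)),
        "uses_first_last": bool(FIRST_LAST.search(source)),
    }


# Hypothetical program text used purely for illustration.
sample = """
proc sort data=sales.orders out=work.sorted; by customer_id; run;
data work.latest;
  set work.sorted;
  by customer_id;
  if last.customer_id;
run;
"""

report = assess_program(sample, "sales", "orders")
```

Running this over every .sas file in a code repository gives a first list of programs to review for each candidate dataset. Programs flagged for FIRST./LAST. or sort usage are the ones most likely to need closer inspection and retesting.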
General Considerations
When implementing a policy for adopting new technologies, address the following:
Usage Scope: Should the technology be mandated for all use cases, or can users choose among approved options?
Data Segmentation: Does it make sense to classify datasets (e.g., by size, access frequency, or other attributes) and recommend specific technologies for each category?
Training Requirements: What level of training will users need to effectively adopt the new technology?
Considerations for staying with .sas7bdat
While new data formats and storage technologies are emerging, the .sas7bdat file remains the default for SAS and is still widely used. Despite being over 30 years old, it continues to provide flexibility and robust functionality.
To reduce storage costs for .sas7bdat datasets, consider the following strategies:
Compression: Use built-in SAS dataset compression or external methods, including saving datasets in compressed folders.
Storage Tiering: Identify infrequently accessed datasets and move them to lower-cost storage, such as cloud object storage. Ensure you have a retrieval mechanism, as SAS does not provide this natively (though some cloud solutions do).
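The combined effect of compression and tiering can be estimated with simple arithmetic. The sketch below, in Python, compares a baseline in which all data sits uncompressed on a file system with a scenario in which data is compressed and the cold portion moves to object storage. It is an illustrative model only: every price and ratio is supplied by the caller, none are real vendor prices, and it assumes the same compression ratio applies to hot and cold data. It does not account for retrieval, egress, or request charges.

```python
def monthly_storage_cost(total_tb, cold_fraction, file_price_per_tb,
                         object_price_per_tb, compression_ratio=1.0):
    """Estimate monthly storage cost before and after tiering.

    compression_ratio is original size divided by compressed size
    (1.0 means no compression). All inputs are hypothetical values
    chosen by the caller.
    """
    if not 0.0 <= cold_fraction <= 1.0:
        raise ValueError("cold_fraction must be between 0 and 1")
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")

    stored_tb = total_tb / compression_ratio  # size after compression
    cold_tb = stored_tb * cold_fraction       # moved to object storage
    hot_tb = stored_tb - cold_tb              # stays on the file system

    return {
        "baseline": total_tb * file_price_per_tb,
        "tiered": hot_tb * file_price_per_tb + cold_tb * object_price_per_tb,
    }


# Made-up example: 100 TB, 75% cold, 2:1 compression,
# 100 per TB on the file system and 20 per TB on object storage.
example = monthly_storage_cost(100, 0.75, 100.0, 20.0, compression_ratio=2.0)
```

With these made-up inputs, the baseline comes to 10,000 per month and the tiered scenario to 2,000 per month. Real figures depend on your measured hot/cold ratio, achievable compression, and your provider's pricing.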
To learn more about leveraging data storage tiering and compression with open file formats in SAS Viya, try out the SAS Viya Data Storage Cost Optimization Calculator.
Final Thoughts
Modernizing your data environment can significantly transform your SAS landscape, especially when paired with a migration to SAS Viya, which is typically deployed in the cloud.
Over the past decade, data ecosystems have evolved rapidly—Hadoop once dominated, but has since given way to data lakes, new file formats, and open table technologies. SQL remains a critical standard, extending beyond traditional RDBMS to certain file-based systems. Change is inevitable; the challenge lies in adopting technologies and strategies that endure while avoiding short-lived trends.
With over 50 years of expertise, SAS recognizes the importance of consistent, accurate data. I hope this article has provided valuable insight into data modernization. For further guidance, connect with your SAS team for expert advice and practical experience.
Acknowledgment
Thanks to the following colleagues for their contributions and feedback on this article: James Ochiai-Brown, George Beevers, Neil Griffin and Mike Johansson.