Architecting SAS Viya on AWS with FSx for ONTAP: A Practitioner’s Guide


Managed shared storage in the cloud has evolved well beyond simple file shares. With Amazon FSx for NetApp ONTAP, you can bring enterprise ONTAP data services such as snapshots, replication, cloning, QoS, and multi‑protocol access into AWS while keeping a fully POSIX‑compliant file system with rich extended ACL support for fine‑grained security.

However, getting the architecture right from the start is critical. Decisions around throughput sizing, tenancy design, FlexGroup usage, and HA pair scaling have a direct impact on performance, operational complexity, and long-term cost efficiency.

Teams deploying FSx for ONTAP often face the same challenges when scaling shared storage. In this post, we share practical lessons and show how to build POSIX-compliant storage with extended ACLs, giving platforms like SAS Viya on Amazon Web Services a reliable data foundation.

FSx for ONTAP Architecture in One Picture

To begin, let’s walk through the ONTAP architecture illustrated in the diagram below.

[Figure: FSx for ONTAP architecture diagram (Ontap-Export.drawio.png)]

At a high level, FSx for ONTAP sits inside your VPC as a managed ONTAP cluster, connecting your compute layer at the top to a highly available storage layer at the bottom. You get the power of ONTAP without running the storage infrastructure yourself.

Data Sources (Top Layer)

This is where workloads actually consume storage:

Linux and Windows EC2 instances
Amazon EKS pods and other container platforms

They mount shared FlexVol or FlexGroup volumes over familiar protocols (NFS, SMB, and iSCSI), so FSx looks like a traditional NAS or block target. You can lift‑and‑shift legacy apps or back cloud‑native workloads with the same platform. ONTAP provides consistent file locking, POSIX semantics, and extended ACL enforcement so multiple clients can safely access the same datasets.

FSx File System (Control Plane)

This is the ONTAP brain:

Storage Virtual Machines (SVMs) give you isolated tenants that host volumes, exports/shares, and protocol endpoints.
Management and networking components handle snapshots, replication, tiering, authentication, and ENI connectivity into your VPC.

Here is where you define tenancy boundaries (Dev/Test/Prod, teams, BUs), apply QoS, manage lifecycle operations, and enforce security with POSIX permissions and extended ACLs. Multiple SVMs can live inside a single file system, so you segment environments without fragmenting infrastructure.

High Availability (HA Pair)

At the bottom, the HA pair is your resilience backbone:

Two storage nodes run across Availability Zones with shared, synchronously replicated storage.
Writes are mirrored before acknowledgment, enabling near zero‑data‑loss failover.

If one node fails, its partner takes over access to the same storage with minimal disruption; client mounts typically stay in place. The HA pair defines both the performance envelope and failure domain—capacity and throughput scale by adding HA pairs, while SVMs and volumes above stay logically consistent.

Designing FSx for ONTAP Like You Mean It

Amazon FSx for NetApp ONTAP gives you mature ONTAP data services with cloud elasticity, but it’s not a “click next, next, finish” service. The way you size throughput, lay out file systems and SVMs, and plan HA pairs will decide whether your environment feels boring and predictable or noisy and expensive.

What follows is a set of opinionated, field‑tested recommendations combined with AWS and NetApp best practices.

Start With Throughput, Not Just Capacity

FSx for ONTAP performance is really about three things: latency, throughput, and IOPS, with throughput being easy to overlook. Total provisioned throughput is shared across all volumes inside a file system; there’s no per‑volume carve‑out.

Before you size anything, get clear on workload expectations:

Shared‑tolerant workloads (batch ETL, analytics) can live with a shared throughput pool; you can often pick a lower throughput tier and save real money.

Latency‑sensitive workloads (user‑facing apps, critical services) may need QoS or even their own SVM/file system boundary.

Pricing is driven by provisioned capacity + provisioned throughput, so a few extra GB/s of throughput can add thousands per month at scale. Getting throughput roughly right up front and then refining with CloudWatch metrics has a much bigger TCO impact than obsessing over IOPS on day one.

One File System, Many Volumes via SVMs

AWS and NetApp strongly recommend treating one FSx for ONTAP file system as a multi-tenant platform rather than creating one file system per workload. ONTAP is designed for multi-volume, multi-tenant operation, and FSx preserves that model.

At the heart of this is the Storage Virtual Machine (SVM):

A logical ONTAP container hosting volumes, exports, and protocol endpoints (NFS, SMB, iSCSI).
Provides isolation of configuration, security, and data for different teams or environments.
Scales with workloads; a single file system can host multiple SVMs for Dev/Test/Prod or organizational boundaries.

Why “single FS + SVMs + multiple volumes” is considered best practice:

Resource efficiency: You pool throughput and capacity instead of fragmenting into small, underutilized file systems.
Operational simplicity: Fewer ONTAP control planes means simpler backups, snapshots, SnapMirror/SnapVault, and upgrades.
Simpler networking: Fewer ENIs and mount targets in your VPC to manage and secure.
Consistent governance: Easier to apply QoS, quotas, and security policies at SVM/volume levels across workloads.

In short: use SVMs for tenancy; use volumes (FlexVol/FlexGroup) for workloads.

User-Provisioned Throughput and FlexGroup Volumes

When you select user-provisioned throughput, FSx for ONTAP automatically configures volumes as FlexGroup volumes. This is expected behavior and actually aligns very well with modern analytics and scale-out workloads.

FlexGroup volumes give you:

Parallelism: Data is striped across multiple constituents, letting many clients read/write concurrently.
High-throughput capacity: Well suited for workloads that need large aggregate bandwidth, like analytics or big ETL.
Distributed metadata: Helps avoid “hot directories” and metadata bottlenecks in wide, deep file trees.

In other words, “user-provisioned throughput + FlexGroup” should be your default for large shared data volumes, not a niche option.

HA Pairs: Failure Domains vs. Performance Headroom

In FSx for ONTAP, the HA pair is the fundamental deployment unit: two storage nodes (active/passive) with shared NVMe or hybrid storage, replicating synchronously across AZs.

Each HA pair defines both:

A failure domain: If one node fails, its partner takes over with minimal disruption and no data loss.
A performance and capacity envelope: Throughput, IOPS, and capacity scale with the number of HA pairs.

Trade-offs that AWS and NetApp emphasize:

Fewer HA pairs

Lower operational complexity and fewer objects to manage.
Fewer failure boundaries to understand.
But less overall performance headroom per environment.

More HA pairs

Higher aggregate throughput and scalability.
Better workload and fault-domain isolation when you need strict SLAs.
But increased operational overhead (monitoring, automation, networking).

A widely recommended best practice:

Start with the minimum number of HA pairs that meet your throughput, capacity, and SLA requirements, and add HA pairs only when performance or fault-domain isolation needs justify it.
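The "minimum number of HA pairs" rule comes down to a ceiling division. Here is a minimal sketch of that arithmetic. The per-pair throughput figure is a placeholder assumption, not a quoted AWS limit, so substitute the documented ceiling for your deployment type and region before relying on the result.

```python
import math


def min_ha_pairs(required_mbps: float, per_pair_mbps: float) -> int:
    """Smallest number of HA pairs whose combined throughput covers the requirement."""
    if required_mbps <= 0 or per_pair_mbps <= 0:
        raise ValueError("throughput values must be positive")
    return math.ceil(required_mbps / per_pair_mbps)


# ASSUMPTION: hypothetical per-pair throughput ceiling in MBps.
# Look up the real limit for your configuration before using this number.
ASSUMED_PER_PAIR_MBPS = 6144

# A 24 GB/s requirement expressed as 24,000 MBps.
pairs = min_ha_pairs(24_000, ASSUMED_PER_PAIR_MBPS)
print(pairs)  # 4 under the assumed per-pair ceiling
```

This only covers the throughput side. Capacity limits and fault-domain isolation requirements can push the count higher than the result here.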

Real-World Example: SAS Viya Customer Use Case

Consider a SAS Viya 4 deployment needing 30TB SSD total across 3 × 10TB volumes, each requiring 8GB/s throughput for business data analytics.

Recommended approach:

1. Create one FSx for ONTAP file system:

- 30TB SSD capacity

- 24GB/s total throughput (shared pool)

2. Inside that file system, create 3 × 10TB volumes.

3. Apply ONTAP QoS policies:

- Max 8GB/s throughput per volume

4. Deploy the minimum number of HA pairs needed to deliver 24GB/s of throughput.
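Before provisioning anything, it can help to sanity-check a plan like this on paper: do the volumes fit in the SSD capacity, and do the per-volume QoS ceilings add up to more than the shared throughput pool? The sketch below is one way to express that check. The volume names and the `VolumePlan` structure are hypothetical illustrations, not ONTAP or AWS objects.

```python
from dataclasses import dataclass


@dataclass
class VolumePlan:
    name: str          # hypothetical volume name, for illustration only
    size_tb: int       # planned volume size in TB
    qos_max_gbps: int  # planned QoS throughput ceiling in GB/s


def check_plan(fs_capacity_tb: int, fs_throughput_gbps: int, volumes: list) -> list:
    """Return a list of warnings; an empty list means the plan fits the file system."""
    warnings = []
    total_size = sum(v.size_tb for v in volumes)
    total_ceiling = sum(v.qos_max_gbps for v in volumes)
    if total_size > fs_capacity_tb:
        warnings.append(
            f"volumes need {total_size}TB but the file system has {fs_capacity_tb}TB"
        )
    if total_ceiling > fs_throughput_gbps:
        warnings.append(
            f"QoS ceilings sum to {total_ceiling}GB/s but the pool is {fs_throughput_gbps}GB/s"
        )
    return warnings


# The SAS Viya example: 3 x 10TB volumes, each capped at 8GB/s.
volumes = [VolumePlan(f"viya_data_{i}", size_tb=10, qos_max_gbps=8) for i in range(1, 4)]
warnings = check_plan(fs_capacity_tb=30, fs_throughput_gbps=24, volumes=volumes)
print(warnings or "plan fits")
```

If the ceilings sum to more than the pool, that is not necessarily wrong, since oversubscription can be a deliberate choice. It does mean the volumes cannot all reach their ceilings at the same time.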

Cost Reality Check: Single FS vs. Three Separate File Systems

Let's compare the two approaches for our SAS Viya use case (30TB total, 3×10TB volumes @ 8GB/s each) using US East (N. Virginia) pricing. The figures exclude volume quotas, provisioned IOPS, and backups.

Approach	SSD Storage	Throughput	Monthly Total
1× 30TB FS @ 24GB/s	$0.04375/GB-mo × 30×1024 = $1,344	$0.72/MBps-mo × 24×1000 = $17,280	$18,624
3× 10TB FS @ 8GB/s each	$0.04375/GB-mo × 10×1024 × 3 = $1,344	$0.72/MBps-mo × 8×1000 × 3 = $17,280	$18,624

Key takeaways:

Same cost: Both approaches land at identical monthly spend
Single FS wins on operations: One system to monitor, back up, and patch
Throughput dominates: 93% of cost is throughput capacity, not storage
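The arithmetic behind the table can be reproduced in a few lines. This sketch uses only the two rates quoted in the table above and the same unit conventions (1TB = 1024GB for storage, 1GB/s = 1000MBps for throughput). Treat the rates as the article's illustrative figures and check current pricing before budgeting.

```python
# Rates as quoted in the comparison table (US East, N. Virginia).
SSD_RATE_PER_GB_MONTH = 0.04375
THROUGHPUT_RATE_PER_MBPS_MONTH = 0.72


def monthly_cost(capacity_tb: float, throughput_gbps: float, file_systems: int = 1):
    """Return (storage, throughput, total) monthly cost in USD for identical file systems."""
    storage = SSD_RATE_PER_GB_MONTH * capacity_tb * 1024 * file_systems
    throughput = THROUGHPUT_RATE_PER_MBPS_MONTH * throughput_gbps * 1000 * file_systems
    # Round to cents to avoid floating-point noise in the comparison.
    return round(storage, 2), round(throughput, 2), round(storage + throughput, 2)


single = monthly_cost(capacity_tb=30, throughput_gbps=24)
split = monthly_cost(capacity_tb=10, throughput_gbps=8, file_systems=3)
throughput_share = round(100 * single[1] / single[2])

print(single)            # (1344.0, 17280.0, 18624.0)
print(split == single)   # True
print(throughput_share)  # 93
```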

Pro tip: Start with a conservative throughput level (for example, 16GB/s), monitor CloudWatch metrics, then scale up as needed. Throughput capacity is adjustable post-deployment without downtime.
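One way to turn "monitor, then scale" into a repeatable habit is a small heuristic over observed throughput samples. The sketch below is an assumption-laden illustration: the sample values are made up, the 95th-percentile-plus-headroom rule is just one reasonable choice, and the `step_mbps` rounding is a hypothetical stand-in for whatever throughput increments your deployment actually offers. In practice the samples would come from your CloudWatch export.

```python
import math


def suggest_throughput_mbps(samples_mbps, headroom=1.25, step_mbps=1000):
    """Suggest a provisioned throughput from observed samples (simplified heuristic).

    Takes the nearest-rank 95th percentile, adds headroom, and rounds up
    to the next multiple of step_mbps (a hypothetical increment).
    """
    if not samples_mbps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_mbps)
    rank = math.ceil(0.95 * len(ordered))  # nearest-rank percentile, 1-based
    p95 = ordered[rank - 1]
    target = p95 * headroom
    return math.ceil(target / step_mbps) * step_mbps


# Made-up observations in MBps, for illustration only.
observed = [9000, 11000, 12500, 10000, 13000, 9500, 12000, 11500, 10500, 12800]
print(suggest_throughput_mbps(observed))  # 17000
```

The headroom factor trades cost against the risk of throttling during bursts. A latency-sensitive workload would usually justify a larger value than a batch pipeline.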

Bottom line: Same dollars, but one file system + QoS gives you better ops and future-proof scaling.
