
Getting Started: Write SAS data to a Parquet file on Amazon S3 - Using 3 easy steps in SAS Viya

Started 12-23-2019 by
Modified 12-20-2019 by

Recently a friend asked me to help him write some SAS data onto Amazon S3 in Parquet file format.

 

This is easy to do with SAS Viya 3.5, which can read and write Parquet files on S3.

Here is the process I used to get it done.

 

Step 1 - Create your Amazon bucket

Step 2 - Get your credentials to access the bucket

Step 3 - Submit the SAS code

 

Step 1 - Create your Amazon bucket

 

Go to the Amazon S3 console:

https://s3.console.aws.amazon.com/s3/home?region=us-east-1

 

Create a bucket. I named mine “sasaibucket”. Then click “Create”.

 

CreateBucket.png

 

You can see the bucket “sasaibucket” has been created.

 

BucketCreated.png
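Bucket names have to follow the S3 naming rules, so a name can be rejected at this step. The following is a minimal Python sketch that checks the core rules for general-purpose buckets: 3 to 63 characters; only lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no adjacent dots; and not formatted like an IP address. The function name `is_valid_bucket_name` is illustrative, and the check is not exhaustive (AWS also reserves a few prefixes and suffixes, and names must be globally unique, which only S3 can confirm).

```python
import ipaddress
import re

# 3-63 characters of lowercase letters, digits, dots, and hyphens,
# starting and ending with a letter or digit.
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Check a proposed bucket name against the core S3 naming rules."""
    if not _BUCKET_RE.fullmatch(name):
        return False  # wrong length or a disallowed character
    if ".." in name:
        return False  # adjacent periods are not allowed
    try:
        ipaddress.IPv4Address(name)
    except ValueError:
        return True  # not an IP address, so the name passes these checks
    return False  # names formatted like an IP address are not allowed


print(is_valid_bucket_name("sasaibucket"))  # True
print(is_valid_bucket_name("SASAIBucket"))  # False (uppercase letters)
```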

 

Step 2 - Get your credentials to access the bucket

 

Go to Identity and Access Management (IAM) in the AWS console:

https://console.aws.amazon.com/iam/home?region=us-east-1#/home

 

Click “Users” in the left panel.

Click “Add user.”

CreateUser.png

Provide a user name.

In this example, I used “sasjst”.

 

Select the access type “Programmatic access”.

Click “Next: Permissions”

SetPermissions1.png

Next, I searched for the S3 policies.

SetPermissions2.png

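I checked the policies that grant access to Amazon S3.

In my case, I selected “AmazonS3FullAccess” and “AmazonS3ReadOnlyAccess”. AmazonS3FullAccess already includes all of the read permissions, so selecting it alone would have been enough.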

SetPermissions3.png
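AmazonS3FullAccess applies to every bucket in the account. If you would rather limit this user to one bucket, one option is to attach a custom policy instead. The Python sketch below builds such a policy document for a given bucket; the function name is hypothetical, and the list of actions is an assumption about what reading and writing Parquet files needs. CAS may require additional S3 permissions in your environment, so test it before relying on it.

```python
import json


def bucket_policy(bucket: str) -> dict:
    """Build an IAM policy document that only covers a single S3 bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing the bucket's contents is granted on the bucket ARN itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading, writing, and deleting objects is granted on the keys inside it
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }


if __name__ == "__main__":
    # The printed JSON can be pasted into the JSON tab of the IAM policy editor
    print(json.dumps(bucket_policy("sasaibucket"), indent=2))
```

The bucket-level and object-level permissions are split into two statements because `s3:ListBucket` applies to the bucket ARN, while the object actions apply to the `bucket/*` resource.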

 

Click “Next”

Click “Create User”

SetPermissions4.png

 

The user is now created. 

This screen shows two important items:

  • The Access key ID
  • The Secret Access Key

Credentials1.png

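Copy both the Access key ID and the Secret access key.

You will need them for the CASLIB statement in Step 3. The console shows the Secret access key only once, when the key is created, so store it somewhere safe.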

Credentials2.png
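Keys pasted directly into a program are easy to leak when the program is shared. A common alternative is to keep them in the standard AWS environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`. Here is a minimal Python sketch, assuming that convention, that reads both values and fails with a clear message if either is missing; the function name `load_aws_credentials` is illustrative.

```python
import os
from typing import Tuple


def load_aws_credentials() -> Tuple[str, str]:
    """Return (access_key_id, secret_access_key) from the standard AWS variables."""
    names = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    values = [os.environ.get(name, "") for name in names]
    missing = [name for name, value in zip(names, values) if not value]
    if missing:
        # Fail early with a clear message instead of sending empty keys to S3
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return values[0], values[1]
```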

Step 3 - Submit the SAS code

 

In SAS Studio on SAS Viya, I wrote a short SAS program.

 

The CAS statement starts a CAS session named casauto.

The CASLIB statement defines a data connection from CAS to the S3 bucket.

The LIBREF= option also creates a SAS library in SAS Studio that points to the caslib.

 

In the accessKeyId= and secretAccessKey= options, supply the Access key ID and the Secret access key from Step 2.

 

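/* Start a CAS session named casauto */
cas casauto;

/* Define a caslib that points to the S3 bucket.        */
/* Replace the key values with your own keys from Step 2. */
caslib "001_Amazon S3 Bucket" datasource=(
      srctype="s3"
      accessKeyId='YOUR_ACCESS_KEY_ID'
      secretAccessKey='YOUR_SECRET_ACCESS_KEY'
      region="US_East"
      bucket="sasaibucket"
)
  subdirs     /* allow access to subdirectories in the bucket */
  global      /* make the caslib visible to all CAS sessions  */
  libref=S3   /* also create a SAS library named S3           */
;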
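If you generate the program from a script (for example, before submitting it through saspy), the keys never have to be saved in the program file. The sketch below renders a CASLIB statement with the same options as the program above; the function and parameter names are hypothetical. The keys are written as SAS single-quoted literals with any embedded quote doubled, which is how SAS escapes quotes, and single quotes also keep the macro processor from resolving `&` or `%` in a key.

```python
def caslib_statement(name: str, bucket: str, region: str,
                     key_id: str, secret: str, libref: str = "S3") -> str:
    """Build a CASLIB statement for an S3 data source, mirroring the program above."""

    def sas_single_quoted(value: str) -> str:
        # In a SAS single-quoted literal, an embedded quote is written twice
        return "'" + value.replace("'", "''") + "'"

    lines = [
        f'caslib "{name}" datasource=(',
        '      srctype="s3"',
        f"      accessKeyId={sas_single_quoted(key_id)}",
        f"      secretAccessKey={sas_single_quoted(secret)}",
        f'      region="{region}"',
        f'      bucket="{bucket}"',
        ")",
        "  subdirs",
        "  global",
        f"  libref={libref}",
        ";",
    ]
    return "\n".join(lines)


print(caslib_statement("001_Amazon S3 Bucket", "sasaibucket", "US_East",
                       "YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY"))
```

In practice the two key arguments would come from environment variables or a secrets store rather than literals in the script.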

 

 

Studio1.png

 

After you submit the code, the log shows that the caslib has been added.

Studio2.png

 

In SAS Data Explorer on SAS Viya, I can see the available data sources, including S3.

DataExplorer.png

 

My S3 bucket is currently empty, so I first load some SAS data into it.

To do this, I select an existing table, in this case cars.sashdat.

I import it to the target location called “001_Amazon S3 Bucket”.

I name the target table “cars”.

I specify Parquet as the format.

I then click “Import” to begin the import process.

SaveTable.png

 

The table is read into memory, and a copy is written to S3 in Parquet format.

Explorer2.png

 

After I refresh the data sources, you can see that CARS.parquet has been written.

 

Explorer2.png

 

Looking at the bucket in the Amazon S3 console, you can see that a directory has been created.

 

S3.png

 

Inside the directory is a set of Parquet files.

S32.png
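Amazon S3 itself has a flat namespace: the directory shown in the console is a shared key prefix ending in “/”, and each Parquet file inside it is a separate object. To illustrate, this Python sketch groups a list of object keys by their first prefix segment, roughly the way the console presents top-level folders. The function name and the key names are hypothetical, since the actual file names CAS writes are not listed here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_folder(keys: Iterable[str]) -> Dict[str, List[str]]:
    """Group flat S3 object keys by their first '/'-separated segment."""
    folders: Dict[str, List[str]] = defaultdict(list)
    for key in keys:
        folder, sep, rest = key.partition("/")
        if sep and rest:  # skip top-level objects and bare "folder/" markers
            folders[folder].append(rest)
    return dict(folders)


# Hypothetical key names, for illustration only
keys = ["cars.parquet/part-0.parquet", "cars.parquet/part-1.parquet", "notes.txt"]
print(group_by_folder(keys))
# {'cars.parquet': ['part-0.parquet', 'part-1.parquet']}
```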

 

Perhaps the hardest part was remembering how to get the AWS keys.

 

Hopefully you will find this example useful if you are doing this for the first time!

Good luck, my friend, on your journey!

 

Comments

Do you know of any way to use SAS 9.4M6 and PROC S3 to write a Parquet-formatted file from a SAS data set? The only approach I have found is to write the data out to a flat file and then use something like Python to convert it to Parquet. I'm just curious whether there is anything that makes this easier if you have 9.4M6 and PROC S3, but not Viya, which has CASLIB.

You could use saspy:

import saspy
saspy.SASsession().sd2df('cars', 'sashelp').to_parquet('/tmp/cars.parquet')  # to_parquet needs pyarrow or fastparquet

 

I run this in Databricks (Community Edition), which then lets me write directly to S3 through a dbfs:/mnt mount with the following:

 

dbutils.fs.cp('file:/tmp/cars.parquet','dbfs:/mnt/jgalloway/parquet/cars_from_sashelp.parquet')

 

But there are numerous other ways to get your Parquet file to S3 (WinSCP is another option).

Version history
Last update:
12-20-2019 03:46 PM
Updated by:
SAS Employee
Contributors
