Recently a friend asked me to help him write some SAS data onto Amazon S3 in Parquet file format.
It is easy to do this using SAS Viya 3.5, which has capabilities for reading/writing Parquet files on S3.
Here is the process I used to get it done.
Step 1 - Create your Amazon bucket
Step 2 - Get your credentials to access the bucket
Step 3 - Submit the SAS code
Step 1 - Create an Amazon bucket
Go to Amazon S3 console
https://s3.console.aws.amazon.com/s3/home?region=us-east-1
Create a bucket. I will call mine “sasaibucket” and click “Create”
You can see the bucket “sasaibucket” has been created.
Step 2 - Get your credentials to access the bucket
Go to Identity and Access Management (IAM) in Amazon.
https://console.aws.amazon.com/iam/home?region=us-east-1#/home
Click on “Users” on the left panel.
Click “Add user.”
Provide a user name.
In this example, I used “sasjst”
Select Access Type “Programmatic access”
Click “Next: Permissions”
Next, I search for S3 policies.
I check the policies to access Amazon S3.
In my case, I selected “AmazonS3FullAccess” and “AmazonS3ReadOnlyAccess”
Click “Next”
Click “Create User”
The user is now created.
On this screen, you are now provided two important items.
The Access key ID
The Secret Access Key
Copy both the Access key ID, and the Secret access key.
They will be needed for the SAS libname.
Step 3 - Submit the SAS code
Using SAS Studio on SAS Viya, I created some simple SAS code.
The CAS statement starts a CAS Session.
The caslib statement defines the data connection in CAS to S3.
The Libref= option creates a SAS library in SAS Studio as well.
In this code, I inserted the Access key ID, and the Secret Access Key from the previous step.
cas casauto;
caslib "001_Amazon S3 Bucket" datasource=(
srctype="s3"
accessKeyId='AKIAY7ONEHNKGCRG6OF4'
secretAccessKey='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'
region="US_East"
bucket="sasaibucket"
)
subdirs
global
libref=S3
;
After submitting the SAS code, you can see the log shows the caslib has been added.
Using SAS Data Explorer on SAS Viya, I can the available data sources including S3.
My S3 bucket is currently empty, so I will first load some SAS data
To do this, I can select an existing SAS dataset, in this case cars.sashdat.
I import it to the target location called “001_Amazon S3 Bucket”
I name the target table: cars.
I specify a format of parquet.
I then Click “Import” to begin the import process”
The file is read into memory. And the file is copied to S3 as a parquet file.
If I refresh the data sources, you can see now the file CARS.parquet was written.
If I look at the bucket on Amazon S3, you can see the directory is created.
And inside the directory is a set of parquet files.
Perhaps the hardest part was remembering how to get the AWS keys.
Hopefully you will this example useful if you are doing this for the first time!
Good luck my friend on your journey!
... View more