To import files from your PC in CAS, for convenience, you can use the SAS SWAT python package. Read about two simple ways to import. Understand how to perform client side loading with SWAT and what are the pros and cons of client side or server side parsing.
In a previous post, I highlighted Six Ways to Import a Local File into CAS (SAS Viya 3.5) using CAS actions, SAS procs and Viya visual interfaces. In this post, you will use python to import local files. It reminds me of a quote from "The Matrix": --Trinity : Where are you going? --The Keymaker : Another way. Always another way.
Import local files with python: convenient, straightforward and cool, when you can combine Pandas data frames with CAS actions.
With SAS Scripting Wrapper for Analytics Transfer (SWAT) you can read and upload a file in CAS.
We will focus on a client side load: import a file stored on your PC and make it available in CAS. When SAS Viya cannot access the files, unless you make them available to SAS Viya.
What is a client? A drive from your PC, the Jupyter notebook, a browser or a “SAS client” (SPRE or SAS 9.4M5).
The focus is not server side load (although it will be discussed with an example): when the file is already on the CAS Controller, or on a location available to the CAS Controller or workers: mounted drive, Hadoop distribution, DNFS, etc.
The file must be on your PC, just like when you use the Visual Interfaces.
Say you might want to load a file, ‘prdsale.csv’ from your local drive.
Read about SWAT Pre-requisites at the end of the post.
At this stage I assume you:
Open a Jupyter Notebook and test:
import swat
Then start a session on the CAS server:
# start a CAS session
# conn = swat.CAS(host, port, username, password)
conn = swat.CAS('intcas01', 5570, 'myuser', 'mypass')
If you encode the password, be very careful. Anyone with access to the Jupyter notebook will see it.
Use the best-practice is to create a authinfo file, where password is stored securely.
Please see the following blogs and links:
Then test the connections
# test connection status
out = conn.serverstatus()
print(out)
If you can connect you will see:
Summary:
# Define location of the files
folder="/temp/python/"
csvdata=folder+"prdsale.csv"
dsdata=folder+"prdsale.sas7bdat"
print (folder, csvdata, dsdata)
# List files
import os
os.listdir(folder)
The easiest way to get data into CAS is using the data loading methods:
The same methods exist on CAS objects, but in fact, use the Pandas functions in the background to do the parsing. The only difference is that the result of the method is a CAS Table object rather than a pandas.DataFrame.
Let’s look at an example. In this example you are using a local file.
# Read file
prdsale = conn.read_csv(csvdata)
File is loaded in CAS:
NOTE: Cloud Analytic Services made the uploaded file available as table TMP72UTYQXN in caslib CASUSERHDFS(sbxbot).
NOTE: The table TMP72UTYQXN has been created in caslib CASUSERHDFS(sbxbot) from binary data uploaded to Cloud Analytic Services.
List sample records:
# Sample records from the file
prdsale.head()
You could point to a URL of a file stored somewhere. Just keep in mind that, when using a URL, the data is downloaded from wherever it is to the client machine, for parsing, before it is uploaded to CAS.
As we mentioned earlier, these methods on the CAS object use the Pandas functions under-the-cover to do the parsing. This means that you can also use all of the Pandas parsing options.
Example of column selection:
# Read file
prdsale = conn.read_csv(csvdata, usecols=[0, 1, 2, 3], names=['A', 'B', 'C', 'D'], skiprows=1)
# Sample records from the file
prdsale.head()
You can check the data types assigned to the imported columns:
# the table name is automatically assigned. Look into the NOTE:
conn.table.columnInfo(table="TMP2CPYA85F")
A, B are varchar. They are dollar amounts in the file. They should be numeric. You can simply change them at "read" time. Use the dtype option when you load.
prdsale2=conn.read_csv(csvdata, usecols=[0, 1, 2, 3], names=['A', 'B', 'C', 'D'],
dtype = {"A" : "float64", "B" : "float64"}, skiprows=1, nrows=0)
Check:
conn.table.columnInfo(table="TMP9GQ4NDMQ")
Problem solved: A and B have now the type is now 'double'.
If you don’t need the full power of Pandas’ parsers, you may be better off uploading the file to CAS and parse it there. This offers some advantages as well. The server parsers will likely be faster than a client-side parser (especially in MPP mode where parsing of some file types can happen in parallel). Doing server-side parsing is also more efficient since it doesn’t require the data to be converted to Python objects before creating the data buffer to be sent to the server. The CAS.upload_file() method uploads a data file as-is to CAS and invokes the table.loadtable CAS action in the background to parse it. Let’s use the same file as in the previous example, but let the server do the parsing. Just as before, if a URL is specified, it must be downloaded to the client then uploaded CAS.
# Drop table first if reloaded
prdsale = conn.CASTable("prdsale", caslib='casuserhdfs', replace=True)
prdsale.table.dropTable(quiet=True)
# The CAS.upload_file() method uploads a data file as-is to CAS and invokes the table.loadtable action
prdsale = conn.upload_file(csvdata, casOut='prdsale')
# Sample records from the file
prdsale.head()
Invoke the table info CAS action.
# table info
conn.table.tableInfo(table="prdsale")
And the column info CAS action.
# column info
conn.table.columnInfo(table="prdsale")
Terminate the session:
# Close the connection
conn.close()
Follow the instructions on SWAT github page.
If you have an Anaconda environment, open the Anaconda Prompt type pip install swat
Open a Jupyter Notebook and test:
import swat
If you are unsure what your host is, open your hosts file: e.g.: C:\Windows\System32\drivers\etc\hosts. Check what corresponds to your CAS Server.
This is no longer client side, but server side. Anyway, the code is useful.
I assume you uploaded the files using SAS Studio or SAS Drive or ftp like in the previous post: Six Ways to Import a Local File into CAS (SAS Viya 3.5).
If you have data files on the server, you can load them directly from the CASLib that they are in. Paths to files in a CASLib are always relative paths. This is the recommended method for large files.
# Drop caslib
x=conn.table.dropcaslib(caslib="indata", quiet=True)
# file was loaded first in /user/sbxbot/
# Define location of the files
sfolder="/home/sbxbot/"
# Define CASLIB first
caslib=conn.table.addcaslib(name="indata",
path=sfolder,
datasource={"srctype":"PATH"},
activeonadd="True")
info=conn.fileinfo(caslib="indata")
info
# Drop table first if reloaded
prdsale = conn.CASTable("prdsale", caslib='indata', replace=True)
prdsale.table.dropTable(quiet=True)
# The CAS.upload_file() method uploads a data file as-is to CAS and invokes the table.loadtable action
prdsale = conn.load_path('prdsale.csv', sourceCaslib='indata', casOut='prdsale')
info=conn.tableinfo(caslib="indata")
info
conn.close()
# Close the connection
We looked at the different options to import a file in CAS using python and to read it using: client side or server side parsing.
Use client side when you need data frames options, use server side when you have high volumes and take advantage of the CAS parallel processing.
As a bonus, we looked at loading a file server side in CAS.
I would recommend the following resources:
Thank you for your time reading this post. Please comment and share your experience with the Local File Import in CAS and help others, please leave a comment.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning and boost your career prospects.