BookmarkSubscribeRSS Feed

Two Simple Ways to Import Local Files with Python in CAS (Viya 3.5)

Started ‎02-24-2020 by
Modified ‎04-26-2020 by
Views 8,453

To import files from your PC in CAS, for convenience, you can use the SAS SWAT python package. Read about two simple ways to import. Understand how to perform client side loading with SWAT and what are the pros and cons of client side or server side parsing.

In a previous post, I highlighted Six Ways to Import a Local File into CAS (SAS Viya 3.5) using CAS actions, SAS procs and Viya visual interfaces. In this post, you will use python to import local files. It reminds me of a quote from "The Matrix": --Trinity : Where are you going?  --The Keymaker : Another way. Always another way.

 

Why?

Import local files with python: convenient, straightforward and cool, when you can combine Pandas data frames with CAS actions.

 

What?

With SAS Scripting Wrapper for Analytics Transfer (SWAT) you can read and upload a file in CAS.

We will focus on a client side load: import a file stored on your PC and make it available in CAS. When SAS Viya cannot access the files, unless you make them available to SAS Viya.

What is a client? A drive from your PC, the Jupyter notebook, a browser or a “SAS client” (SPRE or SAS 9.4M5).

The focus is not server side load (although it will be discussed with an example): when the file is already on the CAS Controller, or on a location available to the CAS Controller or workers: mounted drive, Hadoop distribution, DNFS, etc.

 

How?

10 Import files in CAS with python Summary.png

 

Python SWAT

The file must be on your PC, just like when you use the Visual Interfaces.

Say you might want to load a file, ‘prdsale.csv’ from your local drive.

Read about SWAT Pre-requisites at the end of the post.

At this stage I assume you:

Open a Jupyter Notebook and test:

import swat

 

Then start a session on the CAS server:

# start a CAS session
# conn = swat.CAS(host, port, username, password)
conn = swat.CAS('intcas01', 5570, 'myuser', 'mypass')

If you encode the password, be very careful. Anyone with access to the Jupyter notebook will see it.

Use the best-practice is to create a authinfo file, where password is stored securely.

Please see the following blogs and links:

 

Then test the connections

# test connection status
out = conn.serverstatus()
print(out)

 

If you can connect you will see:

20 Import files in CAS with python test connection.png

 

Client Side Parsing (Pandas Data Frames) Client Side Load

Summary:

  • File is on PC
  • File is loaded in CAS
  • Pandas is parsing on the client machine.
  • The result is stored in CAS

# Define location of the files
folder="/temp/python/"
csvdata=folder+"prdsale.csv"
dsdata=folder+"prdsale.sas7bdat"
print (folder, csvdata, dsdata)
# List files
import os
os.listdir(folder)

 

The easiest way to get data into CAS is using the data loading methods:

  • read_csv()
  • read_table()
  • read_excel()

The same methods exist on CAS objects, but in fact, use the Pandas functions in the background to do the parsing. The only difference is that the result of the method is a CAS Table object rather than a pandas.DataFrame.

 

Let’s look at an example. In this example you are using a local file.

# Read file
prdsale = conn.read_csv(csvdata)

File is loaded in CAS:

NOTE: Cloud Analytic Services made the uploaded file available as table TMP72UTYQXN in caslib CASUSERHDFS(sbxbot).
NOTE: The table TMP72UTYQXN has been created in caslib CASUSERHDFS(sbxbot) from binary data uploaded to Cloud Analytic Services.

List sample records:

# Sample records from the file
prdsale.head()

40 Import files in CAS with python - Test read.png

You could point to a URL of a file stored somewhere. Just keep in mind that, when using a URL, the data is downloaded from wherever it is to the client machine, for parsing, before it is uploaded to CAS.

 

Pandas Parsing Option Example

As we mentioned earlier, these methods on the CAS object use the Pandas functions under-the-cover to do the parsing. This means that you can also use all of the Pandas parsing options.

 

Example of column selection:

# Read file
prdsale = conn.read_csv(csvdata, usecols=[0, 1, 2, 3], names=['A', 'B', 'C', 'D'], skiprows=1)
# Sample records from the file
prdsale.head()

30 Import files in CAS with python - Test read df options.png

 

Control the data types when you load

You can check the data types assigned to the imported columns:

# the table name is automatically assigned. Look into the NOTE: 
conn.table.columnInfo(table="TMP2CPYA85F")

35 Import files in CAS with python - column info varchar.png

A, B are varchar. They are dollar amounts in the file. They should be numeric. You can simply change them at "read" time. Use the dtype option when you load.

prdsale2=conn.read_csv(csvdata, usecols=[0, 1, 2, 3], names=['A', 'B', 'C', 'D'],
  dtype = {"A" : "float64", "B" : "float64"}, skiprows=1, nrows=0)

Check:

conn.table.columnInfo(table="TMP9GQ4NDMQ")

 

Problem solved: A and B have now the type is now 'double'.

 

Server Side Parsing (File Upload) Client Side Load

If you don’t need the full power of Pandas’ parsers, you may be better off uploading the file to CAS and parse it there. This offers some advantages as well. The server parsers will likely be faster than a client-side parser (especially in MPP mode where parsing of some file types can happen in parallel). Doing server-side parsing is also more efficient since it doesn’t require the data to be converted to Python objects before creating the data buffer to be sent to the server. The CAS.upload_file() method uploads a data file as-is to CAS and invokes the table.loadtable CAS action in the background to parse it. Let’s use the same file as in the previous example, but let the server do the parsing. Just as before, if a URL is specified, it must be downloaded to the client then uploaded CAS.

 

# Drop table first if reloaded
prdsale = conn.CASTable("prdsale", caslib='casuserhdfs', replace=True)
prdsale.table.dropTable(quiet=True)

# The CAS.upload_file() method uploads a data file as-is to CAS and invokes the table.loadtable action
prdsale = conn.upload_file(csvdata, casOut='prdsale')
# Sample records from the file
prdsale.head()

60 Import files in CAS with python - Test read CAS.png

Invoke the table info CAS action. 
# table info
conn.table.tableInfo(table="prdsale")

65 Import files in CAS with python - CAS table.info.png

And the column info CAS action. 
# column info
conn.table.columnInfo(table="prdsale")

70 Import files in CAS with python - Test CAS column.info.png

Terminate the session:

# Close the connection
conn.close()

 

SWAT Pre-Requisites

Install SWAT

Follow the instructions on SWAT github page.

If you have an Anaconda environment, open the Anaconda Prompt type pip install swat

80 Import SWAT.png

 

Test SWAT

Open a Jupyter Notebook and test:

import swat

 

Host

If you are unsure what your host is, open your hosts file: e.g.: C:\Windows\System32\drivers\etc\hosts. Check what corresponds to your CAS Server.

 

Bonus: Server-Side Parsing Server Side Load (the Third Way)

This is no longer client side, but server side. Anyway, the code is useful.

I assume you uploaded the files using SAS Studio or SAS Drive or ftp like in the previous post: Six Ways to Import a Local File into CAS (SAS Viya 3.5).

If you have data files on the server, you can load them directly from the CASLib that they are in. Paths to files in a CASLib are always relative paths. This is the recommended method for large files.

# Drop caslib
x=conn.table.dropcaslib(caslib="indata", quiet=True)

# file was loaded first in /user/sbxbot/
# Define location of the files
sfolder="/home/sbxbot/"

# Define CASLIB first
caslib=conn.table.addcaslib(name="indata",
path=sfolder,
datasource={"srctype":"PATH"},
activeonadd="True")

info=conn.fileinfo(caslib="indata")
info

90 Import files in CAS with python - CASlib definition.png

# Drop table first if reloaded
prdsale = conn.CASTable("prdsale", caslib='indata', replace=True)
prdsale.table.dropTable(quiet=True)

# The CAS.upload_file() method uploads a data file as-is to CAS and invokes the table.loadtable action
prdsale = conn.load_path('prdsale.csv', sourceCaslib='indata', casOut='prdsale')
info=conn.tableinfo(caslib="indata")
info

100 Import files in CAS with python - Test CAS server side load.png

conn.close()
# Close the connection

 

Conclusions

We looked at the different options to import a file in CAS using python and to read it using: client side or server side parsing.

Use client side when you need data frames options, use server side when you have high volumes and take advantage of the CAS parallel processing.

As a bonus, we looked at loading a file server side in CAS.

 

References

I would recommend the following resources:

Want to Learn More about Viya 3.5?

Thank you for your time reading this post. Please comment and share your experience with the Local File Import in CAS and help others, please leave a comment.

Version history
Last update:
‎04-26-2020 09:54 PM
Updated by:
Contributors

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started