Unlocking the Power of SAS Viya: Exploring Python Integration Fundamentals with SWAT

1 Like

introduction

Dive into the dynamic fusion of analytics excellence with our blog series, "Unlocking the Power of SAS Viya: Exploring Python Integration Fundamentals with SWAT." In this journey, we unravel the symbiotic relationship between SAS Viya and Python's versatility through the Scripting Wrapper for Analytics Transfer (SWAT). Join us as we navigate the fundamentals, empowering you to seamlessly integrate these robust technologies and unleash a new dimension of analytical possibilities. Get ready to amplify your data prowess and elevate your insights as we embark on this exploration at the intersection of SAS Viya and Python.

Accessing and Managing Data

1. Import Packages and Connect to SAS Viya

import swat
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import sys

## Set options
pd.set_option('display.max_columns', None)

## Connect to SAS Viya
conn = swat.CAS(server_name, port, username, password)
print(conn)

This Python script initializes SAS Viya integration using SWAT. It imports Pandas, NumPy, and Matplotlib for data manipulation and visualization, sets Pandas to display all columns, and connects to SAS Viya, printing the connection details.

2. Explore Available Data in SAS Viya

## View available data sources in SAS Viya
ci = conn.caslibInfo()

## View available data source files
fi = conn.fileInfo(caslib = 'PIVY')

## View available in-memory tables
ti = conn.tableInfo(caslib = 'casuser')

View Available Data Sources in SAS Viya:
- ci = conn.caslibInfo(): Retrieves information about available CAS libraries in SAS Viya, storing the details in the variable ci.
View Available Data Source Files:
- fi = conn.fileInfo(caslib='PIVY'): Obtains information about the files in the CAS library named 'PIVY', storing the details in the variable fi.
View Available In-Memory Tables:
- ti = conn.tableInfo(caslib='casuser'): Retrieves information about in-memory tables in the CAS library named 'casuser', storing the details in the variable ti.

3. Load Data into Memory in SAS Viya

Load a Server-Side File Into Memory

## Load a table into memory and display the output
lt = conn.loadTable(path = 'loans_raw.sashdat', caslib = 'PIVY',
                    casOut = {'caslib' : 'casuser', 
                              'replace' : True})

## View available in-memory tables
ti = conn.tableInfo(caslib = 'casuser')

display(lt, ti)

Load a Table into Memory:
- lt = conn.loadTable(path='loans_raw.sashdat', caslib='PIVY', casOut={'caslib': 'casuser', 'replace': True}): Loads a table named 'loans_raw.sashdat' from the 'PIVY' CAS library into memory. The loaded table is then stored in a variable named lt. The option 'replace: True' ensures that if a table with the same name exists in the 'casuser' library, it will be replaced.
View Available In-Memory Tables:
- ti = conn.tableInfo(caslib='casuser'): Retrieves information about in-memory tables in the 'casuser' CAS library, storing the details in the variable ti.
Display the Output:
- display(lt, ti): Presents the loaded table (lt) and the information about available in-memory tables (ti). This provides a visual representation of the table's contents and metadata.

Load a Client-Side File Into Memory

## Path to the client side file cars_client.csv. The CSV file resides in the PIVY -> Data 
## folder on your Python client.
carsFile = (os
            .path
            .dirname(os.getcwd()) + r'/data/cars_client.csv')


## Set a list of column names for the CSV file
colNames = ['Make', 'Model', 'Type', 'Origin', 'DriveTrain', 'MSRP', 'Invoice',
            'EngineSize', 'Cylinders', 'Horsepower', 'MPG_City', 'MPG_Highway',
            'Weight', 'Wheelbase', 'Length']
conn.read_csv(carsFile, 
              header = None, 
              names = colNames,
              casout = {'name' : 'clientupload', 
                        'caslib' : 'casuser'}
             )

Define the Path to the Client-Side CSV File:
- carsFile = (os.path.dirname(os.getcwd()) + r'/data/cars_client.csv'): Constructs the file path to the CSV file named 'cars_client.csv' located in the 'PIVY -> Data' folder on the Python client. The path is stored in the variable carsFile.
Set a List of Column Names:
- colNames = [...]: Defines a list of column names to be used for the CSV file. Each element in the list corresponds to a specific attribute of the cars dataset.
Read the CSV File into SAS Viya:
- conn.read_csv(...): Uses the read_csv function to read the CSV file.
- header=None: Specifies that the CSV file does not have a header row.
- names=colNames: Provides the list of column names to be used.
- casout={'name': 'clientupload', 'caslib': 'casuser'}: Specifies the CAS table name ('clientupload') and the CAS library ('casuser') where the data will be stored in SAS Viya.

4. Make a Reference to a Table in SAS Viya

## Create a reference to a table and view the object
tbl = conn.CASTable('loans_raw', caslib = 'casuser')
tbl

Create a Reference to a Table:
- tbl = conn.CASTable('loans_raw', caslib='casuser'): Establishes a reference to a table named 'loans_raw' in the 'casuser' CAS library using the CASTable function. The reference is stored in the variable tbl.
View the Object:
- tbl: Displays the object, providing a concise representation of the reference to the 'loans_raw' table in the 'casuser' CAS library. This reference can be used for various data manipulation and analysis operations.

5. Explore a Table

## View the dimensions of the table
shape = tbl.shape

## Preview the table
df_head = tbl.head()

## View column attributes
df_ci = tbl.columnInfo()

## Obtain summary statistics
colNames = ['Age', 'Salary', 'EmpLength', 'Amount', 'InterestRate']
df_summary = tbl.summary(input = colNames)

View the Dimensions of the Table:
- shape = tbl.shape: Retrieves and stores the dimensions (number of rows and columns) of the 'loans_raw' table in the 'casuser' CAS library in the variable shape.
Preview the Table:
- df_head = tbl.head(): Fetches and stores a preview of the 'loans_raw' table's top rows in the variable df_head. This provides a quick overview of the table's structure and content.
View Column Attributes:
- df_ci = tbl.columnInfo(): Retrieves and stores information about the columns of the 'loans_raw' table, such as data type and other attributes, in the variable df_ci.
Obtain Summary Statistics:
- colNames = ['Age', 'Salary', 'EmpLength', 'Amount', 'InterestRate']: Specifies a list of column names for which summary statistics will be obtained.
- df_summary = tbl.summary(input=colNames): Obtains summary statistics for the specified columns ('Age', 'Salary', 'EmpLength', 'Amount', 'InterestRate') from the 'loans_raw' table, storing the results in the variable df_summary.

6. Analyze a Table

## Calculate the frequency of Category in SAS Viya
df = (tbl
      .Category
      .value_counts(normalize = True))
display(df)

## Plot the summarized results on the client using Pandas
df.plot(kind = 'bar', 
        figsize = (10,6), 
        title = 'Percent of Loans by Category');

Calculate the Frequency of Category in SAS Viya:
- (tbl.Category.value_counts(normalize=True)): Utilizes the Pandas library to calculate the frequency of each unique category in the 'Category' column of the 'loans_raw' table. The normalize=True parameter expresses the results as percentages, and the outcome is stored in the variable df.
Display the Results:
- display(df): Presents the calculated frequencies of each category in the 'Category' column, providing a clear overview of the distribution.
Plot Summarized Results Using Pandas:
- df.plot(kind='bar', figsize=(10,6), title='Percent of Loans by Category'): Plots a bar chart using Pandas, visualizing the percentage distribution of categories in the 'Category' column. The chart is set to display with a specified figure size and title.

df = (tbl
      .query('Category = "Mortgage"') 
      .groupby('Year') 
      .Amount
      .sum())

display(df)

df.plot(kind = 'line', 
        figsize = (10,6), 
        title = 'Total Amount of New Mortgage Loans by Year');

Filter and Group Data:
- (tbl.query('Category = "Mortgage").groupby('Year').Amount.sum()): Filters the 'loans_raw' table to include only rows where the 'Category' is "Mortgage". It then groups the filtered data by the 'Year' column and calculates the sum of the 'Amount' column for each year. The results are stored in the variable df.
Display the Results:
- display(df): Presents the total amount of new mortgage loans grouped by year, providing a clear overview of the aggregated data.
Plot Summarized Results Using Pandas:
- df.plot(kind='line', figsize=(10, 6), title='Total Amount of New Mortgage Loans by Year'): Plots a line chart using Pandas, visualizing the trend in the total amount of new mortgage loans over the years. The chart is set to display with a specified figure size and title.

7- Preparing data

## Subset the data by adding the where parameter to the CASTable object
tbl.where = 'Category = "Car Loan" and LoanStatus ne "Fully Paid"'

## Create the calculated columns by adding the computedVarsProgram parameter to the CASTable object
calcAccOpenDate = 'AccOpenDate = mdy(Month, Day, Year);'
calcLoanLengthMonths = 'LoanLengthMonths = LoanLength * 12;'
tbl.computedVarsProgram = calcAccOpenDate + calcLoanLengthMonths

## Specify the columns to include by adding the vars parameter to the CASTable object
tbl.vars = ['ID', 'AccNumber', 'Amount', 'AccOpenDate', 'LoanLengthMonths','LoanStatus', 'Category']


## Preview the CAS table with the parameters
tbl.head()

Subset the Data Using the WHERE Parameter:
- tbl.where = 'Category = "Car Loan" and LoanStatus ne "Fully Paid"': Modifies the CASTable object (tbl) to subset the data, including only rows where the 'Category' is "Car Loan" and the 'LoanStatus' is not "Fully Paid".
Create Calculated Columns Using ComputedVarsProgram:
- calcAccOpenDate and calcLoanLengthMonths: Define two calculations to create new columns. calcAccOpenDate calculates the 'AccOpenDate' using the 'Month', 'Day', and 'Year' columns, and calcLoanLengthMonths computes 'LoanLengthMonths' as the product of 'LoanLength' and 12.
- tbl.computedVarsProgram = calcAccOpenDate + calcLoanLengthMonths: Modifies the CASTable object to include the specified calculated columns.
Specify Columns to Include Using Vars Parameter:
- tbl.vars = ['ID', 'AccNumber', 'Amount', 'AccOpenDate', 'LoanLengthMonths', 'LoanStatus', 'Category']: Specifies the columns to include in the CASTable object, limiting the output to the specified columns.
Preview the CAS Table:
- tbl.head(): Displays a preview of the modified CASTable, showing the specified columns and the calculated columns based on the defined conditions and computations.

8- Terminate session

conn.terminate()

terminate or close the connection to the SAS Viya server.