This is part of a new series of technical use cases, designed to help SAS users solve a particular business or technical challenge.
The challenge: How to segment customers based on their lifestyle, spending habits, and engagement patterns to develop targeted loyalty programs that increase customer retention, cross-sell opportunities, and brand advocacy, while optimizing the ability to offer personalized rewards and services that align with each customer segment.
Products used: SAS Studio and Visual Analytics (Visual Statistics provides advanced segmentation and predictive analytics to improve the accuracy of lifestyle segmentation). The Visual Analytics components are covered in Part 2 of this series.
SAS Data Explorer
Step 1: Import Customer Data into CAS: Open Manage Data in SAS Viya (this launches SAS Data Explorer)
In the Import tab, choose the source as Local Files (if uploading from your computer) and select the raw data files – for example, a customers.csv (with demographic and loyalty info) and a transactions.csv (with transaction histories). Specify the target CAS library (e.g., CASUSER for your personal session) and table name for each or accept defaults. Click Import to load each file individually or Import All to load all files at once. The data will be loaded into SAS Viya's in-memory environment (CAS) as tables (e.g., CASUSER.CUSTOMERS and CASUSER.TRANSACTIONS).
If a table with the same name already exists, you can either import under a different name or check the Replace option to overwrite it. Result: The raw customer profile and transaction data are now available in CAS as in-memory tables.
SAS Studio
Step 2: Verify Data in SAS Studio and start a CAS session: Launch SAS Studio (from the SAS landing page or applications menu, select Develop Code and Flows)
From the Start Page, select “Program in SAS” to create a new SAS program, enter the following code and press F3 to run it:
cas;
caslib _ALL_ assign;
Check the Log for confirmation messages (you should see notes such as "The session CASAUTO connected successfully" and a list of assigned caslibs). Now test the setup to ensure you can work with the CAS data: for example, run a simple proc print data=CASUSER.CUSTOMERS (obs=5); to print the first 5 rows of the CUSTOMERS table. If the output appears as expected, your CAS session is active and the data is accessible through SAS code. From this point on, any DATA steps or CAS-enabled procedures you run against tables loaded into CAS will execute in the CAS environment; code that references tables not in CAS runs locally on the SAS Compute server.
In SAS Studio's left pane, expand the Libraries section and locate the CAS library (e.g., CASUSER) where you imported the data. You should see your tables (e.g., CUSTOMERS and TRANSACTIONS). Open each table to inspect the contents (for instance, right-click on CUSTOMERS and choose Open). Verify that all expected columns are present and correctly populated: for example, the CUSTOMERS table should have fields like Customer_ID, Name, Age, Gender, Location, Loyalty_Tier, etc., and the TRANSACTIONS table should have Customer_ID (to link to customers), Transaction_Date, Transaction_Amount, Merchant_Category, Channel_Type, etc.
Check a few sample rows to ensure values look reasonable (e.g., numeric fields contain numbers, dates are in proper format). Also confirm the row count roughly matches the source data (e.g., if transactions.csv had 50,000 records, the CAS table shows ~50,000 rows). This can be done with the following code:
proc contents data=casuser.customers;
run;
Or,
proc fedsql sessref=casauto;
select count(*) from casuser.customers;
quit;
This step confirms the data was imported correctly and is accessible for processing.
Step 3: (Optional) Load Data via PROC CASUTIL or PROC IMPORT: As an alternative to the interactive import in Step 1, you can use SAS code to load data into CAS (useful for automation or repeatable scripts). For example, to load the transactions CSV via code, you could use:
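A sketch of such a load step is shown below. The file path is a placeholder; point it at wherever your CSV actually resides on the server.

```sas
/* Load transactions.csv into CAS as CASUSER.TRANSACTIONS.         */
/* The path below is a placeholder - adjust it to your file's      */
/* actual location.                                                */
proc casutil;
   load file="/path/to/transactions.csv"
        outcaslib="casuser" casout="transactions" replace;
quit;
```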
This code loads the CSV file from the specified path into the CASUSER caslib as a table named TRANSACTIONS (replacing any existing table with that name).
You would execute a similar proc casutil step for the customers.csv file. After running, check the SAS log for a note confirming the load (e.g., it will report the number of rows and columns loaded and the table name). The end result is the same: the data resides in CAS tables ready for use. This step is optional if you already imported the data via the GUI, but it's good to know for creating automated workflows.
Step 4: Clean and Merge Data: Prepare the data for analysis by cleaning inconsistencies and combining where necessary. First, standardize column names and data types — ensure that all field names are valid and easy to read (for example, SAS automatically converts spaces to underscores, so "Annual Income" becomes Annual_Income; rename any oddly formatted names if needed).
Verify numeric fields are truly numeric in CAS (if a column imported as character, convert it with an INPUT function in a DATA step, or re-import with more rows scanned for type detection, for example via the GUESSINGROWS= option in PROC IMPORT, so the import makes a better guess). Address missing values and outliers as appropriate (you might, for instance, fill missing ages with a median or flag them, and check for negative transaction amounts or other anomalies).
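As a minimal sketch of such a cleanup, assume an age column imported as character under the hypothetical name Age_Char, and assume a median age of 40 (both the column name and the value are illustrative, not from the source data):

```sas
/* Convert a character age field to numeric and handle missing values. */
/* Age_Char and the median value 40 are hypothetical placeholders.     */
data casuser.customers_clean;
   set casuser.customers;
   Age = input(Age_Char, 8.);          /* character-to-numeric conversion */
   Age_Missing_Flag = missing(Age);    /* flag rows that had no age       */
   if Age_Missing_Flag then Age = 40;  /* fill with an assumed median     */
   drop Age_Char;
run;
```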
Next, merge the customer and transaction tables to create a unified dataset. Typically, you perform a join on the Customer_ID field. In SAS Studio, you can use a PROC FEDSQL step or a DATA step merge. For example, using PROC FEDSQL:
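One possible form of that join is sketched below; the transaction column names follow the schema described earlier and are assumptions, so adjust them to your actual data.

```sas
/* Join customer profiles to their transactions on Customer_ID.      */
/* Column names (Transaction_Amount, Channel_Type, etc.) are assumed. */
proc fedsql sessref=casauto;
   create table casuser.customer_combined as
   select c.*,
          t.Transaction_Date,
          t.Transaction_Amount,
          t.Merchant_Category,
          t.Channel_Type
   from casuser.customers c
   left join casuser.transactions t
   on c.Customer_ID = t.Customer_ID;
quit;
```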
This left join keeps every customer record (c) and pairs it with that customer's transaction records (t).
The result CASUSER.CUSTOMER_COMBINED will have multiple rows per customer (one for each transaction, including customers with no transactions if any). After merging, perform some quick checks on this combined table: verify that for a given Customer_ID, the demographic fields (like Name, Loyalty_Tier) are consistent across all their transaction rows, and confirm that the number of rows matches the number of transactions (to ensure no duplication or loss occurred during the join). You can check for duplication using this code:
proc sort data=casuser.customers
out=casuser.dupe_check
nodupkey
dupout=casuser.dupes;
by customer_id;
run;
This code checks for duplicate customer IDs in CASUSER.CUSTOMERS, writes a deduplicated copy to a new table, and saves the duplicate rows to a separate CAS table. If the DUPES table contains rows, duplicates exist. The original table remains intact and unmodified. If you wish to deduplicate the original table in place, remove the OUT= option from PROC SORT.
By the end of this step, you have a clean, integrated dataset of customer information and their transactions, which will serve as the basis for calculating segmentation features.
Step 5: Feature Engineering – Create Customer-Level Metrics: Derive aggregated metrics that capture each customer’s engagement and spending habits, for use in segmentation. We need to summarize the transaction data up to the customer level. Using the combined dataset from Step 4, calculate features for each Customer_ID. Key metrics include:
Total_Spend: total monetary amount the customer has spent (sum of all transaction amounts)
Transaction_Count: total number of transactions the customer made (frequency of purchase).
Avg_Transaction_Value: average amount per transaction (Total_Spend divided by Transaction_Count).
Distinct_Merchant_Categories: number of unique merchant categories in which the customer has made purchases (breadth of spending interests).
Pct_Spend_Travel / Pct_Spend_Dining: percentage of the customer's spend devoted to specific categories like Travel or Dining (for example, if $200 of $1000 total is travel-related, Pct_Spend_Travel = 20%). These indicate lifestyle preferences (e.g., a high travel percentage might identify a frequent traveler).
Online_Txn_Count: number of transactions conducted via online channels (versus in-branch), as an indicator of digital engagement. You could also compute an online transaction ratio (Online_Txn_Count / Transaction_Count).
Last_Transaction_Date: the most recent transaction date for the customer, which can be transformed into a recency measure (e.g., days since last purchase).
Also, carry forward key demographic or profile attributes for reference (such as Age, Gender, or Loyalty_Tier) into the customer-level table (since these are constant per customer, you might take a MIN or MAX in the aggregation just to include them).
You can obtain these metrics using a single SQL query with GROUP BY, or with SAS procedures. For example, a PROC FEDSQL might look like:
proc fedsql sessref=casauto;
create table casuser.customer_features as
select Customer_ID,
sum(Transaction_Amount) as Total_Spend,
count(*) as Transaction_Count,
avg(Transaction_Amount) as Avg_Transaction_Value,
count(distinct Merchant_Category) as Distinct_Merchant_Categories,
sum(case when Merchant_Category = 'Travel' then Transaction_Amount else 0 end) / sum(Transaction_Amount) as Pct_Spend_Travel,
sum(case when Merchant_Category = 'Dining' then Transaction_Amount else 0 end) / sum(Transaction_Amount) as Pct_Spend_Dining,
sum(case when Channel_Type = 'Online' then 1 else 0 end) as Online_Txn_Count,
cast(sum(case when Channel_Type = 'Online' then 1 else 0 end) as double) / count(*) as Online_Txn_Ratio,
max(Transaction_Date) as Last_Transaction_Date
from casuser.customer_combined
group by Customer_ID
;
quit;
This creates a new table CASUSER.CUSTOMER_FEATURES with one row per Customer_ID and the aggregated values.
After running such a step, inspect the resulting table (e.g., view a few rows or run PROC MEANS) to ensure the calculations make sense. For instance, check that for each customer, Total_Spend roughly equals Avg_Transaction_Value * Transaction_Count (barring rounding errors), and that percentages like Pct_Spend_Travel are between 0 and 1 (or 0–100% if formatted as percent). Now you have a concise customer-level feature dataset that quantitatively describes each customer's behavior and lifestyle, ready for clustering analysis.
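For example, a quick PROC MEANS over the new table (variable names match the query above) surfaces out-of-range values at a glance; the percentage and ratio columns should fall between 0 and 1:

```sas
/* Sanity-check the aggregated features: look for missing values,  */
/* negative spend, or ratios outside the 0-1 range.                */
proc means data=casuser.customer_features n nmiss min mean max;
   var Total_Spend Transaction_Count Avg_Transaction_Value
       Pct_Spend_Travel Pct_Spend_Dining Online_Txn_Ratio;
run;
```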
Step 6: Promote the Customer Feature Table: The new customer-level table (e.g., CUSTOMER_FEATURES) currently exists in your session’s CAS library and would disappear when the session ends. To use it in Visual Analytics and keep it accessible, promote it to global status in CAS. In SAS Studio, run a PROC CASUTIL promote action. For example:
proc casutil;
promote casdata='customer_features' incaslib='casuser' outcaslib='casuser';
quit;
This command takes the session-scoped CASUSER.CUSTOMER_FEATURES table and promotes it to a globally scoped table of the same name in the CASUSER library.
Note that even after promotion, tables in CASUSER are visible only to you, since CASUSER is your personal library. To share the data with other users, promote it to a global caslib that they can access.
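For example, assuming a shared caslib named Public exists in your environment (a common default, but check with your administrator):

```sas
/* Promote the session table into a shared caslib so other users    */
/* and Visual Analytics reports can see it. 'Public' is assumed.    */
proc casutil;
   promote casdata='customer_features' incaslib='casuser'
           outcaslib='public';
quit;
```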
In part 2, we'll show you how to create segmentation reports in SAS Visual Analytics.