Lookup tables are a common efficiency tool. When a process produces hordes of data, you can preserve memory and improve performance by storing that information in multiple tables (such as customer data and order data). On the other hand, your data may include one or more codes from a common reference table, like a set of medical billing codes and their meanings. As the name suggests, lookup tables are used to perform table lookups when you want to add reference data to a table.
If you’re a low-code/no-code SAS user, many methods are available for performing table lookups. In this post, I’ll use a variety of tables to demonstrate four table lookup methods available in SAS Studio Flows.
Scenario
Through this post, I’ll be using one main table and seven different lookup tables. Observe the table relationships below:
Select any image to see a larger version. Mobile users: To view the images, select the "Full" version at the bottom of the page.
The main table, CUSTOMER, contains a list of all customers and their demographics (totaling 100,004 rows).
Three of the lookup tables (TRANSACTION, MERCHANT, and BANK) store purchase related information. I’ll perform table lookups to enhance the customer transaction records with information like where the transaction occurred and which bank approved the transaction.
The remaining four lookup tables (STATEPOPULATION, DIVISIONCODE, REGIONCODE, and STATECODE) store geographic information. The set of US population data uses abbreviations and codes to refer to the state, region, and division in each row. I’ll perform table lookups to replace these codes with descriptive labels.
Both result sets will be used to perform a final lookup with the customer data.
Lookup Custom Step
First, I’ll simply add the bank name to each transaction based on the listed bank ID number.
The Lookup custom step is perfect for this task, because it is easy to configure and returns one column from the lookup table. This step only requires you to select the columns to keep from the main table, the lookup key columns, and the return column from the lookup table. When submitted, this step will use hash objects to perform the lookup. However, note that custom steps are created and maintained by SAS users and are not available out-of-the-box. Visit the custom step's README documentation for a full description of step capabilities, and click here to learn how to use custom steps from the SAS Studio Custom Step GitHub Repository.
I’ll keep all columns from the main table, use BankID as the key column, and return the Name column.
The results show that the bank name was successfully added for each transaction with an associated BankID.
Query Step
Next, I’ll use the Query step to add merchant data to each transaction as well.
The Query step is multifaceted and can be used for tasks like sorting, deduplicating, and aggregating data. Another primary feature of this step is performing joins, which are another table lookup method. First, configure two or more input ports on the Query node. Then, you can specify join types and conditions on the Join tab. When submitted, this step will run a PROC SQL (or PROC FEDSQL) query. Visit the documentation for a full description of step capabilities.
I want to join the TRANSACTION_PARTIAL results with MERCHANT in a left join where MerchantID is the same in both tables. In addition, I’ve selected all columns from both tables for output (excluding duplicates).
The results show that descriptive merchant information has been added for each transaction, including the merchant’s name, type, and provided service. All the transaction data has been successfully combined into one output table, TRANSACTION_FULL.
Calculate Columns Step (with custom formats)
Now, I’ll switch to working on my state population data. I’ll create formats based on the STATECODE, REGIONCODE, and DIVISIONCODE tables, then apply those formats to STATEPOPULATION using the Calculate Columns step.
Each table is a simple lookup table containing a code and its label. For example, the StateCode NC has the label North Carolina, or the RegionCode 3 has the label South Region. I’ll convert the tables to formats with the following code, which can be executed with a SAS Program step.
proc sql;
create view region_fmt as
select "$region" as FMTNAME, RegionCode as Start, RegionName as Label
from regioncode;
quit;
proc format cntlin=region_fmt;
run;
proc sql;
create view division_fmt as
select "$division" as FMTNAME, DivisionCode as Start, DivisionName as Label
from divisioncode;
quit;
proc format cntlin=division_fmt;
run;
proc sql;
create view state_fmt as
select "$state" as FMTNAME, StateCode as Start, StateName as Label
from statecode;
quit;
proc format cntlin=state_fmt;
run;
Once the formats have been created, they can be applied with the Calculate Columns step. This step has an expression builder which can use any valid DATA step expression to create new columns. When submitted, this step will run DATA step code. Visit the documentation for a full description of step capabilities.
For this method, I’ll use the PUT function to apply the three new formats based on their original code columns, like so:
PUT(Region,$region.)
The results show that STATEPOPULATION has been enhanced with descriptive RegionName, DivisionName, and StateName for each state entry.
Lookup Step
Finally, I’ll perform table lookups to add TRANSACTION_FULL and STATES_FULL to the CUSTOMER table with the pre-built Lookup step.
The Lookup step became available starting in LTS 2024.09, making it the newest SAS Studio Engineer step. With this step, you can input one or more lookup tables, and you can control settings and features like key columns, filters, and exceptions for each lookup table. Optional error tables can be included in the results if you wish to analyze any rows that were not included in a successful table lookup. Visit the documentation for a full description of step capabilities.
After connecting TRANSACTION_FULL and STATES_FULL as lookup tables, I’ll select the key columns for each: TRANSACTION_FULL will use AccountID, and STATES_FULL will use Name (which matches with CUSTOMER.State).
In addition, I’ll filter TRANSACTION_FULL. I know that not every customer will have an AccountID. In addition, not every transaction has an associated AccountID. However, I don’t want to create a false join based on missing AccountID values, so I’ll add the filter:
AccountID is not null
Note that filters need to be written in SQL WHERE clause syntax, but must exclude the initial WHERE keyword.
Exceptions can be set to handle any possible errors during processing. Three exception conditions are possible: Lookup table missing, Lookup table contains no records, or Lookup value not found. An associated action can be set to execute when exceptions occur during processing. By default, when the lookup table is missing or the lookup table contains no records, the step will Abort process. I’ve added some processing for when the lookup value is not found. The main table’s row will be added to the error table and to the exception table, and the target column will be set to missing in the output table. However, you could choose other actions such as skipping the row or setting the column to a specific value.
Setting exceptions correlates directly to configuring error tables on the Error Tables tab. These are enabled by checking the box to create the table. An error table will simply contain the source table row that triggered the error. You can select one or more columns from the source table to include (I selected all columns) and you can also choose to limit the number of errors to store. If selected, this will surface additional options for error limits.
An exception table may seem similar, but it has different features. By default, it will include three columns containing the lookup table name, the exception condition that was triggered, and the exception action taken. You can choose to add target table columns here to help identify the record associated with the triggered exception. I’ve added identifying information, like customer AccountID, BankID, FirstName, and LastName.
The results contain a mega-table lookup result. This data contains all information regarding each customer, any associated transactions, and state demographics. This makes for a total of 106,680 columns and 54 rows. This result can be queried further to answer questions like: How many customers made more than one purchase, and what are their associated incomes? Did a merchant have any transactions from customers who were out-of-state? What percentage of the total recorded spending comes from customers in each state, region, and division?
Our error tables are included in the output as well. This preview of the EXCEPTIONS table shows that many customers appeared not to have an AccountID, meaning that they could not be joined with any transaction data. Based on my settings, the error table will have the same number of rows and can be used to answer other questions about the data.
Considerations & Summary
Each of these lookup steps use different underlying methods and vary in features, including those to control output and results. The best step for your needs will depend on how much control you want over your lookup settings and results.
These steps also vary based on availability by license. Find information on SAS Studio licensing and included steps here.
In this post, I demonstrated how to use four different methods to perform table lookups: the Lookup custom step, the Query step, custom formats and the Calculate Columns step, and the SAS Studio Engineer Lookup Step.
Want to know more about data manipulation in SAS Studio? Check out my previous blogs in this series on appending, sorting, deduplicating, and aggregating data.
Do you need to perform table lookups often? Did you learn a new lookup method today? Let me know!
Find more articles from SAS Global Enablement and Learning here.
... View more