BookmarkSubscribeRSS Feed

Data Processing with Union Rows in SAS Studio Flows

Started ‎02-14-2024 by
Modified ‎02-14-2024 by
Views 148

Have you ever needed to combine data from multiple sources, like a database and a spreadsheet? How about finding a subset of data from multiple data sets, like the customers that only shop in person but never online, or the customers that have purchased items from every department in the store at least once? What if you had a tool that could do both of those things at once?

 

As SAS Studio Flow capabilities continue to expand, so do your low code/no code options for data processing and consolidation! Combine your data in a few clicks with the Union Rows step, available starting in the Viya 2023.05 stable or LTS 2023.10 releases. This step enables you to combine data from multiple sources (when compatible) using seven different SQL set operators. Additional options allow you to customize the step based on your data needs.

 

A SAS Studio Analyst or SAS Studio Engineer license is required to use Union Rows.

 

Overview

 

The Union Rows step requires at least two input tables, but you can add more input ports as needed. This step is configured on the Options tab, where you select the desired set operator and column matching method. Columns can be matched by position or by name but be mindful that the column types should also align according to the column matching method.

 

01_grbarn_unionrows_1.png

 Select any image to see a larger version.

Mobile users: To view the images, select the "Full" version at the bottom of the page.

 

Then, for each source table, you can choose to include distinct rows only or filter rows with a where clause.

 

02_grbarn_unionrows_2.png

 

When running this step, SAS generates PROC SQL code that creates a table based on the set operator result, which combines multiple query results. Various keywords are added based on step options. ‘DISTINCT’ selects unique rows only, ‘ALL’ includes duplicate matching rows in the results, and ‘CORR’ matches columns based on name. WHERE clauses are added when applicable.

 

Demo: School rosters

 

For example, say I’m working with class roster data from a secondary school. I want to compare the number of students in honors classes against the number of students participating in school clubs. I have rosters for two honors classes (Algebra and Calculus) and three clubs (Art, Cross Country, and Zoology).

 

First, I’ll use two Union Rows steps to combine the honors rosters, then the club rosters. Both steps use the UNION operator and store the results in a new table.

 

03_grbarn_unionrows_3.png

 

Since I have three club rosters, I’ll add another input port to Union Rows. This means the step will perform two unions to combine all three tables.

 

04_grbarn_unionrows_4.png

 

Now, I can answer some questions I have about this data with additional Union Rows nodes.

 

For example, I can use the EXCEPT operator to find all the students who are in honors math but no clubs. Specifically, I'll use ALL_MATH_H as my first table, select the EXCEPT operator, then use ALL_CLUBS as my second table.

 

05_grbarn_unionrows_5.png

 

The result table has zero rows, so all honors math kids are also in clubs. This may suggest that honors students are more likely to participate in extracurricular activities.

 

06_grbarn_unionrows_6.png

 

I could also find the opposite result (students who are in clubs but not honors math) simply by switching the order of the source tables.

 

07_grbarn_unionrows_7.png

 

8 students are not in honors math. I can do future research to see if they’re in any other honors classes.

 

08_grbarn_unionrows_8.png

 

I’d also like to see which students are in honors math and at least one club. I can find this with the INTERSECT operator.

 

09_grbarn_unionrows_9.png

 

10 students participate in both.

 

10_grbarn_unionrows_10.png

 

Now, I’d like to see which students are in honors math and all three clubs. I can change my initial ALL_CLUBS logic to create a THREE_CLUBS table using the INTERSECT operator.

 

11_grbarn_unionrows_11_a.png

 

Only two students, Judy and Mary, are in all three clubs.

 

12_grbarn_unionrows_11_b.png

 

Then, I can combine ALL_MATH_H and THREE_CLUBS with an intersect to get my final answer.

 

13_grbarn_unionrows_11_c.png

 

Both Judy and Mary are in the results. This suggests that they’re exemplary, well-rounded students who might be eligible for certain awards or scholarships.

 

14_grbarn_unionrows_12.png

 

Alternatively, I could've found this result by doing a three-way intersect with ART, CROSS_COUNTRY, ZOOLOGY, and ALL_MATH_H, but I wanted to create and save the THREE_CLUBS table separately for additional processing later.

 

Summary

 

This is only one example of the possibilities with the Union Rows step. Though the logic itself is simple, this step can be used to handle some complex data prep and data consolidation tasks.

 

For more information on this step, visit the documentation.

 

Interested in SAS Studio Flows? Check out my previous post on the Merge Table step, plus other posts under the SAS Studio Flow tag.

 

 

Find more articles from SAS Global Enablement and Learning here.

Version history
Last update:
‎02-14-2024 03:28 PM
Updated by:
Contributors

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

Free course: Data Literacy Essentials

Data Literacy is for all, even absolute beginners. Jump on board with this free e-learning  and boost your career prospects.

Get Started