
Network Analysis in SAS Visual Analytics: Part 3 of Biostats in the Time of Coronavirus

Started ‎10-22-2020 by
Modified ‎10-30-2020 by

Maybe you’ve heard of the six degrees of separation theory: that everyone in the world is connected through six or fewer other people. The Network Analysis object in SAS Visual Analytics helps you glean valuable information from even highly complex network data. This article shows you how to get started!

Check out the other articles in this series: Sensitivity & Specificity in Disease Testing: Part 1 Statistical Concepts in the Time of Coronavirus, and SAS Helps You Understand Disease Spread: Part 2 Biostat Concepts in the Time of Coronavirus.

 

 

image001-1.png


 

Or maybe you’ve played the game of “6 degrees of Kevin Bacon,” where you name an actor and try to link that actor to Kevin Bacon through people who’ve been in films together. For example, if I select Kerry Washington, I figure out that I can get to Kevin Bacon in 3 leaps (links) through 2 people (nodes). Kerry Washington was in "Little Fires Everywhere" with Reese Witherspoon. Reese Witherspoon was in "Legally Blonde" with Luke Wilson. Luke Wilson was in "My Dog Skip" with Kevin Bacon. Boom. Done.

 

bacon4.png

 

But wait! Is there a shorter path? Kerry Washington was in "The Last King of Scotland" with James McAvoy (who, incidentally, donated £275,000 to the UK National Health Service (NHS) for PPE). James McAvoy was in "X-Men: First Class" with Kevin Bacon. Even shorter! Just 2 links, through only one person.

 

bacon5.png
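To make the "is there a shorter path?" question concrete, here is a minimal Python sketch that finds a shortest path with a breadth-first search. The links are the five co-starring pairs named above; the function names are my own, and this is only an illustration of the idea, not how SAS Visual Analytics computes paths internally.

```python
from collections import deque

# Co-starring links taken from the examples in this article (undirected).
CO_STARS = [
    ("Kerry Washington", "Reese Witherspoon"),  # Little Fires Everywhere
    ("Reese Witherspoon", "Luke Wilson"),       # Legally Blonde
    ("Luke Wilson", "Kevin Bacon"),             # My Dog Skip
    ("Kerry Washington", "James McAvoy"),       # The Last King of Scotland
    ("James McAvoy", "Kevin Bacon"),            # X-Men: First Class
]

def build_adjacency(links):
    """Turn a list of undirected links into a node -> neighbors mapping."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def shortest_path(adjacency, start, goal):
    """Breadth-first search: return one shortest path as a list of nodes, or None."""
    previous = {start: None}          # also serves as the "visited" set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:   # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(adjacency[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

path = shortest_path(build_adjacency(CO_STARS), "Kerry Washington", "Kevin Bacon")
print(path)           # nodes on the path, start to goal
print(len(path) - 1)  # number of links on the path
```

With these five links, the search returns the two-link route through James McAvoy rather than the three-link route through Reese Witherspoon and Luke Wilson.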

 

But neither of these paths gives the full picture. Real networks are complex: people or other entities can be linked in many overlapping ways.

 

If we try to view too many links at once, we get what is essentially a hairball. The more complex the network, the more you need network analytics to gain insights and make good decisions.

 

How people are linked is very important to the spread of disease. For diseases that can spread through droplets in the air, both how much time people spend together and how close they are to each other affect spread. Also, if the infector is actively talking, singing, laughing, and especially coughing or sneezing around the infectee, that can reduce the time needed for the infectee to receive a dose of the virus sufficient to contract the disease. Read more about disease spread in my previous article.

Using SAS Visual Analytics Network Analysis

Key Terms

A network analysis may be ungrouped or hierarchical.

  • Ungrouped networks use a traditional node-link structure, i.e., source and target data values.

     

    image006-1-1024x520.png

     

    Data for ungrouped network analysis must have one row for each source-target pair. For information on preparing data for network analysis, see Lesson 4: Performing Network Analysis in the SAS Education course Visual Analytics 2 for SAS Viya: Advanced (YVA285) by Nicole Ball and Lynn Matthews: https://support.sas.com/edu/schedules.html?id=17167&ctry=US

     

  • Hierarchical networks use a standard hierarchy structure of categorical values. Examples can be geographic (such as Country, Region, State, Zip Code) or non-geographic (such as Company, Product Line, Product).

     

    image007-1-1024x526.png

Node: An entity, e.g., an individual; also called a vertex.

 

Link: A connection between nodes; also may be called ties or edges. Links may be directed (asymmetric) or undirected (symmetric).
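As a rough illustration of the "one row for each source-target pair" requirement and of directed versus undirected links, here is a small Python sketch. The conversation log, the names other than Olivia, and the choice to sum minutes for repeated pairs are all assumptions made for this example; your own data preparation may call for something different.

```python
from collections import defaultdict

# Hypothetical conversation log: (talker, listener, minutes).
conversation_log = [
    ("Olivia", "Ben", 12),
    ("Olivia", "Ben", 5),    # same pair again, so it must be combined
    ("Ben", "Olivia", 7),    # reverse direction
    ("Olivia", "Cara", 20),
]

def one_row_per_pair(log, directed=True):
    """Sum minutes so that each source-target pair appears exactly once."""
    totals = defaultdict(int)
    for source, target, minutes in log:
        # For undirected links, (A, B) and (B, A) are the same pair.
        key = (source, target) if directed else tuple(sorted((source, target)))
        totals[key] += minutes
    return [(s, t, m) for (s, t), m in sorted(totals.items())]

directed_rows = one_row_per_pair(conversation_log, directed=True)
undirected_rows = one_row_per_pair(conversation_log, directed=False)

for row in directed_rows:
    print("directed  ", row)
for row in undirected_rows:
    print("undirected", row)
```

Treated as directed, Olivia talking to Ben and Ben talking to Olivia stay as two separate rows; treated as undirected, they collapse into a single link.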

Network Analysis Metrics Available in SAS Visual Analytics

  • Disconnected Network ID
  • Community
  • Reach Centrality
  • Closeness Centrality
  • Stress Centrality
  • Betweenness Centrality

image008-2-300x156.png

Let’s look first at an ungrouped example.

UNGROUPED

In an ungrouped network, there can be many links among many entities. Let’s use a hypothetical data set representing face-to-face conversations between individuals. Each individual is represented by a node, and each individual may be both a talker and a listener. The links represent the conversations. Most of the individuals are associated with one of four care centers.

 

image009-1-1024x770.png

 

We open a new page and drag the Network analysis object to the canvas. We assign roles as follows:

 

Source: Talker
Target: Listener
Color: Community
Link Width: Duration

Disconnected Network ID

The Disconnected Network ID can also be added to the Color role. It provides a label and color code for each separate disconnected group. Below we can see that most of the people are in the yellow network ID 1. However, two individuals, Chuck and Wilson, are stranded on an island, with no one to talk to but each other, and they are in the blue network ID 0.

 

withoutHanks.png
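Conceptually, a disconnected network ID is a label for each connected component of the network. The Python sketch below shows one way to assign such labels with a breadth-first search. Apart from Chuck, Wilson, and Olivia, the names are hypothetical, and the ID numbers here are simply assigned in discovery order, so they will not necessarily match the numbering SAS Visual Analytics produces.

```python
from collections import deque

links = [
    ("Olivia", "Ben"), ("Ben", "Cara"), ("Cara", "Olivia"),  # hypothetical main group
    ("Chuck", "Wilson"),                                     # the "island" pair
]

def network_ids(links):
    """Label every node with the ID of the connected component it belongs to."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    ids = {}
    next_id = 0
    for start in sorted(adjacency):
        if start in ids:
            continue                      # already reached from an earlier start
        ids[start] = next_id
        queue = deque([start])
        while queue:                      # flood-fill everything reachable from start
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in ids:
                    ids[neighbor] = next_id
                    queue.append(neighbor)
        next_id += 1
    return ids

ids = network_ids(links)
for name in sorted(ids):
    print(name, ids[name])
```

Chuck and Wilson end up sharing an ID that no one else has, which is exactly what the color coding in the diagram conveys.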

Community

Communities are highly connected clusters of nodes. Community is a derived attribute that is automatically created by the network analysis object, and it can be added to the Color role. Below we see 5 distinct communities, including one that is completely disconnected.

 

image011-1-1024x602.png

Reach, Closeness, Stress & Betweenness Centrality

Reach and closeness describe how far a node is, in links, from the other nodes. Stress and betweenness describe how often a node (e.g., an entity or individual) lies on the paths between other nodes. All of the metrics are based on shortest paths.

Reach Centrality

Reach is the number of links between a node and the farthest connected node (on the shortest path). The range includes whole numbers greater than or equal to 0.

 

Let’s filter our network to get a smaller group to illustrate this. Here we have filtered to view only the OpulentCare group, which coincides with one of our communities. We further filter to include only those conversations of 10 minutes or greater.

 

image012-1-1024x599.png

 

From our Roles pane, we select Reach Centrality for node Size. Most of the nodes have a Reach Centrality of 2, i.e., the shortest distance to the farthest node is 2 links. Only one node has a Reach Centrality of 1, because that node is directly connected to (one hop from) every other node.

 

image013-1-1024x653.png

 

From the Options pane, let’s add data labels. We see that Olivia is the one who has the lowest reach centrality.

 

image014-1-1024x598.png
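If you want to check the reach definition by hand, the following Python sketch computes, for every node, the number of links on the shortest path to its farthest connected node. The small network is hypothetical (only Olivia's name comes from the example above), and it is arranged so that one node is a single hop from everyone else.

```python
from collections import deque

# Hypothetical filtered network: Olivia is directly linked to everyone.
links = [("Olivia", "Ben"), ("Olivia", "Cara"), ("Olivia", "Dan"), ("Ben", "Cara")]

def build_adjacency(links):
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def distances_from(adjacency, start):
    """Shortest-path distance (in links) from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def reach_centrality(links):
    """Reach = distance to the farthest node that can be reached at all."""
    adjacency = build_adjacency(links)
    return {node: max(distances_from(adjacency, node).values()) for node in adjacency}

reach = reach_centrality(links)
for name in sorted(reach):
    print(name, reach[name])
```

Here Olivia has a reach of 1, and everyone else has a reach of 2, because Ben, Cara, and Dan each need to go through Olivia to get to at least one other person.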

Closeness Centrality

Closeness centrality measures how near an entity is to every other entity in the network, based on shortest-path distances. This metric is normalized to a range from 0 to 1, with 1 being the highest closeness. Thus the node that needs the most links to reach the others is normalized to 0 (0 closeness), and the node that needs the fewest links is normalized to 1. (For more information on normalizing, see Changing the Scale: Transforming Data.)

 

In real world examples of disease spread, those entities with high closeness scores could be considered “broadcasters” or “superspreaders” of information or of disease. We see that Olivia has the maximum closeness score of 1.

 

image015-1-1024x649.png
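Here is a minimal Python sketch of a closeness score that follows the description above: add up each node's shortest-path distances to all other nodes, then rescale so that the smallest total becomes 1 and the largest becomes 0. The min-max rescaling is my reading of that description and is an assumption; the exact formula SAS Visual Analytics uses may differ. The network is hypothetical and assumed to be fully connected.

```python
from collections import deque

# Hypothetical connected network (only Olivia's name comes from the article).
links = [("Olivia", "Ben"), ("Olivia", "Cara"), ("Olivia", "Dan"), ("Ben", "Cara")]

def build_adjacency(links):
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def total_distance(adjacency, start):
    """Sum of shortest-path distances from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return sum(dist.values())

def closeness_scores(links):
    """Min-max rescale (assumed): smallest total distance -> 1, largest -> 0."""
    adjacency = build_adjacency(links)
    totals = {node: total_distance(adjacency, node) for node in adjacency}
    low, high = min(totals.values()), max(totals.values())
    if high == low:                       # every node is equally close
        return {node: 1.0 for node in totals}
    return {node: (high - totals[node]) / (high - low) for node in totals}

closeness = closeness_scores(links)
for name in sorted(closeness):
    print(name, closeness[name])
```

In this toy network Olivia reaches everyone in one link and scores 1, while Dan, who reaches two of the three others only through Olivia, scores 0.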

Stress Centrality

Stress indicates how much shortest-path traffic passes through a node. Specifically, stress centrality identifies the nodes that are crossed most frequently when taking the shortest paths between other nodes, regardless of where that stress originates. Its value is normalized to a range from 0 to 1, so the most frequently crossed (i.e., most trafficked) node or nodes have a stress centrality of 1.

 

image016-1-1024x820.png

 

image030-1024x820.png
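To see what "crossed most frequently" means, this Python sketch counts, for every pair of nodes, the shortest paths that pass through each other node, and then divides by the largest count so the busiest node scores 1. Dividing by the maximum is an assumption that matches the 0-to-1 range described above, and the network itself is hypothetical; treat this as an illustration rather than the SAS implementation.

```python
from collections import deque
from itertools import combinations

# Hypothetical network: Dan can only be reached through Olivia.
links = [("Olivia", "Ben"), ("Olivia", "Cara"), ("Olivia", "Dan"), ("Ben", "Cara")]

def build_adjacency(links):
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def bfs_counts(adjacency, start):
    """Return (distance, number of shortest paths) from start to each reachable node."""
    dist, sigma = {start: 0}, {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                sigma[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                sigma[neighbor] += sigma[node]
    return dist, sigma

def stress_centrality(links):
    adjacency = build_adjacency(links)
    info = {node: bfs_counts(adjacency, node) for node in adjacency}
    raw = {node: 0 for node in adjacency}
    for s, t in combinations(sorted(adjacency), 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue                                  # s and t are not connected
        for v in adjacency:
            if v in (s, t) or v not in dist_s:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:    # v lies on a shortest s-t path
                raw[v] += sigma_s[v] * sigma_t[v]     # count those paths
    top = max(raw.values())
    return {node: (raw[node] / top if top else 0.0) for node in raw}

stress = stress_centrality(links)
for name in sorted(stress):
    print(name, stress[name])
```

Only Olivia sits on a shortest path between two other people (Ben to Dan and Cara to Dan), so she scores 1 and everyone else scores 0.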

Betweenness Centrality

Like stress centrality, betweenness centrality identifies the nodes that are crossed most frequently using shortest distance paths, and it also ranges from 0 to 1. However, betweenness centrality also accounts for multiple shortest paths between two nodes.

 

Again, at least one node in the network has a value of 1. The highest betweenness scores identify nodes that are critical to many origin-destination pairs, but that are not necessarily stressed the most. Betweenness measures the number of shortest paths an entity is on, which in turn indicates how often entities can reach each other through it. A high score indicates a likely path for the flow of whatever is being measured, such as information or disease.

 

image018-1024x821.png
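The difference between stress and betweenness only shows up when two nodes are joined by more than one shortest path. In the Python sketch below, a hypothetical five-node network contains a "diamond" (A to D via B or via C), and one function computes both scores: stress counts every shortest path through a node, while betweenness counts only that node's share of the shortest paths for each pair. Scaling by the maximum is an assumption consistent with the 0-to-1 range described above, not a confirmed SAS formula.

```python
from collections import deque
from itertools import combinations

# Hypothetical network with two equally short routes from A to D.
links = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]

def build_adjacency(links):
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency

def bfs_counts(adjacency, start):
    """Return (distance, number of shortest paths) from start to each reachable node."""
    dist, sigma = {start: 0}, {start: 1}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                sigma[neighbor] = 0
                queue.append(neighbor)
            if dist[neighbor] == dist[node] + 1:
                sigma[neighbor] += sigma[node]
    return dist, sigma

def path_centrality(links, share_paths):
    """share_paths=False gives stress; share_paths=True gives betweenness."""
    adjacency = build_adjacency(links)
    info = {node: bfs_counts(adjacency, node) for node in adjacency}
    raw = {node: 0.0 for node in adjacency}
    for s, t in combinations(sorted(adjacency), 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue                                  # s and t are not connected
        for v in adjacency:
            if v in (s, t) or v not in dist_s:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:    # v lies on a shortest s-t path
                through_v = sigma_s[v] * sigma_t[v]
                # Betweenness divides by the total number of shortest s-t paths.
                raw[v] += through_v / sigma_s[t] if share_paths else through_v
    top = max(raw.values())
    return {node: (raw[node] / top if top else 0.0) for node in raw}

stress = path_centrality(links, share_paths=False)
betweenness = path_centrality(links, share_paths=True)
for name in sorted(stress):
    print(name, round(stress[name], 3), round(betweenness[name], 3))
```

D scores 1 on both metrics, but B and C score lower on betweenness than on stress, because each of them carries only half of the traffic between A and D.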

 

If these nodes represent people who have face-to-face conversations, then based on the centrality measures alone, which node has more potential to spread disease than the other nodes?

 

image019.png

 

For more details and information on how these metrics are calculated, see this video.

HIERARCHICAL

Hierarchical analysis requires a standard hierarchy structure of categorical values. Organizational and regional structures are common hierarchies.

 

For this illustration, let’s say we know that the US Centers for Disease Control (CDC) spends $5.7 billion a year on contracts. CDC lies within the Department of Health and Human Services (HHS). Maybe we want to get a picture of overall US contract spending. We can do this with a hierarchical network.

 

Hierarchical networks are usually fairly self-explanatory. For example, the US Spending data includes four hierarchical levels: Country, Type, Department, and Branch. These are shown below in a List Table.

 

image020-1024x516.png
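To picture how a table with one column per level becomes a network, the Python sketch below turns each row into parent-child links, one link per adjacent pair of levels. The level names come from the data described above, but the rows (including the Type values and the NIH and Army branches) are hypothetical stand-ins, not the actual US Spending data, and this is only a conceptual sketch of what a hierarchical layout represents.

```python
LEVELS = ["Country", "Type", "Department", "Branch"]

# Hypothetical rows, one value per level (not the real US Spending data).
rows = [
    ("US", "Civilian", "HHS", "CDC"),
    ("US", "Civilian", "HHS", "NIH"),
    ("US", "Defense", "DoD", "Army"),
]

def hierarchy_links(rows):
    """Return unique (parent, child) links, each node identified by its full path."""
    links = set()
    for row in rows:
        for depth in range(len(row) - 1):
            parent = tuple(row[: depth + 1])   # e.g. ("US", "Civilian")
            child = tuple(row[: depth + 2])    # e.g. ("US", "Civilian", "HHS")
            links.add((parent, child))
    return sorted(links)

links = hierarchy_links(rows)
for parent, child in links:
    level = LEVELS[len(child) - 1]
    print(f"{' > '.join(parent)}  ->  {child[-1]}  ({level})")
```

Identifying each node by its full path keeps two branches with the same name under different departments from being merged into one node.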

 

You can create a hierarchy in SAS Visual Analytics easily, by selecting + New data item, Hierarchy.

 

image021.png

 

Then double-click each item in the order you want the hierarchy, or use the plus-arrow icon to move the items to the right.

 

image022.png

 

This creates your hierarchy.

 

image023.png

 

Use the + tab to create a new page. From the Objects pane, drag your Network analysis icon to the canvas.

 

image024-1024x594.png

 

Open the Options tab on the right and select Type, Hierarchical.

 

image025-1024x591.png

 

In the Roles tab on the right, under Levels, select your hierarchy.

 

image026.png

 

Return to the Options pane and use the slider to change Additional levels to the maximum (in this case 3). Under Network Diagram (also in the Options pane), check Data labels, and change the Text style to font size 11, Bold. Voila, you have a hierarchical network diagram.

 

image027-1024x599.png

 

You can add information using the Roles tab, for example, setting Size to ContractSpendingMillions and Color to NumberOfPersonnel.

 

Notice that if you don’t like where items are displayed, you can select a node and move it. If you roll over an item, you will see its data tip. In the data tip below, for example, we see that annual Contract Spending for the US Department of Defense is $358,300,000,000.

 

image028-1024x876.png

 

The data used here are publicly available from the internet. Defense contracts may be for services (e.g., operations, maintenance, R&D) or products (e.g., planes, radios). Examples of large contractors are companies like Lockheed Martin, Boeing, General Dynamics, Raytheon, United Technologies, and Huntington. If you want to create your own hierarchical network to see which contractors get the most US tax dollars, go ahead! You can find the details on the internet, and now you have the skills to create a hierarchical network!

 

Source: https://www.youtube.com/watch?v=dStT9Au2bN0

 

Sneak peek of Carlos’s presentation:

 

image029-1024x565.png

For More Information

See the following SAS education courses:

Version history
Last update:
‎10-30-2020 10:30 AM
Updated by:
