
Using SAS Viya to Understand Voter Turnout: An End-to-End Example

Started 04-26-2019

I recently heard on the radio that we were having record voter turnout here in the United States. Record turnout! I was so excited! Yes, record turnout, the radio announcer repeated. A whopping 49 percent. Wait, what? Did I hear that correctly? Forty-nine percent? THAT is record turnout?

 

I quickly pulled out an envelope and did some back-of-the-envelope calculations … carry the one … and … that’s LESS THAN HALF. Yes, you heard right: less than half of eligible voters voted in the US general elections in November 2018. And that was record turnout for a midterm election.

 

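I decided to use SAS Visual Analytics, SAS Visual Statistics, and SAS Visual Data Mining and Machine Learning (abbreviated VA|VS|VDMML below), together with SAS Model Manager and SAS Decision Manager, to take a preliminary look at voter turnout here in the US. I pulled a couple of small, publicly available datasets off the internet to illustrate the process.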

 

For convenience, I created a small dataset covering 1980 to 2014 with just a few variables: state, year, education success score for the state, percent high school graduates, percent ineligible felons, voting eligible population, voting age population, etc.

 

Explore Data

First let’s use a geo map to look at voter turnout by state.

 

1-1024x661.jpg


 

My home state of Maryland falls about in the middle. Pretty shabby.

 

We can use ranking to look at the top 10 and bottom 10 states.

 

2-1024x643.jpg
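Outside the VA interface, the ranking step can be sketched in a few lines of Python. This is only a minimal illustration, not what VA does internally. The function name rank_states is hypothetical, and the three values are the presidential-year turnout figures quoted in the By the Way section of this article; a real ranking would of course use all the states.

```python
# Presidential-year turnout percentages quoted later in this article.
turnout = {"Minnesota": 64.6, "Wisconsin": 57.7, "Michigan": 52.8}


def rank_states(turnout_by_state, n):
    """Return (top_n, bottom_n) as lists of (state, turnout) pairs.

    top_n is ordered highest first; bottom_n is ordered lowest first.
    """
    if n <= 0:
        return [], []
    ordered = sorted(turnout_by_state.items(), key=lambda kv: kv[1], reverse=True)
    return ordered[:n], ordered[-n:][::-1]


top, bottom = rank_states(turnout, 1)
print("Highest:", top)
print("Lowest:", bottom)
```

With the full dataset, calling rank_states with n set to 10 would give the same top 10 and bottom 10 split shown in the chart.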

 

Let’s look at voter turnout by year as a percentage of the voting eligible population (VEP) and the voting age population (VAP).
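The two turnout rates differ only in the denominator. Here is a minimal sketch of the arithmetic in Python, assuming a hypothetical helper named turnout_pct and made-up counts (they are not taken from my dataset). VEP is typically smaller than VAP because it leaves out people who are of voting age but not eligible to vote, so the VEP-based rate usually comes out higher.

```python
def turnout_pct(ballots_cast, population):
    """Turnout as a percentage of the given population base."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 100.0 * ballots_cast / population


# Made-up illustrative counts, not real data.
ballots = 500_000
vap = 1_250_000  # voting age population
vep = 1_000_000  # voting eligible population (assumed smaller than VAP)

vap_rate = turnout_pct(ballots, vap)
vep_rate = turnout_pct(ballots, vep)
print(f"Turnout as % of VAP: {vap_rate:.1f}")
print(f"Turnout as % of VEP: {vep_rate:.1f}")
```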

 

3-1024x714.jpg

 

We can easily see a big difference between presidential and midterm elections. Let’s use the VA interactive interface to create a calculated variable for ElectionType (midterm vs presidential). We can later use this new variable as an input in our model.

 

4-.jpg
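The logic behind the calculated variable is simple enough to sketch in Python. This is an illustration of the rule, not the expression I typed into VA, and the function name election_type is hypothetical. It relies on the fact that US presidential elections fall in years divisible by four, with midterms in the other even-numbered years.

```python
def election_type(year):
    """Classify an even-numbered US general election year."""
    if year % 2 != 0:
        raise ValueError("federal general elections fall in even-numbered years")
    return "Presidential" if year % 4 == 0 else "Midterm"


# The dataset in this article runs from 1980 to 2014.
for year in (1980, 1982, 2012, 2014):
    print(year, election_type(year))
```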

 

Build Models

Let’s say in my hypothetical situation, I want to call a friend to whine about low voter participation. But I don’t want to inadvertently call a friend who did not vote, because then I will have to listen to a rash of lame excuses like, “I was getting a massage” or “I had to reorganize my sock drawer” or “I was busy posting pictures of my dog on Instagram” or “I was in a full body cast in the hospital.” Okay, maybe that last excuse was not so lame.

 

At any rate, I want to figure out which of my friends most likely voted, so that I can call her. So together let’s use VA|VS|VDMML to build some models and use Model Manager and Decision Manager to manage those models and make a decision for me.

 

Caveat: Keep in mind that the goal of this article is to illustrate the process of using these SAS Viya tools, not to come to any real conclusions. A much more detailed dataset and more rigorous process would be required for that. Please do not pick which friend to call to whine to based on this rudimentary analysis!

 

Our first step will be to choose pertinent variables for modeling. Let’s identify and eliminate highly correlated inputs.

 

5-1024x756.jpg
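In VA I did this by eye from the correlation matrix. For readers who want to see the idea in code, here is a rough Python sketch of one way to do it: compute pairwise Pearson correlations and keep an input only if it is not too strongly correlated with an input already kept. The 0.9 cutoff, the function names, the column names, and the data values are all assumptions made up for illustration, and the sketch assumes no column is constant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Keep inputs in order, skipping any whose |r| with a kept input exceeds threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Made-up values for illustration only.
inputs = {
    "vap": [10.0, 20.0, 30.0, 40.0, 50.0],
    "vep": [9.0, 19.0, 28.0, 38.0, 47.0],  # nearly a linear copy of vap
    "pct_hs_grad": [80.0, 75.0, 90.0, 70.0, 85.0],
}
kept = drop_correlated(inputs)
print(kept)
```

Which member of a correlated pair gets dropped depends on column order here; in practice that choice deserves some subject-matter thought.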

 

Once we pare down our input variables, we can create a model directly from the correlation matrix! We’ll use “Duplicate on new page” as decision tree to build a decision tree model. (Some previous versions of VA would use the term “launch” a new model here, but that’s soooo 2017.)

 

6-1024x581.jpg

 

Notice that all our inputs (independent variables) and our target (dependent variable) are numeric, because we started with a correlation matrix, which requires numeric variables.

 

Let’s add a categorical input to the model: ElectionType. This decision tree is a regression tree because the target is numeric, but we can, of course, have both numeric and categorical inputs. Let’s also add a partition variable so that we divide the data into training and validation datasets.

 

7-1024x586.jpg
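A partition variable is just a flag on each row saying which dataset the row belongs to. Here is a minimal Python sketch of the idea. The 1 = training / 0 = validation coding, the 30 percent validation share, the seed, and the function name assign_partition are all assumptions for illustration, not the settings I used in VA.

```python
import random


def assign_partition(n_rows, validation_fraction=0.3, seed=12345):
    """Return one flag per row: 1 for training, 0 for validation."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    n_valid = round(n_rows * validation_fraction)
    flags = [0] * n_valid + [1] * (n_rows - n_valid)
    rng.shuffle(flags)
    return flags


flags = assign_partition(10)
print(flags)
```

Fixing the seed matters when comparing models: every model should see the same training and validation rows.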

 

Because it’s so easy (yay, VA!), let’s create a couple more models to compare to our decision tree. Using “Duplicate on new page” as a forest model and again as a linear regression model builds those two additional models with the same inputs, the same target, the same partition, and the same assessment criterion.

 

8-1024x377.jpg

 

Compare Models

We can compare models here in the VA|VS|VDMML interactive interface, or we can compare models using the Model Studio pipeline interface, or we can compare models in Model Manager.

 

Comparing models in the VA|VS|VDMML interactive interface

 

9-1024x630.jpg

 

Using the validation average squared error as the selection criterion, the forest model comes out as the best model.
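Average squared error is the mean of the squared differences between actual and predicted target values, computed here on the validation rows; the model with the smallest value wins. A minimal Python sketch of that comparison follows. The function names, the model labels, and all the numbers are made up for illustration and are not the results from my models.

```python
def average_squared_error(actual, predicted):
    """Mean of squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pick_champion(actual, predictions_by_model):
    """Return (best_model_name, scores), where lower ASE is better."""
    scores = {
        name: average_squared_error(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    return min(scores, key=scores.get), scores


# Made-up validation-set values, for illustration only.
actual = [50.0, 60.0, 40.0]
predictions = {
    "model_a": [55.0, 55.0, 45.0],
    "model_b": [51.0, 59.0, 41.0],
    "model_c": [53.0, 57.0, 43.0],
}
champion, scores = pick_champion(actual, predictions)
print(champion, scores)
```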

 

Comparing models in the Model Studio pipeline interface

We could create a pipeline from the original decision tree page, forest page, or linear regression page. But we will create the pipeline directly here from the Model Comparison page. Let’s create it as a new project.

 

10-1024x838.jpg

 

Notice that only the “Interactive Forest” model came through, because that was the best model selected based on our selection criterion. Here in the Model Studio pipeline, let’s add a gradient boosting model and a generalized linear model, and see if either of those performs better than the forest model.

 

11.png

 

Let’s run the pipeline and view the Model Comparison results.

 

12.png

 

We see that our Champion model is the gradient boosting model, based on the validation average squared error.

 

13-1024x806.png

 

Register Models

Recall that SAS Model Manager lets us:

  • Register (store) models in a common model repository, organized in projects and folders; the models may be created in Model Manager, or they may be brought in from:
    • SAS Visual Analytics (“Explore and Visualize Data” tab)
    • Model Studio (“Build Models” tab)
    • SAS Studio (“Develop SAS Code” tab)
    • SAS 9 models
    • PMML models
  • Compare models and select champion models
  • Monitor model performance
  • Publish models (to SAS Micro Analytic Service (MAS), CAS, Teradata, Hadoop, etc.) for scoring by external applications/interfaces

We will start by registering our models from Model Studio.

 

Note: We could also have registered models directly from the VA|VS|VDMML interactive interface.

 

Let's go to the Pipeline Comparison tab to register our model in the Model Manager repository. By default, models registered from the Model Studio pipeline interface will be stored in Model Manager under /DMRepository. If we had registered the model from the VA|VS|VDMML interactive interface, it would by default be stored in Model Manager under /VARepository.

 

14-1024x909.png

 

Our gradient boosting model has been successfully registered!

 

15-1024x265.png

 

Let’s look at our models in Model Manager. To get to Model Manager, we will use the hamburger icon in the top left and select Manage Models.

 

16-.png

 

We can compare our models side by side here in Model Manager.

 

17-1024x440.png

 

18-1024x635.png

 

Versioning

Model Versions

Model Manager provides governance and helps us organize and track our models. We can create new model versions and new project versions. Let’s create a new version of a model. On the Models tab, we’ll click on the underlined name of the model to open it.

 

19-1024x190.png

 

We create a new version by going to the Versions tab and selecting New Version.

 

20-.png

 

21-1024x449.png

 

Note: We can only edit model properties and file contents of the current version of a model. Previous versions are snapshots set in stone.

 

A new model version is added any time we:

  • Manually add a new model version
  • Set a model as the champion model, or
  • Publish a champion model from the project level

Model versions can be neither unlocked nor deleted.

 

On the Versions tab, a checkmark indicates the set (displayed) version. To change the displayed version, we select the version that we want, and click Set Version.

 

22.png

 

The displayed version is the version whose information is displayed on the other tabs (Files, Variables, and Properties tabs). The version number for the displayed version follows the model name in the object title bar as shown below.

 

23-1024x336.png

 

Project versions

New project versions can be created in the project view.

 

24-1024x316.png

 

By default the name will be Version 1, Version 2, etc., but we can edit the name of the new version as we create it, and provide a helpful description.

 

25.png

 

In our Model list, we can choose whether to view all versions or a specific version that we select.

 

26-1024x317.png

 

Manage versions lets us edit version names or descriptions.

 

27.png

 

The History tab lets us see the complete history of our project and models, dates modified, and who modified them.

 

28-1024x459.png

 

Publishing Models

We can publish our model from the Model Manager interface. We can publish models to SAS Micro Analytic Service (MAS), CAS, Teradata, or Hadoop. Publishing models lets them run elsewhere. For example, a model published to Hadoop will run in Hadoop.

 

To publish a model we go to the Models pane and select (checkmark) the model we want to publish. Then we simply click the Publish button.

 

Note that our system must be properly configured by the administrator in order to publish.

 

29-1024x264.png

 

Once a model is selected and opened, we can publish it from any tab (Files, Variables, Properties or Versions).

 

34.png

 

35.png

 

36.png

 

Model Information in Model Manager

Tons of information on our models is readily available in Model Manager. From the Files tab, we can see the .json and .sas files.

 

37.png

 

From the Properties tab, we can see the date created, date modified, repository location, project name, project version, score code type, etc.

 

38-1024x884.png

 

Again, we can publish to SAS Micro Analytic Service (MAS), CAS, Hadoop, or Teradata.

 

39.png

 

40.png

 

Make Decisions

Ta dah! We have created 5 models, compared them, selected a champion, registered and published models, and let Model Manager track our model versions. Now we are ready to make an actual decision. To call, or not to call. "That is the question: Whether ’tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles, and by opposing end them?"

 

But I digress.

 

SAS Decision Manager lets us create a logical flow of analytical models, rules, and conditional logic to automate decision-making. In my simple example, I can decide whether I will call my friend to whine about low voter participation, based on the likelihood that she voted. Maybe I decide that if she was at least 60 percent likely to vote, that is good enough for me to give her a call. Of course, I then run a 40 percent risk of listening to an hour about how she reorganized her sock drawer.

 

41.png
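Stripped of the nice interface, my decision is a single rule applied to a model score. Here is a minimal Python sketch of that rule, assuming the model hands back a probability between 0 and 1. The function name should_call, the friend labels, and the scores are all hypothetical; this is not code generated by Decision Manager.

```python
def should_call(probability_voted, threshold=0.60):
    """Return True when the predicted probability meets the calling threshold."""
    if not 0.0 <= probability_voted <= 1.0:
        raise ValueError("probability_voted must be between 0 and 1")
    return probability_voted >= threshold


# Hypothetical friends and made-up scores, for illustration only.
friends = {"friend_a": 0.72, "friend_b": 0.41, "friend_c": 0.60}
to_call = [name for name, p in friends.items() if should_call(p)]
print(to_call)
```

Raising the threshold lowers my sock drawer risk, at the cost of having fewer friends left to call.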

 

 

 

By the Way

General elections are held in November in the United States. I find it interesting that some of the states certain to have the worst weather in November (cold and/or rainy) have the highest voter turnout—Minnesota, Maine, South Dakota, Montana, Oregon, Wisconsin, Iowa, Vermont, Connecticut, and New Hampshire. And some of the states with better autumn weather have poor turnout (West Virginia, South Carolina, Washington DC, Georgia, Texas, Tennessee, Nevada, Arizona, Hawaii, and Arkansas). See below the turnout for presidential year elections. This sure seems counter-intuitive to me; maybe someone should compare climate with voter turnout. 😊

 

(ASIDE: This appears to support, but in no way actually supports, my untested and unproved theory that young people are more likely to develop programming languages during cold, rainy Christmas breaks than sunny, beachy Christmas breaks, a la Guido van Rossum’s development of Python, allegedly while bored during a Christmas holiday in the Netherlands.)

 

42-1024x643.png

 

One notable exception is Michigan. While not shabby (52.8% at 16th place), it falls well behind its neighbors Minnesota (64.6%) and Wisconsin (57.7%).

 

43.png

 

44-1024x652.png

 

I think I’ve stumbled onto a possible distraction that may be keeping voters away from the polls in Michigan. 😊

 

45.png

 

Apparently, this stadium is generally filled beyond capacity. That’s right, it appears that over 100,000 people regularly sit outside in the open air for hours at a time in the lovely Michigan weather.

 

46.png

 

And back to Guido, here’s some more cherry-picked circumstantial evidence that in no way supports my unfounded and unproven theory.

 

47.png

 

Here’s hoping that however cold it is as you ring in the New Year, you enjoy the season.

 

48.png

 

 
