I recently heard on the radio that we were having record voting turnout here in the United States. Record turnout! I was so excited! Yes, record turnout, the radio announcer repeated. A whopping 49 percent. Wait, what? Did I hear that correctly? Forty-nine percent? THAT is record turnout?
I quickly pulled out an envelope and did some back-of-the-envelope calculations … carry the one … and … that’s LESS THAN HALF. Yes, you heard right: less than half of eligible voters voted in the US general elections in November 2018. And that was a record turnout for a midterm election.
I decided to use SAS VA|VS|VDMML, Model Manager, and Decision Manager to take a preliminary look at voter turnout here in the US. I pulled a couple of small publicly-available datasets off the internet to illustrate the process.
For convenience I created a small dataset covering 1980 to 2014 with just a few variables: state, year, education success score for the state, percentage of high school graduates, percentage of ineligible felons, voting eligible population, voting age population, and so on.
First let’s use a geo map to look at voter turnout by state.
My home state of Maryland falls about in the middle. Pretty shabby.
We can use ranking to look at the top 10 and bottom 10 states.
Let’s look at voter turnout by year as a percentage of the voting eligible population (VEP) and the voting age population (VAP).
We can easily see a big difference between presidential and midterm elections. Let’s use the VA interactive interface to create a calculated variable for ElectionType (midterm vs presidential). We can later use this new variable as an input in our model.
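For readers who prefer code to clicks, here is a rough Python sketch of that calculated variable, assuming hypothetical column names (the real work happens in the VA calculated-item editor). Presidential elections fall in years divisible by 4; the other even-numbered general election years are midterms.

```python
import pandas as pd

# Toy rows standing in for the turnout dataset; the column names
# here are hypothetical, not the ones in the original data.
df = pd.DataFrame({"year": [2008, 2010, 2012, 2014]})

# US presidential elections fall in years divisible by 4;
# the other even-numbered general election years are midterms.
df["ElectionType"] = df["year"].map(
    lambda y: "presidential" if y % 4 == 0 else "midterm")
```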
Let’s say in my hypothetical situation, I want to call a friend to whine about low voter participation. But I don’t want to inadvertently call a friend who did not vote, because then I will have to listen to a rash of lame excuses like, “I was getting a massage” or “I had to reorganize my sock drawer” or “I was busy posting pictures of my dog on Instagram” or “I was in a full body cast in the hospital.” Okay, maybe that last excuse was not so lame.
At any rate, I want to figure out which of my friends most likely voted, so that I can call her. So together let’s use VA|VS|VDMML to build some models and use Model Manager and Decision Manager to manage those models and make a decision for me.
Caveat: Keep in mind that the goal of this article is to illustrate the process of using these SAS Viya tools, not to come to any real conclusions. A much more detailed dataset and more rigorous process would be required for that. Please do not pick which friend to call to whine to based on this rudimentary analysis!
Our first step will be to choose pertinent variables for modeling. Let’s identify and eliminate highly correlated inputs.
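The same pruning idea can be sketched in Python with pandas, under invented column names and synthetic data: compute the correlation matrix, look at the upper triangle so each pair is counted once, and flag any input whose correlation with an earlier input exceeds a threshold.

```python
import numpy as np
import pandas as pd

# Synthetic inputs with invented names. Voting age population and
# voting eligible population are built to be strongly related, so
# one of them should be flagged for removal.
rng = np.random.default_rng(0)
vap = rng.uniform(1e6, 5e6, 50)
X = pd.DataFrame({
    "voting_age_pop": vap,
    "voting_eligible_pop": vap * rng.uniform(0.9, 0.95, 50),
    "pct_hs_grads": rng.uniform(75, 95, 50),
})

corr = X.corr().abs()
# Keep only the upper triangle so each pair is considered once.
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
```

With these synthetic inputs, `to_drop` flags the voting eligible population column, leaving one of the two near-duplicate population measures in play.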
Once we pare down our input variables, we can create a model directly from the correlation matrix! We’ll use “Duplicate on new page” as a decision tree to build a decision tree model. (Some previous versions of VA would use the term “launch” a new model here, but that’s soooo 2017.)
Notice that all our inputs (independent variables) and our target (dependent variable) are numeric, because we started with a correlation matrix, which requires numeric variables.
Let’s add a categorical input to the model: ElectionType. We realize that this decision tree is a regression tree because the target is numeric, but we can, of course, have both numeric and categorical inputs. Let’s also add a partition variable so that we divide the data into training and validation datasets.
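The same recipe can be sketched with scikit-learn on synthetic data (column names and values invented for illustration): one-hot encode the categorical ElectionType-style column, partition into training and validation sets, and fit a regression tree.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in for the turnout table (names invented here).
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "pct_hs_grads": rng.uniform(75, 95, n),
    "election_type": rng.choice(["midterm", "presidential"], n),
})
# Synthetic target: presidential years get a higher baseline turnout.
df["turnout"] = (np.where(df["election_type"] == "presidential", 55.0, 40.0)
                 + 0.3 * df["pct_hs_grads"] + rng.normal(0, 2, n))

# scikit-learn trees need numeric inputs, so one-hot encode the
# categorical column.
X = pd.get_dummies(df[["pct_hs_grads", "election_type"]])
y = df["turnout"]

# Partition into training and validation sets, mirroring VA's
# partition variable.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=1)

tree = DecisionTreeRegressor(max_depth=4).fit(X_train, y_train)
```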
Because it’s so easy (yay, VA!), let’s create a couple more models to compare to our decision tree. “Duplicate on new page” as forest model and linear regression model will build those two additional models using the same inputs, the same target, the same partition, and the same assessment criterion.
We can compare models here in the VA|VS|VDMML interactive interface, or we can compare models using the Model Studio pipeline interface, or we can compare models in Model Manager.
We see that, using validation average squared error as the criterion, the forest model is selected as the best model.
We could create a pipeline from the original decision tree page, forest page, or linear regression page. But we will create the pipeline directly here from the Model Comparison page. Let’s create it as a new project.
Notice that only the “Interactive Forest” model came through, because that was the best model selected based on our selection criterion. Here in the Model Studio pipeline, let’s add a gradient boosting model and a generalized linear model, and see if either of those performs better than the forest model.
Let’s run the pipeline and view the Model Comparison results.
We see that our Champion model is the gradient boosting model, based on the validation average squared error.
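The champion-selection step above boils down to fitting several candidates and comparing them on validation average squared error (MSE on the held-out partition). Here is a minimal Python sketch on synthetic data; note that on this made-up data the champion may well differ from the gradient boosting winner in the article.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the turnout inputs and target.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=2)
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=2)

models = {
    "linear regression": LinearRegression(),
    "forest": RandomForestRegressor(random_state=2),
    "gradient boosting": GradientBoostingRegressor(random_state=2),
}

# Validation average squared error is just MSE on the held-out partition.
ase = {name: mean_squared_error(y_valid,
                                model.fit(X_train, y_train).predict(X_valid))
       for name, model in models.items()}

champion = min(ase, key=ase.get)
```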
Recall that SAS Model Manager lets us:
We will start by registering our models from Model Studio.
Note: We could also have registered models directly from the VA|VS|VDMML interactive interface.
Let's go to the Pipeline Comparison tab to register our model in the Model Manager repository. By default, models registered from the Model Studio pipeline interface will be stored in Model Manager under /DMRepository. If we had registered the model from the VA|VS|VDMML interactive interface, it would by default be stored in Model Manager under /VARepository.
Our gradient boosting model has been successfully registered!
Let’s look at our models in Model Manager. To get to Model Manager, we will use the hamburger icon in the top left and select Manage Models.
We can compare our models side by side here in Model Manager.
Model Manager provides governance and helps us organize and track our models. We can create new model versions and new project versions. Let’s create a new version of a model. On the Models tab, we’ll click on the underlined name of the model to open it.
We create a new version by going to the Versions tab and selecting New Version.
Note: We can only edit model properties and file contents of the current version of a model. Previous versions are snapshots set in stone.
A new model version is added any time we:
Model versions can be neither unlocked nor deleted.
On the Versions tab, a checkmark indicates the set (displayed) version. To change the displayed version, we select the version that we want, and click Set Version.
The displayed version is the version whose information is displayed on the other tabs (Files, Variables, and Properties tabs). The version number for the displayed version follows the model name in the object title bar as shown below.
New project versions can be created in the project view.
By default the name will be Version 1, Version 2, etc., but we can edit the name of the new version as we create it, and provide a helpful description.
In our Model list, we can choose whether to view all versions or a specific version that we select.
Manage versions lets us edit version names or descriptions.
The History tab lets us see the complete history of our project and models, dates modified, and who modified them.
We can publish our model from the Model Manager interface. We can publish models to the SAS Micro Analytic Score service (MAS), CAS, Teradata, or Hadoop. Publishing models lets them run elsewhere. For example, a model published to Hadoop will run in Hadoop.
To publish a model we go to the Models pane and select (checkmark) the model we want to publish. Then we simply click the Publish button.
Note that our system must be properly configured by the administrator in order to publish.
Once a model is selected and opened, we can publish it from any tab (Files, Variables, Properties or Versions).
Tons of information on our models is readily available in Model Manager. From the Files tab, we can see the .json and .sas files.
From the Properties tab, we can see the date created, date modified, repository location, project name, project version, score code type, etc.
Again, we can publish to the SAS Micro Analytic Score service, CAS, Hadoop, or Teradata.
Ta dah! We have created 5 models, compared them, selected a champion, registered and published models, and let Model Manager track our model versions. Now we are ready to make an actual decision. To call, or not to call. "That is the question: Whether ‘tis nobler in the mind to suffer the slings and arrows of outrageous fortune, or to take arms against a sea of troubles, and by opposing end them?"
But I digress.
SAS Decision Manager lets us create a logical flow of analytical models, rules, and conditional logic to automate decision-making. In my simple example, I can decide whether I will call my friend to whine about low voter participation, based on the likelihood that she voted. Maybe I decide that if she was 60 percent likely to vote, that is good enough for me to give her a call. Although then I run the 40% risk of listening to an hour about how she reorganized her sock drawer.
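Stripped to its core, that decision is a single threshold rule on the model's score. Here is a minimal sketch, with a hypothetical function name and the 60 percent cutoff from above:

```python
# A hypothetical decision rule: only call a friend whose predicted
# probability of having voted clears the 60 percent threshold.
CALL_THRESHOLD = 0.60

def decide_to_call(p_voted: float) -> str:
    """Map the model's P(voted) for a friend to an action."""
    return "call" if p_voted >= CALL_THRESHOLD else "do not call"

print(decide_to_call(0.72))  # clears the threshold, so: call
```

In Decision Manager this rule would sit downstream of the published model, with the model score feeding the condition.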
General elections are held in November in the United States. I find it interesting that some of the states certain to have the worst weather in November (cold and/or rainy) have the highest voter turnout—Minnesota, Maine, South Dakota, Montana, Oregon, Wisconsin, Iowa, Vermont, Connecticut, and New Hampshire. And some of the states with better autumn weather have poor turnout (West Virginia, South Carolina, Washington DC, Georgia, Texas, Tennessee, Nevada, Arizona, Hawaii, and Arkansas). See below the turnout for presidential year elections. This sure seems counter-intuitive to me; maybe someone should compare climate with voter turnout. 😊
(ASIDE: This appears to support, but actually in no way supports, my untested and unproven theory that young people are more likely to develop programming languages during cold, rainy Christmas breaks than sunny, beachy Christmas breaks, a la Guido van Rossum’s development of Python, allegedly while bored during Christmas holiday in the Netherlands.)
One notable exception is Michigan. While not shabby (52.8% at 16th place), it falls well behind its neighbors Minnesota (64.6%) and Wisconsin (57.7%).
I think I’ve stumbled onto a possible distraction that may be keeping voters away from the polls in Michigan. 😊
Apparently, this stadium is generally filled beyond capacity. That’s right, it appears that over 100,000 people regularly sit outside in the open air for hours at a time in the lovely Michigan weather.
And back to Guido, here’s some more cherry-picked circumstantial evidence that in no way supports my unfounded and unproven theory.
Here’s hoping that however cold it is as you ring in the New Year, you enjoy the season.