SAS programming concepts in this and other Free Data Friday articles remain useful, but SAS OnDemand for Academics has replaced SAS University Edition as a free e-learning option.
As the process for electing the US President in 2020 gets underway with the declaration of candidates for their parties’ nomination, more and more data about that process is becoming available. A great resource for this data is the Federal Election Commission (FEC) and, in this post, we will be examining the source of funds raised by President Trump’s campaign. I chose the Trump campaign because it is, at the time of writing, the campaign with by far the highest level of donations received. The same process could be used for any of the declared candidates.
You can download the data from the FEC web site as a CSV file. I would strongly recommend only downloading data for one candidate at a time as otherwise the downloaded file will be huge. There are many more variables in the download than are displayed on the FEC visualization and we need to ensure that SAS University Edition can handle a file of the size we intend to use. Failure to do that can result in a corrupted installation forcing you to reinstall SAS UE (trust me, it’s happened to me…).
Firstly, because the downloaded file doesn’t have a very meaningful name, I renamed it for my convenience. I then used the Import Task to import the file into SAS – I had to use a very large value for the GUESSINGROWS option (68,000) as I discovered that one of the codes switches from being purely numeric to alphanumeric just before that level is reached. After examining the file I decided I didn’t need to make any changes to it in order to successfully carry out my analysis.
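The Import Task generates a Proc Import step behind the scenes, so the same import can be written by hand. The following is only a minimal sketch of what that step might look like: the file path and file name are hypothetical placeholders, and it assumes the CAMPAIGN library has already been assigned.

```sas
/* Sketch only: the path and file name below are hypothetical placeholders */
proc import datafile="/folders/myfolders/trump_contributions.csv"
    out=campaign.trump
    dbms=csv
    replace;
  getnames=yes;
  guessingrows=68000;  /* scan enough rows to reach the alphanumeric codes */
run;
```

A larger GUESSINGROWS value makes the import slower because more rows are scanned before the column types are decided, but it avoids a column being read as numeric when later values contain letters.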
With no fewer than 77 variables in the file I had to decide which aspects of the data to examine. I decided to focus on where, geographically, the major donors were and what type of donors predominated (individuals, committees, PACs etc). My first step was to run a Proc Means on the data using contributor_state and entity_type as the Class Variables.
proc means data=campaign.trump noprint;
  class contributor_state entity_type;
  output out=trump_aggs sum=total_contributions mean=average_contribution;
  var contribution_receipt_amount;
run;
This is what the output looks like:
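Because two class variables are used, the output data set holds rows for every combination of them, identified by the automatic _TYPE_ variable: 0 for the overall total, 1 for entity_type alone, 2 for contributor_state alone and 3 for each state and entity type pair. As a quick check, a sketch like the one below (not part of the original analysis) counts the rows at each level.

```sas
/* Count the rows at each class-variable combination in the Proc Means output */
proc freq data=trump_aggs;
  tables _type_ / nocum nopercent;
run;
```

The _FREQ_ variable in the same data set records how many input rows contributed to each summary row.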
I decided to create a pie chart using the SGPie Procedure showing the split between entity types. Firstly, however, because I wanted the entity types expanded to show their full description instead of just the abbreviation in the file, I created a custom format.
proc format;
  value $entity_type
    'IND'='Individual'
    'COM'='Committee'
    'ORG'='Organization'
    'PAC'='Political Action Committee'
    'PTY'='Party';
run;
Now I can create the pie chart – notice how I can apply the custom format within the procedure call without first associating it with the variable in the data set. I also use the _type_ variable to determine which combination of class variables to chart.
ods graphics / reset;
title1 'Contributions to Trump Campaign 2020';
title2 'Total Contributions by Entity Type';
footnote j=l 'Data from https://www.fec.gov';
proc sgpie data=trump_aggs(where=(_type_=1));
  format total_contributions dollar11.0 entity_type $entity_type.;
  pie entity_type / response=total_contributions dataskin=gloss datalabeldisplay=all;
run;
This is what the generated pie chart looks like:
We can see that about two thirds of the campaign donations are from committees, with a further 30% from individuals. The rest comes from all other categories combined. I checked the source file and discovered that there are only 6 contributors in the committee group and the “donations” appear to be transfers in from affiliated committees. That doesn't seem a very profitable line for further enquiry, so I decided to concentrate on the donations from individuals.
I was curious about where the majority of donations came from, so I created a horizontal bar chart showing the total dollar amount of contributions by state. Before doing that, however, I built another custom format that converted the state codes into full state names.
/* Taken from http://support.sas.com/kb/25/301.html */
proc format;
  value $statename
    'AL'='Alabama'
    'AK'='Alaska'
    'AZ'='Arizona'
    'AR'='Arkansas'
    'CA'='California'
    'CO'='Colorado'
    'CT'='Connecticut'
    'DE'='Delaware'
    'DC'='District of Columbia'
    'FL'='Florida'
    'GA'='Georgia'
    'HI'='Hawaii'
    'ID'='Idaho'
    'IL'='Illinois'
    'IN'='Indiana'
    'IA'='Iowa'
    'KS'='Kansas'
    'KY'='Kentucky'
    'LA'='Louisiana'
    'ME'='Maine'
    'MD'='Maryland'
    'MA'='Massachusetts'
    'MI'='Michigan'
    'MN'='Minnesota'
    'MS'='Mississippi'
    'MO'='Missouri'
    'MT'='Montana'
    'NE'='Nebraska'
    'NV'='Nevada'
    'NH'='New Hampshire'
    'NJ'='New Jersey'
    'NM'='New Mexico'
    'NY'='New York'
    'NC'='North Carolina'
    'ND'='North Dakota'
    'OH'='Ohio'
    'OK'='Oklahoma'
    'OR'='Oregon'
    'PA'='Pennsylvania'
    'RI'='Rhode Island'
    'SC'='South Carolina'
    'SD'='South Dakota'
    'TN'='Tennessee'
    'TX'='Texas'
    'UT'='Utah'
    'VT'='Vermont'
    'VA'='Virginia'
    'WA'='Washington'
    'WV'='West Virginia'
    'WI'='Wisconsin'
    'WY'='Wyoming'
    'RQ'='Puerto Rico'
    'GQ'='Guam'
    '99'='Foreign';
run;
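There are a few things to note in the Proc SGPlot call: the WHERE= data set option keeps only the rows for individual donors (entity_type="IND") at the state-by-entity level (_type_=3) and drops states with fewer than 20 contributions (_freq_>19); categoryorder=respdesc sorts the bars from largest to smallest; and the imagemap option on the ODS Graphics statement works with the tip, tiplabel and tipformat options to show a tooltip, with the state code expanded to its full name by the custom format, when you hover over a bar.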
Here is the code to generate the chart:
ods graphics / reset width=8in height=10in imagemap;
title1 'Contributions to Trump Campaign 2020';
title2 'Total Contribution (Individuals) by State';
footnote1 j=l 'Data from https://www.fec.gov';
footnote2 j=l 'Minimum 20 Contributions';
proc sgplot data=trump_aggs(where=(_type_=3 and entity_type="IND" and _freq_>19));
  hbar contributor_state / response=total_contributions
    categoryorder=respdesc
    dataskin=pressed
    fillattrs=(color=vpab)
    tip=(contributor_state _freq_ total_contributions)
    tiplabel=("State" "No. of Contributors" "Total of Contributions")
    tipformat=($statename. comma8.0 dollar12.0);
  xaxis label="Total Contributions in US$";
  yaxis label="Contributor State" fitpolicy=none;
run;
This is what the generated chart looks like:
There are no great surprises here with the largest amounts raised coming from the big states of Florida, Texas and California.
I then charted the average donation amount by state, again using Proc SGPlot:
ods graphics / reset width=8in height=10in imagemap;
title1 'Contributions to Trump Campaign 2020';
title2 'Average Contribution (Individuals) by State';
footnote1 j=l 'Data from https://www.fec.gov';
footnote2 j=l 'Minimum 20 Contributions';
proc sgplot data=trump_aggs(where=(_type_=3 and entity_type="IND" and _freq_>19));
  hbar contributor_state / response=average_contribution
    categoryorder=respdesc
    dataskin=pressed
    fillattrs=(color=vpab)
    tip=(contributor_state _freq_ average_contribution)
    tiplabel=("State" "No. of Contributors" "Average Contribution")
    tipformat=($statename. comma8.0 dollar8.2);
  xaxis label="Average Contribution in US$";
  yaxis label="Contributor State" fitpolicy=none;
run;
This was the generated chart:
One big surprise here was that Washington DC had by far the highest average donation – over $500 against $125 for the next most generous state (Ohio). At this point I should add that I know the District of Columbia is not a state, but it’s usually included in these types of analyses.
The question, then, is what does this mean? There were only 49 contributions from DC against over 4,000 from Ohio, so the results need to be treated with some caution. Could it be that the sort of people who donate to campaigns are, on average, wealthier in DC than anywhere else, or are residents of the capital just more politically committed? More analysis is needed to try to answer that question and the many others that can be generated from this extensive data set.
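One possible first step, offered only as a sketch and not as part of the original analysis, is to look beyond the mean for the DC donations. With so few contributions, a handful of large ones could pull the average up, and comparing the mean with the median, minimum and maximum would show whether that is happening. The data set and variable names are the ones used earlier in this post.

```sas
/* Compare the mean with the median and range for individual donors in DC */
proc means data=campaign.trump n mean median min max maxdec=2;
  where contributor_state='DC' and entity_type='IND';
  var contribution_receipt_amount;
run;
```

Changing 'DC' to 'OH' in the WHERE statement gives the equivalent figures for Ohio.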
Did you find something else interesting in this data? Share in the comments. I’m glad to answer any questions.
Visit [[this link]] to see all the Free Data Friday articles.