Methods to the March Madness: Data Mining

SAS Employee
Posts: 1

Who wants to be a billionaire? - SAS Voices

As you may have heard, billionaire philanthropist Warren Buffett and Cleveland Cavaliers owner Dan Gilbert have teamed up to offer $1 billion to anyone who can create a perfect NCAA March Madness bracket. “Wow,” you might say. “How hard can it be to create a perfect bracket? I could really use a billion bucks!”  Well, the answer is “really, really, unbelievably hard.” So hard that in the history of March Madness, no one has ever done it. For you math lovers out there, the odds are supposedly 1 in 9.2 quintillion.  And what if someone is actually able to create this magical winning bracket?

"I will invite him or her to be my guest at the final game and be there with a check in my pocket, but I will not be cheering for him or her to win," Buffett said, jokingly. "I may even give them a little investment advice."
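For the math lovers: the 9.2 quintillion figure matches what you get if you treat each of the 63 games in a 64-team bracket as an independent coin flip. That coin-flip assumption is mine for illustration, not something the contest rules state, and the `one_in` helper and the 75% accuracy below are hypothetical. A minimal Python sketch of the arithmetic:

```python
# Sketch: odds of a perfect bracket under a simple independence assumption.
# Assumes a 64-team single-elimination field (63 games, play-in games ignored).
GAMES_IN_BRACKET = 63


def one_in(p_correct: float, games: int = GAMES_IN_BRACKET) -> float:
    """Return N such that the chance of picking every game correctly is 1 in N.

    p_correct is the (hypothetical) probability of picking any single game
    correctly, assumed identical and independent for every game.
    """
    if not 0.0 < p_correct <= 1.0:
        raise ValueError("p_correct must be in (0, 1]")
    return 1.0 / (p_correct ** games)


# Pure coin flips: every game has two outcomes, so there are 2**63 brackets.
coin_flip_brackets = 2 ** GAMES_IN_BRACKET
print(f"Coin-flip odds: 1 in {coin_flip_brackets:,}")

# A hypothetical picker who gets 75% of individual games right.
print(f"75% picker odds: 1 in {one_in(0.75):,.0f}")
```

Even a picker who is much better than a coin flip still faces very long odds, because the per-game probability is raised to the 63rd power.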

I wanted to share with the Data Mining community how I used SAS Enterprise Miner around the March Madness mania. I relied heavily on data mining techniques in SAS Enterprise Miner and SAS Rapid Predictive Modeler, using March Madness to help customers get comfortable with those techniques. These are the steps I took to pull in the data to be analyzed.

Through my research, I’ve also compiled a list of some helpful (and some not-so-helpful) factors for selection. Here’s what’s been successful in the past:

  • RPI (Rating Percentage Index), based upon wins, losses, and strength of schedule
  • Jeff Sagarin rankings from USA Today
  • Wins against top 25 teams (per RPI rankings)
  • Wins against teams ranked 26-50
  • Neutral court wins (Note: conference tournaments matter!)
  • Record and rank in-conference (regular season championships matter!)
  • Strength of conference (conferences do matter!)
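To make the first factor concrete: the commonly cited basic RPI formula weights a team's own winning percentage at 25%, its opponents' winning percentage at 50%, and its opponents' opponents' winning percentage at 25%. The Python sketch below uses that basic form only; it ignores the home/away adjustments the NCAA has applied, and the `TeamRecord` type and the two teams' numbers are made up for illustration.

```python
# Sketch of the basic RPI formula: 0.25*WP + 0.50*OWP + 0.25*OOWP.
# Team names and percentages are hypothetical, not real data.
from dataclasses import dataclass


@dataclass
class TeamRecord:
    name: str
    wp: float    # team's own winning percentage
    owp: float   # opponents' winning percentage (strength of schedule)
    oowp: float  # opponents' opponents' winning percentage


def rpi(team: TeamRecord) -> float:
    """Basic RPI, without any home/away weighting."""
    return 0.25 * team.wp + 0.50 * team.owp + 0.25 * team.oowp


teams = [
    TeamRecord("Team A", wp=0.80, owp=0.55, oowp=0.52),  # tougher schedule
    TeamRecord("Team B", wp=0.90, owp=0.45, oowp=0.48),  # better record
]

# Rank from highest RPI to lowest.
ranked = sorted(teams, key=rpi, reverse=True)
for team in ranked:
    print(f"{team.name}: RPI = {rpi(team):.3f}")
```

In this made-up example Team A ranks ahead of Team B despite the worse record, because half of the RPI comes from the strength of the schedule.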

And here’s what doesn’t work:

  • A team’s record in the last 10 games, i.e. the "hot team" myth:
    • A strong finish is not important compared with a team’s overall performance
    • Those "hot teams" are often doing some of the other things (winning on neutral courts, and beating teams in the top 25 or top 50) that do help boost their chances according to the Dance Card
  • A team’s record against teams ranked 50-100:
    • Winning against good teams helps, and the Dance Card model shows there's little downside in losing to good teams

               The lesson for athletic directors? Schedule fewer cupcakes and more top 50 RPI teams.
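If you want to try this outside of Enterprise Miner, one way to turn the helpful factors above into model inputs is to count them from a team's game log. This is only a sketch: the `Game` record, its field names, and the sample schedule are hypothetical, and it is not the Dance Card model itself, just the kind of feature preparation such a model would need.

```python
# Sketch: derive selection features from a (hypothetical) game log.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Game:
    opponent_rpi_rank: int  # opponent's RPI rank at the time of the game
    won: bool
    site: str               # "home", "away", or "neutral"


def selection_features(games: List[Game]) -> Dict[str, int]:
    """Count the wins that the list above says matter for selection."""
    return {
        "wins_vs_top25": sum(g.won and g.opponent_rpi_rank <= 25 for g in games),
        "wins_vs_26_50": sum(g.won and 26 <= g.opponent_rpi_rank <= 50 for g in games),
        "neutral_wins": sum(g.won and g.site == "neutral" for g in games),
    }


# Made-up schedule for a single team.
sample_schedule = [
    Game(10, True, "neutral"),
    Game(30, True, "home"),
    Game(5, False, "away"),
    Game(120, True, "home"),
    Game(40, True, "neutral"),
    Game(45, False, "neutral"),
    Game(60, True, "neutral"),
]

features = selection_features(sample_schedule)
print(features)
```

The last-10-games record is left out on purpose, since the list above says it adds nothing once the other factors are accounted for.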

In addition to SAS products, here are the other sources I tapped:

Let's submit a community bracket to prove that SAS has the best modelers around!  Entries must be complete prior to the start of the NCAA tournament.

Super User
Posts: 19,787

Re: Methods to the March Madness: Data Mining

Posted in reply to kathyball_sas

It's only open to Americans though :smileycry:

Do you happen to have data handy though, or should we compile our own?

SAS Employee
Posts: 25

Re: Methods to the March Madness: Data Mining

Hi Reeza, That's exactly why we wanted to open up this discussion for ALL members! Traditionally, entries must be completed prior to the NCAA tournament. The community brackets/predictions can be submitted through the tournament. I believe Kathy originally used some historical data of each team and other factors listed above. This discussion is just for fun, and a great way to use your Data Mining skills.

-Anna-Marie
