How do I get Nominated? Talk Oscars to me.

The Oscars ceremony last weekend was a blast. My friends had their theories about which movie would win the Best Picture Award; I secretly prayed for "The Imitation Game" to make it. Alas, I was wrong, and I figured that if I want to make better guesses in the future, I should seriously learn a little more about the movie industry. Especially given how much I like movies (and who doesn't?).

So how do I learn more about the movie industry and figure out what it takes to get nominated for an Oscar?

Dataset

I have to admit that I know nothing about the movie industry, so after spending a few hours searching Google I came across a pretty cool movie website (http://www.the-numbers.com/market/) that publishes basic information about movies made in the last 20 years. I managed to obtain information on 11,330 movies produced between 1995 and 2015.

This dataset (let's call it Movie Dossier) consisted of the following fields:

  • Movie Name
  • Release Date
  • Distributor (Film Studio Name)
  • Genre
  • MPAA Rating (the rating system that suggests which audiences a movie is suitable for)
  • Total Revenue ($)
  • Total # Tickets Sold (Units)
  • Year (When the movie was in production)

Along with the Movie Dossier, I also found a separate database on Box Office Mojo (http://www.boxofficemojo.com/oscar/) listing movies nominated for the Oscar Best Picture Award. I joined the Movie Dossier and Mojo Oscar tables (on Movie Name) and voila: I knew whether a movie was nominated or not.
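A join like this can be sketched in a few lines of plain Python. This is only an illustration, not the exact procedure used here: the sample titles, the field name `movie_name`, and the helpers `normalize` and `flag_nominees` are all hypothetical. It assumes that titles may differ slightly in case or spacing between the two sources, so it normalizes them before matching.

```python
# Hypothetical sample rows; the real tables come from The Numbers and Box Office Mojo.
movie_dossier = [
    {"movie_name": "Movie A", "year": 2014},
    {"movie_name": "Movie B", "year": 2014},
    {"movie_name": "Movie C", "year": 2014},
]
oscar_nominees = ["movie a ", "Movie  B"]  # same titles, formatted inconsistently


def normalize(title):
    """Lowercase and collapse whitespace so small formatting differences still match."""
    return " ".join(title.lower().split())


def flag_nominees(movies, nominee_titles):
    """Left join on title: keep every movie and add a boolean 'nominated' field."""
    nominated = {normalize(title) for title in nominee_titles}
    return [
        dict(movie, nominated=normalize(movie["movie_name"]) in nominated)
        for movie in movies
    ]


flagged = flag_nominees(movie_dossier, oscar_nominees)
for row in flagged:
    print(row["movie_name"], row["nominated"])
```

One caveat with joining on the name alone: two different movies that share a title (a remake, for example) would both be flagged, so adding the year to the join key is worth considering.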

Exploratory Analysis

When I looked at the Movie Dossier dataset, I didn't know where to start. Does movie production change over time? If so, how? Does movie genre have any significance? How come some movies are so much more popular than others? And above all, how can this data help me understand what helped 8 movies beat the other 660 competitors and get nominated for the Oscars' Best Picture Award in 2015?
So here goes...

My Lesson #1

When you enter uncharted territory and lack domain knowledge in the subject you are about to analyze, take a pause... find data you think is relevant and play with it. It is very similar to how you play with a new gadget when you are too lazy to read the manual. This will help you learn more about the subject and generate the hypotheses you are looking for.

I took my own advice and started looking at basic metrics like:

  • Counts and frequency tables for categorical fields (Movie Name, Distributor, MPAA Rating and Genre)
  • Total, average and standard deviation of numerical fields (Total Revenue and Total # Tickets Sold)
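For readers who want to try the same first pass, here is a minimal sketch using only the Python standard library. The rows and field names (`genre`, `mpaa_rating`, `total_revenue`, `tickets_sold`) are made up for illustration and are not taken from the Movie Dossier.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical rows standing in for the Movie Dossier.
movies = [
    {"genre": "Drama", "mpaa_rating": "R", "total_revenue": 10_000_000, "tickets_sold": 1_250_000},
    {"genre": "Drama", "mpaa_rating": "PG-13", "total_revenue": 20_000_000, "tickets_sold": 2_500_000},
    {"genre": "Comedy", "mpaa_rating": "PG-13", "total_revenue": 30_000_000, "tickets_sold": 3_750_000},
    {"genre": "Adventure", "mpaa_rating": "PG", "total_revenue": 60_000_000, "tickets_sold": 7_500_000},
]


def frequency_table(rows, field):
    """Count how often each value of a categorical field occurs, most common first."""
    return Counter(row[field] for row in rows).most_common()


def summarize(rows, field):
    """Total, average and sample standard deviation of a numerical field."""
    values = [row[field] for row in rows]
    return {"total": sum(values), "average": mean(values), "std_dev": stdev(values)}


genre_counts = frequency_table(movies, "genre")
revenue_stats = summarize(movies, "total_revenue")
print(genre_counts)
print(revenue_stats)
```

Note that `stdev` is the sample standard deviation; `statistics.pstdev` would give the population version instead.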

While doing this simple analysis, I noticed that movie production has followed overall US economic trends with a little lag. Here is a visualization I put together.

Preliminary Findings

First off, why do Drama movies bring in less money than Comedy films?! I guess people prefer laughing to being serious... But film producers don't seem to agree, since they keep making more dramas than comedies (1,960 comedies vs. 3,541 dramas have been produced since 1995, yet comedies earned 31% more money than dramas). Adventure movies turned out to be the most efficient ones: can you believe that 619 movies made almost $39B (i.e. $61M/movie)? Well, I guess they are the most expensive ones too.
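The "efficiency" comparison above boils down to total revenue divided by the number of movies in each genre. A rough sketch of that calculation follows; the sample rows, the field names and the helper `revenue_per_movie_by_genre` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows; only the genre and revenue fields matter here.
sample_movies = [
    {"genre": "Drama", "total_revenue": 10_000_000},
    {"genre": "Drama", "total_revenue": 20_000_000},
    {"genre": "Comedy", "total_revenue": 30_000_000},
    {"genre": "Adventure", "total_revenue": 60_000_000},
]


def revenue_per_movie_by_genre(rows):
    """Group by genre, then report movie count, total revenue and revenue per movie."""
    totals = defaultdict(lambda: {"movies": 0, "revenue": 0})
    for row in rows:
        bucket = totals[row["genre"]]
        bucket["movies"] += 1
        bucket["revenue"] += row["total_revenue"]
    return {
        genre: {**bucket, "revenue_per_movie": bucket["revenue"] / bucket["movies"]}
        for genre, bucket in totals.items()
    }


by_genre = revenue_per_movie_by_genre(sample_movies)
# Print genres from most to least revenue per movie.
for genre, stats in sorted(by_genre.items(), key=lambda kv: -kv[1]["revenue_per_movie"]):
    print(genre, stats["movies"], stats["revenue_per_movie"])
```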

Speaking of production budgets, Avatar proved to be the revenue champion in the Action genre, with $760M in gross earnings and $425M spent on production (who said 79% is a bad ROI?).

Curiously, the top Action, Thriller/Suspense and Adventure movies earned about twice as much money as the Drama, Comedy/Romantic Comedy and Horror favorites. And to my biggest disappointment, Justin Bieber's concert show ranked #1 in the Concert/Performance genre. But let's keep moving...
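The 79% figure follows from the usual definition of ROI: profit divided by cost. Here is a quick check, with a hypothetical `roi` helper, using the two numbers quoted above:

```python
def roi(gross, budget):
    """Return on investment: profit (gross minus budget) as a fraction of the budget."""
    return (gross - budget) / budget


avatar_roi = roi(gross=760e6, budget=425e6)
print(f"{avatar_roi:.0%}")  # prints 79%
```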

The bottom chart displays how movie production volume has changed since 1995. The bars represent the number of movies released each year, and the trend line shows the average revenue per ticket sold. As an add-on, if you hover over any bar you will see:

  • Average revenue per movie in that year
  • Average # tickets sold that year

It was a big revelation for me that movie production consistently followed the US economic trend with a little lag (US market activity dropped in 2008-2009, whereas the movie industry showed a decline in 2010). Yet in 2010, when movie production went down by 40%, film companies made a lot of money per film on average. Average revenue per movie was $25M, the highest seen in 20 years. But when I looked at how much people paid per ticket that year, the picture cleared up a little. It turns out people were paying $8.30 for a ticket, compared to a historical average of $6 per ticket.
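The yearly metrics behind a chart like this can be derived from the per-movie rows. The sketch below is an assumption about how one might compute them, not the exact calculation used for the visualization; the sample numbers, the field names and `yearly_summary` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows; the figures are invented to keep the arithmetic easy to follow.
releases = [
    {"year": 2009, "total_revenue": 12_000_000, "tickets_sold": 2_000_000},
    {"year": 2009, "total_revenue": 18_000_000, "tickets_sold": 3_000_000},
    {"year": 2010, "total_revenue": 25_000_000, "tickets_sold": 3_125_000},
]


def yearly_summary(rows):
    """Per year: movies released, average revenue and tickets per movie, average ticket price."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row)

    summary = {}
    for year, group in sorted(by_year.items()):
        revenue = sum(r["total_revenue"] for r in group)
        tickets = sum(r["tickets_sold"] for r in group)
        summary[year] = {
            "movies_released": len(group),
            "avg_revenue_per_movie": revenue / len(group),
            "avg_tickets_per_movie": tickets / len(group),
            # Revenue per ticket sold, i.e. the average ticket price that year.
            "avg_ticket_price": revenue / tickets,
        }
    return summary


summary = yearly_summary(releases)
for year, stats in summary.items():
    print(year, stats)
```

Dividing total revenue by total tickets weights every ticket equally; averaging each movie's own ticket price would give a different number when movie sizes vary a lot.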

So in a matter of a few hours I saw that:

  • People pay more to laugh than to cry, although film producers for some reason make more dramas than comedies.
  • If your movie becomes a top earner, it seems to be better off in the action, thriller or adventure genres than in drama or comedy.
  • It turns out people will pay more for fewer movies rather than less for more movies.

Next Steps

I think I kicked my data around enough to generate initial hypotheses to answer my main question.

  • Genre is a promising factor that could help me understand whether a movie can get nominated for an Oscar.
  • How much money a movie makes could be a determining factor in Oscar nominations.
  • The number of tickets sold reflects a movie's popularity, so I could also use this factor to answer my question.

In my next post I will conduct a confirmatory analysis where I will test how well each factor predicts the likelihood of a movie being nominated for the Best Picture Award.

Your Turn

What do you think about the data? Could I use other sources to dive deeper into existing datasets?

Did the exploratory analysis make sense to you? How else could I have explored the data to better link it to the Oscar nominations topic?

Have you conducted a similar exploratory analysis before? How did you approach the problem?

I would love to hear your feedback on this analysis. As I mentioned in my first post, I need your thoughts to fuel a "knowledge bank" of case studies that will help the data analytics community spend more time innovating and less time reinventing the wheel.