Data Science and AI

Analysing the Oscars with data analytics and NLP



We at the Young Data Analytics Working Group (YDAWG) are used to looking at the world through a different, data-focused lens. Despite the millions of articles, opinions and hot takes from this week's Oscars, we have managed to put together a unique and interesting way of looking at some of the greatest movies of recent history. So, pop your favourite popcorn, import your favourite data packages, and prepare yourselves for another 'YDAWG Analytica' article!

In this article, we use Data Analytics™ to look at:
  • Natural Language Processing (NLP) as a means to analyse the Oscars in a unique, novel way.
  • How we can use movies to understand analytics (and vice versa).
  • The People vs The Critics vs The Academy.

Natural subtitle processing

In this analysis, we primarily used the NLP technique of 'sentiment analysis'. This is a technique that takes a group of words or sentences and uses a language model to measure the polarity (whether the opinion is positive/negative) and intensity (strength between 0 and 1).

We're using a popular Python package called NLTK and its pre-trained sentiment lexicon, the Valence Aware Dictionary and sEntiment Reasoner, or 'VADER' for short (conveniently fitting in with our movie theme). It is often used on reviews and testimonials to gain more insight into customer satisfaction.

A social media comment with VADER sentiment scores annotated on individual words and phrases, showing an overall positive comment sentiment of 0.941.

Figure 1: A highly positive review of our last YDAWG Analytica article. It only made sense that the Oscars were the next topic chosen…

This technique can be used on individual words, sentences or whole comments. I've picked out how VADER rates a few specific words, so you can see that the sentiments for individual words line up with human expectations (although, personally, I would rate the word 'data' as highly positive!). Most words end up neutral, but the model is usually pretty smart about how it understands groups of sentences as a whole.

So, we can train computers to understand reviews, but how about movies? To attempt this, we came up with a novel and creative solution to use NLP on subtitles and track sentiment over time!

Two subtitle blocks from a film with VADER sentiment scores applied — a negative score of -0.276 and a positive score of 0.494 — illustrating how sentiment analysis works on movie dialogue.

Figure 2: Sentiment analysis is conducted on each block of subtitles. No points for guessing the movie…

We conveniently have timestamps with every piece of dialogue too, which means we can analyse these subtitles as a time series. With a little bit of Python magic, we can apply an aggregation function over a rolling 10 minutes of screen time to produce graphs like this.

A line chart showing subtitle sentiment across the runtime of all eight Harry Potter films, with Harry Potter 7-2 showing the most negative overall sentiment.

Figure 3: Testing some of our analysis on the Harry Potter films. As some might expect, the last movie had the most negative overall sentiment.

A (best) picture is worth a thousand (best) words…

While a picture can tell a thousand words, it's often easier to just use a thousand words to give us a better picture. So, let's use our newfound NLP skills to analyse all of the Best Picture nominees for the past few years of the Oscars.

A line chart of subtitle sentiment for 2019 Best Picture nominees, with winner Green Book highlighted in red and A Star Is Born showing a notable peak of positive sentiment at around 70% through the film.

Figure 4: A Star Is Born had a high peak of positive polarity starting ~70% of the way into the movie.

On each graph, we have highlighted the winning movie in red, which for 2019 was Green Book. You can see the peak for one of the nominees, A Star Is Born, comes from a long series of compliments and positive remarks in a short window. There is even a celebratory montage with accompanying positive song lyrics that helps it reach these heights.

A line chart of subtitle sentiment for 2020 Best Picture nominees, with winner Parasite highlighted in red and a clear transition point annotated where the film shifts to its "second movie."

Figure 5: Best Picture winner Parasite has a clear turning point in the movie which is reflected in this sentiment analysis.

Parasite, Best Picture winner for 2020, has been hailed as a masterpiece that upends the familiar 'three-act' structure that we are used to. Instead, the film is almost two separate movies joined into one. In the graph above you can see that our NLP analysis is able to pick out when exactly the transition sequence kicks off and we move into the 'second' movie.

A line chart of subtitle sentiment for 2021 Best Picture nominees, with winner Nomadland in red and Judas and the Black Messiah showing a notably negative dip annotated as containing 50 or more expletives in 10 minutes.

Figure 6: 2021 Nominee Judas and the Black Messiah has some particularly negative language.

As you might imagine, there are sometimes key phrases or words which are interpreted by the model as especially negative or positive in every context. In Judas and the Black Messiah, an already dark movie, you can clearly see a portion where there are a lot of expletives used. While some of these quotes and text blocks are actually using these swear words in a neutral or positive way, the large majority of the time this is not the case. Either way, we won't be including any of these quotes in this article.

A line chart of subtitle sentiment for 2022 Best Picture nominees, with winner Coda in red and King Richard showing the highest sentiment peak of all films analysed, coinciding with the Williams family's arrival in Florida.

Figure 7: King Richard has the highest peak of all movies analysed so far.

One of the biggest films of the year was King Richard, which is also the one with the most positive sentiment of all the Best Picture nominees. The huge surge in positive sentiment intensity comes as the Williams family arrives in Florida and, weirdly enough, the downturn happens after the family goes to Disney World. Not the way around I'd expect… On closer inspection, it seems to be caused by some fast-talking celebration and plenty of subtitle insertions of '[laughing]' and '[cheering]' picked up by the model. Or maybe there's just something really great about Florida (that isn't Disney…).

Breathless tension

One of the key skills of data analysis is understanding the limitations of your models, your data and your analysis. While the method chosen has proven to reveal many interesting points in Oscar movies, there is a key weakness that some of you might have already considered.

A line chart of subtitle sentiment for 2018 Best Picture nominees, with winner The Shape of Water in red and Dunkirk shown as a near-flat line bounded between plus and minus three, reflecting its minimal dialogue.

Figure 8: Dunkirk, a war movie known to be 100 minutes of pure tension, has the least intensity of sentiment.

A movie full of spectacle, with countless moments of intense fear and celebration, has somehow managed to evade the model's ability to pick up any moments at all. For those who have watched Dunkirk, it may not come as a surprise, because the movie manages to convey all this emotion through visuals and music, with very little dialogue at all. Due to this, the 10-minute rolling window often gives the VADER model very little to work with and so the wordless tension goes unnoticed in our analysis.

A bar chart showing the count of subtitle blocks for each Oscar-nominated film from 2016 to 2022, colour-coded by year, with Dunkirk having the fewest and The Irishman the most.

Figure 9: The number of 'subtitle blocks' to analyse varies greatly between movies.

The disadvantage Dunkirk has can be seen very clearly when all of the movie dialogue lengths are laid out together. The fact that we knew to investigate further because we'd seen Dunkirk teaches us a key lesson about the importance of subject matter expertise when analysing data. And also the lesson that maybe Scorsese's 3.5-hour marathon of a movie had 20,000 words too many.

Best in class

Some more interesting results can emerge when you aggregate many movies into specific classes such as 'Oscars year' or 'Oscars category'.
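A minimal sketch of that aggregation with pandas' groupby (the titles below are real nominees, but the sentiment figures are illustrative placeholders, not our actual results):

```python
import pandas as pd

# Hypothetical per-movie summary: one row per nominee, with the movie's
# average compound sentiment and the class we want to aggregate by
movies = pd.DataFrame({
    "title": ["Green Book", "Parasite", "Nomadland", "CODA"],
    "oscars_year": [2019, 2020, 2021, 2022],
    "mean_compound": [0.12, 0.03, 0.05, 0.15],
})

# Average subtitle sentiment per Oscars year; swapping the grouping
# column for 'category' gives the by-category view instead
by_year = movies.groupby("oscars_year")["mean_compound"].mean()
print(by_year)
```
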

A line chart showing average subtitle sentiment for Best Picture nominees aggregated by year from 2016 to 2022, with 2022 nominees showing the highest average sentiment of 4.72.

Figure 10: Best Picture Nominees aggregated by year show a clear progression toward positive polarity subtitles.

It looks like the Oscars have gradually been including a greater number of movies with positive sentiment. Does this represent an improving disposition of the Academy? Have top-tier screenwriters escalated sentiment as they noticed nominations going to the most positive dialogue? Or is this a reflection of humanity yearning for happier movies, needing an escape from the hopeless reality of a perpetually deteriorating world? Or is it just chance?

A line chart comparing average subtitle sentiment across five Oscar categories for 2022 nominees, showing Best Picture with the highest average sentiment and Best Documentary Feature with the lowest.

Figure 11: Nominees from 2022 aggregated by category show the different kinds of language used in various styles of movies.

It's surprising to see Best Picture still above the pack, ahead of even the Animated Feature Film category, which this year includes three Disney movies (Encanto, Luca and Raya and the Last Dragon). Also notable is that international films haven't reached the same lofty dialogue sentiment heights as their Hollywood counterparts, and the best documentaries follow the perception, using more serious and more factual language than other categories.

But I think we've seen enough now from movie subtitles… it's time to see what everyone else has to say.

The People vs The Critics

It's been said that "any fool can criticise". Well, let's see how much the esteemed critics of Rotten Tomatoes agree with the average fool!

Each review on Rotten Tomatoes is binary (fresh or rotten) and the overall 'rating' represents the percentage of people who gave the movie a positive review.

  • The 'tomatometer rating' comes from writers, bloggers, podcasters and publications approved by Rotten Tomatoes as embodying the key values of dedication and insight.
  • The 'audience rating' comes from anyone who has bothered to make a free account.

A histogram comparing audience and critic ratings on Rotten Tomatoes for movies from 2000 to 2017, showing audience ratings follow a bell curve while critic ratings are more evenly distributed.

Figure 12: Comparison of ratings given by audiences vs critics on Rotten Tomatoes.

Looking at the histogram of ratings for movies from 2000 to 2017, the audience ratings tend to follow an almost-perfect bell curve, with the majority of movies rated between 20% and 90% and very few receiving a very high or very low rating. By contrast, the tomatometer ratings are spread quite uniformly across all rating levels.

Perhaps now is a good time to point out that if every rating on the site were decided by coin flip, regardless of movie quality, the resulting shape would still be a bell curve.
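A quick simulation makes the point (the numbers of movies and reviews here are arbitrary assumptions):

```python
import random
import statistics

random.seed(42)

# Simulate 10,000 movies, each receiving 200 reviews decided by coin flip;
# a movie's "rating" is the percentage of positive reviews
ratings = [
    100 * sum(random.random() < 0.5 for _ in range(200)) / 200
    for _ in range(10_000)
]

# Pure chance still produces a bell curve centred on 50%: the binomial
# distribution of positive reviews is approximately normal
print(round(statistics.mean(ratings), 1), round(statistics.stdev(ratings), 1))
```

With 200 reviews per movie, chance alone gives a mean rating near 50% and a standard deviation of roughly 3.5 percentage points, so a bell-shaped histogram on its own tells us little about whether ratings reflect quality.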

The People vs The Critics vs The Academy

A histogram of Rotten Tomatoes ratings for Oscar nominees by audience and tomatometer score, with the movie poster for Norbit highlighted as an outlier with a near-zero critic rating despite its nomination.

Figure 13: Norbit stands out as a clear outlier among Oscar nominees — the film's near-zero critic rating sits well below the pack, despite its nomination for Best Achievement in Makeup.

Overall, there seems to be quite a lot of consensus between audiences, critics and the Academy with the clear peak of each histogram sitting around the 80% to 90% rating. One notable exception is the 2007 film starring Eddie Murphy and Eddie Murphy called Norbit. This 9% tomatometer film has an Oscar nomination for 'Best Achievement in Makeup', but unfortunately for Eddie, it lost out to La Vie en Rose, which had three fewer characters played by Eddie Murphy.

A histogram comparing audience and critic Rotten Tomatoes ratings for Oscar winners across all categories, showing critics cluster strongly at the 90 to 100 range while audience ratings are more broadly distributed.

Figure 14: Comparison of ratings given by audiences vs critics for Oscar winners in any category.

It's when we get to the Oscar winners that we can really see the alignment between tomatometer and the Academy. This correlation is made possible by the higher number of 90+ ratings given out by critics. However, one thing to call out is that the audience and tomatometer ratings could be affected by Oscars results. Perhaps this means the tomatometer critics are just chasing clout by increasing their ratings of Oscar winners post hoc?

And with that baseless slander, we must bring this article to a close. As always, thanks for coming on this analytics journey with us! It's been fun crunching data and talking movies. Reach out with any feedback or ideas on topics you want us to cover next.


About the authors
Ean Chan
Ean is a Senior Manager within EY's Actuarial Services team, with experience in Life Insurance, Data Analytics and AI, primarily concentrating on Health and Human Services clients. As chair of the Institute's Young Data Analytics Working Group and member of the Data Science and AI Practice Committee, Ean is dedicated to driving progress in the actuarial field by augmenting our expertise with the latest data science, AI and machine learning methodologies.
Meg Yang
Meg is a consultant at Finity Consulting, specialising in general insurance pricing, and previously worked in retirement benefits consulting. She is interested in data analytics and is a member of the Young Data Analytics Working Group.
Dan Wang
Dan is the IFRS 17 and Data Science Actuary at MetLife Australia. Dan has over ten years' experience in life insurance valuations and regulatory capital management work. He focuses on applying data science and machine learning techniques to develop insights from valuation and capital results. Dan graduated from ANU with a Master's and a Bachelor's degree in Actuarial Science. He is a Fellow of the Actuaries Institute Australia.
