From 2007 to 2009, Wayne Brady hosted the short-lived game show Don’t Forget the Lyrics!, where contestants competed to recall the lyrics of well-known songs for the chance to win cash and prizes. In our latest article, YDAWG explores whether, as on Brady’s show, the lyrics of Eurovision songs are what determine a song’s chances of success.
The Eurovision Song Contest has been running since 1956 and is well known for its extravagant spectacles and larger-than-life performances. Despite its name, the competition features entries from countries across the globe, with songs performed in many different languages. To begin our analysis, we translated the lyrics of every song into English so that comparisons are consistent, although some tone and sentiment may be lost in translation.
Over 60% of Eurovision songs are not sung in English
Changes to the native language rule have meant that songs are performed in English more frequently
After translating the lyrics into English, we begin by performing an initial sentiment analysis. This involves assigning each song a distribution of sentiment, indicating the proportion of the lyrics that are negative, neutral and positive.
This process uses Natural Language Processing (NLP) to classify each individual word in the text according to the feeling it conveys. Positive words such as “love” increase the score, while negative words such as “bad” decrease it. However, because each word is scored in isolation, the results can differ from a reading of the lyrics in their broader context; for example, the phrase “not bad” may be scored as negative.
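The word-by-word scoring described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the tool used in the analysis: the lexicon `LEXICON`, its valences and the function `sentiment_distribution` are hypothetical, the proportions simply count words rather than weighting them by intensity, and the compound score borrows VADER's normalisation (summed valence divided by the square root of its square plus 15).

```python
import math
import re

# Hypothetical toy lexicon (word -> valence); real lexicons score thousands of words
LEXICON = {"love": 3.2, "happy": 2.7, "bad": -2.5, "cry": -2.1, "alone": -1.0}


def sentiment_distribution(lyrics: str) -> dict:
    """Score each word in isolation and summarise a song's sentiment."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    scores = [LEXICON.get(w, 0.0) for w in words]  # unknown words are neutral
    n = len(scores) or 1  # guard against empty lyrics
    pos = sum(1 for s in scores if s > 0) / n
    neg = sum(1 for s in scores if s < 0) / n
    neu = 1.0 - pos - neg
    total = sum(scores)
    # Squash the summed valence into (-1, 1), VADER-style (alpha = 15)
    compound = total / math.sqrt(total * total + 15)
    return {"neg": neg, "neu": neu, "pos": pos, "compound": compound}


print(sentiment_distribution("I love you, it is not bad"))
```

In this example “bad” is counted as a negative word even though “not bad” is mildly positive, which is the context problem noted above.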
We then fitted a random forest to predict whether a song won Eurovision based on its sentiment. Although the model achieved an accuracy of over 90% (because it is easy to correctly guess that a song did not win), its practical performance was poor. The area under the ROC curve (AUC) was a relatively uninformative 63%, and none of the Eurovision winners in the test dataset were correctly predicted.
A max depth of 5 was selected for our random forest
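Why a high accuracy can coexist with a weak AUC is easiest to see on a toy example. The sketch below uses a hypothetical contest year of 40 entries with a single winner, and hypothetical helper functions; `roc_auc` computes the AUC as the probability that a randomly chosen winner is scored above a randomly chosen non-winner, which is equivalent to the area under the ROC curve.

```python
def accuracy(y_true, y_pred):
    """Share of entries whose predicted label matches the actual label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the chance a random winner outscores a random non-winner (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical contest year: 40 entries, one winner (label 1)
y = [1] + [0] * 39
print(accuracy(y, [0] * 40))   # 0.975: predicting "nobody wins" looks accurate
print(roc_auc(y, [0.1] * 40))  # 0.5: yet it cannot rank the winner at all
```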
The original dataset was imbalanced, with most entries not being Eurovision winners, which limited the analysis. To address this issue, we considered the following:
First, we extended the analysis by comparing four modelling approaches: a gradient boosting machine (GBM), a logistic regression model, a linear regression model and the random forest used previously.
Second, we introduced two new outcome variables to assess song performance: final placing (e.g. 1st, 2nd) and total points scored. Instead of modelling Eurovision success as a binary outcome (win vs. not win), we reframed the problem as a regression task by predicting final placing on a scale from 1 to 25. Entries that did not reach the final (i.e. failed to qualify) were excluded from this version of the dataset so that the outcome variable was defined consistently. We also separately modelled total points scored as an alternative measure of performance.
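The reframing amounts to filtering out non-qualifiers and taking placing and points as continuous targets. A minimal sketch, assuming a hypothetical record layout in which `placing` and `points` are `None` for songs that failed to qualify; the field names and values are illustrative only.

```python
# Hypothetical rows; "placing" and "points" are None for non-qualifiers
entries = [
    {"country": "A", "placing": 1, "points": 500, "compound": 0.8},
    {"country": "B", "placing": None, "points": None, "compound": 0.1},
    {"country": "C", "placing": 12, "points": 90, "compound": -0.3},
]


def regression_targets(rows, features=("compound",)):
    """Drop non-finalists, then return features and the two regression targets."""
    finalists = [r for r in rows if r["placing"] is not None]
    X = [[r[f] for f in features] for r in finalists]
    y_placing = [r["placing"] for r in finalists]  # 1 = winner, larger = worse
    y_points = [r["points"] for r in finalists]
    return X, y_placing, y_points


X, y_placing, y_points = regression_targets(entries)
```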
Finally, to improve model performance and capture additional sources of variation, we introduced new features into the dataset. First, we included language as a predictor. While this is not directly derived from lyrical content, it may capture nuances lost in translation. In addition, we included a binary indicator distinguishing between English and non-English songs.
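One common way to turn language into model inputs is one-hot encoding plus the English flag. The hypothetical `language_features` below is a sketch of that idea, not the article's own encoding.

```python
def language_features(languages):
    """One-hot encode each song's language and add an English/non-English flag."""
    vocab = sorted(set(languages))
    rows = []
    for lang in languages:
        row = {f"lang_{v}": int(lang == v) for v in vocab}
        row["is_english"] = int(lang == "English")
        rows.append(row)
    return rows


print(language_features(["English", "French", "English"])[1])
# {'lang_English': 0, 'lang_French': 1, 'is_english': 0}
```

Here `is_english` duplicates the `lang_English` column. Tree-based models tolerate this, but for the linear and logistic models one of the redundant columns would normally be dropped.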
Next, we examined whether lyrical structure provides information beyond sentiment alone. In particular, we considered whether songs that are more “lyrically pleasing” perform better. To investigate this, we introduced variables such as the number of adjacent rhyming pairs, that is, successive lines that end with a rhyme (also known as couplets).
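Counting adjacent rhyming pairs needs some rule for deciding when two line endings rhyme. The sketch below assumes a crude spelling-based rule (same final two letters, different words). It is a hypothetical stand-in, since the article does not state its rhyme test, and a phonetic approach would catch more rhymes.

```python
import re


def last_word(line):
    """Lowercased final word of a lyric line ('' if the line has no words)."""
    words = re.findall(r"[a-z']+", line.lower())
    return words[-1] if words else ""


def rhymes(a, b, n=2):
    # Crude heuristic: identical last n letters; repeating the same word does not count
    return a != b and len(a) >= n and len(b) >= n and a[-n:] == b[-n:]


def count_adjacent_rhymes(lyrics):
    """Number of successive line pairs whose final words rhyme."""
    ends = [last_word(line) for line in lyrics.splitlines() if line.strip()]
    return sum(rhymes(a, b) for a, b in zip(ends, ends[1:]))


song = "I saw the night\nYou held me tight\nWe danced till dawn\nAnd then you were gone"
print(count_adjacent_rhymes(song))  # 1
```

The spelling rule finds “night/tight” but misses “dawn/gone”, which illustrates its limits.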
We also wanted to consider how a song is written. For instance, do songs with made-up words (think MMMBop by Hanson) get the crowd humming? Or would a dense, lyrically intricate song with a powerful message receive more votes at Eurovision?
To explore this, we introduced additional features capturing the linguistic composition of each song, including the number of verbs, nouns, adjectives and other grammatical categories present in the lyrics. We also included total word count as a separate variable. This allows us to investigate whether longer songs are associated with more favourable outcomes, or whether shorter, simpler songs are more effective in influencing voting behaviour.
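Once the lyrics have been part-of-speech tagged, the grammatical counts and the word count are simple tallies. The sketch assumes the tagging has already been done upstream by an NLP library that emits Universal POS tags (spaCy exposes these as `token.pos_`). `pos_features` and its output keys are hypothetical names.

```python
from collections import Counter


def pos_features(tagged):
    """tagged: list of (word, universal_pos_tag) pairs from an upstream tagger."""
    counts = Counter(tag for _, tag in tagged)
    feats = {f"n_{tag.lower()}": counts[tag] for tag in ("NOUN", "VERB", "ADJ", "PROPN")}
    # Total word count, ignoring punctuation tokens
    feats["word_count"] = sum(1 for _, tag in tagged if tag != "PUNCT")
    return feats


print(pos_features([("love", "NOUN"), ("shines", "VERB"), ("bright", "ADJ"), ("!", "PUNCT")]))
```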
Finally, we examined word usage patterns within each song, focusing on whether repetition plays a role in voting outcomes. Specifically, we measured the proportion of unique words in each song, that is, words that appear only once, as a way of capturing how lexically varied a song's lyrics are. A low proportion indicates heavy repetition, which allows us to assess whether repeated lyrical phrases or hooks have an impact on a song’s popularity and its ability to attract votes.
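The article does not specify the denominator for “proportion of unique words”, so the hypothetical `repetition_features` below returns two candidate measures. The first is the distinct-word (type-token) ratio. The second is the share of all words that appear exactly once. Tokenisation here is a simple lowercase regex, which is an assumption.

```python
import re
from collections import Counter


def repetition_features(lyrics):
    """Two simple measures of lexical variety; low values mean heavy repetition."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    if not words:
        return {"distinct_ratio": 0.0, "once_ratio": 0.0}
    counts = Counter(words)
    once = sum(1 for c in counts.values() if c == 1)
    return {
        "distinct_ratio": len(counts) / len(words),  # distinct words / all words
        "once_ratio": once / len(words),  # words used exactly once / all words
    }


print(repetition_features("la la la love me love me now"))
# {'distinct_ratio': 0.5, 'once_ratio': 0.125}
```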
However, many songs also include repeated non-lexical elements, such as vocal sounds, filler phrases or stylistic repetitions, that do not carry clear semantic meaning, and common determiners such as “the” are similarly uninformative. Focusing on nouns therefore gives a clearer indication of the key subjects being referenced and the actions or descriptions associated with them. As part of this, we identified the most common nouns in each song. We used an NLP module to standardise the words, so that plural forms are grouped with their singular forms and their counts are not split across variants.
“Love” emerged as a particularly common theme in the lyrics, with approximately 10% of songs featuring it as the most frequently used noun.
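Finding each song's most frequent noun after merging plurals can be sketched as below. The naive `naive_singular` rule is a deliberately crude, hypothetical stand-in for a proper lemmatiser such as spaCy's `token.lemma_` or NLTK's `WordNetLemmatizer`, which also handle irregular plurals. The input is assumed to be already POS-tagged.

```python
from collections import Counter


def naive_singular(word):
    # Crude stand-in for a lemmatiser: "hearts" -> "heart", but "kiss" is left alone
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def top_noun(tagged):
    """Most frequent noun in a song, with plural forms counted as singular."""
    nouns = Counter(naive_singular(w.lower()) for w, tag in tagged if tag == "NOUN")
    return nouns.most_common(1)[0][0] if nouns else None


song = [("Hearts", "NOUN"), ("break", "VERB"), ("heart", "NOUN"), ("the", "DET"), ("love", "NOUN")]
print(top_noun(song))  # heart
```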
English and French account for nearly two-thirds of all Eurovision winners.
As part of this analysis, we applied k-means clustering to group words based on their usage patterns. This approach allowed us to classify frequently occurring terms into distinct clusters with similar characteristics. In our dataset, we used 50 clusters, with words such as 'love', 'lover' and 'loving' grouped together within the same cluster. Each song's cluster assignment (the cluster to which its most frequently used word belongs) was then included as a feature in the model.
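A minimal k-means (Lloyd's algorithm) sketch is shown below. The two-dimensional word vectors are hypothetical, since the article does not say how word usage was represented, and k = 2 replaces the article's 50 to keep the example small. For reproducibility it initialises with the first k points. Libraries usually default to smarter seeding such as k-means++.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    centroids = [list(p) for p in points[:k]]  # simple deterministic initialisation
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance)
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


# Hypothetical word vectors
words = ["love", "lover", "loving", "night", "dark"]
vectors = [[0.9, 0.1], [0.85, 0.15], [0.88, 0.12], [0.1, 0.9], [0.15, 0.85]]
labels, _ = kmeans(vectors, k=2)
cluster_of = dict(zip(words, labels))
# A song whose most frequent word is "loving" gets feature cluster_of["loving"]
```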
With our features engineered, we proceeded to the modelling stage, training each model and evaluating its performance using a range of metrics.
As discussed above, we fitted a variety of models to several different outcome variables. Unfortunately, despite these additional models and features, we were unable to develop a model that accurately predicts Eurovision performance. Even the best-performing model returned a low R-squared value, suggesting that the factors included in the analysis have limited predictive power on their own. In some cases, the models produced a negative R-squared value, indicating that they performed worse than a simple baseline that always predicts the mean outcome (i.e. a horizontal line).
Proper nouns and compound sentiment score are the strongest predictors of Eurovision performance in the random forest model.
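The negative R-squared values follow directly from its definition: 1 minus the ratio of the model's squared errors to those of the mean-only baseline. The hypothetical placings below show that predicting the mean gives exactly 0, while any model with larger errors than that baseline drops below 0.

```python
def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot; negative when the model is worse than predicting the mean."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


placings = [1, 5, 10, 20]  # hypothetical final placings (mean = 9)
print(r_squared(placings, [9, 9, 9, 9]))   # 0.0: the horizontal-line baseline
print(r_squared(placings, [20, 10, 5, 1])) # negative: worse than the baseline
```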
Interestingly, song sentiment (negative, positive and compound) remained a consistently important feature across the different models, although its importance varied depending on the model and on whether positive or negative sentiment was considered. In Figure 7, we present partial dependence plots for the key features to illustrate how changes in each variable influence the model’s predictions. Songs with a positive sentiment had a 2 to 3 percentage point increase in their predicted probability of winning.
How key lyrical and sentiment features influence the predicted probability of Eurovision success
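A partial dependence curve is computed by fixing one feature at each grid value for every song, then averaging the model's predictions. The sketch uses a hypothetical linear `toy_win_model` in place of a fitted model, so its numbers carry no meaning beyond the illustration.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each grid value for every row."""
    curve = []
    for value in grid:
        modified = [{**row, feature: value} for row in rows]  # copy rows, override one feature
        preds = [predict(r) for r in modified]
        curve.append(sum(preds) / len(preds))
    return curve


def toy_win_model(row):
    # Hypothetical stand-in for a fitted model's predicted win probability
    return 0.1 + 0.05 * row["pos"] + 0.01 * row["n_love"]


songs = [{"pos": 0.1, "n_love": 2}, {"pos": 0.4, "n_love": 0}]
curve = partial_dependence(toy_win_model, songs, "pos", [0.0, 0.5, 1.0])
```

For a real random forest or GBM the same loop applies, with `predict` wrapping the fitted model.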
For example, for our GBM model we obtain the SHAP plot shown below, which illustrates each feature’s contribution to the model’s predictions. In the beeswarm plot, the count of the keyword (e.g. “love”) has the highest SHAP values, indicating that it is the most influential feature for this model. In contrast, the same feature shows only moderate importance in our random forest model. Similarly, neutral and negative sentiment scores were less influential in the GBM than they were in the random forest model.
Positive sentiment and word repetition have the greatest influence on the GBM model's predictions
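SHAP values are Shapley values: each feature's average marginal contribution over all orderings of the features. The brute-force sketch below enumerates every coalition and fills absent features from a single hypothetical baseline row. That is a simplification, since the SHAP library averages over background data and uses fast approximations such as TreeSHAP for tree models. It is exponential in the number of features, so it is only practical for a handful. `toy_gbm` is a hypothetical stand-in model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction; absent features take baseline values."""
    features = list(x)
    n = len(features)

    def value(coalition):
        row = {f: (x[f] if f in coalition else baseline[f]) for f in features}
        return predict(row)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_gbm(row):
    # Hypothetical model with an interaction between sentiment and keyword count
    return 2 * row["pos"] + row["n_love"] * row["pos"]


phi = shapley_values(toy_gbm, {"pos": 1, "n_love": 3}, {"pos": 0, "n_love": 0})
# The values sum to toy_gbm(x) - toy_gbm(baseline) = 5
```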
Unfortunately, our models could not reliably predict Eurovision winners from song lyrics. Success at Eurovision depends on many factors, and while a song’s lyrics might carry emotion, narrative or harmonic resonance, they are only one fragment of a larger performance. Elements such as the musical composition, staging, beat, rhythm and everything in between contribute to a song’s overall success and make Eurovision the spectacle that it is. We therefore encourage readers to tune in and enjoy the remaining performances as they unfold.
The performance times (AEST) are provided below:
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND 4.0) licence.
Subscribe to Actuaries Digital for free and receive the latest actuarial analysis, research, and commentary direct to your inbox