Building a festival line-up thanks to machine learning (Python)

Organizing a festival is tough but great, especially when you’re a student. For many years now, students at my school have organized a music festival called the Rock’n Solex. With 52 editions, it is in fact the oldest French music festival organized by students. Two years ago, I was the music programmer for this event, and I faced the difficulty of creating the best possible line-up with a given budget. I have since specialized in mathematical engineering and discovered machine learning algorithms. The time has come to put what I’ve learned to the test and build a small decision support system that I would have loved to have when I was music programmer.

In this article, I go over the key steps of the project. I talk briefly about how I created the databases I used, and I also give some details about how I used machine learning (especially random forest) and a small linear program to obtain my festival line-up.

Create databases

The first thing I had to do was generate datasets to “feed” my algorithm. This led me to learn how to extract data with Spotify’s API (thanks to them for providing quality data for each artist, song and playlist). Here are the steps I followed:

  • First, create a database containing festival line-ups since 2015, including the Rock’n Solex (of course) and 5 other similar or geographically close festivals
  • Via Spotify, get information on each artist entered (number of followers, popularity, music genre…)
  • Via Spotify, get information on the top 10 tracks of each artist entered (acousticness, instrumentalness, danceability, duration, energy…)
  • Work on musical genre by adding binary variables such as « Rock », « Pop », « Jazz », « Techno », « Funk », « Folk », « Dub », « House », « Electronic music ». I know it’s a bit restrictive, but it helped me get a good idea of the festival’s habits (in terms of genres in the line-up)
  • Create some other variables such as « flag RNS », which takes 1 if the artist has been programmed at the Rock’n Solex and 0 otherwise, or « Recent content », which takes 1 if the artist released an album in the year they were programmed or the year before, and 0 otherwise.
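The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code of the project: it assumes the artist has already been fetched from Spotify’s API as a dictionary with `name`, `followers.total`, `popularity` and `genres` fields, and it derives the genre flags with a simple substring match on Spotify’s genre labels. The function `artist_row`, the `GENRE_FLAGS` list and the sample artist are hypothetical.

```python
# Hypothetical list of genre keywords, one binary variable per keyword.
# "electro" is used so that "electro", "electronic", "electronica" all match.
GENRE_FLAGS = ["rock", "pop", "jazz", "techno", "funk", "folk", "dub", "house", "electro"]


def artist_row(artist, rns_artists, release_years, festival_year):
    """Turn one Spotify artist object (a dict) into one row of the learning base.

    rns_artists   : set of artist names already programmed at the Rock'n Solex
    release_years : years in which the artist released an album
    festival_year : year of the edition the artist was programmed in
    """
    genres = artist.get("genres", [])
    row = {
        "name": artist["name"],
        "followers": artist["followers"]["total"],
        "popularity": artist["popularity"],
    }
    # One binary variable per genre: 1 if any Spotify genre label contains the keyword.
    # Assumption: substring matching is crude (e.g. "dub" also matches "dubstep").
    for flag in GENRE_FLAGS:
        row[f"is_{flag}"] = int(any(flag in g for g in genres))
    # flagRNS: 1 if the artist has been programmed at the Rock'n Solex.
    row["flagRNS"] = int(artist["name"] in rns_artists)
    # recentContent: 1 if an album came out that year or the year before.
    row["recentContent"] = int(
        any(y in (festival_year, festival_year - 1) for y in release_years)
    )
    return row


# Made-up artist object, shaped like a Spotify "artist" response.
sample_artist = {
    "name": "Example Artist",
    "followers": {"total": 12345},
    "popularity": 48,
    "genres": ["deep house", "french techno"],
}

row = artist_row(sample_artist, {"Example Artist"}, [2015, 2018], 2019)
print(row)
```

The audio features of the top 10 tracks would be added to the same row, for instance as averages per artist.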

The Rock’n Solex is a “small” festival, and the learning database I built is not really balanced: there are far more artists with flagRNS at 0 than at 1. To get a more balanced database, I decided to also set flagRNS to 1 for all artists programmed at the Imaginarium festival between 2015 and 2020. These two festivals are similar: both are organized by students, and their musical influences and budgets are comparable. I stored all of this in a learning database (550 rows, one per artist).

Evolution of genres represented over the last five editions

Over the years, rap music has become more and more present in the line-up (from 10% to 20% over the last 5 years). Electronic music, on the other hand, has always occupied the largest part of the line-up (around 50%) over the last 5 years. Pop, reggae and rock fill out the rest of the line-up.

At the end of the project, I want to generate a festival line-up of 10 artists (5 per festival day). Given what I analysed, I plan on choosing:

  • at least 2 rap artists
  • at least 2 electronic artists and 1 techno artist
  • at least 1 artist each from pop, reggae and rock

Note that the headliner will be either a rap artist or an electronic artist.

I know I’ll have to choose artists from multiple musical genres. This led me to create multiple scoring databases: one for rap, one for electronic music, one for pop, and so on.

To do so, for each genre, I took various playlists of that genre and generated the corresponding data, with the same variables as in the learning database. However, I changed the meaning of some variables: this time, the binary variable recentContent is 1 if the artist released a project in 2020 or 2021, 0 otherwise.

I obtained 1 learning dataset and 5 datasets for scoring (their dimensions are given in the figure below).

Databases obtained

Create the line-up

Now that I have created my databases, how can I extract relevant artists for my line-up from them? I used the principle of scoring. Even if you have no particular background in mathematics, the principle of scoring is easy to understand: an algorithm gives a score to each artist in the database. The score is bounded between 0 and 1: 1 means that the artist is perfectly suited to the Rock’n Solex, 0 means the opposite.

To do so, I used a random forest algorithm. This allowed me to take the 15 most relevant artists each in electronic music and rap, and the 10 most relevant artists each in pop, rock, techno and reggae. The figure below sums up the process.

Scoring helped me to obtain top 15 and top 10 bases
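The scoring step can be sketched with scikit-learn as follows. This is a minimal illustration on synthetic data, not the real databases. It assumes the forest is trained on the learning base to predict flagRNS and that the score of an artist is the predicted probability of flagRNS = 1. The number of features, the size of the rap scoring base and the names `top_k`, `X_rap` and `rap_names` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the learning base: 550 artists, 6 numeric features.
X_train = rng.random((550, 6))
# Fake flagRNS target, loosely driven by the first feature (made up for the demo).
y_train = (X_train[:, 0] + 0.3 * rng.random(550) > 0.7).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Synthetic stand-in for one scoring base (e.g. rap): 80 candidate artists.
X_rap = rng.random((80, 6))
rap_names = [f"rap_artist_{i}" for i in range(80)]


def top_k(model, X, names, k):
    """Score every candidate between 0 and 1 and return the k best (name, score) pairs."""
    # Column of predict_proba that corresponds to class 1 (flagRNS = 1).
    positive = list(model.classes_).index(1)
    scores = model.predict_proba(X)[:, positive]
    best = np.argsort(scores)[::-1][:k]  # indices sorted by decreasing score
    return [(names[i], float(scores[i])) for i in best]


top15_rap = top_k(model, X_rap, rap_names, 15)
for name, score in top15_rap[:3]:
    print(name, round(score, 3))
```

The same function would be called with k = 10 on the pop, rock, techno and reggae scoring bases.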

To finish the project and build the festival line-up, I had to find a way to select artists from these top 15 and top 10 lists so that the line-up is feasible in terms of price.

Price is obviously not publicly available data. To get an idea of this quantity, I assumed that the price of an artist is highly (positively) correlated with the number of followers this artist has on Spotify.

I want to maximize the popularity of the line-up (i.e. choose artists that are as popular as possible) while staying within the festival budget. With the hypothesis made above, this means maximizing the number of followers of a line-up (i.e. the sum of the followers of its artists), knowing that it is bounded by a cap (an equivalent of the festival’s budget, but in terms of followers). I set this cap to the mean number of followers of the line-ups between 2015 and 2020: around 815,000.

As I said before, the headliner of the festival line-up is either a rap artist or an electronic artist. If the headliner is a rap artist, the number of followers of the selected rap artists has to be higher than the number of followers of the other selected artists.

Also remember that I decided to choose at least 2 rap artists, 2 electronic artists, 1 techno artist, etc. This is another constraint that I need to take into account to create my line-up.

I formulated this problem as a linear program, presented just below.

Linear program to create the line up
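In text form, the program can be written roughly as follows. This is a sketch reconstructed from the description in this article, so the notation is an assumption: A is the set of candidate artists (the top 15 and top 10 lists), A_g the candidates of genre g, f_a the number of Spotify followers of artist a, x_a a binary variable equal to 1 if artist a is selected, B the follower cap (around 815,000) and m_g the minimum number of artists of genre g (2 for rap and electronic, 1 for techno, pop, reggae and rock). The headliner constraint is written for the rap case, reading “other selected artists” as all non-rap artists together.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{a \in A} f_a \, x_a \\
\text{s.t.} \quad
& \sum_{a \in A} f_a \, x_a \le B
  && \text{(budget, in followers)} \\
& \sum_{a \in A} x_a = 10
  && \text{(size of the line-up)} \\
& \sum_{a \in A_g} x_a \ge m_g \quad \forall g
  && \text{(minimum number of artists per genre)} \\
& \sum_{a \in A_{\mathrm{rap}}} f_a \, x_a \ \ge \sum_{a \in A \setminus A_{\mathrm{rap}}} f_a \, x_a
  && \text{(rap headliner)} \\
& x_a \in \{0, 1\} \quad \forall a \in A
\end{aligned}
```

Because the variables are binary, this is strictly speaking an integer linear program. For an electronic headliner, the fourth constraint is written with the electronic artists on the left-hand side instead.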

I wrote it in Python with PuLP and obtained my line-up.
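A minimal PuLP sketch of such a program is given here, as an illustration under stated assumptions and not the code of the project. The candidates and their follower counts are made up, the names `CANDIDATES`, `MIN_PER_GENRE` and `build_lineup` are hypothetical, and the headliner constraint reads “other selected artists” as all artists outside the headliner’s genre taken together.

```python
import pulp

# Made-up candidates: name -> (genre, Spotify followers).
CANDIDATES = {
    "rap_1": ("rap", 300_000), "rap_2": ("rap", 150_000),
    "rap_3": ("rap", 80_000), "rap_4": ("rap", 40_000),
    "elec_1": ("electronic", 200_000), "elec_2": ("electronic", 90_000),
    "elec_3": ("electronic", 50_000), "elec_4": ("electronic", 20_000),
    "techno_1": ("techno", 60_000), "techno_2": ("techno", 15_000),
    "pop_1": ("pop", 70_000), "pop_2": ("pop", 10_000),
    "reggae_1": ("reggae", 30_000), "reggae_2": ("reggae", 8_000),
    "rock_1": ("rock", 45_000), "rock_2": ("rock", 12_000),
}

# Minimum number of artists per genre, as stated in the article.
MIN_PER_GENRE = {"rap": 2, "electronic": 2, "techno": 1, "pop": 1, "reggae": 1, "rock": 1}


def build_lineup(candidates, budget=815_000, size=10, headline_genre="rap"):
    """Pick `size` artists maximizing total followers under the follower cap."""
    names = list(candidates)
    followers = {n: candidates[n][1] for n in names}
    pick = pulp.LpVariable.dicts("pick", names, cat="Binary")  # 1 if selected

    prob = pulp.LpProblem("festival_lineup", pulp.LpMaximize)
    total = pulp.lpSum(followers[n] * pick[n] for n in names)
    prob += total                                         # objective: total followers
    prob += total <= budget                               # budget, in followers
    prob += pulp.lpSum(pick[n] for n in names) == size    # size of the line-up
    for genre, minimum in MIN_PER_GENRE.items():          # genre minimums
        prob += pulp.lpSum(pick[n] for n in names if candidates[n][0] == genre) >= minimum
    # Headliner: followers of the headline genre >= followers of all other selected artists.
    head = pulp.lpSum(
        followers[n] * pick[n] for n in names if candidates[n][0] == headline_genre
    )
    prob += 2 * head >= total

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    if pulp.LpStatus[prob.status] != "Optimal":
        return None  # no feasible line-up with these candidates
    return [n for n in names if pick[n].value() > 0.5]


lineup = build_lineup(CANDIDATES)
print(lineup)
```

Calling `build_lineup(CANDIDATES, headline_genre="electronic")` covers the case of an electronic headliner, and the better of the two line-ups can then be kept.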

This project lays the foundations of a decision support system. Of course, there are many ways to improve it: create other variables, use other algorithms for scoring (a good thing to do is to compare different algorithms with a given metric), or work on a bigger festival with bigger databases (which may lead to something more accurate)… In the end, I’m happy with what I’ve done: even though the generated line-ups are optimistic in terms of budget, I think they are accurate and will suit the festival. This project was an opportunity to learn about Spotify’s API and to put some theory into practice.

I coded the project in Python and I’ll put the code on my GitHub in the next few days. If you have any questions or comments, please don’t hesitate; I’ll be happy to discuss them.

Student in mathematical engineering (machine learning, statistics, data analysis, optimisation…) at INSA Rennes (France, 4th year / 5).
