Letterboxd Gaps
Kirsten Dunst had a great moment[1] in a 2016 Hollywood Reporter roundtable. Responding to actresses lamenting that they don't work with female directors, she said that "[she's] worked with so many female directors" and that "it's up to us as actresses to give the opportunity to first time directors". While actors have a lot more sway than audiences in getting projects greenlit,[2] I was still curious what percentage of the movies I'd seen were directed by women. I googled around, but the website I found wasn't working and this script was non-trivial to use for breaking down custom lists. So I built my own!
Humble Beginnings
Much like the other projects I'd seen analyzing this, it relies on Letterboxd[3] and TMDb, an aggregator for movie info including countries, languages, and casts (with gender info). A lot of tools analyzing Letterboxd data take in a username, but because there's no public API they rely on scraping your profile. This is slow, hard to cache, and temperamental, and I wanted something that worked fast. Luckily, while there's no public API, Letterboxd lets you export your data as a .zip file, which we can process.[4]
The watched.csv list has the date you marked a movie as seen, its name and year of release, and a short URL that redirects to its main page. The name and year are usually enough to uniquely identify a movie, but not always,[5] so it's safer to load the Letterboxd page first, extract the unique TMDb ID, and use it to pull additional information about the movie.
Date         Name          Year   Letterboxd URI
2020-09-13   Snowpiercer   2013   https://boxd.it/3Icg
2020-09-13   Frozen II     2019   https://boxd.it/aPvo
2020-09-13   50/50         2011   https://boxd.it/10Tw
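A minimal Python sketch of that identification step, under a few assumptions: the CSV header matches the sample above, and the film page exposes its TMDb ID in a data-tmdb-id attribute (an assumption about Letterboxd's markup, not something confirmed here). All helper names are hypothetical. Grouping rows by (Name, Year) shows why the URI, rather than the title, is the safer key.

```python
import csv
import io
import re
from collections import defaultdict
from typing import Optional


def load_watched(csv_text: str) -> list[dict[str, str]]:
    """Parse watched.csv (assumed header: Date,Name,Year,Letterboxd URI)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def ambiguous_titles(rows: list[dict[str, str]]) -> dict[tuple[str, str], list[str]]:
    """Return (Name, Year) pairs that map to more than one Letterboxd URI."""
    by_title: dict[tuple[str, str], list[str]] = defaultdict(list)
    for row in rows:
        by_title[(row["Name"], row["Year"])].append(row["Letterboxd URI"])
    return {key: uris for key, uris in by_title.items() if len(uris) > 1}


# Assumption: the film page HTML contains data-tmdb-id="<digits>".
TMDB_ID = re.compile(r'data-tmdb-id="(\d+)"')


def extract_tmdb_id(html: str) -> Optional[int]:
    """Pull the TMDb ID out of a fetched film page, or None if absent."""
    match = TMDB_ID.search(html)
    return int(match.group(1)) if match else None
```

Keying everything off the URI (and then the TMDb ID) means two movies called Leo (2023) never collide.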
Initially I had the client send off one request per movie to get its information and poster. However, Chrome is capped at 25MB of request data, which resulted in Failed to load resource: net::ERR_INSUFFICIENT_RESOURCES errors and the loading process aborting halfway through.
To fix this I switched to doing it all in one request. This meant that rather than streaming a constant update of movies, everything had to be generated at once. This is where caching became invaluable: after you process a movie once you can save it to a database, and because there aren't that many movies[6] that people watch, you get good coverage pretty quickly.[7]
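As an illustration of the caching idea, here's a hedged sketch using SQLite with a hypothetical movies table keyed by Letterboxd URI; the site's actual database and schema aren't described in this post. Only the URIs that come back as missing would need to be fetched and processed.

```python
import json
import sqlite3


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical cache table keyed by Letterboxd URI."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS movies (uri TEXT PRIMARY KEY, info TEXT NOT NULL)"
    )
    return conn


def store(conn: sqlite3.Connection, uri: str, info: dict) -> None:
    """Save processed movie info as JSON, replacing any older entry."""
    with conn:  # the context manager commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO movies (uri, info) VALUES (?, ?)",
            (uri, json.dumps(info)),
        )


def split_cached(conn: sqlite3.Connection, uris: list[str]) -> tuple[dict[str, dict], list[str]]:
    """Return (already-cached info by URI, URIs that still need processing)."""
    cached: dict[str, dict] = {}
    for uri in uris:
        row = conn.execute("SELECT info FROM movies WHERE uri = ?", (uri,)).fetchone()
        if row is not None:
            cached[uri] = json.loads(row[0])
    missing = [uri for uri in uris if uri not in cached]
    return cached, missing
```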
Color Sorting
A big part of a project like this is visualization, and unfortunately movie posters aren't designed to look good shrunk down to 30px and randomly shuffled among a thousand other posters. Sorting by color is a natural way to bring some order to this, but posters are complex enough to make it nontrivial. Here are some screenshots of me playing around with different approaches.
Polar coordinates with θ from hue and r from saturation
Force-directed layout using d3.js
Overcomplicated screen-filling layout using radius
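For the first of those layouts, the mapping could look something like this. It's a sketch: polar_position and poster_position are hypothetical names, and each poster's average RGB color is assumed to be computed elsewhere.

```python
import colorsys
import math


def polar_position(hue: float, saturation: float, max_radius: float) -> tuple[float, float]:
    """Hue (0-1) sets the angle θ; saturation (0-1) sets the distance r from the centre."""
    theta = 2 * math.pi * hue
    r = saturation * max_radius
    return r * math.cos(theta), r * math.sin(theta)


def poster_position(rgb: tuple[int, int, int], max_radius: float) -> tuple[float, float]:
    """Place a poster from its (assumed precomputed) average 0-255 RGB color."""
    h, s, _v = colorsys.rgb_to_hsv(*(c / 255 for c in rgb))
    return polar_position(h, s, max_radius)
```

Washed-out posters (low saturation) all crowd the centre, which is one reason a layout like this gets messy.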
I have another post coming™ about improving my approach, but I ended up going with the simple implementation of sorting by average hue. This results in a bunch of white posters being sprinkled throughout, but you can still see the color bands.
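One caveat with "average hue" is that hue is circular: red sits at both 0 and 1, so a plain arithmetic mean of a red poster (hue 0) and a magenta one (hue 5/6) lands near green. A sketch that sidesteps this with a circular mean, assuming each poster is given as a list of (r, g, b) pixels in 0-255; this is one possible interpretation, not necessarily how the site computes it.

```python
import colorsys
import math


def average_hue(pixels: list[tuple[int, int, int]]) -> float:
    """Circular mean of per-pixel hue, returned in [0, 1)."""
    x = y = 0.0
    for r, g, b in pixels:
        h, _s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        # Treat each hue as a unit vector on the color wheel and sum them.
        x += math.cos(2 * math.pi * h)
        y += math.sin(2 * math.pi * h)
    # The angle of the summed vector is the mean hue; % 1.0 folds negatives back.
    return (math.atan2(y, x) / (2 * math.pi)) % 1.0


def sort_by_hue(posters: dict[str, list[tuple[int, int, int]]]) -> list[str]:
    """Order poster names by their average hue."""
    return sorted(posters, key=lambda name: average_hue(posters[name]))
```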
Landing Page
While I abandoned the radial approach to rendering lists, I still wanted to keep something like that for the landing page. Looking through my list of watched movies I selected a bunch that had strong saturation, but also ones that I had some personal connection to.
I liked the first iteration, but I needed to make space for an actual explanation of how to use the website, so I had to spread the posters out and rearrange them, finally landing on this.[8]
I ended up manually arranging them using some temporary code, which is why they're a little off-center (or as I like to call it, characterful). You can see the full list of movies I used in my Letterboxd list here.
I also added some animations, so that all of the movies expand outwards when you drag a file over the page. Additionally, after you drop the file, the posters fade out and the title pieces move to make room for the list.
Exploring Lists
Letterboxd has a Watchlist feature, but I've started complementing it with other lists: whenever I think "I should watch that" and it's already in my watchlist, it goes into a 'Watchlist 2' list (and so on, up to my current peak of 'Watchlist 4'). When I started the project it only supported whatever .csv list I manually dropped in, which made flicking between my different watchlists a pain.
Because I'd switched to only handling the full .zip, though, this was easy to support. I initially went with a standard <select> tag (the normal way to render a selector list like this), but its dropdown can't really be styled with CSS,[9] so it looked pretty garish.
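Assuming the export keeps custom lists as separate CSV files in a lists/ folder next to watched.csv and diary.csv (an assumption about the export layout), enumerating the menu options could be sketched with Python's standard zipfile module. The function names are hypothetical.

```python
import io
import zipfile


def list_sources(zip_bytes: bytes) -> dict[str, list[str]]:
    """Split the export's CSVs into top-level exports and custom lists."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        names = [n for n in archive.namelist() if n.endswith(".csv")]
    return {
        "exports": sorted(n for n in names if "/" not in n),   # e.g. watched.csv
        "lists": sorted(n for n in names if n.startswith("lists/")),  # assumed folder
    }


def read_source(zip_bytes: bytes, name: str) -> str:
    """Read one CSV from the export as text."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(name).decode("utf-8")
```

Since everything comes from one upload, switching lists is just a different read_source call rather than a new file drop.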
I ended up rolling my own menu, which allowed me to make the custom lists collapsible, as well as split the diary.csv file by year. There's probably a good JS library for handling tiered menus like this, but I couldn't easily find a lightweight option, and since it doesn't look horrible I'm leaving it as is for now.
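Splitting diary.csv by year amounts to a simple group-by. This sketch assumes a "Watched Date" column in YYYY-MM-DD format; the column name is a parameter in case the export differs, and diary_by_year is a hypothetical name.

```python
import csv
import io
from collections import defaultdict


def diary_by_year(csv_text: str, date_column: str = "Watched Date") -> dict[str, list[dict[str, str]]]:
    """Group diary rows by the year prefix of date_column, newest year first."""
    years: dict[str, list[dict[str, str]]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        years[row[date_column][:4]].append(row)  # "2024-01-31" -> "2024"
    return dict(sorted(years.items(), reverse=True))
```

Each key then becomes one entry in the collapsible Diary section of the menu.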
Missing Languages and Countries
In addition to filtering my watchlists by female directors, I was also curious whether there were countries I hadn't seen movies from and languages I hadn't seen movies in. Letterboxd Pro has a Stats page where you can see a map of which countries you've seen movies from,[10] but it doesn't make it easy to see which countries you're missing. Luckily TMDb has country information for movies, and Letterboxd maintains a full list of countries and languages, so it wasn't too hard to join these datasets. I started out how I always start out: just get something rendering.
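Once both datasets are loaded, finding the gaps is a set difference. In this sketch the catalogue is a hypothetical mapping from country (or language) code to how many movies it has, which also allows a minimum-size filter; the "countries" field name on each movie is likewise an assumption.

```python
def seen_codes(movies: list[dict], field: str = "countries") -> set[str]:
    """Collect every code that appears on at least one movie in the chosen list."""
    return {code for movie in movies for code in movie.get(field, [])}


def missing_regions(catalogue: dict[str, int], seen: set[str], min_movies: int = 0) -> list[str]:
    """Codes with more than min_movies movies that don't appear in `seen`, biggest first."""
    gaps = [code for code, total in catalogue.items() if code not in seen and total > min_movies]
    return sorted(gaps, key=lambda code: catalogue[code], reverse=True)
```

Passing the movies from a specific list rather than the full watched history gives the per-list view described below.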
The map UI was much prettier, though, and I wanted to find a way to leverage it. I found an open-source library that rendered maps, but while the UX was very similar, the projection was different and certain regions were combined (e.g. Letterboxd's UI shows "Svalbard and Jan Mayen" as its own region rather than as part of Norway).
svgMap demo
Letterboxd map
I was exploring whether I could fork the open-source library when I realized that it was exactly what Letterboxd was using under the hood! Instead, I just leveraged their customized svgMap file. Tada!
The languages don't map 1:1 to countries, so I just displayed them as a list. You can also see by comparing the above screenshot to this one that if you select the default "Watched" list it shows all of the countries/languages you're missing overall, but if you select a specific list it shows which ones are missing from that specific list. This makes it better for my use case of curating my watchlists.
Loading Indicator
The site isn't guaranteed to already have all of the movies you've watched in its cache. When you upload a list with a new movie, the site uploads the temporary information and then processes all the new movies in batches of 50.[11] I wanted a themed loading indicator to convey this to users, so I built it into the "Letterboxd" letters at the top of the screen. You can see in this screenshot that they fill up with orange, green, and blue water, mirroring the Letterboxd colors.
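The batching and the progress value it drives could be sketched like this. BATCH_SIZE mirrors the batches of 50 mentioned above; the function names and the percentage-based progress are assumptions for illustration.

```python
import math
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")
BATCH_SIZE = 50  # matches the batches of 50 described above


def batches(items: Sequence[T], size: int = BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def progress_percent(done_batches: int, total_items: int, size: int = BATCH_SIZE) -> float:
    """Percentage of batches finished, clamped to 0-100."""
    total_batches = max(1, math.ceil(total_items / size))
    return max(0.0, min(100.0, 100.0 * done_batches / total_batches))
```

With 120 uncached movies this gives three batches (50, 50, 20), and a percentage like this is the kind of value that could set the height of the .progress element described below.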
Unfortunately, most people click off the site pretty quickly, which has two downsides. First, it leaves movies unprocessed. To resolve this I have a Bash one-liner, triggered by cron every 6 hours, that clears whatever backlog there is.
# cron runs jobs with /bin/sh by default, which doesn't support [[ ]], so use Bash
SHELL=/bin/bash
0 */6 * * * while output=$(php </path/to/>process.php <key2> 1 1); do echo "$output"; [[ "$output" == *"finished"* ]] && break; sleep 1; done
But secondly, and much more importantly, people don't get to enjoy the loading indicator!
This works by having a .header div with two children: one is the normal text, and the other is the highlighted text (in orange, green, and blue) wrapped in two wrapper divs.
<div class="header">
  <div class="normal">Letterboxd</div>
  <div class="progress">
    <div class="wrapper">
      <span class="orange">Let</span>
      <span class="green">ter</span>
      <span class="blue">boxd</span>
    </div>
  </div>
</div>
The header div is position: relative so that the two layers of text can perfectly overlap each other, and both wrapper divs are set to position: absolute with bottom: 0px so they're aligned in the same place. Setting overflow: hidden on the .progress div then lets us control how full the letters look just by changing its height.
.header {
  position: relative;
}
.progress {
  position: absolute;
  bottom: 0px;
  width: 100%;
  overflow: hidden;
}
.wrapper {
  position: absolute;
  bottom: 0px;
}
However, the wave animation needs something a little trickier. We can swap overflow: hidden for clip-path, which uses an SVG path to crop out little Bézier waves. Unfortunately you can't smoothly animate clip-path (at least in Chrome, where it's very jerky). Instead, we have the outer wrapper (.progress) scroll to the right. This makes the waves churn, but it also moves the letters. That's why there's a second wrapper: we can have it scroll to the left at the same speed, so the letters being filled stay in place while the clip-path appears to slide. Because the wave repeats every 20px and each cycle shifts by 40px (exactly two waves), the end of one cycle looks identical to the start of the next, so it loops seamlessly. Voilà!
.progress {
  /* Quadratic waves every 20px along the top edge, closed by a rectangle below them */
  clip-path: path('M -100 0 Q -90 5 -80 0 Q -70 5 -60 0 Q -50 5 -40 0 Q -30 5 -20 0 Q -10 5 0 0 Q 10 5 20 0 Q 30 5 40 0 Q 50 5 60 0 Q 70 5 80 0 Q 90 5 100 0 Q 110 5 120 0 Q 130 5 140 0 Q 150 5 160 0 Q 170 5 180 0 Q 190 5 200 0 Q 210 5 220 0 Q 230 5 240 0 Q 250 5 260 0 Q 270 5 280 0 Q 290 5 300 0 Q 310 5 320 0 Q 330 5 340 0 Q 350 5 360 0 Q 370 5 380 0 Q 390 5 400 0 Q 410 5 420 0 Q 430 5 440 0 Q 450 5 460 0 Q 470 5 480 0 Q 490 5 500 0 L 500 500 L -100 500 Z');
  animation: cycle 2s linear 0s infinite;
}
.wrapper {
  /* Counter-scroll so the letters stay still while the waves move */
  animation: cycleReverse 2s linear 0s infinite;
}
/* 40px = two wave periods, so each cycle ends on an identical shape */
@keyframes cycle {
  from { transform: translateX(-40px); }
  to { transform: translateX(0px); }
}
@keyframes cycleReverse {
  from { transform: translateX(40px); }
  to { transform: translateX(0px); }
}
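The long path() string doesn't have to be written by hand. This sketch generates waves with the same shape (20px period, control points 5px deep, spanning -100 to 500) and closes the shape with a rectangle beneath the waves; the depth of 500 and both helper names are assumptions. loops_seamlessly just encodes the reasoning above: a shift by a whole number of periods lands on an identical shape.

```python
def wave_path(start: int = -100, end: int = 500, period: int = 20,
              amplitude: int = 5, depth: int = 500) -> str:
    """Build a CSS path() string of quadratic Bézier waves along y=0,
    closed by a rectangle reaching down to y=depth."""
    half = period / 2
    parts = [f"M {start:g} 0"]
    for x in range(start, end, period):
        # One wave per Q segment: the control point dips to y=amplitude midway.
        parts.append(f"Q {x + half:g} {amplitude:g} {x + period:g} 0")
    parts.append(f"L {end:g} {depth:g} L {start:g} {depth:g} Z")
    return " ".join(parts)


def loops_seamlessly(shift: int, period: int = 20) -> bool:
    """A translateX shift is invisible if it moves the wave by whole periods."""
    return shift % period == 0
```

The output can be pasted into clip-path: path('...'); changing period or amplitude then only means regenerating the string and keeping the keyframe shift a multiple of the period.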
Data
And finally, an answer to the initial question. I set out to find what percentage of the movies I'd seen were directed by women. Of the 1,623 movies I've logged in Letterboxd (including shorts), 154, or 9.5%, were directed by women. Filtering to countries and languages with over 5,000 movies, it also showed that I haven't seen anything from the USSR, Argentina, or Czechoslovakia, and I haven't seen anything in Arabic, Hindi, Czech, Turkish, Tagalog, Indonesian, or Greek. Hoping to close some of these gaps in 2025!
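The headline number is a simple ratio: 154 / 1,623 ≈ 0.0949, or 9.5%. Here's a sketch of how such a count could be computed from TMDb crew data, assuming TMDb's numeric person gender codes (1 = female) and counting a movie if any of its directors is a woman; that counting rule and the function names are assumptions, not necessarily what the site does.

```python
FEMALE = 1  # assumed TMDb gender code: 0 = not set, 1 = female, 2 = male, 3 = non-binary


def has_female_director(crew: list[dict]) -> bool:
    """True if any crew member credited as Director has the female gender code."""
    return any(m.get("job") == "Director" and m.get("gender") == FEMALE for m in crew)


def female_directed_share(crews: list[list[dict]]) -> tuple[int, float]:
    """Return (count, percentage rounded to one decimal) over all movies' crews."""
    count = sum(1 for crew in crews if has_female_director(crew))
    pct = round(100 * count / len(crews), 1) if crews else 0.0
    return count, pct
```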
If you're interested in any of this you can check out Letterboxd Gaps here and try it yourself. The code is all public on my GitHub.
1. Actually many great moments.
2. Unless you're making Veronica Mars.
3. Follow me!
4. Or at least I should've been able to process it using ZipArchive in PHP. I was running PHP 7.4, though, and hosting for php7.4-zip was recently removed, forcing me to do a long-overdue server upgrade from Ubuntu 16.04 to 24.04 (going through 18.04, 20.04, and 22.04 along the way). This took over a day of ~20 EC2 backups and disk resizing. Whoops.
5. Take Leo (2023) (denoted by Letterboxd as leo-2023) and Leo (2023) (denoted as leo-2023-1).
6. While TMDb has 1.1+ million movies, the Letterboxd lists of 'all movies' are more on the scale of 30k.
7. Currently (Feb 2025), from 380 incremental uploads, there are 55,139 unique movies processed.
8. In hindsight it's darkly funny that I ended up cutting Woman of the Hour (2023) for The Lorax (2012) in a project that kicked off around highlighting female directors, but WotH was only fine, whereas The Lorax gave me a song I've (inexplicably) listened to upwards of 300 times. Sue me.
9. At least for now, though it's a work in progress!
10. I didn't replicate this functionality because it's explicitly forbidden by the API terms. I'm not technically using the API, but I agree that if you want that functionality you should just buy a subscription. They're great value!
11. How long this takes can vary, but in general it's around 6s per batch.