Additional Info: I designed a Python workflow that performs OCR on every xkcd comic, feeds the extracted text into a large language model, and asks the model whether the comic is about the category named in the title.
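A workflow like this could be sketched roughly as follows. This is only an illustration, not the actual code behind the page: the `Comic` record, the prompt wording, and the `ocr` and `ask_llm` callables are all hypothetical stand-ins for whatever OCR engine and model were really used.

```python
from collections import Counter
from typing import Callable, Iterable, NamedTuple


class Comic(NamedTuple):
    """Hypothetical record for one comic (not the site's real schema)."""
    number: int
    year: int
    image: bytes


def count_comics_about(
    comics: Iterable[Comic],
    category: str,
    ocr: Callable[[bytes], str],
    ask_llm: Callable[[str], str],
) -> Counter:
    """Count, per year, the comics that the model says are about `category`.

    `ocr` turns image bytes into text and `ask_llm` sends a prompt to a
    language model and returns its reply. Both are supplied by the caller,
    so this sketch does not depend on any particular OCR or LLM library.
    """
    counts: Counter = Counter()
    for comic in comics:
        text = ocr(comic.image)
        # Assumed prompt format: ask for a plain yes/no answer.
        prompt = (
            f"Here is the text of an xkcd comic:\n{text}\n\n"
            f"Is this comic about {category}? Answer yes or no."
        )
        answer = ask_llm(prompt)
        if answer.strip().lower().startswith("yes"):
            counts[comic.year] += 1
    return counts
```

The per-year counts are what would then be compared against the other variables in the table below.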
xkcd comics published about Star Wars correlate with...
Variable | Correlation | Years | Has img? |
Ana Ivanovic's WTA Finals played | r=0.85 | 8yrs | No |
Deepest snow depth in Dallas | r=0.84 | 7yrs | No |
The number of surgeons in Minnesota | r=0.84 | 12yrs | No |
Air pollution in Kingston, New York | r=0.83 | 6yrs | No |
Number of moderate earthquakes worldwide | r=0.82 | 12yrs | No |
Average number of comments on PBS Space Time YouTube videos | r=0.79 | 9yrs | No |
Popularity of the first name Stanley | r=0.74 | 16yrs | No |
The number of movies Chris Evans appeared in | r=0.7 | 17yrs | No |
Petroleum consumption in Jamaica | r=0.69 | 15yrs | Yes! |
Sherbet consumption | r=0.67 | 15yrs | No |
Carjackings in the US | r=0.61 | 15yrs | No |
Number of edits to the Wikipedia article for Zeus | r=0.59 | 16yrs | No |
Google searches for 'black holes' | r=0.57 | 17yrs | No |
How provocative SmarterEveryDay YouTube video titles are | r=0.54 | 17yrs | No |
Points Scored by the losing team in the Super Bowl | r=-0.48 | 16yrs | No |
You caught me! While it would be intuitive to sort only by "correlation," I have a big, weird database. If I sort only by correlation, the top results often all come from one or two very large datasets (like the weather or labor statistics), and they overwhelm the page.
I can't show you *all* the correlations, because my database would get too large and this page would take a very long time to load. Instead I opt to show you a subset, and I sort them by a magic system score. It starts with the correlation, but penalizes variables that repeat from the same dataset. (It also gives a bonus to variables I happen to find interesting.)
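One way a score like this could work is sketched below. The real formula is not given here, so everything specific is an assumption made for illustration: the `Candidate` fields, the penalty of 0.1 per earlier variable from the same dataset, the bonus of 0.05, and the use of the absolute value of r.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one correlated variable."""
    name: str
    dataset: str
    r: float
    interesting: bool = False


def rank(candidates, repeat_penalty=0.1, interest_bonus=0.05, limit=15):
    """Return up to `limit` candidates ordered by an illustrative score.

    The score starts from |r| (an assumption), loses `repeat_penalty` for
    every stronger variable already seen from the same dataset, and gains
    `interest_bonus` if the variable is flagged as interesting.
    """
    seen = defaultdict(int)
    scored = []
    # Visit the strongest correlations first so that the best variable
    # from each dataset is the one that escapes the penalty.
    for c in sorted(candidates, key=lambda c: abs(c.r), reverse=True):
        score = abs(c.r) - repeat_penalty * seen[c.dataset]
        if c.interesting:
            score += interest_bonus
        seen[c.dataset] += 1
        scored.append((score, c))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:limit]]
```

With these made-up numbers, three weather variables at r=0.90, 0.88, and 0.87 would score 0.90, 0.78, and 0.67, so a sports variable at r=0.85 would move up to second place instead of being buried under the weather data.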