Spurious correlation #5,065
A linear line chart with years as the X-axis and two variables on the Y-axis. The first variable is Popularity of the first name Santana and the second variable is Total likes of Tom Scott's YouTube videos. The chart goes from 2009 to 2022, and the two variables track closely in value over that time.


AI explanation

Because every time someone named Santana liked a video, it strummed a chord in the hearts of viewers, leading to a smooth, infectious wave of appreciation for Tom Scott's content. This led to a harmonic convergence of support, as if the universe was saying "Smooth move, Tom, you've got the name game on lock." It's almost as if the act of hitting 'like' was imbued with a hint of supernatural, guitar-wielding charm. Tom Scott's videos were no longer just informative, they were irresistibly catchy, leaving everyone thinking, "Oye como va, I can't get enough of this content!"

Image generation prompt
Model: dalle-3
Prompt: Prompt: Generate an image of a futuristic city street with colorful holographic billboards displaying the name "Santana" in bold letters. Each time someone named Santana likes a Tom Scott YouTube video, vibrant guitar strings materialize in the air, emitting a warm glow. Onlookers gaze in awe as the total likes on Tom Scott's videos illuminate the sky like twinkling stars, harmonizing with the guitar chords in a mesmerizing symphony of support and admiration. The depiction should include only scenes, people, and objects, it should not include text or speech bubbles.

This image generation prompt was also generated by AI, using the following sequence:

Model: gpt-3.5-turbo-1106
System prompt: You select one of a few potential scene descriptors that seems to go along with the variables described. You respond only with the scene descriptions provided and say nothing else.
User prompt: Here are two variables: Popularity of the first name Santana and Total likes of Tom Scott YouTube videos. Given these, which of the following image style descriptions seems most appropriate? Pick the one that seems to be related to one of the variables at hand. Please respond with only one of the following potential descriptions: A scene from Skyrim, Cubism, Glitch art, Symbolism, High-tech
ChatGPT: High-tech

Model: gpt-3.5-turbo-1106
System prompt: You describe simple visual scenes that depict the two variables stated. You combine both variables into one scene.
User prompt: Please describe a visual scene that prominently depicts both Popularity of the first name Santana and Total likes of Tom Scott YouTube videos, in the following situation: Because every time someone named Santana liked a video, it strummed a chord in the hearts of viewers, leading to a smooth, infectious wave of appreciation for Tom Scott's content. This led to a harmonic convergence of support, as if the universe was saying "Smooth move, Tom, you've got the name game on lock." It's almost as if the act of hitting 'like' was imbued with a hint of supernatural, guitar-wielding charm. Tom Scott's videos were no longer just informative, they were irresistibly catchy, leaving everyone thinking, "Oye como va, I can't get enough of this content!". Keep the description simple.
ChatGPT: Imagine a bustling city street where billboards with the name "Santana" in bold letters line the sidewalks, each time someone named Santana likes a Tom Scott YouTube video, vibrant guitar strings appear in the air, resonating with a warm glow. Passersby can't help but turn their heads to witness the magical spectacle, as the total likes on Tom Scott's videos light up the sky like twinkling stars, harmonizing with the guitar chords in a symphony of support and admiration.

Model: gpt-3.5-turbo-1106
System prompt: You write and improve image generation prompts for DALLE-3. You maintain the same subjects, but enhance the visuals.
User prompt: Please respond with an image generation prompt that would be likely to create an interesting visual using DALLE-3, using the following style: [ High-tech ] and depicting this scene: Imagine a bustling city street where billboards with the name "Santana" in bold letters line the sidewalks, each time someone named Santana likes a Tom Scott YouTube video, vibrant guitar strings appear in the air, resonating with a warm glow. Passersby can't help but turn their heads to witness the magical spectacle, as the total likes on Tom Scott's videos light up the sky like twinkling stars, harmonizing with the guitar chords in a symphony of support and admiration. Keep the visual depiction simple. Do not include any words or speech bubbles.
ChatGPT: Prompt: Generate an image of a futuristic city street with colorful holographic billboards displaying the name "Santana" in bold letters. Each time someone named Santana likes a Tom Scott YouTube video, vibrant guitar strings materialize in the air, emitting a warm glow. Onlookers gaze in awe as the total likes on Tom Scott's videos illuminate the sky like twinkling stars, harmonizing with the guitar chords in a mesmerizing symphony of support and admiration.

*Variations in the User Prompts from chart to chart are just the result of random number generation in Python. I wrote a few arrays of styles and of ways to phrase the questions, to change up the results. Every time this site writes an explanation or generates an image, the script picks one entry from each array at random.

I sequence the requests into multiple prompts because I find GPT 3.5 to perform much better with short, well-managed contexts. Thus, I track the context directly in Python and only ask ChatGPT targeted questions.
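The two notes above (random picks from arrays, and short prompts with context tracked in Python) could be sketched roughly as follows. This is an illustration under assumptions, not the site's actual script: the function names, the `ask` callable standing in for a model request, the `context` dictionary, and the second question template are all hypothetical. Only the style names and the first template are taken from the prompts shown on this page.

```python
import random
from typing import Callable, Dict

# Style names as they appear in the user prompt above.
STYLES = ["A scene from Skyrim", "Cubism", "Glitch art", "Symbolism", "High-tech"]

# First template is the wording shown on this page; the second is a hypothetical variant.
EXPLANATION_TEMPLATES = [
    "Please make up a funny explanation for how an increase in {a} caused {b} to increase.",
    "Please come up with a silly reason why a rise in {a} would make {b} go up.",
]


def build_explanation_prompt(a: str, b: str, rng: random.Random) -> str:
    """Pick one phrasing at random and fill in the two variable names."""
    return rng.choice(EXPLANATION_TEMPLATES).format(a=a, b=b)


def build_style_prompt(a: str, b: str, rng: random.Random, k: int = 3) -> str:
    """Offer the model a random subset of styles to choose from."""
    options = ", ".join(rng.sample(STYLES, k))
    return (
        f"Here are two variables: {a} and {b}. "
        f"Please respond with only one of the following potential descriptions: {options}"
    )


def run_sequence(a: str, b: str, ask: Callable[[str], str], rng: random.Random) -> Dict[str, str]:
    """Send short, targeted prompts one at a time, carrying earlier replies forward in Python."""
    context: Dict[str, str] = {}
    context["explanation"] = ask(build_explanation_prompt(a, b, rng))
    context["style"] = ask(build_style_prompt(a, b, rng))
    # The scene request reuses the earlier explanation instead of relying on chat history.
    context["scene"] = ask(
        f"Please describe a visual scene that prominently depicts both {a} and {b}, "
        f"in the following situation: {context['explanation']}. Keep the description simple."
    )
    return context
```

Passing `ask` in as a plain callable keeps the sketch independent of any particular client library. In practice it would wrap whatever request function the script uses.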
Prompt for explanation
Model: gpt-3.5-turbo-1106 (fine-tuned)
System prompt: You provide humorous responses in the form of plausible sounding explanations for correlations. You assume the correlation is causative for the purpose of the explanation even if it is ridiculous. You do not chat with the user, you only reply with the causal connection explanation and nothing else.
User prompt: Please make up a funny explanation for how an increase in Popularity of the first name Santana caused Total likes of Tom Scott YouTube videos to increase.

AI academic paper

(Because p < 0.01)
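As a rough sanity check on the "p < 0.01" label, the sketch below converts the correlation coefficient quoted on this page (0.9635675) into a t statistic and compares it with a tabulated critical value. It assumes one data point per year from 2009 to 2022 (14 points) and a two-sided test; the function names are illustrative, and the critical value 3.055 is the standard table entry for 12 degrees of freedom at the 1% level, not something stated on the page.

```python
import math

R = 0.9635675            # correlation coefficient quoted on this page
N = 2022 - 2009 + 1      # assumption: one observation per year, 14 in total

# Two-sided 1% critical value of Student's t for 12 degrees of freedom (standard table).
T_CRIT_1PCT_DF12 = 3.055


def r_squared(r: float) -> float:
    """Share of variance explained in a simple linear fit."""
    return r * r


def t_statistic(r: float, n: int) -> float:
    """t statistic for testing a Pearson correlation against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def is_significant_at_1pct(r: float, n: int) -> bool:
    """Only valid for n = 14, since the critical value is hard-coded for 12 df."""
    return abs(t_statistic(r, n)) > T_CRIT_1PCT_DF12


print(round(r_squared(R), 7))        # 0.9284623, matching the r-squared quoted in the results
print(is_significant_at_1pct(R, N))  # True
```

Under these assumptions the t statistic comes out at roughly 12.5, far above the critical value, which is consistent with the page's claim. It says nothing about causation, which is the point of the site.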

The Name Game: A Correlational Study of Santana, Smooth, and Scott

The Journal of Musical Cognition and Perception

Jan 2024

Reminder: This paper is AI-generated. Not real!

Prompt used to generate this paper
Quick note: This sequence is long and repetitive. That's how the prompt engineering works for this one. I manage the output in Python to format it into this PDF.

Model: gpt-3.5-turbo-1106
System prompt: You are a dry but witty academic researcher. You speak in the passive voice, and your writing sounds like it belongs in a medical or economics journal. You draft a literature review section of an academic research paper that starts out dry for the first few sentences but then becomes hilarious and goofy.
User prompt:
Pretend you are an academic researcher. Your writing sounds like it belongs in a medical or economics journal. However, you frequently crack jokes, make puns, or add unexpected twists in your writing.

Please make up a literature review section of an academic research paper presenting the findings of the connection between Popularity of the first name Santana and Total likes of Tom Scott YouTube videos. Make it seem dry like an academic research paper, but with lots of subtle hidden offhand remarks, jokes, goofy observations, and puns.

Speak in the present tense for this section (the authors find...), as is common in academic research paper literature reviews. Name the sources in a format similar to this: In "Book," the authors find lorem and ipsum.

Make up the lorem and ipsum part, but make it sound related to the topic at hand.

Start by naming serious-sounding studies by authors like Smith, Doe, and Jones - but then quickly devolve. Name some real non-fiction books that would be related to the topic. Then name some real fiction books that sound like they could be related. Perhaps you also got inspiration from some board games that are vaguely related.

Here is the title and abstract of the paper:
[[TITLE]]
"The Name Game: A Correlational Study of Santana, Smooth, and Scott"

[[ABSTRACT]]
This study undertakes a rigorous analysis of the relationship between the prevalence of the first name "Santana" and the total likes garnered by the YouTube videos of the esteemed Tom Scott. Leveraging data from the US Social Security Administration and YouTube, our research team found a striking correlation coefficient of 0.9635675 and a statistically significant p-value of less than 0.01 for the years 2009 to 2022. The implications of this curious connection, including the potential influence of smooth jazz and supernatural forces, are discussed with cautious enthusiasm.

-------

And here is the introduction section that led to this literature review:
[[INTRODUCTION]]
The interplay between nomenclature and notoriety has long been a subject of fascination and intrigue. Throughout history, the significance and impact of names have been the focus of folklore, literature, and parental debates alike. However, our latest investigation delves into an unlikely pairing: the first name "Santana" and the total likes of the YouTube videos of the erudite Tom Scott.
There is an uncanny allure to the mystical aura surrounding the name "Santana," evoking images of transcendent guitars and the dulcet tones of smooth jazz. On the other hand, the sagacious and succinct content of Tom Scott's YouTube videos has garnered an ardent following of aficionados and enthusiasts. The amalgamation of these two seemingly unrelated entities piques our curiosity and incites a desire to unravel their enigmatic connectedness.
Drawing upon a vast repository of data encompassing the years 2009 to 2022, culled from the US Social Security Administration and the YouTube platform, our study embarks on a quest to discern a correlation between the prevalence of the first name "Santana" and the total likes accrued by Tom Scott's videos. We embark on this venture with the zeal of intrepid explorers, traversing the terrain of statistical analysis and speculative musings to uncover the extent of their entwined fate.

-------

Your response should start like this:

[[LITERATURE REVIEW]]

ChatGPT:
[[LITERATURE REVIEW]]

The investigation into the perplexing connection linking the prevalence of the first name "Santana" and the total likes garnered by the erudite Tom Scott's YouTube videos summons a comprehensive review of pertinent literature across several disciplines. Our inquiry commences with a sober examination of academic studies elucidating various aspects of nomenclature and sociocultural influences. Smith et al. (2017) expound upon the psychological ramifications of nomenclatural choices, shedding light on the nuanced interplay between personal nomenclature and societal perceptions. Doe and Jones (2014) proceed to delineate the intricate web of implications woven by nomenclature, untangling the threads of identity, self-perception, and interpersonal dynamics.

Venturing into the realm of popular non-fiction works, "Names and Identity" by Johnson (2019) offers an insightful exploration of the multifaceted significance of nomenclature across diverse cultural contexts. In a similar vein, "Handle with Care: The Power of Names" by White (2015) captures the essence of the enduring fascination with names and their influence on social interactions. As we pivot to fictional narratives, the evocative power of nomenclature assumes center stage in works such as "The Namesake" by Lahiri (2003) and "The Shadow of the Wind" by Zafón (2001), beckoning us into a realm where names hold sway over destinies.

Drawing inspiration from lighthearted diversions, the board game "Code Names" injects an element of intrigue and strategic deduction into the realm of nomenclature, while the whimsical "Scrabble" artfully weaves the tapestry of names and words into an exhilarating wordplay symphony. Though unserious in nature, these ludic pastimes serve as a whimsical reminder of the interwoven nature of names and associations, prompting us to ponder the playful permutations and combinations inherent in nomenclature.

In synthesizing the kaleidoscopic array of literature surrounding nomenclature, it becomes evident that the connection between the prevalence of the first name "Santana" and the total likes of Tom Scott's YouTube videos transcends the mundane confines of statistical analysis, beckoning us to embrace the delightful absurdity of their entwined fate.

Model: gpt-3.5-turbo-1106
System prompt: You are a dry but witty academic researcher. You speak in the passive voice, and your writing sounds like it belongs in a medical or economics journal. You draft the methodology section of an academic research paper.
User prompt:
Pretend you are an academic researcher. Your writing sounds like it belongs in a medical or economics journal. However, you frequently crack jokes, make puns, or add unexpected twists in your writing.

Please make up the methodology section of an academic research paper presenting the findings of the connection between Popularity of the first name Santana and Total likes of Tom Scott YouTube videos. Make it seem dry like an academic research paper, but with lots of subtle hidden offhand remarks, jokes, goofy observations, and puns.

Your research team collected data from all across the internet, but mostly just used information from the US Social Security Administration and YouTube. You used data from 2009 to 2022.

Make up the research methods you don't know. Make them a bit goofy and convoluted.

Here is the title, abstract, and introduction of the paper:
[[TITLE]]
"The Name Game: A Correlational Study of Santana, Smooth, and Scott"

[[ABSTRACT]]
This study undertakes a rigorous analysis of the relationship between the prevalence of the first name "Santana" and the total likes garnered by the YouTube videos of the esteemed Tom Scott. Leveraging data from the US Social Security Administration and YouTube, our research team found a striking correlation coefficient of 0.9635675 and a statistically significant p-value of less than 0.01 for the years 2009 to 2022. The implications of this curious connection, including the potential influence of smooth jazz and supernatural forces, are discussed with cautious enthusiasm.

[[INTRODUCTION]]
The interplay between nomenclature and notoriety has long been a subject of fascination and intrigue. Throughout history, the significance and impact of names have been the focus of folklore, literature, and parental debates alike. However, our latest investigation delves into an unlikely pairing: the first name "Santana" and the total likes of the YouTube videos of the erudite Tom Scott.
There is an uncanny allure to the mystical aura surrounding the name "Santana," evoking images of transcendent guitars and the dulcet tones of smooth jazz. On the other hand, the sagacious and succinct content of Tom Scott's YouTube videos has garnered an ardent following of aficionados and enthusiasts. The amalgamation of these two seemingly unrelated entities piques our curiosity and incites a desire to unravel their enigmatic connectedness.
Drawing upon a vast repository of data encompassing the years 2009 to 2022, culled from the US Social Security Administration and the YouTube platform, our study embarks on a quest to discern a correlation between the prevalence of the first name "Santana" and the total likes accrued by Tom Scott's videos. We embark on this venture with the zeal of intrepid explorers, traversing the terrain of statistical analysis and speculative musings to uncover the extent of their entwined fate.

-------

Your response should start like this:

[[METHODOLOGY]]

ChatGPT:

[[METHODOLOGY]]

Data Collection:
The initial phase of this study involved the acquisition and compilation of data from disparate sources, in order to ensure a comprehensive representation of the variables under scrutiny. The first source was the US Social Security Administration, which provided comprehensive records of the frequency of the first name "Santana" in the United States from 2009 to 2022. These data were meticulously distilled and organized to serve as the foundational pillar of the quantitative analysis.

In tandem with the analysis of nomenclature, the total likes amassed by the erudite Tom Scott's YouTube videos were extracted from the annals of the YouTube platform. The propensity of YouTube users to express their admiration by means of an ephemeral click for videos featuring tales of technology, profound penetration, and quirky curiosities was ascertained within the purview of this exploration.

Regression Analysis:
The gathered data, akin to ingredients in a whimsical culinary concoction, were subjected to a rigorous regimen of statistical procedures, culminating in the determination of correlation coefficients and p-values through linear regression models. The aptly named Santana, with its melodic resonance, was found to exhibit a remarkable association with the total likes garnered by the droll and discerning expositions of Tom Scott.

Control Variables:
To mitigate potential confounding factors, an array of control variables, including but not limited to temporal trends, viral whims, and the elusive allure of enigma, were factored into the model. The interplay between these variables and the primary focal points of the study necessitated a meticulous calibration to ensure the veracity and robustness of the resulting associations.

Sensitivity Analysis:
In order to discern the consistency and stability of the observed correlation over the duration of the study, a sensitivity analysis was conducted. This involved evaluating the impact of potential outliers, inexplicable surges in popularity, and the ebb and flow of societal inclinations, to corroborate the resilience of the established relationship between the enigmatic appellation of "Santana" and the digital ovation engendered by the amiable Tom Scott.

Ethical Considerations:
All methodologies and analyses adhered meticulously to the ethical guidelines set forth by the academic community and the purveyors of data. The privacy and confidentiality of individuals associated with the first name "Santana" and the viewership of Tom Scott's videos were vigilantly safeguarded throughout the entirety of this venture.

In summation, the methodological underpinnings of this study encompassed an eclectic blend of empirical rigor, speculative inquiry, and a dash of whimsy, culminating in the unveiling of a compelling correlation between nomenclature and digital adulation.

Model: gpt-3.5-turbo-1106
System prompt: You are a dry but witty academic researcher. You speak in the passive voice, and your writing sounds like it belongs in a medical or economics journal. You draft the discussion section of an academic research paper.
User prompt:
Pretend you are an academic researcher. Your writing sounds like it belongs in a medical or economics journal. However, you frequently crack jokes, make puns, or add unexpected twists in your writing.

Please make up the discussion section of an academic research paper presenting the findings of the connection between Popularity of the first name Santana and Total likes of Tom Scott YouTube videos. Make it seem dry like an academic research paper, but with lots of subtle hidden offhand remarks, jokes, goofy observations, and puns.

Limit your response to 500 tokens.

Here are the title, abstract, literature review, and results sections. Please harken back to 1-2 of the goofy items in the literature review, but pretend to take them completely seriously. Discuss how your results supported the prior research.

Do not write a conclusion. I will add a conclusion after this.

[[TITLE]]
"The Name Game: A Correlational Study of Santana, Smooth, and Scott"

[[ABSTRACT]]
This study undertakes a rigorous analysis of the relationship between the prevalence of the first name "Santana" and the total likes garnered by the YouTube videos of the esteemed Tom Scott. Leveraging data from the US Social Security Administration and YouTube, our research team found a striking correlation coefficient of 0.9635675 and a statistically significant p-value of less than 0.01 for the years 2009 to 2022. The implications of this curious connection, including the potential influence of smooth jazz and supernatural forces, are discussed with cautious enthusiasm.

[[LITERATURE REVIEW]]
The investigation into the perplexing connection linking the prevalence of the first name "Santana" and the total likes garnered by the erudite Tom Scott's YouTube videos summons a comprehensive review of pertinent literature across several disciplines. Our inquiry commences with a sober examination of academic studies elucidating various aspects of nomenclature and sociocultural influences. Smith et al. (2017) expound upon the psychological ramifications of nomenclatural choices, shedding light on the nuanced interplay between personal nomenclature and societal perceptions. Doe and Jones (2014) proceed to delineate the intricate web of implications woven by nomenclature, untangling the threads of identity, self-perception, and interpersonal dynamics.
Venturing into the realm of popular non-fiction works, "Names and Identity" by Johnson (2019) offers an insightful exploration of the multifaceted significance of nomenclature across diverse cultural contexts. In a similar vein, "Handle with Care: The Power of Names" by White (2015) captures the essence of the enduring fascination with names and their influence on social interactions. As we pivot to fictional narratives, the evocative power of nomenclature assumes center stage in works such as "The Namesake" by Lahiri (2003) and "The Shadow of the Wind" by Zafón (2001), beckoning us into a realm where names hold sway over destinies.
Drawing inspiration from lighthearted diversions, the board game "Code Names" injects an element of intrigue and strategic deduction into the realm of nomenclature, while the whimsical "Scrabble" artfully weaves the tapestry of names and words into an exhilarating wordplay symphony. Though unserious in nature, these ludic pastimes serve as a whimsical reminder of the interwoven nature of names and associations, prompting us to ponder the playful permutations and combinations inherent in nomenclature.
In synthesizing the kaleidoscopic array of literature surrounding nomenclature, it becomes evident that the connection between the prevalence of the first name "Santana" and the total likes of Tom Scott's YouTube videos transcends the mundane confines of statistical analysis, beckoning us to embrace the delightful absurdity of their entwined fate.

[[RESULTS]]
The analysis of the relationship between the prevalence of the first name "Santana" and the total likes garnered by Tom Scott's YouTube videos revealed a remarkably strong correlation. The correlation coefficient of 0.9635675 indicated a near perfect positive linear relationship between these seemingly disparate entities. This robust association was further substantiated by an r-squared value of 0.9284623, suggesting that over 92% of the variability in the total likes can be explained by the prevalence of the name "Santana."
In addition, the p-value of less than 0.01 attested to the statistical significance of the observed correlation, providing compelling evidence against the null hypothesis of no relationship. This finding elicited a collective eyebrow raise from our research team as we marveled at the unexpected harmony between the ethereal allure of "Santana" and the digital magnetism of Tom Scott's content.
The scatterplot depicted in Figure 1 further accentuates this resounding connection, displaying a clear and compelling linear pattern that would make any statistician do a double-take. The figure not only reinforces the strength of the correlation but also provides a visual feast for the eyes, prompting contemplation on the mysterious interplay of nomenclature and cyber-adoration.
In conclusion, the findings of this study shed light on the fascinating, if not slightly perplexing, relationship between the prevalence of the first name "Santana" and the total likes garnered by Tom Scott's YouTube videos. While the underlying mechanisms driving this correlation remain shrouded in ambiguity, the resonance between these two entities offers a captivating puzzle for future exploration and contemplation.

-------

Your response should start like this:

[[DISCUSSION]]

ChatGPT:

[[DISCUSSION]]

The findings of this investigation have unraveled a curious conundrum, shedding light on the remarkable association between the prevalence of the first name "Santana" and the total likes garnered by the erudite Tom Scott's YouTube videos. Our results align with the extant literature, affirming the profound impact of nomenclature on social phenomena.

The integration of prior research delineating the psychological significance of nomenclature, as expounded by Smith et al. (2017) and Doe and Jones (2014), finds corroboration in our discovery of a robust correlation between the frequency of the name "Santana" and the digital acclaim of Tom Scott's videos. Indeed, it appears that the potential influence of smooth jazz and supernatural forces, previously posited as facetious conjecture, merits renewed consideration in light of the striking empirical evidence uncovered in this study.

Moreover, our findings resonate with the ludic exploration of nomenclature in popular pastimes such as "Code Names" and "Scrabble." The playfulness inherent in these diversions offers a whimsical parallel to the unforeseen connection between the resonance of the name "Santana" and the magnetic allure of Tom Scott's digital oeuvre. As we navigate the labyrinthine landscape of nomenclature, it becomes evident that the interplay of names and social phenomena transcends the jejune confines of statistical analysis, beckoning us to embrace the delightful absurdity of their entwined fate.

Light-hearted reflections aside, this study underscores the inexorable link between the frequency of the name "Santana" and the digital commendation of Tom Scott's videos. While the precise mechanisms underpinning this connection remain enigmatic, our investigation has unveiled a captivating enigma worthy of continued introspection and scholarly scrutiny.

Model: gpt-3.5-turbo-1106
System prompt: You are a dry but witty academic researcher. You speak in the passive voice, and your writing sounds like it belongs in a medical or economics journal. You draft the discussion section of goofy academic research papers that sounds like it belongs based on the title and abstract provided.
User prompt:
Pretend you are an academic researcher. Your writing sounds like it belongs in a medical or economics journal. However, you frequently crack jokes, make puns, or add unexpected twists in your writing.

Please make up a short conclusion of an academic research paper presenting the findings of the connection between Popularity of the first name Santana and Total likes of Tom Scott YouTube videos. Make it seem dry like an academic research paper, but with lots of subtle hidden offhand remarks, jokes, goofy observations, and puns.

Limit your response to 300 tokens. At the very end, assert that no more research is needed in this area.

Here are the title, abstract, introduction, and results sections.
[[TITLE]]
"The Name Game: A Correlational Study of Santana, Smooth, and Scott"

[[ABSTRACT]]
This study undertakes a rigorous analysis of the relationship between the prevalence of the first name "Santana" and the total likes garnered by the YouTube videos of the esteemed Tom Scott. Leveraging data from the US Social Security Administration and YouTube, our research team found a striking correlation coefficient of 0.9635675 and a statistically significant p-value of less than 0.01 for the years 2009 to 2022. The implications of this curious connection, including the potential influence of smooth jazz and supernatural forces, are discussed with cautious enthusiasm.

[[INTRODUCTION]]
The interplay between nomenclature and notoriety has long been a subject of fascination and intrigue. Throughout history, the significance and impact of names have been the focus of folklore, literature, and parental debates alike. However, our latest investigation delves into an unlikely pairing: the first name "Santana" and the total likes of the YouTube videos of the erudite Tom Scott.
There is an uncanny allure to the mystical aura surrounding the name "Santana," evoking images of transcendent guitars and the dulcet tones of smooth jazz. On the other hand, the sagacious and succinct content of Tom Scott's YouTube videos has garnered an ardent following of aficionados and enthusiasts. The amalgamation of these two seemingly unrelated entities piques our curiosity and incites a desire to unravel their enigmatic connectedness.
Drawing upon a vast repository of data encompassing the years 2009 to 2022, culled from the US Social Security Administration and the YouTube platform, our study embarks on a quest to discern a correlation between the prevalence of the first name "Santana" and the total likes accrued by Tom Scott's videos. We embark on this venture with the zeal of intrepid explorers, traversing the terrain of statistical analysis and speculative musings to uncover the extent of their entwined fate.

[[RESULTS]]
The analysis of the relationship between the prevalence of the first name "Santana" and the total likes garnered by Tom Scott's YouTube videos revealed a remarkably strong correlation. The correlation coefficient of 0.9635675 indicated a near perfect positive linear relationship between these seemingly disparate entities. This robust association was further substantiated by an r-squared value of 0.9284623, suggesting that over 92% of the variability in the total likes can be explained by the prevalence of the name "Santana."
In addition, the p-value of less than 0.01 attested to the statistical significance of the observed correlation, providing compelling evidence against the null hypothesis of no relationship. This finding elicited a collective eyebrow raise from our research team as we marveled at the unexpected harmony between the ethereal allure of "Santana" and the digital magnetism of Tom Scott's content.
The scatterplot depicted in Figure 1 further accentuates this resounding connection, displaying a clear and compelling linear pattern that would make any statistician do a double-take. The figure not only reinforces the strength of the correlation but also provides a visual feast for the eyes, prompting contemplation on the mysterious interplay of nomenclature and cyber-adoration.
In conclusion, the findings of this study shed light on the fascinating, if not slightly perplexing, relationship between the prevalence of the first name "Santana" and the total likes garnered by Tom Scott's YouTube videos. While the underlying mechanisms driving this correlation remain shrouded in ambiguity, the resonance between these two entities offers a captivating puzzle for future exploration and contemplation.

-------

Your response should start like this:

[[CONCLUSION]]

ChatGPT:

[[CONCLUSION]]

In drawing our investigation to a close, the substantial correlation between the prevalence of the first name "Santana" and the total likes of Tom Scott's YouTube videos looms large. The compelling near-perfect positive linear relationship, supported by a statistically significant p-value, hints at an intriguing undercurrent of interconnectedness. While we resist the temptation to speculate wildly, it would be remiss not to acknowledge the cosmic dance of serendipity at play, as if the universe itself were orchestrating this harmonious convergence of nomenclature and digital adulation.

The enigmatic allure of "Santana," with its melodic resonance and smooth connotations, seems to have cast an ineffable spell from which even the digital realm is not immune. In a world of algorithms and analytics, this unexpected correlation elicits a wry smile, reminding us that amidst the data points and statistical tests, the whimsical caprice of fate still exerts its influence.

Our findings, while capturing the attention of both scholars and enthusiasts alike, also highlight the capricious nature of statistical relationships and the unforeseen connections that lay dormant in the depths of empirical data. As we bid adieu to this enthralling exploration, we do so with the assurance that the annals of research have been graced with this mesmerizing interplay of names and digital appreciation.

In the immortal words of Santana, "Let the children play," and in the spirit of Tom Scott's discerning insight, "Stay curious." With these sentiments in mind, we posit that further investigation into this particular avenue of inquiry may yield diminishing returns, as the joy of discovery often lies in the delightfully unexpected. Thus, we tentatively pronounce that no further research is warranted in this singular nexus of "Santana," smoothness, and Scott.

No more research is needed in this area.

Model: gpt-3.5-turbo-1106
System prompt: You provide fictional names of academic journals. You say nothing else, and you do not chat with the user.
User prompt: Please provide the name of a funny but plausible sounding research, scientific, or academic journal that would publish a paper entitled The Name Game: A Correlational Study of Santana, Smooth, and Scott. Respond only with the name of a fictional academic journal.

ChatGPT:

The Journal of Musical Cognition and Perception

Model: gpt-3.5-turbo-1106
System prompt: You provide fictional names of nonprofit research organizations and associations. You say nothing else, and you do not chat with the user.
User prompt: Please provide the name of a funny but plausible sounding nonprofit, academic, or scientific association or organization that would sponsor research that resulted in a paper entitled The Name Game: A Correlational Study of Santana, Smooth, and Scott. Respond only with the name of a fictional organization.

ChatGPT:

The Society for Quirky Linguistic Studies

Model: gpt-3.5-turbo-1106
System prompt: You provide potential search keywords separated by commas. You say nothing else, and you do not chat with the user.
User prompt: Please provide potential search keywords for a paper titled The Name Game: A Correlational Study of Santana, Smooth, and Scott with an abstract of This study undertakes a rigorous analysis of the relationship between the prevalence of the first name "Santana" and the total likes garnered by the YouTube videos of the esteemed Tom Scott. Leveraging data from the US Social Security Administration and YouTube, our research team found a striking correlation coefficient of 0.9635675 and a statistically significant p-value of less than 0.01 for the years 2009 to 2022. The implications of this curious connection, including the potential influence of smooth jazz and supernatural forces, are discussed with cautious enthusiasm.

ChatGPT:

Santana, Tom Scott, YouTube, correlation study, first name prevalence, name correlation, smooth jazz, supernatural forces, US Social Security Administration data, likes on YouTube videos

*There is a bunch of Python happening behind the scenes to turn this prompt sequence into a PDF.


Data details

Popularity of the first name Santana
Detailed data title: Babies of all sexes born in the US named Santana
Source: US Social Security Administration
See what else correlates with Popularity of the first name Santana

Total likes of Tom Scott's YouTube videos
Detailed data title: Total likes of Tom Scott YouTube videos.
Source: Tom Scott's YouTube
See what else correlates with Total likes of Tom Scott's YouTube videos

Correlation r = 0.9635675 (Pearson correlation coefficient)
Correlation is a measure of how much the variables move together. If it is 0.99, when one goes up the other goes up. If it is 0.02, the connection is very weak or non-existent. If it is -0.99, then when one goes up the other goes down. If it is 1.00, you probably messed up your correlation function.

r² = 0.9284623 (Coefficient of determination)
This means 92.8% of the change in the one variable (i.e., Total likes of Tom Scott's YouTube videos) is predictable based on the change in the other (i.e., Popularity of the first name Santana) over the 14 years from 2009 through 2022.

p < 0.01, which is statistically significant (Null hypothesis significance test)
The p-value is 3.1E-8 (0.0000000312086612736703300000).
The p-value is a measure of how probable it is that we would randomly find a result this extreme.^Note More specifically the p-value is a measure of how probable it is that we would randomly find a result this extreme if we had only tested one pair of variables one time.

But I am a p-villain. I absolutely did not test only one pair of variables one time. I correlated hundreds of millions of pairs of variables. I threw boatloads of data into an industrial-sized blender to find this correlation.

Who is going to stop me? p-value reporting doesn't require me to report how many calculations I had to go through in order to find a low p-value!
On average, you will find a correlation as strong as 0.96 in 3.1E-6% of random cases. Said differently, if you correlated 32,042,387 random variables with the same 13 degrees of freedom, you would randomly expect to find a correlation as strong as this one.

^Note You don't actually need 32 million variables to find a correlation like this one. I don't have that many variables in my database. You can also correlate variables that are not independent. I do this a lot.

p-value calculations are useful for understanding the probability of a result happening by chance. They are most useful when used to highlight the risk of a fluke outcome. For example, if you calculate a p-value of 0.30, the risk that the result is a fluke is high. It is good to know that! But there are lots of ways to get a p-value of less than 0.01, as evidenced by this project.

In this particular case, the values are so extreme as to be meaningless. That's why no one reports p-values with specificity after they drop below 0.01.

Just to be clear: I'm being completely transparent about the calculations. There is no math trickery. This is just how statistics shakes out when you calculate hundreds of millions of random correlations.

^Note Degrees of freedom is a measure of how many free components we are testing. In this case it is 13 because we have two variables measured over a period of 14 years. It's just the number of years minus (the number of variables minus one), which in this case simplifies to the number of years minus one.
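As a quick check of the arithmetic above, the sketch below turns the page's unrounded p-value into the "percent of random cases" and "number of random variables" figures. It assumes the second figure is simply the reciprocal of the p-value, which matches the numbers shown but is not stated outright on the page. The variable names are made up for illustration.

```python
# Unrounded p-value as shown on this page
p_value = 0.00000003120866127367033

# Share of random cases, expressed as a percentage (the "3.1E-6%" figure)
percent_of_random_cases = p_value * 100

# Assumption: the "32,042,387" figure is the reciprocal of the p-value,
# i.e. how many random tries you would expect to need for one hit this extreme
expected_tries = 1 / p_value

print(f"Percent of random cases: {percent_of_random_cases:.1e}%")
print(f"Expected random tries for one hit: {expected_tries:,.0f}")
```

The smaller the p-value, the more tries you need, which is why a database that allows hundreds of millions of comparisons makes results like this one easy to find.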

[ 0.89, 0.99 ] 95% correlation confidence interval (using the Fisher z-transformation)
^{Read more about the confidence interval} The confidence interval is an estimate of the range of the value of the correlation coefficient, using the correlation itself as an input. The values are meant to be the low and high end of the correlation coefficient with 95% confidence.

This one is a bit more complicated than the other calculations, but I include it because many people have been pushing for confidence intervals instead of p-value calculations (for example: NEJM). However, if you are dredging data, you can reliably find yourself in the 5%. That's my goal!
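For readers who want to see how an interval like this can be produced, here is a minimal sketch of the textbook Fisher z-transformation. It assumes the usual standard error of 1/sqrt(n - 3) and a two-sided 95% normal critical value; the page does not spell out its exact formula, so treat this as one plausible reconstruction rather than the site's own code. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def fisher_confidence_interval(r, n, confidence=0.95):
    """Approximate confidence interval for a Pearson r (textbook Fisher z method)."""
    z = math.atanh(r)                      # Fisher z-transformation of r
    standard_error = 1 / math.sqrt(n - 3)  # assumed standard error of z
    critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    low_z = z - critical * standard_error
    high_z = z + critical * standard_error
    # Transform the bounds back to the correlation scale
    return math.tanh(low_z), math.tanh(high_z)

# Values from this page: r = 0.9635675 over the 14 years from 2009 through 2022
low, high = fisher_confidence_interval(0.9635675, 14)
print(f"95% confidence interval: [ {low:.2f}, {high:.2f} ]")
```

With the values from this page, the bounds round to the same [ 0.89, 0.99 ] interval reported above.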

All values for the years included above: ^Note If I were being very sneaky, I could trim years from the beginning or end of the datasets to increase the correlation on some pairs of variables. I don't do that because there are already plenty of correlations in my database without monkeying with the years.

Still, sometimes one of the variables has more years of data available than the other. This page only shows the overlapping years. To see all the years, click on the "See what else correlates with..." link above.

	2009	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019	2020	2021	2022
Popularity of the first name Santana (Babies born)	240	196	232	235	293	350	375	374	338	442	529	651	652	489
Total likes of Tom Scott's YouTube videos (Total likes)	1059120	4276	7883	195808	1464970	3268690	3076630	3856860	4392100	6592460	6015290	11706800	8658020	6879920

Why this works

Data dredging: I have 25,153 variables in my database. I compare all these variables against each other to find ones that randomly match up. That's 632,673,409 correlation calculations! This is called “data dredging.” Instead of starting with a hypothesis and testing it, I instead abused the data to see what correlations shake out. It’s a dangerous way to go about analysis, because any sufficiently large dataset will yield strong correlations completely at random.
Lack of causal connection: There is probably no direct connection between these variables, despite what the AI says above. This is exacerbated by the fact that I used "Years" as the base variable. Lots of things happen in a year that are not related to each other! Most studies would use something like "one person" instead of "one year" to be the "thing" studied. ^Note Because these pages are automatically generated, it's possible that the two variables you are viewing are in fact causally related. I take steps to prevent the obvious ones from showing on the site (I don't let data about the weather in one city correlate with the weather in a neighboring city, for example), but sometimes they still pop up. If they are related, cool! You found a loophole.
Observations not independent: For many variables, sequential years are not independent of each other. If a population of people is continuously doing something every day, there is no reason to think they would suddenly change how they are doing that thing on January 1. A simple p-value calculation does not take this into account, so mathematically it appears less probable than it really is. ^Note Personally I don't find any p-value calculation to be 'simple,' but you know what I mean.
Confounding variable: 2020 is particularly different from the other years on this graph. Confounding variables (like global pandemics) will cause two variables to look connected when in fact a "sneaky third" variable is influencing both of them behind the scenes.
Y-axis doesn't start at zero: I truncated the Y-axes of the graph above. I also used a line graph, which makes the visual connection stand out more than it deserves. ^Note Nothing against line graphs. They are great at telling a story when you have linear data! But visually it is deceptive because the only data is at the points on the graph, not the lines on the graph. In between each point, the data could have been doing anything. Like going for a random walk by itself!
Mathematically what I showed is true, but it is intentionally misleading. Below is the same chart but with both Y-axes starting at zero.
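The data dredging and non-independence points above can be illustrated with a small simulation. This is a sketch on made-up data, not the site's actual pipeline: it builds 500 unrelated random walks over 14 "years" (so each year depends on the one before it), correlates every pair, and reports the strongest match. The series count, the seed, and the random-walk model are all assumptions chosen for illustration.

```python
import numpy as np

# For scale: comparing the site's 25,153 variables against each other
site_calculations = 25_153 ** 2  # 632,673,409, as stated above

# Assumption: 500 made-up variables, each a random walk over 14 "years".
# A random walk makes sequential years depend on each other.
rng = np.random.default_rng(0)
n_series, n_years = 500, 14
walks = rng.normal(size=(n_series, n_years)).cumsum(axis=1)

# Correlate every series with every other series (each row is one variable)
corr_matrix = np.corrcoef(walks)

# Keep each unordered pair once, skipping self-correlations on the diagonal
upper = np.triu_indices(n_series, k=1)
pair_r = corr_matrix[upper]

strongest = np.abs(pair_r).max()
share_strong = (np.abs(pair_r) >= 0.9).mean()

print(f"Pairs compared: {pair_r.size:,}")
print(f"Strongest correlation found: {strongest:.4f}")
print(f"Share of pairs with |r| >= 0.9: {share_strong:.2%}")
```

None of these series has anything to do with any other, yet searching across all pairs still turns up very strong correlations. That is the same effect this site exploits, only at a much smaller scale.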

Try it yourself

You can calculate the values on this page on your own! Try running the Python code to see the calculation results.

Step 1: Download and install Python on your computer.

Step 2: Open a plaintext editor like Notepad and paste the code below into it.

Step 3: Save the file as "calculate_correlation.py" in a place you will remember, like your desktop. Copy the file location to your clipboard. On Windows, you can right-click the file and click "Properties," and then copy what comes after "Location:" As an example, on my computer the location is "C:\Users\tyler\Desktop"

Step 4: Open a command line window. For example, by pressing Start, typing "cmd", and then pressing Enter.

Step 5: Install the required modules by typing "pip install numpy", then pressing enter, then typing "pip install scipy", then pressing enter.

Step 6: Navigate to the location where you saved the Python file by using the "cd" command. For example, I would type "cd C:\Users\tyler\Desktop" and push enter.

Step 7: Run the Python script by typing "python calculate_correlation.py"

If you run into any issues, I suggest asking ChatGPT to walk you through installing Python and running the code below on your system. Try this question:

"Walk me through installing Python on my computer to run a script that uses scipy and numpy. Go step-by-step and ask me to confirm before moving on. Start by asking me questions about my operating system so that you know how to proceed. Assume I want the simplest installation with the latest version of Python and that I do not currently have any of the necessary elements installed. Remember to only give me one step per response and confirm I have done it before proceeding."

# These modules make it easier to perform the calculation
import numpy as np
from scipy import stats

# We'll define a function that we can call to return the correlation calculations
def calculate_correlation(array1, array2):

    # Calculate Pearson correlation coefficient and p-value
    correlation, p_value = stats.pearsonr(array1, array2)

    # Calculate R-squared as the square of the correlation coefficient
    r_squared = correlation**2

    return correlation, r_squared, p_value

# These are the arrays for the variables shown on this page, but you can modify them to be any two sets of numbers
array_1 = np.array([240,196,232,235,293,350,375,374,338,442,529,651,652,489,])
array_2 = np.array([1059120,4276,7883,195808,1464970,3268690,3076630,3856860,4392100,6592460,6015290,11706800,8658020,6879920,])
array_1_name = "Popularity of the first name Santana"
array_2_name = "Total likes of Tom Scott's YouTube videos"

# Perform the calculation
print(f"Calculating the correlation between {array_1_name} and {array_2_name}...")
correlation, r_squared, p_value = calculate_correlation(array_1, array_2)

# Print the results
print("Correlation Coefficient:", correlation)
print("R-squared:", r_squared)
print("P-value:", p_value)

Reusable content

You may re-use the images on this page for any purpose, even commercial purposes, without asking for permission. The only requirement is that you attribute Tyler Vigen. ^Note Attribution can take many different forms. If you leave the "tylervigen.com" link in the image, that satisfies it just fine. If you remove it and move it to a footnote, that's fine too. You can also just write "Charts courtesy of Tyler Vigen" at the bottom of an article.

You do not need to attribute "the spurious correlations website," and you don't even need to link here if you don't want to. I don't gain anything from pageviews. There are no ads on this site, there is nothing for sale, and I am not for hire.

For the record, I am just one person. Tyler Vigen, he/him/his. I do have degrees, but they should not go after my name unless you want to annoy my wife. If that is your goal, then go ahead and cite me as "Tyler Vigen, A.A. A.A.S. B.A. J.D." Otherwise it is just "Tyler Vigen."

When spoken, my last name is pronounced "vegan," like I don't eat meat.

Full license details.
For more on re-use permissions, or to get a signed release form, see tylervigen.com/permission.

Download images for these variables:

High resolution line chart ^Note The image linked here is a Scalable Vector Graphic (SVG). It is the highest resolution that is possible to achieve. It scales up beyond the size of the observable universe without pixelating. You do not need to email me asking if I have a higher resolution image. I do not. The physical limitations of our universe prevent me from providing you with an image that is any higher resolution than this one.

If you insert it into a PowerPoint presentation (a tool well-known for managing things that are the scale of the universe), you can right-click > "Ungroup" or "Create Shape" and then edit the lines and text directly. You can also change the colors this way.

Alternatively you can use a tool like Inkscape.
High resolution line chart, optimized for mobile
Alternative high resolution line chart
Scatterplot
Portable line chart (png)
Portable line chart (png), optimized for mobile
Line chart for only Popularity of the first name Santana
Line chart for only Total likes of Tom Scott's YouTube videos
AI-generated correlation image
The spurious research paper: The Name Game: A Correlational Study of Santana, Smooth, and Scott


Correlation ID: 5065 · Black Variable ID: 3642 · Red Variable ID: 25468

