How Spotify, Netflix, and YouTube Use AI to Keep You Watching

Netflix estimates its recommendation engine saves the company $1 billion per year in reduced churn. That single number tells you everything about why the biggest tech platforms treat their AI recommendation systems as crown jewels – and why understanding how they work matters far beyond idle curiosity.

I’ve spent a good chunk of the past year digging into the published research, engineering blog posts, and patents from Spotify, Netflix, and YouTube. The systems are more sophisticated than most people realize, and more different from each other than you’d expect.

Spotify: The Sound of Machine Learning

Discover Weekly might be Spotify’s most beloved feature. Every Monday, 30 songs you’ve never heard that feel eerily tailored to your taste. When it launched in 2015, it drove a measurable spike in user retention. Internally, Spotify has described it as one of the most successful product features in the company’s history.

The system behind it uses a layered approach. The foundation is collaborative filtering – the same basic idea behind Amazon’s “customers who bought this also bought” recommendations, but significantly more sophisticated. Spotify analyzes listening patterns across its 600+ million users to find clusters of similar taste profiles. If users with similar listening histories to yours tend to discover and love a particular track, that track becomes a candidate for your Discover Weekly.
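To make the idea concrete, here is a minimal sketch of user-based collaborative filtering. It is not Spotify's implementation: the play counts, user names, and the `candidates` helper are all invented for illustration, and a production system would work on vastly larger, sparser data.

```python
from math import sqrt

# Hypothetical play counts (user -> {track: plays}), invented for illustration.
plays = {
    "you":   {"track_a": 12, "track_b": 8},
    "user2": {"track_a": 10, "track_b": 7, "track_c": 9},
    "user3": {"track_d": 15, "track_e": 4},
}

def cosine(u, v):
    """Cosine similarity between two sparse play-count vectors."""
    shared = set(u) & set(v)
    dot = sum(u[t] * v[t] for t in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def candidates(target, plays, top_n=3):
    """Score tracks the target has not heard by similarity-weighted plays."""
    heard = plays[target]
    scores = {}
    for other, history in plays.items():
        if other == target:
            continue
        sim = cosine(heard, history)
        if sim <= 0:
            continue  # listeners with no overlap contribute nothing
        for track, count in history.items():
            if track not in heard:
                scores[track] = scores.get(track, 0.0) + sim * count
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(candidates("you", plays))  # ['track_c']
```

Only `user2` overlaps with your history, so the one track they play that you have not heard becomes the candidate.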

But Spotify goes further than pure collaborative filtering. They use natural language processing to analyze text written about music – blog posts, reviews, social media discussions, playlist titles and descriptions. This creates a semantic understanding of what a song sounds like and what kind of listener it appeals to, even for tracks that don’t have enough listening data for collaborative filtering to work well.

Then there’s the audio analysis layer. Spotify’s models analyze the raw audio signal of every track on the platform, extracting features like tempo, energy, danceability, acoustic qualities, and more abstract characteristics that don’t have clean human labels. This is particularly powerful for surfacing new artists with tiny listener bases – the so-called cold start problem. A brand-new song might not have enough listening data for collaborative filtering, and might not have been written about anywhere, but if its audio features resemble songs you love, it can still surface.
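A rough sketch of how audio features can sidestep the cold-start problem: average the feature vectors of tracks a listener already likes, then rank brand-new tracks by similarity to that average. The feature values and track names below are made up, and real systems learn far richer representations than four hand-labeled numbers.

```python
from math import sqrt

# Hypothetical features in [0, 1]: (tempo, energy, danceability, acousticness).
liked = [(0.60, 0.80, 0.75, 0.10), (0.65, 0.85, 0.70, 0.05)]
new_tracks = {
    "new_upbeat":   (0.62, 0.82, 0.72, 0.08),
    "new_acoustic": (0.30, 0.20, 0.25, 0.90),
}

def centroid(vectors):
    """Average the feature vectors into a single taste profile."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def cosine(a, b):
    """Cosine similarity between two dense feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

profile = centroid(liked)
# Rank tracks with zero listening history purely by how they sound.
ranked = sorted(new_tracks, key=lambda t: cosine(profile, new_tracks[t]), reverse=True)
print(ranked)  # ['new_upbeat', 'new_acoustic']
```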

The final Discover Weekly playlist is assembled by a system that blends all three signal types, applies business rules (no repeats of recently played tracks, diversity of artists and genres, appropriate sequencing), and delivers a curated 30-track experience. The team has spoken publicly about how much work goes into making the playlist feel right, not just be accurate – the ordering, the emotional arc, the balance between familiar-adjacent and genuinely novel.
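The blend-then-filter step might look something like the sketch below. The weights, the per-artist cap, and the `build_playlist` name are my assumptions, not anything Spotify has published, and the sketch ignores sequencing entirely.

```python
def build_playlist(candidates, recently_played, weights=(0.5, 0.25, 0.25),
                   size=30, max_per_artist=2):
    """Blend three signal scores, then apply simple business rules.

    Each candidate is a dict with keys: track, artist, cf, text, audio,
    where cf/text/audio are hypothetical scores in [0, 1].
    """
    w_cf, w_text, w_audio = weights
    scored = sorted(
        candidates,
        key=lambda c: w_cf * c["cf"] + w_text * c["text"] + w_audio * c["audio"],
        reverse=True,
    )
    playlist, per_artist = [], {}
    for c in scored:
        if c["track"] in recently_played:
            continue  # rule: no repeats of recently played tracks
        if per_artist.get(c["artist"], 0) >= max_per_artist:
            continue  # rule: keep artist diversity
        playlist.append(c["track"])
        per_artist[c["artist"]] = per_artist.get(c["artist"], 0) + 1
        if len(playlist) == size:
            break
    return playlist

demo = [
    {"track": "t1", "artist": "a", "cf": 0.9,  "text": 0.8, "audio": 0.7},
    {"track": "t2", "artist": "a", "cf": 0.8,  "text": 0.8, "audio": 0.8},
    {"track": "t3", "artist": "a", "cf": 0.7,  "text": 0.7, "audio": 0.7},
    {"track": "t4", "artist": "b", "cf": 0.6,  "text": 0.6, "audio": 0.6},
    {"track": "t5", "artist": "c", "cf": 0.95, "text": 0.9, "audio": 0.9},
]
print(build_playlist(demo, recently_played={"t5"}))  # ['t1', 't2', 't4']
```

The highest-scoring track (`t5`) is dropped because it was played recently, and `t3` loses its slot to the artist cap, which is the kind of trade between accuracy and feel the team describes.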

Netflix: Recommending What You Don’t Know You Want

Netflix’s recommendation system is arguably the most studied in the industry, partly because the company has been unusually open about its approaches and partly because of the famous Netflix Prize competition in 2006-2009 that put recommendation algorithms into the mainstream consciousness.

The modern system goes well beyond “people who watched X also watched Y.” Netflix personalizes nearly every aspect of what you see on screen, and the most fascinating part might be the artwork personalization.

When you browse Netflix, the thumbnail image shown for a given title is selected specifically for you. If you watch a lot of comedies, the algorithm might show you a frame from a drama that has a lighter, more comedic moment. If you watch a lot of films featuring a particular actor, it might show a frame that prominently features that actor, even if they’re a supporting character. Netflix runs continuous A/B tests on these image selections, and the results are striking – the right artwork can increase the engagement rate for a title by double-digit percentages.
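One simple way to think about per-viewer image selection is an epsilon-greedy loop: mostly show the image that has worked best for a viewer segment, and occasionally try another to keep learning. This is a deliberately simplified stand-in, not Netflix's production method; the segment names, image names, and `ArtworkSelector` class are hypothetical.

```python
import random
from collections import defaultdict

class ArtworkSelector:
    """Epsilon-greedy artwork choice per viewer segment (illustrative only)."""

    def __init__(self, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.shows = defaultdict(int)  # (segment, image) -> impressions
        self.plays = defaultdict(int)  # (segment, image) -> plays after impression

    def rate(self, segment, image):
        """Observed play rate for this image within this segment."""
        shown = self.shows[(segment, image)]
        return self.plays[(segment, image)] / shown if shown else 0.0

    def choose(self, segment, images):
        """Explore with probability epsilon, otherwise exploit the best image."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(images)
        return max(images, key=lambda img: self.rate(segment, img))

    def record(self, segment, image, played):
        """Log one impression and whether it led to a play."""
        self.shows[(segment, image)] += 1
        self.plays[(segment, image)] += int(played)

selector = ArtworkSelector(epsilon=0.0)  # epsilon=0 makes the demo deterministic
for _ in range(10):
    selector.record("comedy_fans", "funny_frame", played=True)
    selector.record("comedy_fans", "dramatic_frame", played=False)
print(selector.choose("comedy_fans", ["dramatic_frame", "funny_frame"]))  # funny_frame
```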

The recommendation engine itself operates on multiple time horizons. There’s the immediate session model (what should we show right now based on what you just watched), the medium-term preference model (what genres and styles have you been gravitating toward this month), and the long-term taste profile (your deep, stable preferences). These different signals get weighted differently depending on context – time of day, device, and whether you’re likely browsing alone or with family.

What I find most interesting is how Netflix uses viewing data to inform content investment decisions. The company has stated publicly that recommendation data plays a role in greenlighting original productions. If the algorithm detects an underserved cluster of viewers – people whose taste profiles suggest they’d love a specific type of content that doesn’t yet exist on the platform – that becomes an argument for creating it. This closes the loop in a way that’s genuinely novel: the AI doesn’t just recommend existing content, it influences what content gets made.

YouTube: The Rabbit Hole Machine

YouTube’s recommendation system is in a category of its own, partly because of the sheer scale (500+ hours of video uploaded every minute) and partly because of the unique challenge of video content where engagement patterns are fundamentally different from music or movies.

The system has evolved dramatically over the past decade. Early YouTube recommendations were heavily based on view counts and click-through rates – basically, popularity contests. The modern system, detailed in research papers published by Google, uses deep neural networks operating in two stages: candidate generation (narrowing millions of videos to hundreds of plausible recommendations) and ranking (ordering those candidates by predicted engagement).
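The two-stage shape is easier to see in code. In this toy sketch, stage one is a cheap dot product against every video, and stage two applies a costlier scoring function to the short list only. The embeddings, the `avg_watch_min` field, and both function names are invented; the real stages are neural networks.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def generate_candidates(user_vec, catalog, k):
    """Stage 1: cheap similarity over the whole catalog, keep the top k."""
    return sorted(
        catalog,
        key=lambda vid: dot(user_vec, catalog[vid]["embedding"]),
        reverse=True,
    )[:k]

def rank(candidates, catalog, predict_engagement):
    """Stage 2: a more expensive model orders only the short list."""
    return sorted(candidates, key=lambda vid: predict_engagement(catalog[vid]), reverse=True)

# Hypothetical four-video catalog standing in for millions of videos.
catalog = {
    "v1": {"embedding": (1.0, 0.0), "avg_watch_min": 2.0},
    "v2": {"embedding": (0.9, 0.1), "avg_watch_min": 8.0},
    "v3": {"embedding": (0.0, 1.0), "avg_watch_min": 20.0},
    "v4": {"embedding": (0.8, 0.2), "avg_watch_min": 5.0},
}
user_vec = (1.0, 0.0)

shortlist = generate_candidates(user_vec, catalog, k=3)
ranked = rank(shortlist, catalog, lambda video: video["avg_watch_min"])
print(shortlist)  # ['v1', 'v2', 'v4']
print(ranked)     # ['v2', 'v4', 'v1']
```

Note that `v3` has the highest engagement figure but never reaches the ranker, because stage one judged it irrelevant to this user. That division of labor is what makes the expensive model affordable at scale.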

The critical shift happened around 2012, when YouTube moved from optimizing for clicks to optimizing for watch time. This seems like a small change, but it transformed the platform. Clickbait thumbnails and titles generate clicks but not watch time, so the algorithm learned to deprioritize them. Videos that held attention – even if fewer people clicked initially – started getting promoted. This drove a measurable increase in average session duration and shifted the content ecosystem toward longer, more substantive videos.
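A toy example shows why the objective matters so much. The numbers below are invented, and "expected watch time" is approximated here as click probability times average minutes watched, which is my simplification rather than YouTube's formula.

```python
# Hypothetical videos with made-up click and watch statistics.
videos = [
    {"id": "clickbait", "p_click": 0.30, "avg_watch_min": 0.5},
    {"id": "deep_dive", "p_click": 0.10, "avg_watch_min": 12.0},
    {"id": "tutorial",  "p_click": 0.15, "avg_watch_min": 6.0},
]

def expected_watch(video):
    """Expected minutes watched per impression (simplified)."""
    return video["p_click"] * video["avg_watch_min"]

by_clicks = [v["id"] for v in sorted(videos, key=lambda v: v["p_click"], reverse=True)]
by_watch = [v["id"] for v in sorted(videos, key=expected_watch, reverse=True)]

print(by_clicks)  # ['clickbait', 'tutorial', 'deep_dive']
print(by_watch)   # ['deep_dive', 'tutorial', 'clickbait']
```

Same videos, same data, opposite rankings: the clickbait video goes from first to last the moment the metric changes.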

More recently, YouTube has layered in satisfaction signals beyond raw watch time: likes, shares, survey responses about satisfaction, and notably, whether viewers regret spending time on a video. This addresses the “rabbit hole” criticism – the concern that watch-time optimization leads people down increasingly extreme or low-quality content pathways. YouTube has published research showing that incorporating satisfaction metrics reduces the spread of borderline content that people watch but later say they wish they hadn’t.

The tension between engagement optimization and user wellbeing is most visible at YouTube because the content diversity is so extreme. Spotify might accidentally play you a song you don’t like; YouTube might surface conspiracy theories or harmful misinformation because those videos generate intense engagement metrics. The company has invested heavily in this problem – dedicated teams building classifiers for borderline content, external advisory boards, and algorithmic adjustments that explicitly trade engagement for responsibility. Whether they’ve gone far enough is a legitimate debate.

The Technical Foundation: How It All Works

For the technically curious, most modern recommendation systems combine several core approaches:

  • Collaborative filtering finds patterns in user behavior – people who liked A and B tend to like C. Matrix factorization and its neural network successors are the workhorses here.
  • Content-based filtering analyzes the attributes of items themselves – genre, audio features, visual characteristics, text descriptions – and matches them to user preference profiles.
  • Deep learning models, particularly transformer architectures, have increasingly replaced simpler methods because they can capture complex, non-linear relationships between user behavior sequences and item features.
  • Reinforcement learning for sequential recommendation treats the recommendation problem as a series of decisions over time. Rather than optimizing each recommendation independently, the system learns to optimize for long-term user satisfaction across an entire session or even across multiple visits. This is where the cutting-edge research is focused, and it’s particularly relevant for platforms like YouTube where each recommendation shapes the context for the next one.
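Of the approaches above, matrix factorization is the easiest to sketch in a few lines. The version below learns a small vector for each user and each item with plain stochastic gradient descent, so that their dot product approximates the observed ratings. The ratings, names, and hyperparameters are all invented for illustration, and production systems use far more refined variants.

```python
import random
from math import sqrt

def factorize(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=500, seed=0):
    """Learn user and item factor vectors from (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.uniform(0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(0, 0.1) for _ in range(n_factors)] for i in items}
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[i]))
            for f in range(n_factors):
                p, q = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return P, Q

def predict(P, Q, user, item):
    """Predicted rating is the dot product of the two factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))

def rmse(P, Q, ratings):
    """Root mean squared error over the observed ratings."""
    return sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Hypothetical ratings on a 1-5 scale.
ratings = [
    ("ana", "film_a", 5), ("ana", "film_b", 4), ("ana", "film_c", 1),
    ("ben", "film_a", 4), ("ben", "film_b", 5),
    ("cy", "film_c", 5), ("cy", "film_d", 4), ("cy", "film_a", 1),
]

P, Q = factorize(ratings)
print(round(rmse(P, Q, ratings), 3))            # training error after fitting
print(round(predict(P, Q, "ben", "film_c"), 2))  # a rating ben never gave
```

The useful part is the last line: once the factors are learned, the model can score user-item pairs it has never observed, which is exactly what a recommender needs.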

How Recommendations Drive Engagement

The outsized impact of recommendation algorithms on major platforms:

  • Spotify (345M active users): 80% of all streams come from recommendation algorithms.
  • Netflix (230M subscribers): $1B saved annually from reduced subscriber churn.
  • YouTube (2B+ active users): 70% of total watch time is driven by recommendations.
What Smaller Companies Can Learn

Most companies don’t have Spotify’s data volume or Netflix’s engineering budget. But the principles translate remarkably well.

Start with collaborative filtering. Even a simple “users who did X also did Y” model, properly implemented, outperforms hand-curated recommendations in almost every domain. You don’t need deep learning to get meaningful results.
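Here is roughly how small a "users who did X also did Y" model can be. This sketch counts how often two items appear in the same session and recommends the most frequent companions. The camping-store sessions and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_occurrence(sessions):
    """Count how often each pair of items appears in the same session."""
    counts = {}
    for items in sessions:
        for a, b in combinations(sorted(set(items)), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts

def also_did(item, counts, top_n=3):
    """Items most often seen alongside the given item."""
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]

# Hypothetical purchase sessions.
sessions = [
    ["tent", "stove", "lantern"],
    ["tent", "stove"],
    ["tent", "sleeping_bag"],
    ["stove", "lantern"],
]

counts = co_occurrence(sessions)
print(also_did("tent", counts, top_n=1))  # ['stove']
```

Raw counts favor popular items, so a real deployment would usually normalize them, but even this version gives you a baseline to test against.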

Invest in the feedback loop. The biggest gap between hobbyist recommendation systems and production ones isn’t model sophistication – it’s evaluation infrastructure. Netflix and Spotify run thousands of A/B tests annually on their recommendation systems. Build the ability to measure whether your recommendations actually improve user outcomes before you invest in fancier models.
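The measurement side does not need to be elaborate to be useful. As a minimal sketch, a two-proportion z-test can tell you whether a new recommendation variant really converts better than the control or whether the gap is plausibly noise. The traffic numbers are invented, and a serious experimentation platform handles much more (multiple metrics, sequential testing, guardrails).

```python
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)

# Hypothetical experiment: control converts 10.0%, new recommender 11.0%.
z, p_value = two_proportion_z(conv_a=1000, n_a=10_000, conv_b=1100, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

With ten thousand users per arm, a one-point lift comes out significant at the usual 5% level; with a tenth of the traffic, the same lift would not, which is why sample size belongs in the plan before the model does.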

Be deliberate about what you optimize for. This is the most important lesson. YouTube’s shift from clicks to watch time to satisfaction wasn’t just a technical change – it was a values decision that reshaped the entire platform. What metric you optimize your recommendations for will shape your product and your user community in ways that go far beyond the algorithm itself. Choose it carefully, measure its second-order effects, and be willing to change it when the consequences aren’t what you intended.

These systems are among the most consequential pieces of software ever deployed, quietly shaping the media diets of billions of people. Understanding how they work isn’t just technically interesting – it’s increasingly a form of digital literacy.

EK
Contributing Writer
Tech industry analyst covering AI adoption across healthcare, finance, and enterprise. Previously at top management consulting firms. Writes about the business side of AI with a focus on what the numbers actually say.
