Research Project

Welcome to the YOUPOL Database

A database of political influencers on YouTube and TikTok (2006–present), soon expanding to the anglophone world.

More than 69 channels tracked, thousands of videos transcribed, and millions of comments and NLP annotations for analyzing online political discourse. The database is continuously enriched by a distributed network of contributor machines. Join the network by deploying a worker on your machine, or contact us for access.

Live Corpus Stats
Videos in DB · Channels · Comments · Transcripts · Total Views

Continuous Observatory

The database is continuously updated: channel scanning, video transcription and annotation, comment extraction, metadata updates (views, likes, subscribers). Each scan produces a longitudinal history accessible via the API.
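
To illustrate what a longitudinal history can look like, the sketch below stores one metadata snapshot per scan and derives the views gained between consecutive scans. The `Snapshot` record, its field names, and the `growth` helper are hypothetical, and the sample values are made up; the actual YOUPOL schema and API responses may differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Snapshot:
    """One metadata reading for a video at scan time (hypothetical schema)."""
    video_id: str
    scanned_at: datetime
    views: int
    likes: int


def growth(history: list[Snapshot]) -> list[tuple[datetime, int]]:
    """Return (scan time, views gained since the previous scan) pairs."""
    ordered = sorted(history, key=lambda s: s.scanned_at)
    return [
        (curr.scanned_at, curr.views - prev.views)
        for prev, curr in zip(ordered, ordered[1:])
    ]


# Made-up snapshots of a single video across three scans.
history = [
    Snapshot("vid123", datetime(2026, 6, 1), views=1_000, likes=40),
    Snapshot("vid123", datetime(2026, 6, 2), views=1_250, likes=52),
    Snapshot("vid123", datetime(2026, 6, 3), views=1_300, likes=55),
]
deltas = growth(history)
```

Keeping every snapshot rather than overwriting the latest values is what makes trends over time recoverable.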

Last updated: 2026-06-03 00:43
Today: videos transcribed, comments extracted
Since January: videos transcribed, comments extracted, videos detected, metadata updated, channels scanned

The Project

Understanding Political YouTube & TikTok Through Its Content

Previous research on political YouTube and TikTok was limited to metadata (titles, tags, view counts). YOUPOL goes further by analyzing what creators actually say.

By transcribing and annotating over 30,100 videos from more than 69 political channels (2006 to today), we built the first database enabling computational analysis of political discourse at the content level — from far-right ideology to scientific rhetoric, from hate speech to audience engagement.

28,121 Videos transcribed
9.6M+ Comments extracted
69 Channels analyzed
20 Years of coverage
01 Collect & Transcribe

Videos and 9.6M+ comments are scraped. Audio is preprocessed with Demucs, transcribed with Whisper, and speaker-diarized with pyannote.audio.

yt-dlp · Whisper · pyannote
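
One step implied by combining transcription with diarization is attaching a speaker label to each transcript segment. The sketch below does this by picking the diarization turn with the largest temporal overlap. It is an illustration only: the dictionaries are simplified stand-ins for transcription and diarization output, `assign_speakers` is a hypothetical helper, and the project may align the two differently.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker whose turn overlaps it most.

    Segments that overlap no turn keep speaker=None.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


# Made-up transcript segments and diarization turns (times in seconds).
segments = [
    {"start": 0.0, "end": 4.0, "text": "Welcome to the show."},
    {"start": 4.0, "end": 9.0, "text": "Thanks for having me."},
]
turns = [
    {"start": 0.0, "end": 4.5, "speaker": "SPEAKER_00"},
    {"start": 4.5, "end": 9.0, "speaker": "SPEAKER_01"},
]
labeled = assign_speakers(segments, turns)
```

Maximum overlap is a simple heuristic; segments that straddle a speaker change may need to be split at word level instead.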
02 Annotate & Classify

NLP classifiers detect far-right ideology, hate speech, scientific rhetoric, and political orientation at the sentence level. Annotation powered by LLM_Tool (technical paper).

Transformers · NER · LLM annotation
03 Analyze & Visualize

Entity networks, co-occurrence graphs, OLS regressions, and temporal trends reveal the evolution of political discourse across two decades.

NetworkX · OLS · ECharts
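
As a rough illustration of how a co-occurrence graph can be derived from entity annotations, the sketch below counts how often two entities appear in the same sentence. The input format (one list of entity names per sentence), the `cooccurrence` helper, and the placeholder entities are assumptions made for the example, not the project's actual pipeline.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(entity_lists):
    """Count unordered entity pairs that appear together in the same sentence."""
    counts = Counter()
    for entities in entity_lists:
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Placeholder entities, one list per sentence.
sentences = [
    ["B", "A"],
    ["A", "C", "B"],
    ["C"],
]
edges = cooccurrence(sentences)
```

Each `(entity, entity)` pair and its count can then be loaded as a weighted edge into a graph library such as NetworkX.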
Corpus Composition

Across the Political Spectrum

The initial seed comprises 69 channels spanning the francophone political spectrum (France and Quebec), categorized by political orientation. Expansion to the anglophone world is planned. The corpus deliberately oversamples far-right content to enable fine-grained analysis of radical discourse.

Far Right (FR): 17,054 videos (57%)
Left (FR): 5,941 videos (20%)
Far Right (QC): 4,880 videos (16%)
Masculinist: 1,811 videos (6%)
Conspiracy (QC): 414 videos (1%)
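
The shares follow directly from the counts. A quick check, using only the figures listed above:

```python
# Counts per category, as listed in the corpus composition.
counts = {
    "Far Right (FR)": 17_054,
    "Left (FR)": 5_941,
    "Far Right (QC)": 4_880,
    "Masculinist": 1_811,
    "Conspiracy (QC)": 414,
}

total = sum(counts.values())

# Percentage share of each category, rounded to the nearest whole percent.
shares = {label: round(100 * n / total) for label, n in counts.items()}
```

The five categories sum to 30,100, and the rounded shares add up to 100%.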

Growth of Political YouTube & TikTok (2006–today)

Number of videos published per year, by political orientation

Political Content Detection Over Time

Monthly share of content classified as political — transcriptions (sentences) and comments

Sentences: CamemBERTv2 classifier (FR) · comments: XLM-RoBERTa classifier (multilingual) — trained on LLM annotations validated by human coders
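
A monthly share of this kind can be computed by grouping classifier outputs by month and dividing the number of items labeled political by the total. The sketch below assumes each record is a `(month, is_political)` pair; that layout, the `monthly_political_share` helper, and the sample data are hypothetical.

```python
from collections import defaultdict


def monthly_political_share(records):
    """Return {month: fraction of items classified as political}."""
    totals = defaultdict(int)
    political = defaultdict(int)
    for month, is_political in records:
        totals[month] += 1
        political[month] += int(is_political)
    return {month: political[month] / totals[month] for month in sorted(totals)}


# Made-up classifier outputs: (month, is_political) per sentence or comment.
records = [
    ("2024-01", True),
    ("2024-01", False),
    ("2024-01", True),
    ("2024-01", False),
    ("2024-02", True),
    ("2024-02", False),
    ("2024-02", False),
    ("2024-02", False),
]
share = monthly_political_share(records)
```

Sentences and comments come from different classifiers, so computing the two series separately keeps them comparable over time without mixing their error profiles.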
Technical Paper
Contact the Team
Have a question about the data, the API, or the project? Send us a message.
Suggest a Channel or Feature
Help us improve the YOUPOL corpus. Suggest a political YouTube or TikTok channel we should track, or a feature you'd like to see.