The YOUPOL Pipeline
A seven-phase pipeline transforming raw YouTube and TikTok data into a research-ready database for computational political science.
YOUPOL (Lemor & Boursier, 2025) is a textual database compiling the transcripts of more than 30,100 videos published since 2006 by political influencers on YouTube and TikTok. The current corpus is francophone (France and Quebec) and will soon expand to the anglophone world. It targets major political content creators across the entire political spectrum, from the far left to the far right. The database is updated continuously: its 69 channels are scanned regularly to detect and collect new videos, transcripts and comments.
The database is distinctive for its scale (30,100+ videos, 9.6M+ comments), its granularity (speaker diarization, sentence-level NLP annotation) and, above all, the longitudinal, computational analysis of video content that it makes possible, where previous studies focused only on titles or metadata.
Continuous Collection & Processing Pipeline
Continuous Observatory
The database is updated continuously through channel scanning, video transcription and annotation, comment extraction and metadata refreshes (views, likes, subscribers). Each scan adds to a longitudinal history that is accessible via the API.
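To illustrate how repeated scans can build a longitudinal history, here is a minimal sketch in Python. It assumes that each scan yields one snapshot of a video's metrics and that snapshots are appended rather than overwritten. The names `Snapshot`, `record_scan` and `view_growth` are hypothetical and the figures are invented; this is not the YOUPOL API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """Metrics observed for one video during one scan (hypothetical structure)."""
    video_id: str
    scanned_on: date
    views: int
    likes: int


def record_scan(history, snapshots):
    """Append one scan's snapshots to the per-video history without overwriting."""
    for snap in snapshots:
        history.setdefault(snap.video_id, []).append(snap)


def view_growth(history, video_id):
    """Return (scan date, views gained since the previous scan) pairs."""
    series = sorted(history.get(video_id, []), key=lambda s: s.scanned_on)
    return [
        (cur.scanned_on, cur.views - prev.views)
        for prev, cur in zip(series, series[1:])
    ]


# Invented example data: three weekly scans of the same video.
history = {}
record_scan(history, [Snapshot("vid_a", date(2025, 1, 1), 1000, 50)])
record_scan(history, [Snapshot("vid_a", date(2025, 1, 8), 1400, 65)])
record_scan(history, [Snapshot("vid_a", date(2025, 1, 15), 1500, 70)])

growth = view_growth(history, "vid_a")
print(growth)
```

Keeping every snapshot, instead of storing only the latest counts, is what makes it possible to analyze how a video's audience evolves over time.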
Channel Selection
Initial seed of over 60 francophone political YouTube and TikTok channels in France and Quebec, soon expanding to the anglophone world.
Data Collection
Videos, comments and metadata continuously collected via an automated pipeline.
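As a rough sketch of the detection step in such a pipeline, the following Python snippet compares the video IDs returned by a channel scan with the IDs already stored, and keeps only the new ones. The function name, the dictionary layout and the sample IDs are assumptions made for illustration; the actual collector is not described here.

```python
def detect_new_videos(known_ids, scanned):
    """Return videos from the latest scan whose IDs are not yet stored."""
    seen = set(known_ids)
    fresh = []
    for video in scanned:
        if video["id"] not in seen:
            seen.add(video["id"])  # also skips duplicates within the same scan
            fresh.append(video)
    return fresh


# Invented example: two videos are already stored, the scan returns four entries.
known = {"v1", "v2"}
scan = [
    {"id": "v2", "title": "Already stored"},
    {"id": "v3", "title": "New upload"},
    {"id": "v3", "title": "New upload"},
    {"id": "v4", "title": "Another new upload"},
]

new_videos = detect_new_videos(known, scan)
print([v["id"] for v in new_videos])
```

Only the videos returned by `detect_new_videos` would then need to be downloaded, transcribed and annotated.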
Vocal Separation
Source separation via Demucs to isolate voices from audio tracks.
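One way this step could be invoked is through the Demucs command-line tool, sketched below in Python. The sketch assumes that the `demucs` executable is installed, that the `htdemucs` model is used, and that output follows the layout `<out>/<model>/<track>/vocals.wav`. The file paths and helper names are hypothetical, and the settings actually used by YOUPOL are not specified here.

```python
import subprocess
from pathlib import Path


def demucs_command(audio_path, out_dir="separated", model="htdemucs"):
    """Build a Demucs call that splits a track into vocals and accompaniment."""
    return [
        "demucs",
        "--two-stems", "vocals",  # keep only a vocals / no_vocals split
        "-n", model,              # model name (assumed: htdemucs)
        "-o", str(out_dir),       # output folder
        str(audio_path),
    ]


def expected_vocals_path(audio_path, out_dir="separated", model="htdemucs"):
    """Where the vocals stem is expected, assuming the output layout above."""
    return Path(out_dir) / model / Path(audio_path).stem / "vocals.wav"


def separate_vocals(audio_path, out_dir="separated", model="htdemucs"):
    """Run Demucs (must be installed) and return the expected vocals path."""
    subprocess.run(demucs_command(audio_path, out_dir, model), check=True)
    return expected_vocals_path(audio_path, out_dir, model)


# Building the command does not require Demucs; running it does.
cmd = demucs_command("audio/video123.mp3")
print(" ".join(cmd))
```

Isolating the voice track before transcription can reduce the influence of background music and jingles on the speech-to-text output.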
Transcription & Diarization
Speaker-labeled transcripts produced via diarization and speech-to-text.
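A common way to combine the two outputs is to give each transcribed segment the speaker whose diarization turn overlaps it the most. The Python sketch below shows that idea under stated assumptions: both tools return segments with `start` and `end` times in seconds, and the segment layout, speaker labels and sample text are invented. It is not necessarily the alignment rule used in YOUPOL.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_speakers(transcript, turns):
    """Attach to each transcript segment the speaker with the largest overlap."""
    labeled = []
    for seg in transcript:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        # Segments that overlap no turn keep speaker=None.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


# Invented example: two diarization turns and three transcribed segments.
turns = [
    {"start": 0.0, "end": 5.0, "speaker": "SPEAKER_00"},
    {"start": 5.0, "end": 9.0, "speaker": "SPEAKER_01"},
]
transcript = [
    {"start": 0.5, "end": 4.0, "text": "Bonjour à tous."},
    {"start": 4.5, "end": 8.0, "text": "Merci de m'inviter."},
    {"start": 10.0, "end": 11.0, "text": "(musique)"},
]

labeled = label_speakers(transcript, turns)
for seg in labeled:
    print(seg["speaker"], seg["text"])
```

The second segment straddles both turns; it is assigned to the second speaker because three of its 3.5 seconds fall inside that turn.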
NLP & Annotation
CamemBERT classifiers detect ideology, hate speech and rhetoric at sentence level.
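The sketch below illustrates what sentence-level annotation means in practice: a transcript is split into sentences, each sentence receives a label, and the labels are aggregated per video. The classifier is passed in as a function; here a toy rule stands in for a fine-tuned CamemBERT model, and the regex splitter, the label names and the sample text are all assumptions made for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def annotate(text, classify):
    """Label every sentence with the supplied classifier function."""
    return [{"sentence": s, "label": classify(s)} for s in split_sentences(text)]


def label_shares(annotations):
    """Share of sentences per label, e.g. for video-level aggregation."""
    counts = Counter(a["label"] for a in annotations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


def toy_classifier(sentence):
    """Placeholder rule standing in for a trained sentence classifier."""
    return "question" if sentence.endswith("?") else "statement"


annotations = annotate("Who decides? We do. Nobody else.", toy_classifier)
shares = label_shares(annotations)
print(shares)
```

Because the classifier is an argument, the same loop could serve several annotation tasks (ideology, hate speech, rhetoric) by swapping in a different model for each one.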
Database & Analysis
Normalized PostgreSQL schema powering named-entity recognition (NER), network and regression analyses.
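To give an idea of what a normalized layout for such a corpus might look like, the sketch below creates four linked tables and runs one aggregate query. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a server. The table and column names and the sample rows are hypothetical, not the actual YOUPOL schema.

```python
import sqlite3

# Hypothetical schema: each fact is stored once and linked by foreign keys.
SCHEMA = """
CREATE TABLE channels (
    channel_id TEXT PRIMARY KEY,
    platform   TEXT NOT NULL,
    name       TEXT NOT NULL
);
CREATE TABLE videos (
    video_id     TEXT PRIMARY KEY,
    channel_id   TEXT NOT NULL REFERENCES channels(channel_id),
    title        TEXT NOT NULL,
    published_on TEXT NOT NULL
);
CREATE TABLE sentences (
    sentence_id INTEGER PRIMARY KEY,
    video_id    TEXT NOT NULL REFERENCES videos(video_id),
    speaker     TEXT,
    text        TEXT NOT NULL,
    label       TEXT
);
CREATE TABLE comments (
    comment_id TEXT PRIMARY KEY,
    video_id   TEXT NOT NULL REFERENCES videos(video_id),
    text       TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Invented sample rows.
conn.execute("INSERT INTO channels VALUES ('c1', 'youtube', 'Example Channel')")
conn.execute("INSERT INTO videos VALUES ('v1', 'c1', 'Example video', '2024-05-01')")
conn.executemany(
    "INSERT INTO sentences (video_id, speaker, text, label) VALUES (?, ?, ?, ?)",
    [("v1", "SPEAKER_00", "First sentence.", "statement"),
     ("v1", "SPEAKER_01", "Second sentence?", "question")],
)
conn.executemany(
    "INSERT INTO comments VALUES (?, 'v1', ?)",
    [("k1", "First comment"), ("k2", "Second comment"), ("k3", "Third comment")],
)

# Sentences and comments per channel, counted through the video they belong to.
rows = conn.execute("""
    SELECT c.name,
           COUNT(DISTINCT s.sentence_id),
           COUNT(DISTINCT m.comment_id)
    FROM channels c
    JOIN videos v ON v.channel_id = c.channel_id
    LEFT JOIN sentences s ON s.video_id = v.video_id
    LEFT JOIN comments m ON m.video_id = v.video_id
    GROUP BY c.channel_id, c.name
    ORDER BY c.name
""").fetchall()
print(rows)
```

Linking sentences and comments to videos, and videos to channels, lets content-level annotations be joined with audience reactions and channel metadata in a single query.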
Continuous Observatory
Distributed collaborative infrastructure providing continuous collection and processing.
Originality & Scientific Interest
YOUPOL enables longitudinal analysis of political discourse on YouTube and TikTok at the content level while including all metadata for each video. The francophone seed (France and Quebec), soon expanding to the anglophone world, supports a wide variety of analyses on the dissemination of political ideas (particularly those of the far right) and their evolution over time and across the political spectrum.
Two ongoing research projects are tied to this database: (1) studying the determinants of hate comments based on video content and far-right discourse dissemination; and (2) examining the impact of scientific arguments on comments depending on the political orientation of the channels.
Key References
Boursier, T. (2022). White Supremacism on YouTube. In Temporalities of Diversity. Waxmann.
Boursier, T. (2024). La banalisation du suprémacisme blanc sur YouTube. Politique et Sociétés, 42(1).
Carter, E. (2018). Right-wing extremism/radicalism. Journal of Political Ideologies, 23(2).
Finlayson, A. (2022). YouTube and Political Ideologies. Political Studies, 70(1).
Riedl, M. et al. (2021). The Rise of Political Influencers. Frontiers in Communication, 6.
Stephan, G. (2024). Faire carrière dans les médias de réinformation. Politiques de communication, 22(1).
Voirol, O. & Martini, E. (2023). La fabrique discursive de la haine. Réseaux, 241(5).