Step 1 of 7

Channel Selection

Francophone seed (France and Quebec) set to expand to the anglophone world

                
Last updated: 2026-07-20 19:15

71 Channels tracked

4 Political orientations

2 Countries covered

Input: YouTube & TikTok ecosystem

Process: Identification by political content + audience metrics

Output: Classified channels

Links to YouTube and TikTok channels are collected by identifying channels recognized for their role in the ecosystem of political content creators. The current seed covers the francophone ecosystems (France and Quebec); the same procedure will be used to build the anglophone seed. Audience metrics (number of views and subscribers) serve as criteria for assessing each channel's reach and influence.

Each channel is classified along two dimensions: political orientation (far right, left, manosphere, conspiracy) and country of origin. The corpus covers the full political spectrum, from niche creators to major political influencers, enabling comparative analysis across the ecosystems included. The same classification scheme will be carried over to anglophone channels added during the corpus expansion.
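
The two-dimensional classification described above can be sketched as a small data structure. This is a minimal illustration, not the project's actual code: the class names, field names, and example channel names are all hypothetical; only the four orientation labels and the two regions come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Orientation(Enum):
    # The four orientation labels listed in the text.
    FAR_RIGHT = "far right"
    LEFT = "left"
    MANOSPHERE = "manosphere"
    CONSPIRACY = "conspiracy"


class Country(Enum):
    # The two regions of the current francophone seed.
    FRANCE = "France"
    QUEBEC = "Quebec"


@dataclass(frozen=True)
class Channel:
    name: str        # hypothetical display name
    platform: str    # "youtube" or "tiktok"
    orientation: Orientation
    country: Country


def group_by_cell(channels):
    """Group channel names by (orientation, country) for comparative analysis."""
    cells = {}
    for ch in channels:
        cells.setdefault((ch.orientation, ch.country), []).append(ch.name)
    return cells


# Placeholder channels, invented for illustration only.
channels = [
    Channel("example_channel_a", "youtube", Orientation.LEFT, Country.FRANCE),
    Channel("example_channel_b", "tiktok", Orientation.CONSPIRACY, Country.QUEBEC),
    Channel("example_channel_c", "youtube", Orientation.LEFT, Country.FRANCE),
]
cells = group_by_cell(channels)
```

Using enumerations rather than free-text labels means that extending the scheme to anglophone channels would only require adding members to `Country`, under the assumption that the orientation labels carry over unchanged.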


The first step of the pipeline collects videos from YouTube and TikTok channels identified for their role in the ecosystem of political content creators. The current corpus is a francophone seed: over 60 channels in France and Quebec covering the entire political spectrum, from the far left to the far right (Finlayson, 2022; Riedl et al., 2021). This seed is set to expand to the anglophone world, placing YOUPOL in an international, comparative perspective.

Channel selection relies on audience metrics such as the number of views and subscribers, which make it possible to assess each channel's role and influence in the ecosystem. The selected channels are classified by political orientation (far right, left, manosphere, conspiracy) and country of origin. The resulting corpus is distinctive for its scale, its granularity (including speaker diarization), and especially its capacity to support longitudinal and computational analysis of video content, whereas previous studies focused only on titles (Boursier, 2022, 2024). The methodology is replicable and will be applied as-is to the anglophone expansion.
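
As a rough illustration of how audience metrics could inform selection, the sketch below filters and ranks candidate channels by reach. The thresholds, the either/or rule, and all names are assumptions made for the example; the text does not state the actual cut-offs or how views and subscribers are weighted.

```python
from dataclasses import dataclass


@dataclass
class ChannelMetrics:
    name: str          # hypothetical channel name
    subscribers: int
    total_views: int


# Assumed cut-offs for illustration; the source gives no numeric thresholds.
MIN_SUBSCRIBERS = 10_000
MIN_TOTAL_VIEWS = 1_000_000


def passes_reach_filter(m, min_subscribers=MIN_SUBSCRIBERS, min_views=MIN_TOTAL_VIEWS):
    """Keep a channel if either metric meets its (assumed) minimum."""
    return m.subscribers >= min_subscribers or m.total_views >= min_views


def rank_by_reach(candidates):
    """Return the retained channels, largest audience first."""
    kept = [m for m in candidates if passes_reach_filter(m)]
    return sorted(kept, key=lambda m: (m.subscribers, m.total_views), reverse=True)


# Invented figures, not real channel data.
candidates = [
    ChannelMetrics("example_a", subscribers=50_000, total_views=2_000_000),
    ChannelMetrics("example_b", subscribers=500, total_views=20_000),
    ChannelMetrics("example_c", subscribers=8_000, total_views=5_000_000),
]
selected = [m.name for m in rank_by_reach(candidates)]
```

In this sketch a channel with few subscribers but many views is still retained, which is one plausible way to keep niche creators with viral content in the candidate pool.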

Tools used: Python, YouTube Data API, TikTok API

Database Schema

#	Table	Description	Scale
1	videos	One row per video: ID, channel metadata, views, likes, comments, tags, duration, upload date, political orientation, country, gender.	26,775 rows
2	comments	All comments with author info, like counts, timestamps, nested reply structure, and JSONB analysis column.	9.9M+ rows
3	video_transcripts	Full diarized transcripts with speaker labels and cleaned text versions.	27,852 rows
4	transcription_speakers	Individual speaker segments from diarization, ordered by position within each video.	702,228 rows
5	comments_processed	Sentence-level tokenized comments with NER entities (PER, ORG, LOC) and ML prediction columns.	15.7M+ rows
6	transcription_speakers_processed	Sentence-level speaker segments with NER extraction and full annotation suite.	3.5M+ rows
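
To make the relationship between the `videos` and `comments` tables concrete, here is a minimal sketch of a join that counts comments per political orientation. The column names (`video_id`, `political_orientation`, `like_count`, and so on) are assumptions inferred from the table descriptions, and an in-memory SQLite database stands in for the production database only so that the example is self-contained.

```python
import sqlite3

# In-memory stand-in; the real schema and column names may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos (
    video_id TEXT PRIMARY KEY,
    political_orientation TEXT,
    country TEXT,
    views INTEGER
);
CREATE TABLE comments (
    comment_id TEXT PRIMARY KEY,
    video_id TEXT REFERENCES videos(video_id),
    like_count INTEGER
);
""")

# Invented rows for illustration only.
conn.executemany(
    "INSERT INTO videos VALUES (?, ?, ?, ?)",
    [("v1", "left", "France", 1000), ("v2", "far right", "Quebec", 500)],
)
conn.executemany(
    "INSERT INTO comments VALUES (?, ?, ?)",
    [("c1", "v1", 3), ("c2", "v1", 0), ("c3", "v2", 7)],
)

# Count comments per orientation; LEFT JOIN keeps videos without comments.
rows = conn.execute("""
    SELECT v.political_orientation, COUNT(c.comment_id) AS n_comments
    FROM videos AS v
    LEFT JOIN comments AS c ON c.video_id = v.video_id
    GROUP BY v.political_orientation
    ORDER BY v.political_orientation
""").fetchall()
conn.close()
```

The same pattern would extend to the transcript tables by joining on the video identifier, assuming they carry one.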


Continuous Observatory

The database is continuously updated through channel scanning, video transcription and annotation, comment extraction, and metadata updates (views, likes, subscribers). Each scan produces a longitudinal history accessible via the API.
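
One way to picture the longitudinal history is as a series of timestamped metric snapshots, one per scan, from which growth between scans can be derived. The sketch below is a hypothetical in-memory model: the class names, fields, and figures are assumptions, and it does not reflect the actual storage layer or API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MetricSnapshot:
    video_id: str
    scanned_at: datetime
    views: int
    likes: int


class MetricHistory:
    """Append-only store of per-scan snapshots (hypothetical design)."""

    def __init__(self):
        self._snapshots = {}

    def record(self, snap):
        # Each scan appends a snapshot instead of overwriting the last value.
        self._snapshots.setdefault(snap.video_id, []).append(snap)

    def series(self, video_id):
        # Return snapshots in chronological order, whatever the insert order.
        return sorted(self._snapshots.get(video_id, []), key=lambda s: s.scanned_at)

    def view_growth(self, video_id):
        # Difference in views between each pair of consecutive scans.
        s = self.series(video_id)
        return [(b.scanned_at, b.views - a.views) for a, b in zip(s, s[1:])]


# Invented figures, recorded out of order on purpose.
history = MetricHistory()
history.record(MetricSnapshot("v1", datetime(2026, 1, 2), views=150, likes=12))
history.record(MetricSnapshot("v1", datetime(2026, 1, 1), views=100, likes=10))
history.record(MetricSnapshot("v1", datetime(2026, 1, 3), views=400, likes=30))
growth = history.view_growth("v1")
```

Keeping every snapshot rather than only the latest value is what makes it possible to reconstruct how a video's audience evolved over time.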

