Data Analysis
Explore patterns in political YouTube and TikTok content across a seed of 69 channels (France and Quebec, soon the anglophone world), 30,100 videos and 20 years of discourse (2006 to today).
Videos Over Time by Political Orientation
Yearly count of uploaded videos, stacked by orientation (2006 to today)
Total Views Over Time by Political Orientation
Cumulative views per year, broken down by orientation
Monthly Publication Activity
Total videos published per month across all orientations
Political content share over time
Monthly share of sentences and comments classified as political. Binning adapts to available volume.
Annotation protocol & validation
The training corpus is pre-annotated by a large language model (LLM). The LLM's reliability is first measured on a random sample that is also coded independently by two human researchers. The metrics below compare the LLM's predictions against the consensus of the two annotators (agreement > n/2); once validated, that LLM labels the rest of the corpus, on which the final classifiers are trained (CamemBERTv2 for sentences, XLM-RoBERTa for comments).
| Metric | Sentences | Comments |
|---|---|---|
| Macro F1 unweighted mean across classes | 92.2% | 93.5% |
| Weighted F1 weighted by each class support | 92.0% | 93.9% |
| Micro F1 global, across all decisions | 92.0% | 93.7% |
| Accuracy % of LLM decisions identical to the human consensus | 88.1% | 88.9% |
| Hamming loss % of mislabelled bits — lower is better | 7.7% | 6.0% |
What these numbers measure: how well the LLM reproduces the humans' agreement. The training corpus is therefore built from LLM labels whose reliability is publicly auditable. The CamemBERTv2 and XLM-RoBERTa classifiers are then trained on this LLM-annotated corpus and served here.
Tools & references Annotation and validation pipeline: LLM_Tool (technical paper, OSF).
Video Distribution by Political Orientation
Share of the corpus by ideological category
Videos by Country
France vs. Quebec distribution across orientations
Gender Distribution of Channel Creators
Channels classified by the gender of their primary creator(s)
Average Video Duration by Orientation
Mean duration in minutes per political category
Political intensity by orientation
Average share of political sentences and comments, broken down by the channel's editorial orientation.
Annotation protocol & validation
The training corpus is pre-annotated by a large language model (LLM). The LLM's reliability is first measured on a random sample that is also coded independently by two human researchers. The metrics below compare the LLM's predictions against the consensus of the two annotators (agreement > n/2); once validated, that LLM labels the rest of the corpus, on which the final classifiers are trained (CamemBERTv2 for sentences, XLM-RoBERTa for comments).
| Metric | Sentences | Comments |
|---|---|---|
| Macro F1 unweighted mean across classes | 92.2% | 93.5% |
| Weighted F1 weighted by each class support | 92.0% | 93.9% |
| Micro F1 global, across all decisions | 92.0% | 93.7% |
| Accuracy % of LLM decisions identical to the human consensus | 88.1% | 88.9% |
| Hamming loss % of mislabelled bits — lower is better | 7.7% | 6.0% |
What these numbers measure: how well the LLM reproduces the humans' agreement. The training corpus is therefore built from LLM labels whose reliability is publicly auditable. The CamemBERTv2 and XLM-RoBERTa classifiers are then trained on this LLM-annotated corpus and served here.
Tools & references Annotation and validation pipeline: LLM_Tool (technical paper, OSF).
Sentence Level Annotation
Every sentence in the corpus is first classified as political or non-political, then annotated within three research projects: gender analysis, technophile neo-reactionaries, and far-right ideas (SIED, building on Boursier & Lemor, 2025). Annotation powered by LLM_Tool (technical paper).
What Annotation Looks Like
Select a sentence from the corpus. Each classifier analyses it independently and assigns a label when the construct is detected.
Detected annotations
How It Works
Powered by LLM_Tool (technical paper), our open-source annotation framework.
Three Stage Process
A political detection codebook first filters out non-political content. Three additional codebooks define the target constructs for the research projects: gender analysis (gender, valence, rationality, science position), technophile neo-reactionaries (technology, libertarianism, fictional metaphors, equality, ecology) and SIED (nationalism, immigration, democracy, progress, authority, tradition, equality, ecology) following the framework developed in Boursier & Lemor (2025). The LLM receives each codebook as a system prompt and annotates batches of 500 sentences, producing labels and justifications.
For each classification task, a CamemBERT model is fine-tuned on the LLM-annotated data. Models are evaluated on a held-out validation set. If the F1 score is insufficient, the pipeline enters a reinforced loop: the most uncertain predictions are sent back to the LLM for re-annotation with stricter prompts, the training set is augmented, and the model is retrained. This loop typically converges within two to three rounds.
Validated classifiers run inference on the entire corpus. Each of the 20 million sentences receives a prediction and a confidence score for every applicable model. All results are stored in PostgreSQL, enabling queries such as "show all political sentences annotated immigration_security" or "compare gender valence across political orientations".
Concrete Example
Training a SIED classifier — Example: "nation_threat"
The SIED codebook, developed in Boursier & Lemor (2025), defines nation_threat as "the nation described as under threat (internal or external), requiring protection, defense or preservation." The LLM annotates a sample of sentences, labeling positives (e.g., "notre civilisation est en train de disparaître sous les coups de la mondialisation") and negatives. A CamemBERT classifier is fine-tuned on these labels. If performance is insufficient, the reinforced loop selects the most uncertain sentences, sends them back to the LLM for re-annotation, and the model is retrained. This process is applied to each sub-category across all three projects.
What You Will Be Able to Explore
Once the annotation pipeline is complete, this tab will feature interactive charts for each of the following analyses.
SIED: breakdown by political orientation
Compare SIED categories (nationalism, immigration, democracy, progress, authority, tradition, equality, ecology) across far-right, left, masculinist and conspiracy channels, extending Boursier & Lemor (2025).
Gender discourse analysis
Explore gender valence, rationality types and science positioning in political channel discourse over time.
Technophile neo-reactionaries
Analyze discourses on technology, libertarianism and fictional metaphors (red pill, Cathedral, etc.) across the political content in the corpus.
France vs. Quebec comparison
Compare how the same ideological constructs manifest differently in the two francophone political ecosystems.
All categories and subcategories
A first model classifies each sentence as political or non-political using a broad definition (current affairs, social issues, political actors, power relations, social norms). This filtering precedes the three annotation projects.
Political Detection
Binary classification of each sentence as political or non-political.
political_yes
The sentence refers to current affairs, social issues, political actors, power relations or social norms.
political_no
The sentence relates to private life, personal narrative or entertainment without collective scope.
Detection of gender discourse and multidimensional analysis: gender presence, valence (positive, negative, ambivalent), type of rationality invoked and positioning towards science.
Gender
Does the content address gender? Direct or indirect reference to men, women, masculinity, femininity, gender roles, feminism, antifeminism, male-female relations, LGBTQ+.
gender_yes
Gender discourse present
gender_no
No gender discourse
Gender Valence
Tone of the gender discourse.
genre_valence_positive
Promotes gender equality or challenges stereotypes
genre_valence_negative
Hostility, criticism or derogatory claims toward feminism or gender equality
genre_valence_ambivalent
Initially appears egalitarian but limits or relativizes equality
genre_valence_null
No evaluative stance toward gender
Rationality Type
Type of rationality mobilized in gender discourse.
rationality_none
No justificatory rationality
rationality_nature
Biological, natural, evolutionary or religious-natural arguments
rationality_liberal
Invokes formal equality or individual rights to deny structural domination
rationality_empirical
Statistics, data or "facts" as justification
rationality_heroic
Frames the claim as courageous truth-telling, anti-political correctness
Science Position
Positioning towards science in gender discourse.
science_none
No reference to science
science_pro_science
Values studies, experts or research
science_anti_science
Discredits academia or research
science_ambivalent
Both pro- and anti-science registers coexist
Measurement of neo-reactionary (NR) ideas centered on technological optimism, libertarianism and the use of fictional metaphors in political discourse, along with dimensions shared with the SIED (equality and ecology).
Technology
Technological optimism, technocracy and transhumanism.
techno_optimism_overall
Optimistic or positive view of the role of technology and innovation
innovation_as_progress
Technological innovation as a driver of progress or solution to social problems
pro_tech_figures
Favorable reference to tech figures (Musk, Thiel, Altman, Zuckerberg…)
technocracy_over_democracy
Technocratic or expert-led governance is more effective than democracy
deregulation_of_tech
Deregulation of technological innovation as necessary for progress
transhumanism
Support for transhumanism, post-humanism, eugenics or technological augmentation of humans
Libertarianism
Secession, individual autonomy, alternative communities and the corporate model as a political counter-model.
lib_sec
Support for secession or break from the national political community
lib_autonomy
Living autonomously, outside traditional state structures
lib_community
Creation of communities based on their own values and rules
lib_company
The corporate model as a political counter-model to the state or democracy
lib_state
The state should be run like a company, according to performance criteria
Fictional Metaphors
Use of metaphors from popular fiction to structure political interpretation.
metaphor_redpill
Reference to the "red pill," awakening to hidden truth, breaking free from egalitarian or democratic illusions
metaphor_lotr
References to Lord of the Rings to conceptualize social or civilizational hierarchies
metaphor_starwars
References to Star Wars to frame political struggle, authority or legitimacy
metaphor_cathedral
The Cathedral as a metaphor for universities, media or progressive institutions forming an ideological system
Equality SIED + NR
Relationship to equality, social and biological hierarchies.
equality_value
Equality as a threat to values, traditions or the social order
equality_identity
Equality as a threat to French identity or a factor of national dissolution
equality_gender
Inequalities between the sexes presented as natural or biologically grounded
hierarchy_castes
Society described in terms of castes or natural social hierarchies
hierarchy_IQ
IQ used as a criterion for ranking individuals or groups
hierarchy_race
Reference to natural inequalities between races or ethnic groups
equality_utopia
Equality described as unrealistic, naive or utopian
Ecology SIED + NR
Ecological positioning: eco-skepticism, techno-solutionism or civilizational ecology.
eco_eco
Economic growth is more important than environmental protection
eco_tech
Ecological concerns as obstacles to technological development
eco_civ
Climate challenges framed as a competition between civilizations
Far-right ideological score (SIED) developed in Boursier & Lemor (2025), Revue française de science politique. Measures the presence of far-right ideological affiliation categories (CAIED) — nationalism, immigration, democracy, progress, authority, tradition — as well as dimensions shared with the NR project (equality and ecology), through their respective sub-dimensions.
Nationalism
Constructions of the nation and national identity.
nation_ethnic
Nation as an ethnic or cultural community based on blood ties or common ancestors
nation_family
Nation associated with the family, citizens as children of the motherland
nation_state
Nation fused with the state as a single, inseparable entity
nation_vital
The nation as an essential and insurmountable element of human life
nation_threat
Nation described as under threat, requiring protection or defense
nation_colonialism
Colonial nostalgia or denial of the consequences of colonization
Immigration
Framing of immigration as a threat.
immigration_identity
Threat to national identity, culture or French/European values
immigration_security
Association with delinquency, crime or terrorism
immigration_women
Threat to women's rights or gender equality
immigration_law
Call for stricter immigration or asylum legislation
Democracy
Critical relationship to democracy as an ideal or political regime.
demo_value
Democracy as a threat to values, traditions or national identity
demo_sep
Challenging the separation of powers, strengthening the executive
demo_vain
Democracy described as inefficient, slow or unable to produce good decisions
demo_corrupt
Democracy as fundamentally corrupt or captured by special interests
demo_beyond
Call for revolting against or moving beyond democracy
demo_neg
Support for non-democratic regimes (authoritarianism, monarchy, technocracy)
Progress
Rejection of modernization, globalization and progressive change.
progress_identity
Progress as a threat to values, traditions or national identity
progress_stop
Call for slowing, limiting or stopping social progress or progressive reforms
progress_glob
Criticism of progress through globalization or the EU as destruction of identities
Authority
Obedience to authority, use of force and traditionalism.
authority_chief
Importance of a strong leader or providential figure to protect the nation
authority_essential
Political measure presented as essential, urgent to restore authority
authority_security
Importance of order and security, fighting delinquency
authority_army
Valorization of the army, police or law enforcement
Tradition
Defense of traditional values and the civilizational project.
tradition_value
French values, customs or identity to be preserved and promoted
tradition_threat
Tradition or traditional values under threat, requiring protection
tradition_family
Promotion of the traditional family model or criticism of family transformations
tradition_laicite
Secularism as a marker of national identity rather than a principle of neutrality
tradition_civilization
Tradition as a civilizational project to spread values considered superior
Equality SIED + NR
Relationship to equality, social and biological hierarchies.
equality_value
Equality as a threat to values, traditions or the social order
equality_identity
Equality as a threat to French identity or a factor of national dissolution
equality_gender
Inequalities between the sexes presented as natural or biologically grounded
hierarchy_castes
Society described in terms of castes or natural social hierarchies
hierarchy_IQ
IQ used as a criterion for ranking individuals or groups
hierarchy_race
Reference to natural inequalities between races or ethnic groups
equality_utopia
Equality described as unrealistic, naive or utopian
Ecology SIED + NR
Ecological positioning: eco-skepticism, techno-solutionism or civilizational ecology.
eco_eco
Economic growth is more important than environmental protection
eco_tech
Ecological concerns as obstacles to technological development
eco_civ
Climate challenges framed as a competition between civilizations
Powered by LLM_Tool
The 20 most-viewed videos in the YOUPOL corpus, spanning all political orientations and both countries.
| # | Channel | Title | Views | Likes | Comments | Orientation | Country | Date |
|---|