Analysis | Political YouTube & TikTok Database

Videos Over Time by Political Orientation

Yearly count of uploaded videos, stacked by orientation (2006 to today)

Total Views Over Time by Political Orientation

Cumulative views per year, broken down by orientation

Monthly Publication Activity

Total videos published per month across all orientations

Political content share over time

Monthly share of sentences and comments classified as political. Binning adapts to available volume.

Sentences: CamemBERTv2 classifier (FR) · comments: XLM-RoBERTa classifier (multilingual) — trained on LLM annotations validated by human coders

Annotation protocol & validation

The training corpus is pre-annotated by a large language model (LLM). The LLM's reliability is first measured on a random sample that is also coded independently by two human researchers. The metrics below compare the LLM's predictions against the consensus of the two annotators (agreement > n/2); once validated, that LLM labels the rest of the corpus, on which the final classifiers are trained (CamemBERTv2 for sentences, XLM-RoBERTa for comments).

Metric	Sentences	Comments
Macro F1 unweighted mean across classes	92.2%	93.5%
Weighted F1 weighted by each class support	92.0%	93.9%
Micro F1 global, across all decisions	92.0%	93.7%
Accuracy % of LLM decisions identical to the human consensus	88.1%	88.9%
Hamming loss % of mislabelled bits — lower is better	7.7%	6.0%

Validation sample: 1,000 sentences · 998 comments (random)
Human annotators: 2 researchers, independent coding
Reference: human consensus (agreement > n/2)
Schema: binary (political / non-political)

What these numbers measure: how well the LLM reproduces the humans' agreement. The training corpus is therefore built from LLM labels whose reliability is publicly auditable. The CamemBERTv2 and XLM-RoBERTa classifiers are then trained on this LLM-annotated corpus and served here.

Tools & references Annotation and validation pipeline: LLM_Tool (technical paper, OSF).

Video Distribution by Political Orientation

Share of the corpus by ideological category

Videos by Country

France vs. Quebec distribution across orientations

Gender Distribution of Channel Creators

Channels classified by the gender of their primary creator(s)

Average Video Duration by Orientation

Mean duration in minutes per political category

Political intensity by orientation

Average share of political sentences and comments, broken down by the channel's editorial orientation.

Sentences: CamemBERTv2 classifier (FR) · comments: XLM-RoBERTa classifier (multilingual) — trained on LLM annotations validated by human coders

Annotation protocol & validation

Metric	Sentences	Comments
Macro F1 unweighted mean across classes	92.2%	93.5%
Weighted F1 weighted by each class support	92.0%	93.9%
Micro F1 global, across all decisions	92.0%	93.7%
Accuracy % of LLM decisions identical to the human consensus	88.1%	88.9%
Hamming loss % of mislabelled bits — lower is better	7.7%	6.0%

Validation sample: 1,000 sentences · 998 comments (random)
Human annotators: 2 researchers, independent coding
Reference: human consensus (agreement > n/2)
Schema: binary (political / non-political)

Tools & references Annotation and validation pipeline: LLM_Tool (technical paper, OSF).

Coming Soon

Sentence Level Annotation

Every sentence in the corpus is first classified as political or non-political, then annotated within three research projects: gender analysis, technophile neo-reactionaries, and far-right ideas (SIED, building on Boursier & Lemor, 2025). Annotation powered by LLM_Tool (technical paper).

19M+ Sentences to annotate

68+ Classification models

3 Research projects

Detected annotations

Input Codebooks (political, gender, NR, SIED) + 19M sentences

Process LLM annotation + CamemBERT fine tuning

Output Validated classifiers at scale

A political detection codebook first filters out non-political content. Three additional codebooks define the target constructs for the research projects: gender analysis (gender, valence, rationality, science position), technophile neo-reactionaries (technology, libertarianism, fictional metaphors, equality, ecology) and SIED (nationalism, immigration, democracy, progress, authority, tradition, equality, ecology) following the framework developed in Boursier & Lemor (2025). The LLM receives each codebook as a system prompt and annotates batches of 500 sentences, producing labels and justifications.

For each classification task, a CamemBERT model is fine-tuned on the LLM-annotated data. Models are evaluated on a held-out validation set. If the F1 score is insufficient, the pipeline enters a reinforced loop: the most uncertain predictions are sent back to the LLM for re-annotation with stricter prompts, the training set is augmented, and the model is retrained. This loop typically converges within two to three rounds.

Validated classifiers run inference on the entire corpus. Each of the 19 million sentences receives a prediction and a confidence score for every applicable model. All results are stored in PostgreSQL, enabling queries such as "show all political sentences annotated immigration_security" or "compare gender valence across political orientations".

The SIED codebook, developed in Boursier & Lemor (2025), defines nation_threat as "the nation described as under threat (internal or external), requiring protection, defense or preservation." The LLM annotates a sample of sentences, labeling positives (e.g., "notre civilisation est en train de disparaître sous les coups de la mondialisation") and negatives. A CamemBERT classifier is fine-tuned on these labels. If performance is insufficient, the reinforced loop selects the most uncertain sentences, sends them back to the LLM for re-annotation, and the model is retrained. This process is applied to each sub-category across all three projects.

SIED: breakdown by political orientation

Compare SIED categories (nationalism, immigration, democracy, progress, authority, tradition, equality, ecology) across far-right, left, masculinist and conspiracy channels, extending Boursier & Lemor (2025).

Gender discourse analysis

Explore gender valence, rationality types and science positioning in political channel discourse over time.

Technophile neo-reactionaries

Analyze discourses on technology, libertarianism and fictional metaphors (red pill, Cathedral, etc.) across the political content in the corpus.

France vs. Quebec comparison

Compare how the same ideological constructs manifest differently in the two francophone political ecosystems.

A first model classifies each sentence as political or non-political using a broad definition (current affairs, social issues, political actors, power relations, social norms). This filtering precedes the three annotation projects.

Political Detection

Binary classification of each sentence as political or non-political.

political_yes The sentence refers to current affairs, social issues, political actors, power relations or social norms.

political_no The sentence relates to private life, personal narrative or entertainment without collective scope.

Detection of gender discourse and multidimensional analysis: gender presence, valence (positive, negative, ambivalent), type of rationality invoked and positioning towards science.

Gender

Does the content address gender? Direct or indirect reference to men, women, masculinity, femininity, gender roles, feminism, antifeminism, male-female relations, LGBTQ+.

gender_yes Gender discourse present

gender_no No gender discourse

Gender Valence

Tone of the gender discourse.

genre_valence_positive Promotes gender equality or challenges stereotypes

genre_valence_negative Hostility, criticism or derogatory claims toward feminism or gender equality

genre_valence_ambivalent Initially appears egalitarian but limits or relativizes equality

genre_valence_null No evaluative stance toward gender

Rationality Type

Type of rationality mobilized in gender discourse.

rationality_none No justificatory rationality

rationality_nature Biological, natural, evolutionary or religious-natural arguments

rationality_liberal Invokes formal equality or individual rights to deny structural domination

rationality_empirical Statistics, data or "facts" as justification

rationality_heroic Frames the claim as courageous truth-telling, anti-political correctness

Science Position

Positioning towards science in gender discourse.

science_none No reference to science

science_pro_science Values studies, experts or research

science_anti_science Discredits academia or research

science_ambivalent Both pro- and anti-science registers coexist

Measurement of neo-reactionary (NR) ideas centered on technological optimism, libertarianism and the use of fictional metaphors in political discourse, along with dimensions shared with the SIED (equality and ecology).

Technology

Technological optimism, technocracy and transhumanism.

techno_optimism_overall Optimistic or positive view of the role of technology and innovation

innovation_as_progress Technological innovation as a driver of progress or solution to social problems

pro_tech_figures Favorable reference to tech figures (Musk, Thiel, Altman, Zuckerberg…)

technocracy_over_democracy Technocratic or expert-led governance is more effective than democracy

deregulation_of_tech Deregulation of technological innovation as necessary for progress

transhumanism Support for transhumanism, post-humanism, eugenics or technological augmentation of humans

Libertarianism

Secession, individual autonomy, alternative communities and the corporate model as a political counter-model.

lib_sec Support for secession or break from the national political community

lib_autonomy Living autonomously, outside traditional state structures

lib_community Creation of communities based on their own values and rules

lib_company The corporate model as a political counter-model to the state or democracy

lib_state The state should be run like a company, according to performance criteria

Fictional Metaphors

Use of metaphors from popular fiction to structure political interpretation.

metaphor_redpill Reference to the "red pill," awakening to hidden truth, breaking free from egalitarian or democratic illusions

metaphor_lotr References to Lord of the Rings to conceptualize social or civilizational hierarchies

metaphor_starwars References to Star Wars to frame political struggle, authority or legitimacy

metaphor_cathedral The Cathedral as a metaphor for universities, media or progressive institutions forming an ideological system

Equality SIED + NR

Relationship to equality, social and biological hierarchies.

equality_value Equality as a threat to values, traditions or the social order

equality_identity Equality as a threat to French identity or a factor of national dissolution

equality_gender Inequalities between the sexes presented as natural or biologically grounded

hierarchy_castes Society described in terms of castes or natural social hierarchies

hierarchy_IQ IQ used as a criterion for ranking individuals or groups

hierarchy_race Reference to natural inequalities between races or ethnic groups

equality_utopia Equality described as unrealistic, naive or utopian

Ecology SIED + NR

Ecological positioning: eco-skepticism, techno-solutionism or civilizational ecology.

eco_eco Economic growth is more important than environmental protection

eco_tech Ecological concerns as obstacles to technological development

eco_civ Climate challenges framed as a competition between civilizations

Far-right ideological score (SIED) developed in Boursier & Lemor (2025), Revue française de science politique. Measures the presence of far-right ideological affiliation categories (CAIED) — nationalism, immigration, democracy, progress, authority, tradition — as well as dimensions shared with the NR project (equality and ecology), through their respective sub-dimensions.

Nationalism

Constructions of the nation and national identity.

nation_ethnic Nation as an ethnic or cultural community based on blood ties or common ancestors

nation_family Nation associated with the family, citizens as children of the motherland

nation_state Nation fused with the state as a single, inseparable entity

nation_vital The nation as an essential and insurmountable element of human life

nation_threat Nation described as under threat, requiring protection or defense

nation_colonialism Colonial nostalgia or denial of the consequences of colonization

Immigration

Framing of immigration as a threat.

immigration_identity Threat to national identity, culture or French/European values

immigration_security Association with delinquency, crime or terrorism

immigration_women Threat to women's rights or gender equality

immigration_law Call for stricter immigration or asylum legislation

Democracy

Critical relationship to democracy as an ideal or political regime.

demo_value Democracy as a threat to values, traditions or national identity

demo_sep Challenging the separation of powers, strengthening the executive

demo_vain Democracy described as inefficient, slow or unable to produce good decisions

demo_corrupt Democracy as fundamentally corrupt or captured by special interests

demo_beyond Call for revolting against or moving beyond democracy

demo_neg Support for non-democratic regimes (authoritarianism, monarchy, technocracy)

Progress

Rejection of modernization, globalization and progressive change.

progress_identity Progress as a threat to values, traditions or national identity

progress_stop Call for slowing, limiting or stopping social progress or progressive reforms

progress_glob Criticism of progress through globalization or the EU as destruction of identities

Authority

Obedience to authority, use of force and traditionalism.

authority_chief Importance of a strong leader or providential figure to protect the nation

authority_essential Political measure presented as essential, urgent to restore authority

authority_security Importance of order and security, fighting delinquency

authority_army Valorization of the army, police or law enforcement

Tradition

Defense of traditional values and the civilizational project.

tradition_value French values, customs or identity to be preserved and promoted

tradition_threat Tradition or traditional values under threat, requiring protection

tradition_family Promotion of the traditional family model or criticism of family transformations

tradition_laicite Secularism as a marker of national identity rather than a principle of neutrality

tradition_civilization Tradition as a civilizational project to spread values considered superior