Audio Preprocessing
Vocal source separation with Demucs (htdemucs)
Continuous Update Pipeline
How It Works
Each audio file is processed through the htdemucs model for source separation. The signal is decomposed into four stems (vocals, bass, drums, other). Only the vocal stem is retained for subsequent steps.
Vocal separation runs on the network of collaborators' machines with GPU acceleration. Processing is parallelized across available compute nodes.
Audio tracks extracted from videos are processed with the Demucs source separation model (htdemucs variant). Demucs decomposes the audio signal into four stems (vocals, bass, drums, other), and only the vocal stem is retained. This step removes background music, jingles, and other interfering noise, which significantly improves the quality of subsequent speech recognition.
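As a rough illustration of the separation step, the sketch below builds a Demucs command-line call and computes where the vocal stem would be written. It assumes the `demucs` CLI is installed and on the PATH; the helper names (`demucs_command`, `vocal_stem_path`, `separate_vocals`) and the `separated` output directory are hypothetical and not taken from the project. The `--two-stems vocals` option is one convenient way to keep only vocals; the project may instead export all four stems and discard three.

```python
import subprocess
from pathlib import Path


def demucs_command(audio_path, out_dir="separated", device="cuda"):
    """Build a Demucs CLI call that writes a vocals / no_vocals split."""
    return [
        "demucs",
        "-n", "htdemucs",          # model named in the text
        "--two-stems", "vocals",   # write vocals.wav and no_vocals.wav only
        "-d", device,              # "cuda" for GPU, "cpu" otherwise
        "-o", str(out_dir),        # output root (assumed name)
        str(audio_path),
    ]


def vocal_stem_path(audio_path, out_dir="separated"):
    """Default Demucs layout: <out_dir>/<model>/<track name>/vocals.wav."""
    return Path(out_dir) / "htdemucs" / Path(audio_path).stem / "vocals.wav"


def separate_vocals(audio_path, out_dir="separated", device="cuda"):
    """Run Demucs and return the path of the retained vocal stem."""
    subprocess.run(demucs_command(audio_path, out_dir, device), check=True)
    return vocal_stem_path(audio_path, out_dir)
```

Calling `separate_vocals("audio/clip.wav")` would run the model and return `separated/htdemucs/clip/vocals.wav`, which can then be passed to the speech recognition step.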
Processing runs continuously on the network of collaborators' machines, with GPU acceleration. Each video is processed automatically as soon as the scanner detects it.
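The detect-then-process loop can be pictured with a minimal sketch like the one below. It is not the project's scheduler: a local thread pool stands in for the distributed compute nodes, and `detect_new`, `process_batch`, and the video IDs are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_new(seen, listing):
    """Return IDs from the listing that were not seen before, and mark them seen."""
    fresh = [vid for vid in listing if vid not in seen]
    seen.update(fresh)
    return fresh


def process_batch(video_ids, process_one, max_workers=4):
    """Run process_one on every ID in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_one, video_ids))


seen = set()
first = detect_new(seen, ["vid_a", "vid_b"])
second = detect_new(seen, ["vid_b", "vid_c"])   # vid_b was already handled
# Placeholder worker: a real one would run source separation on the video.
results = process_batch(first + second, lambda vid: f"{vid}/vocals.wav")
```

Tracking the set of already-seen IDs is what keeps a repeated scan from reprocessing the same video.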
Tools Used
Database Schema
Six tables in a normalized relational schema, from raw metadata to sentence-level NLP annotations.
| # | Table | Description | Scale |
|---|---|---|---|
| 1 | videos | One row per video: ID, channel metadata, views, likes, comments, tags, duration, upload date, political orientation, country, gender. | 26,396 rows |
| 2 | comments | All comments with author info, like counts, timestamps, nested reply structure, and JSONB analysis column. | 9.6M+ rows |
| 3 | video_transcripts | Full diarized transcripts with speaker labels and cleaned text versions. | 28,121 rows |
| 4 | transcription_speakers | Individual speaker segments from diarization, ordered by position within each video. | 1,021,611 rows |
| 5 | comments_processed | Sentence-level tokenized comments with NER entities (PER, ORG, LOC) and ML prediction columns. | 15.3M+ rows |
| 6 | transcription_speakers_processed | Sentence-level speaker segments with NER extraction and full annotation suite. | 4.8M+ rows |
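To make the raw-to-processed layering concrete, here is a small sketch of how three of these tables might relate. Only the table names come from the schema above; every column name and foreign key is an assumption made for illustration. The JSONB column suggests the real database is PostgreSQL, while this sketch uses an in-memory SQLite database so that it runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE videos (
    video_id TEXT PRIMARY KEY,      -- hypothetical column names throughout
    title    TEXT
);
CREATE TABLE transcription_speakers (
    segment_id INTEGER PRIMARY KEY,
    video_id   TEXT REFERENCES videos(video_id),
    position   INTEGER,             -- order of the segment within the video
    speaker    TEXT,
    text       TEXT
);
CREATE TABLE transcription_speakers_processed (
    sentence_id INTEGER PRIMARY KEY,
    segment_id  INTEGER REFERENCES transcription_speakers(segment_id),
    sentence    TEXT,
    entities    TEXT                -- e.g. serialized PER/ORG/LOC spans
);
""")
conn.execute("INSERT INTO videos VALUES ('v1', 'Example video')")
conn.executemany(
    "INSERT INTO transcription_speakers VALUES (?, ?, ?, ?, ?)",
    [(1, "v1", 0, "SPEAKER_00", "Hello. Welcome back."),
     (2, "v1", 1, "SPEAKER_01", "Thanks.")],
)
conn.executemany(
    "INSERT INTO transcription_speakers_processed VALUES (?, ?, ?, ?)",
    [(1, 1, "Hello.", ""), (2, 1, "Welcome back.", ""), (3, 2, "Thanks.", "")],
)

# Count sentence-level rows per speaker segment, in transcript order.
rows = conn.execute("""
    SELECT s.position, s.speaker, COUNT(p.sentence_id)
    FROM videos v
    JOIN transcription_speakers s ON s.video_id = v.video_id
    JOIN transcription_speakers_processed p ON p.segment_id = s.segment_id
    WHERE v.video_id = 'v1'
    GROUP BY s.segment_id
    ORDER BY s.position
""").fetchall()
```

The same pattern would apply to `comments` and `comments_processed`: one raw row fans out into several sentence-level rows that carry the annotations.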
Continuous Observatory
The database is continuously updated through channel scanning, video transcription and annotation, comment extraction, and metadata updates (views, likes, subscribers). Each scan produces a longitudinal history accessible via the API.
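One way to picture the longitudinal history is as an append-only series of snapshots per video, where each scan adds a row rather than overwriting the previous values. The sketch below assumes that design; the `Snapshot` fields, function names, and sample numbers are hypothetical and do not reflect the actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    scanned_at: str   # ISO date of the scan (hypothetical field names)
    views: int
    likes: int


def record_scan(history, video_id, snapshot):
    """Append a snapshot instead of overwriting the previous values."""
    history.setdefault(video_id, []).append(snapshot)


def view_growth(history, video_id):
    """Views gained between consecutive scans of one video."""
    snaps = sorted(history.get(video_id, []), key=lambda s: s.scanned_at)
    return [b.views - a.views for a, b in zip(snaps, snaps[1:])]


history = {}
record_scan(history, "v1", Snapshot("2024-01-01", 100, 10))
record_scan(history, "v1", Snapshot("2024-01-08", 150, 12))
record_scan(history, "v1", Snapshot("2024-01-15", 180, 15))
growth = view_growth(history, "v1")
```

Keeping every snapshot is what allows growth curves to be derived after the fact, at the cost of storage that grows with the number of scans.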