The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
Updated Sep 18, 2025 - Python
Sound event localization, detection, and tracking of multiple overlapping and moving sources in 2D spherical space using a convolutional recurrent neural network
SOTA industrial-grade voice activity detection and audio event detection, supporting 100+ languages and outperforming Silero-VAD, TEN-VAD, FunASR-VAD, and WebRTC-VAD
Reading list for research topics in Sound AI
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
This is the public repository for eigenvector-based SALSA features for polyphonic sound event localization and detection.
OpenFLAM: Framewise Language Audio Model
SELD-TCN: Sound Event Detection & Localization via Temporal Convolutional Network | Python with TensorFlow
2024 Latest laughter detection & segmentation model. Paper: "Robust Laughter Segmentation with Automatic Diverse Data Synthesis", Interspeech 2024
Baseline for DCASE 2019 Task 4
Sound event detection with depthwise separable and dilated convolutions.
Training code for the 6th place solution in the Cornell Birdcall Identification Challenge
🎵 A repository for manually annotating files to create labeled acoustic datasets for machine learning.
Python library for rapid prototyping of environmental sound analysis systems
Author's repository for reproducing DcaseNet, an integrated pre-trained DNN that performs acoustic scene classification, audio tagging, and sound event detection. Implemented using PyTorch.
CST-former: Transformer with Channel-Spectro-Temporal Attention for Sound Event Localization and Detection (ICASSP 2024)
Tracking the state of the art and recent results (bibliography) on sound tasks.
📊 Easily apply audio-related machine learning models trained on the AudioSet dataset (527+ models/classes).
Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection
Easy to use Audio Tagging in PyTorch