Publications
2024
- High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching. arXiv preprint arXiv:2407.03648, 2024
- URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement. arXiv preprint arXiv:2406.04660, 2024
- Less Peaky and More Accurate CTC Forced Alignment by Label Priors. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
- Folding Attention: Memory and Power Optimization for On-device Transformer-based Streaming Speech Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
- An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
- On The Open Prompt Challenge In Conditional Audio Generation. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
- Stack-and-Delay: A New Codebook Pattern for Music Generation. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
2023
- Exploring Speech Enhancement for Low-resource Speech Synthesis. arXiv preprint arXiv:2309.10795, 2023
- FoleyGen: Visually-Guided Audio Generation. arXiv preprint arXiv:2309.10537, 2023
- Enhance Audio Generation Controllability through Representation Similarity Regularization. arXiv preprint arXiv:2309.08773, 2023
- Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing. Journal of Open Source Software, 2023
- ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit. arXiv preprint arXiv:2304.04596, 2023
- Ripple Sparse Self-attention for Monaural Speech Enhancement. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
- Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute. arXiv preprint arXiv:2306.06672, 2023
2022
- Time-Frequency Attention for Monaural Speech Enhancement. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
- ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding. In Proc. Interspeech 2022, 2022
- Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to the L3DAS22 Challenge. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
- A Time-Frequency Attention Module for Neural Speech Enhancement. 2022
2020
- CHiME-6 challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings. arXiv preprint arXiv:2004.09249, 2020
- Mask-dependent Phase Estimation for Monaural Speaker Separation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
- CUNY Speech Diarization System for the CHiME-6 Challenge. In Proc. The 6th International Workshop on Speech Processing in Everyday Environments (CHiME 2020), 2020
- Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks. arXiv preprint arXiv:2012.02191, 2020
- Combining Spatial Clustering with LSTM Speech Models for Multichannel Speech Enhancement. arXiv preprint arXiv:2012.03388, 2020
- Enhancement of Spatial Clustering-based Time-Frequency Masks using LSTM Neural Networks. arXiv preprint arXiv:2012.01576, 2020
2019
- ONSSEN: An Open-source Speech Separation and Enhancement Library. arXiv preprint arXiv:1911.00982, 2019
2018
- Unusable Spoken Response Detection with BLSTM Neural Networks. In 2018 11th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2018
- Sound Signal Processing with Seq2Tree Network. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018
2014
- Anatomical Entity Recognition with a Hierarchical Framework Augmented by External Resources. PLOS ONE, 2014