Publications

2024

  1. Less Peaky and More Accurate CTC Forced Alignment by Label Priors
    Ruizhe Huang, Xiaohui Zhang, Zhaoheng Ni, Li Sun, Moto Hira, Jeff Hwang, Vimal Manohar, Vineel Pratap, Matthew Wiesner, Shinji Watanabe, Daniel Povey, and 1 more author
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
  2. Folding Attention: Memory and Power Optimization for On-device Transformer-based Streaming Speech Recognition
    Yang Li, Liangzhen Lai, Yuan Shangguan, Forrest N. Iandola, Zhaoheng Ni, Ernie Chang, Yangyang Shi, and Vikas Chandra
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
  3. An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement
    Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, and Haizhou Li
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
  4. On The Open Prompt Challenge In Conditional Audio Generation
    Ernie Chang, Sidd Srinivasan, Mahi Luthra, Pin-Jie Lin, Varun Nagaraja, Forrest Iandola, Zechun Liu, Zhaoheng Ni, Changsheng Zhao, Yangyang Shi, and others
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024
  5. Stack-and-Delay: A New Codebook Pattern for Music Generation
    Gael Le Lan, Varun Nagaraja, Ernie Chang, David Kant, Zhaoheng Ni, Yangyang Shi, Forrest Iandola, and Vikas Chandra
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2024

2023

  1. Exploring Speech Enhancement for Low-resource Speech Synthesis
    Zhaoheng Ni, Sravya Popuri, Ning Dong, Kohei Saijo, Xiaohui Zhang, Gael Le Lan, Yangyang Shi, Vikas Chandra, and Changhan Wang
    arXiv preprint arXiv:2309.10795, 2023
  2. FoleyGen: Visually-Guided Audio Generation
    Xinhao Mei, Varun Nagaraja, Gael Le Lan, Zhaoheng Ni, Ernie Chang, Yangyang Shi, and Vikas Chandra
    arXiv preprint arXiv:2309.10537, 2023
  3. Enhance Audio Generation Controllability through Representation Similarity Regularization
    Yangyang Shi, Gael Le Lan, Varun Nagaraja, Zhaoheng Ni, Xinhao Mei, Ernie Chang, Forrest Iandola, Yang Liu, and Vikas Chandra
    arXiv preprint arXiv:2309.08773, 2023
  4. TorchAudio 2.1: Advancing Speech Recognition, Self-supervised Learning, and Audio Processing Components for PyTorch
    Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, and others
    IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2023
  5. Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing
    Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, and others
    Journal of Open Source Software, 2023
  6. TorchAudio-Squim: Reference-less Speech Quality and Intelligibility measures in TorchAudio
    Anurag Kumar, Ke Tan, Zhaoheng Ni, Pranay Manocha, Xiaohui Zhang, Ethan Henderson, and Buye Xu
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
  7. ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit
    Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, and others
    arXiv preprint arXiv:2304.04596, 2023
  8. Scaling Speech Technology to 1,000+ Languages
    Vineel Pratap, Andros Tjandra, Bowen Shi, Paden Tomasello, Arun Babu, Sayani Kundu, Ali Elkahky, Zhaoheng Ni, Apoorv Vyas, Maryam Fazel-Zarandi, and others
    arXiv preprint arXiv:2305.13516, 2023
  9. Ripple Sparse Self-attention for Monaural Speech Enhancement
    Qiquan Zhang, Hongxu Zhu, Qi Song, Xinyuan Qian, Zhaoheng Ni, and Haizhou Li
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2023
  10. Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute
    William Chen, Xuankai Chang, Yifan Peng, Zhaoheng Ni, Soumi Maiti, and Shinji Watanabe
    arXiv preprint arXiv:2306.06672, 2023

2022

  1. TorchAudio: Building Blocks for Audio and Speech Processing
    Yao-Yuan Yang, Moto Hira, Zhaoheng Ni, Artyom Astafurov, Caroline Chen, Christian Puhrsch, David Pollack, Dmitriy Genzel, Donny Greenberg, Edward Z Yang, and others
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
  2. Time-Frequency Attention for Monaural Speech Enhancement
    Qiquan Zhang, Qi Song, Zhaoheng Ni, Aaron Nicolson, and Haizhou Li
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
  3. ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding
    Yen-Ju Lu, Xuankai Chang, Chenda Li, Wangyou Zhang, Samuele Cornell, Zhaoheng Ni, Yoshiki Masuyama, Brian Yan, Robin Scheibler, Zhong-Qiu Wang, Yu Tsao, and 2 more authors
    Proc. Interspeech, 2022
  4. Towards Low-distortion Multi-channel Speech Enhancement: The ESPNet-SE Submission to the L3DAS22 Challenge
    Yen-Ju Lu, Samuele Cornell, Xuankai Chang, Wangyou Zhang, Chenda Li, Zhaoheng Ni, Zhong-Qiu Wang, and Shinji Watanabe
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022
  5. A Time-Frequency Attention Module for Neural Speech Enhancement
    Qiquan Zhang, Xinyuan Qian, Zhaoheng Ni, Aaron Nicolson, Eliathamby Ambikairajah, and Haizhou Li
    2022

2021

  1. WPD++: An Improved Neural Beamformer for Simultaneous Speech Separation and Dereverberation
    Zhaoheng Ni, Yong Xu, Meng Yu, Bo Wu, Shixiong Zhang, Dong Yu, and Michael I Mandel
    IEEE Spoken Language Technology Workshop (SLT), 2021

2020

  1. CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings
    Shinji Watanabe, Michael Mandel, Jon Barker, Emmanuel Vincent, Ashish Arora, Xuankai Chang, Sanjeev Khudanpur, Vimal Manohar, Daniel Povey, Desh Raj, Zhaoheng Ni, and 1 more author
    arXiv preprint arXiv:2004.09249, 2020
  2. Mask-dependent Phase Estimation for Monaural Speaker Separation
    Zhaoheng Ni and Michael I Mandel
    IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2020
  3. CUNY Speech Diarization System for the CHiME-6 Challenge
    Zhaoheng Ni and Michael I Mandel
    Proc. 6th International Workshop on Speech Processing in Everyday Environments (CHiME 2020), 2020
  4. Improved MVDR Beamforming Using LSTM Speech Models to Clean Spatial Clustering Masks
    Zhaoheng Ni, Felix Grezes, Viet Anh Trinh, and Michael I Mandel
    arXiv preprint arXiv:2012.02191, 2020
  5. Combining Spatial Clustering with LSTM Speech Models for Multichannel Speech Enhancement
    Felix Grezes, Zhaoheng Ni, Viet Anh Trinh, and Michael Mandel
    arXiv preprint arXiv:2012.03388, 2020
  6. Enhancement of Spatial Clustering-based Time-Frequency Masks using LSTM Neural Networks
    Felix Grezes, Zhaoheng Ni, Viet Anh Trinh, and Michael Mandel
    arXiv preprint arXiv:2012.01576, 2020

2019

  1. ONSSEN: An Open-source Speech Separation and Enhancement Library
    Zhaoheng Ni and Michael I Mandel
    arXiv preprint arXiv:1911.00982, 2019

2018

  1. Unusable Spoken Response Detection with BLSTM Neural Networks
    Zhaoheng Ni, Rutuja Ubale, Yao Qian, Michael Mandel, Su-Youn Yoon, Abhinav Misra, and David Suendermann-Oeft
    11th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2018
  2. Sound Signal Processing with Seq2Tree Network
    Weicheng Ma, Kai Cao, Zhaoheng Ni, Peter Chin, and Xiang Li
    Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018

2014

  1. Anatomical Entity Recognition with a Hierarchical Framework Augmented by External Resources
    Yan Xu, Ji Hua, Zhaoheng Ni, Qinlang Chen, Yubo Fan, Sophia Ananiadou, Eric I-Chao Chang, and Junichi Tsujii
    PLoS ONE, 2014