2016 SJTU Workshop on Acoustic Speech and Signal Processing

Published: 2016-03-25
Date: 2016-03-26
Place: 3-410

Topic: Welcome and introduction of research at SJTU SpeechLab
Time: 9:30 - 10:30
Speaker: Kai Yu (SJTU)


Topic: CUED Speech Group Overview
Time: 10:40 - 11:40
Speaker: Prof. Mark Gales, Dr. Kate Knill (CUED)
Abstract: Cambridge University Engineering Department (CUED) is regularly listed among the top 5 engineering departments in the world. It is a large, integrated engineering department covering all aspects of engineering. The Speech Group is part of the Information Engineering division within CUED. The group applies machine learning solutions to speech recognition, speech synthesis, statistical machine translation and spoken dialogue systems, and has a long history of world-leading research in these fields. This talk will give a brief overview of the group and the research interests of its faculty. The latest release of HTK (V3.5) will be described, along with the MPhil course taught in collaboration with the Machine Learning Group. Finally, some examples of projects currently being undertaken in the group will be discussed.
Bio:
Prof. Mark Gales studied for the B.A. in Electrical and Information Sciences at the University of Cambridge from 1985 to 1988. Following graduation he worked as a consultant at Roke Manor Research Ltd. In 1991 he took up a position as a Research Associate in the Speech, Vision and Robotics Group in the Engineering Department at Cambridge University. In 1995 he completed his doctoral thesis, "Model-Based Techniques for Robust Speech Recognition", supervised by Professor Steve Young. From 1995 to 1997 he was a Research Fellow at Emmanuel College, Cambridge. He was then a Research Staff Member in the Speech Group at the IBM T.J. Watson Research Center until 1999, when he returned to Cambridge University Engineering Department as a University Lecturer. He was appointed Reader in Information Engineering in 2004. He is currently a Professor of Information Engineering (appointed 2012) and a College Lecturer and Official Fellow of Emmanuel College. Mark Gales is a Fellow of the IEEE and a member of the Speech and Language Processing Technical Committee (2015-2017; previously a member from 2001 to 2004). He was an associate editor for IEEE Signal Processing Letters from 2008 to 2011 and for IEEE Transactions on Audio, Speech and Language Processing from 2009 to 2013. He is currently on the Editorial Board of Computer Speech and Language.
Mark Gales has received a number of paper awards, including a 1997 IEEE Young Author Paper Award for his paper on Parallel Model Combination and a 2002 IEEE Paper Award for his paper on Semi-Tied Covariance Matrices.

Dr. Kate Knill is a Senior Research Associate in the Machine Intelligence Laboratory, working on the ALTA Institute and BABEL projects. She has over 20 years' experience in speech technology. From 1993 to 1996, she was a Research Associate in the Speech, Vision and Robotics Group in the Engineering Department at Cambridge University, working on audio document retrieval, supervised by Prof. Steve Young and funded by HP Labs, Bristol. She joined the Speech R&D team of Nuance Communications in 1997. As Languages Manager (2000-2002), she led a cross-site team that developed over 20 languages for speech recognition and speaker verification. In 2002, she established a new Speech Technology Group at Toshiba Research Europe, Cambridge Research Laboratory (CRL), Cambridge, U.K. As Assistant Managing Director and Speech Technology Group Leader she was responsible for interactive technology, in particular core speech recognition and synthesis R&D and the development of European and North American speech products. The Cambridge team led the creation of new speech recognition and speech synthesis engines for Toshiba, for which Kate served as project lead across sites in the UK, Japan and China. She is a member of the ISCA Board (2013-17) and was a member of the IEEE Speech and Language Technical Committee from 2009 to 2012.



Topic: Recent Speech Research at National Taiwan University
Time: 13:00-14:00
Speaker: Prof. Lin-shan Lee, Dr. Hung-yi Lee
Bio: 
Lin-shan Lee has been a Professor of electrical engineering and computer science at National Taiwan University, Taipei, Taiwan, since 1982. His research interests include digital communication and spoken language processing. He served the IEEE Communications Society on the Board of Governors (1995), as Vice President for International Affairs (1996-97) and as Awards Committee Chair (1998-99), served on the Board of the International Speech Communication Association (ISCA, 2002-09), and was a Distinguished Lecturer (2007-08) of the IEEE Signal Processing Society and General Chair of ICASSP 2009 in Taipei. He is a Fellow of the IEEE (1993) and of ISCA (2010) for contributions to Chinese spoken language processing and spoken content retrieval, and received the Meritorious Service Award from the IEEE Signal Processing Society (2011) and the Exemplary Global Service Award from the IEEE Communications Society (2014).
Hung-yi Lee (李宏毅) received the M.S. and Ph.D. degrees from National Taiwan University (NTU), Taipei, Taiwan, in 2010 and 2012, respectively. From September 2012 to August 2013, he was a postdoctoral fellow in the Research Center for Information Technology Innovation, Academia Sinica. From September 2013 to July 2014, he was a visiting scientist in the Spoken Language Systems Group of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He is currently an assistant professor in the Department of Electrical Engineering at National Taiwan University, with a joint appointment in the Department of Computer Science & Information Engineering. His research focuses on machine learning (especially deep learning), spoken language understanding and speech recognition.


Topic: Introduction to IBM
Time: 14:00-14:45
Speaker: Bhuvana Ramabhadran (IBM)



Topic: Recent Activities in Distant-Talk Speech Recognition
Time: 14:45-15:30
Speaker: Shinji Watanabe (MERL)
Abstract:
With the success of voice search applications on mobile devices, the application area for speech recognition has widened from close-talk to distant-talk scenarios. However, distant speech recognition is a significantly harder problem, because speech signals are distorted by noise, reverberation, and attenuation. In such scenarios, the performance of current speech recognition systems degrades drastically due to their lack of robustness. In this presentation, we introduce several research trends in distant speech recognition, in particular the CHiME speech separation and recognition challenge series, the REVERB challenge, and the ASpIRE challenge. We also introduce several promising techniques used in our systems that have shown their effectiveness in these challenges, including non-negative matrix factorization, long short-term memory networks for speech enhancement, and discriminative techniques for acoustic modeling.
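As a rough illustration of the first of these techniques, the Python/NumPy sketch below shows supervised NMF-based speech enhancement: speech and noise bases are learned separately from magnitude spectrograms, the noisy mixture is decomposed over their concatenation, and a Wiener-style mask keeps the part explained by the speech bases. The dictionary sizes, iteration counts, and toy random spectrograms are illustrative assumptions, not details of the challenge systems described in the talk.

    import numpy as np

    def nmf(V, W=None, rank=8, n_iter=100, eps=1e-10):
        # Factor V ~= W @ H with multiplicative updates (Euclidean cost).
        # If W is supplied it is held fixed and only the activations H are learned.
        rng = np.random.default_rng(0)
        fixed_W = W is not None
        if W is None:
            W = rng.random((V.shape[0], rank)) + eps
        H = rng.random((W.shape[1], V.shape[1])) + eps
        for _ in range(n_iter):
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            if not fixed_W:
                W *= (V @ H.T) / (W @ H @ H.T + eps)
        return W, H

    # Learn separate bases from clean-speech and noise spectrograms
    # (toy random data here), then explain the noisy mixture with the
    # concatenated bases.
    rng = np.random.default_rng(1)
    S_clean = np.abs(rng.standard_normal((257, 200)))  # stand-in clean-speech spectrogram
    N_noise = np.abs(rng.standard_normal((257, 200)))  # stand-in noise spectrogram
    W_s, _ = nmf(S_clean, rank=16)
    W_n, _ = nmf(N_noise, rank=16)

    V_mix = np.abs(rng.standard_normal((257, 120)))    # stand-in noisy mixture
    _, H = nmf(V_mix, W=np.hstack([W_s, W_n]))

    S_hat = W_s @ H[:16]                 # speech part of the reconstruction
    N_hat = W_n @ H[16:]                 # noise part of the reconstruction
    mask = S_hat / (S_hat + N_hat + 1e-10)
    enhanced = mask * V_mix              # enhanced magnitude; resynthesize with the noisy phase

In a real system the bases would be trained on speech and noise corpora rather than random data, and the enhanced spectrogram would feed the recognizer alongside the LSTM enhancement and discriminative acoustic-model training mentioned above.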
Bio:
Shinji Watanabe received the Dr. Eng. degree in 2006 from Waseda University, Japan. From 2001 to 2011, he worked at NTT Communication Science Laboratories, Japan. Since 2012, he has been with Mitsubishi Electric Research Laboratories, USA. His research interests include machine learning, Bayesian inference, speech recognition, and spoken language processing. He is a member of the Acoustical Society of Japan (ASJ) and the Institute of Electronics, Information and Communication Engineers (IEICE), and a senior member of the IEEE. He is currently an Associate Editor of the IEEE Transactions on Audio, Speech and Language Processing and serves on several committees, including the IEEE Signal Processing Society Speech and Language Technical Committee (IEEE SLTC).


Topic: Machine Learning and Signal Processing Technologies for Assistive Hearing Devices
Time: 15:30-16:30
Speaker: Yu Tsao (Sinica Taiwan)
Abstract: With the rapid advancement of speech processing technologies and an in-depth understanding of human speech perception mechanisms, significant improvements have been made in the design of assistive hearing devices (assistive listening devices (ALDs), hearing aids (HAs), and cochlear implants (CIs)) that benefit speech communication for millions of hearing-impaired patients and thereby enhance their quality of life. However, many technical challenges remain, such as designing noise-suppression algorithms tailored to ALD, HA, and CI users, deriving optimal compression strategies, improving music appreciation, and optimizing speech processing strategies for users of tonal languages, to name a few. In this talk, we present our recent research on using machine learning and signal processing to improve speech perception for ALD, HA, and CI users.
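For context on the noise-suppression problem the abstract raises, the sketch below implements single-channel spectral subtraction, a classic signal-processing baseline that learned approaches for ALD/HA/CI processing are typically compared against. It is not a method from the talk; the frame size, oversubtraction factor, spectral floor, and synthetic test signal are all illustrative assumptions.

    import numpy as np

    def spectral_subtraction(noisy, frame=512, hop=256,
                             noise_frames=10, alpha=2.0, floor=0.05):
        # Suppress stationary noise by subtracting a noise-magnitude estimate
        # taken from the first few frames, assumed to be speech-free.
        win = np.hanning(frame)
        n_frames = 1 + (len(noisy) - frame) // hop
        out = np.zeros(len(noisy))
        norm = np.zeros(len(noisy))
        # Estimate the noise magnitude spectrum from the leading frames.
        noise_mag = np.mean(
            [np.abs(np.fft.rfft(win * noisy[i*hop:i*hop+frame]))
             for i in range(min(noise_frames, n_frames))], axis=0)
        for i in range(n_frames):
            seg = win * noisy[i*hop:i*hop+frame]
            spec = np.fft.rfft(seg)
            mag = np.abs(spec)
            # Oversubtract the noise estimate, then apply a spectral floor
            # to limit musical-noise artifacts.
            clean_mag = np.maximum(mag - alpha * noise_mag, floor * mag)
            seg_out = np.fft.irfft(clean_mag * np.exp(1j * np.angle(spec)), frame)
            out[i*hop:i*hop+frame] += seg_out * win   # windowed overlap-add
            norm[i*hop:i*hop+frame] += win ** 2
        return out / np.maximum(norm, 1e-8)

    # Toy usage: a sine "speech" tone buried in white noise.
    fs = 16000
    t = np.arange(fs) / fs
    noisy = np.sin(2*np.pi*440*t) + 0.5*np.random.default_rng(0).standard_normal(fs)
    enhanced = spectral_subtraction(noisy)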
Bio:
Dr. Yu Tsao received the Ph.D. degree in Electrical and Computer Engineering from the Georgia Institute of Technology, GA, USA, in 2008. His Ph.D. research focused on characterizing unknown environments to enhance the robustness of automatic speech recognition (ASR) under adverse conditions. In the summers of 2004, 2005, and 2006, Dr. Tsao was with the Speech Technologies Laboratory of Texas Instruments Incorporated (TI) as a summer research associate, where he derived algorithms to reduce online computation for ASR on mobile devices. A U.S. patent application was filed based on his research work at TI. In addition to the patent application, Dr. Tsao received a TILU (Texas Instruments Leadership University) fellowship and two conference grants from ISCA (the International Speech Communication Association).
From April 2009 to September 2011, Dr. Tsao was an expert researcher in the Spoken Language Communication (SLC) Group at the National Institute of Information and Communications Technology (NICT), Kyoto, Japan, where he engaged in research and product development in ASR for multilingual speech-to-speech translation. Several papers were published, and a Japanese patent was filed based on his research achievements.
Dr. Tsao is currently an assistant research fellow at the Research Center for Information Technology Innovation (CITI), Academia Sinica. His recent research interests include assistive hearing devices, speech recognition, audio coding, deep neural networks, bio-signals, and acoustic modeling.
