For the upcoming TSD, an outstanding set of keynote speakers, with expertise covering speech modeling, acoustic-phonetic decoding, dialogue systems, and semantics, has agreed to give talks:
See the next section for details about the talks (topics, abstracts). Click the title of a talk (in italics) or the PDF icon next to it to open the PDF of the presentation.
The Statistical Approach to Human Language Technology: Achievements and Open Problems - Where do We Stand?
Abstract: The last 40 years have seen dramatic progress in statistical methods for recognizing speech signals and for translating spoken and written language. This talk will present a unifying view of the underlying statistical methods. In particular, the talk will address the remarkable fact that, for these tasks and similar tasks like handwriting recognition, the statistical approach makes use of the same four principles:
1) Bayes decision rule for minimum error rate;
2) probabilistic models, e.g. Hidden Markov models or conditional random fields for handling strings of observations (like acoustic vectors for speech recognition and written words for language translation);
3) training criteria and algorithms for estimating the free model parameters from large amounts of data;
4) the generation or search process that generates the recognition or translation result.
Most of these methods had originally been designed for speech recognition. However, it has turned out that, with suitable modifications, the same concepts carry over to language translation and other tasks in natural language processing. This lecture will give a critical review of the achievements and of the open problems.
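As an illustration of the first principle named in the abstract, the Bayes decision rule picks the output with the highest posterior probability, which is proportional to the prior times the likelihood of the observation. The following is only a minimal sketch under assumed toy numbers: the function name bayes_decide, the class labels, and all probabilities are hypothetical and are not taken from the talk.

```python
from math import log

def bayes_decide(x, priors, likelihoods):
    """Return the class c that maximizes p(c) * p(x | c).

    priors:      dict mapping class -> prior probability p(c)
    likelihoods: dict mapping class -> function returning p(x | c)
    (Both are hypothetical toy inputs for illustration.)
    """
    def score(c):
        # Log domain avoids underflow when many probabilities are multiplied.
        return log(priors[c]) + log(likelihoods[c](x))
    return max(priors, key=score)

# Toy two-class problem with a single discrete observation.
priors = {"yes": 0.6, "no": 0.4}
likelihoods = {
    "yes": lambda x: 0.2 if x == "short" else 0.8,
    "no": lambda x: 0.7 if x == "short" else 0.3,
}

decision_short = bayes_decide("short", priors, likelihoods)  # 0.12 vs 0.28
decision_long = bayes_decide("long", priors, likelihoods)    # 0.48 vs 0.12
```

In a real recognizer or translation system the classes would be whole word sequences, so the maximization becomes the search process listed as the fourth principle rather than a loop over two labels.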
Biography: Hermann Ney is a full professor of computer science at RWTH Aachen University in Aachen, Germany. His research interests lie in the area of statistical methods for pattern recognition and human language technology and their specific applications to speech recognition, machine translation and image object recognition. In particular, he has worked on dynamic programming and discriminative training for speech recognition, on language modelling and on phrase-based approaches to machine translation. His work has resulted in more than 500 conference and journal papers (h-index 78, estimated using Google Scholar). He is a fellow of both the IEEE and the International Speech Communication Association. In 2005, he was the recipient of the Technical Achievement Award of the IEEE Signal Processing Society. In 2010, he was awarded a senior Digiteo chair at LIMSI/CNRS in Paris, France.
Biography: Dan Roth is a Professor in the Department of Computer Science and the Beckman Institute at the University of Illinois at Urbana-Champaign and a University of Illinois Scholar. He is the director of the DHS-funded Center for Multimodal Information Access & Synthesis (MIAS) and also holds faculty positions in the Statistics and Linguistics Departments and at the Graduate School of Library and Information Science. Roth is a Fellow of the ACM, AAAI, and ACL, for his contributions to the foundations of machine learning and inference and for developing learning-centered solutions for natural language processing problems. He has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine-learning-based tools for natural language applications that are widely used by the research community. Prof. Roth has given keynote talks at major conferences and presented several tutorials at universities and conferences, including ACL and the European ACL, and has won several teaching and best paper awards. Prof. Roth received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.
Associate Professor at the Machine Learning Group, Imperial College London, UK
Biography: Björn W. Schuller received his diploma in 1999, his doctoral degree for his study on Automatic Speech and Emotion Recognition in 2006, and, in 2012, his habilitation (fakultas docendi) together with the title of Adjunct Teaching Professor (venia legendi) in the subject area of Signal Processing and Machine Intelligence for his work on Intelligent Audio Analysis, all in electrical engineering and information technology from TUM (Munich University of Technology), repeatedly the number one German university in various rankings and one of the country's two persistent Excellence Universities. At present, he is Full Professor and head of the Chair of Complex Systems Engineering at the University of Passau/Germany, where he previously headed the Chair for Sensor Systems in 2013. At the same time, he has been a Senior Lecturer (Associate Professor) in Machine Learning in the Department of Computing at Imperial College London/UK since 2013. Further, he is the co-founding CEO of audEERING UG (limited), a TUM start-up on intelligent audio engineering. Previously, he headed the Machine Intelligence and Signal Processing Group at TUM from 2006 to 2014. In 2013 he was also invited as a permanent Visiting Professor in the School of Computer Science and Technology at the Harbin Institute of Technology, Harbin/P.R. China, and as a Visiting Professor at the Université de Genève in Geneva/Switzerland in the Centre Interfacultaire en Sciences Affectives, where he remains an appointed associate of the institute. In 2012 he was with Joanneum Research, Institute for Information and Communication Technologies in Graz/Austria, working in the Research Group for Remote Sensing and Geoinformation and the Research Group for Space and Acoustics; he is currently an expert consultant of the institute. In 2011 he was a guest lecturer at the Università Politecnica delle Marche (UNIVPM) in Ancona/Italy and a visiting researcher in the Machine Learning Research Group of NICTA in Sydney/Australia. From 2009 to 2010 he lived in Paris/France and was with the CNRS-LIMSI Spoken Language Processing Group in Orsay/France, and was a visiting scientist at Imperial College. He is best known for his work advancing Machine Learning for the Engineering of Intelligent Audiovisual and Complex Information Systems, and Affective Computing for Human-Computer/Robot Interaction and Multimedia Retrieval.
Principal Research Officer at the National Research Council of Canada
Adjunct Professor at the University of Ottawa, Canada
Biography: In 2015, Peter Turney joined the Allen Institute for Artificial Intelligence (AI2) as a Senior Research Scientist. Before joining AI2, he was a Principal Research Officer at the National Research Council of Canada (NRC) and an Adjunct Professor at the University of Ottawa. He obtained his PhD in 1988 from the University of Toronto and joined the NRC in 1989. His recent work focuses on machine learning applied to natural language. He is the author or co-author of more than eighty publications. In the past, he has been an Editor of Canadian Artificial Intelligence magazine, an Editorial Board Member, Associate Editor, and Advisory Board Member of the Journal of Artificial Intelligence Research, and an Editorial Board Member of the journal Computational Linguistics. He was involved in initiating the Wiki of the Association for Computational Linguistics in 2006 and continues to play an active role in its maintenance. His paper Mining the Web for Synonyms won the ECML PKDD 10 Years Award in 2011.
Professor at Karlsruhe Institute of Technology, Germany
More specifically, I will discuss and demonstrate:
• Pocket speech translators running on smartphones for tourists and medical doctors. The software app, Jibbigo, launched in 2009, was the world’s first commercially available speech translator running entirely on a phone.
• Speech translation tools deployed on iPads in humanitarian and government missions.
• Simultaneous interpretation systems that translate academic lectures and political speeches in real time (recently tested in the European Parliament).
• A cloud-based Lecture Interpretation Service deployed at KIT for the benefit of foreign students studying at a German university.
• Tools and support technology to facilitate and accelerate the work of human interpreters.
In the talk, I will review how the technology works and what levels of performance are now possible. Then we will be concerned with the delivery of such technology, so that language separation will truly fade naturally into the background. Finally, we will discuss ongoing research on the problems of portability and scaling that arise when we attempt to build cross-lingual communication tools for many languages and topics more effectively and at acceptable cost. We will report results and experiences from the laboratory, from field trials and from deployments.
Biography: Dr. Alexander Waibel is a Professor of Computer Science at Carnegie Mellon University, Pittsburgh and at the Karlsruhe Institute of Technology, Germany. He is the director of the International Center for Advanced Communication Technologies (interACT). The Center works in a network with eight of the world's top research institutions. The Center's mission is to develop multimodal and multilingual human communication technologies that improve human-human and human-machine communication. Prof. Waibel's team developed and demonstrated the first speech translation systems in Europe and the USA (1990/1991, ICASSP'91), the world's first simultaneous lecture translation system (2005), and Jibbigo, the world's first commercial speech translator on a phone (2009). Dr. Waibel founded C-STAR, the Consortium for Speech Translation Advanced Research, in 1991 and served as its chairman. Since then he has directed and coordinated many research programs in speech, translation, multimodal interfaces and machine learning in the US, Europe and Asia. He served as director of EU-Bridge (2012-2015), a large-scale European multi-site Integrated Project initiative aimed at developing speech translation services for Europe. He also served as co-director of IMMI, a joint venture between KIT, CNRS and RWTH, and as principal investigator of several US and European research programs on machine learning, speech translation and multimodal interfaces. Dr. Waibel has received many awards for pioneering work on multilingual speech communication and translation technology. He has published extensively in the field (>700 publications, >21,000 citations, h-index 75) and has received/filed numerous patents. During his career, Dr. Waibel founded and built 10 successful companies. The latest, Jibbigo, built and distributed the world's first speech translator on a smartphone. It was acquired by Facebook in 2013, and Dr. Waibel served as founding director of the Facebook Language Technology Group in 2013-14. Since 2007, Dr. Waibel and his team have also deployed speech translation technologies for healthcare providers in humanitarian and disaster relief missions. Since 2012, his team has also deployed the first simultaneous interpretation service for lectures at universities and interpretation tools at the European Parliament. Dr. Waibel received his BS, MS and PhD degrees at MIT and CMU, respectively.