For the upcoming TSD, the following outstanding keynote speakers, whose expertise covers speech modeling, acoustic-phonetic decoding, dialogue systems, and semantics, have agreed to give talks:

See the section below for details about the talks (topics, abstracts). Click the title of a talk (in italics) or the PDF icon next to it to open the PDF of the presentation.


Hermann Ney

Head of the Human Language Technology and Pattern Recognition Group, Computer Science Department, Rheinisch-Westfälische Technische Hochschule – Aachen University, Germany

The Statistical Approach to Human Language Technology: Achievements and Open Problems - Where do We Stand?  PDF (502 KB)
Abstract:  The last 40 years have seen dramatic progress in statistical methods for recognizing speech signals and for translating spoken and written language. This talk will present a unifying view of the underlying statistical methods. In particular, the talk will address the remarkable fact that, for these tasks and similar tasks like handwriting recognition, the statistical approach makes use of the same four principles:
1) Bayes decision rule for minimum error rate; 2) probabilistic models, e.g. Hidden Markov models or conditional random fields for handling strings of observations (like acoustic vectors for speech recognition and written words for language translation); 3) training criteria and algorithms for estimating the free model parameters from large amounts of data; 4) the generation or search process that generates the recognition or translation result.
Most of these methods had originally been designed for speech recognition. However, it has turned out that, with suitable modifications, the same concepts carry over to language translation and other tasks in natural language processing. This lecture will give a critical review of the achievements and of the open problems.

Biography:  Hermann Ney is a full professor of computer science at RWTH Aachen University in Aachen, Germany. His research interests lie in the area of statistical methods for pattern recognition and human language technology and their specific applications to speech recognition, machine translation and image object recognition. In particular, he has worked on dynamic programming and discriminative training for speech recognition, on language modelling and on phrase-based approaches to machine translation. His work has resulted in more than 500 conference and journal papers (h-index 78, estimated using Google Scholar). He is a fellow of both the IEEE and the International Speech Communication Association. In 2005, he was the recipient of the Technical Achievement Award of the IEEE Signal Processing Society. In 2010, he was awarded a senior Digiteo chair at LIMSI/CNRS in Paris, France.

Dan Roth

Professor at the Department of Computer Science and the Beckman Institute, University of Illinois at Urbana-Champaign, USA

Learning and Inference for Natural Language Understanding. PDF (4.6 MB)
Abstract:  Machine Learning and Inference methods have become ubiquitous and have had a broad impact on a range of scientific advances and technologies and on our ability to make sense of large amounts of data. Research in Natural Language Processing has both benefited from and contributed to advancements in these methods and provides an excellent example of some of the challenges we face moving forward. I will describe some of our research in developing learning and inference methods in pursuit of natural language understanding. In particular, I will address what I view as some of the key challenges, including (i) learning models from natural interactions, without direct supervision, (ii) knowledge acquisition and the development of inference models capable of incorporating knowledge and reason, and (iii) scalability and adaptation: learning to accelerate inference during the lifetime of a learning system. A lot of this work is done within the unified computational framework of Constrained Conditional Models (CCMs), an Integer Linear Programming formulation that augments statistically learned models with declarative constraints as a way to support learning and reasoning. Within this framework, I will discuss old and new results pertaining to learning and inference and how they are used to push forward our ability to understand natural language.

Biography:  Dan Roth is a Professor in the Department of Computer Science and the Beckman Institute at the University of Illinois at Urbana-Champaign and a University of Illinois Scholar. He is the director of the DHS-funded Center for Multimodal Information Access & Synthesis (MIAS) and also holds faculty positions in the Statistics and Linguistics Departments and at the Graduate School of Library and Information Science. Roth is a Fellow of the ACM, AAAI, and ACL, for his contributions to the foundations of machine learning and inference and for developing learning-centered solutions for natural language processing problems. He has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory, and has developed advanced machine-learning-based tools for natural language applications that are widely used by the research community. Prof. Roth has given keynote talks at major conferences and presented several tutorials at universities and conferences, including ACL and the European ACL, and has won several teaching and best paper awards. Prof. Roth received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.

Björn W. Schuller

Professor at the Chair of Complex Systems Engineering, University of Passau, Germany
Associate Professor at the Machine Learning Group, Imperial College London, UK

Speech Analysis in the Big Data Era. PDF (7.91 MB)
Abstract:  In spoken language analysis tasks, one is often faced with comparatively small available corpora of only one up to a few hours of speech material, mostly annotated with a single phenomenon such as a particular speaker state at a time. In stark contrast to this, engines such as those for the recognition of speakers' emotions, sentiment, personality, or pathologies are often expected to run independently of the speaker, the spoken content, and the acoustic conditions. This lack of large and richly annotated material likely explains to a large degree the headroom left for improvement in accuracy by today’s engines. Three factors are mainly responsible for this sparseness of speech data and suited labels: the data are often 1) sparse per se, such as in the case of a sparsely occurring speaker state or trait, 2) considerably more ambiguous and thus challenging to annotate than, e.g., orthographic transcription of speech usually is, and 3) of a highly private nature. Yet, in the big data era, it is becoming less and less the actual speech data that is lacking, as diverse resources such as the internet or broadcast and increased self-monitoring provide access to large amounts. Instead, it is rather the labels that are missing. Luckily, with the increasing availability of crowd-sourcing services, and recent advances in weakly supervised, contextual, and reinforced learning, new opportunities arise to ease this problem. In this light, this talk first shows the de-facto standard in terms of data availability in a broad range of speaker analysis tasks. It then introduces methods for "cleaning up" the gold standard as obtained, e.g., by noisy crowd-sourced labels and presents highly efficient "cooperative" learning strategies based on the combination of active and semi-supervised alongside transfer learning to best exploit available data in combination with data synthesis.
Further, approaches to estimate meaningful confidence measures in this domain are suggested, as they form (part of) the basis of the weakly supervised learning algorithms. In addition, first successful approaches towards holistic speech analysis are presented using deep recurrent rich multi-target learning with partially missing label information. Finally, steps towards needed distribution of processing for big data handling are demonstrated. Overall, a system architecture and methodology is thus discussed that holds the promise to lead to a major breakthrough in performance and generalization ability of tomorrow's speech analysis systems.

Biography:  Björn W. Schuller received his diploma in 1999 and his doctoral degree, for his study on Automatic Speech and Emotion Recognition, in 2006. In 2012 he received his habilitation (fakultas docendi) and was entitled Adjunct Teaching Professor (venia legendi) in the subject area of Signal Processing and Machine Intelligence for his work on Intelligent Audio Analysis. All three were awarded in electrical engineering and information technology by TUM (Munich University of Technology), repeatedly the number one German university in different rankings and one of Germany's two persistent Excellence Universities. At present, he is Full Professor and head of the Chair of Complex Systems Engineering at the University of Passau/Germany, where he previously headed the Chair for Sensor Systems in 2013. At the same time he is a Senior Lecturer (Associate Professor) in Machine Learning in the Department of Computing at Imperial College London/UK (since 2013). Further, he is the co-founding CEO of audEERING UG (limited), a TUM start-up on intelligent audio engineering. Previously, he headed the Machine Intelligence and Signal Processing Group at TUM from 2006 to 2014. In 2013 he was also invited as a permanent Visiting Professor in the School of Computer Science and Technology at the Harbin Institute of Technology, Harbin/P.R. China and as a Visiting Professor at the Université de Genève in Geneva/Switzerland in the Centre Interfacultaire en Sciences Affectives, and he remains an appointed associate of the institute. In 2012 he was with Joanneum Research, Institute for Information and Communication Technologies in Graz/Austria, working in the Research Group for Remote Sensing and Geoinformation and the Research Group for Space and Acoustics; currently he is an expert consultant of the institute. In 2011 he was a guest lecturer at the Università Politecnica delle Marche (UNIVPM) in Ancona/Italy and a visiting researcher in the Machine Learning Research Group of NICTA in Sydney/Australia. From 2009 to 2010 he lived in Paris/France and was with the CNRS-LIMSI Spoken Language Processing Group in Orsay/France, and was a visiting scientist at Imperial College. He is best known for his work advancing Machine Learning for the Engineering of Intelligent Audiovisual and Complex Information Systems, and Affective Computing for Human-Computer/Robot Interaction and Multimedia Retrieval.

Peter D. Turney

Senior Research Scientist at Allen Institute for Artificial Intelligence
Principal Research Officer at the National Research Council of Canada
Adjunct Professor at the University of Ottawa, Canada

Allen Institute for Artificial Intelligence: Vision, Projects, Results. PDF (17.6 MB)
Abstract:  In 1946, with concern about the new atomic age, Albert Einstein wrote, "A new type of thinking is essential if mankind is to survive and move toward higher levels." This is equally true in 2015, in the context of global climate change and mass species extinction. When Artificial Intelligence has the ability to learn from text, speech, and dialogue, and the capacity to reason with what it has learned, we will have a new type of thinking. Combined with human intelligence and compassion, advanced AI has the potential to help humanity survive and move toward higher levels. This potential is what drives the Allen Institute for Artificial Intelligence. In this talk, I will describe the vision, projects, and latest results of AI2.

Biography:  In 2015, Peter Turney joined the Allen Institute for Artificial Intelligence (AI2) as a Senior Research Scientist. Before joining AI2, he was a Principal Research Officer at the National Research Council of Canada (NRC) and an Adjunct Professor at the University of Ottawa. He obtained his PhD in 1988 from the University of Toronto and joined the NRC in 1989. His recent work focuses on machine learning applied to natural language. He is the author or co-author of more than eighty publications. In the past, he has been an Editor of Canadian Artificial Intelligence magazine, an Editorial Board Member, Associate Editor, and Advisory Board Member of the Journal of Artificial Intelligence Research, and an Editorial Board Member of the journal Computational Linguistics. He was involved in initiating the Wiki of the Association for Computational Linguistics in 2006 and continues to play an active role in its maintenance. His paper Mining the Web for Synonyms won the ECML PKDD 10 Years Award in 2011.

Alexander Waibel

Professor at Carnegie Mellon University, Pittsburgh, USA
Professor at Karlsruhe Institute of Technology, Germany

Bridging the Language Divide.  PDF (181 MB)
Abstract:  As our world becomes increasingly interdependent and globalization brings people together more than ever, we quickly discover that it is no longer the "digital divide" that separates us, but the "language divide" and the cultural differences that come with it. Nearly everyone has a cell phone and could connect with everyone else on the planet, if only they shared a common language and a common understanding. Forcing uniformity ("everyone speaks English"), however, is neither realistic nor desirable, as we enjoy the beauty and individuality of each of our languages and cultural heritage. Can technology provide an answer? In this talk, I will present language technology solutions that offer us the best of both worlds: maintaining our cultural diversity while enabling the integration, communication and collaboration that our modern world has to offer. I will present cross-lingual computer communication systems from our university labs, R&D consortia and start-up ventures.

More specifically, I will discuss and demonstrate:
• Pocket speech translators running on smartphones for tourists and medical doctors. The software app, Jibbigo, launched in 2009, was the world’s first commercially available speech translator running entirely on a phone.
• Speech translation tools deployed on iPads in humanitarian and government missions.
• Simultaneous interpretation systems that translate academic lectures and political speeches in real time (recently tested in the European Parliament).
• A cloud-based Lecture Interpretation Service deployed at KIT for the benefit of foreign students studying at a German university.
• Tools and Support Technology to facilitate and accelerate the work of human interpreters.

In the talk, I will review how the technology works and what levels of performance are now possible. Then we will be concerned with the delivery of such technology, so that language separation will truly fade naturally into the background. Finally, we will discuss ongoing research on the problems of portability and scaling, when we attempt to build cross-lingual communication tools for many languages and topics more effectively and at acceptable cost. We will report results and experiences from the laboratory, from field trials and deployments.

Biography:  Dr. Alexander Waibel is a Professor of Computer Science at Carnegie Mellon University, Pittsburgh and at the Karlsruhe Institute of Technology, Germany. He is the director of the International Center for Advanced Communication Technologies (interACT). The Center works in a network with eight of the world's top research institutions. The Center's mission is to develop multimodal and multilingual human communication technologies that improve human-human and human-machine communication. Prof. Waibel's team developed and demonstrated the first speech translation systems in Europe and the USA (1990/1991, ICASSP'91), the world's first simultaneous lecture translation system (2005), and Jibbigo, the world's first commercial speech translator on a phone (2009). Dr. Waibel founded and served as chairman of C-STAR, the Consortium for Speech Translation Advanced Research, in 1991. Since then he has directed and coordinated many research programs in speech, translation, multimodal interfaces and machine learning in the US, Europe and Asia. He served as director of EU-Bridge 2012-2015, a large-scale European multi-site Integrated Project initiative aimed at developing speech translation services for Europe. He also served as co-director of IMMI, a joint venture between KIT, CNRS & RWTH, and as principal investigator of several US and European research programs on machine learning, speech translation and multimodal interfaces. Dr. Waibel has received many awards for pioneering work on multilingual speech communication and translation technology. He has published extensively in the field (>700 publications, >21,000 citations, h-index 75) and has received or filed numerous patents. During his career, Dr. Waibel founded and built 10 successful companies. The latest, Jibbigo, built and distributed the world's first speech translator on a smartphone. It was acquired by Facebook in 2013, and Dr. Waibel served as founding director of the Facebook Language Technology Group 2013-14. Since 2007, Dr. Waibel and his team have also deployed speech translation technologies for healthcare providers in humanitarian and disaster relief missions. Since 2012, his team has also deployed the first simultaneous interpretation service for lectures at universities and interpretation tools at the European Parliament. Dr. Waibel received his BS, MS and PhD degrees at MIT and CMU, respectively.