UK and Ireland Speech Workshop 2024

01-02 July 2024 | Cambridge, United Kingdom

Overview

Welcome! UKIS Speech 2024 will be held at the University of Cambridge on 1–2 July 2024.

The UK and Ireland Speech Workshop aims to bring together researchers within the UK and Ireland’s Speech Science and Speech Technology community, in both academia and industry. Our goals are to share information about ongoing research activities, to meet old friends, and to make new ones. UKIS 2024 is organised by the Speech Research Group, Department of Engineering, University of Cambridge, in collaboration with the UK Speech Community.

Important Dates

Abstract Submission Closes: 8 May 2024
Notification of Acceptance: 22 May 2024
Author Registration: 5 June 2024
Full Registration Closes: 16 June 2024
UK Speech Conference: 1–2 July 2024

Venue Details

Conference Venue: Engineering Department, Trumpington Street, Cambridge, CB2 1PZ

Note that no car parking will be available.

Accommodation: Cambridge offers a variety of accommodation options, including hotels and bed & breakfasts (B&Bs). We have reserved 100 rooms at Robinson College on a first-come-first-served basis; to secure one, we recommend booking directly. Booking details are available on the registration site (see Registration below). Please note that these rooms are held for workshop attendees only until mid-May.

Social Event: The social event will include a drinks reception followed by dinner and music at Robinson College (see programme details).

Call For Abstracts

We invite 1-page abstracts (in PDF format) detailing either completed or ongoing research in speech science and technology and spoken language processing. The work need not be entirely original, provided it is recent. Submissions from researchers in both academic and industrial settings are encouraged.

Submission Details

Please submit your abstracts here.

Please ensure your submission is in PDF format and includes the title, author(s), affiliation(s), and abstract. Extended abstracts (e.g. including pictures and key results tables/figures) are encouraged but not required. Please use this LaTeX/Word template, inspired by Interspeech 2023 (1 page total, including tables, figures, and references).

Instructions for Accepted Papers

Poster Presentations: For papers accepted for poster presentation, posters must be A0 in portrait orientation.

Oral Presentations: Oral presentation slots are 20 minutes in total, including Q&A. We recommend 15 minutes for the presentation and 5 minutes for Q&A.

Registration

Please register for the event here.

Note that information about booking accommodation at Robinson College is provided on the registration site.

Keynotes

Dr Catherine Lai

Catherine Lai is a Lecturer in Speech and Language Technology, based in Linguistics and English Language and the Centre for Speech Technology Research at the University of Edinburgh. Her main interest is speech prosody, e.g. the intonational and rhythmic properties of speech and how they contribute to spoken dialogue understanding, drawing on both speech technology/machine learning and linguistic/social science perspectives. She was previously a post-doctoral researcher in the School of Informatics at the University of Edinburgh, where she was Principal Investigator on a Toyota-funded grant on spoken language processing for robot companions. She gained her PhD in linguistics from the University of Pennsylvania, after MSc and BSc (Hons) degrees in computer science and mathematics from the University of Melbourne. Amongst other things, she has also worked on emotion and stance recognition, speech processing for multimedia archives, human evaluation of and interaction with speech technologies, the role of prosody in theories of semantics/pragmatics, and (in a past life) linguistic tree query languages.

Prof. Jon Barker

Jon Barker is a Professor of Speech Processing at the University of Sheffield. He has been working in the area of speech and audio processing for over 30 years. His earlier work was on modelling human speech processing, with a focus on computational models of speech-in-noise perception. He also has interests in distant microphone speech processing and was a founder of the CHiME series of challenges and workshops for distant microphone ASR and diarisation. More recently he has been bringing these interests together in the area of hearing aid processing, with research funded via the EPSRC Clarity (<https://claritychallenge.org>) and Cadenza (<https://cadenzachallenge.org>) projects and students funded by Sheffield’s UKRI CDT in Speech and Language Technology.

Prof. Elizabeth Stokoe

Elizabeth Stokoe is a professor in the Department of Psychological and Behavioural Science at The London School of Economics and Political Science. She conducts conversation analytic research to understand how talk works, from first dates to medical communication and from sales encounters to crisis negotiation. She has worked as an industry fellow at the technology companies Typeform and Deployed. In addition to academic publishing, she is passionate about science communication, and has given talks at TED, Google, Microsoft, and The Royal Institution, and performed at the Latitude and Cheltenham Science Festivals. Her books include Talk: The Science of Conversation (Little, Brown, 2018) and Crisis Talk (Routledge, 2022, co-authored with Rein Ove Sikveland and Heidi Kevoe-Feldman). Her research and biography were featured on BBC Radio 4’s The Life Scientific. During the Covid-19 pandemic she participated in a behavioural science sub-group of the UK Government’s Scientific Advisory Group for Emergencies (SAGE), and she is a member of the Independent SAGE behaviour group. She is a Wired Innovation Fellow and in 2021 was awarded Honorary Fellowship of the British Psychological Society.

Catherine Lai

Title: Across the prosodic dimension: Exploring spoken communication beyond text

Abstract: Recent advances in machine learning have made an undeniable impact on the field of speech technology as we’ve long known it. These advances have also led to some rather bold claims: e.g., that speech-to-text (aka automatic speech recognition) and text-to-speech synthesis are solved! What such claims often miss is that the traditional objectives of speech technologies neglect important aspects of spoken communication beyond text. For example, most machine learning oriented work on spoken language understanding still focuses on text-based methods, ignoring the fact that how we speak can change how our words are interpreted. Nevertheless, previous work has shown that speech prosody (e.g. the pitch, energy, and timing characteristics of speech) can be used to signal speaker intent and affect, as well as to infer and project dialogue structure. We also know that prosody can be highly contextually variable. So, to make use of prosody in speech technology, we need to be able to model this variability and to understand what it actually does in spoken communication. In this talk, I will discuss recent work exploring prosodic variation in (English) spoken dialogue, using representation learning methods developed for speech generation and recognition. I argue that there are a lot of benefits to be had from self-supervised methods for representation learning on speech and text datasets, but we still need linguistic knowledge to actually make use of the true richness of speech.

Jon Barker

Title: Using Machine Learning to Improve Hearing Aid Signal Processing: The Clarity and Cadenza Challenges

Abstract: At least 1.5 billion people are currently living with hearing loss, and this number will increase as the global population ages.
Many of these people would benefit from hearing aids, yet only a fraction have them, and many who do have them do not use their devices often enough. A major reason for the low uptake is that hearing aids do not perform well enough in many everyday situations. Among users’ biggest complaints are that speech often remains poorly intelligible in noisy situations and that hearing aids often do not cope well with music. However, recent advances in machine learning have the potential to directly address these problems and transform the experience of hearing aid users.
 
In this talk, I will present two large EPSRC projects, Clarity (speech) and Cadenza (music), which are collaborations between the Universities of Sheffield, Salford, Cardiff, Leeds and Nottingham. These projects were designed to investigate the potential of machine learning for hearing aids and to grow the community of researchers working in this area, using a series of open challenges to achieve these goals. Clarity has been considering speech intelligibility enhancement and speech intelligibility prediction, while the more recent project, Cadenza, has been considering music enhancement through a process of source separation and hearing-impairment-aware remixing. The talk will explain some of the difficulties inherent in hearing aid signal processing, how recent advances from the speech community are being applied, and new approaches that are emerging from the latest Clarity/Cadenza challenges. It will also present the current challenges (the 3rd Clarity Enhancement Challenge and the 2nd Cadenza Challenge), which will be ongoing at the time of the UKIS meeting, with plenty of opportunity for those interested to get involved.

Elizabeth Stokoe

Title: How ‘conversational’ are conversational products and technologies?

Abstract: Conversational products and technologies are in the headlines more than ever. But how ‘conversational’ are they? And what does ‘conversational’ actually mean? Many products leverage ‘conversation’, from communication training to assessment tools; from scripted interaction to role-play; and from chatbots to voice assistants. But do they do so in ways that strengthen or do damage in their domains of use? Six decades of research in conversation analysis have identified and described the constitutive practices of human social interaction across the widest range of ordinary and institutional settings. In this talk, I will address the questions of what, when, and how conversational products could and should leverage from conversation analysis.

Programme

The detailed technical programme with all accepted submissions can be viewed here.

Programme at a Glance

Monday 01 July

Time          | Location           | Session
12:00 – 13:00 | Foyer              | Registration & Lunch
13:00 – 13:20 | Constance Tipper   | Welcome Message (Dr. Kate Knill)
13:20 – 14:20 | Constance Tipper   | Keynote A (Dr. Catherine Lai)
14:30 – 15:30 | LR1, LR2, LR3, LR4 | Poster Session A
15:30 – 16:00 | LR4/Marquee/Foyer  | Coffee Break
16:00 – 17:00 | Constance Tipper   | Oral Session A
18:30 –       | Robinson College   | UKIS 2024 Banquet in Association with Google

Tuesday 02 July

Time          | Location           | Session
08:30 – 09:00 | Foyer              | Registration
09:00 – 10:00 | Constance Tipper   | Keynote B (Prof. Elizabeth Stokoe)
10:00 – 11:00 | LR1, LR2, LR3, LR4 | Poster Session B
11:00 – 11:30 | LR4/Marquee/Foyer  | Coffee Break
11:30 – 12:30 | Constance Tipper   | Oral Session B
12:30 – 13:30 | LR4/Marquee/Foyer  | Lunch
13:30 – 14:30 | LR1, LR2, LR3, LR4 | Poster Session C
14:30 – 15:30 | Constance Tipper   | Keynote C (Prof. Jon Barker)
15:30 – 16:15 | Constance Tipper   | UKIS 2024 Cambridge: Future Plans & Farewell

Organizers

  • Thomas Merritt, Amazon
  • Catherine Lai, University of Edinburgh
  • Sebastian Le Maguer, Trinity College Dublin
  • Simone Graetzer, University of Salford

Kate Knill, Erfan Loweimi, Kimberly Cole, Simon McKnight, Stefano Banno, Hari Vydana, Siyuan Tang, Rao Ma, Mengjie Qian, Adian Liuise, Vyas Raina, Charles McGhee, Vatsal Raina, Brian Sun, Keqi Deng, Wen Wu

Contact us by email at ukis2024@eng.cam.ac.uk

Sponsors

We are very grateful to all our sponsors for helping us put on UKIS 2024.

Gold Sponsor

Bronze Sponsors

Captioning Partner

With the kind support of: