ITG-Fb. 312: Speech Communication
15th ITG Conference, 20–22 September 2023 in Aachen, Germany
Cite this publication
VDE ITG (Ed.), ITG-Fb. 312: Speech Communication (2023), VDE Verlag, Berlin, ISBN: 9783800761654
Description / Abstract
The 15th ITG conference on Speech Communication solicits contributions on theory, algorithms, and applications in the following areas of speech, audio, and spoken language processing.
Topics:
- Speech Enhancement and Separation
- Source Localization and Tracking
- Detection and Classification of Acoustic Scenes and Events
- Automatic Speech and Speaker Recognition
- Spoken Dialogue, Diarization, and Spoken Document Retrieval Systems
- Speech Synthesis
- Speech Modeling, Coding, and Transmission
- Privacy in Speech Technologies
- Speech Production and Perception
- Speech and Audio Quality Assessment
- Paralinguistics, Speech Diagnostics and Speech-related Biosignals
- Speech in Automotive, Mobile, and Multimodal Applications
- Acoustic Interfaces, Assistive Devices, and Hearing Aids
- Hardware and Software Tools
- Emerging Topics and Applications
Table of Contents
- ITG-Fachbericht 312: Speech Communication
- Title Page
- Imprint
- Scope
- Technical Program Committee
- Contents
- Poster Session A
- Poster Session B
- Lecture Session: Best Paper Award Contest
- Poster Session C
- Poster Session D
- Lecture Session: Generative Methods for Speech Processing
- 01 Ad Hoc Distributed Microphones Clustering: A Comparative Analysis on Using Coherence and Signal-specific Features
- 02 Exploiting an External Microphone to Improve Time-difference-of-arrival Estimates for Euclidean Distance Matrix-based Source Localization
- 03 Hearing Impairment in Crowdsourced Speech Quality Assessments: Its Effect and Screening with Digit Triplet Hearing Test
- 04 Long-term Conversation Analysis: Exploring Utility and Privacy
- 05 Towards a Natural Reproduction of Binaural Recordings: Combining Binaural Cue Adaptation and Adaptive Crosstalk Cancellation
- 06 Screening of Alzheimer's Dementia up to 12 Years ahead from Conversational Speech of ILSE Study
- 07 Speaker's Articulatory Strategy Analysis: Theoretical Framework and Preliminary Experiment
- 08 Speech-based Age and Gender Prediction with Transformers
- 09 Transfer Learning using Musical/Non-musical Mixtures for Multi-instrument Recognition
- 10 U-DiT TTS: U-Diffusion Vision Transformer for Text-to-speech
- 11 Using Perceptual Evaluation of Speech Quality (PESQ) Loss for DNN-based Speech Enhancement
- 12 Advances In End-to-end Conversational Speech Quality Prediction
- 13 Comparative Study of LC3plus and Lyra Codec on DNN-based Source Localisation for Hearing Aids
- 14 Comparison of Different Neural Network Architectures for Spoken Language Identification
- 15 Exploring Shapley Values for Blood Glucose Level Prediction from Speech
- 16 LibriWASN: A Data Set for Meeting Separation, Diarization, and Recognition with Asynchronous Recording Devices
- 17 Reduced-complexity Binaural Source Localization for Headphones and Hearing Aids using Low-rank DRTF Approximations
- 18 Single Channel Source Separation in the Wild - Conversational Speech in Realistic Environments
- 19 Subjective Performance Evaluation of Single-channel Speaker-conditioned Target Speaker Extraction Algorithms for Complex Acoustic Scenes
- 20 Toward Semi-supervised Transcription of NAKO+ILSE: Influence of Automatic Speech Recognition Performance on Manual Transcription Effort
- 21 Towards a Brain Computer Interface for Speech Perception
- 22 Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model
- 23 Investigating Speaker Embedding Disentanglement on Natural Read Speech
- 24 BRUDEX Database: Binaural Room Impulse Responses with Uniformly Distributed External Microphones
- 25 Comparative Analysis of the wav2vec 2.0 Feature Extractor
- 26 Design of Low-order IIR Filters Based on Hankel Nuclear Norm Regularization for Achieving Acoustic Transparency
- 27 Fast Tracking of Time-variant Systems Using Local Affine Subspaces
- 28 Generalized Wiener Filter for Nonlinear Acoustic Echo Control
- 29 Compression of End-to-end Non-autoregressive Image-to-speech System for Low-resourced Devices
- 30 CRNN-based Multi-DOA Estimator: Comparing Classification and Regression
- 31 Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech
- 32 Evaluation of HRTF Models for Binaural Cue Adaptation
- 33 Global vs. Local Federated Learning in Heterogeneous Acoustic Environments
- 34 In-the-wild Speech Emotion Conversion Using Disentangled Self-supervised Representations and Neural Vocoder-based Resynthesis
- 35 Low-complexity Real-time Single-channel Speech Enhancement Based on Skip-GRUs
- 36 Multi-speaker Text-to-speech Using ForwardTacotron with Improved Duration Prediction
- 37 On Feature Importance and Interpretability of Speaker Representations
- 38 Quantifying Harmonic Distortions in Audio Playback Systems
- 39 Stream-ETS: Low-latency End-to-end Speech Synthesis from Electromyography Signals
- 40 Analyzing and Improving Neural Speaker Embeddings for ASR
- 41 Distribution Mismatch Correction for Acoustic Scene Classification
- 42 Exploratory Evaluation of Speech Content Masking
- 43 Exploring Visualization Techniques for Interpretable Learning in Speech Enhancement Deep Neural Networks
- 44 Feedback-aware Design of an Occlusion Effect Reduction System Using an Earbud-mounted Vibration Sensor
- 45 Fuzzy-clustering-supported Assignment of Smart-speaker-based Microphone Arrays to Acoustic Sources in Reverberant Acoustic Environments
- 46 Investigating Disentanglement of Speaker Identity and Characteristics through User Experience
- 47 Language Recognition for SSB Modulated HF Radio Signals of Short Duration
- 48 Self-learning and Active-learning for Electromyography-to-speech Conversion
- 49 Target-speaker Voice Activity Detection in Multi-talker Scenarios: An Empirical Study
- 50 Uncertainty-driven Hybrid Fusion for Audio-visual Phoneme Recognition
- 51 On the Behavior of Intrusive and Non-intrusive Speech Enhancement Metrics in Predictive and Generative Settings
- 52 Evaluation Metrics for Generative Speech Enhancement Methods: Issues and Perspectives
- 53 Improving the Naturalness of Synthesized Spectrograms for TTS Using GAN-Based Post-processing
- 54 Audio-visual Speech Enhancement with Score-based Generative Models
- 55 A Maximum Entropy Information Bottleneck (MEIB) Regularization for Generative Speech Enhancement with HiFi-GAN