ITG-Fb. 312: Speech Communication

15th ITG Conference, September 20-22, 2023, in Aachen, Germany

Cite this publication

VDE ITG (ed.), ITG-Fb. 312: Speech Communication (2023), VDE Verlag, Berlin, ISBN: 9783800761654

Description / Abstract

The 15th ITG conference on Speech Communication solicits contributions on theory, algorithms, and applications in the following areas of speech, audio, and spoken language processing.

Topics:
- Speech Enhancement and Separation
- Source Localization and Tracking
- Detection and Classification of Acoustic Scenes and Events
- Automatic Speech and Speaker Recognition
- Spoken Dialogue, Diarization, and Spoken Document Retrieval Systems
- Speech Synthesis
- Speech Modeling, Coding, and Transmission
- Privacy in Speech Technologies
- Speech Production and Perception
- Speech and Audio Quality Assessment
- Paralinguistics, Speech Diagnostics and Speech-related Biosignals
- Speech in Automotive, Mobile, and Multimodal Applications
- Acoustic Interfaces, Assistive Devices, and Hearing Aids
- Hardware and Software Tools
- Emerging Topics and Applications

Table of Contents

  • ITG-Fachbericht 312: Speech Communication
  • Title Page
  • Imprint
  • Scope
  • Technical Program Committee
  • Contents
  • Poster Session A
  • Poster Session B
  • Lecture Session: Best Paper Award Contest
  • Poster Session C
  • Poster Session D
  • Lecture Session: Generative Methods for Speech Processing
  • 01 Ad Hoc Distributed Microphones Clustering: A Comparative Analysis on Using Coherence and Signal-specific Features
  • 02 Exploiting an External Microphone to Improve Time-difference-of-arrival Estimates for Euclidean Distance Matrix-based Source Localization
  • 03 Hearing Impairment in Crowdsourced Speech Quality Assessments: Its Effect and Screening with Digit Triplet Hearing Test
  • 04 Long-term Conversation Analysis: Exploring Utility and Privacy
  • 05 Towards a Natural Reproduction of Binaural Recordings: Combining Binaural Cue Adaptation and Adaptive Crosstalk Cancellation
  • 06 Screening of Alzheimer's Dementia up to 12 Years ahead from Conversational Speech of ILSE Study
  • 07 Speaker's Articulatory Strategy Analysis: Theoretical Framework and Preliminary Experiment
  • 08 Speech-based Age and Gender Prediction with Transformers
  • 09 Transfer Learning using Musical/Non-musical Mixtures for Multi-instrument Recognition
  • 10 U-DiT TTS: U-Diffusion Vision Transformer for Text-to-speech
  • 11 Using Perceptual Evaluation of Speech Quality (PESQ) Loss for DNN-based Speech Enhancement
  • 12 Advances in End-to-end Conversational Speech Quality Prediction
  • 13 Comparative Study of LC3plus and Lyra Codec on DNN-based Source Localisation for Hearing Aids
  • 14 Comparison of Different Neural Network Architectures for Spoken Language Identification
  • 15 Exploring Shapley Values for Blood Glucose Level Prediction from Speech
  • 16 LibriWASN: A Data Set for Meeting Separation, Diarization, and Recognition with Asynchronous Recording Devices
  • 17 Reduced-complexity Binaural Source Localization for Headphones and Hearing Aids using Low-rank DRTF Approximations
  • 18 Single Channel Source Separation in the Wild - Conversational Speech in Realistic Environments
  • 19 Subjective Performance Evaluation of Single-channel Speaker-conditioned Target Speaker Extraction Algorithms for Complex Acoustic Scenes
  • 20 Toward Semi-supervised Transcription of NAKO+ILSE: Influence of Automatic Speech Recognition Performance on Manual Transcription Effort
  • 21 Towards a Brain Computer Interface for Speech Perception
  • 22 Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model
  • 23 Investigating Speaker Embedding Disentanglement on Natural Read Speech
  • 24 BRUDEX Database: Binaural Room Impulse Responses with Uniformly Distributed External Microphones
  • 25 Comparative Analysis of the wav2vec 2.0 Feature Extractor
  • 26 Design of Low-order IIR Filters Based on Hankel Nuclear Norm Regularization for Achieving Acoustic Transparency
  • 27 Fast Tracking of Time-variant Systems Using Local Affine Subspaces
  • 28 Generalized Wiener Filter for Nonlinear Acoustic Echo Control
  • 29 Compression of End-to-end Non-autoregressive Image-to-speech System for Low-resourced Devices
  • 30 CRNN-based Multi-DOA Estimator: Comparing Classification and Regression
  • 31 Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech
  • 32 Evaluation of HRTF Models for Binaural Cue Adaptation
  • 33 Global vs. Local Federated Learning in Heterogeneous Acoustic Environments
  • 34 In-the-wild Speech Emotion Conversion Using Disentangled Self-supervised Representations and Neural Vocoder-based Resynthesis
  • 35 Low-complexity Real-time Single-channel Speech Enhancement Based on Skip-GRUs
  • 36 Multi-speaker Text-to-speech Using ForwardTacotron with Improved Duration Prediction
  • 37 On Feature Importance and Interpretability of Speaker Representations
  • 38 Quantifying Harmonic Distortions in Audio Playback Systems
  • 39 Stream-ETS: Low-latency End-to-end Speech Synthesis from Electromyography Signals
  • 40 Analyzing and Improving Neural Speaker Embeddings for ASR
  • 41 Distribution Mismatch Correction for Acoustic Scene Classification
  • 42 Exploratory Evaluation of Speech Content Masking
  • 43 Exploring Visualization Techniques for Interpretable Learning in Speech Enhancement Deep Neural Networks
  • 44 Feedback-aware Design of an Occlusion Effect Reduction System Using an Earbud-mounted Vibration Sensor
  • 45 Fuzzy-clustering-supported Assignment of Smart-speaker-based Microphone Arrays to Acoustic Sources in Reverberant Acoustic Environments
  • 46 Investigating Disentanglement of Speaker Identity and Characteristics through User Experience
  • 47 Language Recognition for SSB Modulated HF Radio Signals of Short Duration
  • 48 Self-learning and Active-learning for Electromyography-to-speech Conversion
  • 49 Target-speaker Voice Activity Detection in Multi-talker Scenarios: An Empirical Study
  • 50 Uncertainty-driven Hybrid Fusion for Audio-visual Phoneme Recognition
  • 51 On the Behavior of Intrusive and Non-intrusive Speech Enhancement Metrics in Predictive and Generative Settings
  • 52 Evaluation Metrics for Generative Speech Enhancement Methods: Issues and Perspectives
  • 53 Improving the Naturalness of Synthesized Spectrograms for TTS Using GAN-Based Post-processing
  • 54 Audio-visual Speech Enhancement with Score-based Generative Models
  • 55 A Maximum Entropy Information Bottleneck (MEIB) Regularization for Generative Speech Enhancement with HiFi-GAN
  • Stay informed!
  • ITG-Fachberichte on offer
