ITG-Fb. 312: Speech Communication

15th ITG Conference, 20–22 September 2023, Aachen, Germany

Product Information

Editor: VDE ITG
ISBN: 9783800761654
Series: ITG-Fachberichte
Publisher: VDE Verlag
Publication date: 2023-11-06
Year of publication (electronic edition): 2023
Edition: New release
Pages: 287
Package: Elektrotechnik 2024 [4142]

P-ISBN: 9783800761647

Cite this publication

VDE ITG (Ed.), ITG-Fb. 312: Speech Communication (2023), VDE Verlag, Berlin, ISBN: 9783800761654

Description / Abstract

The 15th ITG Conference on Speech Communication solicited contributions on theory, algorithms, and applications in the following areas of speech, audio, and spoken language processing.

Topics:
- Speech Enhancement and Separation
- Source Localization and Tracking
- Detection and Classification of Acoustic Scenes and Events
- Automatic Speech and Speaker Recognition
- Spoken Dialogue, Diarization, and Spoken Document Retrieval Systems
- Speech Synthesis
- Speech Modeling, Coding, and Transmission
- Privacy in Speech Technologies
- Speech Production and Perception
- Speech and Audio Quality Assessment
- Paralinguistics, Speech Diagnostics and Speech-related Biosignals
- Speech in Automotive, Mobile, and Multimodal Applications
- Acoustic Interfaces, Assistive Devices, and Hearing Aids
- Hardware and Software Tools
- Emerging Topics and Applications

Table of Contents

ITG-Fachbericht 312: Speech Communication
Title Page
Imprint
Scope
Technical Program Committee
Contents
Poster Session A
Poster Session B
Lecture Session: Best Paper Award Contest
Poster Session C
Poster Session D
Lecture Session: Generative Methods for Speech Processing
01 Ad Hoc Distributed Microphones Clustering: A Comparative Analysis on Using Coherence and Signal-specific Features
02 Exploiting an External Microphone to Improve Time-difference-of-arrival Estimates for Euclidean Distance Matrix-based Source Localization
03 Hearing Impairment in Crowdsourced Speech Quality Assessments: Its Effect and Screening with Digit Triplet Hearing Test
04 Long-term Conversation Analysis: Exploring Utility and Privacy
05 Towards a Natural Reproduction of Binaural Recordings: Combining Binaural Cue Adaptation and Adaptive Crosstalk Cancellation
06 Screening of Alzheimer's Dementia up to 12 Years ahead from Conversational Speech of ILSE Study
07 Speaker's Articulatory Strategy Analysis: Theoretical Framework and Preliminary Experiment
08 Speech-based Age and Gender Prediction with Transformers
09 Transfer Learning using Musical/Non-musical Mixtures for Multi-instrument Recognition
10 U-DiT TTS: U-Diffusion Vision Transformer for Text-to-speech
11 Using Perceptual Evaluation of Speech Quality (PESQ) Loss for DNN-based Speech Enhancement
12 Advances In End-to-end Conversational Speech Quality Prediction
13 Comparative Study of LC3plus and Lyra Codec on DNN-based Source Localisation for Hearing Aids
14 Comparison of Different Neural Network Architectures for Spoken Language Identification
15 Exploring Shapley Values for Blood Glucose Level Prediction from Speech
16 LibriWASN: A Data Set for Meeting Separation, Diarization, and Recognition with Asynchronous Recording Devices
17 Reduced-complexity Binaural Source Localization for Headphones and Hearing Aids using Low-rank DRTF Approximations
18 Single Channel Source Separation in the Wild - Conversational Speech in Realistic Environments
19 Subjective Performance Evaluation of Single-channel Speaker-conditioned Target Speaker Extraction Algorithms for Complex Acoustic Scenes
20 Toward Semi-supervised Transcription of NAKO+ILSE: Influence of Automatic Speech Recognition Performance on Manual Transcription Effort
21 Towards a Brain Computer Interface for Speech Perception
22 Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model
23 Investigating Speaker Embedding Disentanglement on Natural Read Speech
24 BRUDEX Database: Binaural Room Impulse Responses with Uniformly Distributed External Microphones
25 Comparative Analysis of the wav2vec 2.0 Feature Extractor
26 Design of Low-order IIR Filters Based on Hankel Nuclear Norm Regularization for Achieving Acoustic Transparency
27 Fast Tracking of Time-variant Systems Using Local Affine Subspaces
28 Generalized Wiener Filter for Nonlinear Acoustic Echo Control
29 Compression of End-to-end Non-autoregressive Image-to-speech System for Low-resourced Devices
30 CRNN-based Multi-DOA Estimator: Comparing Classification and Regression
31 Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech
32 Evaluation of HRTF Models for Binaural Cue Adaptation
33 Global vs. Local Federated Learning in Heterogeneous Acoustic Environments
34 In-the-wild Speech Emotion Conversion Using Disentangled Self-supervised Representations and Neural Vocoder-based Resynthesis
35 Low-complexity Real-time Single-channel Speech Enhancement Based on Skip-GRUs
36 Multi-speaker Text-to-speech Using ForwardTacotron with Improved Duration Prediction
37 On Feature Importance and Interpretability of Speaker Representations
38 Quantifying Harmonic Distortions in Audio Playback Systems
39 Stream-ETS: Low-latency End-to-end Speech Synthesis from Electromyography Signals
40 Analyzing and Improving Neural Speaker Embeddings for ASR
41 Distribution Mismatch Correction for Acoustic Scene Classification
42 Exploratory Evaluation of Speech Content Masking
43 Exploring Visualization Techniques for Interpretable Learning in Speech Enhancement Deep Neural Networks
44 Feedback-aware Design of an Occlusion Effect Reduction System Using an Earbud-mounted Vibration Sensor
45 Fuzzy-clustering-supported Assignment of Smart-speaker-based Microphone Arrays to Acoustic Sources in Reverberant Acoustic Environments
46 Investigating Disentanglement of Speaker Identity and Characteristics through User Experience
47 Language Recognition for SSB Modulated HF Radio Signals of Short Duration
48 Self-learning and Active-learning for Electromyography-to-speech Conversion
49 Target-speaker Voice Activity Detection in Multi-talker Scenarios: An Empirical Study
50 Uncertainty-driven Hybrid Fusion for Audio-visual Phoneme Recognition
51 On the Behavior of Intrusive and Non-intrusive Speech Enhancement Metrics in Predictive and Generative Settings
52 Evaluation Metrics for Generative Speech Enhancement Methods: Issues and Perspectives
53 Improving the Naturalness of Synthesized Spectrograms for TTS Using GAN-Based Post-processing
54 Audio-visual Speech Enhancement with Score-based Generative Models
55 A Maximum Entropy Information Bottleneck (MEIB) Regularization for Generative Speech Enhancement with HiFi-GAN