
Download book: Man-Machine Speech Communication: 17th National Conference, NCMMSC 2022, Hefei, China, December 15–18, 2022, Proceedings

Book details

Man-Machine Speech Communication: 17th National Conference, NCMMSC 2022, Hefei, China, December 15–18, 2022, Proceedings

Edition:
Authors: , , ,
Series: Communications in Computer and Information Science, 1765
ISBN: 9819924006, 9789819924004
Publisher: Springer
Publication year: 2023
Pages: 341 [342]
Language: English
File format: PDF (can be converted to PDF, EPUB, or AZW3 on request)
File size: 27 MB

Price (Toman): 36,000





If you would like the file of Man-Machine Speech Communication: 17th National Conference, NCMMSC 2022, Hefei, China, December 15–18, 2022, Proceedings converted to PDF, EPUB, AZW3, MOBI, or DJVU, contact support and they will convert it for you.

Note that this book is the original English-language edition, not a Persian translation. The International Library website offers original-language books only and does not provide books translated into or written in Persian.


About the book

This book constitutes the refereed proceedings of the 17th National Conference on Man–Machine Speech Communication, NCMMSC 2022, held in Hefei, China, in December 2022. The 21 full papers and 7 short papers included in this book were carefully reviewed and selected from 108 submissions. They are organized in topical sections covering, among others: MCPN: A Multiple Cross-Perception Network for Real-Time Emotion Recognition in Conversation; Baby Cry Recognition Based on Acoustic Segment Model; and MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset.



Table of Contents

Preface
Organization
Contents
MCPN: A Multiple Cross-Perception Network for Real-Time Emotion Recognition in Conversation
	1 Introduction
	2 Related Work
		2.1 Emotion Recognition in Conversation
		2.2 Dynamical Influence Model
	3 Methodology
		3.1 Problem Definition
		3.2 Multimodal Utterance Feature Extraction
		3.3 CPP: Context Pre-perception Module
		3.4 MCP: Multiple Cross-Perception Module
		3.5 Emotion Triple-Recognition Process
		3.6 Loss Function
	4 Experimental Settings
		4.1 Datasets
		4.2 Implementation Details
		4.3 Baseline Methods
	5 Results and Analysis
		5.1 Overall Performance
		5.2 Variants of Various Modalities
		5.3 Effectiveness of State Interaction Interval
		5.4 Performance on Similar Emotion Classification
		5.5 Ablation Study
		5.6 Error Study
	6 Conclusion
	References
Baby Cry Recognition Based on Acoustic Segment Model
	1 Introduction
	2 Method
		2.1 Acoustic Segment Model
		2.2 Latent Semantic Analysis
		2.3 DNN Classifier
	3 Experiments and Analysis
		3.1 Database and Data Preprocessing
		3.2 Ablation Experiments
		3.3 Overall Comparison
		3.4 Results Analysis
	4 Conclusions
	References
A Multi-feature Sets Fusion Strategy with Similar Samples Removal for Snore Sound Classification
	1 Introduction
	2 Materials and Methods
		2.1 MPSSC Database
		2.2 Feature Extraction
		2.3 Classification Model
	3 Experimental Setups
	4 Results and Discussion
		4.1 Classification Results
		4.2 Limitations and Perspectives
	5 Conclusion
	References
Multi-hypergraph Neural Networks for Emotion Recognition in Multi-party Conversations
	1 Introduction
	2 Related Work
		2.1 Emotion Recognition in Conversations
		2.2 Hypergraph Neural Network
	3 Methodology
		3.1 Hypergraph Definition
		3.2 Problem Definition
		3.3 Model
		3.4 Classifier
	4 Experimental Setting
		4.1 Datasets
		4.2 Compared Methods
		4.3 Implementation Details
	5 Results and Discussions
		5.1 Overall Performance
		5.2 Ablation Study
		5.3 Effect of Depths of GNN and Window Sizes
		5.4 Error Analysis
	6 Conclusion
	References
Using Emoji as an Emotion Modality in Text-Based Depression Detection
	1 Introduction
	2 Emoji Extraction and Depression Detection
		2.1 Emotion and Semantic Features
		2.2 Depression Detection Model
	3 Experiments
	4 Results
		4.1 Depression Detection on Social Media Text
		4.2 Depression Detection on Dialogue Text
	5 Analysis
	6 Conclusion
	References
Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis
	1 Introduction
	2 Proposed Method
		2.1 Overview
		2.2 Source Module
		2.3 Resolution-Wise Conditional Filter Module
	3 Experiments
		3.1 Experimental Setup
		3.2 Comparison Among Neural Vocoders
		3.3 Ablation Studies
	4 Conclusion
	References
Semantic Enhancement Framework for Robust Speech Recognition
	1 Introduction
	2 Related Work
		2.1 Contextual Method
		2.2 Adaptive Method
	3 Method
		3.1 Hybrid CTC/Attention Architecture
		3.2 Pre-train Language Model
		3.3 Semantic Enhancement Framework
		3.4 Evaluation Metrics
	4 Experiment
		4.1 Dataset
		4.2 Configuration
		4.3 Impact of Losses
		4.4 Results
	5 Conclusions
	References
Achieving Timestamp Prediction While Recognizing with Non-autoregressive End-to-End ASR Model
	1 Introduction
	2 Related Works
	3 Preliminaries
		3.1 Continuous Integrate-and-Fire
		3.2 Paraformer
	4 Methods
		4.1 Scaled-CIF Training
		4.2 Weights Post-processing
		4.3 Evaluation Metrics
	5 Experiments and Results
		5.1 Datasets
		5.2 Experiment Setup
		5.3 Quality of Timestamp
		5.4 ASR Results
	6 Conclusion
	References
Predictive AutoEncoders Are Context-Aware Unsupervised Anomalous Sound Detectors
	1 Introduction
	2 Related Work
		2.1 Unsupervised Anomalous Sound Detection
		2.2 Transformer
	3 Proposed Method
		3.1 Self-attention Mechanism
		3.2 The Architecture of Predictive AutoEncoder
		3.3 Training Strategy
	4 Experiments and Results
		4.1 Experimental Setup
		4.2 Results
	5 Conclusion
	References
A Pipelined Framework with Serialized Output Training for Overlapping Speech Recognition
	1 Introduction
	2 Overview and the Proposed Methods
		2.1 Pipelined Model
		2.2 Serialized Output Training
		2.3 Proposed Method
	3 Experimental Settings
		3.1 DataSet
		3.2 Training and Evaluation Metric
		3.3 Model Settings
	4 Result Analysis
		4.1 Baseline Results
		4.2 Results of Proposed Method
	5 Conclusion
	References
Adversarial Training Based on Meta-Learning in Unseen Domains for Speaker Verification
	1 Introduction
	2 Overview of the Proposed Network
	3 Method
		3.1 Adversarial Training with Multi-task Learning
		3.2 Improved Episode-Level Balanced Sampling
		3.3 Domain-Invariant Attention Module
	4 Experiments and Analysis
		4.1 Experimental Settings
		4.2 Comparative Experiments
	5 Conclusion
	References
Multi-speaker Multi-style Speech Synthesis with Timbre and Style Disentanglement
	1 Introduction
	2 The Proposed Model
		2.1 The Network Structure of Proposed Network
		2.2 Utterance Level Feature Normalization
	3 Experimental Setup
	4 Experimental Results
		4.1 Subjective Evaluation
		4.2 Ablation Study of Utterance Level Feature Normalization
		4.3 Demonstration of the Proposed Model
		4.4 Style Transition Illustration
	5 Conclusions
	References
Multiple Confidence Gates for Joint Training of SE and ASR
	1 Introduction
	2 Our Method
		2.1 Multiple Confidence Gates Enhancement Module
		2.2 Automatic Speech Recognition
		2.3 Loss Function
	3 Experiments
		3.1 Dataset
		3.2 Training Setup and Baseline
		3.3 Experimental Results and Discussion
	4 Conclusions
	References
Detecting Escalation Level from Speech with Transfer Learning and Acoustic-Linguistic Information Fusion
	1 Introduction
	2 Related Works
		2.1 Conflict Escalation Detection
		2.2 Transfer Learning
		2.3 Textual Embeddings
	3 Datasets and Methods
		3.1 Datasets
		3.2 Methods
	4 Experimental Results
		4.1 Feature Configuration
		4.2 Model Setup
		4.3 Results
	5 Discussion
	6 Conclusions
	References
Pre-training Techniques for Improving Text-to-Speech Synthesis by Automatic Speech Recognition Based Data Enhancement
	1 Introduction
	2 TTS Model
		2.1 Text Analysis Module
		2.2 Acoustic Model
		2.3 Vocoder
	3 The Proposed Approach
		3.1 Frame-Wise Phoneme Classification
		3.2 Semi-supervised Pre-training
		3.3 AdaSpeech Fine-Tuning
	4 Experiments
		4.1 Single-Speaker Mandarin Task
		4.2 Multi-speaker Chinese Dialects Task
	5 Conclusions
	References
A Time-Frequency Attention Mechanism with Subsidiary Information for Effective Speech Emotion Recognition
	1 Introduction
	2 Overview of the Speech Emotion Recognition Architecture
	3 Methods
		3.1 TC Self-attention Module
		3.2 F Domain-Attention Module
	4 Experiments
		4.1 Dataset and Acoustic Features
		4.2 System Description
	5 Conclusion
	References
Interplay Between Prosody and Syntax-Semantics: Evidence from the Prosodic Features of Mandarin Tag Questions
	1 Introduction
	2 Method
		2.1 Participants
		2.2 Stimuli
		2.3 Procedure
		2.4 Acoustic Analysis
	3 Results
		3.1 Fluctuation Scale
		3.2 Duration Ratio
		3.3 Intensity Ratio
	4 Discussion
	References
Improving Fine-Grained Emotion Control and Transfer with Gated Emotion Representations in Speech Synthesis
	1 Introduction
	2 Methodology
		2.1 Fine-Grained Emotion Strengths from Ranking Function
		2.2 The Proposed Method
	3 Experiments
		3.1 The Model Architecture of Baseline Emotional TTS
		3.2 Tasks for Experiments and Models Setup
		3.3 Basic Setups
		3.4 Task-I: Evaluating the Proposed Method for Non-transferred Emotional Speech Synthesis
		3.5 Task-II: Evaluating the Proposed Method for Cross-Speaker Emotion Transfer
		3.6 Analysis of Manually Assigning Emotion Strengths for Both Task-I and Task-II
	4 Conclusions
	References
Violence Detection Through Fusing Visual Information to Auditory Scene
	1 Introduction
	2 Methods
		2.1 CNN-ConvLSTM Model
		2.2 Attention Module
		2.3 Audio-Visual Information Fusion
	3 Experiments and Results
		3.1 Datasets
		3.2 Audio Violence Detection
		3.3 Audio-Visual Violence Detection
	4 Conclusion
	References
Mongolian Text-to-Speech Challenge Under Low-Resource Scenario for NCMMSC2022
	1 Introduction
	2 Voices to Build
		2.1 Speech Dataset
		2.2 Task
	3 Participants
	4 Evaluations and Results
		4.1 Evaluation Materials
		4.2 Evaluation Metrics
		4.3 Results
	References
VC-AUG: Voice Conversion Based Data Augmentation for Text-Dependent Speaker Verification
	1 Introduction
	2 Related Works
		2.1 Speaker Verification System
		2.2 Voice Conversion System
	3 Methods
		3.1 Pre-training and Fine-tuning
		3.2 Data Augmentation Based on the VC System
		3.3 Data Augmentation Based on the TTS System
		3.4 Speaker Augmentation Based on Speed Perturbation
	4 Experimental Results
	5 Conclusion
	References
Transformer-Based Potential Emotional Relation Mining Network for Emotion Recognition in Conversation
	1 Introduction
	2 Related Work
		2.1 Emotion Recognition in Conversation
	3 Task Definition
	4 Proposed Method
		4.1 Utterance Feature Extraction
		4.2 Emotion Extraction Module
		4.3 PERformer Module
		4.4 Emotion Classifier
		4.5 Datasets
	5 Experiments
		5.1 Datasets
		5.2 Implementation Details
		5.3 Evaluation Metrics
		5.4 Comparing Methods and Metrics
		5.5 Compared with the State-of-the-art Method
		5.6 Ablation Study
		5.7 Analysis on Parameters
		5.8 Error Study
	6 Conclusion
	References
FastFoley: Non-autoregressive Foley Sound Generation Based on Visual Semantics
	1 Introduction
	2 Method
		2.1 Audio and Visual Features Extraction
		2.2 Acoustic Model
	3 Dataset
	4 Experiments and Results
		4.1 Implementation Details
		4.2 Experiments and Results
	5 Conclusion
	References
Structured Hierarchical Dialogue Policy with Graph Neural Networks
	1 Introduction
	2 Related Work
	3 Hierarchical Reinforcement Learning
	4 ComNet
		4.1 Composite Dialogue
		4.2 Graph Construction
		4.3 ComNet as Policy Network
	5 Experiments
		5.1 PyDial Benchmark
		5.2 Implementation
		5.3 Analysis
		5.4 Transferability
	6 Conclusion
	References
Deep Reinforcement Learning for On-line Dialogue State Tracking
	1 Introduction
	2 Related Work
	3 On-line DST via Interaction
		3.1 Input and Output
		3.2 Tracking Policy
		3.3 Reward Signal
	4 Implementation Detail
		4.1 Auxiliary Polynomial Tracker
		4.2 Tracking Agents
		4.3 DDPG for Tracking Policy
	5 Joint Training Process
	6 Experiments
		6.1 Dataset
		6.2 Systems
		6.3 DRL-based DST Evaluation
		6.4 Joint Training Evaluation
	7 Conclusion
	References
Dual Learning for Dialogue State Tracking
	1 Introduction
	2 Tracker and Dual Task
		2.1 Coarse-to-Fine State Tracker
		2.2 Dual Task
	3 Dual Learning for DST
	4 Experiments
		4.1 Dataset
		4.2 Training Details
		4.3 Baseline Methods
		4.4 Results
	5 Related Work
	6 Conclusion
	References
Automatic Stress Annotation and Prediction for Expressive Mandarin TTS
	1 Introduction
	2 Methodology
		2.1 Proposed Method for Stress Detection
		2.2 Textual-Level Stress Prediction
		2.3 Modeling Stress in Acoustic Model
	3 Experiments
		3.1 Complete Stress-Controllable TTS System
		3.2 Experimental Results
	4 Conclusion
	References
MnTTS2: An Open-Source Multi-speaker Mongolian Text-to-Speech Synthesis Dataset
	1 Introduction
	2 Related Work
	3 MnTTS2 Dataset
		3.1 MnTTS
		3.2 MnTTS2
	4 Speech Synthesis Experiments
		4.1 Experimental Setup
		4.2 Naturalness Evaluation
		4.3 Speaker Similarity Evaluation
	5 Challenges and Future Work
	6 Conclusion
	References
Author Index



