Download the book Speech and Computer: 24th International Conference, SPECOM 2022, Gurugram, India, November 14–16, 2022, Proceedings

Book details

Speech and Computer: 24th International Conference, SPECOM 2022, Gurugram, India, November 14–16, 2022, Proceedings

Edition:
Authors:
Series: Lecture Notes in Computer Science, 13721
ISBN: 3031209796, 9783031209796
Publisher: Springer
Year of publication: 2022
Number of pages: 736 [737]
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on user request)
File size: 65 MB

Book price (toman): 53,000

If the author is Iranian, the book cannot be downloaded and the payment will be refunded.





If you need the file of Speech and Computer: 24th International Conference, SPECOM 2022, Gurugram, India, November 14–16, 2022, Proceedings converted to PDF, EPUB, AZW3, MOBI, or DJVU, you can ask support to convert it for you.

Note that this book is the original-language edition and has not been translated into Persian. The International Library website offers original-language books only and does not offer any book translated into or written in Persian.


Description of the book Speech and Computer: 24th International Conference, SPECOM 2022, Gurugram, India, November 14–16, 2022, Proceedings

This book constitutes the proceedings of the 24th International Conference on Speech and Computer, SPECOM 2022, held as a hybrid event in Gurugram, India, in November 2022.

The 51 full and 9 short papers presented in this volume were carefully reviewed and selected from 99 submissions. The papers present current research in the area of computer speech processing including audio signal processing, automatic speech recognition, speaker recognition, computational paralinguistics, speech synthesis, sign language and multimodal processing, and speech and language resources.



Table of contents

SPECOM 2022 Preface
Organization
Contents
Thematic Diversity of Everyday Russian Discourse: A Case Study Based on the ORD Corpus
	1 Introduction
	2 Data and Method
	3 Thematic Diversity of the Test Sample: Hyper-themes and Micro-themes
	4 Frequency of Themes in Russian Everyday Discourse
	5 Relative Duration of Themes in Russian Everyday Discourse
	6 Conclusion
	References
Neural Embedding Extractors for Text-Independent Speaker Verification
	1 Introduction
	2 Hybrid Neural Network (HNN) Embeddings Extractor
		2.1 2D-CNN-Based Feature Extraction Module
		2.2 TDNN-LSTM-Based Frame-Level Network
		2.3 Multi-level Global-Local Statistics Pooling
	3 Proposed Neural Embedding Extractors
		3.1 Multi-stream Hybrid Neural Network (MSHNN) Embeddings Extractor
		3.2 The Ensemble Neural Embeddings Extractor
	4 Experiments
		4.1 CNCeleb Corpus and Evaluation Metrics
		4.2 Frontend and Backend
		4.3 Experimental Setup
		4.4 Experimental Results on the CNCeleb Corpus
		4.5 Experiments on VoxCeleb Corpus
	5 Conclusion
	References
Deep Speaker Embeddings Based Online Diarization
	1 Introduction
	2 Related Work
	3 Experimental Setup
		3.1 Speaker Encoder Networks
		3.2 Training Dataset
		3.3 Testing Datasets
	4 Results and Discussion
		4.1 Diarization Performance
		4.2 Diarization in Verification Task
	5 Conclusion
	References
Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations
	1 Introduction
		1.1 Motivation
	2 Features for Overlapped Speech Detection
		2.1 Instantaneous Frequency (IF) Spectrogram
		2.2 TEO-Based Pyknogram
	3 Feature Learning Using Fully-Convolutional Neural Network (F-CNN)
	4 Experiments and Results
		4.1 Dataset
		4.2 Classification Performance
		4.3 Effect of Segment Duration
		4.4 Effect of Gender Combinations Present in Overlap Speech
	5 Conclusion
	References
Significance of Dimensionality Reduction in CNN-Based Vowel Classification from Imagined Speech Using Electroencephalogram Signals
	1 Introduction
	2 Description of the Imagined Speech Database
	3 Convolutional Neural Networks Based Feature Extraction
	4 Proposed Methodology for Vowel Classification from Imagined Speech
		4.1 Variational Mode Decomposition Based EEG Signal Denoising
		4.2 PCA Based Dimensionality Reduction
		4.3 Linear Discriminant Analysis
	5 Experimental Results
	6 Summary and Conclusion
	7 Data Availability and Conflict of Interest Statements from Authors
	References
Study of Speech Recognition System Based on Transformer and Connectionist Temporal Classification Models for Low Resource Language
	1 Introduction
	2 Proposed Model
	3 Experiment and Results
	4 Discussion and Conclusion
	References
An Initial Study on Birdsong Re-synthesis Using Neural Vocoders
	1 Introduction
	2 Selected Vocoders
	3 Experimental Set-Up
		3.1 Dataset
		3.2 Objective Evaluation
		3.3 Subjective Evaluation - Species Discrimination (ABX)
		3.4 Subjective Evaluation - Bird-Related Cues (MOS)
	4 Experimental Results
		4.1 Objective Evaluation Results
		4.2 Subjective Results: Species Discrimination (ABX)
		4.3 Subjective Results: Bird-Related Cues (MOS)
	5 Discussion
	6 Conclusion
	References
Speech Music Overlap Detection Using Spectral Peak Evolutions
	1 Introduction
		1.1 Related Work
		1.2 Motivation
	2 Proposed Work
		2.1 Feature Computation
		2.2 Classifier Design
	3 Experiments and Results
		3.1 Performance Analysis
		3.2 Discussions
	4 Conclusion
	References
Influence of Accented Speech in Automatic Speech Recognition: A Case Study on Assamese L1 Speakers Speaking Code Switched Hindi-English
	1 Introduction
	2 Data Collection
		2.1 Native Hindi-English Speech Data
		2.2 Assamese Accented Hindi-English Speech Data
	3 Speech-to-Text Setup
		3.1 Normalization of Reference Transcription
	4 Results and Discussion
	5 Conclusion
	References
ClusterVote: Automatic Summarization Dataset Construction with Document Clusters
	1 Introduction
	2 Related Work
		2.1 News Summarization Datasets
		2.2 Pseudo-summary Methods
	3 Constructing Dataset with ClusterVote
		3.1 Telegram Data Clustering Contest 2020 Dataset
		3.2 ClusterVote Method
		3.3 Dataset Statistics
	4 Evaluation
		4.1 Extractive Baselines
		4.2 Abstractive Summarization Models
		4.3 Setup
		4.4 Summarization Metrics
		4.5 Factuality Metrics
	5 Conclusion
	References
Comparing Unsupervised Detection Algorithms for Audio Adversarial Examples
	1 Introduction
	2 Related Work and Background
		2.1 Automatic Speech Recognition
		2.2 Adversarial Examples in General
		2.3 Adversarial Examples in Audio
		2.4 Audio Adversarial Defenses
	3 Methodology
		3.1 Target Model
		3.2 Alarm Model
		3.3 Training Process
	4 Experiments
		4.1 Datasets
		4.2 Evaluation Methods
	5 Results and Discussion
	6 Conclusion
	References
Celtic English Continuum in Pitch Patterns of Spontaneous Talk: Evidence of Long-Term Contacts
	1 Background
	2 Methods
		2.1 Material
		2.2 Methods of Analysis
	3 Results
		3.1 Tone Frequencies in the Speech of Adolescents from the Five Cities
		3.2 The Acoustic Structure of Nuclear Tones
	4 Discussion
	5 Conclusion
	References
Coherence Based Automatic Essay Scoring Using Sentence Embedding and Recurrent Neural Networks
	1 Introduction
	2 Related Work
	3 Method
		3.1 Data Set
		3.2 Sentence Level Embedding
	4 Model
		4.1 Experiment Setup and Training
	5 Result Analysis
		5.1 Testing on Adversarial Responses
	6 Conclusion
	References
Analysis of Automatic Evaluation Metric on Low-Resourced Language: BERTScore vs BLEU Score
	1 Introduction
	2 Some Previous Work in MT Evaluation
	3 Methodology and Experimentation
		3.1 Manual Score (Human Judgment)
		3.2 BERTScore
	4 Result Analysis and Discussion
	5 Conclusion and Future Work
	References
DyCoDa: A Multi-modal Data Collection of Multi-user Remote Survival Game Recordings
	1 Introduction
	2 Related Work
	3 Corpus Design
		3.1 Participants and Privacy
		3.2 Questionnaires
		3.3 Procedure
		3.4 Winter Survival Task Scenario
		3.5 Recording Setup
	4 Collected Data
	5 Annotations
		5.1 Main Annotations
		5.2 Complement Annotations
	6 Availability
	7 Conclusion
	References
On the Use of Ensemble X-Vector Embeddings for Improved Sleepiness Detection
	1 Introduction
	2 The Dusseldorf Sleepy Language Corpus
	3 X-Vector Embeddings
		3.1 DNN Architecture
	4 Ensemble X-Vectors
		4.1 Ensemble Learning
		4.2 The Ensemble X-Vector Model
	5 Experimental Setup
		5.1 X-Vector Training
		5.2 Regression and Evaluation
	6 Experimental Results
		6.1 Model Stochasticity
		6.2 Ensemble X-Vectors
	7 Conclusions and Discussion
	References
Multiresolution Decomposition Analysis via Wavelet Transforms for Audio Deepfake Detection
	1 Introduction
	2 Related Work
	3 End-to-End Spoof Detection Systems Based on Computer Vision Architectures
		3.1 WaveletCNN Architecture
		3.2 Adversarially Robust WaveletCNN
		3.3 Median-filtering Harmonic Percussive Source Separation (HPSS)
		3.4 Additive Margin Softmax Loss
	4 Experiments and Results
		4.1 ASVspoof 2019 Logical Access (LA) Dataset
		4.2 Adversarially Robust WaveletCNN (ARWaveletCNN)
		4.3 Performance Study of Computer Vision Models
	5 Conclusion
	References
Automatic Rhythm and Speech Rate Analysis of Mising Spontaneous Speech
	1 Introduction
		1.1 Previous Work
		1.2 Motivation and Contribution
	2 Database Preparation
		2.1 Speakers
		2.2 Materials
	3 Methodology
	4 Experiments and Results
		4.1 Rhythm Metrics
		4.2 Statistical Analysis
		4.3 Automatic Language Identification Using Speech Rhythm Features and Speech Rate
	5 Conclusion and Future Directions
	References
An Electroglottographic Method for Assessing the Emotional State of the Speaker
	1 Introduction
	2 Materials and Methods
	3 Results
		3.1 EGG and Speech Features in Different Emotional States
		3.2 Comparison of EGG Parameters of Male and Female Subjects
		3.3 Statistical Data Analysis
	4 Discussion
	5 Conclusion
	References
Significance of Distance on Pop Noise for Voice Liveness Detection
	1 Introduction
	2 Proposed Work
		2.1 Motivation And Analysis For Morlet Wavelet
		2.2 Proposed Algorithm
		2.3 Distance-Based Analysis
	3 Experimental Setup
		3.1 Dataset Used
		3.2 Phoneme-wise Categorization
	4 Experimental Results
	5 Summary and Conclusion
	References
CRIM's Speech Recognition System for OpenASR21 Evaluation with Conformer and Voice Activity Detector Embeddings
	1 Introduction
	2 Dataset and Preprocessing
	3 ASR Approach
		3.1 Voice Activity Detectors
		3.2 Acoustic Models
	4 Language Model
	5 Combining Multiple Decodes
	6 Post Evaluation Improvements
	7 Conclusion
	References
Joint Changes in First and Second Formants of /a/, /i/, /u/ Vowels in Babble Noise - a New Statistical Approach
	1 Introduction
	2 Methods
	3 Results
	4 Discussion
	5 Conclusion
	References
Comparing NLP Solutions for the Disambiguation of French Heterophonic Homographs for End-to-End TTS Systems
	1 Introduction
	2 State of the Art
	3 Dataset and Models
		3.1 Our Baseline: End-to-End TTS Augmented with Phone Prediction
		3.2 Part-of-Speech (POS) Tagging
		3.3 Linear Discriminant Analysis of BERT Embeddings
	4 Results
	5 Comments
	6 Conclusions and Perspectives
	A Appendices
		A.1 Example of Embeddings of Word Pairs (B-wrd)
		A.2 Example of Embeddings of Class Pairs (B-grp)
	References
Detection of Speech Related Disorders by Pre-trained Embedding Models Extracted Biomarkers
	1 Introduction
	2 Methods
		2.1 Embedding Models
		2.2 Classification
		2.3 Evaluation Metrics
	3 Evaluation Datasets
	4 Results
	5 Discussion
	References
Multi-label Dysfluency Classification
	1 Introduction
	2 Related Work
	3 Method
		3.1 Datasets
		3.2 Transfer Learning
		3.3 Input Features
		3.4 Label Representation
		3.5 Metrics
	4 Results
	5 Discussion
	6 Conclusion
	References
Harnessing Uncertainty - Multi-label Dysfluency Classification with Uncertain Labels
	1 Introduction
	2 Related Work
	3 Method
		3.1 Datasets
		3.2 Transfer Learning
		3.3 Input Features
		3.4 Dealing with Uncertainty
		3.5 Metrics
	4 Results
	5 Discussion
	6 Conclusion
	References
Continuous Wavelet Transform for Severity-Level Classification of Dysarthria
	1 Introduction
	2 Spectrogram and Scalogram
	3 Proposed Work
		3.1 Continuous Wavelet Transform (CWT)
		3.2 Exploiting Morse Wavelet for CWT
	4 Experimental Setup
		4.1 Dataset Used
		4.2 Feature Details
		4.3 Classifier Details
		4.4 Performance Evaluation
	5 Experimental Results
		5.1 Spectrographic Analysis
		5.2 Performance Evaluation
		5.3 Visualization of Various Features Using Linear Discriminant Analysis (LDA)
	6 Summary and Conclusion
	References
Significance of Energy Features for Severity Classification of Dysarthria
	1 Introduction
	2 TEO vs. SEO
		2.1 Analysis of SEO and TEO Profile
		2.2 SECC and TECC Feature Extraction
	3 Experimental Setup
		3.1 Dataset Used
		3.2 Details of Feature Sets
		3.3 Classifier Details
		3.4 Performance Evaluation
	4 Experimental Results
		4.1 Performance Evaluation
		4.2 Visualization of Various Features Using Linear Discriminant Analysis (LDA)
		4.3 Latency Analysis
	5 Summary and Conclusion
	References
An Analytic Study on Clustering-Based Pseudo-labels for Self-supervised Deep Speaker Verification
	1 Introduction
	2 Self-supervised Speaker Embedding Extraction System
		2.1 Speaker Embedding Network
		2.2 Self-supervised Angular Additive Margin Softmax (AAMSoftmax) Objective
		2.3 Clustering-Based Pseudo-label Generation
	3 Experiments
		3.1 Experimental Setup
		3.2 Experimental Results
	4 Conclusion
	References
Investigation of Transfer Learning for End-to-End Russian Speech Recognition
	1 Introduction
	2 Related Work
	3 End-to-End Speech Recognition Model with Transfer Learning
		3.1 Architecture of the End-to-End Speech Recognition Model
		3.2 Application of Transfer Learning at Model’s Training
	4 Experiments
	5 Conclusions and Future Work
	References
Prosodic Features of Verbal Irony in Russian and French: Universal vs. Language-Specific
	1 Introduction
	2 Experiments with Original Stimuli
		2.1 Material and Methods
		2.2 Results
		2.3 Interim Conclusions
	3 Experiments with Modified Stimuli
		3.1 Method
		3.2 Results
		3.3 Interim Conclusions
	4 Discussion and Conclusion
	References
Categorization of Threatening Speech Acts
	1 Introduction
	2 Methodology
	3 Results
	4 Conclusion
	References
Assessment of Speech Quality During Speech Rehabilitation Based on the Solution of the Classification Problem
	1 Introduction
	2 Existing Approaches to Assessing Speech Quality
	3 Speech Quality Assessment Based on the Classification Problem
	4 Experiment
		4.1 Dataset
		4.2 Neural Network
		4.3 All-User Training and Personalized Training
		4.4 Obtaining Final Speech Quality Scores
	5 Discussion
	6 Conclusion
	References
Multi-level Fusion of Fisher Vector Encoded BERT and Wav2vec 2.0 Embeddings for Native Language Identification
	1 Introduction
	2 Background and Related Work
		2.1 Transformer-Based Linguistic Features
		2.2 Transformer-Based Acoustic Features
		2.3 Fisher Vector Encoding
		2.4 Kernel Extreme Learning Machines
	3 Proposed NLI Framework
		3.1 Extracting Conventional Acoustic LLDs
		3.2 Fusion Schemes
	4 Experimental Results
		4.1 ComParE 2016 Native Language Corpus
		4.2 Comparative Experiments with Unimodal Features
		4.3 Proposed Bimodal System and Ablation Studies
		4.4 Effect of Design Choices on the Proposed Pipeline
		4.5 Further Experiments
	5 Conclusions and Future Work
	References
Fake Speech Detection Using OpenSMILE Features
	1 Introduction
	2 OpenSMILE Features for Fake Speech Detection
		2.1 OpenSMILE Features
	3 Experimental Setup
		3.1 Variabilities
		3.2 Dataset Description
		3.3 Feature Selection
		3.4 System Description
	4 Results
		4.1 Train and Test on the Same Conditions
		4.2 Session Variability
		4.3 Gender Variability
		4.4 Domain Variability
		4.5 Synthesizer Variability
	5 Discussion
	6 Conclusion and Future Work
	References
Nonverbal Constituents of Argumentative Discourse: Gesture and Prosody Interaction
	1 Introduction
	2 Research Material
	3 Methodology
	4 Results
		4.1 Tones
		4.2 Tones with Gestures
		4.3 Tones and Gesture Types
	5 Discussion
	6 Conclusion
	References
Classifying Mahout and Social Interactions of Asian Elephants Based on Trumpet Calls
	1 Introduction
	2 Database
		2.1 Study Site and Subjects
		2.2 Recording Context
		2.3 Acoustic Data Collection and Categorization
	3 Mahout and Social Interaction Classification
		3.1 Pre-processing
		3.2 Experimental Setup
	4 Results and Discussion
	5 Conclusion
	References
Recognition of the Emotional State of Children with Down Syndrome by Video, Audio and Text Modalities: Human and Automatic
	1 Introduction
	2 Methods
		2.1 Participants of the Study
		2.2 Data Collection
		2.3 Perceptual Study
		2.4 Automatic Analysis of Facial Expression and Emotional Speech of Children with DS
	3 Results
		3.1 Perceptual Experiment
		3.2 Automatic Analysis of Facial Expression
		3.3 Automatic Analysis of Child Speech
	4 Discussion
	5 Conclusion
	References
Fake Speech Detection Using Modulation Spectrogram
	1 Introduction
	2 Modulation Spectrogram and Motivation
		2.1 Generation of Modulation Spectrogram
		2.2 Motivation to Use Modulation Spectrogram for Fake Speech Detection
	3 Experimental Setup
		3.1 Variabilities
		3.2 Dataset Description
		3.3 System Description
		3.4 Training Details
	4 Results
		4.1 Trained and Tested on the Same Condition
		4.2 Session Variability
		4.3 Speaker and Gender Variability
		4.4 Domain Variability
		4.5 Synthesizer Variability
	5 Discussion
	6 Conclusion and Future Work
	References
Self-Configuring Genetic Programming Feature Generation in Affect Recognition Tasks
	1 Introduction
	2 Related Work
	3 Method
		3.1 Genetic Programming
		3.2 WESAD Corpus Description
		3.3 RECOLA Corpus Description
	4 Results and Discussion
	5 Conclusion
	References
A Multi-modal Approach to Mining Intent from Code-Mixed Hindi-English Calls in the Hyperlocal-Delivery Domain
	1 Introduction
	2 Related Work
	3 Overview of Data
		3.1 Training Data for ASR
		3.2 Training Data for Intent Classification
	4 Developing the ASR System
		4.1 Wav2vec2.0
		4.2 Developing the ASR
		4.3 Final Inferencing Flow for ASR
	5 Intent Detection Through Transcripts
		5.1 Text Embeddings
	6 Results and Analysis
	7 Conclusion and Future Work
	References
Importance of Supra-Segmental Information and Self-Supervised Framework for Spoken Language Diarization Task
	1 Introduction
	2 Analysis of Bilingual Code-Switched Data
		2.1 Data Imbalance
		2.2 Acoustic Similarity
	3 Proposed Self-Supervised Framework
		3.1 Pre-training
		3.2 Fine-Tuning
	4 Experimental Setup and Results
		4.1 Database Details
		4.2 Experimental Setup
		4.3 Wav2vec2
		4.4 Performance Measure
		4.5 Results and Discussion
	5 Conclusion and Future Work
	References
Low-Resource Emotional Speech Synthesis: Transfer Learning and Data Requirements
	1 Introduction
	2 Related Work
		2.1 Data Requirements of Emotional TTS Systems
		2.2 Adversarial Training in Text-to-Speech
		2.3 Transfer Learning from Speaker Verification in Text-to-Speech
	3 Methods
	4 Training Data
		4.1 Emotional Data Validation and Subset Choice
	5 Model Hyper-Parameters
	6 Evaluation Metrics
	7 Experiments
		7.1 Transfer Learning from Speaker Verification and Data Requirements of Emotional TTS
	8 Conclusion
	References
Fuzzy Classifier for Speech Assessment in Speech Rehabilitation
	1 Introduction
	2 Experiment
		2.1 Data Description
		2.2 Fuzzy Classifier
		2.3 Rebalancing Data
	3 Results
	4 Conclusion
	References
Analysis-By-Synthesis Modeling of Bengali Intonation
	1 Introduction
	2 Data Analysis Using Momel-INTSINT Algorithm and ProZed
		2.1 Corpus
		2.2 The Momel Algorithm
		2.3 INTSINT Coding
		2.4 ProZed
		2.5 Application of Momel on Bengali Speech Data
	3 Bengali Prosodic Structure
	4 Analysis of Bengali Intonation Patterns
		4.1 Accentual Phrase (AP)
		4.2 Intermediate Phrase (Ip)
		4.3 Intonation Phrase (IP)
		4.4 Focus Tones
	5 Conclusions
	References
Neural Network Based Curve Fitting to Enhance the Intelligibility of Dysarthric Speech
	1 Introduction
	2 Related Work
	3 Methodology
		3.1 Learning the Transformation
		3.2 Transformation and Synthesis
	4 Database
	5 Objective Evaluation of Model Performance
	6 Conclusion
	References
Personalizing Retrieval-Based Dialogue Agents
	1 Introduction
	2 Related Work
	3 Methods
		3.1 Models
		3.2 Augmentation
	4 Experiments
		4.1 Datasets
		4.2 Retrieval Models Results
		4.3 Augmentation Results
	5 Discussion and Conclusion
	References
Forensic Identification of Foreign-Language Speakers by the Method of Structural-Melodic Analysis of Phonograms
	1 Introduction
	2 Method
		2.1 Informative Prosodic Characteristics of Unprepared Speech, Used in the Structural-Melodic Analysis of Phonograms
		2.2 Description Parameters for Melodic Contour Types/Subtypes
		2.3 Principles of Phonogram Comparison by the Method of Structural-Melodic Analysis
	3 Results
	4 Conclusion
	References
Logistics Translator. Concept Vision on Future Interlanguage Computer Assisted Translation
	1 Introduction
	2 Method
		2.1 Brief Outlook on Modern Computer-Assisted Translation Programs (Main Tasks, Functions and Areas of Application)
		2.2 Reason for Creation of a Computer-Assisted Translator for the “Logistics” Sublanguage
		2.3 Logistics Translator – a Professional Program for Computer Assisted Translation of Sublanguages. Operation Principle and Main Functions
	3 First Tests and Quality Evaluations. Practical Importance of the Conducted Research
	4 Results
	5 Conclusion
	References
Analysis of Time-Averaged Feature Extraction Techniques on Infant Cry Classification
	1 Introduction
	2 Proposed Work
		2.1 Mel Frequency Cepstral Coefficients (MFCC)
		2.2 Linear Frequency Cepstral Coefficients (LFCC)
		2.3 Cepstral Coefficients (CC)
		2.4 Time Averaging of Features
	3 Experimental Setup
		3.1 Dataset Used
		3.2 Classifier Parameters
		3.3 Evaluation Metric and Procedure
	4 Results and Analysis
		4.1 Spectrographic Analysis
		4.2 Performance Evaluation
	5 Summary and Conclusion
	References
Should We Believe Our Eyes or Our Ears? Processing Incongruent Audiovisual Stimuli by Russian Listeners
	1 Introduction
	2 Previous Experimental Studies of the Incongruent Audiovisual Stimuli Processing
	3 Our Experiment
		3.1 Goal
		3.2 Stimuli
		3.3 Procedure
		3.4 Participants
		3.5 The Principles of Data Analysis
		3.6 Results: Schoolchildren vs. Adults
		3.7 Results: Quantitative Analysis of Audiovisual Integration
		3.8 Results: Qualitative Analysis of Audiovisual Integration
		3.9 Results: The Influence of the Preferred Perceptual Modality
	4 Discussion and Conclusions
	References
Emotional Speech Recognition Based on Lip-Reading
	1 Introduction
	2 Related Work
	3 Dataset
	4 Methodology
	5 Evaluation Experiments
	6 Conclusions
	References
Exploring the Use of Machine Learning for Resume Recommendations
	1 Introduction
	2 State of the Art
	3 Methodology and Evaluation Criteria
		3.1 Theory
		3.2 Data Collection and Data Preparation
		3.3 Provision of Recommendations
		3.4 Quality Assessment
	4 Models
		4.1 Data Preprocessing Module
		4.2 Career Recommender Module
	5 Results
	6 Conclusion
	References
The Role of Pause in Interaction: A Case of Polylogue
	1 Introduction
	2 Methodology
	3 Results
	4 Conclusions
	References
Dictionary with the Evaluation of Positivity/Negativity Degree of the Russian Words
	1 Introduction
	2 Dictionary Structure
	3 Application of the Dictionaries
	4 Discussion
	5 Conclusion
	References
Effects of Depth of Field on Focus Using a Virtual Reality Escape Room
	1 Introduction
	2 Related Work
		2.1 Escape Rooms
		2.2 Motion Sickness
		2.3 Depth of Field Usage in Virtual Reality Applications
		2.4 Questionnaire
		2.5 Contribution
	3 Methods
		3.1 Game Development
		3.2 Depth of Field Construction
	4 Experiment
		4.1 Alpha Testing
		4.2 Questionnaire Construction
		4.3 Beta Testing
	5 Results
		5.1 Effectiveness Improvement
		5.2 Side Effects
	6 Discussion
		6.1 Limitation
		6.2 Observations
		6.3 Future Work
	7 Conclusion
	References
Dynamics of Frequency Characteristics of Visually Evoked Potentials of Electroencephalography During the Work with Brain-Computer Interfaces
	1 Introduction
	2 Problematics of the Classification
	3 Application of Deep Machine Learning to Compress Informative Features of Machine Classification
	4 Results of Machine Classification
	5 Conclusion
	References
Device Robust Acoustic Scene Classification Using Adaptive Noise Reduction and Convolutional Recurrent Attention Neural Network
	1 Introduction
		1.1 Device Distortion Analysis
	2 Proposed Method
	3 Experiments
		3.1 Dataset Description
		3.2 Signal Preprocessing
		3.3 Feature Extraction
		3.4 Neural Network Configuration
	4 Results and Discussion
	5 Conclusion
	References
Comparison of Word Embeddings of Unaligned Audio and Text Data Using Persistent Homology
	1 Introduction
	2 Topological Data Analysis
		2.1 Simplicial Complexes and Filtrations
		2.2 Persistent Homology and Betti Numbers
		2.3 Persistent Diagrams
	3 Variational Autoencoders
	4 Experiment Setup
		4.1 TIMIT Dataset
		4.2 VAE Model
		4.3 Feature Extraction
	5 Results and Discussion
	6 Conclusion
	References
Low-Cost Training of Speech Recognition System for Hindi ASR Challenge 2022
	1 Introduction
		1.1 Our Contribution
	2 Data Description and Baseline System
	3 Acoustic Modeling
	4 Combination of Models
	5 Discussion
	References
Author Index



