Edition:
Authors: S. R. Mahadeva Prasanna, Alexey Karpov, K. Samudravijaya, Shyam S. Agrawal
Series: Lecture Notes in Computer Science, 13721
ISBN: 3031209796, 9783031209796
Publisher: Springer
Year of publication: 2022
Number of pages: 736 [737]
Language: English
File format: PDF (converted to PDF, EPUB, or AZW3 on request)
File size: 65 MB
To have the file of the book Speech and Computer: 24th International Conference, SPECOM 2022, Gurugram, India, November 14–16, 2022, Proceedings converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it for you.
The 51 full and 9 short papers presented in this volume were carefully reviewed and selected from 99 submissions. The papers present current research in the area of computer speech processing including audio signal processing, automatic speech recognition, speaker recognition, computational paralinguistics, speech synthesis, sign language and multimodal processing, and speech and language resources.
SPECOM 2022 Preface
Organization
Contents

Thematic Diversity of Everyday Russian Discourse: A Case Study Based on the ORD Corpus
  1 Introduction
  2 Data and Method
  3 Thematic Diversity of the Test Sample: Hyper-themes and Micro-themes
  4 Frequency of Themes in Russian Everyday Discourse
  5 Relative Duration of Themes in Russian Everyday Discourse
  6 Conclusion
  References

Neural Embedding Extractors for Text-Independent Speaker Verification
  1 Introduction
  2 Hybrid Neural Network (HNN) Embeddings Extractor
    2.1 2D-CNN-Based Feature Extraction Module
    2.2 TDNN-LSTM-Based Frame-Level Network
    2.3 Multi-level Global-Local Statistics Pooling
  3 Proposed Neural Embedding Extractors
    3.1 Multi-stream Hybrid Neural Network (MSHNN) Embeddings Extractor
    3.2 The Ensemble Neural Embeddings Extractor
  4 Experiments
    4.1 CNCeleb Corpus and Evaluation Metrics
    4.2 Frontend and Backend
    4.3 Experimental Setup
    4.4 Experimental Results on the CNCeleb Corpus
    4.5 Experiments on VoxCeleb Corpus
  5 Conclusion
  References

Deep Speaker Embeddings Based Online Diarization
  1 Introduction
  2 Related Work
  3 Experimental Setup
    3.1 Speaker Encoder Networks
    3.2 Training Dataset
    3.3 Testing Datasets
  4 Results and Discussion
    4.1 Diarization Performance
    4.2 Diarization in Verification Task
  5 Conclusion
  References

Overlapped Speech Detection Using AM-FM Based Time-Frequency Representations
  1 Introduction
    1.1 Motivation
  2 Features for Overlapped Speech Detection
    2.1 Instantaneous Frequency (IF) Spectrogram
    2.2 TEO-Based Pyknogram
  3 Feature Learning Using Fully-Convolutional Neural Network (F-CNN)
  4 Experiments and Results
    4.1 Dataset
    4.2 Classification Performance
    4.3 Effect of Segment Duration
    4.4 Effect of Gender Combinations Present in Overlap Speech
  5 Conclusion
  References

Significance of Dimensionality Reduction in CNN-Based Vowel Classification from Imagined Speech Using Electroencephalogram Signals
  1 Introduction
  2 Description of the Imagined Speech Database
  3 Convolutional Neural Networks Based Feature Extraction
  4 Proposed Methodology for Vowel Classification from Imagined Speech
    4.1 Variational Mode Decomposition Based EEG Signal Denoising
    4.2 PCA Based Dimensionality Reduction
    4.3 Linear Discriminant Analysis
  5 Experimental Results
  6 Summary and Conclusion
  7 Data Availability and Conflict of Interest Statements from Authors
  References

Study of Speech Recognition System Based on Transformer and Connectionist Temporal Classification Models for Low Resource Language
  1 Introduction
  2 Proposed Model
  3 Experiment and Results
  4 Discussion and Conclusion
  References

An Initial Study on Birdsong Re-synthesis Using Neural Vocoders
  1 Introduction
  2 Selected Vocoders
  3 Experimental Set-Up
    3.1 Dataset
    3.2 Objective Evaluation
    3.3 Subjective Evaluation - Species Discrimination (ABX)
    3.4 Subjective Evaluation - Bird-Related Cues (MOS)
  4 Experimental Results
    4.1 Objective Evaluation Results
    4.2 Subjective Results: Species Discrimination (ABX)
    4.3 Subjective Results: Bird-Related Cues (MOS)
  5 Discussion
  6 Conclusion
  References

Speech Music Overlap Detection Using Spectral Peak Evolutions
  1 Introduction
    1.1 Related Work
    1.2 Motivation
  2 Proposed Work
    2.1 Feature Computation
    2.2 Classifier Design
  3 Experiments and Results
    3.1 Performance Analysis
    3.2 Discussions
  4 Conclusion
  References

Influence of Accented Speech in Automatic Speech Recognition: A Case Study on Assamese L1 Speakers Speaking Code Switched Hindi-English
  1 Introduction
  2 Data Collection
    2.1 Native Hindi-English Speech Data
    2.2 Assamese Accented Hindi-English Speech Data
  3 Speech-to-Text Setup
    3.1 Normalization of Reference Transcription
  4 Results and Discussion
  5 Conclusion
  References

ClusterVote: Automatic Summarization Dataset Construction with Document Clusters
  1 Introduction
  2 Related Work
    2.1 News Summarization Datasets
    2.2 Pseudo-summary Methods
  3 Constructing Dataset with ClusterVote
    3.1 Telegram Data Clustering Contest 2020 Dataset
    3.2 ClusterVote Method
    3.3 Dataset Statistics
  4 Evaluation
    4.1 Extractive Baselines
    4.2 Abstractive Summarization Models
    4.3 Setup
    4.4 Summarization Metrics
    4.5 Factuality Metrics
  5 Conclusion
  References

Comparing Unsupervised Detection Algorithms for Audio Adversarial Examples
  1 Introduction
  2 Related Work and Background
    2.1 Automatic Speech Recognition
    2.2 Adversarial Examples in General
    2.3 Adversarial Examples in Audio
    2.4 Audio Adversarial Defenses
  3 Methodology
    3.1 Target Model
    3.2 Alarm Model
    3.3 Training Process
  4 Experiments
    4.1 Datasets
    4.2 Evaluation Methods
  5 Results and Discussion
  6 Conclusion
  References

Celtic English Continuum in Pitch Patterns of Spontaneous Talk: Evidence of Long-Term Contacts
  1 Background
  2 Methods
    2.1 Material
    2.2 Methods of Analysis
  3 Results
    3.1 Tone Frequencies in the Speech of Adolescents from the Five Cities
    3.2 The Acoustic Structure of Nuclear Tones
  4 Discussion
  5 Conclusion
  References

Coherence Based Automatic Essay Scoring Using Sentence Embedding and Recurrent Neural Networks
  1 Introduction
  2 Related Work
  3 Method
    3.1 Data Set
    3.2 Sentence Level Embedding
  4 Model
    4.1 Experiment Setup and Training
  5 Result Analysis
    5.1 Testing on Adversarial Responses
  6 Conclusion
  References

Analysis of Automatic Evaluation Metric on Low-Resourced Language: BERTScore vs BLEU Score
  1 Introduction
  2 Some Previous Work in MT Evaluation
  3 Methodology and Experimentation
    3.1 Manual Score (Human Judgment)
    3.2 BERTScore
  4 Result Analysis and Discussion
  5 Conclusion and Future Work
  References

DyCoDa: A Multi-modal Data Collection of Multi-user Remote Survival Game Recordings
  1 Introduction
  2 Related Work
  3 Corpus Design
    3.1 Participants and Privacy
    3.2 Questionnaires
    3.3 Procedure
    3.4 Winter Survival Task Scenario
    3.5 Recording Setup
  4 Collected Data
  5 Annotations
    5.1 Main Annotations
    5.2 Complement Annotations
  6 Availability
  7 Conclusion
  References

On the Use of Ensemble X-Vector Embeddings for Improved Sleepiness Detection
  1 Introduction
  2 The Dusseldorf Sleepy Language Corpus
  3 X-Vector Embeddings
    3.1 DNN Architecture
  4 Ensemble X-Vectors
    4.1 Ensemble Learning
    4.2 The Ensemble X-Vector Model
  5 Experimental Setup
    5.1 X-Vector Training
    5.2 Regression and Evaluation
  6 Experimental Results
    6.1 Model Stochasticity
    6.2 Ensemble X-Vectors
  7 Conclusions and Discussion
  References

Multiresolution Decomposition Analysis via Wavelet Transforms for Audio Deepfake Detection
  1 Introduction
  2 Related Work
  3 End-to-End Spoof Detection Systems Based on Computer Vision Architectures
    3.1 WaveletCNN Architecture
    3.2 Adversarially Robust WaveletCNN
    3.3 Median-filtering Harmonic Percussive Source Separation (HPSS)
    3.4 Additive Margin Softmax Loss
  4 Experiments and Results
    4.1 ASVspoof 2019 Logical Access (LA) Dataset
    4.2 Adversarially Robust WaveletCNN (ARWaveletCNN)
    4.3 Performance Study of Computer Vision Models
  5 Conclusion
  References

Automatic Rhythm and Speech Rate Analysis of Mising Spontaneous Speech
  1 Introduction
    1.1 Previous Work
    1.2 Motivation and Contribution
  2 Database Preparation
    2.1 Speakers
    2.2 Materials
  3 Methodology
  4 Experiments and Results
    4.1 Rhythm Metrics
    4.2 Statistical Analysis
    4.3 Automatic Language Identification Using Speech Rhythm Features and Speech Rate
  5 Conclusion and Future Directions
  References

An Electroglottographic Method for Assessing the Emotional State of the Speaker
  1 Introduction
  2 Materials and Methods
  3 Results
    3.1 EGG and Speech Features in Different Emotional States
    3.2 Comparison of EGG Parameters of Male and Female Subjects
    3.3 Statistical Data Analysis
  4 Discussion
  5 Conclusion
  References

Significance of Distance on Pop Noise for Voice Liveness Detection
  1 Introduction
  2 Proposed Work
    2.1 Motivation And Analysis For Morlet Wavelet
    2.2 Proposed Algorithm
    2.3 Distance-Based Analysis
  3 Experimental Setup
    3.1 Dataset Used
    3.2 Phoneme-wise Categorization
  4 Experimental Results
  5 Summary and Conclusion
  References

CRIM's Speech Recognition System for OpenASR21 Evaluation with Conformer and Voice Activity Detector Embeddings
  1 Introduction
  2 Dataset and Preprocessing
  3 ASR Approach
    3.1 Voice Activity Detectors
    3.2 Acoustic Models
  4 Language Model
  5 Combining Multiple Decodes
  6 Post Evaluation Improvements
  7 Conclusion
  References

Joint Changes in First and Second Formants of /a/, /i/, /u/ Vowels in Babble Noise - a New Statistical Approach
  1 Introduction
  2 Methods
  3 Results
  4 Discussion
  5 Conclusion
  References

Comparing NLP Solutions for the Disambiguation of French Heterophonic Homographs for End-to-End TTS Systems
  1 Introduction
  2 State of the Art
  3 Dataset and Models
    3.1 Our Baseline: End-to-End TTS Augmented with Phone Prediction
    3.2 Part-of-Speech (POS) Tagging
    3.3 Linear Discriminant Analysis of BERT Embeddings
  4 Results
  5 Comments
  6 Conclusions and Perspectives
  A Appendices
    A.1 Example of Embeddings of Word Pairs (B-wrd)
    A.2 Example of Embeddings of Class Pairs (B-grp)
  References

Detection of Speech Related Disorders by Pre-trained Embedding Models Extracted Biomarkers
  1 Introduction
  2 Methods
    2.1 Embedding Models
    2.2 Classification
    2.3 Evaluation Metrics
  3 Evaluation Datasets
  4 Results
  5 Discussion
  References

Multi-label Dysfluency Classification
  1 Introduction
  2 Related Work
  3 Method
    3.1 Datasets
    3.2 Transfer Learning
    3.3 Input Features
    3.4 Label Representation
    3.5 Metrics
  4 Results
  5 Discussion
  6 Conclusion
  References

Harnessing Uncertainty - Multi-label Dysfluency Classification with Uncertain Labels
  1 Introduction
  2 Related Work
  3 Method
    3.1 Datasets
    3.2 Transfer Learning
    3.3 Input Features
    3.4 Dealing with Uncertainty
    3.5 Metrics
  4 Results
  5 Discussion
  6 Conclusion
  References

Continuous Wavelet Transform for Severity-Level Classification of Dysarthria
  1 Introduction
  2 Spectrogram and Scalogram
  3 Proposed Work
    3.1 Continuous Wavelet Transform (CWT)
    3.2 Exploiting Morse Wavelet for CWT
  4 Experimental Setup
    4.1 Dataset Used
    4.2 Feature Details
    4.3 Classifier Details
    4.4 Performance Evaluation
  5 Experimental Results
    5.1 Spectrographic Analysis
    5.2 Performance Evaluation
    5.3 Visualization of Various Features Using Linear Discriminant Analysis (LDA)
  6 Summary and Conclusion
  References

Significance of Energy Features for Severity Classification of Dysarthria
  1 Introduction
  2 TEO vs. SEO
    2.1 Analysis of SEO and TEO Profile
    2.2 SECC and TECC Feature Extraction
  3 Experimental Setup
    3.1 Dataset Used
    3.2 Details of Feature Sets
    3.3 Classifier Details
    3.4 Performance Evaluation
  4 Experimental Results
    4.1 Performance Evaluation
    4.2 Visualization of Various Features Using Linear Discriminant Analysis (LDA)
    4.3 Latency Analysis
  5 Summary and Conclusion
  References

An Analytic Study on Clustering-Based Pseudo-labels for Self-supervised Deep Speaker Verification
  1 Introduction
  2 Self-supervised Speaker Embedding Extraction System
    2.1 Speaker Embedding Network
    2.2 Self-supervised Angular Additive Margin Softmax (AAMSoftmax) Objective
    2.3 Clustering-Based Pseudo-label Generation
  3 Experiments
    3.1 Experimental Setup
    3.2 Experimental Results
  4 Conclusion
  References

Investigation of Transfer Learning for End-to-End Russian Speech Recognition
  1 Introduction
  2 Related Work
  3 End-to-End Speech Recognition Model with Transfer Learning
    3.1 Architecture of the End-to-End Speech Recognition Model
    3.2 Application of Transfer Learning at Model’s Training
  4 Experiments
  5 Conclusions and Future Work
  References

Prosodic Features of Verbal Irony in Russian and French: Universal vs. Language-Specific
  1 Introduction
  2 Experiments with Original Stimuli
    2.1 Material and Methods
    2.2 Results
    2.3 Interim Conclusions
  3 Experiments with Modified Stimuli
    3.1 Method
    3.2 Results
    3.3 Interim Conclusions
  4 Discussion and Conclusion
  References

Categorization of Threatening Speech Acts
  1 Introduction
  2 Methodology
  3 Results
  4 Conclusion
  References

Assessment of Speech Quality During Speech Rehabilitation Based on the Solution of the Classification Problem
  1 Introduction
  2 Existing Approaches to Assessing Speech Quality
  3 Speech Quality Assessment Based on the Classification Problem
  4 Experiment
    4.1 Dataset
    4.2 Neural Network
    4.3 All-User Training and Personalized Training
    4.4 Obtaining Final Speech Quality Scores
  5 Discussion
  6 Conclusion
  References

Multi-level Fusion of Fisher Vector Encoded BERT and Wav2vec 2.0 Embeddings for Native Language Identification
  1 Introduction
  2 Background and Related Work
    2.1 Transformer-Based Linguistic Features
    2.2 Transformer-Based Acoustic Features
    2.3 Fisher Vector Encoding
    2.4 Kernel Extreme Learning Machines
  3 Proposed NLI Framework
    3.1 Extracting Conventional Acoustic LLDs
    3.2 Fusion Schemes
  4 Experimental Results
    4.1 ComParE 2016 Native Language Corpus
    4.2 Comparative Experiments with Unimodal Features
    4.3 Proposed Bimodal System and Ablation Studies
    4.4 Effect of Design Choices on the Proposed Pipeline
    4.5 Further Experiments
  5 Conclusions and Future Work
  References

Fake Speech Detection Using OpenSMILE Features
  1 Introduction
  2 OpenSMILE Features for Fake Speech Detection
    2.1 OpenSMILE Features
  3 Experimental Setup
    3.1 Variabilities
    3.2 Dataset Description
    3.3 Feature Selection
    3.4 System Description
  4 Results
    4.1 Train and Test on the Same Conditions
    4.2 Session Variability
    4.3 Gender Variability
    4.4 Domain Variability
    4.5 Synthesizer Variability
  5 Discussion
  6 Conclusion and Future Work
  References

Nonverbal Constituents of Argumentative Discourse: Gesture and Prosody Interaction
  1 Introduction
  2 Research Material
  3 Methodology
  4 Results
    4.1 Tones
    4.2 Tones with Gestures
    4.3 Tones and Gesture Types
  5 Discussion
  6 Conclusion
  References

Classifying Mahout and Social Interactions of Asian Elephants Based on Trumpet Calls
  1 Introduction
  2 Database
    2.1 Study Site and Subjects
    2.2 Recording Context
    2.3 Acoustic Data Collection and Categorization
  3 Mahout and Social Interaction Classification
    3.1 Pre-processing
    3.2 Experimental Setup
  4 Results and Discussion
  5 Conclusion
  References

Recognition of the Emotional State of Children with Down Syndrome by Video, Audio and Text Modalities: Human and Automatic
  1 Introduction
  2 Methods
    2.1 Participants of the Study
    2.2 Data Collection
    2.3 Perceptual Study
    2.4 Automatic Analysis of Facial Expression and Emotional Speech of Children with DS
  3 Results
    3.1 Perceptual Experiment
    3.2 Automatic Analysis of Facial Expression
    3.3 Automatic Analysis of Child Speech
  4 Discussion
  5 Conclusion
  References

Fake Speech Detection Using Modulation Spectrogram
  1 Introduction
  2 Modulation Spectrogram and Motivation
    2.1 Generation of Modulation Spectrogram
    2.2 Motivation to Use Modulation Spectrogram for Fake Speech Detection
  3 Experimental Setup
    3.1 Variabilities
    3.2 Dataset Description
    3.3 System Description
    3.4 Training Details
  4 Results
    4.1 Trained and Tested on the Same Condition
    4.2 Session Variability
    4.3 Speaker and Gender Variability
    4.4 Domain Variability
    4.5 Synthesizer Variability
  5 Discussion
  6 Conclusion and Future Work
  References

Self-Configuring Genetic Programming Feature Generation in Affect Recognition Tasks
  1 Introduction
  2 Related Work
  3 Method
    3.1 Genetic Programming
    3.2 WESAD Corpus Description
    3.3 RECOLA Corpus Description
  4 Results and Discussion
  5 Conclusion
  References

A Multi-modal Approach to Mining Intent from Code-Mixed Hindi-English Calls in the Hyperlocal-Delivery Domain
  1 Introduction
  2 Related Work
  3 Overview of Data
    3.1 Training Data for ASR
    3.2 Training Data for Intent Classification
  4 Developing the ASR System
    4.1 Wav2vec2.0
    4.2 Developing the ASR
    4.3 Final Inferencing Flow for ASR
  5 Intent Detection Through Transcripts
    5.1 Text Embeddings
  6 Results and Analysis
  7 Conclusion and Future Work
  References

Importance of Supra-Segmental Information and Self-Supervised Framework for Spoken Language Diarization Task
  1 Introduction
  2 Analysis of Bilingual Code-Switched Data
    2.1 Data Imbalance
    2.2 Acoustic Similarity
  3 Proposed Self-Supervised Framework
    3.1 Pre-training
    3.2 Fine-Tuning
  4 Experimental Setup and Results
    4.1 Database Details
    4.2 Experimental Setup
    4.3 Wav2vec2
    4.4 Performance Measure
    4.5 Results and Discussion
  5 Conclusion and Future Work
  References

Low-Resource Emotional Speech Synthesis: Transfer Learning and Data Requirements
  1 Introduction
  2 Related Work
    2.1 Data Requirements of Emotional TTS Systems
    2.2 Adversarial Training in Text-to-Speech
    2.3 Transfer Learning from Speaker Verification in Text-to-Speech
  3 Methods
  4 Training Data
    4.1 Emotional Data Validation and Subset Choice
  5 Model Hyper-Parameters
  6 Evaluation Metrics
  7 Experiments
    7.1 Transfer Learning from Speaker Verification and Data Requirements of Emotional TTS
  8 Conclusion
  References

Fuzzy Classifier for Speech Assessment in Speech Rehabilitation
  1 Introduction
  2 Experiment
    2.1 Data Description
    2.2 Fuzzy Classifier
    2.3 Rebalancing Data
  3 Results
  4 Conclusion
  References

Analysis-By-Synthesis Modeling of Bengali Intonation
  1 Introduction
  2 Data Analysis Using Momel-INTSINT Algorithm and ProZed
    2.1 Corpus
    2.2 The Momel Algorithm
    2.3 INTSINT Coding
    2.4 ProZed
    2.5 Application of Momel on Bengali Speech Data
  3 Bengali Prosodic Structure
  4 Analysis of Bengali Intonation Patterns
    4.1 Accentual Phrase (AP)
    4.2 Intermediate Phrase (Ip)
    4.3 Intonation Phrase (IP)
    4.4 Focus Tones
  5 Conclusions
  References

Neural Network Based Curve Fitting to Enhance the Intelligibility of Dysarthric Speech
  1 Introduction
  2 Related Work
  3 Methodology
    3.1 Learning the Transformation
    3.2 Transformation and Synthesis
  4 Database
  5 Objective Evaluation of Model Performance
  6 Conclusion
  References

Personalizing Retrieval-Based Dialogue Agents
  1 Introduction
  2 Related Work
  3 Methods
    3.1 Models
    3.2 Augmentation
  4 Experiments
    4.1 Datasets
    4.2 Retrieval Models Results
    4.3 Augmentation Results
  5 Discussion and Conclusion
  References

Forensic Identification of Foreign-Language Speakers by the Method of Structural-Melodic Analysis of Phonograms
  1 Introduction
  2 Method
    2.1 Informative Prosodic Characteristics of Unprepared Speech, Used in the Structural-Melodic Analysis of Phonograms
    2.2 Description Parameters for Melodic Contour Types/Subtypes
    2.3 Principles of Phonogram Comparison by the Method of Structural-Melodic Analysis
  3 Results
  4 Conclusion
  References

Logistics Translator. Concept Vision on Future Interlanguage Computer Assisted Translation
  1 Introduction
  2 Method
    2.1 Brief Outlook on Modern Computer-Assisted Translation Programs (Main Tasks, Functions and Areas of Application)
    2.2 Reason for Creation of a Computer-Assisted Translator for the “Logistics” Sublanguage
    2.3 Logistics Translator – a Professional Program for Computer Assisted Translation of Sublanguages. Operation Principle and Main Functions
  3 First Tests and Quality Evaluations. Practical Importance of the Conducted Research
  4 Results
  5 Conclusion
  References

Analysis of Time-Averaged Feature Extraction Techniques on Infant Cry Classification
  1 Introduction
  2 Proposed Work
    2.1 Mel Frequency Cepstral Coefficients (MFCC)
    2.2 Linear Frequency Cepstral Coefficients (LFCC)
    2.3 Cepstral Coefficients (CC)
    2.4 Time Averaging of Features
  3 Experimental Setup
    3.1 Dataset Used
    3.2 Classifier Parameters
    3.3 Evaluation Metric and Procedure
  4 Results and Analysis
    4.1 Spectrographic Analysis
    4.2 Performance Evaluation
  5 Summary and Conclusion
  References

Should We Believe Our Eyes or Our Ears? Processing Incongruent Audiovisual Stimuli by Russian Listeners
  1 Introduction
  2 Previous Experimental Studies of the Incongruent Audiovisual Stimuli Processing
  3 Our Experiment
    3.1 Goal
    3.2 Stimuli
    3.3 Procedure
    3.4 Participants
    3.5 The Principles of Data Analysis
    3.6 Results: Schoolchildren vs. Adults
    3.7 Results: Quantitative Analysis of Audiovisual Integration
    3.8 Results: Qualitative Analysis of Audiovisual Integration
    3.9 Results: The Influence of the Preferred Perceptual Modality
  4 Discussion and Conclusions
  References

Emotional Speech Recognition Based on Lip-Reading
  1 Introduction
  2 Related Work
  3 Dataset
  4 Methodology
  5 Evaluation Experiments
  6 Conclusions
  References

Exploring the Use of Machine Learning for Resume Recommendations
  1 Introduction
  2 State of the Art
  3 Methodology and Evaluation Criteria
    3.1 Theory
    3.2 Data Collection and Data Preparation
    3.3 Provision of Recommendations
    3.4 Quality Assessment
  4 Models
    4.1 Data Preprocessing Module
    4.2 Career Recommender Module
  5 Results
  6 Conclusion
  References

The Role of Pause in Interaction: A Case of Polylogue
  1 Introduction
  2 Methodology
  3 Results
  4 Conclusions
  References

Dictionary with the Evaluation of Positivity/Negativity Degree of the Russian Words
  1 Introduction
  2 Dictionary Structure
  3 Application of the Dictionaries
  4 Discussion
  5 Conclusion
  References

Effects of Depth of Field on Focus Using a Virtual Reality Escape Room
  1 Introduction
  2 Related Work
    2.1 Escape Rooms
    2.2 Motion Sickness
    2.3 Depth of Field Usage in Virtual Reality Applications
    2.4 Questionnaire
    2.5 Contribution
  3 Methods
    3.1 Game Development
    3.2 Depth of Field Construction
  4 Experiment
    4.1 Alpha Testing
    4.2 Questionnaire Construction
    4.3 Beta Testing
  5 Results
    5.1 Effectiveness Improvement
    5.2 Side Effects
  6 Discussion
    6.1 Limitation
    6.2 Observations
    6.3 Future Work
  7 Conclusion
  References

Dynamics of Frequency Characteristics of Visually Evoked Potentials of Electroencephalography During the Work with Brain-Computer Interfaces
  1 Introduction
  2 Problematics of the Classification
  3 Application of Deep Machine Learning to Compress Informative Features of Machine Classification
  4 Results of Machine Classification
  5 Conclusion
  References

Device Robust Acoustic Scene Classification Using Adaptive Noise Reduction and Convolutional Recurrent Attention Neural Network
  1 Introduction
    1.1 Device Distortion Analysis
  2 Proposed Method
  3 Experiments
    3.1 Dataset Description
    3.2 Signal Preprocessing
    3.3 Feature Extraction
    3.4 Neural Network Configuration
  4 Results and Discussion
  5 Conclusion
  References

Comparison of Word Embeddings of Unaligned Audio and Text Data Using Persistent Homology
  1 Introduction
  2 Topological Data Analysis
    2.1 Simplicial Complexes and Filtrations
    2.2 Persistent Homology and Betti Numbers
    2.3 Persistent Diagrams
  3 Variational Autoencoders
  4 Experiment Setup
    4.1 TIMIT Dataset
    4.2 VAE Model
    4.3 Feature Extraction
  5 Results and Discussion
  6 Conclusion
  References

Low-Cost Training of Speech Recognition System for Hindi ASR Challenge 2022
  1 Introduction
    1.1 Our Contribution
  2 Data Description and Baseline System
  3 Acoustic Modeling
  4 Combination of Models
  5 Discussion
  References

Author Index