Editors: Shiqi Yu, Zhaoxiang Zhang, Pong C. Yuen, Junwei Han, Tieniu Tan, Yike Guo, Jianhuang Lai, Jianguo Zhang
Series: LNCS, volume 13536
ISBN: 9783031189128, 9783031189135
Publisher: Springer
Year: 2022
Pages: 775 [789]
Language: English
File format: PDF (can be converted to PDF, EPUB, or AZW3 on request)
File size: 110 MB
To have the file of Pattern Recognition and Computer Vision: 5th Chinese Conference, PRCV 2022, Shenzhen, China, November 4–7, 2022, Proceedings, Part III converted to PDF, EPUB, AZW3, MOBI, or DJVU, notify support and they will convert it for you.
Note that Pattern Recognition and Computer Vision: 5th Chinese Conference, PRCV 2022, Shenzhen, China, November 4–7, 2022, Proceedings, Part III is the original-language edition, not a Persian translation. The International Library website offers only original-language books and does not provide any books translated into or written in Persian.
The 4-volume set LNCS 13534, 13535, 13536 and 13537 constitutes the refereed proceedings of the 5th Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2022, held in Shenzhen, China, in November 2022. The 233 full papers presented were carefully reviewed and selected from 564 submissions. The papers have been organized in the following topical sections: Theories and Feature Extraction; Machine Learning, Multimedia and Multimodal; Optimization and Neural Network and Deep Learning; Biomedical Image Processing and Analysis; Pattern Classification and Clustering; 3D Computer Vision and Reconstruction, Robots and Autonomous Driving; Recognition, Remote Sensing; Vision Analysis and Understanding; Image Processing and Low-level Vision; Object Detection, Segmentation and Tracking.
Preface
Organization
Contents – Part III

3D Computer Vision and Reconstruction, Robots and Autonomous Driving

  Locally Geometry-Aware Improvements of LOP for Efficient Skeleton Extraction
    1 Introduction
    2 Related Work
    3 The Improved LOP
      3.1 Overview
      3.2 Bilateral Filter Based Weighting
      3.3 Adaptive Radius
    4 Experimental Results
    5 Conclusions
    References

  Spherical Transformer: Adapting Spherical Signal to Convolutional Networks
    1 Introduction
    2 Related Work
    3 The Proposed Approach
      3.1 Spherical Sampling
      3.2 Spherical Transformer Module
      3.3 Network Architecture
    4 Experiments
      4.1 Spherical MNIST
      4.2 3D Object Classification
      4.3 Spherical Image Semantic Segmentation
    5 Conclusion
    References

  Waterfall-Net: Waterfall Feature Aggregation for Point Cloud Semantic Segmentation
    1 Introduction
    2 Related Work
    3 Waterfall-Net
      3.1 Cascaded Sub-Networks Encoder
      3.2 Learn to Upsample
    4 Experiments
      4.1 Analysis of Waterfall-Net Architecture
      4.2 Results and Visualization
    5 Conclusion
    References

  Sparse LiDAR and Binocular Stereo Fusion Network for 3D Object Detection
    1 Introduction
    2 Related Work
      2.1 LiDAR-Based 3D Object Detection
      2.2 Monocular-Based 3D Object Detection
      2.3 Stereo-Based 3D Object Detection
      2.4 Multi-modal 3D Object Detection
    3 Proposed Method
      3.1 Feature Extraction
      3.2 Attention Fusion
      3.3 3D Object Information Regression Prediction
      3.4 Implementation Details
    4 Experiments
      4.1 KITTI Dataset
      4.2 Evaluation Metrics
      4.3 Main Results
      4.4 Ablation Study
    5 Conclusion
    References

  Full Head Performance Capture Using Multi-scale Mesh Propagation
    1 Introduction
    2 Related Work
    3 Template Fitting Based Dynamic Full Head Performance Capture
      3.1 Per-frame Multi-view Scan Reconstruction
      3.2 Template Warping
      3.3 Multi-scale Mesh Propagation
      3.4 Ear Reconstruction
    4 Experimental Results
    5 Conclusion
    References

  Learning Cross-Domain Features for Domain Generalization on Point Clouds
    1 Introduction
    2 Related Work
      2.1 Deep Learning on Point Cloud
      2.2 Unsupervised Domain Adaptation
      2.3 Domain Generalization
    3 Network Architecture
      3.1 Point Set Mask
      3.2 Cross-Domain Mixup
      3.3 Hierarchical Feature Alignment
      3.4 Overall Loss
    4 Experiments and Results
      4.1 Dataset
      4.2 Comparative Methods
      4.3 Implementation Details
      4.4 Results
      4.5 Ablation Study
    5 Conclusion
    References

  Unsupervised Pre-training for 3D Object Detection with Transformer
    1 Introduction
    2 Related Work
      2.1 Object Detection on Point Clouds
      2.2 Unsupervised Representation Learning on Point Clouds
    3 UP3DETR
      3.1 Pre-training
      3.2 Fine-tuning
    4 Experiments
      4.1 ScanNetV2 Object Detection
      4.2 SUN RGB-D Object Detection
      4.3 Ablations
    5 Conclusion
    References

  Global Patch Cross-Attention for Point Cloud Analysis
    1 Introduction
    2 Related Work
      2.1 Multi-view Based and Voxelized Methods
      2.2 Point Based Method
      2.3 Attention Based Method
    3 Method
      3.1 Overview
      3.2 Global Patch Construction
      3.3 Local-Global Feature Aggregation
    4 Experiment
      4.1 Point Cloud Analysis
      4.2 Analysis of GPCAN
    5 Conclusion
    References

  EEP-Net: Enhancing Local Neighborhood Features and Efficient Semantic Segmentation of Scale Point Clouds
    1 Introduction
    2 Related Work
      2.1 Projection-Based Methods
      2.2 Discretization-Based Methods
      2.3 Point-Based Methods
    3 EEP-Net
      3.1 Architecture of EEP Module
      3.2 Global Feature (GF)
      3.3 Architecture of EEP-Net
    4 Experiments
      4.1 Evaluation on S3DIS Dataset
      4.2 Ablation Study
    5 Conclusion
    References

  CARR-Net: Leveraging on Subtle Variance of Neighbors for Point Cloud Semantic Segmentation
    1 Introduction
    2 Related Work
    3 Method
      3.1 CARR Module
      3.2 CARRs
      3.3 Overall Architecture
    4 Experiments
      4.1 Setup
      4.2 Evaluation on S3DIS Dataset
      4.3 Evaluation on SemanticKITTI Dataset
      4.4 Ablation Study
    5 Conclusion
    References

  3D Meteorological Radar Data Visualization with Point Cloud Completion and Poisson Surface Reconstruction
    1 Introduction
    2 Method Description
      2.1 Data Format Conversion
      2.2 Echo Models and Completion Algorithms
      2.3 Drawing 3D Surface Using Bilateral-Filtering Poisson Surface Reconstruction Algorithm (BPSR)
    3 Experiments and Analysis
      3.1 Comparison of Meteorological Radar Data Completion
      3.2 Comparison of Point Cloud Completion Experiments
    4 Conclusion
    References

  JVLDLoc: A Joint Optimization of Visual-LiDAR Constraints and Direction Priors for Localization in Driving Scenario
    1 Introduction
    2 Related Work
      2.1 Visual-LiDAR SLAM
      2.2 Using Vanishing Points as Direction Constraint
    3 Notation
    4 Method
      4.1 Tracking
      4.2 Local Mapping
      4.3 Global Mapping Using Direction Priors
    5 Experiments
      5.1 Improvements over Prior Map
      5.2 Effects of Direction Priors
      5.3 Comparison to Other Methods on KITTI Odometry Dataset
      5.4 Ablation Study
    6 Conclusion
    References

  A Single-Pathway Biomimetic Model for Potential Collision Prediction
    1 Introduction
    2 Related Work
    3 Proposed Method
      3.1 Problem Formulation
      3.2 The Single-Pathway LGMD2
      3.3 Collision Prediction Criteria
    4 Experiments and Analyses
      4.1 Datasets and Competing Methods
      4.2 Parameter Setting
      4.3 Evaluation Metrics
      4.4 Experimental Results
    5 Conclusions
    References

  PilotAttnNet: Multi-modal Attention Network for End-to-End Steering Control
    1 Introduction
    2 Related Work
      2.1 Driving Model
      2.2 Driving Dataset
    3 Method
      3.1 Spatial Information Encoding
      3.2 The End-to-End Attentional Driving Model
    4 Experiments
      4.1 Dataset Description
      4.2 Evaluation Metrics
      4.3 Results
    5 Conclusion
    References

  Stochastic Navigation Command Matching for Imitation Learning of a Driving Policy
    1 Introduction
    2 Related Works
    3 Method
      3.1 Problem Formulation
      3.2 Backbone Network
      3.3 Navigation Command
      3.4 Multi-branch Architecture
      3.5 Stochastic Navigation Command Matching
      3.6 Training
    4 Experiments
      4.1 Experiment Setting
      4.2 Quantitative Comparison
      4.3 Qualitative Comparison
      4.4 Visualization Results
    5 Conclusions
    References

Recognition, Remote Sensing

  Group Activity Representation Learning with Self-supervised Predictive Coding
    1 Introduction
    2 Related Work
    3 Approach
      3.1 Spatial Graph Transformer Encoder
      3.2 Temporal Causal Transformer Decoder
      3.3 Joint Learning Measure
    4 Experiments
      4.1 Datasets
      4.2 Implementation Details
      4.3 Ablations on Volleyball
      4.4 Comparison with State-of-the-Art
    5 Conclusion
    References

  Skeleton-Based Action Quality Assessment via Partially Connected LSTM with Triplet Losses
    1 Introduction
    2 Related Work
      2.1 Action Quality Assessment
      2.2 Graph-Based Methods
    3 Methods
      3.1 Joints Graph and Activation Matrix
      3.2 Partially Connected Layer
      3.3 Partially Connected LSTM
      3.4 Triplet Loss
    4 Experiments
      4.1 Evaluation Datasets and Settings
      4.2 Data Preprocessing
      4.3 Experimental Results and Analysis
      4.4 Complexity Analysis
    5 Conclusion
    References

  Hierarchical Long-Short Transformer for Group Activity Recognition
    1 Introduction
    2 Related Work
      2.1 Group Activity Recognition
      2.2 Transformer
    3 Methodology
      3.1 Overview of HLSTrans
      3.2 Long-Short Transformer Block
      3.3 Hierarchical Structure
      3.4 Position Bias
    4 Experiments
      4.1 Datasets
      4.2 Implementation
      4.3 Comparison to Others
    5 Conclusion
    References

  GNN-Based Structural Dynamics Simulation for Modular Buildings
    1 Introduction
    2 Methodology
      2.1 Graph Representation Method
      2.2 GNN Model
    3 Numerical Studies
      3.1 Numerical Examples of Three Spring-Mass Systems
      3.2 Training and Prediction Results
    4 Conclusion
    References

  Semantic-Augmented Local Decision Aggregation Network for Action Recognition
    1 Introduction
    2 Proposed Approach
      2.1 LDNet
      2.2 Semantic Information Module
      2.3 Combining Semantic Information Module with LDNet
    3 Experiments
      3.1 Datasets and Implementation Details
      3.2 Ablation Study
    4 Conclusions
    References

  Consensus-Guided Keyword Targeting for Video Captioning
    1 Introduction
    2 Related Work
      2.1 Video Captioning
      2.2 Video Captioning Datasets
    3 Method
      3.1 Encoder-Decoder Framework
      3.2 Consensus-Guided Loss
      3.3 Keyword Targeting Loss
      3.4 Consensus-Guided Keyword Targeting Captioning Model
    4 Experiments
      4.1 Datasets and Metrics
      4.2 Implementation Details
      4.3 Quantitative Results
      4.4 Qualitative Results
    5 Conclusion
    References

  Handwritten Mathematical Expression Recognition via GCAttention-Based Encoder and Bidirectional Mutual Learning Transformer
    1 Introduction
    2 Related Work
      2.1 Image-to-Markup
      2.2 CNN
      2.3 Global Contextual Attention
      2.4 Transformer
      2.5 Mutual Learning
    3 Methodology
      3.1 Encoder
      3.2 Decoder
      3.3 Positional Encoding
    4 Experiments
      4.1 Datasets
      4.2 Comparison with Prior Works
      4.3 Ablation Study
      4.4 The Program with GUI
    5 Conclusion
    References

  Semi- and Self-supervised Learning for Scene Text Recognition with Fewer Labels
    1 Introduction
    2 Background and Related Work
      2.1 Scene Text Recognition
      2.2 Datasets
      2.3 Self-supervised Learning
    3 Method
      3.1 Architecture
      3.2 Data Augmentation
      3.3 Loss Function
      3.4 Pseudo-labeling
    4 Experiments
      4.1 Performance on Real Scene Datasets
      4.2 Performance on Synthetic Datasets
      4.3 Comparison with State-of-the-Art Models
    5 Conclusion
    References

  TMCR: A Twin Matching Networks for Chinese Scene Text Retrieval
    1 Introduction
    2 Related Work
    3 Methods
      3.1 Detection Module
      3.2 Recognition Module
      3.3 Similarity Module
      3.4 Loss and Training
    4 Experiments
      4.1 Dataset and Implementation Details
      4.2 Comparisons with State-of-the-Art
      4.3 Ablation Study
      4.4 Model Generalization
    5 Conclusion
    References

  Thai Scene Text Recognition with Character Combination
    1 Introduction
    2 Methodology
      2.1 Recognition Architecture
      2.2 Thai Character Combination
    3 Experimental Setting
      3.1 Thai STR Datasets
      3.2 Data Preparation
      3.3 Model Configurations and Training
      3.4 Evaluation Metric
    4 Experimental Results and Analyses
      4.1 Experimental Results
      4.2 The Effectiveness of TCC
      4.3 Failure Cases Analysis
    5 Conclusion
    References

  Automatic Examination Paper Scores Calculation and Grades Analysis Based on OpenCV
    1 Introduction
    2 Methodology
      2.1 Image Acquisition
      2.2 Image Processing
      2.3 Data Processing
    3 Experimental Results and Discussions
    4 Conclusions
    References

  Efficient License Plate Recognition via Parallel Position-Aware Attention
    1 Introduction
    2 Related Work
      2.1 License Plate Datasets
      2.2 License Plate Recognition
    3 Methods
      3.1 Feature Encoder
      3.2 Parallel Position-Aware Attention
      3.3 Character Decoder
      3.4 Loss Function
    4 Data Synthesis
    5 Experiments
      5.1 Datasets
      5.2 Experiment Settings
      5.3 Experimental Results
    6 Conclusions
    References

  Semantic-Aware Non-local Network for Handwritten Mathematical Expression Recognition
    1 Introduction
    2 Related Works
      2.1 Grammar-Based HMER
      2.2 Encoder-Decoder Based HMER
    3 Methodology
      3.1 Non-local Neural Networks
      3.2 FastText Language Model
    4 Experiments
      4.1 Datasets
      4.2 Metrics
      4.3 Results
    5 Conclusion
    References

  Math Word Problem Generation with Memory Retrieval
    1 Introduction
    2 Related Work
      2.1 Math Word Problem Generation
      2.2 Memory Retrieval for Text Generation
    3 Problem Setup
      3.1 MWPG
      3.2 Low-Resource MWPG
    4 Proposed Approach
      4.1 Overview
      4.2 Retrieval Module
      4.3 Generation Module
      4.4 Training
    5 Experiments
      5.1 Datasets
      5.2 Metrics
      5.3 Implementation Details
      5.4 Baselines
      5.5 Quantitative Results
      5.6 Qualitative Results
    6 Conclusions
    References

  Traditional Mongolian Script Standard Compliance Testing Based on Deep Residual Network and Spatial Pyramid Pooling
    1 Introduction
    2 Related Work
    3 Model Architecture
      3.1 Convolutional Layers with Residual Learning
      3.2 The Spatial Pyramid Pooling Layer
    4 Experiment and Analysis
      4.1 Data and Experimental Environment
      4.2 Evaluation Metrics
      4.3 Results
    5 Conclusion
    References

  FOV Recognizer: Telling the Field of View of Movie Shots
    1 Introduction
    2 Related Works
      2.1 Human Detection
      2.2 Field of View Recognition Method
    3 Movie Field of View Dataset (MFOVD)
    4 Field of View Recognition Method
    5 Experiments
      5.1 Recognition on Movie Field of View Dataset (MFOVD)
      5.2 Recognition on a Full Movie
    6 Conclusion
    References

  Multi-level Temporal Relation Graph for Continuous Sign Language Recognition
    1 Introduction
    2 Related Work
      2.1 Sign Language Recognition
      2.2 Video Contexts Modeling
      2.3 Graph Convolutional Network
    3 Our Approach
      3.1 Visual Model
      3.2 Multi-level Temporal Relation Graph
      3.3 Alignment Model
    4 Results and Discussion
      4.1 Dataset
      4.2 Experimental Setup
      4.3 Comparison with SOTA Methods
      4.4 Model Validity Experiment
    5 Conclusions
    References

  Beyond Vision: A Semantic Reasoning Enhanced Model for Gesture Recognition with Improved Spatiotemporal Capacity
    1 Introduction
    2 Related Work
      2.1 Temporal Information Model
      2.2 Attention Mechanism
      2.3 Semantic Information Model
    3 Method
      3.1 The Overview of the Network
      3.2 Long and Short-term Temporal Shift Module (LS-TSM)
      3.3 Spatial Attention Module
      3.4 Label Relation Module
    4 Experiment
      4.1 Datasets
      4.2 Implementation Details
      4.3 Comparison with the State of the Art
      4.4 Ablation Study
    5 Conclusion
    References

  SemanticGAN: Facial Image Editing with Semantic to Realize Consistency
    1 Introduction
    2 Related Works
      2.1 Generative Adversarial Networks
      2.2 Facial Image Editing
    3 Proposed Method
      3.1 Preliminary
      3.2 Attribute-Related Fine Editing
      3.3 Attribute-Independent Optimization
    4 Experiments
      4.1 Implementation Details
      4.2 Attribute Face Editing
      4.3 Editing with SemanticGAN
      4.4 Ablation Studies
    5 Conclusion
    References

  Least-Squares Estimation of Keypoint Coordinate for Human Pose Estimation
    1 Introduction
    2 Proposed Method
      2.1 Encoding
      2.2 Decoding
    3 Experiments
      3.1 Datasets and Evaluation
      3.2 Comparison with Other Methods
      3.3 Ablation Study
    4 Conclusion
    References

  Joint Pixel-Level and Feature-Level Unsupervised Domain Adaptation for Surveillance Face Recognition
    1 Introduction
    2 Related Work
      2.1 Deep Face Recognition
      2.2 Unsupervised Domain Adaptation
    3 Methodology
      3.1 Training of Feature Extractor
      3.2 Training of Domain Classifier
      3.3 Training of Style Transformer
    4 Experiment
      4.1 Datasets
      4.2 Details of Training
      4.3 Ablation Experiment
      4.4 Quantity Comparison
      4.5 Comparison
    5 Conclusion
    References

  Category-Oriented Adversarial Data Augmentation via Statistic Similarity for Satellite Images
    1 Introduction
    2 Related Works
      2.1 Data Augmentation
      2.2 Appearance Properties
    3 Proposed Method
      3.1 Problem Definition and Basic Solutions
      3.2 Statistic Similarity Evaluation
      3.3 Adversarial Generation Between Similar Categories
      3.4 Task of Object Detection
    4 Experimental Results
    5 Conclusion
    References

  A Multi-scale Convolutional Neural Network Based on Multilevel Wavelet Decomposition for Hyperspectral Image Classification
    1 Introduction
    2 Related Works
      2.1 2D Discrete Wavelet Transform
      2.2 DenseNet
    3 Proposed Framework
    4 Experimental Results and Discussion
      4.1 HSI Datasets
      4.2 Experimental Setting
      4.3 Results and Discussion
    5 Conclusion
    References

  High Spatial Resolution Remote Sensing Imagery Classification Based on Markov Random Field Model Integrating Granularity and Semantic Features
    1 Introduction
    2 Background on MRF-Based Methods
      2.1 MRF Model with Different Granularities
      2.2 MRF Model with Multilayer
    3 Proposed Method
      3.1 MRF Model
      3.2 Proposed MRF-MM Model
    4 Experimental Results
      4.1 Data
      4.2 Classification Experiment
      4.3 Test of the MRF-MM Model Parameters
    5 Conclusion
    References

  Feature Difference Enhancement Fusion for Remote Sensing Image Change Detection
    1 Introduction
    2 Related Work
      2.1 Traditional Change Detection Methods
      2.2 Deep Learning Based Change Detection Methods
    3 Method
      3.1 Overall Structure of Proposed CD Architecture
      3.2 Difference Enhancement Fusion Module (DEFM)
    4 Experiments
      4.1 Experimental Setup
      4.2 Experimental Results
      4.3 Ablation Studies
    5 Conclusion
    References

  WAFormer: Ship Detection in SAR Images Based on Window-Aware Swin-Transformer
    1 Introduction
    2 Related Works
      2.1 SAR Target Detection Based on Deep Learning
      2.2 Vision Transformer
    3 Method
      3.1 Motivation
      3.2 Overview
      3.3 Variable Size Window Self-attention
      3.4 WAFormer Block
    4 Experiments
      4.1 Dataset and Evaluation Metrics
      4.2 Implementation Details
      4.3 Comparison Results
      4.4 Related Configuration Adjustment
    5 Conclusion
    References

  EllipseIoU: A General Metric for Aerial Object Detection
    1 Introduction
    2 Related Work
      2.1 Aerial Object Detection
      2.2 IoU-Based Metrics
    3 EllipseIoU Loss
      3.1 EllipseIoU
      3.2 EllipseIoU Loss
      3.3 Discussion on Several IoU-Based Metrics
    4 Experimental Results
      4.1 Datasets
      4.2 Results on DOTA and HRSC2016 Dataset
    5 Conclusion
    References

  Transmission Tower Detection Algorithm Based on Feature-Enhanced Convolutional Network in Remote Sensing Image
    1 Introduction
    2 Dataset Production
      2.1 Data Collection
      2.2 Dataset Production
      2.3 Dataset Expansion
      2.4 YOLOv3 Algorithm Principle
      2.5 Algorithms in This Paper
    3 Experiments and Results Analysis
      3.1 Experimental Setup
      3.2 Analysis of Results
    4 Conclusion
    References

Vision Analysis and Understanding

  Mining Diverse Clues with Transformers for Person Re-identification
    1 Introduction
    2 Related Work
    3 Proposed Method
      3.1 Vision Transformer as Feature Extractor
      3.2 Person ReID with Transformers
      3.3 MDCTNet Architecture
    4 Experiments
      4.1 Datasets
      4.2 Implementation Details
      4.3 Ablation Study
      4.4 Comparison with State-of-the-Arts
    5 Conclusion
    References

  Mutual Learning Inspired Prediction Network for Video Anomaly Detection
    1 Introduction
    2 Mutual Learning Inspired Prediction Network
      2.1 Framework
      2.2 Boundary Perception-Based Mimicry Loss
      2.3 Self-supervised Weighted Loss
      2.4 Objective Function
      2.5 Anomaly Detection on Testing Data
    3 Experiment
      3.1 Dataset
      3.2 Evaluation Metrics
      3.3 Comparison with Existing Methods
      3.4 Running Time
    4 Conclusion
    References

  Weakly Supervised Video Anomaly Detection with Temporal and Abnormal Information
    1 Introduction
    2 Related Work
      2.1 Unsupervised Video Anomaly Detection
      2.2 Weakly-Supervised Video Anomaly Detection
      2.3 Multiple Instance Learning
      2.4 Pair-Based Loss in Deep Metric Learning
    3 Approach
      3.1 Temporal Strengthen Network
      3.2 Multi-positive Sample MIL
      3.3 Anomaly Samples' N-Pair Loss
    4 Experiment
      4.1 Datasets and Metrics
      4.2 Implementation Details
      4.3 Results on ShanghaiTech
      4.4 Results on UCF-Crime
      4.5 Ablation Studies
      4.6 Qualitative Analysis
    5 Conclusion
    References

  Towards Class Interpretable Vision Transformer with Multi-Class-Tokens
    1 Introduction
    2 Related Work
      2.1 Vision Transformer
      2.2 Heatmap-Based Visual Interpretability
    3 Proposed Method
      3.1 Overview of Proposed Approach
      3.2 Multi-Class-Tokens and Cross Attention
      3.3 Non-parametric Scoring Function
      3.4 Heatmap Based Per-class Interpretability
    4 Experiments
      4.1 Datasets
      4.2 Implementation Details
      4.3 Comparison Results
      4.4 Ablation Study
    5 Conclusion
    References

  Multimodal Violent Video Recognition Based on Mutual Distillation
    1 Introduction
    2 Related Work
      2.1 Violent Video Recognition
      2.2 Self-supervised Learning
      2.3 Knowledge Distillation
    3 Methods
      3.1 Mutual Distillation for Violent RGB Feature
      3.2 MAF-Net
    4 Experiments
      4.1 Datasets and Metrics
      4.2 Experiments on Mutual Distillation for Violent RGB Feature
      4.3 Experiments on Multimodal Feature Fusion
      4.4 Comparison with Others
    References

  YFormer: A New Transformer Architecture for Video-Query Based Video Moment Retrieval
    1 Introduction
    2 Related Work
      2.1 Transformer in Computer Vision
      2.2 Video Moment Retrieval
    3 YFormer for Video Moment Retrieval
      3.1 Spatio-Temporal Feature Extractor
      3.2 Semantic Relevance Matcher
      3.3 Prediction Heads
      3.4 Losses
    4 Experiments
      4.1 Experiment Setup
      4.2 Quantitative Results
      4.3 Qualitative Results
      4.4 Ablation Study
    5 Conclusion
    References

  Hightlight Video Detection in Figure Skating
    1 Introduction
    2 Related Works
      2.1 Action Quality Assessment
      2.2 Figure Skating
      2.3 Temporal Action Segmentation
    3 Approach
      3.1 Overview
      3.2 Video Segmentation
      3.3 Tube Self-Attention
      3.4 Frame Scoring
    4 Experiments
      4.1 Dataset
      4.2 Implementation Details
      4.3 Results During Training
      4.4 Ablation Study
      4.5 Results in the Singles Figure Skating Competition
    5 Conclusion
    References

  Memory Enhanced Spatial-Temporal Graph Convolutional Autoencoder for Human-Related Video Anomaly Detection
    1 Introduction
    2 Related Work
      2.1 Video Anomaly Detection
      2.2 Graph Convolutional Networks
      2.3 Memory Networks
    3 Method
      3.1 Preprocessing
      3.2 Network Architecture
      3.3 Loss Function
      3.4 Anomaly Detection
    4 Experiments
      4.1 Datasets
      4.2 Implementation Details
      4.3 Evaluation
      4.4 Ablation Studies
    5 Conclusions
    References

  Background Suppressed and Motion Enhanced Network for Weakly Supervised Video Anomaly Detection
    1 Introduction
    2 Related Work
      2.1 Unsupervised Video Anomaly Detection
      2.2 Weakly Supervised Video Anomaly Detection
    3 Background Suppressed and Motion Enhanced Network (BSMEN)
      3.1 Motion Discrimination Sequence Extraction (MDSE)
      3.2 Background Suppressed and Motion Enhanced Module (BSMEM)
      3.3 Loss Function
    4 Experiments
      4.1 Datasets
      4.2 Evaluation Metric
      4.3 Implementation Details
      4.4 Comparison with State-of-the-Art Methods
      4.5 Ablation Studies
    5 Conclusions
    References

  Dirt Detection and Segmentation Network for Autonomous Washing Robots
    1 Introduction
    2 Related Works
    3 Method
      3.1 SVDD (Support Vector Data Description)
      3.2 Deep SVDD
      3.3 DDSN (Dirt Detection and Segmentation Network)
    4 Evaluation
      4.1 Experiment Setup
      4.2 Experimental Result on MVTecAD Dataset
      4.3 Experimental Result on Dirt Dataset
    5 Conclusion
    References

  Finding Beautiful and Happy Images for Mental Health and Well-Being Applications
    1 Introduction
    2 Related Work
    3 A Beautiful Natural Image Database (BNID)
      3.1 Collecting the Images
      3.2 Collecting Beautifulness and Happiness Scores
      3.3 Analysis of Beautifulness and Happiness Scores
    4 Beautifulness and Happiness Assessment
      4.1 Image Beautifulness Assessment
      4.2 Loss Functions
      4.3 Final Score
      4.4 Extension to Image Happiness Prediction
    5 Experiments
      5.1 Image Beautifulness Assessment Results
      5.2 Image Happiness Assessment Results
    6 Concluding Remarks
    References

  Query-UAP: Query-Efficient Universal Adversarial Perturbation for Large-Scale Person Re-Identification Attack
    1 Introduction
    2 Related Work
    3 Methodology
      3.1 Problem Definition
      3.2 Loss Function
      3.3 Query-UAP Attack
    4 Experiment
      4.1 Experimental Settings
      4.2 Comparison with State of the Arts
      4.3 Ablation Study
    5 Conclusion
    References

  Robust Person Re-identification with Adversarial Examples Detection and Perturbation Extraction
    1 Introduction
    2 Related Work
      2.1 Adversarial Attack
      2.2 Adversarial Defense
    3 Proposed Method
      3.1 Networks Architecture
      3.2 Adversarial Examples Generation
      3.3 Perturbation Extractor and Purification
      3.4 Adversarial Example Detector
    4 Experiments
      4.1 Experimental Settings
      4.2 Robustness of Purification
      4.3 Adversarial Detection
      4.4 Ablation Experiments
    5 Conclusion
    References

  Self-supervised and Template-Enhanced Unknown-Defect Detection
    1 Introduction
    2 Related Work
    3 Approach
      3.1 Framework
      3.2 Feature Fusion Module
      3.3 Loss Function
      3.4 Defect Detection
    4 Experiments
      4.1 Dataset
      4.2 Training Parameters
      4.3 Results
    5 Conclusion
    References

  JoinTW: A Joint Image-to-Image Translation and Watermarking Method
    1 Introduction
    2 Related Work
    3 Proposed Work
      3.1 Problem Statement
      3.2 Method Overview
      3.3 The Watermark Extractor
      3.4 Watermark Embedding Generator
      3.5 The Adversary Net
      3.6 Training Details
      3.7 Loss Functions
    4 Experimental Results
      4.1 Datasets
      4.2 Training Details
      4.3 Visual Quality Study of the Generated Images
      4.4 Qualities of the Extracted Watermarks
      4.5 Watermark Robustness Under Agnostic Distortions
    5 Conclusions
    References

Author Index