Belkhatir, M., & Tahayna, B. (2011). Near-duplicate video detection
featuring coupled temporal and perceptual visual structures and
logical inference based matching. Information Processing and
Management. doi:10.1016/j.ipm.2011.03.003
Mohammed Belkhatir — Faculty of
Computer Science, University of Lyon, Campus de la Doua, 69622
Villeurbanne Cedex, France
Bashar Tahayna — Faculty of Information Technology, Monash
University, Sunway Campus, 46150, Malaysia
Keywords:
near-duplicate video detection, perceptual visual indexing,
logical inference, lattice-based processing, empirical
evaluation.
We propose in this paper an
architecture for near-duplicate video detection based on: (i)
index and query signature based structures integrating temporal
and perceptual visual features and (ii) a matching framework
computing the logical inference between index and query
documents. As far as indexing is concerned, instead of
concatenating low-level visual features in high-dimensional
spaces which results in curse of dimensionality and redundancy
issues, we adopt a perceptual symbolic representation based on
color and texture concepts. For matching, we propose to
instantiate a retrieval model based on logical inference
through the coupling of an N-gram sliding window process and
theoretically-sound lattice-based structures. The techniques we
cover are robust and insensitive to common video editing
and degradation, making them well suited to re-broadcast video
search. Experiments are carried out on large quantities of
video data collected from the TRECVID 02, 03 and 04 collections
and real-world video broadcasts recorded from two German TV
stations. An empirical comparison over two state-of-the-art
dynamic programming techniques is encouraging and demonstrates
the advantage and feasibility of our method. © 2011 Published by
Elsevier Ltd.
Introduction Near-duplicate video
(NDV) detection in large multimedia collections is very
important for digital rights management as well as for video
retrieval applications. One crucial step for such task is to
define a matching/mismatching measure between two video
sequences. Extensive research has been carried out to identify
NDVs in video collections (Bertini, Bimbo, & Nunziati, 2006;
Hoad & Zobel, 2003, 2006; Joly, Frélicot, & Buisson, 2003;
Joly, Buisson, & Frélicot, 2007; Vidal, Marzal, & Aibar, 1995;
Zhou & Zhang, 2005). However, existing methods have substantial
limitations: they are sensitive to the degradations of the
video, expensive to compute, and mostly limited to the
comparison of whole video clips. Moreover, much of the video
content is distributed in a continuous stream that cannot be
easily segmented for comparison, making these methods
unsuitable for applications used by regulatory
authorities for continuous broadcast-stream monitoring.
During video editing, some inappropriate shots could be deleted
and commercial breaks could be inserted. However, from the
perspective of human perception, the initial and edited videos
are still regarded as similar (Zhou & Zhang, 2005). Thus, in
order to identify duplicates of a specific video, an efficient
video matching and scoring framework is required for detecting
similar or quasi-similar contents. Many of the existing
matching models are not suitable for such a task since they
either ignore the temporal dimension or simplify the query
model. NDV detection requires models for video
sequence-to-sequence matching incorporating the temporal order
inherent in video data. For sequence matching to be meaningful,
corresponding video contents should be identified in a fixed
chronological order while ignoring the in-between
mismatching shots that are often artificially introduced into
edited videos. To achieve this, many solutions view a
sequence of video frames as a string and directly compare the
feature sequences of the
query and index videos. However, this approach is computationally
expensive and sensitive to changes that can occur during video
editing. In order to reduce the computational cost, an
alternative approach consists in computing a shot-based index
structure viewed as a string and then applying string-matching
algorithms to solve the problem of shot alignment. The main
programming paradigm used in the literature for computing
sequence alignment, dynamic programming, has however some
limitations. Its computational load is indeed affected by the
number of shots and their duration. Furthermore, dynamic
programming measures, in terms of edit distance, how
mismatching two videos are rather than how similar they are.
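The dynamic-programming baseline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: shots are assumed to be already quantized to symbols, and two videos are compared with a classic edit (Levenshtein) distance over their shot strings, which measures how mismatching they are rather than how similar.

```python
def edit_distance(a, b):
    """Dynamic-programming edit distance between two shot strings."""
    m, n = len(a), len(b)
    # dp[i][j] = minimal edit cost of aligning a[:i] with b[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # deleting i shots from a
    for j in range(n + 1):
        dp[0][j] = j          # inserting j shots into a
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete a shot
                           dp[i][j - 1] + 1,        # insert a shot
                           dp[i - 1][j - 1] + sub)  # substitute a shot
    return dp[m][n]

# Hypothetical shot strings: each letter stands for one quantized shot.
original = "ABCDEF"
edited = "ABXCDEF"   # one commercial shot 'X' inserted during editing
print(edit_distance(original, edited))  # 1
```

Note that the O(m·n) table grows with the number of shots, and the output is a mismatch count, illustrating the two limitations the text points out.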
In this paper, we propose a
near-duplicate video detection framework based on
signature-based index structures featuring perceptual visual
attributes and a matching and scoring framework relying on
logical inference. As far as indexing is concerned, the
concatenation of low-level visual features (color, texture,
etc.) in high-dimensional spaces traditionally results in curse
of dimensionality and redundancy issues. Moreover, this usually
requires normalization which may cause an undesirable
distortion in the feature space. Indeed, since low-level visual
features (color and texture) are of high dimensionality
(typically of the order of 10^2–10^3) and data in high-dimensional
spaces are sparse, it is necessary to gather enough
observations to make sure that the estimation is viable.
Consequently, it is crucial to consider the dimensionality
reduction of the visual feature representation spaces.
Moreover, contrary to the state-of-the-art approaches for
dimensionality reduction (such as principal component analysis,
multidimensional scaling, singular value decomposition) which
are opaque (i.e. they operate dimensionality reduction of input
spaces without making it possible to understand the
signification of elements in the reduced feature space), our
framework will itself be based on a transparent readable
characterization. We propose to reduce the dimensionality of
signal features by taking into account a perceptual symbolic
representation of the visual features based on color and
texture concepts. A matching framework relying on the logical
inference between index and query documents is instantiated
through an N-gram sliding window technique coupled with fast
lattice-based processing. Near-duplicate videos are here defined
as a set of matched pair-wise sequences, but with certain
constraints that can be induced by frame rate conversions and
editing, which abundantly exist in real-world applications.
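The two ideas above can be illustrated together in a short sketch, under loudly simplifying assumptions: the color-concept vocabulary, the keyframe representation (mean RGB per frame), and the set-membership score below are all placeholders, not the paper's actual concept characterization or lattice-based inference.

```python
def color_concept(rgb):
    """Map a mean keyframe color to a coarse perceptual color symbol
    (a stand-in for the paper's color/texture concept vocabulary)."""
    r, g, b = rgb
    if r > g and r > b:
        return "red"
    if g > b:
        return "green"
    return "blue"

def ngrams(symbols, n):
    """All contiguous N-grams of a symbol sequence (temporal order kept)."""
    return [tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)]

def sliding_window_score(index_syms, query_syms, n=3):
    """Fraction of query N-grams found in the index stream; higher
    means the query is more likely a near-duplicate of the index."""
    index_set = set(ngrams(index_syms, n))
    query_grams = ngrams(query_syms, n)
    if not query_grams:
        return 0.0
    hits = sum(1 for g in query_grams if g in index_set)
    return hits / len(query_grams)

# Hypothetical mean keyframe colors for an indexed video and a copy.
index_frames = [(200, 10, 10), (10, 200, 10), (10, 10, 200),
                (200, 10, 10), (10, 200, 10)]
query_frames = index_frames[:]  # an identical near-duplicate

index_syms = [color_concept(f) for f in index_frames]
query_syms = [color_concept(f) for f in query_frames]
print(sliding_window_score(index_syms, query_syms))  # 1.0
```

Because matching happens over a small symbolic alphabet rather than high-dimensional feature vectors, the comparison stays cheap, and the sliding window tolerates inserted shots outside the matched N-grams.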
Experimentally, we implement our
theoretical proposition, detail the automatic characterization
of the visual (color and texture) concepts and evaluate the
prototype on 286 videos from the TRECVID 02, 03 and 04 corpora
against two dynamic programming frameworks.
The remainder of this paper is
organized as follows: Section 2 introduces the related work on
NDV detection. Section 3 gives an overview of the proposed
system architecture. Temporal video segmentation is detailed in
Section 4. Signature-based indexing with duration, color and
texture feature extraction is detailed in Section 5. Then in
Section 6 we discuss the N-gram matching and scoring framework.
Experimental results are reported in Section 7.