VideoQ: An Automated Content Based Video Search System Using Visual Cues

Shih-Fu Chang - William Chen - Horace J. Meng - Hari Sundaram - Di Zhong

Dept. of Electrical Engineering
Columbia University
New York, New York 10027

{sfchang,bchen,jmeng,sundaram,dzhong}@ctr.columbia.edu

1 Introduction
2 The Visual Paradigm
3 The VideoQ System Overview
4 What is a Video Object?
4.1 Color, Texture, Shape
4.2 Motion, Time
4.3 Weighting the Attributes
5 Automatic Video Shot Analysis
5.1 Scene Cut Detection
5.2 Global Video Shot Attributes
5.3 Tracking Objects: Motion, Color, and Edges
5.3.1 Projection and Segmentation Module
6 Building the Visual Feature Library
7 Feature Space Metrics
7.1 Matching Motion Trails
7.2 Matching Other Features
8 Query Resolution
8.1 Single Object Query
8.2 Querying Multiple Objects
9 How does VideoQ perform?
9.1 Precision-Recall Metrics
9.2 Time and Cost to Find a Particular Video Shot
10 Research Issues in VideoQ
10.1 Region Grouping
10.2 Shape
10.3 Spatio-Temporal Search
11 Conclusions
References

Abstract

The rapid growth of digital information, particularly video, has made tools for efficient search of these media necessary. Content-based visual queries have so far focused primarily on still image retrieval. In this paper, we propose a novel, real-time, interactive system on the Web, based on the visual paradigm, in which spatio-temporal attributes play a key role in video retrieval. We have developed algorithms for automated video object segmentation and tracking, and we use real-time video editing techniques while responding to user queries. The resulting system performs well: users can easily retrieve complex video clips, such as those of skiers and baseball players.

1 Introduction

The ease of capture and encoding of digital images has caused a massive amount of visual information to be produced and disseminated rapidly. Hence efficient tools and systems for searching and retrieving visual information are needed. While there are efficient search engines for text documents today, there are no satisfactory systems for retrieving visual information.

Content-based visual query (CBVQ) has emerged as a challenging research area in the past few years [Chang 97], [Gupta 97]. While there has been substantial progress, with systems such as QBIC [Flickner 95], PhotoBook [Pentland 96], Virage [Hamrapur 97] and VisualSEEk [Smith 96], most systems support retrieval of still images only. CBVQ research on video databases has not yet been fully explored. We propose an advanced content-based video search system with the following unique features:

Automatic video object segmentation and tracking.
A rich visual feature library including color, texture, shape, motion.
Query with multiple objects.
Spatio-temporal constraints on the query.
Interactive querying and browsing over the World-Wide Web.
Compressed-domain video manipulation.

Specifically, we propose to develop a novel video search system that allows users to search video based on a rich set of visual features and spatio-temporal relationships. Our objective is to investigate the full potential of visual cues in object-oriented content-based video search. Although search over video databases should ultimately draw on all of the media involved (video, audio, text captions), our research on visual cues will complement any such integration.

We present the visual search paradigm in section 2, give a system overview in section 3, describe video objects and our automatic video analysis techniques in sections 4-5, discuss the matching criteria and query resolution in sections 7-8, and finally present some preliminary evaluation results in section 9.
