Open Source Column: LIRe - A Java Library for Content Based Image Retrieval

December 2011

 

Authors: Mathias Lux

URL: http://www.semanticmetadata.net/lire

By Pablo Cesar

 

Dr. Mathias Lux is currently assistant professor at the Institute for Information Technology at Klagenfurt University. He received his M.S. in Mathematics and his Ph.D. in Telematics from Graz University of Technology. He worked in industry on web-based applications, then as a junior researcher at a research center for knowledge-based applications. He was a research and teaching assistant at the Knowledge Management Institute (KMI) of Graz University of Technology, and in 2006 he took up a postdoctoral position at the Institute for Information Technology at Klagenfurt University. In his scientific career he has (co-)authored more than 50 scientific publications and has served on multiple program committees of international conferences. He is also well known for managing the development of the award-winning and popular open source tools Caliph & Emir and LIRe for multimedia information retrieval. He is currently working on user intentions in multimedia retrieval and production, and on emergent semantics in social multimedia computing.

 

 

LIRe - A Java Library for Content Based Image Retrieval

 

Content based image retrieval (CBIR) has been a hot research topic for quite some time. Early results and their commercial implementations, like IBM QBIC or Canto Cumulus, have not yielded results good enough to be applied in everyday tasks. Research in CBIR has also been consolidated and discussed on a general level several times. Different evaluation organizations and events, like ImageCLEF, MediaEval or TRECVid, provide playgrounds for researchers and practitioners to test new approaches and consolidate current ones. Many research groups and practitioners aim to build on existing tools to avoid reimplementing existing approaches. LIRe satisfies such needs by providing a library of basic and advanced functions employed in the field of content based image retrieval.

LIRe is a Java library to be integrated into existing or yet to be built applications and code. From a design perspective, LIRe tries to hide as much of the complexity of CBIR as possible. LIRe is based on Lucene, a text search engine providing inverted indexing, search and fast random access to text indexes. At first glance, developers encounter few parameters to be set in LIRe and even fewer choices. DocumentBuilder classes provide easy access to different low level features and wrap the use of Lucene, which is used for indexing. ImageSearcher classes allow for search and retrieval based on single query images or already indexed documents. Extensibility is a main feature of LIRe. By implementing a simple interface, which takes care of the serialization of features within the Lucene index, developers can provide their own methods and share them as open source code with the research community.
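To make the extensibility idea concrete, here is a minimal sketch of what such a feature contract could look like. The interface `SketchFeature`, the class `GrayHistogramFeature` and all method names are hypothetical and chosen for illustration; they are not LIRe's actual API, and the sketch stores nothing in Lucene. It only shows the three responsibilities a pluggable feature typically has: extraction from an image, serialization to a byte payload, and a distance function.

```java
import java.awt.image.BufferedImage;
import java.nio.ByteBuffer;

// Hypothetical interface for illustration only; it is NOT LIRe's actual API.
interface SketchFeature {
    void extract(BufferedImage image);        // compute the feature from pixels
    byte[] toBytes();                         // serialize for storage as a byte payload
    void fromBytes(byte[] data);              // restore the feature from a payload
    double distanceTo(SketchFeature other);   // smaller value means more similar
}

// Toy global feature (an assumption of this sketch): normalized 8-bin gray histogram.
final class GrayHistogramFeature implements SketchFeature {
    static final int BINS = 8;
    final float[] histogram = new float[BINS];

    @Override
    public void extract(BufferedImage image) {
        int[] counts = new int[BINS];
        for (int y = 0; y < image.getHeight(); y++) {
            for (int x = 0; x < image.getWidth(); x++) {
                int rgb = image.getRGB(x, y);
                // Average of the red, green and blue channels, in 0..255.
                int gray = (((rgb >> 16) & 0xFF) + ((rgb >> 8) & 0xFF) + (rgb & 0xFF)) / 3;
                counts[gray * BINS / 256]++;
            }
        }
        float pixels = image.getWidth() * image.getHeight();
        for (int i = 0; i < BINS; i++) {
            histogram[i] = counts[i] / pixels;
        }
    }

    @Override
    public byte[] toBytes() {
        ByteBuffer buffer = ByteBuffer.allocate(BINS * Float.BYTES);
        for (float value : histogram) {
            buffer.putFloat(value);
        }
        return buffer.array();
    }

    @Override
    public void fromBytes(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data);
        for (int i = 0; i < BINS; i++) {
            histogram[i] = buffer.getFloat();
        }
    }

    @Override
    public double distanceTo(SketchFeature other) {
        if (!(other instanceof GrayHistogramFeature)) {
            throw new IllegalArgumentException("incompatible feature type");
        }
        float[] otherHistogram = ((GrayHistogramFeature) other).histogram;
        double sum = 0;
        for (int i = 0; i < BINS; i++) {
            sum += Math.abs(histogram[i] - otherHistogram[i]);   // L1 distance
        }
        return sum;
    }

    public static void main(String[] args) {
        // A freshly created RGB image is all black.
        BufferedImage image = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        GrayHistogramFeature original = new GrayHistogramFeature();
        original.extract(image);
        GrayHistogramFeature restored = new GrayHistogramFeature();
        restored.fromBytes(original.toBytes());
        System.out.println("distance after round trip: " + original.distanceTo(restored));
    }
}
```

Keeping serialization inside the feature itself means that indexing and search code only ever handles opaque byte arrays, which is one plausible way for new descriptors to be added without touching the index layer.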

LIRe provides a broad range of image features including color and texture features, correlograms, joint histograms, region and edge features, as well as several different metrics for distance computation. The local features SIFT and SURF are supported through 3rd party libraries. An implementation of MSER, together with a local feature descriptor describing the regions found, is in an early beta phase.

In general, global as well as local features are stored as byte payload in a Lucene based index. Basic search implementations use linear search to find the most promising n candidates and return them in a ranked list. This works fine for smaller image collections of up to 100k images. For local features, a further indexing step allows for the use of an inverted index for the bag of visual words approach. However, for several scenarios with millions of images and global features, sublinear search in feature spaces is needed. The approach implemented in LIRe is the metric spaces method based on the ideas of Giuseppe Amato. This approach utilizes inverted lists to characterize data points in feature spaces by their distance to a set of reference data points. Each data point is represented by a ranked list of its nearest reference points, and the foot-rule distance between two such ranked lists provides an approximation of the original pairwise distance. This approach has been shown to work well on sets of millions of images.
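The core of the reference point idea can be sketched in a few lines. The following is a simplified illustration, not LIRe's implementation: it compares full rankings over all reference points with the Spearman foot-rule distance and scans the candidates linearly, whereas the approach described above stores the rankings in inverted lists to avoid that scan. The class name `FootruleSketch`, its method names and the toy 2D data are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class FootruleSketch {
    // Euclidean distance, standing in for any metric on feature vectors.
    static double l2(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(sum);
    }

    // rank[r] is the position of reference point r when all reference
    // points are sorted by increasing distance to p.
    static int[] referenceRanks(double[] p, double[][] refs) {
        Integer[] order = new Integer[refs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble(i -> l2(p, refs[i])));
        int[] rank = new int[refs.length];
        for (int pos = 0; pos < order.length; pos++) {
            rank[order[pos]] = pos;
        }
        return rank;
    }

    // Spearman foot-rule: sum of absolute rank differences.
    static int footrule(int[] a, int[] b) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }

    // Returns the ids of the n indexed items whose rankings are closest to the query's.
    static List<Integer> search(double[] query, int[][] indexedRanks, double[][] refs, int n) {
        int[] queryRanks = referenceRanks(query, refs);
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < indexedRanks.length; i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingInt(i -> footrule(queryRanks, indexedRanks[i])));
        return new ArrayList<>(ids.subList(0, Math.min(n, ids.size())));
    }

    public static void main(String[] args) {
        double[][] refs = {{0, 0}, {10, 0}, {0, 10}};   // toy reference points
        double[][] data = {{1, 2}, {9, 1}, {1, 9}};     // toy feature vectors
        int[][] indexed = new int[data.length][];
        for (int i = 0; i < data.length; i++) {
            indexed[i] = referenceRanks(data[i], refs); // done once at indexing time
        }
        System.out.println(search(new double[]{2, 3}, indexed, refs, 2));
    }
}
```

Note that at query time only the distances from the query to the reference points are computed with the original metric; all comparisons against indexed items operate on the precomputed rankings.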

LIRe saw its first release in Feb. 2006. Since then it has been downloaded from sourceforge.net more than 22,000 times (as of Dec. 28th, 2011). This does not include downloads from SVN or from the LIRe website or blog. Since 2006 it has been continuously extended and maintained. In 2008 students from my course on multimedia information systems contributed Tamura and Gabor features. In 2009 Savvas Chatzichristofis added the joint histogram descriptors CEDD and FCTH, which are currently the global descriptors of choice for practical applications. In the same year the first tests with the metric spaces approach started. In 2010 a major rewrite of the serialization of feature descriptors resulted in a major performance increase, and in 2011 another rewrite of core functions simplified the use of local descriptors and the management of bag of visual words indexes.

Besides core functions and features, several smaller but nevertheless interesting methods have been implemented in LIRe. Examples are an estimator for visual attention in images, a parallelized version of k-means clustering, a parallel indexer for large file repositories, a benchmarking suite based on the SIMPLIcity data set, and latent semantic indexing. A desktop Java demo tool has also been developed, which allows for easy try-out and testing on arbitrary image collections.

 

LIRe demo, a simple GUI tool for testing LIRe features on arbitrary image collections

 

LIRe has been around for nearly 6 years now, and many researchers and lecturers have adopted it as a starting point or platform for their research, or as a framework for teaching and practical assignments in education. Many bug fixes, enhancements and suggestions have been contributed. I appreciate all the positive feedback and help, and I'd like to thank all the people who helped throughout the years, including Anna-Maria Pasterk, Arthur Li, Arthur Pitman, Bastian Hösch, Benjamin Sznajder, Christian Penz, Christine Keim, Christoph Kofler, Dan Hanley, Daniel Pötzinger, Fabrizio Falchi, Giuseppe Amato, Harald Kosch, Janine Lachner, Katharina Tomanec, Lukas Esterle, Laszlo Böszörmenyi, Marco Bertini, Manuel Oraze, Manuel Warum, Marian Kogler, Markus Fauster, Marko Keuschnig, Oge Marques, Rodrigo Carvalho Rezende, Roman Divotkey, Roman Kern and Savvas Chatzichristofis.

