Content-based image retrieval

Content-based image retrieval (CBIR), also known as query by image content (QBIC) and content-based visual information retrieval (CBVIR), is the application of computer vision to the image retrieval problem, that is, the problem of searching for digital images in large databases. "Content-based" means that the search uses the contents of the images themselves rather than relying on human-entered metadata such as captions or keywords. A content-based image retrieval system (CBIRS) is a piece of software that implements CBIR.

The term CBIR seems to have originated in 1992, when it was used by T. Kato to describe experiments in automatic retrieval of images from a database based on the colors and shapes present. Since then, the term has been used to describe the process of retrieving desired images from a large collection on the basis of syntactical image features. The techniques, tools, and algorithms used originate from fields such as statistics, pattern recognition, signal processing, and computer vision.

There is growing interest in CBIR because of the limitations inherent in metadata-based systems. Textual information about images can be easily searched using existing technology, but it requires humans to describe every image in the database by hand. This is impractical for very large databases, or for images that are generated automatically, e.g. from surveillance cameras. A keyword search can also miss images whose descriptions use different synonyms. Systems that categorize images into semantic classes, with "cat" as a subclass of "animal", avoid the synonym problem but still face the same scaling issues.

The ideal CBIR system from a user perspective would involve what is referred to as semantic retrieval, where the user makes a request like "find pictures of dogs" or even "find pictures of Abraham Lincoln". This type of open-ended task is very difficult for computers to perform: pictures of chihuahuas and Great Danes look very different, and Lincoln may not always be facing the camera or in the same pose. Current CBIR systems therefore generally rely on lower-level features like texture, color, and shape, although some take advantage of common higher-level features like faces (see facial recognition system). Not every CBIR system is generic; some are designed for a specific domain, e.g. shape matching can be used to find parts in a CAD/CAM database.

Different implementations of CBIR make use of different types of user queries.

  • With query by example, the user searches with a query image (supplied by the user or chosen from a random set), and the software finds images similar to it based on various low-level criteria.
  • With query by sketch, the user draws a rough approximation of the image they are looking for, for example with blobs of color, and the software locates images whose layout matches the sketch.
  • Other methods include specifying the desired proportions of colors (e.g. "80% red, 20% blue") and searching for images that contain an object given in a query image.
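Query by example with low-level features can be sketched in a few lines. The following is an illustrative implementation, not a description of any particular system: it assumes images are RGB NumPy arrays, uses a per-channel color histogram as the feature, and histogram intersection as the similarity measure; all function names are invented for this example.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Normalized per-channel color histogram of an RGB image array."""
    hist = np.concatenate([
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]).astype(float)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical histograms."""
    return np.minimum(h1, h2).sum()

def query_by_example(query, database):
    """Rank database images by histogram similarity to the query image."""
    q = color_histogram(query)
    scores = [(histogram_intersection(q, color_histogram(img)), i)
              for i, img in enumerate(database)]
    return sorted(scores, reverse=True)

# Toy example: a mostly-red query should rank the red image first.
rng = np.random.default_rng(0)
red = np.zeros((16, 16, 3), dtype=np.uint8); red[..., 0] = 200
blue = np.zeros((16, 16, 3), dtype=np.uint8); blue[..., 2] = 200
noise = rng.integers(0, 256, (16, 16, 3), dtype=np.uint8)
ranking = query_by_example(red, [blue, noise, red])
print(ranking[0][1])  # index of the best match
```

Real systems combine many such low-level descriptors (texture, shape, spatial layout) rather than color alone, but the rank-by-similarity structure is the same.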

CBIR systems can also make use of relevance feedback, where the user progressively refines the results by marking returned images as "relevant", "not relevant", or "neutral" to the query, then repeating the search with the new information.
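One common way to implement such feedback, borrowed from text retrieval, is a Rocchio-style update that moves the query's feature vector toward the relevant images and away from the non-relevant ones. The article does not prescribe a specific algorithm, so the following is a minimal sketch under that assumption; the weights alpha, beta, and gamma are conventional but arbitrary here.

```python
import numpy as np

def rocchio_update(query_vec, relevant, non_relevant,
                   alpha=1.0, beta=0.75, gamma=0.25):
    """Shift the query feature vector toward relevant examples and away
    from non-relevant ones; images marked "neutral" are simply ignored."""
    q = alpha * query_vec
    if relevant:
        q = q + beta * np.mean(relevant, axis=0)
    if non_relevant:
        q = q - gamma * np.mean(non_relevant, axis=0)
    return q

# Toy feature vectors (e.g., flattened color histograms).
query = np.array([0.5, 0.5, 0.0])
relevant = [np.array([1.0, 0.0, 0.0])]
non_relevant = [np.array([0.0, 0.0, 1.0])]
refined = rocchio_update(query, relevant, non_relevant)
print(refined)  # pulled toward the relevant image's features
```

The refined vector is then used to re-run the similarity search, and the loop repeats until the user is satisfied.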

One application of CBIR is to identify images with skin-tones and shapes that could indicate the presence of nudity, for filtering and for searching by law enforcement.
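Skin-tone detection of this kind is often bootstrapped with simple per-pixel color rules. The thresholds below are an illustrative heuristic only (production systems use trained classifiers and shape analysis, which this sketch omits):

```python
import numpy as np

def skin_tone_fraction(image):
    """Fraction of pixels matching a simple RGB skin-tone rule.
    The thresholds are an illustrative heuristic, not a standard."""
    r = image[..., 0].astype(int)
    g = image[..., 1].astype(int)
    b = image[..., 2].astype(int)
    mask = (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b) & (r - g > 15)
    return mask.mean()

# A flat skin-like patch versus a blue patch.
skin = np.full((8, 8, 3), (200, 140, 110), dtype=np.uint8)
blue = np.full((8, 8, 3), (30, 60, 200), dtype=np.uint8)
print(skin_tone_fraction(skin), skin_tone_fraction(blue))
```

An image whose skin-tone fraction exceeds some threshold would then be flagged for further (shape-based) analysis.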

One issue with the term "content-based image retrieval" is that, as generally used, it refers only to the structural content of images, excluding retrieval based on textual annotation. Yet keywords and free text can give a very rich and detailed description of image content: even though this kind of indexing and classification is prone to problems of volume, subjectivity, and explicability, it can be used to describe all levels of image content. A narrower, yet still descriptive, term for syntactical feature extraction would therefore be more semantically correct.
