Metadata

From Free net encyclopedia

Metadata (Greek: meta- + Latin: data "information"), literally "data about data", are information about another set of data.

One useful definition is

"Metadata are structured, encoded data that describe characteristics of information-bearing entities to aid in the identification, discovery, assessment, and management of the described entities." (Committee on Cataloging Task Force on metadata Summary Report, http://www.libraries.psu.edu/tas/jca/ccda/tf-meta3.html, 1999)

A common example is a library catalog card, which contains data about the contents and location of a book. Typically, the catalog card contains the name of the author, the title of the book, the publisher, the year it was published, the genre, the series it belongs to, identifiers such as the ISBN (unique) and Dewey call number (non-unique), and a brief synopsis of the book’s contents.

To locate a book, or a set of books, the user simply searches on the specified attribute (title, genre, ISBN, etc.). Depending on the nature of the attribute (unique, such as ISBN, or non-unique, such as genre) and the availability of books in the library, no books, one book, or multiple books may match the search criteria.
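The difference between searching on a unique and a non-unique attribute can be sketched in a few lines of Python. This is only an illustration: the `CatalogCard` fields, the sample records, and the placeholder identifiers are assumptions made up for the example, not real catalog data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogCard:
    """A hypothetical, simplified catalog card."""
    title: str
    author: str
    genre: str
    isbn: str  # placeholder identifiers below, not real ISBNs


def search(cards, attribute, value):
    """Return every card whose named attribute equals the given value."""
    return [card for card in cards if getattr(card, attribute) == value]


# Made-up sample records for illustration only.
CARDS = [
    CatalogCard("Example Title A", "Author One", "mystery", "isbn-0001"),
    CatalogCard("Example Title B", "Author Two", "mystery", "isbn-0002"),
    CatalogCard("Example Title C", "Author One", "history", "isbn-0003"),
]

by_isbn = search(CARDS, "isbn", "isbn-0001")    # unique attribute
by_genre = search(CARDS, "genre", "mystery")    # non-unique attribute
no_match = search(CARDS, "genre", "poetry")     # nothing in the library
```

A search on the unique attribute yields at most one card, while a search on the non-unique attribute may yield none, one, or several.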

An important purpose of metadata is to describe information such that a user or machine can locate a specific item or perhaps a list of items that meet particular criteria. In the case of a list of items, the metadata that matches on a specified criterion acts as a link between similar items, essentially defining a relationship between those items. Some metadata schemes attempt to embrace this concept, such as the Dublin Core element link.

The metadata concept has been extended into the world of systems to include any "data about data": the names of tables, columns, programs, and the like. Different views of this system metadata are described below, but beyond them is the recognition that metadata describe all aspects of systems: data, activities, the people and organizations involved, locations of data and processes, access methods, limitations, timing and events, as well as motivation and rules.

Fundamentally, then, metadata are "the data that describe the structure and workings of an organization’s use of information, and which describe the systems it uses to manage that information." To do a model of metadata is to do an "Enterprise model" of the information technology industry itself. (Hay, 2006)


Uses

Metadata have become important on the World Wide Web because of the need to find useful information from the mass of information available. Manually-created metadata add value because they ensure consistency. If one webpage about a topic contains a word or phrase, then all webpages about that topic should contain that same word or phrase. They also ensure variety, so that if one topic has two names, each of these names will be used. For example, an article about Sport Utility Vehicles would also be given the metadata keywords ‘4 wheel drives’, ‘4WDs’ and ‘four wheel drives’, as this is how they are known in some countries.

Examples of metadata for an audio CD include the MusicBrainz project, and AMG's All Music Guide. Similarly, MP3 files have metadata tags in a format called ID3.
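As a rough sketch of what such tags look like, the following Python reads the oldest and simplest variant, ID3v1, which occupies the last 128 bytes of a file: the marker `TAG`, then fixed-width title, artist, album, year, and comment fields, and a one-byte genre code. The function names are made up for the example, and the later ID3v2 format (stored at the start of the file, with a different layout) is not handled.

```python
import struct


def parse_id3v1(data: bytes):
    """Parse an ID3v1 tag from the end of an MP3 file's bytes, or return None."""
    if len(data) < 128 or data[-128:-125] != b"TAG":
        return None
    fields = struct.unpack("<30s30s30s4s30sB", data[-125:])
    title, artist, album, year, comment, genre = fields

    def text(raw: bytes) -> str:
        # Fields are padded with NUL bytes (or spaces) to their fixed width.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(title),
        "artist": text(artist),
        "album": text(album),
        "year": text(year),
        "comment": text(comment),
        "genre_code": genre,
    }


def make_id3v1(title: str, artist: str, album: str, year: str) -> bytes:
    """Build a synthetic ID3v1 tag (helper for the demonstration only)."""
    def pad(value: str, width: int) -> bytes:
        return value.encode("latin-1")[:width].ljust(width, b"\x00")

    return (b"TAG" + pad(title, 30) + pad(artist, 30) + pad(album, 30)
            + pad(year, 4) + pad("", 30) + bytes([255]))


# Fake "audio" bytes followed by a tag; no real MP3 file is needed.
sample = b"\xff" * 64 + make_id3v1("Example Song", "Example Artist", "Example Album", "1999")
tag = parse_id3v1(sample)
```

To read a real file, the same function could be given the bytes returned by `open(path, "rb").read()`.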

Metadata are more properly called an ontology or schema when structured into a hierarchical arrangement. Both terms describe “what exists” for some purpose or to enable some action. For instance, the arrangement of subject headings in a library catalog serves not only as a guide to finding books on a particular subject in the stacks, but also as a guide to what subjects “exist” in the library’s own ontology and how more specialized topics are related to or derived from the more general subject headings.

Metadata are frequently stored in a central location and used to help organizations standardize their data. This information is typically stored in a Metadata Registry.

Types

Relational database metadata

Each relational database system has its own mechanisms for storing metadata. Examples of relational-database metadata include:

  • Tables of all tables in a database, their names, sizes and the number of rows in each table.
  • Tables of the columns in each database, what tables they are used in, and the type of data stored in each column.

In database terminology, this set of metadata is referred to as the catalog. The SQL standard specifies a uniform means to access the catalog, called the INFORMATION_SCHEMA, but not all databases implement it, even if they implement other aspects of the SQL standard. For an example of database-specific metadata access methods, see Oracle metadata.
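SQLite is one example of a database that does not provide INFORMATION_SCHEMA and instead exposes its catalog through its own mechanisms: the `sqlite_master` table and the `PRAGMA table_info` statement. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the `book` and `author` tables are invented for the illustration.

```python
import sqlite3


def describe_catalog(conn):
    """Return {table_name: [(column_name, declared_type), ...]} from SQLite's catalog."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    catalog = {}
    for table in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT, year INTEGER)")
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT)")
catalog = describe_catalog(conn)
```

On a database that does implement the standard, a comparable listing would come from querying the `INFORMATION_SCHEMA.TABLES` and `INFORMATION_SCHEMA.COLUMNS` views instead.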

Data warehouse metadata

Data warehouse metadata systems are sometimes separated into two sections:

  1. back room metadata that are used for extract, transform, load (ETL) functions to get OLTP data into a data warehouse
  2. front room metadata that are used to label screens and create reports

Kimball (1998) enumerates the types of metadata found in a data warehouse.

Michael Brackett defines metadata (what he calls "data resource data") as "any data about the organization’s data resource" (Brackett, 2000). Adrienne Tannenbaum defines metadata as "the detailed description of instance data. The format and characteristics of populated instance data: instances and values, dependent on the role of the metadata recipient." These definitions are characteristic of the "data about data" definition.

General IT metadata

In contrast, David Marco, another metadata theorist, defines metadata as "all physical data and knowledge from inside and outside an organization, including information about the physical data, technical and business processes, rules and constraints of the data, and structures of the data used by a corporation." Notice that this definition expands metadata's scope considerably, to encompass most or all of the data required by the Management Information Systems capability. In this sense, the concept of metadata has significant overlaps with the ITIL concept of a Configuration Management Database (CMDB), and also with disciplines such as Enterprise Architecture and IT Portfolio Management.

This broader definition of metadata has precedent. Third generation corporate repository products (such as those eventually merged into the CA Advantage line) store information not only about data definitions (COBOL copybooks, DBMS schemas) but also about the programs accessing those data structures, and about the JCL and batch job infrastructure dependencies as well. These products (many of which are still in production) can provide a very complete picture of a mainframe computing environment, supporting exactly the kinds of impact analysis required for ITIL-based processes such as Incident and Change Management. The ITIL "back catalog" includes a Data Management volume that recognizes the role of these metadata products on the mainframe, positioning the CMDB as the distributed computing equivalent. CMDB vendors, however, have generally not expanded their scope to include data definitions, and metadata solutions are also available in the distributed world. Determining the appropriate role and scope for each is thus a challenge for large IT organizations requiring the services of both.

First generation data dictionary/metadata repository tools would be those only supporting a specific DBMS, such as the IDD for IDMS, and the IMS Data Dictionary.

Second generation would be MSP's DATAMANAGER product which could support many different file and DBMS types.

Third generation repository products became briefly popular in the early 1990s along with the rise of widespread use of RDBMS engines such as IBM's DB2.

File system metadata

Nearly all file systems keep metadata about files out-of-band, that is, separately from the file contents. Some systems keep metadata in directory entries; others keep them in specialized structures such as inodes, or even in the name of a file. Metadata can range from simple timestamps, mode bits, and other special-purpose information used by the implementation itself, to icons and free-text comments, to arbitrary attribute-value pairs.
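The simpler kinds of file system metadata can be read with Python's standard `os.stat` call, as in the sketch below. The dictionary keys and the `demo` helper are choices made for this illustration; which fields are meaningful (the inode number, for instance) depends on the operating system and file system.

```python
import os
import stat
import tempfile
import time


def file_metadata(path):
    """Collect a few pieces of out-of-band metadata that the file system keeps."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "mode": stat.filemode(info.st_mode),   # e.g. file type and permission bits
        "modified_utc": time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(info.st_mtime)),
        "inode": info.st_ino,
    }


def demo():
    """Create a throwaway five-byte file and return its metadata."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "example.txt")
        with open(path, "wb") as handle:
            handle.write(b"hello")
        return file_metadata(path)


metadata = demo()
```

None of these values is stored in the file's own bytes; the file contains only `hello`.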

With more complex and open-ended metadata, it becomes useful to search for files based on the metadata contents. The Unix find utility was an early example, although it is inefficient when scanning hundreds of thousands of files on a modern computer system because it walks the directory tree on every search. Apple introduced cataloging and searching of file metadata, through a feature known as Spotlight, in Mac OS X 10.4 (Tiger). Microsoft planned similar functionality for the WinFS file system. Linux supports arbitrary file metadata through extended file attributes.
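A find-style search can be approximated in Python with `os.walk` and `os.stat`. This is a sketch of the general approach rather than of how find itself is implemented, and the `min_size` and `modified_after` parameters are invented for the example. Because every file under the root is visited on each call, the cost grows with the size of the tree, which is the inefficiency that index-based tools avoid.

```python
import os
import tempfile


def find_files(root, min_size=0, modified_after=0.0):
    """Return paths under root whose size and modification time meet the criteria."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.stat(path)  # one metadata lookup per file, on every search
            if info.st_size >= min_size and info.st_mtime >= modified_after:
                matches.append(path)
    return sorted(matches)


def demo():
    """Build a small throwaway tree and search it for files of at least 50 bytes."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        with open(os.path.join(root, "small.txt"), "wb") as handle:
            handle.write(b"x")
        with open(os.path.join(root, "sub", "large.bin"), "wb") as handle:
            handle.write(b"x" * 100)
        return [os.path.basename(path) for path in find_files(root, min_size=50)]


found = demo()
```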

Image metadata

Examples of image files containing metadata include Exchangeable Image File Format (EXIF) and Tagged Image File Format (TIFF).

Having metadata about images embedded in TIFF or EXIF files is one way of acquiring additional data about an image. Image metadata are recorded as tags. Tagging pictures with subjects, related emotions, and other descriptive phrases helps Internet users find pictures easily rather than having to search through entire image collections. A prime example of an image tagging service is Flickr, where users upload images and then describe the contents. Other users of the site can then search for those tags. Flickr uses a folksonomy: a free-text keyword system in which the community defines the vocabulary through use rather than through a controlled vocabulary.
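The idea of free-text tagging and tag search can be sketched as a small in-memory index. This is a toy illustration and not a description of how Flickr works; the `TagIndex` class, the image identifiers, and the decision to normalize tags to lower case are all assumptions made for the example.

```python
from collections import defaultdict


class TagIndex:
    """Maps free-text tags to the set of images carrying them."""

    def __init__(self):
        self._images_by_tag = defaultdict(set)

    @staticmethod
    def _normalize(tag):
        # No controlled vocabulary: any text is accepted, only case and spacing are unified.
        return tag.strip().lower()

    def tag(self, image_id, *tags):
        """Attach one or more tags to an image."""
        for tag in tags:
            self._images_by_tag[self._normalize(tag)].add(image_id)

    def search(self, tag):
        """Return the sorted identifiers of all images carrying the tag."""
        return sorted(self._images_by_tag.get(self._normalize(tag), set()))


index = TagIndex()
index.tag("img-001", "Sunset", "beach")
index.tag("img-002", "sunset", "mountains", "calm")
index.tag("img-003", "Beach ")
```

Here `index.search("sunset")` finds both images tagged with that word, whichever user supplied the tag and however it was capitalized.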

Program metadata

Metadata is casually used to describe the controlling data used in software architectures that are more abstract or configurable. Most executable file formats include what may be termed "metadata" that specifies certain, usually configurable, behavioral runtime characteristics. However, it is difficult if not impossible to precisely distinguish program "metadata" from general aspects of stored-program computing architecture; if the machine reads it and acts upon it, it is a computational instruction, and the prefix "meta" has little significance.

In Java, the class file format contains metadata used by the Java compiler and the Java virtual machine to dynamically link classes and to support reflection. The J2SE 5.0 version of Java included a metadata facility to allow additional annotations that are used by development tools.

In MS-DOS, the COM file format does not include metadata, but the EXE file format does, as does the Windows PE format. These metadata can include the company that published the program, the date the program was created, the version number and more.

In the Microsoft .NET executable format, extra metadata is included to allow reflection at runtime.
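The Java and .NET mechanisms are specific to those platforms, but the general idea of a program inspecting metadata about its own code at run time can be shown by analogy in Python, which keeps names, documentation strings, and type annotations alongside each function. The `area` and `describe` functions below are invented for the illustration.

```python
import inspect


def area(width: float, height: float) -> float:
    """Return the area of a rectangle."""
    return width * height


def describe(func):
    """Collect the metadata the runtime holds about a function, via reflection."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "doc": inspect.getdoc(func),
        "parameters": {
            name: parameter.annotation.__name__
            for name, parameter in signature.parameters.items()
        },
        "returns": signature.return_annotation.__name__,
    }


description = describe(area)
```

The annotations play a role loosely comparable to the additional annotations mentioned above: they do not change what `area` computes, but tools can read them.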

Document metadata: Most programs that create documents, including Microsoft Word and other Microsoft Office products, save metadata with the document files. These metadata can contain the name of the person who created the file (obtained from the operating system), the name of the person who last edited the file, how many times the file has been printed, and even how many revisions have been made on the file. Other saved material, such as deleted text (saved in case of an undelete command), document comments and the like, is also commonly referred to as "metadata", and the inadvertent inclusion of this material in distributed files has sometimes led to undesirable disclosures.
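As a hedged illustration of how such properties can be inspected, the sketch below assumes the newer Office Open XML packaging (.docx), in which a document is a ZIP archive whose `docProps/core.xml` part holds core properties such as the creator, the last editor, and the revision count. Older binary Word files are stored differently and are not covered. The sample package built here is synthetic and contains only that one part, so it is not a complete Word document.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(package_bytes):
    """Read selected core properties from an Office Open XML package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        root = ET.fromstring(archive.read("docProps/core.xml"))

    def text(path):
        node = root.find(path, NAMESPACES)
        return node.text if node is not None else None

    return {
        "creator": text("dc:creator"),
        "last_modified_by": text("cp:lastModifiedBy"),
        "revision": text("cp:revision"),
    }


def make_sample_package():
    """Build a synthetic package holding only docProps/core.xml (demonstration helper)."""
    core_xml = (
        '<cp:coreProperties xmlns:cp="{cp}" xmlns:dc="{dc}">'
        "<dc:creator>A. Author</dc:creator>"
        "<cp:lastModifiedBy>B. Editor</cp:lastModifiedBy>"
        "<cp:revision>7</cp:revision>"
        "</cp:coreProperties>"
    ).format(**NAMESPACES)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("docProps/core.xml", core_xml)
    return buffer.getvalue()


properties = read_core_properties(make_sample_package())
```

Checking these fields before a file is distributed is one way to notice names or revision history that were not meant to be shared.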

For a list of executable formats, see object file.

Metamodels

Metadata about models are called metamodels. In Model Driven Engineering, a model has to conform to a given metamodel. According to the MDA guide, a metamodel is itself a model, and each model conforms to a given metamodel. Because conformance can be checked mechanically, meta-modeling allows strict and agile automatic processing of models and metamodels.
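A deliberately tiny sketch of conformance checking follows. Real metamodeling frameworks are far richer; here a metamodel is just a dictionary naming the allowed element types with their attributes and attribute types, and every name in it is invented for the example.

```python
# Hypothetical metamodel: allowed element types and the attributes each must carry.
METAMODEL = {
    "Book": {"title": str, "year": int},
    "Author": {"name": str},
}


def conforms(model, metamodel):
    """Return True if every element of the model matches its type in the metamodel."""
    for element in model:
        schema = metamodel.get(element.get("type"))
        if schema is None:
            return False  # element type is not defined by the metamodel
        attributes = {key: value for key, value in element.items() if key != "type"}
        if set(attributes) != set(schema):
            return False  # missing or unexpected attribute
        if any(not isinstance(attributes[key], schema[key]) for key in schema):
            return False  # attribute has the wrong type
    return True


valid_model = [
    {"type": "Author", "name": "Author One"},
    {"type": "Book", "title": "Example Title", "year": 2001},
]
invalid_model = [
    {"type": "Book", "title": "Example Title", "year": "2001"},  # year should be an int
]
```

A tool that processes models automatically can reject `invalid_model` before doing any further work, since `conforms` reports that it does not match the metamodel.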

Strange metadata

Since metadata are also data, it is possible to have metadata of metadata, or "meta-metadata." Machine-generated meta-metadata, such as the inverted index created by a free-text search engine, are generally not considered metadata, though.

Metadata that are embedded with content are called embedded metadata. A data repository typically stores the metadata detached from the data.

Digital library metadata

There are three categories of metadata that are frequently used to describe objects in a digital library:

  1. descriptive - Information describing the intellectual content of the object, such as MARC cataloguing records, finding aids or similar schemes. It is typically used for bibliographic purposes and for search and retrieval.
  2. structural - Information that ties each object to others to make up logical units (e.g., information that relates individual images of pages from a book to the others that make up the book).
  3. administrative - Information used to manage the object or control access to it. This may include information on how it was scanned, its storage format, copyright and licensing information, and information necessary for the long-term preservation of the digital objects.

References

David C. Hay, Data Model Patterns: A Metadata Map, Morgan Kaufmann, 2006, ISBN 0120887983

Ralph Kimball, The Data Warehouse Lifecycle Toolkit, Wiley, 1998, ISBN 0471255475

Michael H. Brackett, Data Resource Quality, Addison-Wesley, 2000, ISBN 0201713063

David Marco, Building and Managing the Meta Data Repository: A Full Lifecycle Guide, Wiley, 2000, ISBN 0471355232

Adrienne Tannenbaum, Metadata Solutions: Using Metamodels, Repositories, XML, and Enterprise Portals to Generate Information on Demand, Addison-Wesley, 2002, ISBN 0201719762

Guy V Tozer, Metadata Management for Information Control and Business Success, Artech House, 1999, ISBN 0890062803