OLAP

OLAP is an acronym for Online Analytical Processing. It is an approach to quickly answering analytical queries that are dimensional in nature. It is part of the broader category of business intelligence, which also includes ETL, relational reporting and data mining. Typical applications of OLAP are business reporting for sales, marketing, management reporting, business performance management (BPM), budgeting and forecasting, financial reporting and similar areas. The term OLAP was created as a slight modification of the traditional database term OLTP (Online Transaction Processing).

Databases configured for OLAP employ a multidimensional data model, allowing complex analytical and ad-hoc queries to execute rapidly. They borrow aspects of navigational databases and hierarchical databases, which are speedier than their relational kin. Nigel Pendse has suggested that an alternative and perhaps more descriptive term for the concept of OLAP is Fast Analysis of Shared Multidimensional Information (FASMI).

Functionality

OLAP takes a snapshot of a set of source data and restructures it into an OLAP cube, against which queries can then be run. It has been claimed that for complex queries OLAP can produce an answer in around 0.1% of the time taken by the same query on OLTP relational data.

The cube is created from a star schema of tables. At the centre is the fact table, which lists the core facts that make up the query. Numerous dimension tables are linked to the fact table. These tables indicate how the aggregations of relational data can be analysed. The number of possible aggregations is determined by every possible manner in which the original data can be hierarchically linked.

For example, a set of customers can be grouped by city, by district or by country; with 50 cities, 8 districts and two countries there are three hierarchical levels with 60 members in total (50 + 8 + 2). These customers can be considered in relation to products; if there are 250 products with 20 categories, three families and three departments, then there are 276 product members (250 + 20 + 3 + 3). With just these two dimensions there are 16,560 (276 * 60) possible aggregations. As the data considered increases, the number of aggregations can quickly total tens of millions or more.
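The arithmetic in this example can be checked with a short Python sketch. The dictionary and function names are illustrative only; the member counts are the ones given in the paragraph above.

```python
from math import prod

# Member counts per hierarchy level, taken from the example in the text.
customer_levels = {"city": 50, "district": 8, "country": 2}
product_levels = {"product": 250, "category": 20, "family": 3, "department": 3}


def member_count(levels):
    """Total members in one dimension: the sum over its hierarchy levels."""
    return sum(levels.values())


def possible_aggregations(*dimensions):
    """Product of the per-dimension member counts."""
    return prod(member_count(levels) for levels in dimensions)


customer_members = member_count(customer_levels)
product_members = member_count(product_levels)
total = possible_aggregations(customer_levels, product_levels)

print(customer_members, product_members, total)  # 60 276 16560
```

Because the per-dimension counts are multiplied, every additional dimension scales the total by its own member count, which is why the figure grows so quickly.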

The calculated aggregations and the base data combined make up an OLAP cube, which can potentially contain the answer to every query that can be answered from the data (as in Gray, Bosworth, Layman, and Pirahesh, 1997). Because the number of aggregations to be calculated is potentially large, often only a predetermined number are fully calculated while the remainder are solved on demand.
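The idea of calculating some aggregations in advance and solving the rest on demand can be sketched in Python as follows. This is a minimal illustration, not how any particular OLAP product works: the fact rows, the hierarchies, and the names `roll_up` and `Cube` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, product, sales amount).
FACTS = [
    ("Lyon", "tea", 10.0),
    ("Lyon", "coffee", 5.0),
    ("Paris", "tea", 7.0),
    ("Leeds", "coffee", 3.0),
]

# Hypothetical hierarchies: each base member maps to its parent member.
CITY_TO_COUNTRY = {"Lyon": "France", "Paris": "France", "Leeds": "UK"}
PRODUCT_TO_CATEGORY = {"tea": "drinks", "coffee": "drinks"}


def roll_up(customer_level, product_level):
    """Aggregate the fact rows at one combination of hierarchy levels."""
    totals = defaultdict(float)
    for city, item, amount in FACTS:
        c = city if customer_level == "city" else CITY_TO_COUNTRY[city]
        p = item if product_level == "product" else PRODUCT_TO_CATEGORY[item]
        totals[(c, p)] += amount
    return dict(totals)


class Cube:
    """Holds some aggregations up front and computes the others lazily."""

    def __init__(self, precomputed):
        self._cache = {levels: roll_up(*levels) for levels in precomputed}
        self.computed_on_demand = []

    def query(self, customer_level, product_level, customer, item):
        levels = (customer_level, product_level)
        if levels not in self._cache:
            # Not pre-calculated: solve it now and keep the result.
            self.computed_on_demand.append(levels)
            self._cache[levels] = roll_up(*levels)
        return self._cache[levels].get((customer, item), 0.0)


cube = Cube(precomputed=[("city", "product"), ("country", "category")])
france_drinks = cube.query("country", "category", "France", "drinks")
france_tea = cube.query("country", "product", "France", "tea")

print(france_drinks, france_tea, cube.computed_on_demand)
```

Here the country-by-category totals are available immediately, while the country-by-product totals are calculated the first time they are requested.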

Types of OLAP

There are three types of OLAP.

Multidimensional OLAP

MOLAP is the 'classic' form of OLAP and is sometimes referred to as just OLAP. MOLAP uses database structures that are generally optimal for attributes such as time period, location, product or account code. The way that each dimension will be aggregated is defined in advance by one or more hierarchies.

Relational OLAP

ROLAP works directly with relational databases. The base data and the dimension tables are stored as relational tables, and new tables are created to hold the aggregated information.
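A minimal sketch of this arrangement, using Python's standard `sqlite3` module with an in-memory database, is shown below. The table names, column names and data are hypothetical and do not reflect the schema of any specific ROLAP product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension table and fact table, both ordinary relational tables.
    CREATE TABLE dim_customer (
        customer_id INTEGER PRIMARY KEY,
        city TEXT,
        country TEXT
    );
    CREATE TABLE fact_sales (
        customer_id INTEGER REFERENCES dim_customer,
        amount REAL
    );

    INSERT INTO dim_customer VALUES
        (1, 'Lyon', 'France'), (2, 'Paris', 'France'), (3, 'Leeds', 'UK');
    INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 7.0), (3, 3.0);

    -- A new relational table holds the aggregated information.
    CREATE TABLE agg_sales_by_country AS
        SELECT d.country AS country, SUM(f.amount) AS total_amount
        FROM fact_sales AS f
        JOIN dim_customer AS d ON d.customer_id = f.customer_id
        GROUP BY d.country;
""")

rows = conn.execute(
    "SELECT country, total_amount FROM agg_sales_by_country ORDER BY country"
).fetchall()
print(rows)  # [('France', 22.0), ('UK', 3.0)]
```

Queries at the country level can then read the small aggregate table instead of scanning and joining the full fact table.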

Hybrid OLAP

There is no clear agreement across the industry as to what constitutes "Hybrid OLAP", except that a database will divide data between relational and specialized storage. For example, for some vendors, a HOLAP database will use relational tables to hold the larger quantities of detailed data, and use specialized storage for at least some aspects of the smaller quantities of more-aggregate or less-detailed data.

Comparison

Each type has certain benefits, although providers disagree about the specifics. MOLAP performs better on smaller sets of data: it calculates aggregations and returns answers faster, and it needs less storage space.

ROLAP is considered more scalable. However, large-volume pre-processing is difficult to implement efficiently, so it is frequently skipped. ROLAP query performance can therefore suffer.

HOLAP is between the two in all areas, but it can pre-process quickly and scale well.

All types, though, are prone to database explosion. Database explosion is a phenomenon in which OLAP databases consume vast amounts of storage space when certain frequently occurring conditions are met: a high number of dimensions, pre-calculated results and sparse multidimensional data.

The difficulty in implementing OLAP comes in forming the queries, choosing the base data and developing the schema; as a result, most modern OLAP products come with huge libraries of pre-configured queries. Another problem is the quality of the base data, which must be complete and consistent.
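The database explosion described above can be illustrated with a small Python sketch. It assumes a deliberately simplified model in which every dimension has a single roll-up level (a hypothetical 'ALL' member) and every aggregate is pre-calculated; the function name and the data are illustrative only.

```python
from itertools import product


def precomputed_cells(base_cells):
    """Return every cell stored when all roll-ups to 'ALL' are pre-calculated."""
    stored = set()
    for cell in base_cells:
        # Each coordinate is either kept as-is or rolled up to 'ALL'.
        for mask in product((False, True), repeat=len(cell)):
            stored.add(
                tuple("ALL" if rolled else member
                      for member, rolled in zip(cell, mask))
            )
    return stored


# Hypothetical sparse data: 4 dimensions, 10 populated base cells that
# share no members with one another.
base = [(f"a{i}", f"b{i}", f"c{i}", f"d{i}") for i in range(10)]
stored = precomputed_cells(base)
growth = len(stored) / len(base)

print(len(base), len(stored), growth)  # 10 151 15.1
```

Because the sparse base cells rarely share members, almost none of their aggregates coincide, so in this model the stored cell count approaches 2 to the power of the number of dimensions times the base cell count. Adding a dimension roughly doubles it again.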

Other types

The following acronyms are also sometimes used, although they are not as widespread as the ones above:

  • WOLAP - Web-based OLAP
  • DOLAP - Desktop OLAP
  • RTOLAP - Real-Time OLAP

APIs and query languages

Unlike relational databases, which had SQL as the standard query language and widespread APIs such as ODBC, JDBC and OLEDB, the OLAP world long had no such unification. The first real standard API was the OLE DB for OLAP specification from Microsoft, which appeared in 1997 and introduced the MDX query language. Several OLAP vendors, both server and client, adopted it. In 2001 Microsoft and Hyperion announced the XML for Analysis specification, which was endorsed by most of the OLAP vendors. Since this also used MDX as a query language, MDX became the de facto standard in the OLAP world.

Products

History

The first product that performed OLAP queries was IRI's Express, which was released in 1970 (and acquired by Oracle in 1995). However, the term did not appear until 1993, when it was coined by Ted Codd, who has been described as "the father of the relational database". Codd's paper was financed by the former Arbor Software (now Hyperion Solutions) as a sort of marketing coup: the company had released its own OLAP product, Essbase, a year earlier. As a result, Codd's "twelve laws of online analytical processing" were explicit in their reference to Essbase. There was some ensuing controversy, and when Computerworld learned that Codd had been paid by Arbor, it retracted the article.

Market shares

According to the influential OLAP Report site, the market shares for the top commercial OLAP products in 2005 were:

  1. Microsoft - 28.0%
  2. Hyperion Solutions Corporation - 19.3%
  3. Cognos - 14.0%
  4. Business Objects - 7.4%
  5. MicroStrategy - 7.3%
  6. SAP AG - 5.9%
  7. Cartesis SA - 3.8%
  8. Systems Union/MIS AG - 3.4%
  9. Oracle Corporation - 3.4%
  10. Applix - 3.2%

Commercial OLAP products

Open Source OLAP

  • Palo - An Open Source MOLAP Server
  • Mondrian - An Open Source ROLAP Server
  • JPalo - Open Source Development Tools for Palo
