Decision tree
From Free net encyclopedia
In decision theory, a decision tree is a graph of decisions and their possible consequences (including resource costs and risks), used to create a plan to reach a goal. Decision trees are constructed to help with making decisions. A decision tree is a special form of tree structure.
General
In machine learning, a decision tree is a predictive model; that is, a mapping of observations about an item to conclusions about the item's target value. Each interior node corresponds to a variable; an arc to a child represents a possible value of that variable. A leaf represents the predicted value of the target variable, given the values of the variables represented by the path from the root.
The machine learning technique for inducing a decision tree from data is called decision tree learning, or (colloquially) decision trees.
Decision tree learning is also a common method in data mining. Here, a decision tree describes a tree structure wherein leaves represent classifications and branches represent conjunctions of features that lead to those classifications [1]. A decision tree can be learned by splitting the source set into subsets based on an attribute value test [1]. This process is repeated on each derived subset in a recursive manner. The recursion is complete when splitting is either infeasible or a single classification can be applied to every element of the derived subset. A random forest classifier uses a number of decision trees in order to improve the classification rate.
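The recursive splitting just described can be sketched in Python. This is a minimal illustration under stated assumptions, not the method of any particular algorithm: records are (features, label) pairs with categorical features in a dict, each attribute value test is a multiway split on one attribute, and the attribute is chosen to minimise the number of misclassified records (a deliberately simple criterion; practical algorithms use measures such as Gini impurity or information gain). The function names and the toy records are hypothetical.

```python
from collections import Counter

def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def misclassified(records):
    """Number of records whose label differs from the subset's majority label."""
    labels = [y for _, y in records]
    return len(labels) - Counter(labels).most_common(1)[0][1]

def split(records, attr):
    """Attribute value test: partition records by their value of `attr`."""
    groups = {}
    for x, y in records:
        groups.setdefault(x[attr], []).append((x, y))
    return groups

def build_tree(records, attrs):
    """Return a label (leaf) or {attr: {value: subtree}} (interior node)."""
    labels = [y for _, y in records]
    # Stop when one classification fits every record or no test is left.
    if len(set(labels)) == 1 or not attrs:
        return majority(labels)
    # Assumed criterion: the attribute whose split leaves the fewest misclassified records.
    best = min(attrs, key=lambda a: sum(misclassified(g) for g in split(records, a).values()))
    groups = split(records, best)
    rest = [a for a in attrs if a != best]
    if len(groups) == 1:  # this test separates nothing; try the remaining attributes
        return build_tree(records, rest)
    # Repeat the procedure on each derived subset.
    return {best: {value: build_tree(subset, rest) for value, subset in groups.items()}}

data = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rain", "windy": "no"}, "play"),
    ({"outlook": "rain", "windy": "yes"}, "stay"),
]
print(build_tree(data, ["outlook", "windy"]))
# {'outlook': {'sunny': 'play', 'rain': {'windy': {'no': 'play', 'yes': 'stay'}}}}
```

The sunny subset is already pure, so it becomes a leaf at once; only the rain subset needs a second test.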
Decision trees are also a descriptive means for calculating conditional probabilities.
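As a minimal illustration of that reading, the class frequencies among the records that reach a node estimate the conditional probability of each class given the tests along the path. The sketch assumes the same hypothetical (features, label) record layout, with `path` a dict of attribute tests that are all required to hold; the function name and records are made up for illustration.

```python
from collections import Counter

def conditional_distribution(records, path):
    """Estimate P(y = c | every test in `path` holds) by relative frequency.

    records: list of (features, label) pairs, features being a dict.
    path: dict mapping attribute -> required value (a conjunction of tests).
    """
    reached = [y for x, y in records if all(x.get(a) == v for a, v in path.items())]
    if not reached:
        return {}  # no record reaches this node, so there is nothing to estimate
    counts = Counter(reached)
    return {label: n / len(reached) for label, n in counts.items()}

records = [
    ({"outlook": "rain", "windy": "yes"}, "stay"),
    ({"outlook": "rain", "windy": "yes"}, "play"),
    ({"outlook": "rain", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
]
print(conditional_distribution(records, {"outlook": "rain", "windy": "yes"}))
# {'stay': 0.5, 'play': 0.5}
```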
Decision trees can also be described as a combination of mathematical and computational techniques that aid in the description, categorisation and generalisation of a given set of data. Data comes in records of the form:
<math>(x, y) = (x_1, x_2, x_3, \ldots, x_k, y)</math>
The dependent variable, y, is the variable that we are trying to understand, classify or generalise. The other variables, x1, x2, x3 and so on, are the variables that help us with that task.
Types
Decision tree analysis is known by three related terms:
Classification tree analysis is a term used when the predicted outcome is the class to which the data belongs.
Regression tree analysis is a term used when the predicted outcome can be considered a real number (e.g. the price of a house, or a patient’s length of stay in a hospital).
CART analysis is a term used to refer to both of the above procedures. CART is an acronym for Classification And Regression Trees; the term was first introduced by Breiman et al. [BFOS84].
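The practical difference between the two kinds of tree shows up at the leaves. As a rough sketch, assuming the common convention (not stated above) that a classification leaf predicts the most frequent class among its training records and a regression leaf predicts their mean, with hypothetical function names:

```python
from collections import Counter
from statistics import mean

def classification_leaf(labels):
    """Classification tree leaf: predict the most frequent class."""
    return Counter(labels).most_common(1)[0][0]

def regression_leaf(values):
    """Regression tree leaf: predict a real number, here the mean of the targets."""
    return mean(values)

print(classification_leaf(["yes", "no", "yes"]))  # yes
print(regression_leaf([3.0, 5.0, 4.0]))           # 4.0
```

Other leaf rules exist (for example, the median for regression), so the mean here is one illustrative choice.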
Practical example
We will use an example to explain decision trees:
- Our friend David is the manager of a famous golf club. Sadly, he is having some trouble with customer attendance. There are days when everyone wants to play golf and the staff are overworked. On other days, for no apparent reason, no one plays golf and the staff have too much slack time. David's objective is to optimise staff availability by trying to predict when people will play golf. To accomplish that, he needs to understand why people decide to play and whether there is any explanation for it. He assumes that weather must be an important underlying factor, so he decides to use the weather forecast for the upcoming week. So for two weeks he recorded:
- The outlook: whether it was sunny, overcast or raining.
- The temperature (in degrees Fahrenheit).
- The relative humidity (in percent).
- Whether it was windy or not.
- And, of course, whether people played golf at the club that day.
- David compiled this dataset into a table containing 14 rows and 5 columns.
- He then applied a decision tree model to solve his problem.
- A decision tree is a model of the data that encodes the distribution of the class label (again, y) in terms of the predictor attributes. It is a directed, acyclic graph in the form of a tree. The top node (the root) represents all the data. The classification tree algorithm finds that the best way to explain the dependent variable, play, is the variable outlook. Splitting on the categories of outlook yields three groups:
- one that plays golf when the weather is sunny,
- one that plays when it is overcast, and
- one that plays when it is raining.
- David's first conclusion: if the outlook is overcast, people always play golf, and some fanatical people play golf even in the rain. Next, he divided the sunny group into two groups and found that customers don't like to play golf if the humidity is higher than seventy percent.
- Finally, he divided the rain group into two and found that customers also do not play golf if it is windy.
- In short, the classification tree gives David this solution: he dismisses most of the staff on days that are sunny and humid, or rainy and windy, because almost no one will play golf on those days. On days when many people will play golf, he hires extra staff. The decision tree helped David turn a complex data representation into a much simpler (parsimonious) structure.
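The tree reached in this example can be written as nested tests. The sketch below hand-codes only the splits stated in the text (outlook first, then humidity above seventy percent for sunny days and wind for rainy days); it is not learned from data, and the function name and argument encoding are assumptions for illustration.

```python
def will_play(outlook: str, humidity: float, windy: bool) -> bool:
    """Hand-coded version of the golf tree described in the text."""
    if outlook == "overcast":
        return True                 # overcast: people always play
    if outlook == "sunny":
        return humidity <= 70       # sunny: no play when humidity exceeds 70%
    if outlook == "rain":
        return not windy            # rain: no play when it is windy
    raise ValueError(f"unknown outlook: {outlook!r}")

print(will_play("sunny", 85, False))    # False
print(will_play("rain", 80, False))     # True
print(will_play("overcast", 90, True))  # True
```

Each `if` corresponds to one branch of the root test on outlook, and each returned expression is the second-level test on that branch.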
Formulas
Gini impurity
Used by the CART algorithm (Classification and Regression Trees). It is based on squared probabilities of membership for each target category in the node. It reaches its minimum (zero) when all cases in the node fall into a single target category.
Suppose y takes on values in {1, 2, ..., m}, and let f(i, j) = frequency of value j in node i. That is, f(i, j) is the proportion of records assigned to node i for which y = j.
<math>I_{G}(i) = 1 - \sum^{m}_{j=1} f (i,j)^{2}</math>
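The Gini formula can be computed directly from the class labels of the records in a node, estimating f(i, j) as the fraction of records whose label is j. A minimal sketch with a hypothetical function name: for a node with 9 records of one class and 5 of the other, I_G = 1 - (9/14)^2 - (5/14)^2 = 90/196, roughly 0.459.

```python
from collections import Counter

def gini_impurity(labels):
    """I_G = 1 - sum_j f(j)^2, where f(j) is the fraction of labels equal to j."""
    n = len(labels)
    if n == 0:
        return 0.0  # treat an empty node as pure
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["yes"] * 4))      # 0.0 (pure node: the minimum)
print(gini_impurity(["yes", "no"]))    # 0.5
print(gini_impurity(["yes"] * 9 + ["no"] * 5))
```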
Information gain
Used by the ID3, C4.5 and C5.0 tree generation algorithms. Information gain is based on the concept of entropy used in information theory.
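<math>I_{E}(i) = - \sum^{m}_{j=1} f (i,j) \log_2 f (i, j)</math>
The information gain of a split is the decrease in entropy that it produces: the entropy of the parent node i minus the entropies of its child nodes c, each weighted by the fraction n_c / n_i of the parent's records that it receives:
<math>IG(i) = I_{E}(i) - \sum_{c} \frac{n_c}{n_i} I_{E}(c)</math>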
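The entropy formula, and the information gain of a split computed from it as the parent's entropy minus the size-weighted entropy of the child nodes, can be sketched as follows. The function names and toy splits are hypothetical; ID3, C4.5 and C5.0 add further details not shown here (for example, C4.5 normalises the gain into a gain ratio).

```python
from collections import Counter
from math import log2

def entropy(labels):
    """I_E = -sum_j f(j) * log2 f(j) over the classes present in the node."""
    n = len(labels)
    probs = (count / n for count in Counter(labels).values())
    return sum(-p * log2(p) for p in probs)

def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of its children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

parent = ["yes", "yes", "no", "no"]
perfect = [["yes", "yes"], ["no", "no"]]   # the split separates the classes
useless = [["yes", "no"], ["yes", "no"]]   # each child is as mixed as the parent
print(entropy(parent))                     # 1.0
print(information_gain(parent, perfect))   # 1.0
print(information_gain(parent, useless))   # 0.0
```

A split-selection step would compute this gain for every candidate attribute and keep the largest.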
Decision tree advantages
Compared with other data mining methods, decision trees have several advantages:
Decision trees:
- are simple to understand and interpret. People are able to understand decision tree models after a brief explanation.
- require little data preparation. Other techniques often require data normalisation, the creation of dummy variables and the removal of blank values.
- are able to handle both numerical and categorical data. Other techniques are usually specialised in analysing datasets that have only one type of variable. For example, relation rules can be used only with nominal variables, while neural networks can be used only with numerical variables.
- use a white box model. If a given situation is observable in a model, the explanation for the condition is easily expressed in Boolean logic. By contrast, an artificial neural network is an example of a black box model, since the explanation for its results is too complex to be readily understood.
- make it possible to validate a model using statistical tests. This makes it possible to assess the reliability of the model.
- are robust and perform well on large datasets in a short time. Large amounts of data can be analysed on personal computers quickly enough for stakeholders to make decisions based on the analysis.
Extending decision trees with decision graphs
In a decision tree, all paths from the root node to a leaf node proceed by way of conjunction, or AND. In a decision graph, it is possible to use disjunctions (ORs) to join two or more paths together.
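To make the contrast concrete, the sketch below lists every root-to-leaf path of a tree as a conjunction of tests, then groups the paths that end in the same label with OR, the kind of join a decision graph can express directly instead of repeating a leaf. It assumes a hypothetical tree stored as nested dicts of the form {attribute: {value: subtree_or_label}}, loosely following the golf example (with "humid" standing for humidity above seventy percent); the helper names are made up for illustration.

```python
def paths(tree, conditions=()):
    """Yield (conditions, label) for every root-to-leaf path; conditions are ANDed."""
    if not isinstance(tree, dict):            # a leaf holds the predicted label
        yield list(conditions), tree
        return
    attr, branches = next(iter(tree.items()))  # one attribute test per interior node
    for value, subtree in branches.items():
        yield from paths(subtree, conditions + (f"{attr} = {value}",))

def rules_by_label(tree):
    """Join the conjunctive paths that share a label with OR."""
    grouped = {}
    for conds, label in paths(tree):
        grouped.setdefault(label, []).append("(" + " AND ".join(conds) + ")")
    return {label: " OR ".join(terms) for label, terms in grouped.items()}

tree = {"outlook": {"overcast": "play",
                    "sunny": {"humid": {"yes": "stay", "no": "play"}},
                    "rain": {"windy": {"yes": "stay", "no": "play"}}}}
for label, rule in rules_by_label(tree).items():
    print(label, "<-", rule)
# play <- (outlook = overcast) OR (outlook = sunny AND humid = no) OR (outlook = rain AND windy = no)
# stay <- (outlook = sunny AND humid = yes) OR (outlook = rain AND windy = yes)
```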
A complement to Decision Trees is Morphological Analysis.
External sources
- [BFOS84] L. Breiman, J. Friedman, R. A. Olshen and C. J. Stone, "Classification and regression trees". Wadsworth, 1984.
- [1] T. Menzies and Y. Hu, "Data mining for very busy people". IEEE Computer, October 2003, pp. 18-25.
- Decision Tree Analysis mindtools.com
- J.W. Comley and D.L. Dowe, "Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages", chapter 11 (pp. 265-294) in P. Grunwald, M.A. Pitt and I.J. Myung (eds.), Advances in Minimum Description Length: Theory and Applications, M.I.T. Press, April 2005, ISBN 0262072629. (This paper puts decision trees in internal nodes of Bayesian networks using Minimum Message Length (MML). An earlier version is Comley and Dowe (2003).)
- P.J. Tan and D.L. Dowe (2004), "MML Inference of Oblique Decision Trees", Lecture Notes in Artificial Intelligence (LNAI) 3339, Springer-Verlag, pp. 1082-1088. (This paper uses Minimum Message Length.)
- Eruditionhome Great directory site for data mining resources
- Decision Tree Basics vanguardsw.com
- General Morphological Analysis: A General Method for Non-Quantified Modelling From the Swedish Morphological Society
- decisiontrees.net Interactive Tutorial
- Building Decision Trees in Python From O'Reilly