Genetic algorithm
From Free net encyclopedia
A genetic algorithm (GA) is a search technique used in computer science to find approximate solutions to optimization and search problems. Genetic algorithms are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology such as inheritance, mutation, natural selection, and recombination (or crossover).
Genetic algorithms are typically implemented as a computer simulation in which a population of abstract representations (called chromosomes) of candidate solutions (called individuals) to an optimization problem evolves toward better solutions. Traditionally, solutions are represented in binary as strings of 0s and 1s, but different encodings are also possible. The evolution starts from a population of completely random individuals and happens in generations. In each generation, the fitness of the whole population is evaluated, multiple individuals are stochastically selected from the current population (based on their fitness), modified (mutated or recombined) to form a new population, which becomes current in the next iteration of the algorithm.
Contents |
Operation of a GA
Two elements are required for any problem before a genetic algorithm can be used to search for a solution: First, there must be a method of representing a solution in a manner that can be manipulated by the algorithm. Traditionally, a solution can be represented by a string of bits, numbers, characters or by a special struct. Second, there must be some method of measuring the quality of any proposed solution, using a fitness function.
For instance, if the problem involves fitting as many different weights as possible into a knapsack without breaking it (see knapsack problem), a representation of a solution might be a string of bits, where each bit represents a different weight, and the value of the bit (0 or 1) represents whether or not the weight is added to the knapsack. The fitness of the solution would be measured by determining the total weight of the proposed solution: The higher the weight, the greater the fitness, provided that the solution is possible.
Initialization
Initially many individual solutions are randomly generated to form an initial population. The population size depends on the nature of the problem, but typically contains several hundreds or thousands of possible solutions. Traditionally, the population is generated randomly, covering the entire range of possible solutions (the search space). Occasionally, the solutions may be "seeded" in areas where optimal solutions are likely to be found.
Selection
During each successive epoch, a proportion of the existing population is selected to breed a new generation. Individual solutions are selected through a fitness-based process, where fitter solutions (as measured by a fitness function) are typically more likely to be selected. Certain selection methods rate the fitness of each solution and preferentially select the best solutions. Other methods rate only a random sample of the population, as this process may be very time-consuming.
Most functions are stochastic and designed so that a small proportion of less fit solutions are selected. This helps keep the diversity of the population large, preventing premature convergence on poor solutions. Popular and well-studied selection methods include roulette wheel selection and tournament selection.
Reproduction
- See main articles: crossover (genetic algorithm) and mutation (genetic algorithm)
The next step is to generate a second generation population of solutions from those selected through genetic operators: crossover (or recombination), and mutation.
For each new solution to be produced, a pair of "parent" solutions is selected for breeding from the pool selected previously. By producing a "child" solution using the above methods of crossover and mutation, a new solution is created which typically shares many of the characteristics of its "parents." New parents are selected for each child, and the process continues until a new population of solutions of appropriate size is generated.
These processes ultimately result in the next generation population of chromosomes that is different from the initial generation. Generally the average fitness will have increased by this procedure for the population, since only the best organisms from the first generation are selected for breeding, alongwith a small proportion of less fit solutions, for reasons already mentioned above.
Termination
This generational process is repeated until a termination condition has been reached. Common terminating conditions are
- A solution is found that satisfies minimum criteria
- Fixed number of generations reached
- Allocated budget (computation time/money) reached
- The highest ranking solution's fitness is reaching or has reached a plateau such that successive iterations no longer produce better results
- Manual inspection
- Combinations of the above
Pseudo-code algorithm
Choose initial population Repeat Evaluate the individual fitnesses of a certain proportion of the population Select pairs of best-ranking individuals to reproduce Breed new generation through crossover and mutation Until terminating condition
Observations
There are several general observations about the generation of solutions via a genetic algorithm:
- In many problems with sufficient complexity, GAs may have a tendency to converge towards local optima rather than the global optimum of the problem. The likelihood of this occurring depends on the shape of the fitness landscape: certain problems may provide an easy ascent towards a global optimum, others may make it easier for the function to find the local optima. This problem may be alleviated by using a different fitness function, or by using techniques to maintain a diverse population of solutions.
- Operating on dynamic data sets is difficult, as genomes begin to converge early on towards solutions which may no longer be valid for later data. Several methods have been proposed to remedy this by increasing genetic diversity somehow and preventing early convergence, either by increasing the probability of mutation when the solution quality drops (called triggered hypermutation), or by occasionally introducing entirely new, randomly generated elements into the gene pool (called random immigrants). Recent research has also shown the benefits of using biological exaptation (or preadaptation) in solving this problem.
- Selection is clearly an important genetic operator, but opinion is divided over the importance of crossover versus mutation. Some argue that crossover is the most important, while mutation is only necessary to ensure that potential solutions are not lost. Others argue that crossover in a largely uniform population only serves to propagate innovations originally found by mutation, and in a non-uniform population crossover is nearly always equivalent to a very large mutation (which is likely to be catastrophic).
- Often, GAs can rapidly locate good solutions, even for difficult search spaces.
- For specific optimization problems and problem instantiations, simpler optimization algorithms may find better solutions than genetic algorithms (given the same amount of computation time). GA practitioners may wish to try other algorithms in addition to GAs.
- GAs cannot effectively solve problems in which there is no way to judge the fitness of an answer other than right/wrong, as there is no way to converge on the solution. These problems are often called "needle in a haystack" problems.
- As with all current machine learning problems it is worth tuning the parameters such as mutation probability, recombination probability and population size to find reasonable settings for the problem class you are working on. A very small mutation rate may lead to genetic drift (which is non-ergodic in nature) or premature convergence of the genetic algorithm in a local optimum. A mutation rate that is too high may lead to loss of good solutions. There are theoretical but not yet practical upper and lower bounds for these parameters that can help guide selection.
- How the fitness function is evaluated is an important factor in speed and efficiency of the algorithm.
Variants
The simplest algorithm represents each chromosome as a bit string. Typically, numeric parameters can be represented by integers, though it is possible to use floating point representations. The basic algorithm performs crossover and mutation at the bit level. Other variants treat the chromosome as a list of numbers which are indexes into an instruction table, nodes in a linked list, hashes, objects, or any other imaginable data structure. Crossover and mutation are performed so as to respect data element boundaries. For most data types, specific variation operators can be designed. Different chromosomal data types seem to work better or worse for different specific problem domains.
When bit strings representations of integers are used, Gray coding is often employed. In this way, small changes in the integer can be readily effected through mutations or crossovers. This has been found to help prevent premature convergence at so called Hamming walls, in which too many simultaneous mutations (or crossover events) must occur in order to change the chromosome to a better solution.
Other approaches involve using arrays of real-valued numbers instead of bit strings to represent chromosomes. Theoretically, the smaller the alphabet, the better the performance, but paradoxically, good results have been obtained from using real-valued chromosomes.
A slight, but very successful variant of the general process of constructing a new population is to allow some of the better organisms from the current generation to carry over to the next, unaltered. This strategy is known as elitist selection.
Parallel implementations of genetic algorithms come in two flavours. Coarse grained parallel genetic algorithms assume a population on each of the computer nodes and migration of individuals among the nodes. Fine grained parallel genetic algorithms assume an individual on each processor node which acts with neighboring individuals for selection and reproduction. Other variants, like genetic algorithms for online optimization problems, introduce time-dependence or noise in the fitness function.
Problem domains
Problems which appear to be particularly appropriate for solution by genetic algorithms include timetabling and scheduling problems, and many scheduling software packages are based on GAs. GAs have also been applied to engineering. Genetic algorithms are often applied as an approach to solve global optimization problems.
As a general rule of thumb genetic algorithms might be useful in problem domains that have a complex fitness landscape as recombination is designed to move the population away from local minima that a traditional hill climbing algorithm might get stuck in.
History
Genetic algorithms originated from the studies of cellular automata, conducted by John Holland and his colleagues at the University of Michigan. Research in GAs remained largely theoretical until the mid-1980s, when The First International Conference on Genetic Algorithms was held at The University of Illinois. As academic interest grew, the dramatic increase in desktop computational power allowed for practical application of the new technique. In 1989, The New York Times writer John Markoff wrote about Evolver, the first commercially available desktop genetic algorithm. Custom computer applications began to emerge in a wide variety of fields, and these algorithms are now used by a majority of Fortune 500 companies to solve difficult scheduling, data fitting, trend spotting and budgeting problems, and virtually any other type of combinatorial optimization problem.
Applications
- Artificial Creativity
- Automated design, including research on composite material design and multi-objective design of automotive components for crashworthiness, weight savings, and other characteristics.
- Automated design of mechatronic systems using bond graphs and genetic programming (NSF).
- Automated design of industrial equipment using catalogs of exemplar lever patterns.
- Calculation of Bound States and Local Density Approximations.
- Configuration applications, particularly physics applications of optimal molecule configurations for particular systems like C60 (buckyballs).
- Container loading optimization.
- Code-breaking, using the GA to search large solution spaces of ciphers for the one correct decryption.
- Design of water distribution systems.
- Distributed computer network topologies.
- Electronic circuit design, known as Evolvable hardware.
- File allocation for a distributed system.
- Parallelization of GAs/GPs including use of hierarchical decomposition of problem domains and design spaces nesting of irregular shapes using feature matching and GAs.
- Game Theory Equilibrium Resolution.
- Learning Robot behavior using Genetic Algorithms.
- Learning fuzzy rule base using genetic algorithms.
- Linguistic analysis, including Grammar Induction and other aspects of Natural Language Processing (NLP) such as word sense disambiguation.
- Mobile communications infrastructure optimization.
- Molecular Structure Optimization (Chemistry).
- Multiple population topologies and interchange methodologies.
- Optimisation of data compression systems, for example using wavelets.
- Protein folding and protein/ligand docking.
- Plant floor layout.
- Scheduling applications, including job-shop scheduling. The objective being to schedule jobs in a sequence dependent or non-sequence dependent setup environment in order to maximize the volume of production while minimizing penalties such as tardiness.
- Software engineering
- Solving the machine-component grouping problem required for cellular manufacturing systems.
- Tactical asset allocation and international equity strategies.
- Timetabling problems, such as designing a non-conflicting class timetable for a large university.
- Training artificial neural networks when pre-classified training examples are not readily obtainable (neuroevolution).
- Traveling Salesman Problem.
Related techniques
Genetic programming (GP) is a related technique popularized by John Koza, in which computer programs, rather than function parameters, are optimized. Genetic programming often uses tree-based internal data structures to represent the computer programs for adaptation instead of the list, or array, structures typical of genetic algorithms.
Interactive genetic algorithms (IGA) are genetic algorithms that use human evaluation. They are usually applied to domains where it is hard to design a computational fitness function, for example, evolving images, music, artistic designs and forms to fit users' aesthetic preference.
Simulated annealing (SA) is a related global optimization technique which traverses the search space by testing random mutations on an individual solution. A mutation that increases fitness is always accepted. A mutation which lowers fitness is accepted probabilistically based on the difference in fitness and a decreasing temperature parameter. In SA parlance, one speaks of seeking the lowest energy instead of the maximum fitness. SA can also be used within a standard GA algorithm, simply by starting with a relatively high rate of mutation, which decreases over time along a given schedule.
Tabu search (TS) is similar to Simulated Annealing, in that both traverse the solution space by testing mutations of an individual solution. While simulated annealing generates only one mutated solution, tabu search generates many mutated solutions and moves to the solution with the lowest fitness of those generated. In order to prevent cycling and encourage greater movement through the solution space, a tabu list is maintained of partial or complete solutions. It is forbidden to move to a solution that contains elements of the tabu list, which is updated as the solution traverses the solution space.
Ant colony optimization (ACO) uses many ants (or agents) to traverse the solution space and find locally productive areas. While usually inferior to genetic algorithms and other forms of local search, it is able to produce results in problems where no global or up-to-date perspective can be obtained, and thus the other methods cannot be applied.
Memetic algorithm (MA) is a term that is used by some researchers to define forms of genetic algorithm that are combined with other forms of local search, such as simulated annealing.
Building block hypothesis
[Goldberg, 1989, page 41], talking of binary bit string genetic algorithms, says "Short, low order, and highly fit schemata are sampled, recombined (crossed over), and resampled to form strings of potentially higher fitness. In a way, by working with these particular schemata (the building blocks), we have reduced the complexity of our problem; instead of building high-performance strings by trying every conceivable combination, we construct better and better strings from the best partial solutions of past samplings." "Just as a child creates magnificent fortresses through the arrangement of simple blocks of wood (building blocks), so does a genetic algorithm seek near optimal performance through the juxtaposition of short, low-order, high-performance schemata, or building blocks. Note [Goldberg, 1989] suggests that building blocks are highly fit schemata with only a few defined bits (low order), and that these are close together (short).
References
- Fentress, Sam W (2005), Exaptation as a means of evolving complex solutions, MA Thesis, University of Edinburgh. (pdf)
- Goldberg, David E (1989), Genetic Algorithms in Search, Optimization and Machine Learning, Kluwer Academic Publishers, Boston, MA.
- Goldberg, David E (2002), The Design of Innovation: Lessons from and for Competent Genetic Algorithms, Addison-Wesley, Reading, MA.
- Harvey, Inman (1992), Species Adaptation Genetic Algorithms: A basis for a continuing SAGA, in 'Toward a Practice of Autonomous Systems: Proceedings of the First European Conference on Artificial Life', F.J. Varela and P. Bourgine (eds.), MIT Press/Bradford Books, Cambridge, MA, pp. 346-354.
- Holland, John H (1975), "Adaptation in Natural and Artificial Systems", University of Michigan Press, Ann Arbor
- Koza, John (1992), Genetic Programming: On the Programming of Computers by Means of Natural Selection
- Matthews, Robert A J (1993), The use of genetic algorithms in cryptanalysis, Cryptologia vol 17 187-201
- Michalewicz, Zbigniew (1999), Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag.
- Mitchell, Melanie, (1996), An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA.
- Schmitt, Lothar M (2001), Theory of Genetic Algorithms, Theoretical Computer Science (259), pp. 1-61
- Schmitt, Lothar M (2004), Theory of Genetic Algorithms II: models for genetic operators over the string-tensor representation of populations and convergence to global optima for arbitrary fitness function under scaling, Theoretical Computer Science (310), pp. 181-231
- Vose, Michael D (1999), The Simple Genetic Algorithm: Foundations and Theory, MIT Press, Cambridge, MA.
- Whitley, D. (1994). A genetic algorithm tutorial. Statistics and Computing 4,
65–85.
External links
- Applying Genetic Algorithms to Multi-objective Land-Use Planning within the scope of Land Allocation Decision Support System, Keith Matthews.
- Follow-the-wall Robot - An interactive applet that allows you to experiment with genetic algorithms.
- http://cs.felk.cvut.cz/~xobitko/ga/ - An online introduction to genetic algorithms with Java applets
- IlliGAL - Illinois Genetic Algorithms Laboratory - Download technical reports and code
- HeuristicLab - A free .NET environment for heuristic optimization (Genetic Algorithms, Genetic Programming, Evolution Strategies, Simulated Annealing, Tabu Search, etc.)
- Golem Project - Automatic Design and Manufacture of Robotic Lifeforms
- Optimal Allocation of Fixed and Switched Capacitors for Unbalanced Radial Distribution Feeders Using Genetic Algorithm
- Introduction to Genetic Algorithms Using RPL2
- Talk.Origins FAQ on the uses of genetic algorithms, by Adam Marczyk
- Genetic algorithm in search and optimization, by Richard Baker
- Differential Evolution using Genetic Algorithm
- Genetic algorithms and Markov Chain Monte Carlo: Differential Evolution makes Bayesian computing easy (genetic algorithm for statistical analysis)
- Introduction to Genetic Algorithms and Neural Networks including an example windows program
- Genetic Algorithm Solves the Toads and Frogs Puzzle at cut-the-knot
- Not-So-Mad Science: Genetic Algorithms and Web Page Design for Marketers by Matthew Syrett
- MIT OpenCourseWare | Health Sciences and Technology | HST.508 Genomics and Computational Biology, Fall 2002 | Lecture Notes
- A Genetic Algorithm Tutorial by Darrell Whitley Computer Science Department Colorado State University An excellent tutorial with lots of theory
- Evolutionary computing in search-based software engineering. Leo Rela. How evolutionary computing (specifically GA) is used in software engineering.
- Commercial application of genetic algorithm software for database marketing, direct marketing list development, and predictive modeling
- Genetic Algorithm Applet A Java Applet that allows you to experiment with genetic algorithms. Source code included.
- Trainable Mechanical Arm Physical simulation of a mechanical arm and a ball. Using genetic algorithms it learns how to perform a given exercise (e.g. throw the ball as far as possible).
- A video showing a 20 generation evolution of a natural walking legs model
- VisualBots - Freeware multi-agent simulator in Microsoft Excel. Sample programs include genetic algorithm, ACO, and simulated annealing solutions to TSP.
- Genetics Studio - Freeware numeric genetic algorithm utility for the novice to advanced user.
- GA for Excel - Freeware GA application for constrained problems in MS Excel worksheets.
- PIKAIA- a free genetic algorithm subroutine developed in Fortran. PIKAIA is also available in an Excel/VBA version and a parallel processing version.
- DGPF - The Distributed Genetic Programming Framework provides a multi-objective Genetic Algorithm library with many distribution abilities: client/server, p2p, hybrid.ar:خوارزميات وراثية
cs:Genetický algoritmus da:Genetisk algoritme de:Genetischer Algorithmus es:Algoritmo genético fa:الگوریتم ژنتیک fr:Algorithme génétique gl:Algoritmo xenético it:Algoritmo genetico he:אלגוריתמים גנטיים lt:Genetiniai algoritmai nl:Genetisch algoritme ja:遺伝的アルゴリズム pl:Algorytm genetyczny pt:Algoritmo genético ru:Генетический алгоритм th:ขั้นตอนวิธีเชิงพันธุกรรม vi:Giải thuật di truyền zh:遗传算法