Road to ML Engineer #72 - NEAT

Last Edited: 12/11/2025

This blog post introduces the NEAT algorithm in neuroevolution.

ML

In the previous article, we covered GA, ES, CMA-ES, OpenAI-ES, and other evolutionary algorithms relevant to neuroevolution in the context of finding the global maxima of the Schaffer and Rastrigin functions. Building on those discussions, in this article we will discuss how those algorithms can be applied to evolving neural networks.

Fixed-Topology Neuroevolution

The simplest approach to applying EAs to neural networks is to fix the topology of the networks and focus on optimizing the weights. Directly encoding the network weights as real-valued vectors allows us to utilize any of the EAs we have discussed so far, just as we did for the Schaffer and Rastrigin functions, and the fitness can be computed by running the resulting networks and measuring the objectives or rewards. EvoJAX, the toolkit built on JAX for hardware-accelerated neuroevolution that we will make use of, provides implementations of various optimization algorithms, including EAs like CMA-ES and OpenAI-ES, for fixed-topology neuroevolution on various supervised learning, control, and other novel tasks. I highly recommend experimenting with EvoJAX.
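Below is a minimal sketch (in plain NumPy, not EvoJAX's API) of what direct encoding looks like for a fixed-topology network: the genome is simply the flat vector of all weights and biases, and fitness is whatever objective we measure after decoding and running the network. The layer sizes and the toy objective here are arbitrary placeholders.

```python
import numpy as np

# Hypothetical fixed topology: observation -> hidden -> action (sizes are placeholders).
LAYERS = [4, 16, 2]

def genome_size(layers):
    # One weight matrix and one bias vector per layer transition.
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers[:-1], layers[1:]))

def decode(genome, layers):
    """Unflatten a real-valued genome into (weight, bias) pairs."""
    params, idx = [], 0
    for n_in, n_out in zip(layers[:-1], layers[1:]):
        w = genome[idx:idx + n_in * n_out].reshape(n_in, n_out)
        idx += n_in * n_out
        b = genome[idx:idx + n_out]
        idx += n_out
        params.append((w, b))
    return params

def forward(params, x):
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def fitness(genome):
    # Placeholder objective; in practice, roll out the decoded network in the
    # task environment and return the accumulated reward.
    params = decode(genome, LAYERS)
    return -float(np.sum(forward(params, np.ones(LAYERS[0])) ** 2))

population = np.random.randn(64, genome_size(LAYERS))   # 64 candidate genomes
scores = np.array([fitness(g) for g in population])     # passed to CMA-ES, OpenAI-ES, ...
```

Any of the EAs from the previous article can then treat `population` and `scores` exactly as they treated candidate solutions for the Schaffer and Rastrigin functions.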

Although fixed-topology neuroevolution reportedly achieved robust performance in some tasks (we will discuss some results in a later section), we would ideally want to optimize other hyperparameters including the network topology and obtain topology and weight evolving ANNs (TWEANNs), which have the potential to produce novel, compact, and regularized architectures, reduce the compute required, and achieve even more robust performance. However, it is not intuitive to represent topologies with real-valued vectors and apply ES to TWEANNs, and there are other unique challenges associated with TWEANNs that prevent us from naively applying evolutionary algorithms.

Challenges of TWEANNs

The first unique challenge of TWEANNs is the permutation invariance of neural networks: two functionally identical networks can be represented with their nodes in different orders, such as a network with hidden nodes (A, B, C) also being representable as (C, B, A). This not only leads to duplicated evaluations of the same networks but also potentially disrupts crossover, an important variation operator for diverse search (crossover between (A, B, C) and (C, B, A) should ideally be limited to (A, B, C) or (C, B, A) to preserve the functionality, but can instead result in (A, B, A) or (C, B, C)).
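The toy snippet below (not NEAT itself, just an illustration) shows how naive positional crossover of two such orderings loses information:

```python
# Toy illustration of how naive positional crossover of two functionally
# identical but differently ordered parents can lose a node.
parent_1 = ["A", "B", "C"]   # hidden nodes of one network
parent_2 = ["C", "B", "A"]   # the same network with nodes listed in reverse

for cut in (1, 2):           # one-point crossover at each possible position
    child = parent_1[:cut] + parent_2[cut:]
    print(child)             # ['A', 'B', 'A'] both times: "C" is lost, "A" duplicated
```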

The second unique challenge of TWEANNs is that adding new structure to neural networks often hurts performance initially, leading to new networks with added complexity being dominated by the surviving simpler networks. This causes the loss of new structural innovations and significantly limits the explored search space. Finally, using randomly generated large networks in the initial population can lead to an explosive increase in the initial search dimensionality and, depending on the task, unnecessarily large networks. Therefore, in order to achieve successful TWEANNs, we need mechanisms that are capable of handling competing encodings, protecting new structural mutations, and gradually increasing complexity from minimal solutions.

NEAT

NeuroEvolution of Augmenting Topologies (NEAT) gained traction as a GA for evolving both the weights and topologies of neural networks that addressed the challenges mentioned above with its genetic encoding with historical markings, speciation of similar topologies, and incremental complexification approach. NEAT encodes a neural network with a list of node gene objects (attributed with a node ID and optionally a bias and activation) and a list of connection gene objects (attributed with an innovation number, source and target node IDs, a weight, and whether the connection is enabled), which together correspond to an adjacency list. NEAT assigns a unique innovation number to each newly created connection gene to track the historical origin of the gene, which enables proper crossover that matches up connection genes without expensive topological matching algorithms.
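A minimal sketch of this encoding might look like the following (the field names and the global innovation registry are simplified assumptions, not a reference implementation):

```python
from dataclasses import dataclass, field

@dataclass
class NodeGene:
    node_id: int
    bias: float = 0.0
    activation: str = "tanh"

@dataclass
class ConnectionGene:
    innovation: int   # historical marking for this (source, target) connection
    source: int
    target: int
    weight: float
    enabled: bool = True

@dataclass
class Genome:
    nodes: list = field(default_factory=list)        # list[NodeGene]
    connections: list = field(default_factory=list)  # list[ConnectionGene]

# Global innovation registry: the same structural mutation receives
# the same innovation number wherever it appears.
_innovations: dict = {}

def innovation_number(source: int, target: int) -> int:
    if (source, target) not in _innovations:
        _innovations[(source, target)] = len(_innovations)
    return _innovations[(source, target)]
```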

NEAT representations

Appropriate matching of connection genes that share the same historical origin using innovation numbers also enables simple computation of structural differences between genomes, which can be used to group genomes into species and protect new structural innovations. NEAT specifically uses the numbers of excess and disjoint connection genes $E$ and $D$ (connection genes that appear in only one genome of the pair) and the average weight difference $\bar{W}$ of the matching connection genes to compute the genomic distance $\delta = \frac{c_1 E}{N} + \frac{c_2 D}{N} + c_3 \bar{W}$ (where the $c_i$ are hyperparameters and $N$ is the number of connection genes in the larger genome), and it groups genomes within a specified threshold $\delta_t$ into the same species. NEAT also uses explicit fitness sharing, which divides each individual's fitness by the size of its species, to prevent any one species from dominating the population and to protect new species.
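Under the gene structures sketched above, the distance computation could look roughly like this (the coefficients and the threshold are arbitrary assumed values):

```python
# Hedged sketch of NEAT's genomic distance, assuming the Genome / ConnectionGene
# structures sketched earlier; c1, c2, c3 and delta_t are placeholder values.
def genomic_distance(g1, g2, c1=1.0, c2=1.0, c3=0.4):
    conns_1 = {c.innovation: c for c in g1.connections}
    conns_2 = {c.innovation: c for c in g2.connections}
    matching = conns_1.keys() & conns_2.keys()
    non_matching = conns_1.keys() ^ conns_2.keys()

    max_innov_1 = max(conns_1, default=0)
    max_innov_2 = max(conns_2, default=0)
    # Excess genes lie beyond the other genome's highest innovation number;
    # the remaining non-matching genes are disjoint.
    excess = sum(1 for i in non_matching if i > min(max_innov_1, max_innov_2))
    disjoint = len(non_matching) - excess

    n = max(len(conns_1), len(conns_2), 1)  # size of the larger genome
    w_bar = (sum(abs(conns_1[i].weight - conns_2[i].weight) for i in matching)
             / max(len(matching), 1))
    return (c1 * excess + c2 * disjoint) / n + c3 * w_bar

def same_species(g1, g2, delta_t=3.0):
    return genomic_distance(g1, g2) <= delta_t
```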

NEAT Networks

Since speciation protects new structural innovations, NEAT can comfortably start evolving the topologies of the networks from networks with minimal structure. NEAT gradually complexifies these minimal networks using structural mutations that add links (adding new connection genes) or add neurons by splitting a randomly chosen connection (disabling the chosen connection, adding a new node gene, and adding new connection genes that connect the previous source and target nodes through the new node). Starting the evolution from networks with minimal structure keeps the search dimensionality low, which can lead to effective exploitation, faster optimization, and the discovery of compact and performant networks.
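Continuing the hypothetical sketch from earlier, the two structural mutations could be expressed as follows (the weight initialization of the split connection follows the convention in the original paper):

```python
import random

# Hedged sketch of NEAT's structural mutations, reusing the hypothetical
# NodeGene / ConnectionGene / innovation_number helpers sketched above.
def mutate_add_connection(genome, rng=random):
    """Add a new connection gene between two previously unconnected nodes."""
    existing = {(c.source, c.target) for c in genome.connections}
    candidates = [(a.node_id, b.node_id)
                  for a in genome.nodes for b in genome.nodes
                  if a.node_id != b.node_id and (a.node_id, b.node_id) not in existing]
    if not candidates:
        return
    source, target = rng.choice(candidates)
    genome.connections.append(
        ConnectionGene(innovation_number(source, target), source, target,
                       weight=rng.gauss(0.0, 1.0)))

def mutate_add_node(genome, rng=random):
    """Split a random enabled connection: disable it and route through a new node."""
    enabled = [c for c in genome.connections if c.enabled]
    if not enabled:
        return
    conn = rng.choice(enabled)
    conn.enabled = False
    new_id = max(n.node_id for n in genome.nodes) + 1
    genome.nodes.append(NodeGene(new_id))
    # Incoming connection gets weight 1.0 and the outgoing connection inherits
    # the old weight, minimizing the initial impact of the new structure.
    genome.connections.append(
        ConnectionGene(innovation_number(conn.source, new_id), conn.source, new_id, 1.0))
    genome.connections.append(
        ConnectionGene(innovation_number(new_id, conn.target), new_id, conn.target, conn.weight))
```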

NEAT Cartpole

Similarly to other neuroevolution algorithms, NEAT can be implemented so that the fitness computation of individuals is parallelizable by padding the neural networks to a fixed width and depth (though this limits the size of the evolved networks). Skip connections can be handled by carrying the activations of all previous layers forward with identity weights and zero biases. As a quick experiment, I implemented NEAT in this way and evolved 1000 networks on the easy Cart Pole task in EvoJAX, where the fitness of each individual is computed by averaging the rewards of 10 randomly initialized games (1000 timesteps each). (All games were parallelized with vectorization. I cannot disclose my implementation here, so I highly recommend trying to implement it yourself.)
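While I cannot share my implementation, the following generic JAX sketch illustrates the core idea of padding every individual to the same dense shape so that the whole population can be evaluated with a single vmap (all sizes and the forward pass are illustrative assumptions; rollouts over games can be vmapped similarly):

```python
import jax
import jax.numpy as jnp

# Generic sketch (not the author's implementation): each individual is padded to
# dense (DEPTH, WIDTH, WIDTH) weight tensors with unused entries zeroed, so one
# vmap evaluates the entire population in parallel on an accelerator.
POP, DEPTH, WIDTH = 1000, 4, 16

def forward(weights, biases, obs):
    # obs is zero-padded to WIDTH; unused rows/columns of `weights` stay zero.
    h = obs
    for d in range(DEPTH):
        h = jnp.tanh(h @ weights[d] + biases[d])
    return h

# One padded parameter set per individual in the population.
key = jax.random.PRNGKey(0)
weights = jax.random.normal(key, (POP, DEPTH, WIDTH, WIDTH)) * 0.1
biases = jnp.zeros((POP, DEPTH, WIDTH))
obs = jnp.zeros((POP, WIDTH))  # in practice, per-individual environment observations

# vmap over the population axis.
outputs = jax.vmap(forward)(weights, biases, obs)
print(outputs.shape)  # (1000, 16)
```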

Here, NEAT managed to find a compact network that is able to balance the pole at the center the whole time in generations without fine-tuning the hyperparameters (scoring a minimum of 2834.66 and an average of 2882.02 in 3000 timesteps with a standard deviation of 15.79 over 100 randomly initialized test games). The original paper by Stanley, K. O. & Miikkulainen, R. (2002) also confirms NEAT's relatively higher performance and efficiency compared to other algorithms in finding compact networks for simple XOR classification (a supervised task) and gradually harder double pole balancing tasks (control tasks). The importance of crossover, speciation, and evolution from minimal networks in NEAT was also confirmed by the ablation studies in the original paper, which observed performance declines after removing each of those components. Moreover, NEAT records the history of evolution and makes the impact of structural mutations interpretable, which makes it appealing for various potential applications.

Deep Neuroevolution

Although much of the focus of neuroevolution, including NEAT, has been on evolving shallower neural networks, we can also apply neuroevolution to deep neural networks, and do so well and fast with the use of modern compute. For example, OpenAI-ES parallelized across thousands of workers by Salimans, T., Ho, J., Chen, X., et al. (2017) achieved good performance on complex continuous control tasks like 3D humanoid walking in 10 minutes and competitive results in Atari games within an hour, and a simple GA with truncation selection and elitism by Such, F. P., Madhavan, V., et al. (2017) achieved results competitive with ES and RL methods (DQN, A3C, etc.) in training neural networks with over 4 million parameters to play Atari games from pixels alone, and even outperformed the other approaches in longer games.
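The GA used in the latter work is remarkably simple; a compact sketch of truncation selection with elitism and Gaussian mutation (population size, truncation size, and mutation scale here are arbitrary choices, not the paper's settings) might look like:

```python
import numpy as np

def simple_ga(fitness_fn, dim, pop_size=100, truncation=20, sigma=0.005,
              elite=1, generations=100, seed=0):
    rng = np.random.default_rng(seed)
    population = rng.normal(0.0, 1.0, size=(pop_size, dim))
    for _ in range(generations):
        scores = np.array([fitness_fn(ind) for ind in population])
        order = np.argsort(scores)[::-1]                # best first
        parents = population[order[:truncation]]        # truncation selection
        elites = population[order[:elite]]              # carried over unchanged
        children = parents[rng.integers(0, truncation, size=pop_size - elite)]
        children = children + sigma * rng.normal(size=children.shape)  # mutation only, no crossover
        population = np.concatenate([elites, children])
    final_scores = np.array([fitness_fn(ind) for ind in population])
    return population[np.argmax(final_scores)]

# Example usage on a toy objective (maximizing the negative sphere function).
best = simple_ga(lambda x: -np.sum(x ** 2), dim=10)
```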

In the latter work, it was also found that random search, although outperformed by the GA, surprisingly beat DQN and A3C in some of the games tested, further suggesting the drawbacks of following gradients in tasks that involve noisy, delayed, and sparse reward structures, and the robustness of (deep) neuroevolution in those tasks. On top of the robustness, neuroevolution also offers a potentially interpretable evolution history and tends to produce explainable and regularized networks rather than networks that rely on highly distributed embeddings and overfit to the dataset. Moreover, as mentioned previously and demonstrated by EvoJAX and NEAT, the fitness computation of individuals in the population is easily parallelizable with vectorization and, unlike some traditional RL algorithms, can naturally make use of TPUs/GPUs. Hence, research in neuroevolution and its synergies with other techniques and compute is strongly encouraged and ongoing across various domains.

Conclusion

In this article, we briefly discussed fixed-topology neuroevolution and the unique challenges of TWEANNs, and introduced NEAT, which addresses those challenges with its genetic encoding with historical markings, speciation, and incremental complexification. We also discussed results of deep neuroevolution on some tasks and the benefits of neuroevolution that motivate further research. As mentioned repeatedly, I highly recommend checking out the resources cited below for more details and trying to implement the neuroevolution algorithms yourself.

Resources