Reinforcement Learning in Coevolution Models

QINXIN PAN

1. What did I ask?

Most biological networks, such as gene regulatory networks and protein-protein interaction networks, are scale-free: they have only a few hubs, and most nodes have low degree. Studies have shown that the scale-free structure makes a network more robust than a random network. Here I ask whether the scale-free structure also brings any coevolutionary advantage. I plan to test this by comparing the coevolution of scale-free & scale-free network pairs with that of random & random network pairs.

2. What did I achieve?

2.1 I implemented the networks with scale free structure.

We use threshold random Boolean networks to simulate gene regulatory networks. As shown in Fig. 2, each node has its own regulators, and each edge has a weight. A node represents a gene that is either on or off, meaning that the gene is either expressed or not. The edges indicate the regulatory relationships between genes. For each node, the state at the next time point is determined by the states of its regulators at the current time point. Given an initial state, we can advance the network step by step according to these regulatory relationships. Since the network has a limited number of nodes and each node has only 2 states, the number of possible configurations (a configuration is the combination of the states of all nodes) is finite. At some time point the network must therefore reach a configuration that has already appeared, and from then on it keeps repeating the same sequence of configurations. The repeated configurations form the attractor of the network.

I generated populations of 50 such networks, ran each network to its attractor, and measured its fitness. As shown in Fig. 1, the generated networks have a scale-free structure: the out-degree distribution follows a power law, and the in-degree distribution is Poisson.
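The update and the attractor search described above can be sketched as follows. This is a minimal illustration, not the code used for the experiments: the function names, the list-based network representation, and the tie rule (a node keeps its state when the weighted input sums to zero) are assumptions.

```python
def step(state, regulators, weights):
    """Advance every node one time step (synchronous update).

    regulators[i] lists the nodes that regulate node i;
    weights[i] holds the matching edge weights.
    """
    new_state = []
    for i, current in enumerate(state):
        total = sum(w * state[j] for j, w in zip(regulators[i], weights[i]))
        if total > 0:
            new_state.append(1)        # net activation: gene on
        elif total < 0:
            new_state.append(0)        # net repression: gene off
        else:
            new_state.append(current)  # assumption: keep the state on a tie
    return tuple(new_state)


def find_attractor(state, regulators, weights):
    """Iterate until a configuration repeats; return the repeating cycle."""
    seen = {}        # configuration -> time step of first visit
    trajectory = []
    state = tuple(state)
    while state not in seen:
        seen[state] = len(trajectory)
        trajectory.append(state)
        state = step(state, regulators, weights)
    return trajectory[seen[state]:]


# Toy 2-node network: node 1 activates node 0, node 0 represses node 1.
regulators = [[1], [0]]
weights = [[1.0], [-1.0]]
print(find_attractor((0, 1), regulators, weights))  # [(1, 0)]
```

Because there are at most 2^n configurations for n nodes, the loop is guaranteed to terminate.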

Figure 1. Out-degree and in-degree distributions of the constructed networks. Each population has 50 networks, and each network has 30 nodes.
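One way to obtain a power-law out-degree distribution together with an approximately Poisson in-degree distribution is to draw each node's out-degree from a power law and then pick its targets uniformly at random. The sketch below assumes this construction; the exponent gamma, the function names, and the choice to allow self-regulation are illustrative assumptions rather than details of the experiments.

```python
import random


def power_law_out_degrees(n_nodes, gamma, rng):
    """Draw one out-degree per node with P(k) proportional to k**(-gamma)."""
    ks = list(range(1, n_nodes + 1))
    probs = [k ** -gamma for k in ks]
    return rng.choices(ks, weights=probs, k=n_nodes)


def build_network(n_nodes, gamma, rng):
    """Return (regulators, weights), both indexed by the regulated node."""
    regulators = [[] for _ in range(n_nodes)]
    weights = [[] for _ in range(n_nodes)]
    out_degrees = power_law_out_degrees(n_nodes, gamma, rng)
    for source, k in enumerate(out_degrees):
        # Uniformly random targets give a roughly Poisson in-degree.
        for target in rng.sample(range(n_nodes), k):
            regulators[target].append(source)
            weights[target].append(rng.uniform(-1.0, 1.0))
    return regulators, weights


rng = random.Random(0)  # fixed seed so the sketch is reproducible
regulators, weights = build_network(30, 2.5, rng)  # gamma = 2.5 is a placeholder
in_degrees = [len(r) for r in regulators]
```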

2.2 I implemented the coevolution algorithm.

To mimic the collaborative relationships between different species in the biological world, I evolve two populations at the same time and allow them to interact, as shown in Fig. 3.

Figure 2. Dynamics and evolution of the networks: network construction and evolutionary algorithm. (a) Each node has its own regulators, and each edge has a weight uniformly distributed in (-1, 1). For each node, the state at the next time point is determined by the weights of its incoming edges and the states of its regulators at the current time point. (b) Given an initial configuration, a network reaches its attractor at some point and starts repeating. One node is chosen at random as the output node, and its states in the attractor define the output of the network. With an arbitrary binary sequence set as the target function, fitness is measured from the Hamming distance between the network output and the target function. (c) Mutation in the evolutionary algorithm. Mutations can affect both edge existence and edge weights, with fixed probability u = 0.02. Each individual in the mother generation produces 3 mutated offspring. Together with the mother itself, the 4 networks go through a tournament, and the one with the best fitness is copied to the daughter generation. (Figure from Oikonomou et al.)
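The fitness measure and the mutation and tournament steps in the caption above can be sketched as follows. This is a hedged illustration: the weight-matrix representation (0.0 meaning "no edge"), the cyclic repetition of the output to match the target length, the exact form of the two mutation types, and all function names are assumptions.

```python
import copy
import random

MUTATION_RATE = 0.02  # u in the caption


def hamming_fitness(output, target):
    """1 minus the normalized Hamming distance to the target.

    Assumption: the attractor output is repeated cyclically to the
    length of the target.
    """
    mismatches = sum(output[t % len(output)] != bit
                     for t, bit in enumerate(target))
    return 1.0 - mismatches / len(target)


def mutate(matrix, rng, rate=MUTATION_RATE):
    """Return a mutated copy of a weight matrix (0.0 means no edge)."""
    child = copy.deepcopy(matrix)
    for row in child:
        for j, w in enumerate(row):
            if rng.random() < rate:
                # Edge-existence mutation: remove the edge or create one.
                row[j] = 0.0 if w != 0.0 else rng.uniform(-1.0, 1.0)
            elif w != 0.0 and rng.random() < rate:
                # Weight mutation: redraw the weight of an existing edge.
                row[j] = rng.uniform(-1.0, 1.0)
    return child


def tournament(mother, fitness, rng, n_offspring=3):
    """Mother plus 3 mutants compete; the fittest is returned.

    max() keeps the first best candidate, so the mother wins ties.
    """
    candidates = [mother] + [mutate(mother, rng) for _ in range(n_offspring)]
    return max(candidates, key=fitness)


def next_generation(population, fitness, rng):
    """Run one tournament per mother to build the daughter generation."""
    return [tournament(mother, fitness, rng) for mother in population]
```

Here `fitness` is any function that maps a network to a score, for example one that runs the network to its attractor and applies `hamming_fitness` to the output node's states.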

I implemented the evolutionary algorithm. As shown in Fig. 4, the populations evolve well: averaged over 50 runs with population size 50, the populations reach an average fitness of ~0.9 within 1000 generations.

Figure 3. Coevolution of 2 populations. At every generation, every individual in population 1 (size N) generates 3 mutants; together with the unmutated networks, population 1 expands to size 4N. For each individual, the output function is combined with the output function of a randomly drawn representative from population 2 into a larger output function. The distance between this combined output function and the target function is measured, and the resulting fitness is assigned to the individual in population 1. The best N of the 4N individuals are then chosen for the next generation. Symmetrically, the next generation of population 2 is produced by collaborating with representatives from population 1.
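One generation of this coevolution scheme for a single population can be sketched as follows. The sketch is generic over the individual type. It assumes that the two output functions are combined by concatenation and that the combined output has the same length as the target; `output_of`, `mutate`, and `own_first` are hypothetical names introduced for illustration.

```python
import random


def hamming_fitness(output, target):
    """1 minus the normalized Hamming distance (equal lengths assumed)."""
    mismatches = sum(a != b for a, b in zip(output, target))
    return 1.0 - mismatches / len(target)


def coevolve_step(pop_a, pop_b, target, output_of, mutate, rng,
                  n_offspring=3, own_first=True):
    """Return the next generation of pop_a, scored jointly with pop_b."""
    n = len(pop_a)
    # Expand from N to 4N: the unmutated individuals plus 3 mutants each.
    pool = list(pop_a)
    for parent in pop_a:
        pool.extend(mutate(parent, rng) for _ in range(n_offspring))

    scored = []
    for individual in pool:
        partner = rng.choice(pop_b)  # random representative of the other population
        own = list(output_of(individual))
        other = list(output_of(partner))
        # Assumption: the combined output is a plain concatenation.
        joint = own + other if own_first else other + own
        scored.append((hamming_fitness(joint, target), individual))

    # Keep the best N of the 4N candidates.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [individual for _, individual in scored[:n]]
```

Population 2 would be advanced with the same function by swapping the two population arguments and passing `own_first=False`, so that the concatenation order relative to the target stays the same.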

Figure 4. Fitness in scale-free & scale-free coevolution. Data are reported every 20 generations and averaged over 50 repeats.

3. What is left?

I have already implemented the code and started running experiments on the cluster. By the final presentation, I aim to do the two things below.

3.1 Compare the learning curves of scale-free & scale-free networks and random & random networks to see whether the scale-free combination shows any advantage.

3.2 Test how robust the learning curves of the two combinations are to changes in the average connectivity of the networks.