Caltech
Center for Neuromorphic Systems Engineering

Distributed Learning in Swarm Systems
Ling Li, Alcherio Martinoli, Yaser Abu-Mostafa


Abstract. Distributed learning is the learning process of multiple autonomous agents in a varying environment, where each agent may have only partial information about the environment and the other agents. We model the system and the individual agents, then use techniques such as reinforcement learning to find the strategy for each agent that maximizes the group performance. Our experiments with the stick-pulling problem showed that the agents became specialized automatically.

Motivation and Aims.
Natural systems consisting of many agents, such as colonies of ants, wasps, and termites, appear to have abilities that transcend those of their individual constituent agents. Scalability, flexibility, and robustness are three main advantages of such swarm intelligence systems.

We would like to apply principles inspired by these natural systems to distributed problems, such as the control of a swarm of robots. We would also like to investigate the role of individual learning capabilities in the emerging collective behavior.

Research. We looked at the stick-pulling problem, in which multiple robots in an arena work on a task that cannot be done without collaboration. Besides experimenting with real robots, we also used a probabilistic model in simulation. The probabilistic model describes the experiment as a series of stochastic events whose probabilities are based on simple geometrical considerations and on systematic experiments with one or two real robots; it is fast to simulate. The learning task is to find the optimal parameter (gripping time) for each robot so as to maximize the group performance.
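As a rough illustration of what a probabilistic model of this kind might look like, the following toy simulation treats every time step as a set of independent random events. It is a sketch under stated assumptions, not the model used in this work: the function name simulate, the encounter probability p_find, the number of sticks, and the rule that a robot releases a stick once its gripping time has expired are all hypothetical choices made for illustration.

```python
import random

def simulate(grip_times, n_sticks=4, p_find=0.02, steps=20000, seed=0):
    """Toy stick-pulling model (all parameter values are illustrative assumptions).

    grip_times[i] is how many time steps robot i keeps holding a stick
    while waiting for a second robot. Returns successful collaborations
    per time step.
    """
    rng = random.Random(seed)
    n_robots = len(grip_times)
    holding = [None] * n_robots   # stick held by each robot, or None if searching
    timer = [0] * n_robots        # steps spent holding the current stick
    holder = [None] * n_sticks    # robot holding each stick, or None if free
    pulled = 0

    for _ in range(steps):
        for i in range(n_robots):
            if holding[i] is not None:
                # Robot is waiting at a stick; give up once the gripping time expires.
                timer[i] += 1
                if timer[i] > grip_times[i]:
                    holder[holding[i]] = None
                    holding[i] = None
                continue
            # Searching robot: stochastic event "encounters some stick".
            if rng.random() < p_find * n_sticks:
                s = rng.randrange(n_sticks)
                if holder[s] is None:
                    # Free stick: grip it and start waiting.
                    holder[s] = i
                    holding[i] = s
                    timer[i] = 0
                else:
                    # Stick already held by another robot: collaboration succeeds.
                    partner = holder[s]
                    holder[s] = None
                    holding[partner] = None
                    pulled += 1
    return pulled / steps
```

Passing a list of identical gripping times models a homogeneous team, while a list of different values models a heterogeneous one.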

Depending on the means of communication, robots can use public or private knowledge; depending on the feedback from the environment, learning can be conducted with group reinforcement or individual reinforcement; and depending on whether all robots are forced to be the same, the parameter settings can be homogeneous or heterogeneous. We try to analyze and design learning algorithms for the different combinations.
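The three choices above span eight combinations. One possible way to make that design space explicit is sketched below; the class and function names are hypothetical and serve only to enumerate the settings named in the text.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product

class Knowledge(Enum):
    PUBLIC = "public"     # robots communicate what they know
    PRIVATE = "private"   # no communication

class Reinforcement(Enum):
    GROUP = "group"            # feedback reflects the whole group
    INDIVIDUAL = "individual"  # feedback reflects one robot only

class Parameters(Enum):
    HOMOGENEOUS = "homogeneous"      # all robots forced to be the same
    HETEROGENEOUS = "heterogeneous"  # robots may differ

@dataclass(frozen=True)
class LearningSetting:
    knowledge: Knowledge
    reinforcement: Reinforcement
    parameters: Parameters

def all_settings():
    """Return every combination of the three binary choices (2 x 2 x 2)."""
    return [LearningSetting(k, r, p)
            for k, r, p in product(Knowledge, Reinforcement, Parameters)]
```

The case reported under Achievements corresponds to the settings with private knowledge and individual reinforcement.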

Achievements.
We investigated the case without communication (private knowledge). The learning was conducted under individual reinforcement. Several methods, such as adaptive line search and Q-learning, were used to find the optimal gripping time. Figure 1 plots performance versus initial gripping time. The solid curves, which lie above the dashed ones, show that performance does increase with learning.
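To give a flavor of how a single robot could learn its gripping time from its own reward alone, here is a minimal sketch of single-state Q-learning, which with no state transitions reduces to an epsilon-greedy bandit update. It is not the algorithm used in this work. The class GripTimeLearner, the candidate gripping times, the learning rate, the exploration rate, and the stand-in function toy_reward with its made-up success probabilities are all assumptions for illustration; in a real setup the reward would come from the robot's own experience in the arena or in the probabilistic model.

```python
import random

class GripTimeLearner:
    """Single-state Q-learning over a discrete set of gripping times."""

    def __init__(self, grip_times, alpha=0.05, epsilon=0.1, seed=0):
        self.actions = list(grip_times)
        self.q = [0.0] * len(self.actions)  # one value estimate per gripping time
        self.alpha = alpha                  # learning rate (assumed value)
        self.epsilon = epsilon              # exploration rate (assumed value)
        self.rng = random.Random(seed)

    def _greedy(self):
        return max(range(len(self.actions)), key=self.q.__getitem__)

    def choose(self):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.actions))
        return self._greedy()

    def update(self, action, reward):
        # Q-learning update with a single state and no bootstrapped term.
        self.q[action] += self.alpha * (reward - self.q[action])

    def best(self):
        return self.actions[self._greedy()]

def toy_reward(grip_time, rng):
    # Hypothetical individual reinforcement: 1 if a collaboration succeeded.
    # The success probabilities below are invented for this demo.
    success_prob = {10: 0.1, 30: 0.9, 90: 0.3}[grip_time]
    return 1.0 if rng.random() < success_prob else 0.0

def train(steps=5000, seed=0):
    rng = random.Random(seed)
    learner = GripTimeLearner([10, 30, 90], seed=seed + 1)
    for _ in range(steps):
        a = learner.choose()
        learner.update(a, toy_reward(learner.actions[a], rng))
    return learner
```

With private knowledge, each robot would hold its own learner instance and never see the value estimates of the others.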

Figure 1. The performance (collaboration rate) with learning. Different colors represent experiments with different numbers of robots. Robots were initially given a gripping time. With learning, they adjusted their gripping time and achieved a higher performance. Error bars are standard deviations of performance over 50 runs. Dashed curves show performance without learning.

The results also showed that after learning the robots usually became specialized. This is quite interesting, since we never incorporated a preference for specialization into the learning algorithm and there was no communication among the robots. Ijspeert et al. showed in a systematic study [1] that under certain constraints there is an advantage in being specialized.

Figure 2. In one simulation, four robots started with an initial gripping time of 210 s and had become specialized by the end of the simulation.

References
[1] Collaboration through the exploitation of local interactions in autonomous collective robotics: the stick pulling experiment. A. J. Ijspeert, A. Martinoli, A. Billard, and L. M. Gambardella. Autonomous Robots, 11(2):149-171, 2001.

[2] A Macroscopic Analytical Model of Collaboration in Distributed Robotic Systems. K. Lerman, A. Galstyan, A. Martinoli, and A. J. Ijspeert. Artificial Life. MIT Press. To appear.


