Distributed Learning in Swarm Systems
Ling Li, Alcherio Martinoli, Yaser Abu-Mostafa
Abstract. Distributed learning is the learning process of multiple
autonomous agents in a varying environment, where each agent may have
only partial information about the environment and other agents. We
model the system and individual agents, then use several techniques
such as reinforcement learning to find the optimal strategy for each
agent in order to maximize the group performance. Our experiments with
the stick-pulling problem showed that the agents became specialized
automatically.
Motivation and Aims. Natural systems consisting of many agents,
such as ants, wasps, and termites, appear to achieve collective
capabilities beyond those of their individual agents. Scalability,
flexibility, and robustness are three main advantages of such swarm
intelligence systems.
We would like to apply the principles behind these natural systems
to distributed problems, such as the control of a swarm of robots. We
would also like to investigate the role of individual learning capabilities
in the emerging collective behavior.
Research. We looked at the stick-pulling problem, in which multiple
robots in an arena work on a task that cannot be completed without
collaboration. Besides experimenting with real robots, we also used a
probabilistic model in simulation. The probabilistic model describes the
experiment as a series of stochastic events whose probabilities are based
on simple geometrical considerations and on systematic experiments with
one or two real robots; it also runs quickly in simulation. The learning
task here is to find the optimal parameter (gripping time) for each robot,
in order to maximize the group performance.
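As an illustration only, the Python sketch below shows the general shape such a probabilistic simulation could take. It is not the model used in this work: the function name `simulate`, the per-step encounter probability `p_find`, the number of sticks, and the rule that a robot arriving at a held stick completes the pull are all assumptions chosen for brevity, and none of the values come from real robot measurements.

```python
import random

def simulate(grip_times, n_sticks=4, p_find=0.05, steps=20000, seed=0):
    """Toy stick-pulling simulation; returns collaborations per time step.

    All parameters are hypothetical. grip_times[i] is the number of steps
    robot i keeps holding a stick while waiting for a teammate.
    """
    rng = random.Random(seed)
    n = len(grip_times)
    holding = [None] * n         # stick held by robot i, or None if searching
    timer = [0] * n              # remaining hold time of robot i
    holder = [None] * n_sticks   # robot holding stick s, or None if free
    successes = 0
    for _ in range(steps):
        for i in range(n):
            if holding[i] is not None:
                timer[i] -= 1
                if timer[i] <= 0:            # gripping time expired: release
                    holder[holding[i]] = None
                    holding[i] = None
                continue
            if rng.random() < p_find:        # searching robot meets a stick
                s = rng.randrange(n_sticks)
                if holder[s] is None:        # free stick: grip it and wait
                    if grip_times[i] > 0:
                        holder[s] = i
                        holding[i] = s
                        timer[i] = grip_times[i]
                else:                        # held stick: collaboration
                    j = holder[s]
                    holder[s] = None
                    holding[j] = None
                    successes += 1
    return successes / steps
```

Calling `simulate([50, 50, 50, 50])` gives the collaboration rate of a homogeneous group in this toy setting, while a list with different entries, such as `simulate([200, 200, 0, 0])`, describes a heterogeneous group.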
Depending on the means of communication, robots can use public or private
knowledge; depending on the feedback from the environment, learning
can be conducted with group reinforcement or individual reinforcement;
and depending on whether all robots are forced to be identical, the
parameter settings can be homogeneous or heterogeneous. We aim to analyze
and design learning algorithms for the different combinations.
Achievements. We investigated the case without communication (private
knowledge). The learning was conducted under individual reinforcement.
Several methods, such as adaptive line search and Q-learning, were used
to find the optimal gripping time. Below is a plot of performance versus
initial gripping time. The solid curves (with learning) lie above the
dashed ones (without learning), showing that performance increases with
learning.

Figure 1. The performance (collaboration rate) with learning.
Different colors represent experiments with different numbers of robots.
Robots were initially given a gripping time. With learning, they adjusted
their gripping time and achieved a higher performance. Error bars are
standard deviations of performance over 50 runs. Dashed curves show
performance without learning.
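To make the setting of private knowledge and individual reinforcement concrete, here is a minimal Python sketch. It is not the algorithm used in these experiments: it uses a stateless, epsilon-greedy Q-value update (effectively a bandit simplification of Q-learning) over a small, hypothetical set of candidate gripping times, and the toy environment `episode`, the reward definition (number of collaborations a robot took part in), and all numeric values are assumptions.

```python
import random

def episode(grip_times, rng, n_sticks=4, p_find=0.05, steps=2000):
    """Toy episode; returns how many collaborations each robot took part in."""
    n = len(grip_times)
    holding = [None] * n         # stick held by robot i, or None
    timer = [0] * n              # remaining hold time of robot i
    holder = [None] * n_sticks   # robot holding stick s, or None
    reward = [0] * n
    for _ in range(steps):
        for i in range(n):
            if holding[i] is not None:
                timer[i] -= 1
                if timer[i] <= 0:            # gripping time expired: release
                    holder[holding[i]] = None
                    holding[i] = None
                continue
            if rng.random() < p_find:        # searching robot meets a stick
                s = rng.randrange(n_sticks)
                j = holder[s]
                if j is None:
                    if grip_times[i] > 0:    # free stick: grip it and wait
                        holder[s], holding[i], timer[i] = i, s, grip_times[i]
                else:                        # held stick: both robots succeed
                    holder[s], holding[j] = None, None
                    reward[i] += 1
                    reward[j] += 1
    return reward

def learn(n_robots=4, actions=(0, 10, 50, 200), episodes=300,
          alpha=0.1, epsilon=0.1, seed=0):
    """Each robot learns its own gripping time from its own reward only."""
    rng = random.Random(seed)
    q = [[0.0] * len(actions) for _ in range(n_robots)]  # private Q-values

    def greedy(i):
        return max(range(len(actions)), key=lambda a: q[i][a])

    for _ in range(episodes):
        choice = []
        for i in range(n_robots):
            if rng.random() < epsilon:       # explore a random gripping time
                choice.append(rng.randrange(len(actions)))
            else:                            # exploit the best estimate so far
                choice.append(greedy(i))
        rewards = episode([actions[a] for a in choice], rng)
        for i, a in enumerate(choice):       # individual reinforcement update
            q[i][a] += alpha * (rewards[i] - q[i][a])
    return [actions[greedy(i)] for i in range(n_robots)]
```

`learn()` returns one preferred gripping time per robot. Because no robot reads another robot's Q-values or rewards, any difference between the returned values arises only through interaction in the shared environment; whether this toy setup reproduces the behavior reported here is not something the sketch claims.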
The results also showed that after learning the robots usually became
specialized. This is notable because we never incorporated a preference
for specialization into the learning algorithm, and there was no
communication among the robots. Ijspeert et al. showed in a systematic
study [1] that under certain constraints there is an advantage in being
specialized.

Figure 2. In one simulation, 4 robots started with an initial gripping
time of 210 s and had become specialized by the end of the simulation.
References
[1] Collaboration through the exploitation of local interactions in
autonomous collective robotics: the stick pulling experiment.
A. J. Ijspeert, A. Martinoli, A. Billard, and L. M. Gambardella.
Autonomous Robots, 11(2):149-171, 2001.
[2] A Macroscopic Analytical Model of Collaboration in Distributed
Robotic Systems. K. Lerman, A. Galstyan, A. Martinoli, and
A. J. Ijspeert. Artificial Life. MIT Press. To appear.