The supervisor is responsible for starting, stopping and monitoring its child
processes. The basic idea of a supervisor is that it should keep its child
processes alive by restarting them when necessary.
The children of a supervisor are defined as a list of child
specifications. When the supervisor is started, the child processes are
started in order from left to right according to this list. When the supervisor
terminates, it first terminates its child processes in reversed start order,
from right to left.
A supervisor can have one of the following restart
strategies:
-
one_for_one - if one child process terminates and
should be restarted, only that child process is affected.
-
one_for_all - if one child process terminates and
should be restarted, all other child processes are terminated and then all
child processes are restarted.
-
rest_for_one - if one child process terminates
and should be restarted, the 'rest' of the child processes -- i.e. the child
processes after the terminated child process in the start order -- are
terminated. Then the terminated child process and all child processes after it
are restarted.
-
simple_one_for_one - a simplified one_for_one
supervisor, where all child processes are dynamically added instances of the
same process type, i.e. running the same code.
The functions delete_child/2 and restart_child/2
are invalid for simple_one_for_one supervisors and
will return {error,simple_one_for_one} if the
specified supervisor uses this restart strategy.
The function terminate_child/2 can be used for
children under simple_one_for_one supervisors by
giving the child's pid() as the second argument. If
the child specification identifier is used instead, terminate_child/2 will return {error,simple_one_for_one}.
Because a simple_one_for_one supervisor could
have many children, it shuts them all down at the same time, so the order in which
they are stopped is not defined. For the same reason, shutting them down could incur
an overhead with regard to the Shutdown
strategy.
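The simple_one_for_one rules above can be sketched as follows. This is only an illustration under stated assumptions: the module name sofo_demo, the child identifier worker and the function worker_start_link/1 are hypothetical, and the worker is a bare linked process rather than a real behaviour module. The child specification uses the tuple format described in this section.

```erlang
-module(sofo_demo).
-behaviour(supervisor).

-export([start_link/0, worker_start_link/1, demo/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

%% A single child specification serves as the template for every
%% dynamically added child.
init([]) ->
    ChildSpec = {worker, {?MODULE, worker_start_link, []},
                 temporary, brutal_kill, worker, [?MODULE]},
    {ok, {{simple_one_for_one, 1, 5}, [ChildSpec]}}.

%% Hypothetical worker: creates and links to a process that waits
%% for a stop message.
worker_start_link(Tag) ->
    Pid = spawn_link(fun() -> receive stop -> Tag end end),
    {ok, Pid}.

demo() ->
    {ok, Sup} = start_link(),
    %% The list [first] is appended to the argument list in StartFunc.
    {ok, Pid} = supervisor:start_child(Sup, [first]),
    %% Addressing a child by its specification identifier is rejected.
    {error, simple_one_for_one} = supervisor:terminate_child(Sup, worker),
    {error, simple_one_for_one} = supervisor:delete_child(Sup, worker),
    {error, simple_one_for_one} = supervisor:restart_child(Sup, worker),
    %% Addressing the child by its pid works.
    ok = supervisor:terminate_child(Sup, Pid),
    ok.
```

Calling sofo_demo:demo() from a shell should return ok if every pattern match above holds.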
To prevent a supervisor from getting into an infinite loop of child process
terminations and restarts, a maximum restart frequency is
defined using two integer values MaxR and MaxT.
If more than MaxR restarts occur within MaxT
seconds, the supervisor terminates all child processes and then itself.
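As a rough sketch of how a restart strategy and the MaxR and MaxT values are supplied, the callback module below returns them from init/1. The return shape {ok,{{Strategy,MaxR,MaxT},ChildSpecs}} is not spelled out in this section; it is the tuple form accepted by the supervisor behaviour. The module name restart_demo, the child identifier worker and the helper functions are hypothetical.

```erlang
-module(restart_demo).
-behaviour(supervisor).

-export([start_link/0, worker_start_link/0, child_pid/1, demo/0]).
-export([init/1]).

start_link() ->
    supervisor:start_link(?MODULE, []).

%% one_for_one with MaxR = 3 and MaxT = 10: if more than 3 restarts occur
%% within 10 seconds, the supervisor terminates its children and itself.
init([]) ->
    ChildSpec = {worker, {?MODULE, worker_start_link, []},
                 permanent, brutal_kill, worker, [?MODULE]},
    {ok, {{one_for_one, 3, 10}, [ChildSpec]}}.

%% Hypothetical worker: a linked process that waits for a stop message.
worker_start_link() ->
    {ok, spawn_link(fun() -> receive stop -> ok end end)}.

%% Returns the current pid of the single child (or undefined/restarting).
child_pid(Sup) ->
    [{worker, Child, worker, _Modules}] = supervisor:which_children(Sup),
    Child.

%% Polls until the supervisor reports a pid different from the old one.
wait_for_restart(_Sup, _Old, 0) ->
    error(timeout);
wait_for_restart(Sup, Old, Retries) ->
    case child_pid(Sup) of
        Pid when is_pid(Pid), Pid =/= Old ->
            Pid;
        _ ->
            timer:sleep(10),
            wait_for_restart(Sup, Old, Retries - 1)
    end.

demo() ->
    {ok, Sup} = start_link(),
    Old = child_pid(Sup),
    exit(Old, kill),                     % one restart, well below MaxR
    New = wait_for_restart(Sup, Old, 100),
    true = is_process_alive(New),
    ok.
```

A single kill stays within the allowed frequency, so the supervisor replaces the child and keeps running; the supervisor may also print a report about the killed child.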
This is the type definition of a child specification:
child_spec() = {Id,StartFunc,Restart,Shutdown,Type,Modules}
Id = term()
StartFunc = {M,F,A}
M = F = atom()
A = [term()]
Restart = permanent | transient | temporary
Shutdown = brutal_kill | int()>0 | infinity
Type = worker | supervisor
Modules = [Module] | dynamic
Module = atom()
-
Id is a name that is used to identify the child
specification internally by the supervisor.
-
StartFunc defines the function call used to start
the child process. It should be a module-function-arguments tuple {M,F,A}
used as apply(M,F,A).
The start function must create
and link to the child process, and
should return {ok,Child} or {ok,Child,Info}, where Child is
the pid of the child process and Info is an arbitrary
term which is ignored by the supervisor. It should be (or result in) a call to
supervisor:start_link, gen_server:start_link, gen_fsm:start_link or gen_event:start_link
(or a function compliant with these functions; see supervisor(3) for details).
The start function can also return ignore if the
child process for some reason cannot be started, in which case the child
specification will be kept by the supervisor (unless it is a temporary child)
but the non-existing child process will be ignored.
If something goes wrong, the function may also return an error tuple {error,Error}.
Note that the start_link functions of the
different behaviour modules fulfill the above requirements.
-
Restart defines when a terminated child process
should be restarted. A permanent child process
should always be restarted, a temporary child
process should never be restarted (even when the supervisor's restart strategy
is rest_for_one or one_for_all and a sibling's death causes the temporary
process to be terminated) and a transient child
process should be restarted only if it terminates abnormally, i.e. with
an exit reason other than normal, shutdown or {shutdown,Term}.
-
Shutdown defines how a child process should be
terminated. brutal_kill means the child process will
be unconditionally terminated using exit(Child,kill). An integer timeout value means that the
supervisor will tell the child process to terminate by calling exit(Child,shutdown)
and then wait for an exit signal with reason shutdown back from the child process. If no exit signal is
received within the specified number of milliseconds, the child process is
unconditionally terminated using exit(Child,kill).
If the child process is another supervisor, Shutdown should be set to infinity to give the subtree ample time to shut down. Setting it
to infinity is also allowed if the child
process is a worker.
Warning
Be careful when setting the Shutdown strategy to
infinity for a child process that is a worker.
Because, in this situation, the termination of the supervision tree depends on
the child process, it must be implemented in a safe way and its cleanup
procedure must always return.
Note that all child processes implemented using the standard OTP behaviour
modules automatically adhere to the shutdown protocol.
-
Type specifies if the child process is a
supervisor or a worker.
-
Modules is used by the release handler during
code replacement to determine which processes are using a certain module. As a
rule of thumb, Modules should be a list with one
element [Module], where Module is the callback module, if the child process is a
supervisor, gen_server or gen_fsm. If the child process is an event manager
(gen_event) with a dynamic set of callback modules, Modules should be dynamic. See
OTP Design Principles for more information about release
handling.
-
Internally, the supervisor also keeps track of the pid Child of the child process, or undefined if no pid exists.
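To make the fields above concrete, here is a small sketch of three child specifications in the {Id,StartFunc,Restart,Shutdown,Type,Modules} format. The names my_worker, my_sub_sup and my_events are hypothetical and those modules need not exist, because supervisor:check_childspecs/1 only checks that each specification is well formed and does not start anything.

```erlang
-module(spec_demo).

-export([worker_spec/0, subtree_spec/0, event_manager_spec/0, check/0]).

%% A permanent worker that is given 5000 ms to terminate after
%% exit(Child,shutdown) before it is killed unconditionally.
worker_spec() ->
    {my_worker,                        % Id
     {my_worker, start_link, []},      % StartFunc, used as apply(M,F,A)
     permanent,                        % Restart
     5000,                             % Shutdown, in milliseconds
     worker,                           % Type
     [my_worker]}.                     % Modules

%% A child that is itself a supervisor: Shutdown is infinity so the
%% subtree has ample time to shut down.
subtree_spec() ->
    {my_sub_sup,
     {my_sub_sup, start_link, []},
     permanent,
     infinity,
     supervisor,
     [my_sub_sup]}.

%% An event manager with a dynamic set of callback modules.
event_manager_spec() ->
    {my_events,
     {gen_event, start_link, [{local, my_events}]},
     permanent,
     5000,
     worker,
     dynamic}.

%% Checks the shape of all three specifications.
check() ->
    supervisor:check_childspecs([worker_spec(),
                                 subtree_spec(),
                                 event_manager_spec()]).
```

With well-formed specifications such as these, spec_demo:check() is expected to return ok; a malformed field, for example an unknown Restart value, yields an {error,Error} tuple instead.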