From da96ae7c1eff8d12adad8f1384751ac012ee1454 Mon Sep 17 00:00:00 2001
From: Oleg Samarin
Date: Fri, 2 Oct 2020 15:06:43 +0300
Subject: [PATCH 1/6] Unable to restart foundationdb after fdbmonitor has died #3838

---
 packaging/rpm/foundationdb.service | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/packaging/rpm/foundationdb.service b/packaging/rpm/foundationdb.service
index 511436360a..c01a117a07 100755
--- a/packaging/rpm/foundationdb.service
+++ b/packaging/rpm/foundationdb.service
@@ -7,7 +7,9 @@ Wants=network-online.target
 Type=forking
 PIDFile=/var/run/fdbmonitor.pid
 ExecStart=/usr/lib/foundationdb/fdbmonitor --conffile /etc/foundationdb/foundationdb.conf --lockfile /var/run/fdbmonitor.pid --daemonize
-KillMode=process
+KillMode=mixed
+Restart=on-failure
+RestartSec=5s
 
 [Install]
 WantedBy=multi-user.target

From 91f9c7a3fe8d29f912a6a00d1eac7bb0c5041598 Mon Sep 17 00:00:00 2001
From: Oleg Samarin
Date: Thu, 8 Oct 2020 13:02:16 +0300
Subject: [PATCH 2/6] Changed defaults for RestartSec

---
 packaging/rpm/foundationdb.service | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/packaging/rpm/foundationdb.service b/packaging/rpm/foundationdb.service
index c01a117a07..6f5261c670 100755
--- a/packaging/rpm/foundationdb.service
+++ b/packaging/rpm/foundationdb.service
@@ -9,7 +9,7 @@ PIDFile=/var/run/fdbmonitor.pid
 ExecStart=/usr/lib/foundationdb/fdbmonitor --conffile /etc/foundationdb/foundationdb.conf --lockfile /var/run/fdbmonitor.pid --daemonize
 KillMode=mixed
 Restart=on-failure
-RestartSec=5s
+RestartSec=60s
 
 [Install]
 WantedBy=multi-user.target

From 43f538ff8b92cf5908c9afa5236c489e696a6e50 Mon Sep 17 00:00:00 2001
From: Oleg Samarin
Date: Fri, 9 Oct 2020 10:46:55 +0300
Subject: [PATCH 3/6] Documented fdb processes restart behavior

---
 .../sphinx/source/administration.rst | 9 ++++++--
 documentation/sphinx/source/configuration.rst | 21 +++++++++++++++++++
 2 files changed, 28 insertions(+), 2 deletions(-)

diff --git a/documentation/sphinx/source/administration.rst b/documentation/sphinx/source/administration.rst
index 297e70046f..8aacc2d302 100644
--- a/documentation/sphinx/source/administration.rst
+++ b/documentation/sphinx/source/administration.rst
@@ -28,8 +28,6 @@ Starting and stopping
 
 After installation, FoundationDB is set to start automatically. You can manually start and stop the database with the commands shown below.
 
-These commands start and stop the master ``fdbmonitor`` process, which in turn starts ``fdbserver`` and ``backup-agent`` processes. See :ref:`administration_fdbmonitor` for details.
-
 Linux
 -----
 
@@ -58,6 +56,13 @@ It can be stopped and prevented from starting at boot as follows::
 
     host:~ user$ sudo launchctl unload -w /Library/LaunchDaemons/com.foundationdb.fdbmonitor.plist
 
+Start, stop and restart behaviour
+=================================
+
+These commands above start and stop the master ``fdbmonitor`` process, which in turn starts ``fdbserver`` and ``backup-agent`` processes. See :ref:`administration_fdbmonitor` for details.
+After any of child processes has terminated by any reason, ``fdbmonitor`` tries to restart it. See :ref:`restarting parameters `
+When ``fdbmonitor`` itself is killed unexpectedly (for example, by the ``out-of-memory killer``), all the child processes are also terminated. Then the operating system is responsible for restarting it. See :ref:`Configuring autorestart of fdbmonitor `
+
 .. _foundationdb-cluster-file:
 
 Cluster files
diff --git a/documentation/sphinx/source/configuration.rst b/documentation/sphinx/source/configuration.rst
index 14ada3a126..16d0f25ac9 100644
--- a/documentation/sphinx/source/configuration.rst
+++ b/documentation/sphinx/source/configuration.rst
@@ -229,6 +229,8 @@ Contains settings applicable to all processes (e.g. fdbserver, backup_agent).
 * ``kill_on_configuration_change``: If ``true``, affected processes will be restarted whenever the configuration file changes. Defaults to ``true``.
 * ``disable_lifecycle_logging``: If ``true``, ``fdbmonitor`` will not write log events when processes start or terminate. Defaults to ``false``.
 
+.. _configuration-restarting
+
 The ``[general]`` section also contains some parameters to control how processes are restarted when they die. ``fdbmonitor`` uses backoff logic to prevent a process that dies repeatedly from cycling too quickly, and it also introduces up to +/-10% random jitter into the delay to avoid multiple processes all restarting simultaneously. ``fdbmonitor`` tracks separate backoff state for each process, so the restarting of one process will have no effect on the backoff behavior of another.
 
 * ``restart_delay``: The maximum number of seconds (subject to jitter) that fdbmonitor will delay before restarting a failed process.
@@ -236,6 +238,8 @@ The ``[general]`` section also contains some parameters to control how processes
 * ``restart_backoff``: Controls how quickly ``fdbmonitor`` backs off when a process dies repeatedly. The previous delay (or 1, if the previous delay is 0) is multiplied by ``restart_backoff`` to get the next delay, maxing out at the value of ``restart_delay``. Defaults to the value of ``restart_delay``, meaning that the second and subsequent failures will all delay ``restart_delay`` between restarts.
 * ``restart_delay_reset_interval``: The number of seconds a process must be running before resetting the backoff back to the value of ``initial_restart_delay``. Defaults to the value of ``restart_delay``.
 
+ These ``restart_`` parameters are not applicable to the ``fdbmonitor`` process itself. See :ref:`Configuring autorestart of fdbmonitor ` for details
+
 As an example, let's say the following parameters have been set:
 
 .. code-block:: ini
@@ -322,6 +326,23 @@ Backup agent sections
 
 These sections run and configure the backup agent process used for :doc:`point-in-time backups ` of FoundationDB. These don't usually need to be modified. The structure and functionality is similar to the ``[fdbserver]`` and ``[fdbserver.]`` sections.
 
+.. _configuration-restart-fdbmonitor
+
+Configuring autorestart of fdbmonitor
+=====================================
+
+Configuring the restart parameters for ``fdbmonitor`` is operating system-specific.
+
+In Linux
+--------
+
+ ``systemd`` controls the ``foundationdb`` service. When ``fdbmonitor`` is killed unexpectedly, by default, systemd restarts it in 60 seconds. For addusting this value you have to create file ``/etc/systemd/system/foundationdb.service.d/override.conf`` with the overriding values. For example:
+
+.. code-block:: ini
+ [Service]
+ RestartSec=20s
+
+For disabling auto-restart of ``fdbmonitor`` you have to put ``Restart=no`` to the same section.
 
 .. _configuration-choosing-redundancy-mode:
 

From d67fc569b1c1c5858eb4383e2125d08150839b8f Mon Sep 17 00:00:00 2001
From: Oleg Samarin
Date: Wed, 14 Oct 2020 10:18:15 +0300
Subject: [PATCH 4/6] Apply suggestions from code review

Spellcheck corrections in the doc

Co-authored-by: A.J. Beamon
---
 documentation/sphinx/source/administration.rst | 8 +++++---
 documentation/sphinx/source/configuration.rst | 10 +++++-----
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/documentation/sphinx/source/administration.rst b/documentation/sphinx/source/administration.rst
index 8aacc2d302..5f6369d889 100644
--- a/documentation/sphinx/source/administration.rst
+++ b/documentation/sphinx/source/administration.rst
@@ -56,12 +56,14 @@ It can be stopped and prevented from starting at boot as follows::
 
     host:~ user$ sudo launchctl unload -w /Library/LaunchDaemons/com.foundationdb.fdbmonitor.plist
 
-Start, stop and restart behaviour
+Start, stop and restart behavior
 =================================
 
 These commands above start and stop the master ``fdbmonitor`` process, which in turn starts ``fdbserver`` and ``backup-agent`` processes. See :ref:`administration_fdbmonitor` for details.
-After any of child processes has terminated by any reason, ``fdbmonitor`` tries to restart it. See :ref:`restarting parameters `
-When ``fdbmonitor`` itself is killed unexpectedly (for example, by the ``out-of-memory killer``), all the child processes are also terminated. Then the operating system is responsible for restarting it. See :ref:`Configuring autorestart of fdbmonitor `
+
+After any child process has terminated by any reason, ``fdbmonitor`` tries to restart it. See :ref:`restarting parameters `.
+
+When ``fdbmonitor`` itself is killed unexpectedly (for example, by the ``out-of-memory killer``), all the child processes are also terminated. Then the operating system is responsible for restarting it. See :ref:`Configuring autorestart of fdbmonitor `.
 
 .. _foundationdb-cluster-file:
 
 Cluster files
diff --git a/documentation/sphinx/source/configuration.rst b/documentation/sphinx/source/configuration.rst
index 16d0f25ac9..bd86d2d15f 100644
--- a/documentation/sphinx/source/configuration.rst
+++ b/documentation/sphinx/source/configuration.rst
@@ -238,7 +238,7 @@ The ``[general]`` section also contains some parameters to control how processes
 * ``restart_backoff``: Controls how quickly ``fdbmonitor`` backs off when a process dies repeatedly. The previous delay (or 1, if the previous delay is 0) is multiplied by ``restart_backoff`` to get the next delay, maxing out at the value of ``restart_delay``. Defaults to the value of ``restart_delay``, meaning that the second and subsequent failures will all delay ``restart_delay`` between restarts.
 * ``restart_delay_reset_interval``: The number of seconds a process must be running before resetting the backoff back to the value of ``initial_restart_delay``. Defaults to the value of ``restart_delay``.
 
- These ``restart_`` parameters are not applicable to the ``fdbmonitor`` process itself. See :ref:`Configuring autorestart of fdbmonitor ` for details
+ These ``restart_`` parameters are not applicable to the ``fdbmonitor`` process itself. See :ref:`Configuring autorestart of fdbmonitor ` for details.
 
 As an example, let's say the following parameters have been set:
 
@@ -333,16 +333,16 @@ Configuring autorestart of fdbmonitor
 
 Configuring the restart parameters for ``fdbmonitor`` is operating system-specific.
 
-In Linux
---------
+Linux (RHEL/CentOS)
+-------------------
 
- ``systemd`` controls the ``foundationdb`` service. When ``fdbmonitor`` is killed unexpectedly, by default, systemd restarts it in 60 seconds. For addusting this value you have to create file ``/etc/systemd/system/foundationdb.service.d/override.conf`` with the overriding values. For example:
+ ``systemd`` controls the ``foundationdb`` service. When ``fdbmonitor`` is killed unexpectedly, by default, systemd restarts it in 60 seconds. To adjust this value you have to create a file ``/etc/systemd/system/foundationdb.service.d/override.conf`` with the overriding values. For example:
 
 .. code-block:: ini
  [Service]
  RestartSec=20s
 
-For disabling auto-restart of ``fdbmonitor`` you have to put ``Restart=no`` to the same section.
+To disable auto-restart of ``fdbmonitor``, put ``Restart=no`` in the same section.
 
 .. _configuration-choosing-redundancy-mode:
 

From bfcf921c3e26e8f249aba2bd029af4f4a91cb552 Mon Sep 17 00:00:00 2001
From: Oleg Samarin
Date: Wed, 14 Oct 2020 19:16:30 +0300
Subject: [PATCH 5/6] Apply suggestions from code review

Missing some colons in doc links

Co-authored-by: A.J. Beamon
---
 documentation/sphinx/source/configuration.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/documentation/sphinx/source/configuration.rst b/documentation/sphinx/source/configuration.rst
index bd86d2d15f..103303b892 100644
--- a/documentation/sphinx/source/configuration.rst
+++ b/documentation/sphinx/source/configuration.rst
@@ -229,7 +229,7 @@ Contains settings applicable to all processes (e.g. fdbserver, backup_agent).
 * ``kill_on_configuration_change``: If ``true``, affected processes will be restarted whenever the configuration file changes. Defaults to ``true``.
 * ``disable_lifecycle_logging``: If ``true``, ``fdbmonitor`` will not write log events when processes start or terminate. Defaults to ``false``.
 
-.. _configuration-restarting
+.. _configuration-restarting:
 
 The ``[general]`` section also contains some parameters to control how processes are restarted when they die. ``fdbmonitor`` uses backoff logic to prevent a process that dies repeatedly from cycling too quickly, and it also introduces up to +/-10% random jitter into the delay to avoid multiple processes all restarting simultaneously. ``fdbmonitor`` tracks separate backoff state for each process, so the restarting of one process will have no effect on the backoff behavior of another.
 
@@ -326,7 +326,7 @@ Backup agent sections
 
 These sections run and configure the backup agent process used for :doc:`point-in-time backups ` of FoundationDB. These don't usually need to be modified. The structure and functionality is similar to the ``[fdbserver]`` and ``[fdbserver.]`` sections.
 
-.. _configuration-restart-fdbmonitor
+.. _configuration-restart-fdbmonitor:
 
 Configuring autorestart of fdbmonitor
 =====================================

From 9adeff4a9af53a7cb150455251e56b7a8532194c Mon Sep 17 00:00:00 2001
From: Oleg Samarin
Date: Wed, 14 Oct 2020 19:39:57 +0300
Subject: [PATCH 6/6] Update documentation/sphinx/source/configuration.rst

Added a newline to doc

Co-authored-by: A.J. Beamon
---
 documentation/sphinx/source/configuration.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/documentation/sphinx/source/configuration.rst b/documentation/sphinx/source/configuration.rst
index 103303b892..dee34986c5 100644
--- a/documentation/sphinx/source/configuration.rst
+++ b/documentation/sphinx/source/configuration.rst
@@ -339,6 +339,7 @@ Linux (RHEL/CentOS)
  ``systemd`` controls the ``foundationdb`` service. When ``fdbmonitor`` is killed unexpectedly, by default, systemd restarts it in 60 seconds. To adjust this value you have to create a file ``/etc/systemd/system/foundationdb.service.d/override.conf`` with the overriding values. For example:
 
 .. code-block:: ini
+
  [Service]
  RestartSec=20s
 
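
Illustrative note on the child-process backoff that the configuration.rst context lines describe (this governs ``fdbserver`` and ``backup_agent`` restarts by ``fdbmonitor``, not the systemd restart of ``fdbmonitor`` itself). The sketch below only mirrors the documented rule: the previous delay (or 1, if it was 0) is multiplied by ``restart_backoff`` and capped at ``restart_delay``. The function name ``restart_delays``, the assumption that the first delay equals ``initial_restart_delay``, and the sample values are hypothetical; jitter and ``restart_delay_reset_interval`` are deliberately left out. It is not the fdbmonitor implementation.

```python
def restart_delays(initial_restart_delay, restart_backoff, restart_delay, failures):
    """Nominal delay (seconds, no jitter) before each of `failures` consecutive restarts.

    Hypothetical helper: a sketch of the rule quoted in the docs, not fdbmonitor code.
    """
    delays = []
    # Assumption: the first restart waits initial_restart_delay.
    delay = initial_restart_delay
    for _ in range(failures):
        delays.append(delay)
        # Previous delay (or 1, if it was 0) times restart_backoff,
        # maxing out at restart_delay.
        previous = delay if delay != 0 else 1
        delay = min(previous * restart_backoff, restart_delay)
    return delays


if __name__ == "__main__":
    # Sample values chosen for illustration only.
    print(restart_delays(initial_restart_delay=0, restart_backoff=2,
                         restart_delay=60, failures=8))
    # prints [0, 2, 4, 8, 16, 32, 60, 60]
```

Under these assumptions the delay doubles until it reaches the ``restart_delay`` cap; with ``restart_backoff`` left at its documented default (the value of ``restart_delay``), the second and later failures would all wait the full ``restart_delay``.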