Merge branch 'master' of https://github.com/apple/foundationdb into ppwiggle

This commit is contained in:
Xiaoxi Wang 2021-06-23 03:34:59 +00:00
commit 3602ec80fb
120 changed files with 9578 additions and 2162 deletions

View File

@ -16,3 +16,4 @@ The following documents give detailed descriptions of the API for each language:
Go API <https://godoc.org/github.com/apple/foundationdb/bindings/go/src/fdb>
api-c
api-error-codes
special-keys

View File

@ -757,240 +757,6 @@ If you only need to detect the *fact* of a change, and your response doesn't dep
.. _developer-guide-peformance-considerations:
Special keys
============
Keys starting with the bytes ``\xff\xff`` are called "special" keys, and they are materialized when read. :doc:`\\xff\\xff/status/json <mr-status>` is an example of a special key.
As of api version 630, additional features have been exposed as special keys and are available to read as ranges instead of just individual keys. Additionally, the special keys are now organized into "modules".
Read-only modules
-----------------
A module is loosely defined as a key range in the special key space where a user can expect similar behavior from reading any key in that range.
By default, users will see a ``special_keys_no_module_found`` error if they read from a range not contained in a module.
The error indicates the read would always return an empty set of keys if it proceeded. This could be caused by typo in the keys to read.
Users will also (by default) see a ``special_keys_cross_module_read`` error if their read spans a module boundary.
The error is to save the user from the surprise of seeing the behavior of multiple modules in the same read.
Users may opt out of these restrictions by setting the ``special_key_space_relaxed`` transaction option.
Each special key that existed before api version 630 is its own module. These are
#. ``\xff\xff/cluster_file_path`` See :ref:`cluster file client access <cluster-file-client-access>`
#. ``\xff\xff/cluster_file_path`` See :ref:`cluster file client access <cluster-file-client-access>`
#. ``\xff\xff/status/json`` See :doc:`Machine-readable status <mr-status>`
Prior to api version 630, it was also possible to read a range starting at
``\xff\xff/worker_interfaces``. This is mostly an implementation detail of fdbcli,
but it's available in api version 630 as a module with prefix ``\xff\xff/worker_interfaces/``.
Api version 630 includes two new modules with prefixes
``\xff\xff/transaction/`` (information about the current transaction), and
``\xff\xff/metrics/`` (various metrics, not transactional).
Transaction module
------------------
Reads from the transaction module generally do not require an rpc and only inspect in-memory state for the current transaction.
There are three sets of keys exposed by the transaction module, and each set uses the same encoding, so let's first describe that encoding.
Let's say we have a set of keys represented as intervals of the form ``begin1 <= k < end1 && begin2 <= k < end2 && ...``.
It could be the case that some of the intervals overlap, e.g. if ``begin1 <= begin2 < end1``, or are adjacent, e.g. if ``end1 == begin2``.
If we merge all overlapping/adjacent intervals then sort, we end up with a canonical representation of this set of keys.
We encode this canonical set as ordered key value pairs like this::
<namespace><begin1> -> "1"
<namespace><end1> -> "0"
<namespace><begin2> -> "1"
<namespace><end2> -> "0"
...
Python example::
>>> tr = db.create_transaction()
>>> tr.add_read_conflict_key('foo')
>>> tr.add_read_conflict_range('bar/', 'bar0')
>>> for k, v in tr.get_range_startswith('\xff\xff/transaction/read_conflict_range/'):
... print(k, v)
...
('\xff\xff/transaction/read_conflict_range/bar/', '1')
('\xff\xff/transaction/read_conflict_range/bar0', '0')
('\xff\xff/transaction/read_conflict_range/foo', '1')
('\xff\xff/transaction/read_conflict_range/foo\x00', '0')
For read-your-writes transactions, this canonical encoding of conflict ranges
is already available in memory, and so requesting small ranges is
correspondingly cheaper than large ranges.
For transactions with read-your-writes disabled, this canonical encoding is computed on
every read, so you're paying the full cost in CPU time whether or not you
request a small range.
The namespaces for sets of keys are
#. ``\xff\xff/transaction/read_conflict_range/`` This is the set of keys that will be used for read conflict detection. If another transaction writes to any of these keys after this transaction's read version, then this transaction won't commit.
#. ``\xff\xff/transaction/write_conflict_range/`` This is the set of keys that will be used for write conflict detection. Keys in this range may cause other transactions which read these keys to abort if this transaction commits.
#. ``\xff\xff/transaction/conflicting_keys/`` If this transaction failed due to a conflict, it must be the case that some transaction attempted [#conflicting_keys]_ to commit with a write conflict range that intersects this transaction's read conflict range. This is the subset of your read conflict range that actually intersected a write conflict from another transaction.
Caveats
~~~~~~~
#. ``\xff\xff/transaction/read_conflict_range/`` The conflict range for a read is sometimes not known until that read completes (e.g. range reads with limits, key selectors). When you read from these special keys, the returned future first blocks until all pending reads are complete so it can give an accurate response.
#. ``\xff\xff/transaction/write_conflict_range/`` The conflict range range for a ``set_versionstamped_key`` atomic op is not known until commit time. You'll get an approximate range (the actual range will be a subset of the approximate range) until the precise range is known.
#. ``\xff\xff/transaction/conflicting_keys/`` Since using this feature costs server (i.e., commit proxy and resolver) resources, it's disabled by default. You must opt in by setting the ``report_conflicting_keys`` transaction option.
Metrics module
--------------
Reads in the metrics module are not transactional and may require rpcs to complete.
``\xff\xff/metrics/data_distribution_stats/<begin>`` represent stats about the shard that begins at ``<begin>``
>>> for k, v in db.get_range_startswith('\xff\xff/metrics/data_distribution_stats/', limit=3):
... print(k, v)
...
('\xff\xff/metrics/data_distribution_stats/', '{"shard_bytes":3828000}')
('\xff\xff/metrics/data_distribution_stats/mako00079', '{"shard_bytes":2013000}')
('\xff\xff/metrics/data_distribution_stats/mako00126', '{"shard_bytes":3201000}')
========================= ======== ===============
**Field** **Type** **Description**
------------------------- -------- ---------------
shard_bytes number An estimate of the sum of kv sizes for this shard.
========================= ======== ===============
Keys starting with ``\xff\xff/metrics/health/`` represent stats about the health of the cluster, suitable for application-level throttling.
Some of this information is also available in ``\xff\xff/status/json``, but these keys are significantly cheaper (in terms of server resources) to read.
>>> for k, v in db.get_range_startswith('\xff\xff/metrics/health/'):
... print(k, v)
...
('\xff\xff/metrics/health/aggregate', '{"batch_limited":false,"limiting_storage_durability_lag":5000000,"limiting_storage_queue":1000,"tps_limit":483988.66315011407,"worst_storage_durability_lag":5000001,"worst_storage_queue":2036,"worst_log_queue":300}')
('\xff\xff/metrics/health/log/e639a9ad0373367784cc550c615c469b', '{"log_queue":300}')
('\xff\xff/metrics/health/storage/ab2ce4caf743c9c1ae57063629c6678a', '{"cpu_usage":2.398696781487125,"disk_usage":0.059995917598039405,"storage_durability_lag":5000001,"storage_queue":2036}')
``\xff\xff/metrics/health/aggregate``
Aggregate stats about cluster health. Reading this key alone is slightly cheaper than reading any of the per-process keys.
=================================== ======== ===============
**Field** **Type** **Description**
----------------------------------- -------- ---------------
batch_limited boolean Whether or not the cluster is limiting batch priority transactions
limiting_storage_durability_lag number storage_durability_lag that ratekeeper is using to determing throttling (see the description for storage_durability_lag)
limiting_storage_queue number storage_queue that ratekeeper is using to determing throttling (see the description for storage_queue)
tps_limit number The rate at which normal priority transactions are allowed to start
worst_storage_durability_lag number See the description for storage_durability_lag
worst_storage_queue number See the description for storage_queue
worst_log_queue number See the description for log_queue
=================================== ======== ===============
``\xff\xff/metrics/health/log/<id>``
Stats about the health of a particular transaction log process
========================= ======== ===============
**Field** **Type** **Description**
------------------------- -------- ---------------
log_queue number The number of bytes of mutations that need to be stored in memory on this transaction log process
========================= ======== ===============
``\xff\xff/metrics/health/storage/<id>``
Stats about the health of a particular storage process
========================== ======== ===============
**Field** **Type** **Description**
-------------------------- -------- ---------------
cpu_usage number The cpu percentage used by this storage process
disk_usage number The disk IO percentage used by this storage process
storage_durability_lag number The difference between the newest version and the durable version on this storage process. On a lightly loaded cluster this will stay just above 5000000 [#max_read_transaction_life_versions]_.
storage_queue number The number of bytes of mutations that need to be stored in memory on this storage process
========================== ======== ===============
Caveats
~~~~~~~
#. ``\xff\xff/metrics/health/`` These keys may return data that's several seconds old, and the data may not be available for a brief period during recovery. This will be indicated by the keys being absent.
Read/write modules
------------------
As of api version 700, some modules in the special key space allow writes as
well as reads. In these modules, a user can expect that mutations (i.e. sets,
clears, etc) do not have side-effects outside of the current transaction
until commit is called (the same is true for writes to the normal key space).
A user can also expect the effects on commit to be atomic. Reads to
special keys may require reading system keys (whose format is an implementation
detail), and for those reads appropriate read conflict ranges are added on
the underlying system keys.
Writes to read/write modules in the special key space are disabled by
default. Use the ``special_key_space_enable_writes`` transaction option to
enable them [#special_key_space_enable_writes]_.
.. _special-key-space-management-module:
Management module
~~~~~~~~~~~~~~~~~
The management module is for temporary cluster configuration changes. For
example, in order to safely remove a process from the cluster, one can add an
exclusion to the ``\xff\xff/management/excluded/`` key prefix that matches
that process, and wait for necessary data to be moved away.
#. ``\xff\xff/management/excluded/<exclusion>`` Read/write. Indicates that the cluster should move data away from processes matching ``<exclusion>``, so that they can be safely removed. See :ref:`removing machines from a cluster <removing-machines-from-a-cluster>` for documentation for the corresponding fdbcli command.
#. ``\xff\xff/management/failed/<exclusion>`` Read/write. Indicates that the cluster should consider matching processes as permanently failed. This allows the cluster to avoid maintaining extra state and doing extra work in the hope that these processes come back. See :ref:`removing machines from a cluster <removing-machines-from-a-cluster>` for documentation for the corresponding fdbcli command.
#. ``\xff\xff/management/in_progress_exclusion/<address>`` Read-only. Indicates that the process matching ``<address>`` matches an exclusion, but still has necessary data and can't yet be safely removed.
#. ``\xff\xff/management/options/excluded/force`` Read/write. Setting this key disables safety checks for writes to ``\xff\xff/management/excluded/<exclusion>``. Setting this key only has an effect in the current transaction and is not persisted on commit.
#. ``\xff\xff/management/options/failed/force`` Read/write. Setting this key disables safety checks for writes to ``\xff\xff/management/failed/<exclusion>``. Setting this key only has an effect in the current transaction and is not persisted on commit.
#. ``\xff\xff/management/min_required_commit_version`` Read/write. Changing this key will change the corresponding system key ``\xff/minRequiredCommitVersion = [[Version]]``. The value of this special key is the literal text of the underlying ``Version``, which is ``int64_t``. If you set the key with a value failed to be parsed as ``int64_t``, ``special_keys_api_failure`` will be thrown. In addition, the given ``Version`` should be larger than the current read version and smaller than the upper bound(``2**63-1-version_per_second*3600*24*365*1000``). Otherwise, ``special_keys_api_failure`` is thrown. For more details, see help text of ``fdbcli`` command ``advanceversion``.
#. ``\xff\xff/management/profiling/<client_txn_sample_rate|client_txn_size_limit>`` Read/write. Changing these two keys will change the corresponding system keys ``\xff\x02/fdbClientInfo/<client_txn_sample_rate|client_txn_size_limit>``, respectively. The value of ``\xff\xff/management/client_txn_sample_rate`` is a literal text of ``double``, and the value of ``\xff\xff/management/client_txn_size_limit`` is a literal text of ``int64_t``. A special value ``default`` can be set to or read from these two keys, representing the client profiling is disabled. In addition, ``clear`` in this range is not allowed. For more details, see help text of ``fdbcli`` command ``profile client``.
#. ``\xff\xff/management/maintenance/<zone_id> := <seconds>`` Read/write. Set/clear a key in this range will change the corresponding system key ``\xff\x02/healthyZone``. The value is a literal text of a non-negative ``double`` which represents the remaining time for the zone to be in maintenance. Commiting with an invalid value will throw ``special_keys_api_failure``. Only one zone is allowed to be in maintenance at the same time. Setting a new key in the range will override the old one and the transaction will throw ``special_keys_api_failure`` error if more than one zone is given. For more details, see help text of ``fdbcli`` command ``maintenance``.
In addition, a special key ``\xff\xff/management/maintenance/IgnoreSSFailures`` in the range, if set, will disable datadistribution for storage server failures.
It is doing the same thing as the fdbcli command ``datadistribution disable ssfailure``.
Maintenance mode will be unable to use until the key is cleared, which is the same as the fdbcli command ``datadistribution enable ssfailure``.
While the key is set, any commit that tries to set a key in the range will fail with the ``special_keys_api_failure`` error.
#. ``\xff\xff/management/data_distribution/<mode|rebalance_ignored>`` Read/write. Changing these two keys will change the two corresponding system keys ``\xff/dataDistributionMode`` and ``\xff\x02/rebalanceDDIgnored``. The value of ``\xff\xff/management/data_distribution/mode`` is a literal text of ``0`` (disable) or ``1`` (enable). Transactions committed with invalid values will throw ``special_keys_api_failure`` . The value of ``\xff\xff/management/data_distribution/rebalance_ignored`` is empty. If present, it means data distribution is disabled for rebalance. Any transaction committed with non-empty value for this key will throw ``special_keys_api_failure``. For more details, see help text of ``fdbcli`` command ``datadistribution``.
#. ``\xff\xff/management/consistency_check_suspended`` Read/write. Set or read this key will set or read the underlying system key ``\xff\x02/ConsistencyCheck/Suspend``. The value of this special key is unused thus if present, will be empty. In particular, if the key exists, then consistency is suspended. For more details, see help text of ``fdbcli`` command ``consistencycheck``.
#. ``\xff\xff/management/db_locked`` Read/write. A single key that can be read and modified. Set the key will lock the database and clear the key will unlock. If the database is already locked, then the commit will fail with the ``special_keys_api_failure`` error. For more details, see help text of ``fdbcli`` command ``lock`` and ``unlock``.
#. ``\xff\xff/management/auto_coordinators`` Read-only. A single key, if read, will return a set of processes which is able to satisfy the current redundency level and serve as new coordinators. The return value is formatted as a comma delimited string of network addresses of coordinators, i.e. ``<ip:port>,<ip:port>,...,<ip:port>``.
An exclusion is syntactically either an ip address (e.g. ``127.0.0.1``), or
an ip address and port (e.g. ``127.0.0.1:4500``). If no port is specified,
then all processes on that host match the exclusion.
Configuration module
~~~~~~~~~~~~~~~~~~~~
The configuration module is for changing the cluster configuration.
For example, you can change a process type or update coordinators by manipulating related special keys through transactions.
#. ``\xff\xff/configuration/process/class_type/<address> := <class_type>`` Read/write. Reading keys in the range will retrieve processes' class types. Setting keys in the range will update processes' class types. The process matching ``<address>`` will be assigned to the given class type if the commit is successful. The valid class types are ``storage``, ``transaction``, ``resolution``, etc. A full list of class type can be found via ``fdbcli`` command ``help setclass``. Clearing keys is forbidden in the range. Instead, you can set the type as ``default``, which will clear the assigned class type if existing. For more details, see help text of ``fdbcli`` command ``setclass``.
#. ``\xff\xff/configuration/process/class_source/<address> := <class_source>`` Read-only. Reading keys in the range will retrieve processes' class source. The class source is one of ``command_line``, ``configure_auto``, ``set_class`` and ``invalid``, indicating the source that the process's class type comes from.
#. ``\xff\xff/configuration/coordinators/processes := <ip:port>,<ip:port>,...,<ip:port>`` Read/write. A single key, if read, will return a comma delimited string of coordinators' network addresses. Thus to provide a new set of cooridinators, set the key with a correct formatted string of new coordinators' network addresses. As there's always the need to have coordinators, clear on the key is forbidden and a transaction will fail with the ``special_keys_api_failure`` error if the clear is committed. For more details, see help text of ``fdbcli`` command ``coordinators``.
#. ``\xff\xff/configuration/coordinators/cluster_description := <new_description>`` Read/write. A single key, if read, will return the cluster description. Thus modifying the key will update the cluster decription. The new description needs to match ``[A-Za-z0-9_]+``, otherwise, the ``special_keys_api_failure`` error will be thrown. In addition, clear on the key is meaningless thus forbidden. For more details, see help text of ``fdbcli`` command ``coordinators``.
The ``<address>`` here is the network address of the corresponding process. Thus the general form is ``ip:port``.
Error message module
~~~~~~~~~~~~~~~~~~~~
Each module written to validates the transaction before committing, and this
validation failing is indicated by a ``special_keys_api_failure`` error.
More detailed information about why this validation failed can be accessed through the ``\xff\xff/error_message`` key, whose value is a json document with the following schema.
========================== ======== ===============
**Field** **Type** **Description**
-------------------------- -------- ---------------
retriable boolean Whether or not this operation might succeed if retried
command string The fdbcli command corresponding to this operation
message string Help text explaining the reason this operation failed
========================== ======== ===============
Performance considerations
==========================
@ -1189,7 +955,3 @@ The trickiest errors are non-retryable errors. ``Transaction.on_error`` will ret
If you see one of those errors, the best way of action is to fail the client.
At a first glance this looks very similar to an ``commit_unknown_result``. However, these errors lack the one guarantee ``commit_unknown_result`` still gives to the user: if the commit has already been sent to the database, the transaction could get committed at a later point in time. This means that if you retry the transaction, your new transaction might race with the old transaction. While this technically doesn't violate any consistency guarantees, abandoning a transaction means that there are no causality guaranatees.
.. [#conflicting_keys] In practice, the transaction probably committed successfully. However, if you're running multiple resolvers then it's possible for a transaction to cause another to abort even if it doesn't commit successfully.
.. [#max_read_transaction_life_versions] The number 5000000 comes from the server knob MAX_READ_TRANSACTION_LIFE_VERSIONS
.. [#special_key_space_enable_writes] Enabling this option enables other transaction options, such as ``ACCESS_SYSTEM_KEYS``. This may change in the future.

View File

@ -24,13 +24,15 @@ Performance
Reliability
-----------
* Improved worker recruitment logic to avoid unnecessary recoveries when processes are added or removed from a cluster. `(PR #4695) <https://github.com/apple/foundationdb/pull/4695>`_ `(PR #4631) <https://github.com/apple/foundationdb/pull/4631>`_ `(PR #4509) <https://github.com/apple/foundationdb/pull/4509>`_
* Log class processes are prioritized above transaction class proceses for becoming tlogs. `(PR #4509) <https://github.com/apple/foundationdb/pull/4509>`_
Fixes
-----
* Fixed a rare crash on the cluster controller when using multi-region configurations. `(PR #4547) <https://github.com/apple/foundationdb/pull/4547>`_
* Using the ``exclude failed`` command could leave the data distributor in a state where it cannot complete relocations. `(PR #4495) <https://github.com/apple/foundationdb/pull/4495>`_
* Fixed a rare crash that could happen on the sequencer during recovery. `(PR #4548) <https://github.com/apple/foundationdb/pull/4548>`_
* When configured with ``usable_regions=2``, a cluster would not fail over to a region which contained only storage class processes. `(PR #4599) <https://github.com/apple/foundationdb/pull/4599>`_
Status
------

View File

@ -0,0 +1,239 @@
.. _special-keys:
============
Special Keys
============
Keys starting with the bytes ``\xff\xff`` are called "special" keys, and they are materialized when read. :doc:`\\xff\\xff/status/json <mr-status>` is an example of a special key.
As of api version 630, additional features have been exposed as special keys and are available to read as ranges instead of just individual keys. Additionally, the special keys are now organized into "modules".
Read-only modules
=================
A module is loosely defined as a key range in the special key space where a user can expect similar behavior from reading any key in that range.
By default, users will see a ``special_keys_no_module_found`` error if they read from a range not contained in a module.
The error indicates the read would always return an empty set of keys if it proceeded. This could be caused by typo in the keys to read.
Users will also (by default) see a ``special_keys_cross_module_read`` error if their read spans a module boundary.
The error is to save the user from the surprise of seeing the behavior of multiple modules in the same read.
Users may opt out of these restrictions by setting the ``special_key_space_relaxed`` transaction option.
Each special key that existed before api version 630 is its own module. These are
#. ``\xff\xff/cluster_file_path`` See :ref:`cluster file client access <cluster-file-client-access>`
#. ``\xff\xff/status/json`` See :doc:`Machine-readable status <mr-status>`
Prior to api version 630, it was also possible to read a range starting at
``\xff\xff/worker_interfaces``. This is mostly an implementation detail of fdbcli,
but it's available in api version 630 as a module with prefix ``\xff\xff/worker_interfaces/``.
Api version 630 includes two new modules with prefixes
``\xff\xff/transaction/`` (information about the current transaction), and
``\xff\xff/metrics/`` (various metrics, not transactional).
Transaction module
------------------
Reads from the transaction module generally do not require an rpc and only inspect in-memory state for the current transaction.
There are three sets of keys exposed by the transaction module, and each set uses the same encoding, so let's first describe that encoding.
Let's say we have a set of keys represented as intervals of the form ``begin1 <= k < end1 && begin2 <= k < end2 && ...``.
It could be the case that some of the intervals overlap, e.g. if ``begin1 <= begin2 < end1``, or are adjacent, e.g. if ``end1 == begin2``.
If we merge all overlapping/adjacent intervals then sort, we end up with a canonical representation of this set of keys.
We encode this canonical set as ordered key value pairs like this::
<namespace><begin1> -> "1"
<namespace><end1> -> "0"
<namespace><begin2> -> "1"
<namespace><end2> -> "0"
...
Python example::
>>> tr = db.create_transaction()
>>> tr.add_read_conflict_key('foo')
>>> tr.add_read_conflict_range('bar/', 'bar0')
>>> for k, v in tr.get_range_startswith('\xff\xff/transaction/read_conflict_range/'):
... print(k, v)
...
('\xff\xff/transaction/read_conflict_range/bar/', '1')
('\xff\xff/transaction/read_conflict_range/bar0', '0')
('\xff\xff/transaction/read_conflict_range/foo', '1')
('\xff\xff/transaction/read_conflict_range/foo\x00', '0')
For read-your-writes transactions, this canonical encoding of conflict ranges
is already available in memory, and so requesting small ranges is
correspondingly cheaper than large ranges.
For transactions with read-your-writes disabled, this canonical encoding is computed on
every read, so you're paying the full cost in CPU time whether or not you
request a small range.
The namespaces for sets of keys are
#. ``\xff\xff/transaction/read_conflict_range/`` This is the set of keys that will be used for read conflict detection. If another transaction writes to any of these keys after this transaction's read version, then this transaction won't commit.
#. ``\xff\xff/transaction/write_conflict_range/`` This is the set of keys that will be used for write conflict detection. Keys in this range may cause other transactions which read these keys to abort if this transaction commits.
#. ``\xff\xff/transaction/conflicting_keys/`` If this transaction failed due to a conflict, it must be the case that some transaction attempted [#conflicting_keys]_ to commit with a write conflict range that intersects this transaction's read conflict range. This is the subset of your read conflict range that actually intersected a write conflict from another transaction.
Caveats
~~~~~~~
#. ``\xff\xff/transaction/read_conflict_range/`` The conflict range for a read is sometimes not known until that read completes (e.g. range reads with limits, key selectors). When you read from these special keys, the returned future first blocks until all pending reads are complete so it can give an accurate response.
#. ``\xff\xff/transaction/write_conflict_range/`` The conflict range range for a ``set_versionstamped_key`` atomic op is not known until commit time. You'll get an approximate range (the actual range will be a subset of the approximate range) until the precise range is known.
#. ``\xff\xff/transaction/conflicting_keys/`` Since using this feature costs server (i.e., commit proxy and resolver) resources, it's disabled by default. You must opt in by setting the ``report_conflicting_keys`` transaction option.
Metrics module
--------------
Reads in the metrics module are not transactional and may require rpcs to complete.
``\xff\xff/metrics/data_distribution_stats/<begin>`` represent stats about the shard that begins at ``<begin>``
>>> for k, v in db.get_range_startswith('\xff\xff/metrics/data_distribution_stats/', limit=3):
... print(k, v)
...
('\xff\xff/metrics/data_distribution_stats/', '{"shard_bytes":3828000}')
('\xff\xff/metrics/data_distribution_stats/mako00079', '{"shard_bytes":2013000}')
('\xff\xff/metrics/data_distribution_stats/mako00126', '{"shard_bytes":3201000}')
========================= ======== ===============
**Field** **Type** **Description**
------------------------- -------- ---------------
shard_bytes number An estimate of the sum of kv sizes for this shard.
========================= ======== ===============
Keys starting with ``\xff\xff/metrics/health/`` represent stats about the health of the cluster, suitable for application-level throttling.
Some of this information is also available in ``\xff\xff/status/json``, but these keys are significantly cheaper (in terms of server resources) to read.
>>> for k, v in db.get_range_startswith('\xff\xff/metrics/health/'):
... print(k, v)
...
('\xff\xff/metrics/health/aggregate', '{"batch_limited":false,"limiting_storage_durability_lag":5000000,"limiting_storage_queue":1000,"tps_limit":483988.66315011407,"worst_storage_durability_lag":5000001,"worst_storage_queue":2036,"worst_log_queue":300}')
('\xff\xff/metrics/health/log/e639a9ad0373367784cc550c615c469b', '{"log_queue":300}')
('\xff\xff/metrics/health/storage/ab2ce4caf743c9c1ae57063629c6678a', '{"cpu_usage":2.398696781487125,"disk_usage":0.059995917598039405,"storage_durability_lag":5000001,"storage_queue":2036}')
``\xff\xff/metrics/health/aggregate``
Aggregate stats about cluster health. Reading this key alone is slightly cheaper than reading any of the per-process keys.
=================================== ======== ===============
**Field** **Type** **Description**
----------------------------------- -------- ---------------
batch_limited boolean Whether or not the cluster is limiting batch priority transactions
limiting_storage_durability_lag number storage_durability_lag that ratekeeper is using to determing throttling (see the description for storage_durability_lag)
limiting_storage_queue number storage_queue that ratekeeper is using to determing throttling (see the description for storage_queue)
tps_limit number The rate at which normal priority transactions are allowed to start
worst_storage_durability_lag number See the description for storage_durability_lag
worst_storage_queue number See the description for storage_queue
worst_log_queue number See the description for log_queue
=================================== ======== ===============
``\xff\xff/metrics/health/log/<id>``
Stats about the health of a particular transaction log process
========================= ======== ===============
**Field** **Type** **Description**
------------------------- -------- ---------------
log_queue number The number of bytes of mutations that need to be stored in memory on this transaction log process
========================= ======== ===============
``\xff\xff/metrics/health/storage/<id>``
Stats about the health of a particular storage process
========================== ======== ===============
**Field** **Type** **Description**
-------------------------- -------- ---------------
cpu_usage number The cpu percentage used by this storage process
disk_usage number The disk IO percentage used by this storage process
storage_durability_lag number The difference between the newest version and the durable version on this storage process. On a lightly loaded cluster this will stay just above 5000000 [#max_read_transaction_life_versions]_.
storage_queue number The number of bytes of mutations that need to be stored in memory on this storage process
========================== ======== ===============
Caveats
~~~~~~~
#. ``\xff\xff/metrics/health/`` These keys may return data that's several seconds old, and the data may not be available for a brief period during recovery. This will be indicated by the keys being absent.
Read/write modules
==================
As of api version 700, some modules in the special key space allow writes as
well as reads. In these modules, a user can expect that mutations (i.e. sets,
clears, etc) do not have side-effects outside of the current transaction
until commit is called (the same is true for writes to the normal key space).
A user can also expect the effects on commit to be atomic. Reads to
special keys may require reading system keys (whose format is an implementation
detail), and for those reads appropriate read conflict ranges are added on
the underlying system keys.
Writes to read/write modules in the special key space are disabled by
default. Use the ``special_key_space_enable_writes`` transaction option to
enable them [#special_key_space_enable_writes]_.
.. _special-key-space-management-module:
Management module
-----------------
The management module is for temporary cluster configuration changes. For
example, in order to safely remove a process from the cluster, one can add an
exclusion to the ``\xff\xff/management/excluded/`` key prefix that matches
that process, and wait for necessary data to be moved away.
#. ``\xff\xff/management/excluded/<exclusion>`` Read/write. Indicates that the cluster should move data away from processes matching ``<exclusion>``, so that they can be safely removed. See :ref:`removing machines from a cluster <removing-machines-from-a-cluster>` for documentation for the corresponding fdbcli command.
#. ``\xff\xff/management/failed/<exclusion>`` Read/write. Indicates that the cluster should consider matching processes as permanently failed. This allows the cluster to avoid maintaining extra state and doing extra work in the hope that these processes come back. See :ref:`removing machines from a cluster <removing-machines-from-a-cluster>` for documentation for the corresponding fdbcli command.
#. ``\xff\xff/management/in_progress_exclusion/<address>`` Read-only. Indicates that the process matching ``<address>`` matches an exclusion, but still has necessary data and can't yet be safely removed.
#. ``\xff\xff/management/options/excluded/force`` Read/write. Setting this key disables safety checks for writes to ``\xff\xff/management/excluded/<exclusion>``. Setting this key only has an effect in the current transaction and is not persisted on commit.
#. ``\xff\xff/management/options/failed/force`` Read/write. Setting this key disables safety checks for writes to ``\xff\xff/management/failed/<exclusion>``. Setting this key only has an effect in the current transaction and is not persisted on commit.
#. ``\xff\xff/management/min_required_commit_version`` Read/write. Changing this key will change the corresponding system key ``\xff/minRequiredCommitVersion = [[Version]]``. The value of this special key is the literal text of the underlying ``Version``, which is ``int64_t``. If you set the key with a value failed to be parsed as ``int64_t``, ``special_keys_api_failure`` will be thrown. In addition, the given ``Version`` should be larger than the current read version and smaller than the upper bound(``2**63-1-version_per_second*3600*24*365*1000``). Otherwise, ``special_keys_api_failure`` is thrown. For more details, see help text of ``fdbcli`` command ``advanceversion``.
#. ``\xff\xff/management/profiling/<client_txn_sample_rate|client_txn_size_limit>`` Read/write. Changing these two keys will change the corresponding system keys ``\xff\x02/fdbClientInfo/<client_txn_sample_rate|client_txn_size_limit>``, respectively. The value of ``\xff\xff/management/client_txn_sample_rate`` is a literal text of ``double``, and the value of ``\xff\xff/management/client_txn_size_limit`` is a literal text of ``int64_t``. A special value ``default`` can be set to or read from these two keys, representing the client profiling is disabled. In addition, ``clear`` in this range is not allowed. For more details, see help text of ``fdbcli`` command ``profile client``.
#. ``\xff\xff/management/maintenance/<zone_id> := <seconds>`` Read/write. Set/clear a key in this range will change the corresponding system key ``\xff\x02/healthyZone``. The value is a literal text of a non-negative ``double`` which represents the remaining time for the zone to be in maintenance. Commiting with an invalid value will throw ``special_keys_api_failure``. Only one zone is allowed to be in maintenance at the same time. Setting a new key in the range will override the old one and the transaction will throw ``special_keys_api_failure`` error if more than one zone is given. For more details, see help text of ``fdbcli`` command ``maintenance``.
In addition, a special key ``\xff\xff/management/maintenance/IgnoreSSFailures`` in the range, if set, will disable datadistribution for storage server failures.
It is doing the same thing as the fdbcli command ``datadistribution disable ssfailure``.
Maintenance mode will be unable to use until the key is cleared, which is the same as the fdbcli command ``datadistribution enable ssfailure``.
While the key is set, any commit that tries to set a key in the range will fail with the ``special_keys_api_failure`` error.
#. ``\xff\xff/management/data_distribution/<mode|rebalance_ignored>`` Read/write. Changing these two keys will change the two corresponding system keys ``\xff/dataDistributionMode`` and ``\xff\x02/rebalanceDDIgnored``. The value of ``\xff\xff/management/data_distribution/mode`` is a literal text of ``0`` (disable) or ``1`` (enable). Transactions committed with invalid values will throw ``special_keys_api_failure`` . The value of ``\xff\xff/management/data_distribution/rebalance_ignored`` is empty. If present, it means data distribution is disabled for rebalance. Any transaction committed with non-empty value for this key will throw ``special_keys_api_failure``. For more details, see help text of ``fdbcli`` command ``datadistribution``.
#. ``\xff\xff/management/consistency_check_suspended`` Read/write. Set or read this key will set or read the underlying system key ``\xff\x02/ConsistencyCheck/Suspend``. The value of this special key is unused thus if present, will be empty. In particular, if the key exists, then consistency is suspended. For more details, see help text of ``fdbcli`` command ``consistencycheck``.
#. ``\xff\xff/management/db_locked`` Read/write. A single key that can be read and modified. Set the key will lock the database and clear the key will unlock. If the database is already locked, then the commit will fail with the ``special_keys_api_failure`` error. For more details, see help text of ``fdbcli`` command ``lock`` and ``unlock``.
#. ``\xff\xff/management/auto_coordinators`` Read-only. A single key, if read, will return a set of processes which is able to satisfy the current redundency level and serve as new coordinators. The return value is formatted as a comma delimited string of network addresses of coordinators, i.e. ``<ip:port>,<ip:port>,...,<ip:port>``.
An exclusion is syntactically either an ip address (e.g. ``127.0.0.1``), or
an ip address and port (e.g. ``127.0.0.1:4500``). If no port is specified,
then all processes on that host match the exclusion.
Configuration module
--------------------
The configuration module is for changing the cluster configuration.
For example, you can change a process type or update coordinators by manipulating related special keys through transactions.
#. ``\xff\xff/configuration/process/class_type/<address> := <class_type>`` Read/write. Reading keys in the range will retrieve processes' class types. Setting keys in the range will update processes' class types. The process matching ``<address>`` will be assigned to the given class type if the commit is successful. The valid class types are ``storage``, ``transaction``, ``resolution``, etc. A full list of class type can be found via ``fdbcli`` command ``help setclass``. Clearing keys is forbidden in the range. Instead, you can set the type as ``default``, which will clear the assigned class type if existing. For more details, see help text of ``fdbcli`` command ``setclass``.
#. ``\xff\xff/configuration/process/class_source/<address> := <class_source>`` Read-only. Reading keys in the range will retrieve processes' class source. The class source is one of ``command_line``, ``configure_auto``, ``set_class`` and ``invalid``, indicating the source that the process's class type comes from.
#. ``\xff\xff/configuration/coordinators/processes := <ip:port>,<ip:port>,...,<ip:port>`` Read/write. A single key, if read, will return a comma delimited string of coordinators' network addresses. Thus to provide a new set of cooridinators, set the key with a correct formatted string of new coordinators' network addresses. As there's always the need to have coordinators, clear on the key is forbidden and a transaction will fail with the ``special_keys_api_failure`` error if the clear is committed. For more details, see help text of ``fdbcli`` command ``coordinators``.
#. ``\xff\xff/configuration/coordinators/cluster_description := <new_description>`` Read/write. A single key, if read, will return the cluster description. Thus modifying the key will update the cluster decription. The new description needs to match ``[A-Za-z0-9_]+``, otherwise, the ``special_keys_api_failure`` error will be thrown. In addition, clear on the key is meaningless thus forbidden. For more details, see help text of ``fdbcli`` command ``coordinators``.
The ``<address>`` here is the network address of the corresponding process. Thus the general form is ``ip:port``.
Error message module
--------------------
Each module written to validates the transaction before committing, and this
validation failing is indicated by a ``special_keys_api_failure`` error.
More detailed information about why this validation failed can be accessed through the ``\xff\xff/error_message`` key, whose value is a json document with the following schema.
========================== ======== ===============
**Field** **Type** **Description**
-------------------------- -------- ---------------
retriable boolean Whether or not this operation might succeed if retried
command string The fdbcli command corresponding to this operation
message string Help text explaining the reason this operation failed
========================== ======== ===============
.. [#conflicting_keys] In practice, the transaction probably committed successfully. However, if you're running multiple resolvers then it's possible for a transaction to cause another to abort even if it doesn't commit successfully.
.. [#max_read_transaction_life_versions] The number 5000000 comes from the server knob MAX_READ_TRANSACTION_LIFE_VERSIONS
.. [#special_key_space_enable_writes] Enabling this option enables other transaction options, such as ``ACCESS_SYSTEM_KEYS``. This may change in the future.

View File

@ -1,23 +1,23 @@
/*
* tutorial.actor.cpp
* tutorial.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "flow/flow.h"
#include "flow/Platform.h"
@ -100,10 +100,11 @@ struct EchoServerInterface {
RequestStream<struct GetInterfaceRequest> getInterface;
RequestStream<struct EchoRequest> echo;
RequestStream<struct ReverseRequest> reverse;
RequestStream<struct StreamRequest> stream;
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, echo, reverse);
serializer(ar, echo, reverse, stream);
}
};
@ -141,19 +142,60 @@ struct ReverseRequest {
}
};
struct StreamReply : ReplyPromiseStreamReply {
constexpr static FileIdentifier file_identifier = 440804;
int index = 0;
StreamReply() = default;
explicit StreamReply(int index) : index(index) {}
size_t expectedSize() const { return 2e6; }
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, ReplyPromiseStreamReply::acknowledgeToken, index);
}
};
struct StreamRequest {
constexpr static FileIdentifier file_identifier = 5410805;
ReplyPromiseStream<StreamReply> reply;
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, reply);
}
};
uint64_t tokenCounter = 1;
ACTOR Future<Void> echoServer() {
state EchoServerInterface echoServer;
echoServer.getInterface.makeWellKnownEndpoint(UID(-1, ++tokenCounter), TaskPriority::DefaultEndpoint);
loop {
choose {
when(GetInterfaceRequest req = waitNext(echoServer.getInterface.getFuture())) {
req.reply.send(echoServer);
try {
choose {
when(GetInterfaceRequest req = waitNext(echoServer.getInterface.getFuture())) {
req.reply.send(echoServer);
}
when(EchoRequest req = waitNext(echoServer.echo.getFuture())) { req.reply.send(req.message); }
when(ReverseRequest req = waitNext(echoServer.reverse.getFuture())) {
req.reply.send(std::string(req.message.rbegin(), req.message.rend()));
}
when(state StreamRequest req = waitNext(echoServer.stream.getFuture())) {
state int i = 0;
for (; i < 100; ++i) {
wait(req.reply.onReady());
std::cout << "Send " << i << std::endl;
req.reply.send(StreamReply{ i });
}
req.reply.sendError(end_of_stream());
}
}
when(EchoRequest req = waitNext(echoServer.echo.getFuture())) { req.reply.send(req.message); }
when(ReverseRequest req = waitNext(echoServer.reverse.getFuture())) {
req.reply.send(std::string(req.message.rbegin(), req.message.rend()));
} catch (Error& e) {
if (e.code() != error_code_operation_obsolete) {
fprintf(stderr, "Error: %s\n", e.what());
throw e;
}
}
}
@ -172,6 +214,18 @@ ACTOR Future<Void> echoClient() {
reverseRequest.message = "Hello World";
std::string reverseString = wait(server.reverse.getReply(reverseRequest));
std::cout << format("Sent %s to reverse, received %s\n", "Hello World", reverseString.c_str());
state ReplyPromiseStream<StreamReply> stream = server.stream.getReplyStream(StreamRequest{});
state int j = 0;
try {
loop {
StreamReply rep = waitNext(stream.getFuture());
std::cout << "Rep: " << rep.index << std::endl;
ASSERT(rep.index == j++);
}
} catch (Error& e) {
ASSERT(e.code() == error_code_end_of_stream || e.code() == error_code_connection_failed);
}
return Void();
}
@ -347,6 +401,68 @@ ACTOR Future<Void> multipleClients() {
std::string clusterFile = "fdb.cluster";
ACTOR Future<Void> logThroughput(int64_t* v, Key* next) {
loop {
state int64_t last = *v;
wait(delay(1));
printf("throughput: %ld bytes/s, next: %s\n", *v - last, printable(*next).c_str());
}
}
ACTOR Future<Void> fdbClientStream() {
state Database db = Database::createDatabase(clusterFile, 300);
state Transaction tx(db);
state Key next;
state int64_t bytes = 0;
state Future<Void> logFuture = logThroughput(&bytes, &next);
loop {
state PromiseStream<Standalone<RangeResultRef>> results;
try {
state Future<Void> stream = tx.getRangeStream(results,
KeySelector(firstGreaterOrEqual(next), next.arena()),
KeySelector(firstGreaterOrEqual(normalKeys.end)),
GetRangeLimits());
loop {
Standalone<RangeResultRef> range = waitNext(results.getFuture());
if (range.size()) {
bytes += range.expectedSize();
next = keyAfter(range.back().key);
}
}
} catch (Error& e) {
if (e.code() == error_code_end_of_stream) {
break;
}
wait(tx.onError(e));
}
}
return Void();
}
ACTOR Future<Void> fdbClientGetRange() {
state Database db = Database::createDatabase(clusterFile, 300);
state Transaction tx(db);
state Key next;
state int64_t bytes = 0;
state Future<Void> logFuture = logThroughput(&bytes, &next);
loop {
try {
Standalone<RangeResultRef> range =
wait(tx.getRange(KeySelector(firstGreaterOrEqual(next), next.arena()),
KeySelector(firstGreaterOrEqual(normalKeys.end)),
GetRangeLimits(GetRangeLimits::ROW_LIMIT_UNLIMITED, CLIENT_KNOBS->REPLY_BYTE_LIMIT)));
bytes += range.expectedSize();
if (!range.more) {
break;
}
next = keyAfter(range.back().key);
} catch (Error& e) {
wait(tx.onError(e));
}
}
return Void();
}
ACTOR Future<Void> fdbClient() {
wait(delay(30));
state Database db = Database::createDatabase(clusterFile, 300);
@ -403,6 +519,8 @@ std::unordered_map<std::string, std::function<Future<Void>()>> actors = {
{ "kvStoreServer", &kvStoreServer }, // ./tutorial -p 6666 kvStoreServer
{ "kvSimpleClient", &kvSimpleClient }, // ./tutorial -s 127.0.0.1:6666 kvSimpleClient
{ "multipleClients", &multipleClients }, // ./tutorial -s 127.0.0.1:6666 multipleClients
{ "fdbClientStream", &fdbClientStream }, // ./tutorial -C $CLUSTER_FILE_PATH fdbClientStream
{ "fdbClientGetRange", &fdbClientGetRange }, // ./tutorial -C $CLUSTER_FILE_PATH fdbClientGetRange
{ "fdbClient", &fdbClient }, // ./tutorial -C $CLUSTER_FILE_PATH fdbClient
{ "fdbStatusStresser", &fdbStatusStresser }
}; // ./tutorial -C $CLUSTER_FILE_PATH fdbStatusStresser

View File

@ -38,6 +38,7 @@
#include "fdbclient/Status.h"
#include "fdbclient/BackupContainer.h"
#include "fdbclient/KeyBackedTypes.h"
#include "fdbclient/IKnobCollection.h"
#include "fdbclient/RunTransaction.actor.h"
#include "fdbclient/S3BlobStore.h"
#include "fdbclient/json_spirit/json_spirit_writer_template.h"
@ -3703,27 +3704,26 @@ int main(int argc, char* argv[]) {
}
}
for (auto k = knobs.begin(); k != knobs.end(); ++k) {
IKnobCollection::setGlobalKnobCollection(IKnobCollection::Type::CLIENT, Randomize::NO, IsSimulated::NO);
auto& g_knobs = IKnobCollection::getMutableGlobalKnobCollection();
for (const auto& [knobName, knobValueString] : knobs) {
try {
if (!globalFlowKnobs->setKnob(k->first, k->second) &&
!globalClientKnobs->setKnob(k->first, k->second)) {
fprintf(stderr, "WARNING: Unrecognized knob option '%s'\n", k->first.c_str());
TraceEvent(SevWarnAlways, "UnrecognizedKnobOption").detail("Knob", printable(k->first));
}
auto knobValue = g_knobs.parseKnobValue(knobName, knobValueString);
g_knobs.setKnob(knobName, knobValue);
} catch (Error& e) {
if (e.code() == error_code_invalid_option_value) {
fprintf(stderr,
"WARNING: Invalid value '%s' for knob option '%s'\n",
k->second.c_str(),
k->first.c_str());
knobValueString.c_str(),
knobName.c_str());
TraceEvent(SevWarnAlways, "InvalidKnobValue")
.detail("Knob", printable(k->first))
.detail("Value", printable(k->second));
.detail("Knob", printable(knobName))
.detail("Value", printable(knobValueString));
} else {
fprintf(stderr, "ERROR: Failed to set knob option '%s': %s\n", k->first.c_str(), e.what());
fprintf(stderr, "ERROR: Failed to set knob option '%s': %s\n", knobName.c_str(), e.what());
TraceEvent(SevError, "FailedToSetKnob")
.detail("Knob", printable(k->first))
.detail("Value", printable(k->second))
.detail("Knob", printable(knobName))
.detail("Value", printable(knobValueString))
.error(e);
throw;
}
@ -3731,8 +3731,7 @@ int main(int argc, char* argv[]) {
}
// Reinitialize knobs in order to update knobs that are dependent on explicitly set knobs
globalFlowKnobs->initialize(true);
globalClientKnobs->initialize(true);
g_knobs.initialize(Randomize::NO, IsSimulated::NO);
if (trace) {
if (!traceLogGroup.empty())

View File

@ -28,6 +28,7 @@
#include "fdbclient/StatusClient.h"
#include "fdbclient/DatabaseContext.h"
#include "fdbclient/GlobalConfig.actor.h"
#include "fdbclient/IKnobCollection.h"
#include "fdbclient/NativeAPI.actor.h"
#include "fdbclient/ReadYourWrites.h"
#include "fdbclient/ClusterInterface.h"
@ -3060,23 +3061,25 @@ struct CLIOptions {
return;
}
for (const auto& [knob, value] : knobs) {
auto& g_knobs = IKnobCollection::getMutableGlobalKnobCollection();
for (const auto& [knobName, knobValueString] : knobs) {
try {
if (!globalFlowKnobs->setKnob(knob, value) && !globalClientKnobs->setKnob(knob, value)) {
fprintf(stderr, "WARNING: Unrecognized knob option '%s'\n", knob.c_str());
TraceEvent(SevWarnAlways, "UnrecognizedKnobOption").detail("Knob", printable(knob));
}
auto knobValue = g_knobs.parseKnobValue(knobName, knobValueString);
g_knobs.setKnob(knobName, knobValue);
} catch (Error& e) {
if (e.code() == error_code_invalid_option_value) {
fprintf(stderr, "WARNING: Invalid value '%s' for knob option '%s'\n", value.c_str(), knob.c_str());
fprintf(stderr,
"WARNING: Invalid value '%s' for knob option '%s'\n",
knobValueString.c_str(),
knobName.c_str());
TraceEvent(SevWarnAlways, "InvalidKnobValue")
.detail("Knob", printable(knob))
.detail("Value", printable(value));
.detail("Knob", printable(knobName))
.detail("Value", printable(knobValueString));
} else {
fprintf(stderr, "ERROR: Failed to set knob option '%s': %s\n", knob.c_str(), e.what());
fprintf(stderr, "ERROR: Failed to set knob option '%s': %s\n", knobName.c_str(), e.what());
TraceEvent(SevError, "FailedToSetKnob")
.detail("Knob", printable(knob))
.detail("Value", printable(value))
.detail("Knob", printable(knobName))
.detail("Value", printable(knobValueString))
.error(e);
exit_code = FDB_EXIT_ERROR;
}
@ -3084,8 +3087,7 @@ struct CLIOptions {
}
// Reinitialize knobs in order to update knobs that are dependent on explicitly set knobs
globalFlowKnobs->initialize(true);
globalClientKnobs->initialize(true);
g_knobs.initialize(Randomize::NO, IsSimulated::NO);
}
int processArg(CSimpleOpt& args) {
@ -4857,6 +4859,8 @@ int main(int argc, char** argv) {
registerCrashHandler();
IKnobCollection::setGlobalKnobCollection(IKnobCollection::Type::CLIENT, Randomize::NO, IsSimulated::NO);
#ifdef __unixish__
struct sigaction act;

View File

@ -15,10 +15,19 @@ set(FDBCLIENT_SRCS
BackupContainerLocalDirectory.h
BackupContainerS3BlobStore.actor.cpp
BackupContainerS3BlobStore.h
ClientKnobCollection.cpp
ClientKnobCollection.h
ClientKnobs.cpp
ClientKnobs.h
ClientLogEvents.h
ClientWorkerInterface.h
ClusterInterface.h
CommitProxyInterface.h
CommitTransaction.h
ConfigKnobs.cpp
ConfigKnobs.h
ConfigTransactionInterface.cpp
ConfigTransactionInterface.h
CoordinationInterface.h
DatabaseBackupAgent.actor.cpp
DatabaseConfiguration.cpp
@ -26,6 +35,7 @@ set(FDBCLIENT_SRCS
DatabaseContext.h
EventTypes.actor.h
FDBOptions.h
FDBTypes.cpp
FDBTypes.h
FileBackupAgent.actor.cpp
GlobalConfig.h
@ -34,16 +44,20 @@ set(FDBCLIENT_SRCS
GrvProxyInterface.h
HTTP.actor.cpp
IClientApi.h
IConfigTransaction.cpp
IConfigTransaction.h
ISingleThreadTransaction.cpp
ISingleThreadTransaction.h
JsonBuilder.cpp
JsonBuilder.h
KeyBackedTypes.h
KeyRangeMap.actor.cpp
KeyRangeMap.h
Knobs.cpp
Knobs.h
IKnobCollection.cpp
IKnobCollection.h
ManagementAPI.actor.cpp
ManagementAPI.actor.h
CommitProxyInterface.h
MonitorLeader.actor.cpp
MonitorLeader.h
MultiVersionAssignmentVars.h
@ -53,6 +67,11 @@ set(FDBCLIENT_SRCS
NativeAPI.actor.cpp
NativeAPI.actor.h
Notified.h
ParallelStream.actor.cpp
ParallelStream.actor.h
PaxosConfigTransaction.actor.cpp
PaxosConfigTransaction.h
SimpleConfigTransaction.actor.cpp
SpecialKeySpace.actor.cpp
SpecialKeySpace.actor.h
ReadYourWrites.actor.cpp
@ -65,7 +84,14 @@ set(FDBCLIENT_SRCS
S3BlobStore.actor.cpp
Schemas.cpp
Schemas.h
ServerKnobCollection.cpp
ServerKnobCollection.h
ServerKnobs.cpp
ServerKnobs.h
SimpleConfigTransaction.h
SnapshotCache.h
SpecialKeySpace.actor.cpp
SpecialKeySpace.actor.h
Status.h
StatusClient.actor.cpp
StatusClient.h
@ -79,6 +105,8 @@ set(FDBCLIENT_SRCS
TagThrottle.h
TaskBucket.actor.cpp
TaskBucket.h
TestKnobCollection.cpp
TestKnobCollection.h
ThreadSafeTransaction.cpp
ThreadSafeTransaction.h
Tuple.cpp

View File

@ -0,0 +1,51 @@
/*
* ClientKnobCollection.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/ClientKnobCollection.h"
ClientKnobCollection::ClientKnobCollection(Randomize randomize, IsSimulated isSimulated)
: flowKnobs(randomize, isSimulated), clientKnobs(randomize) {}
void ClientKnobCollection::initialize(Randomize randomize, IsSimulated isSimulated) {
flowKnobs.initialize(randomize, isSimulated);
clientKnobs.initialize(randomize);
}
void ClientKnobCollection::reset(Randomize randomize, IsSimulated isSimulated) {
flowKnobs.reset(randomize, isSimulated);
clientKnobs.reset(randomize);
}
Optional<KnobValue> ClientKnobCollection::tryParseKnobValue(std::string const& knobName,
std::string const& knobValue) const {
auto parsedKnobValue = flowKnobs.parseKnobValue(knobName, knobValue);
if (!std::holds_alternative<NoKnobFound>(parsedKnobValue)) {
return KnobValueRef::create(parsedKnobValue);
}
parsedKnobValue = clientKnobs.parseKnobValue(knobName, knobValue);
if (!std::holds_alternative<NoKnobFound>(parsedKnobValue)) {
return KnobValueRef::create(parsedKnobValue);
}
return {};
}
bool ClientKnobCollection::trySetKnob(std::string const& knobName, KnobValueRef const& knobValue) {
return knobValue.visitSetKnob(knobName, flowKnobs) || knobValue.visitSetKnob(knobName, clientKnobs);
}

View File

@ -0,0 +1,46 @@
/*
* ClientKnobCollection.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbclient/ClientKnobs.h"
#include "fdbclient/IKnobCollection.h"
#include "flow/Knobs.h"
/*
* Stores both flow knobs and client knobs, attempting to access server knobs or test knobs
* results in a run-time error
*/
class ClientKnobCollection : public IKnobCollection {
FlowKnobs flowKnobs;
ClientKnobs clientKnobs;
public:
ClientKnobCollection(Randomize randomize, IsSimulated isSimulated);
void initialize(Randomize randomize, IsSimulated isSimulated) override;
void reset(Randomize randomize, IsSimulated isSimulated) override;
FlowKnobs const& getFlowKnobs() const override { return flowKnobs; }
ClientKnobs const& getClientKnobs() const override { return clientKnobs; }
ClientKnobs& getMutableClientKnobs() { return clientKnobs; }
ServerKnobs const& getServerKnobs() const override { throw internal_error(); }
TestKnobs const& getTestKnobs() const override { throw internal_error(); }
Optional<KnobValue> tryParseKnobValue(std::string const& knobName, std::string const& knobValue) const override;
bool trySetKnob(std::string const& knobName, KnobValueRef const& knobValue) override;
};

View File

@ -1,5 +1,5 @@
/*
* Knobs.cpp
* ClientKnobs.cpp
*
* This source file is part of the FoundationDB open source project
*
@ -23,16 +23,14 @@
#include "fdbclient/SystemData.h"
#include "flow/UnitTest.h"
std::unique_ptr<ClientKnobs> globalClientKnobs = std::make_unique<ClientKnobs>();
ClientKnobs const* CLIENT_KNOBS = globalClientKnobs.get();
#define init(knob, value) initKnob(knob, value, #knob)
ClientKnobs::ClientKnobs() {
initialize();
ClientKnobs::ClientKnobs(Randomize randomize) {
initialize(randomize);
}
void ClientKnobs::initialize(bool randomize) {
void ClientKnobs::initialize(Randomize _randomize) {
bool const randomize = (_randomize == Randomize::YES);
// clang-format off
init( TOO_MANY, 1000000 );
@ -97,6 +95,8 @@ void ClientKnobs::initialize(bool randomize) {
init( DETAILED_HEALTH_METRICS_MAX_STALENESS, 5.0 );
init( MID_SHARD_SIZE_MAX_STALENESS, 10.0 );
init( TAG_ENCODE_KEY_SERVERS, false ); if( randomize && BUGGIFY ) TAG_ENCODE_KEY_SERVERS = true;
init( RANGESTREAM_FRAGMENT_SIZE, 1e6 );
init( RANGESTREAM_BUFFERED_FRAGMENTS_LIMIT, 20 );
init( QUARANTINE_TSS_ON_MISMATCH, true ); if( randomize && BUGGIFY ) QUARANTINE_TSS_ON_MISMATCH = false; // if true, a tss mismatch will put the offending tss in quarantine. If false, it will just be killed
//KeyRangeMap
@ -112,8 +112,8 @@ void ClientKnobs::initialize(bool randomize) {
// Core
init( CORE_VERSIONSPERSECOND, 1e6 );
init( LOG_RANGE_BLOCK_SIZE, 1e6 ); //Dependent on CORE_VERSIONSPERSECOND
init( MUTATION_BLOCK_SIZE, 10000 );
init( LOG_RANGE_BLOCK_SIZE, CORE_VERSIONSPERSECOND );
init( MUTATION_BLOCK_SIZE, 10000);
// TaskBucket
init( TASKBUCKET_LOGGING_DELAY, 5.0 );
@ -253,14 +253,14 @@ void ClientKnobs::initialize(bool randomize) {
TEST_CASE("/fdbclient/knobs/initialize") {
// This test depends on TASKBUCKET_TIMEOUT_VERSIONS being defined as a constant multiple of CORE_VERSIONSPERSECOND
ClientKnobs clientKnobs;
int initialCoreVersionsPerSecond = clientKnobs.CORE_VERSIONSPERSECOND;
ClientKnobs clientKnobs(Randomize::NO);
int64_t initialCoreVersionsPerSecond = clientKnobs.CORE_VERSIONSPERSECOND;
int initialTaskBucketTimeoutVersions = clientKnobs.TASKBUCKET_TIMEOUT_VERSIONS;
clientKnobs.setKnob("core_versionspersecond", format("%ld", initialCoreVersionsPerSecond * 2));
ASSERT(clientKnobs.CORE_VERSIONSPERSECOND == initialCoreVersionsPerSecond * 2);
ASSERT(clientKnobs.TASKBUCKET_TIMEOUT_VERSIONS == initialTaskBucketTimeoutVersions);
clientKnobs.initialize();
ASSERT(clientKnobs.CORE_VERSIONSPERSECOND == initialCoreVersionsPerSecond * 2);
ASSERT(clientKnobs.TASKBUCKET_TIMEOUT_VERSIONS == initialTaskBucketTimeoutVersions * 2);
clientKnobs.setKnob("core_versionspersecond", initialCoreVersionsPerSecond * 2);
ASSERT_EQ(clientKnobs.CORE_VERSIONSPERSECOND, initialCoreVersionsPerSecond * 2);
ASSERT_EQ(clientKnobs.TASKBUCKET_TIMEOUT_VERSIONS, initialTaskBucketTimeoutVersions);
clientKnobs.initialize(Randomize::NO);
ASSERT_EQ(clientKnobs.CORE_VERSIONSPERSECOND, initialCoreVersionsPerSecond * 2);
ASSERT_EQ(clientKnobs.TASKBUCKET_TIMEOUT_VERSIONS, initialTaskBucketTimeoutVersions * 2);
return Void();
}

241
fdbclient/ClientKnobs.h Normal file
View File

@ -0,0 +1,241 @@
/*
* ClientKnobs.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#ifndef FDBCLIENT_KNOBS_H
#define FDBCLIENT_KNOBS_H
#pragma once
#include "flow/Knobs.h"
#include "flow/flow.h"
class ClientKnobs : public KnobsImpl<ClientKnobs> {
public:
int TOO_MANY; // FIXME: this should really be split up so we can control these more specifically
double SYSTEM_MONITOR_INTERVAL;
double NETWORK_BUSYNESS_MONITOR_INTERVAL; // The interval in which we should update the network busyness metric
double FAILURE_MAX_DELAY;
double FAILURE_MIN_DELAY;
double FAILURE_TIMEOUT_DELAY;
double CLIENT_FAILURE_TIMEOUT_DELAY;
double FAILURE_EMERGENCY_DELAY;
double FAILURE_MAX_GENERATIONS;
double RECOVERY_DELAY_START_GENERATION;
double RECOVERY_DELAY_SECONDS_PER_GENERATION;
double MAX_GENERATIONS;
double MAX_GENERATIONS_OVERRIDE;
double MAX_GENERATIONS_SIM;
double COORDINATOR_RECONNECTION_DELAY;
int CLIENT_EXAMPLE_AMOUNT;
double MAX_CLIENT_STATUS_AGE;
int MAX_COMMIT_PROXY_CONNECTIONS;
int MAX_GRV_PROXY_CONNECTIONS;
double STATUS_IDLE_TIMEOUT;
// wrong_shard_server sometimes comes from the only nonfailed server, so we need to avoid a fast spin
double WRONG_SHARD_SERVER_DELAY; // SOMEDAY: This delay can limit performance of retrieving data when the cache is
// mostly wrong (e.g. dumping the database after a test)
double FUTURE_VERSION_RETRY_DELAY;
int REPLY_BYTE_LIMIT;
double DEFAULT_BACKOFF;
double DEFAULT_MAX_BACKOFF;
double BACKOFF_GROWTH_RATE;
double RESOURCE_CONSTRAINED_MAX_BACKOFF;
int PROXY_COMMIT_OVERHEAD_BYTES;
double SHARD_STAT_SMOOTH_AMOUNT;
int INIT_MID_SHARD_BYTES;
int TRANSACTION_SIZE_LIMIT;
int64_t KEY_SIZE_LIMIT;
int64_t SYSTEM_KEY_SIZE_LIMIT;
int64_t VALUE_SIZE_LIMIT;
int64_t SPLIT_KEY_SIZE_LIMIT;
int METADATA_VERSION_CACHE_SIZE;
int MAX_BATCH_SIZE;
double GRV_BATCH_TIMEOUT;
int BROADCAST_BATCH_SIZE;
double TRANSACTION_TIMEOUT_DELAY_INTERVAL;
// When locationCache in DatabaseContext gets to be this size, items will be evicted
int LOCATION_CACHE_EVICTION_SIZE;
int LOCATION_CACHE_EVICTION_SIZE_SIM;
int GET_RANGE_SHARD_LIMIT;
int WARM_RANGE_SHARD_LIMIT;
int STORAGE_METRICS_SHARD_LIMIT;
int SHARD_COUNT_LIMIT;
double STORAGE_METRICS_UNFAIR_SPLIT_LIMIT;
double STORAGE_METRICS_TOO_MANY_SHARDS_DELAY;
double AGGREGATE_HEALTH_METRICS_MAX_STALENESS;
double DETAILED_HEALTH_METRICS_MAX_STALENESS;
double MID_SHARD_SIZE_MAX_STALENESS;
bool TAG_ENCODE_KEY_SERVERS;
int64_t RANGESTREAM_FRAGMENT_SIZE;
int RANGESTREAM_BUFFERED_FRAGMENTS_LIMIT;
bool QUARANTINE_TSS_ON_MISMATCH;
// KeyRangeMap
int KRM_GET_RANGE_LIMIT;
int KRM_GET_RANGE_LIMIT_BYTES; // This must be sufficiently larger than KEY_SIZE_LIMIT to ensure that at least two
// entries will be returned from an attempt to read a key range map
int DEFAULT_MAX_OUTSTANDING_WATCHES;
int ABSOLUTE_MAX_WATCHES; // The client cannot set the max outstanding watches higher than this
double WATCH_POLLING_TIME;
double NO_RECENT_UPDATES_DURATION;
double FAST_WATCH_TIMEOUT;
double WATCH_TIMEOUT;
double IS_ACCEPTABLE_DELAY;
// Core
int64_t CORE_VERSIONSPERSECOND; // This is defined within the server but used for knobs based on server value
int LOG_RANGE_BLOCK_SIZE;
int MUTATION_BLOCK_SIZE;
// Taskbucket
double TASKBUCKET_LOGGING_DELAY;
int TASKBUCKET_MAX_PRIORITY;
double TASKBUCKET_CHECK_TIMEOUT_CHANCE;
double TASKBUCKET_TIMEOUT_JITTER_OFFSET;
double TASKBUCKET_TIMEOUT_JITTER_RANGE;
double TASKBUCKET_CHECK_ACTIVE_DELAY;
int TASKBUCKET_CHECK_ACTIVE_AMOUNT;
int TASKBUCKET_TIMEOUT_VERSIONS;
int TASKBUCKET_MAX_TASK_KEYS;
// Backup
int BACKUP_LOCAL_FILE_WRITE_BLOCK;
int BACKUP_CONCURRENT_DELETES;
int BACKUP_SIMULATED_LIMIT_BYTES;
int BACKUP_GET_RANGE_LIMIT_BYTES;
int BACKUP_LOCK_BYTES;
double BACKUP_RANGE_TIMEOUT;
double BACKUP_RANGE_MINWAIT;
int BACKUP_SNAPSHOT_DISPATCH_INTERVAL_SEC;
int BACKUP_DEFAULT_SNAPSHOT_INTERVAL_SEC;
int BACKUP_SHARD_TASK_LIMIT;
double BACKUP_AGGREGATE_POLL_RATE;
double BACKUP_AGGREGATE_POLL_RATE_UPDATE_INTERVAL;
int BACKUP_LOG_WRITE_BATCH_MAX_SIZE;
int BACKUP_LOG_ATOMIC_OPS_SIZE;
int BACKUP_MAX_LOG_RANGES;
int BACKUP_SIM_COPY_LOG_RANGES;
int BACKUP_OPERATION_COST_OVERHEAD;
int BACKUP_VERSION_DELAY;
int BACKUP_MAP_KEY_LOWER_LIMIT;
int BACKUP_MAP_KEY_UPPER_LIMIT;
int BACKUP_COPY_TASKS;
int BACKUP_BLOCK_SIZE;
int COPY_LOG_BLOCK_SIZE;
int COPY_LOG_BLOCKS_PER_TASK;
int COPY_LOG_PREFETCH_BLOCKS;
int COPY_LOG_READ_AHEAD_BYTES;
double COPY_LOG_TASK_DURATION_NANOS;
int BACKUP_TASKS_PER_AGENT;
int BACKUP_POLL_PROGRESS_SECONDS;
int64_t VERSIONS_PER_SECOND; // Copy of SERVER_KNOBS, as we can't link with it
int SIM_BACKUP_TASKS_PER_AGENT;
int BACKUP_RANGEFILE_BLOCK_SIZE;
int BACKUP_LOGFILE_BLOCK_SIZE;
int BACKUP_DISPATCH_ADDTASK_SIZE;
int RESTORE_DISPATCH_ADDTASK_SIZE;
int RESTORE_DISPATCH_BATCH_SIZE;
int RESTORE_WRITE_TX_SIZE;
int APPLY_MAX_LOCK_BYTES;
int APPLY_MIN_LOCK_BYTES;
int APPLY_BLOCK_SIZE;
double APPLY_MAX_DECAY_RATE;
double APPLY_MAX_INCREASE_FACTOR;
double BACKUP_ERROR_DELAY;
double BACKUP_STATUS_DELAY;
double BACKUP_STATUS_JITTER;
double MIN_CLEANUP_SECONDS;
int64_t FASTRESTORE_ATOMICOP_WEIGHT; // workload amplication factor for atomic op
// Configuration
int32_t DEFAULT_AUTO_COMMIT_PROXIES;
int32_t DEFAULT_AUTO_GRV_PROXIES;
int32_t DEFAULT_COMMIT_GRV_PROXIES_RATIO;
int32_t DEFAULT_MAX_GRV_PROXIES;
int32_t DEFAULT_AUTO_RESOLVERS;
int32_t DEFAULT_AUTO_LOGS;
// Client Status Info
double CSI_SAMPLING_PROBABILITY;
int64_t CSI_SIZE_LIMIT;
double CSI_STATUS_DELAY;
int HTTP_SEND_SIZE;
int HTTP_READ_SIZE;
int HTTP_VERBOSE_LEVEL;
std::string HTTP_REQUEST_ID_HEADER;
int BLOBSTORE_CONNECT_TRIES;
int BLOBSTORE_CONNECT_TIMEOUT;
int BLOBSTORE_MAX_CONNECTION_LIFE;
int BLOBSTORE_REQUEST_TRIES;
int BLOBSTORE_REQUEST_TIMEOUT_MIN;
int BLOBSTORE_REQUESTS_PER_SECOND;
int BLOBSTORE_LIST_REQUESTS_PER_SECOND;
int BLOBSTORE_WRITE_REQUESTS_PER_SECOND;
int BLOBSTORE_READ_REQUESTS_PER_SECOND;
int BLOBSTORE_DELETE_REQUESTS_PER_SECOND;
int BLOBSTORE_CONCURRENT_REQUESTS;
int BLOBSTORE_MULTIPART_MAX_PART_SIZE;
int BLOBSTORE_MULTIPART_MIN_PART_SIZE;
int BLOBSTORE_CONCURRENT_UPLOADS;
int BLOBSTORE_CONCURRENT_LISTS;
int BLOBSTORE_CONCURRENT_WRITES_PER_FILE;
int BLOBSTORE_CONCURRENT_READS_PER_FILE;
int BLOBSTORE_READ_BLOCK_SIZE;
int BLOBSTORE_READ_AHEAD_BLOCKS;
int BLOBSTORE_READ_CACHE_BLOCKS_PER_FILE;
int BLOBSTORE_MAX_SEND_BYTES_PER_SECOND;
int BLOBSTORE_MAX_RECV_BYTES_PER_SECOND;
int CONSISTENCY_CHECK_RATE_LIMIT_MAX;
int CONSISTENCY_CHECK_ONE_ROUND_TARGET_COMPLETION_TIME;
// fdbcli
int CLI_CONNECT_PARALLELISM;
double CLI_CONNECT_TIMEOUT;
// trace
int TRACE_LOG_FILE_IDENTIFIER_MAX_LENGTH;
// transaction tags
int MAX_TRANSACTION_TAG_LENGTH;
int MAX_TAGS_PER_TRANSACTION;
int COMMIT_SAMPLE_COST; // The expectation of sampling is every COMMIT_SAMPLE_COST sample once
int WRITE_COST_BYTE_FACTOR;
int INCOMPLETE_SHARD_PLUS; // The size of (possible) incomplete shard when estimate clear range
double READ_TAG_SAMPLE_RATE; // Communicated to clients from cluster
double TAG_THROTTLE_SMOOTHING_WINDOW;
double TAG_THROTTLE_RECHECK_INTERVAL;
double TAG_THROTTLE_EXPIRATION_INTERVAL;
ClientKnobs(Randomize randomize);
void initialize(Randomize randomize);
};
#endif

View File

@ -42,7 +42,7 @@ struct ClusterInterface {
UID id() const { return openDatabase.getEndpoint().token; }
NetworkAddress address() const { return openDatabase.getEndpoint().getPrimaryAddress(); }
bool hasMessage() {
bool hasMessage() const {
return openDatabase.getFuture().isReady() || failureMonitoring.getFuture().isReady() ||
databaseStatus.getFuture().isReady() || ping.getFuture().isReady() ||
getClientWorkers.getFuture().isReady() || forceRecovery.getFuture().isReady();

View File

@ -1,4 +1,3 @@
/*
* CommitProxyInterface.h
*

159
fdbclient/ConfigKnobs.cpp Normal file
View File

@ -0,0 +1,159 @@
/*
* ConfigKnobs.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/ConfigKnobs.h"
#include "fdbclient/Tuple.h"
#include "flow/UnitTest.h"
ConfigKey ConfigKeyRef::decodeKey(KeyRef const& key) {
Tuple tuple;
try {
tuple = Tuple::unpack(key);
} catch (Error& e) {
TraceEvent(SevWarnAlways, "FailedToUnpackConfigKey").detail("Key", printable(key)).error(e);
throw invalid_config_db_key();
}
if (tuple.size() != 2) {
throw invalid_config_db_key();
}
if (tuple.getType(0) == Tuple::NULL_TYPE) {
return ConfigKeyRef({}, tuple.getString(1));
} else {
if (tuple.getType(0) != Tuple::BYTES || tuple.getType(1) != Tuple::BYTES) {
throw invalid_config_db_key();
}
return ConfigKeyRef(tuple.getString(0), tuple.getString(1));
}
}
Value KnobValueRef::ToValueFunc::operator()(int v) const {
return BinaryWriter::toValue(v, Unversioned());
}
Value KnobValueRef::ToValueFunc::operator()(int64_t v) const {
return BinaryWriter::toValue(v, Unversioned());
}
Value KnobValueRef::ToValueFunc::operator()(bool v) const {
return BinaryWriter::toValue(v, Unversioned());
}
Value KnobValueRef::ToValueFunc::operator()(ValueRef v) const {
return v;
}
Value KnobValueRef::ToValueFunc::operator()(double v) const {
return BinaryWriter::toValue(v, Unversioned());
}
KnobValue KnobValueRef::CreatorFunc::operator()(NoKnobFound) const {
ASSERT(false);
return {};
}
KnobValue KnobValueRef::CreatorFunc::operator()(int v) const {
return KnobValueRef(v);
}
KnobValue KnobValueRef::CreatorFunc::operator()(double v) const {
return KnobValueRef(v);
}
KnobValue KnobValueRef::CreatorFunc::operator()(int64_t v) const {
return KnobValueRef(v);
}
KnobValue KnobValueRef::CreatorFunc::operator()(bool v) const {
return KnobValueRef(v);
}
KnobValue KnobValueRef::CreatorFunc::operator()(std::string const& v) const {
return KnobValueRef(ValueRef(reinterpret_cast<uint8_t const*>(v.c_str()), v.size()));
}
namespace {
class SetKnobFunc {
Knobs* knobs;
std::string const* knobName;
public:
SetKnobFunc(Knobs& knobs, std::string const& knobName) : knobs(&knobs), knobName(&knobName) {}
template <class T>
bool operator()(T const& v) const {
return knobs->setKnob(*knobName, v);
}
bool operator()(StringRef const& v) const { return knobs->setKnob(*knobName, v.toString()); }
};
struct ToStringFunc {
std::string operator()(int v) const { return format("int:%d", v); }
std::string operator()(int64_t v) const { return format("int64_t:%ld", v); }
std::string operator()(bool v) const { return format("bool:%d", v); }
std::string operator()(ValueRef v) const { return "string:" + v.toString(); }
std::string operator()(double v) const { return format("double:%lf", v); }
};
} // namespace
KnobValue KnobValueRef::create(ParsedKnobValue const& v) {
return std::visit(CreatorFunc{}, v);
}
bool KnobValueRef::visitSetKnob(std::string const& knobName, Knobs& knobs) const {
return std::visit(SetKnobFunc{ knobs, knobName }, value);
}
std::string KnobValueRef::toString() const {
return std::visit(ToStringFunc{}, value);
}
TEST_CASE("/fdbclient/ConfigDB/ConfigKey/EncodeDecode") {
Tuple tuple;
tuple << "class-A"_sr
<< "test_long"_sr;
auto packed = tuple.pack();
auto unpacked = ConfigKeyRef::decodeKey(packed);
ASSERT(unpacked.configClass.get() == "class-A"_sr);
ASSERT(unpacked.knobName == "test_long"_sr);
return Void();
}
namespace {
void decodeFailureTest(KeyRef key) {
try {
ConfigKeyRef::decodeKey(key);
} catch (Error& e) {
ASSERT_EQ(e.code(), error_code_invalid_config_db_key);
return;
}
ASSERT(false);
}
} // namespace
TEST_CASE("/fdbclient/ConfigDB/ConfigKey/DecodeFailure") {
{
Tuple tuple;
tuple << "s1"_sr
<< "s2"_sr
<< "s3"_sr;
decodeFailureTest(tuple.pack());
}
{
Tuple tuple;
tuple << "s1"_sr << 5;
decodeFailureTest(tuple.pack());
}
decodeFailureTest("non-tuple-key"_sr);
return Void();
}

206
fdbclient/ConfigKnobs.h Normal file
View File

@ -0,0 +1,206 @@
/*
* ConfigKnobs.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include <string>
#include <variant>
#include "fdbclient/FDBTypes.h"
/*
* KnobValueRefs are stored in the configuration database, and in local configuration files. They are created from
* ParsedKnobValue objects, so it is assumed that the value type is correct for the corresponding knob name
*/
class KnobValueRef {
std::variant<int, double, int64_t, bool, ValueRef> value;
template <class T>
explicit KnobValueRef(T const& v) : value(std::in_place_type<T>, v) {}
explicit KnobValueRef(Arena& arena, ValueRef const& v) : value(std::in_place_type<ValueRef>, arena, v) {}
struct CreatorFunc {
Standalone<KnobValueRef> operator()(NoKnobFound) const;
Standalone<KnobValueRef> operator()(int) const;
Standalone<KnobValueRef> operator()(double) const;
Standalone<KnobValueRef> operator()(int64_t) const;
Standalone<KnobValueRef> operator()(bool) const;
Standalone<KnobValueRef> operator()(std::string const& v) const;
};
struct ToValueFunc {
Value operator()(int) const;
Value operator()(int64_t) const;
Value operator()(bool) const;
Value operator()(ValueRef) const;
Value operator()(double) const;
};
public:
static constexpr FileIdentifier file_identifier = 9297109;
template <class T>
static Value toValue(T const& v) {
return ToValueFunc{}(v);
}
KnobValueRef() = default;
explicit KnobValueRef(Arena& arena, KnobValueRef const& rhs) : value(rhs.value) {
if (std::holds_alternative<ValueRef>(value)) {
value = ValueRef(arena, std::get<ValueRef>(value));
}
}
static Standalone<KnobValueRef> create(ParsedKnobValue const& v);
size_t expectedSize() const {
return std::holds_alternative<KeyRef>(value) ? std::get<KeyRef>(value).expectedSize() : 0;
}
Value toValue() const { return std::visit(ToValueFunc{}, value); }
// Returns true if and only if the knob was successfully found and set
bool visitSetKnob(std::string const& knobName, Knobs& knobs) const;
std::string toString() const;
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, value);
}
};
using KnobValue = Standalone<KnobValueRef>;
/*
* In the configuration database, each key contains a configuration class (or no configuration class, in the case of
* global updates), and a knob name
*/
struct ConfigKeyRef {
static constexpr FileIdentifier file_identifier = 5918726;
// Empty config class means the update is global
Optional<KeyRef> configClass;
KeyRef knobName;
ConfigKeyRef() = default;
explicit ConfigKeyRef(Optional<KeyRef> configClass, KeyRef knobName)
: configClass(configClass), knobName(knobName) {}
explicit ConfigKeyRef(Arena& arena, Optional<KeyRef> configClass, KeyRef knobName)
: configClass(arena, configClass), knobName(arena, knobName) {}
explicit ConfigKeyRef(Arena& arena, ConfigKeyRef const& rhs) : ConfigKeyRef(arena, rhs.configClass, rhs.knobName) {}
static Standalone<ConfigKeyRef> decodeKey(KeyRef const&);
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, configClass, knobName);
}
bool operator==(ConfigKeyRef const& rhs) const {
return (configClass == rhs.configClass) && (knobName == rhs.knobName);
}
bool operator!=(ConfigKeyRef const& rhs) const { return !(*this == rhs); }
size_t expectedSize() const { return configClass.expectedSize() + knobName.expectedSize(); }
};
using ConfigKey = Standalone<ConfigKeyRef>;
inline bool operator<(ConfigKeyRef const& lhs, ConfigKeyRef const& rhs) {
if (lhs.configClass != rhs.configClass) {
return lhs.configClass < rhs.configClass;
} else {
return lhs.knobName < rhs.knobName;
}
}
/*
* Only set and point clear configuration database mutations are currently permitted.
*/
class ConfigMutationRef {
ConfigKeyRef key;
// Empty value means this is a clear mutation
Optional<KnobValueRef> value;
public:
static constexpr FileIdentifier file_identifier = 7219528;
ConfigMutationRef() = default;
explicit ConfigMutationRef(Arena& arena, ConfigKeyRef key, Optional<KnobValueRef> value)
: key(arena, key), value(arena, value) {}
explicit ConfigMutationRef(ConfigKeyRef key, Optional<KnobValueRef> value) : key(key), value(value) {}
ConfigKeyRef getKey() const { return key; }
Optional<KeyRef> getConfigClass() const { return key.configClass; }
KeyRef getKnobName() const { return key.knobName; }
KnobValueRef getValue() const { return value.get(); }
ConfigMutationRef(Arena& arena, ConfigMutationRef const& rhs) : key(arena, rhs.key), value(arena, rhs.value) {}
bool isSet() const { return value.present(); }
static Standalone<ConfigMutationRef> createConfigMutation(KeyRef encodedKey, KnobValueRef value) {
auto key = ConfigKeyRef::decodeKey(encodedKey);
return ConfigMutationRef(key, value);
}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, key, value);
}
size_t expectedSize() const { return key.expectedSize() + value.expectedSize(); }
};
using ConfigMutation = Standalone<ConfigMutationRef>;
/*
* Each configuration database commit is annotated with:
* - A description (set manually by the client)
* - A commit timestamp (automatically generated at commit time)
*/
struct ConfigCommitAnnotationRef {
KeyRef description;
double timestamp{ 0.0 };
ConfigCommitAnnotationRef() = default;
explicit ConfigCommitAnnotationRef(KeyRef description, double timestamp)
: description(description), timestamp(timestamp) {}
explicit ConfigCommitAnnotationRef(Arena& arena, ConfigCommitAnnotationRef& rhs)
: description(arena, rhs.description), timestamp(rhs.timestamp) {}
size_t expectedSize() const { return description.expectedSize(); }
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, timestamp, description);
}
};
using ConfigCommitAnnotation = Standalone<ConfigCommitAnnotationRef>;
enum class UseConfigDB {
DISABLED,
SIMPLE,
PAXOS,
};

View File

@ -0,0 +1,47 @@
/*
* ConfigTransactionInterface.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/ConfigTransactionInterface.h"
#include "fdbclient/CoordinationInterface.h"
#include "flow/IRandom.h"
ConfigTransactionInterface::ConfigTransactionInterface() : _id(deterministicRandom()->randomUniqueID()) {}
void ConfigTransactionInterface::setupWellKnownEndpoints() {
getVersion.makeWellKnownEndpoint(WLTOKEN_CONFIGTXN_GETVERSION, TaskPriority::Coordination);
get.makeWellKnownEndpoint(WLTOKEN_CONFIGTXN_GET, TaskPriority::Coordination);
getClasses.makeWellKnownEndpoint(WLTOKEN_CONFIGTXN_GETCLASSES, TaskPriority::Coordination);
getKnobs.makeWellKnownEndpoint(WLTOKEN_CONFIGTXN_GETKNOBS, TaskPriority::Coordination);
commit.makeWellKnownEndpoint(WLTOKEN_CONFIGTXN_COMMIT, TaskPriority::Coordination);
}
ConfigTransactionInterface::ConfigTransactionInterface(NetworkAddress const& remote)
: getVersion(Endpoint({ remote }, WLTOKEN_CONFIGTXN_GETVERSION)), get(Endpoint({ remote }, WLTOKEN_CONFIGTXN_GET)),
getClasses(Endpoint({ remote }, WLTOKEN_CONFIGTXN_GETCLASSES)),
getKnobs(Endpoint({ remote }, WLTOKEN_CONFIGTXN_GETKNOBS)), commit(Endpoint({ remote }, WLTOKEN_CONFIGTXN_COMMIT)) {
}
bool ConfigTransactionInterface::operator==(ConfigTransactionInterface const& rhs) const {
return _id == rhs._id;
}
bool ConfigTransactionInterface::operator!=(ConfigTransactionInterface const& rhs) const {
return !(*this == rhs);
}

View File

@ -0,0 +1,193 @@
/*
* ConfigTransactionInterface.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbclient/FDBTypes.h"
#include "fdbclient/CommitTransaction.h"
#include "fdbclient/ConfigKnobs.h"
#include "fdbclient/CoordinationInterface.h"
#include "fdbrpc/fdbrpc.h"
#include "flow/flow.h"
struct ConfigTransactionGetVersionReply {
static constexpr FileIdentifier file_identifier = 2934851;
ConfigTransactionGetVersionReply() = default;
explicit ConfigTransactionGetVersionReply(Version version) : version(version) {}
Version version;
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, version);
}
};
struct ConfigTransactionGetVersionRequest {
static constexpr FileIdentifier file_identifier = 138941;
ReplyPromise<ConfigTransactionGetVersionReply> reply;
ConfigTransactionGetVersionRequest() = default;
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, reply);
}
};
struct ConfigTransactionGetReply {
static constexpr FileIdentifier file_identifier = 2034110;
Optional<KnobValue> value;
ConfigTransactionGetReply() = default;
explicit ConfigTransactionGetReply(Optional<KnobValue> const& value) : value(value) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, value);
}
};
struct ConfigTransactionGetRequest {
static constexpr FileIdentifier file_identifier = 923040;
Version version;
ConfigKey key;
ReplyPromise<ConfigTransactionGetReply> reply;
ConfigTransactionGetRequest() = default;
explicit ConfigTransactionGetRequest(Version version, ConfigKey key) : version(version), key(key) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, version, key, reply);
}
};
struct ConfigTransactionCommitRequest {
static constexpr FileIdentifier file_identifier = 103841;
Arena arena;
Version version{ ::invalidVersion };
VectorRef<ConfigMutationRef> mutations;
ConfigCommitAnnotationRef annotation;
ReplyPromise<Void> reply;
size_t expectedSize() const { return mutations.expectedSize() + annotation.expectedSize(); }
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, arena, version, mutations, annotation, reply);
}
};
struct ConfigTransactionGetRangeReply {
static constexpr FileIdentifier file_identifier = 430263;
Standalone<RangeResultRef> range;
ConfigTransactionGetRangeReply() = default;
explicit ConfigTransactionGetRangeReply(Standalone<RangeResultRef> range) : range(range) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, range);
}
};
struct ConfigTransactionGetConfigClassesReply {
static constexpr FileIdentifier file_identifier = 5309618;
Standalone<VectorRef<KeyRef>> configClasses;
ConfigTransactionGetConfigClassesReply() = default;
explicit ConfigTransactionGetConfigClassesReply(Standalone<VectorRef<KeyRef>> const& configClasses)
: configClasses(configClasses) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, configClasses);
}
};
struct ConfigTransactionGetConfigClassesRequest {
static constexpr FileIdentifier file_identifier = 7163400;
Version version;
ReplyPromise<ConfigTransactionGetConfigClassesReply> reply;
ConfigTransactionGetConfigClassesRequest() = default;
explicit ConfigTransactionGetConfigClassesRequest(Version version) : version(version) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, version);
}
};
struct ConfigTransactionGetKnobsReply {
static constexpr FileIdentifier file_identifier = 4109852;
Standalone<VectorRef<KeyRef>> knobNames;
ConfigTransactionGetKnobsReply() = default;
explicit ConfigTransactionGetKnobsReply(Standalone<VectorRef<KeyRef>> const& knobNames) : knobNames(knobNames) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, knobNames);
}
};
struct ConfigTransactionGetKnobsRequest {
static constexpr FileIdentifier file_identifier = 987410;
Version version;
Optional<Key> configClass;
ReplyPromise<ConfigTransactionGetKnobsReply> reply;
ConfigTransactionGetKnobsRequest() = default;
explicit ConfigTransactionGetKnobsRequest(Version version, Optional<Key> configClass)
: version(version), configClass(configClass) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, version, configClass, reply);
}
};
/*
* Configuration database nodes serve a ConfigTransactionInterface which contains well known endpoints,
* used by clients to transactionally update the configuration database
*/
struct ConfigTransactionInterface {
UID _id;
public:
static constexpr FileIdentifier file_identifier = 982485;
struct RequestStream<ConfigTransactionGetVersionRequest> getVersion;
struct RequestStream<ConfigTransactionGetRequest> get;
struct RequestStream<ConfigTransactionGetConfigClassesRequest> getClasses;
struct RequestStream<ConfigTransactionGetKnobsRequest> getKnobs;
struct RequestStream<ConfigTransactionCommitRequest> commit;
ConfigTransactionInterface();
void setupWellKnownEndpoints();
ConfigTransactionInterface(NetworkAddress const& remote);
bool operator==(ConfigTransactionInterface const& rhs) const;
bool operator!=(ConfigTransactionInterface const& rhs) const;
UID id() const { return _id; }
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, getVersion, get, getClasses, getKnobs, commit);
}
};

View File

@ -30,6 +30,7 @@
const int MAX_CLUSTER_FILE_BYTES = 60000;
// well known endpoints published to the client.
constexpr UID WLTOKEN_CLIENTLEADERREG_GETLEADER(-1, 2);
constexpr UID WLTOKEN_CLIENTLEADERREG_OPENDATABASE(-1, 3);
@ -37,7 +38,12 @@ constexpr UID WLTOKEN_CLIENTLEADERREG_OPENDATABASE(-1, 3);
constexpr UID WLTOKEN_PROTOCOL_INFO(-1, 10);
constexpr UID WLTOKEN_CLIENTLEADERREG_DESCRIPTOR_MUTABLE(-1, 11);
// well known endpoints published to the client.
constexpr UID WLTOKEN_CONFIGTXN_GETVERSION(-1, 12);
constexpr UID WLTOKEN_CONFIGTXN_GET(-1, 13);
constexpr UID WLTOKEN_CONFIGTXN_GETCLASSES(-1, 14);
constexpr UID WLTOKEN_CONFIGTXN_GETKNOBS(-1, 15);
constexpr UID WLTOKEN_CONFIGTXN_COMMIT(-1, 16);
struct ClientLeaderRegInterface {
RequestStream<struct GetLeaderRequest> getLeader;
RequestStream<struct OpenDatabaseCoordRequest> openDatabase;

View File

@ -219,15 +219,15 @@ public:
Error deferredError;
bool lockAware;
bool isError() { return deferredError.code() != invalid_error_code; }
bool isError() const { return deferredError.code() != invalid_error_code; }
void checkDeferredError() {
void checkDeferredError() const {
if (isError()) {
throw deferredError;
}
}
int apiVersionAtLeast(int minVersion) { return apiVersion < 0 || apiVersion >= minVersion; }
int apiVersionAtLeast(int minVersion) const { return apiVersion < 0 || apiVersion >= minVersion; }
Future<Void> onConnected(); // Returns after a majority of coordination servers are available and have reported a
// leader. The cluster file therefore is valid, but the database might be unavailable.
@ -351,6 +351,7 @@ public:
Counter transactionGetKeyRequests;
Counter transactionGetValueRequests;
Counter transactionGetRangeRequests;
Counter transactionGetRangeStreamRequests;
Counter transactionWatchRequests;
Counter transactionGetAddressesForKeyRequests;
Counter transactionBytesRead;
@ -413,6 +414,7 @@ public:
double healthMetricsLastUpdated;
double detailedHealthMetricsLastUpdated;
Smoother smoothMidShardSize;
bool useConfigDatabase{ false };
UniqueOrderedOptionList<FDBTransactionOptions> transactionDefaults;

67
fdbclient/FDBTypes.cpp Normal file
View File

@ -0,0 +1,67 @@
/*
* FDBTypes.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/FDBTypes.h"
#include "fdbclient/Knobs.h"
KeyRef keyBetween(const KeyRangeRef& keys) {
int pos = 0; // will be the position of the first difference between keys.begin and keys.end
int minSize = std::min(keys.begin.size(), keys.end.size());
for (; pos < minSize && pos < CLIENT_KNOBS->SPLIT_KEY_SIZE_LIMIT; pos++) {
if (keys.begin[pos] != keys.end[pos]) {
return keys.end.substr(0, pos + 1);
}
}
// If one more character keeps us in the limit, and the latter key is simply
// longer, then we only need one more byte of the end string.
if (pos < CLIENT_KNOBS->SPLIT_KEY_SIZE_LIMIT && keys.begin.size() < keys.end.size()) {
return keys.end.substr(0, pos + 1);
}
return keys.end;
}
void KeySelectorRef::setKey(KeyRef const& key) {
// There are no keys in the database with size greater than KEY_SIZE_LIMIT, so if this key selector has a key
// which is large, then we can translate it to an equivalent key selector with a smaller key
if (key.size() >
(key.startsWith(LiteralStringRef("\xff")) ? CLIENT_KNOBS->SYSTEM_KEY_SIZE_LIMIT : CLIENT_KNOBS->KEY_SIZE_LIMIT))
this->key = key.substr(0,
(key.startsWith(LiteralStringRef("\xff")) ? CLIENT_KNOBS->SYSTEM_KEY_SIZE_LIMIT
: CLIENT_KNOBS->KEY_SIZE_LIMIT) +
1);
else
this->key = key;
}
std::string KeySelectorRef::toString() const {
if (offset > 0) {
if (orEqual)
return format("%d+firstGreaterThan(%s)", offset - 1, printable(key).c_str());
else
return format("%d+firstGreaterOrEqual(%s)", offset - 1, printable(key).c_str());
} else {
if (orEqual)
return format("%d+lastLessOrEqual(%s)", offset, printable(key).c_str());
else
return format("%d+lastLessThan(%s)", offset, printable(key).c_str());
}
}

View File

@ -28,7 +28,6 @@
#include "flow/Arena.h"
#include "flow/flow.h"
#include "fdbclient/Knobs.h"
typedef int64_t Version;
typedef uint64_t LogEpoch;
@ -514,28 +513,12 @@ inline KeyRange prefixRange(KeyRef prefix) {
range.contents() = KeyRangeRef(start, end);
return range;
}
inline KeyRef keyBetween(const KeyRangeRef& keys) {
// Returns (one of) the shortest key(s) either contained in keys or equal to keys.end,
// assuming its length is no more than CLIENT_KNOBS->SPLIT_KEY_SIZE_LIMIT. If the length of
// the shortest key exceeds that limit, then the end key is returned.
// The returned reference is valid as long as keys is valid.
int pos = 0; // will be the position of the first difference between keys.begin and keys.end
int minSize = std::min(keys.begin.size(), keys.end.size());
for (; pos < minSize && pos < CLIENT_KNOBS->SPLIT_KEY_SIZE_LIMIT; pos++) {
if (keys.begin[pos] != keys.end[pos]) {
return keys.end.substr(0, pos + 1);
}
}
// If one more character keeps us in the limit, and the latter key is simply
// longer, then we only need one more byte of the end string.
if (pos < CLIENT_KNOBS->SPLIT_KEY_SIZE_LIMIT && keys.begin.size() < keys.end.size()) {
return keys.end.substr(0, pos + 1);
}
return keys.end;
}
// Returns (one of) the shortest key(s) either contained in keys or equal to keys.end,
// assuming its length is no more than CLIENT_KNOBS->SPLIT_KEY_SIZE_LIMIT. If the length of
// the shortest key exceeds that limit, then the end key is returned.
// The returned reference is valid as long as keys is valid.
KeyRef keyBetween(const KeyRangeRef& keys);
struct KeySelectorRef {
private:
@ -560,32 +543,9 @@ public:
KeyRef getKey() const { return key; }
void setKey(KeyRef const& key) {
// There are no keys in the database with size greater than KEY_SIZE_LIMIT, so if this key selector has a key
// which is large, then we can translate it to an equivalent key selector with a smaller key
if (key.size() > (key.startsWith(LiteralStringRef("\xff")) ? CLIENT_KNOBS->SYSTEM_KEY_SIZE_LIMIT
: CLIENT_KNOBS->KEY_SIZE_LIMIT))
this->key = key.substr(0,
(key.startsWith(LiteralStringRef("\xff")) ? CLIENT_KNOBS->SYSTEM_KEY_SIZE_LIMIT
: CLIENT_KNOBS->KEY_SIZE_LIMIT) +
1);
else
this->key = key;
}
void setKey(KeyRef const& key);
std::string toString() const {
if (offset > 0) {
if (orEqual)
return format("%d+firstGreaterThan(%s)", offset - 1, printable(key).c_str());
else
return format("%d+firstGreaterOrEqual(%s)", offset - 1, printable(key).c_str());
} else {
if (orEqual)
return format("%d+lastLessOrEqual(%s)", offset, printable(key).c_str());
else
return format("%d+lastLessThan(%s)", offset, printable(key).c_str());
}
}
std::string toString() const;
bool isBackward() const {
return !orEqual && offset <= 0;
@ -709,7 +669,6 @@ struct RangeResultRef : VectorRef<KeyValueRef> {
" readToBegin:" + std::to_string(readToBegin) + " readThroughEnd:" + std::to_string(readThroughEnd);
}
};
using RangeResult = Standalone<RangeResultRef>;
template <>
struct Traceable<RangeResultRef> : std::true_type {

View File

@ -0,0 +1,35 @@
/*
* IConfigTransaction.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/IConfigTransaction.h"
#include "fdbclient/SimpleConfigTransaction.h"
#include "fdbclient/PaxosConfigTransaction.h"
Reference<IConfigTransaction> IConfigTransaction::createTestSimple(ConfigTransactionInterface const& cti) {
return makeReference<SimpleConfigTransaction>(cti);
}
Reference<IConfigTransaction> IConfigTransaction::createSimple(Database const& cx) {
return makeReference<SimpleConfigTransaction>(cx);
}
Reference<IConfigTransaction> IConfigTransaction::createPaxos(Database const& cx) {
return makeReference<PaxosConfigTransaction>(cx);
}

View File

@ -0,0 +1,67 @@
/*
* IConfigTransaction.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include <memory>
#include "fdbclient/ConfigKnobs.h"
#include "fdbclient/ConfigTransactionInterface.h"
#include "fdbclient/ISingleThreadTransaction.h"
#include "fdbclient/NativeAPI.actor.h"
/*
* Configuration transactions are used by clients to update the configuration database, in order
* to dynamically update knobs. The interface is similar to that of regular transactions, but simpler, and
* many virtual methods of ISingleThreadTransaction are disallowed here
*/
class IConfigTransaction : public ISingleThreadTransaction {
protected:
IConfigTransaction() = default;
public:
virtual ~IConfigTransaction() = default;
static Reference<IConfigTransaction> createTestSimple(ConfigTransactionInterface const&);
static Reference<IConfigTransaction> createSimple(Database const&);
static Reference<IConfigTransaction> createPaxos(Database const&);
// Not implemented:
void setVersion(Version) override { throw client_invalid_operation(); }
Future<Key> getKey(KeySelector const& key, bool snapshot = false) override { throw client_invalid_operation(); }
Future<Standalone<VectorRef<const char*>>> getAddressesForKey(Key const& key) override {
throw client_invalid_operation();
}
Future<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(KeyRange const& range, int64_t chunkSize) override {
throw client_invalid_operation();
}
Future<int64_t> getEstimatedRangeSizeBytes(KeyRange const& keys) override { throw client_invalid_operation(); }
void addReadConflictRange(KeyRangeRef const& keys) override { throw client_invalid_operation(); }
void makeSelfConflicting() override { throw client_invalid_operation(); }
void atomicOp(KeyRef const& key, ValueRef const& operand, uint32_t operationType) override {
throw client_invalid_operation();
}
Future<Void> watch(Key const& key) override { throw client_invalid_operation(); }
void addWriteConflictRange(KeyRangeRef const& keys) override { throw client_invalid_operation(); }
Future<Standalone<StringRef>> getVersionstamp() override { throw client_invalid_operation(); }
// Implemented:
void getWriteConflicts(KeyRangeMap<bool>* result) override{};
};

View File

@ -0,0 +1,90 @@
/*
* IKnobCollection.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/IKnobCollection.h"
#include "fdbclient/ClientKnobCollection.h"
#include "fdbclient/ServerKnobCollection.h"
#include "fdbclient/TestKnobCollection.h"
std::unique_ptr<IKnobCollection> IKnobCollection::create(Type type, Randomize randomize, IsSimulated isSimulated) {
if (type == Type::CLIENT) {
return std::make_unique<ClientKnobCollection>(randomize, isSimulated);
} else if (type == Type::SERVER) {
return std::make_unique<ServerKnobCollection>(randomize, isSimulated);
} else if (type == Type::TEST) {
return std::make_unique<TestKnobCollection>(randomize, isSimulated);
}
UNSTOPPABLE_ASSERT(false);
}
KnobValue IKnobCollection::parseKnobValue(std::string const& knobName, std::string const& knobValue) const {
auto result = tryParseKnobValue(knobName, knobValue);
if (!result.present()) {
throw invalid_option();
}
return result.get();
}
void IKnobCollection::setKnob(std::string const& knobName, KnobValueRef const& knobValue) {
if (!trySetKnob(knobName, knobValue)) {
TraceEvent(SevWarnAlways, "FailedToSetKnob")
.detail("KnobName", knobName)
.detail("KnobValue", knobValue.toString());
throw invalid_option_value();
}
}
KnobValue IKnobCollection::parseKnobValue(std::string const& knobName, std::string const& knobValue, Type type) {
// TODO: Ideally it should not be necessary to create a template object to parse knobs
static std::unique_ptr<IKnobCollection> clientKnobCollection, serverKnobCollection, testKnobCollection;
if (type == Type::CLIENT) {
if (!clientKnobCollection) {
clientKnobCollection = create(type, Randomize::NO, IsSimulated::NO);
}
return clientKnobCollection->parseKnobValue(knobName, knobValue);
} else if (type == Type::SERVER) {
if (!serverKnobCollection) {
serverKnobCollection = create(type, Randomize::NO, IsSimulated::NO);
}
return serverKnobCollection->parseKnobValue(knobName, knobValue);
} else if (type == Type::TEST) {
if (!testKnobCollection) {
testKnobCollection = create(type, Randomize::NO, IsSimulated::NO);
}
return testKnobCollection->parseKnobValue(knobName, knobValue);
}
UNSTOPPABLE_ASSERT(false);
}
std::unique_ptr<IKnobCollection> IKnobCollection::globalKnobCollection =
IKnobCollection::create(IKnobCollection::Type::CLIENT, Randomize::NO, IsSimulated::NO);
void IKnobCollection::setGlobalKnobCollection(Type type, Randomize randomize, IsSimulated isSimulated) {
globalKnobCollection = create(type, randomize, isSimulated);
FLOW_KNOBS = &globalKnobCollection->getFlowKnobs();
}
IKnobCollection const& IKnobCollection::getGlobalKnobCollection() {
return *globalKnobCollection;
}
IKnobCollection& IKnobCollection::getMutableGlobalKnobCollection() {
return *globalKnobCollection;
}

View File

@ -0,0 +1,65 @@
/*
* IKnobCollection.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include <memory>
#include "fdbclient/ClientKnobs.h"
#include "fdbclient/ConfigKnobs.h"
#include "fdbclient/ServerKnobs.h"
#include "flow/Knobs.h"
/*
* Each IKnobCollection instantiation stores several Knobs objects, and when parsing or setting a knob,
* these Knobs objects will be traversed in order, until the requested knob name is found. The order of traversal is:
* - FlowKnobs
* - ClientKnobs
* - ServerKnobs
* - TestKnobs
*/
class IKnobCollection {
static std::unique_ptr<IKnobCollection> globalKnobCollection;
public:
enum class Type {
CLIENT,
SERVER,
TEST,
};
static std::unique_ptr<IKnobCollection> create(Type, Randomize, IsSimulated);
virtual void initialize(Randomize randomize, IsSimulated isSimulated) = 0;
virtual void reset(Randomize randomize, IsSimulated isSimulated) = 0;
virtual FlowKnobs const& getFlowKnobs() const = 0;
virtual ClientKnobs const& getClientKnobs() const = 0;
virtual ServerKnobs const& getServerKnobs() const = 0;
virtual class TestKnobs const& getTestKnobs() const = 0;
virtual Optional<KnobValue> tryParseKnobValue(std::string const& knobName, std::string const& knobValue) const = 0;
KnobValue parseKnobValue(std::string const& knobName, std::string const& knobValue) const;
static KnobValue parseKnobValue(std::string const& knobName, std::string const& knobValue, Type);
// Result indicates whether or not knob was successfully set:
virtual bool trySetKnob(std::string const& knobName, KnobValueRef const& knobValue) = 0;
void setKnob(std::string const& knobName, KnobValueRef const& knobValue);
static void setGlobalKnobCollection(Type, Randomize, IsSimulated);
static IKnobCollection const& getGlobalKnobCollection();
static IKnobCollection& getMutableGlobalKnobCollection();
};

View File

@ -0,0 +1,58 @@
/*
* ISingleThreadTransaction.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/DatabaseContext.h"
#include "fdbclient/ISingleThreadTransaction.h"
#include "fdbclient/PaxosConfigTransaction.h"
#include "fdbclient/ReadYourWrites.h"
#include "fdbclient/SimpleConfigTransaction.h"
ISingleThreadTransaction* ISingleThreadTransaction::allocateOnForeignThread(Type type) {
if (type == Type::RYW) {
auto tr =
(ReadYourWritesTransaction*)(ReadYourWritesTransaction::operator new(sizeof(ReadYourWritesTransaction)));
tr->preinitializeOnForeignThread();
return tr;
} else if (type == Type::SIMPLE_CONFIG) {
auto tr = (SimpleConfigTransaction*)(SimpleConfigTransaction::operator new(sizeof(SimpleConfigTransaction)));
return tr;
} else if (type == Type::PAXOS_CONFIG) {
auto tr = (PaxosConfigTransaction*)(PaxosConfigTransaction::operator new(sizeof(PaxosConfigTransaction)));
return tr;
}
ASSERT(false);
return nullptr;
}
void ISingleThreadTransaction::create(ISingleThreadTransaction* tr, Type type, Database db) {
switch (type) {
case Type::RYW:
new (tr) ReadYourWritesTransaction(db);
break;
case Type::SIMPLE_CONFIG:
new (tr) SimpleConfigTransaction(db);
break;
case Type::PAXOS_CONFIG:
new (tr) PaxosConfigTransaction(db);
break;
default:
ASSERT(false);
}
}

View File

@ -0,0 +1,89 @@
/*
* ISingleThreadTransaction.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbclient/FDBOptions.g.h"
#include "fdbclient/FDBTypes.h"
#include "fdbclient/KeyRangeMap.h"
#include "flow/Error.h"
#include "flow/FastRef.h"
/*
* Used by ThreadSafeTransaction to execute normal or configuration transaction operations on the network thread
*/
class ISingleThreadTransaction : public ReferenceCounted<ISingleThreadTransaction> {
protected:
ISingleThreadTransaction() = default;
ISingleThreadTransaction(Error const& deferredError) : deferredError(deferredError) {}
public:
virtual ~ISingleThreadTransaction() = default;
enum class Type {
RYW,
SIMPLE_CONFIG,
PAXOS_CONFIG,
};
static ISingleThreadTransaction* allocateOnForeignThread(Type type);
static void create(ISingleThreadTransaction* tr, Type type, Database db);
virtual void setVersion(Version v) = 0;
virtual Future<Version> getReadVersion() = 0;
virtual Optional<Version> getCachedReadVersion() const = 0;
virtual Future<Optional<Value>> get(const Key& key, bool snapshot = false) = 0;
virtual Future<Key> getKey(const KeySelector& key, bool snapshot = false) = 0;
virtual Future<Standalone<RangeResultRef>> getRange(const KeySelector& begin,
const KeySelector& end,
int limit,
bool snapshot = false,
bool reverse = false) = 0;
virtual Future<Standalone<RangeResultRef>> getRange(KeySelector begin,
KeySelector end,
GetRangeLimits limits,
bool snapshot = false,
bool reverse = false) = 0;
virtual Future<Standalone<VectorRef<const char*>>> getAddressesForKey(Key const& key) = 0;
virtual Future<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(KeyRange const& range, int64_t chunkSize) = 0;
virtual Future<int64_t> getEstimatedRangeSizeBytes(KeyRange const& keys) = 0;
virtual void addReadConflictRange(KeyRangeRef const& keys) = 0;
virtual void makeSelfConflicting() = 0;
virtual void atomicOp(KeyRef const& key, ValueRef const& operand, uint32_t operationType) = 0;
virtual void set(KeyRef const& key, ValueRef const& value) = 0;
virtual void clear(const KeyRangeRef& range) = 0;
virtual void clear(KeyRef const& key) = 0;
virtual Future<Void> watch(Key const& key) = 0;
virtual void addWriteConflictRange(KeyRangeRef const& keys) = 0;
virtual Future<Void> commit() = 0;
virtual Version getCommittedVersion() const = 0;
virtual int64_t getApproximateSize() const = 0;
virtual Future<Standalone<StringRef>> getVersionstamp() = 0;
virtual void setOption(FDBTransactionOptions::Option option, Optional<StringRef> value = Optional<StringRef>()) = 0;
virtual Future<Void> onError(Error const& e) = 0;
virtual void cancel() = 0;
virtual void reset() = 0;
virtual void debugTransaction(UID dID) = 0;
virtual void checkDeferredError() const = 0;
virtual void getWriteConflicts(KeyRangeMap<bool>* result) = 0;
// Used by ThreadSafeTransaction for exceptions thrown in void methods
Error deferredError;
};

View File

@ -18,225 +18,8 @@
* limitations under the License.
*/
#ifndef FDBCLIENT_KNOBS_H
#define FDBCLIENT_KNOBS_H
#pragma once
#include "flow/Knobs.h"
#include "flow/flow.h"
#include "fdbclient/IKnobCollection.h"
class ClientKnobs : public Knobs {
public:
int TOO_MANY; // FIXME: this should really be split up so we can control these more specifically
double SYSTEM_MONITOR_INTERVAL;
double NETWORK_BUSYNESS_MONITOR_INTERVAL; // The interval in which we should update the network busyness metric
double FAILURE_MAX_DELAY;
double FAILURE_MIN_DELAY;
double FAILURE_TIMEOUT_DELAY;
double CLIENT_FAILURE_TIMEOUT_DELAY;
double FAILURE_EMERGENCY_DELAY;
double FAILURE_MAX_GENERATIONS;
double RECOVERY_DELAY_START_GENERATION;
double RECOVERY_DELAY_SECONDS_PER_GENERATION;
double MAX_GENERATIONS;
double MAX_GENERATIONS_OVERRIDE;
double MAX_GENERATIONS_SIM;
double COORDINATOR_RECONNECTION_DELAY;
int CLIENT_EXAMPLE_AMOUNT;
double MAX_CLIENT_STATUS_AGE;
int MAX_COMMIT_PROXY_CONNECTIONS;
int MAX_GRV_PROXY_CONNECTIONS;
double STATUS_IDLE_TIMEOUT;
// wrong_shard_server sometimes comes from the only nonfailed server, so we need to avoid a fast spin
double WRONG_SHARD_SERVER_DELAY; // SOMEDAY: This delay can limit performance of retrieving data when the cache is
// mostly wrong (e.g. dumping the database after a test)
double FUTURE_VERSION_RETRY_DELAY;
int REPLY_BYTE_LIMIT;
double DEFAULT_BACKOFF;
double DEFAULT_MAX_BACKOFF;
double BACKOFF_GROWTH_RATE;
double RESOURCE_CONSTRAINED_MAX_BACKOFF;
int PROXY_COMMIT_OVERHEAD_BYTES;
double SHARD_STAT_SMOOTH_AMOUNT;
int INIT_MID_SHARD_BYTES;
int TRANSACTION_SIZE_LIMIT;
int64_t KEY_SIZE_LIMIT;
int64_t SYSTEM_KEY_SIZE_LIMIT;
int64_t VALUE_SIZE_LIMIT;
int64_t SPLIT_KEY_SIZE_LIMIT;
int METADATA_VERSION_CACHE_SIZE;
int MAX_BATCH_SIZE;
double GRV_BATCH_TIMEOUT;
int BROADCAST_BATCH_SIZE;
double TRANSACTION_TIMEOUT_DELAY_INTERVAL;
// When locationCache in DatabaseContext gets to be this size, items will be evicted
int LOCATION_CACHE_EVICTION_SIZE;
int LOCATION_CACHE_EVICTION_SIZE_SIM;
int GET_RANGE_SHARD_LIMIT;
int WARM_RANGE_SHARD_LIMIT;
int STORAGE_METRICS_SHARD_LIMIT;
int SHARD_COUNT_LIMIT;
double STORAGE_METRICS_UNFAIR_SPLIT_LIMIT;
double STORAGE_METRICS_TOO_MANY_SHARDS_DELAY;
double AGGREGATE_HEALTH_METRICS_MAX_STALENESS;
double DETAILED_HEALTH_METRICS_MAX_STALENESS;
double MID_SHARD_SIZE_MAX_STALENESS;
bool TAG_ENCODE_KEY_SERVERS;
bool QUARANTINE_TSS_ON_MISMATCH;
// KeyRangeMap
int KRM_GET_RANGE_LIMIT;
int KRM_GET_RANGE_LIMIT_BYTES; // This must be sufficiently larger than KEY_SIZE_LIMIT to ensure that at least two
// entries will be returned from an attempt to read a key range map
int DEFAULT_MAX_OUTSTANDING_WATCHES;
int ABSOLUTE_MAX_WATCHES; // The client cannot set the max outstanding watches higher than this
double WATCH_POLLING_TIME;
double NO_RECENT_UPDATES_DURATION;
double FAST_WATCH_TIMEOUT;
double WATCH_TIMEOUT;
double IS_ACCEPTABLE_DELAY;
// Core
int64_t CORE_VERSIONSPERSECOND; // This is defined within the server but used for knobs based on server value
int LOG_RANGE_BLOCK_SIZE;
int MUTATION_BLOCK_SIZE;
// Taskbucket
double TASKBUCKET_LOGGING_DELAY;
int TASKBUCKET_MAX_PRIORITY;
double TASKBUCKET_CHECK_TIMEOUT_CHANCE;
double TASKBUCKET_TIMEOUT_JITTER_OFFSET;
double TASKBUCKET_TIMEOUT_JITTER_RANGE;
double TASKBUCKET_CHECK_ACTIVE_DELAY;
int TASKBUCKET_CHECK_ACTIVE_AMOUNT;
int TASKBUCKET_TIMEOUT_VERSIONS;
int TASKBUCKET_MAX_TASK_KEYS;
// Backup
int BACKUP_LOCAL_FILE_WRITE_BLOCK;
int BACKUP_CONCURRENT_DELETES;
int BACKUP_SIMULATED_LIMIT_BYTES;
int BACKUP_GET_RANGE_LIMIT_BYTES;
int BACKUP_LOCK_BYTES;
double BACKUP_RANGE_TIMEOUT;
double BACKUP_RANGE_MINWAIT;
int BACKUP_SNAPSHOT_DISPATCH_INTERVAL_SEC;
int BACKUP_DEFAULT_SNAPSHOT_INTERVAL_SEC;
int BACKUP_SHARD_TASK_LIMIT;
double BACKUP_AGGREGATE_POLL_RATE;
double BACKUP_AGGREGATE_POLL_RATE_UPDATE_INTERVAL;
int BACKUP_LOG_WRITE_BATCH_MAX_SIZE;
int BACKUP_LOG_ATOMIC_OPS_SIZE;
int BACKUP_MAX_LOG_RANGES;
int BACKUP_SIM_COPY_LOG_RANGES;
int BACKUP_OPERATION_COST_OVERHEAD;
int BACKUP_VERSION_DELAY;
int BACKUP_MAP_KEY_LOWER_LIMIT;
int BACKUP_MAP_KEY_UPPER_LIMIT;
int BACKUP_COPY_TASKS;
int BACKUP_BLOCK_SIZE;
int COPY_LOG_BLOCK_SIZE;
int COPY_LOG_BLOCKS_PER_TASK;
int COPY_LOG_PREFETCH_BLOCKS;
int COPY_LOG_READ_AHEAD_BYTES;
double COPY_LOG_TASK_DURATION_NANOS;
int BACKUP_TASKS_PER_AGENT;
int BACKUP_POLL_PROGRESS_SECONDS;
int64_t VERSIONS_PER_SECOND; // Copy of SERVER_KNOBS, as we can't link with it
int SIM_BACKUP_TASKS_PER_AGENT;
int BACKUP_RANGEFILE_BLOCK_SIZE;
int BACKUP_LOGFILE_BLOCK_SIZE;
int BACKUP_DISPATCH_ADDTASK_SIZE;
int RESTORE_DISPATCH_ADDTASK_SIZE;
int RESTORE_DISPATCH_BATCH_SIZE;
int RESTORE_WRITE_TX_SIZE;
int APPLY_MAX_LOCK_BYTES;
int APPLY_MIN_LOCK_BYTES;
int APPLY_BLOCK_SIZE;
double APPLY_MAX_DECAY_RATE;
double APPLY_MAX_INCREASE_FACTOR;
double BACKUP_ERROR_DELAY;
double BACKUP_STATUS_DELAY;
double BACKUP_STATUS_JITTER;
double MIN_CLEANUP_SECONDS;
int64_t FASTRESTORE_ATOMICOP_WEIGHT; // workload amplication factor for atomic op
// Configuration
int32_t DEFAULT_AUTO_COMMIT_PROXIES;
int32_t DEFAULT_AUTO_GRV_PROXIES;
int32_t DEFAULT_COMMIT_GRV_PROXIES_RATIO;
int32_t DEFAULT_MAX_GRV_PROXIES;
int32_t DEFAULT_AUTO_RESOLVERS;
int32_t DEFAULT_AUTO_LOGS;
// Client Status Info
double CSI_SAMPLING_PROBABILITY;
int64_t CSI_SIZE_LIMIT;
double CSI_STATUS_DELAY;
int HTTP_SEND_SIZE;
int HTTP_READ_SIZE;
int HTTP_VERBOSE_LEVEL;
std::string HTTP_REQUEST_ID_HEADER;
int BLOBSTORE_CONNECT_TRIES;
int BLOBSTORE_CONNECT_TIMEOUT;
int BLOBSTORE_MAX_CONNECTION_LIFE;
int BLOBSTORE_REQUEST_TRIES;
int BLOBSTORE_REQUEST_TIMEOUT_MIN;
int BLOBSTORE_REQUESTS_PER_SECOND;
int BLOBSTORE_LIST_REQUESTS_PER_SECOND;
int BLOBSTORE_WRITE_REQUESTS_PER_SECOND;
int BLOBSTORE_READ_REQUESTS_PER_SECOND;
int BLOBSTORE_DELETE_REQUESTS_PER_SECOND;
int BLOBSTORE_CONCURRENT_REQUESTS;
int BLOBSTORE_MULTIPART_MAX_PART_SIZE;
int BLOBSTORE_MULTIPART_MIN_PART_SIZE;
int BLOBSTORE_CONCURRENT_UPLOADS;
int BLOBSTORE_CONCURRENT_LISTS;
int BLOBSTORE_CONCURRENT_WRITES_PER_FILE;
int BLOBSTORE_CONCURRENT_READS_PER_FILE;
int BLOBSTORE_READ_BLOCK_SIZE;
int BLOBSTORE_READ_AHEAD_BLOCKS;
int BLOBSTORE_READ_CACHE_BLOCKS_PER_FILE;
int BLOBSTORE_MAX_SEND_BYTES_PER_SECOND;
int BLOBSTORE_MAX_RECV_BYTES_PER_SECOND;
int CONSISTENCY_CHECK_RATE_LIMIT_MAX;
int CONSISTENCY_CHECK_ONE_ROUND_TARGET_COMPLETION_TIME;
// fdbcli
int CLI_CONNECT_PARALLELISM;
double CLI_CONNECT_TIMEOUT;
// trace
int TRACE_LOG_FILE_IDENTIFIER_MAX_LENGTH;
// transaction tags
int MAX_TRANSACTION_TAG_LENGTH;
int MAX_TAGS_PER_TRANSACTION;
int COMMIT_SAMPLE_COST; // The expectation of sampling is every COMMIT_SAMPLE_COST sample once
int WRITE_COST_BYTE_FACTOR;
int INCOMPLETE_SHARD_PLUS; // The size of (possible) incomplete shard when estimate clear range
double READ_TAG_SAMPLE_RATE; // Communicated to clients from cluster
double TAG_THROTTLE_SMOOTHING_WINDOW;
double TAG_THROTTLE_RECHECK_INTERVAL;
double TAG_THROTTLE_EXPIRATION_INTERVAL;
ClientKnobs();
void initialize(bool randomize = false);
};
extern std::unique_ptr<ClientKnobs> globalClientKnobs;
extern ClientKnobs const* CLIENT_KNOBS;
#endif
#define CLIENT_KNOBS (&IKnobCollection::getGlobalKnobCollection().getClientKnobs())

View File

@ -37,15 +37,16 @@
#include "fdbclient/CoordinationInterface.h"
#include "fdbclient/DatabaseContext.h"
#include "fdbclient/GlobalConfig.actor.h"
#include "fdbclient/IKnobCollection.h"
#include "fdbclient/JsonBuilder.h"
#include "fdbclient/KeyBackedTypes.h"
#include "fdbclient/KeyRangeMap.h"
#include "fdbclient/Knobs.h"
#include "fdbclient/ManagementAPI.actor.h"
#include "fdbclient/CommitProxyInterface.h"
#include "fdbclient/MonitorLeader.h"
#include "fdbclient/MutationList.h"
#include "fdbclient/ReadYourWrites.h"
#include "fdbclient/ParallelStream.actor.h"
#include "fdbclient/SpecialKeySpace.actor.h"
#include "fdbclient/StorageServerInterface.h"
#include "fdbclient/SystemData.h"
@ -1090,7 +1091,8 @@ DatabaseContext::DatabaseContext(Reference<AsyncVar<Reference<ClusterConnectionF
transactionLogicalReads("LogicalUncachedReads", cc), transactionPhysicalReads("PhysicalReadRequests", cc),
transactionPhysicalReadsCompleted("PhysicalReadRequestsCompleted", cc),
transactionGetKeyRequests("GetKeyRequests", cc), transactionGetValueRequests("GetValueRequests", cc),
transactionGetRangeRequests("GetRangeRequests", cc), transactionWatchRequests("WatchRequests", cc),
transactionGetRangeRequests("GetRangeRequests", cc),
transactionGetRangeStreamRequests("GetRangeStreamRequests", cc), transactionWatchRequests("WatchRequests", cc),
transactionGetAddressesForKeyRequests("GetAddressesForKeyRequests", cc), transactionBytesRead("BytesRead", cc),
transactionKeysRead("KeysRead", cc), transactionMetadataVersionReads("MetadataVersionReads", cc),
transactionCommittedMutations("CommittedMutations", cc),
@ -1324,7 +1326,8 @@ DatabaseContext::DatabaseContext(const Error& err)
transactionLogicalReads("LogicalUncachedReads", cc), transactionPhysicalReads("PhysicalReadRequests", cc),
transactionPhysicalReadsCompleted("PhysicalReadRequestsCompleted", cc),
transactionGetKeyRequests("GetKeyRequests", cc), transactionGetValueRequests("GetValueRequests", cc),
transactionGetRangeRequests("GetRangeRequests", cc), transactionWatchRequests("WatchRequests", cc),
transactionGetRangeRequests("GetRangeRequests", cc),
transactionGetRangeStreamRequests("GetRangeStreamRequests", cc), transactionWatchRequests("WatchRequests", cc),
transactionGetAddressesForKeyRequests("GetAddressesForKeyRequests", cc), transactionBytesRead("BytesRead", cc),
transactionKeysRead("KeysRead", cc), transactionMetadataVersionReads("MetadataVersionReads", cc),
transactionCommittedMutations("CommittedMutations", cc),
@ -1547,6 +1550,10 @@ void DatabaseContext::setOption(FDBDatabaseOptions::Option option, Optional<Stri
validateOptionValue(value, false);
transactionTracingEnabled--;
break;
case FDBDatabaseOptions::USE_CONFIG_DATABASE:
validateOptionValue(value, false);
useConfigDatabase = true;
break;
default:
break;
}
@ -1835,14 +1842,12 @@ void setNetworkOption(FDBNetworkOptions::Option option, Optional<StringRef> valu
}
std::string knobName = optionValue.substr(0, eq);
std::string knobValue = optionValue.substr(eq + 1);
if (globalFlowKnobs->setKnob(knobName, knobValue)) {
// update dependent knobs
globalFlowKnobs->initialize();
} else if (globalClientKnobs->setKnob(knobName, knobValue)) {
// update dependent knobs
globalClientKnobs->initialize();
} else {
std::string knobValueString = optionValue.substr(eq + 1);
try {
auto knobValue = IKnobCollection::parseKnobValue(knobName, knobValueString, IKnobCollection::Type::CLIENT);
IKnobCollection::getMutableGlobalKnobCollection().setKnob(knobName, knobValue);
} catch (Error& e) {
TraceEvent(SevWarnAlways, "UnrecognizedKnob").detail("Knob", knobName.c_str());
fprintf(stderr, "FoundationDB client ignoring unrecognized knob option '%s'\n", knobName.c_str());
}
@ -3475,6 +3480,324 @@ ACTOR Future<RangeResult> getRange(Database cx,
}
}
// Streams all of the KV pairs in a target key range into a ParallelStream fragment
ACTOR Future<Void> getRangeStreamFragment(ParallelStream<RangeResult>::Fragment* results,
Database cx,
Reference<TransactionLogInfo> trLogInfo,
Version version,
KeyRange keys,
GetRangeLimits limits,
bool snapshot,
bool reverse,
TransactionInfo info,
TagSet tags,
SpanID spanContext) {
loop {
state vector<pair<KeyRange, Reference<LocationInfo>>> locations = wait(getKeyRangeLocations(
cx, keys, CLIENT_KNOBS->GET_RANGE_SHARD_LIMIT, reverse, &StorageServerInterface::getKeyValuesStream, info));
ASSERT(locations.size());
state int shard = 0;
loop {
const KeyRange& range = locations[shard].first;
state GetKeyValuesStreamRequest req;
req.version = version;
req.begin = firstGreaterOrEqual(range.begin);
req.end = firstGreaterOrEqual(range.end);
req.spanContext = spanContext;
req.limit = reverse ? -CLIENT_KNOBS->REPLY_BYTE_LIMIT : CLIENT_KNOBS->REPLY_BYTE_LIMIT;
req.limitBytes = std::numeric_limits<int>::max();
ASSERT(req.limitBytes > 0 && req.limit != 0 && req.limit < 0 == reverse);
// FIXME: buggify byte limits on internal functions that use them, instead of globally
req.tags = cx->sampleReadTags() ? tags : Optional<TagSet>();
req.debugID = info.debugID;
try {
if (info.debugID.present()) {
g_traceBatch.addEvent(
"TransactionDebug", info.debugID.get().first(), "NativeAPI.RangeStream.Before");
}
++cx->transactionPhysicalReads;
state GetKeyValuesStreamReply rep;
if (locations[shard].second->size() == 0) {
wait(cx->connectionFileChanged());
results->sendError(transaction_too_old());
return Void();
}
state int useIdx = -1;
loop {
// FIXME: create a load balance function for this code so future users of reply streams do not have
// to duplicate this code
int count = 0;
for (int i = 0; i < locations[shard].second->size(); i++) {
if (!IFailureMonitor::failureMonitor()
.getState(locations[shard]
.second->get(i, &StorageServerInterface::getKeyValuesStream)
.getEndpoint())
.failed) {
if (deterministicRandom()->random01() <= 1.0 / ++count) {
useIdx = i;
}
}
}
if (useIdx >= 0) {
break;
}
vector<Future<Void>> ok(locations[shard].second->size());
for (int i = 0; i < ok.size(); i++) {
ok[i] = IFailureMonitor::failureMonitor().onStateEqual(
locations[shard].second->get(i, &StorageServerInterface::getKeyValuesStream).getEndpoint(),
FailureStatus(false));
}
// Making this SevWarn means a lot of clutter
if (now() - g_network->networkInfo.newestAlternativesFailure > 1 ||
deterministicRandom()->random01() < 0.01) {
TraceEvent("AllAlternativesFailed")
.detail("Alternatives", locations[shard].second->description());
}
wait(allAlternativesFailedDelay(quorum(ok, 1)));
}
state ReplyPromiseStream<GetKeyValuesStreamReply> replyStream =
locations[shard]
.second->get(useIdx, &StorageServerInterface::getKeyValuesStream)
.getReplyStream(req);
state bool breakAgain = false;
loop {
wait(results->onEmpty());
try {
choose {
when(wait(cx->connectionFileChanged())) {
results->sendError(transaction_too_old());
return Void();
}
when(GetKeyValuesStreamReply _rep = waitNext(replyStream.getFuture())) { rep = _rep; }
}
++cx->transactionPhysicalReadsCompleted;
} catch (Error& e) {
++cx->transactionPhysicalReadsCompleted;
if (e.code() == error_code_broken_promise) {
throw connection_failed();
}
if (e.code() != error_code_end_of_stream) {
throw;
}
rep = GetKeyValuesStreamReply();
}
if (info.debugID.present())
g_traceBatch.addEvent(
"TransactionDebug", info.debugID.get().first(), "NativeAPI.getExactRange.After");
RangeResult output(RangeResultRef(rep.data, rep.more), rep.arena);
int64_t bytes = 0;
for (const KeyValueRef& kv : output) {
bytes += kv.key.size() + kv.value.size();
}
cx->transactionBytesRead += bytes;
cx->transactionKeysRead += output.size();
// If the reply says there is more but we know that we finished the shard, then fix rep.more
if (reverse && output.more && rep.data.size() > 0 &&
output[output.size() - 1].key == locations[shard].first.begin) {
output.more = false;
}
if (output.more) {
if (!rep.data.size()) {
TraceEvent(SevError, "GetRangeStreamError")
.detail("Reason", "More data indicated but no rows present")
.detail("LimitBytes", limits.bytes)
.detail("LimitRows", limits.rows)
.detail("OutputSize", output.size())
.detail("OutputBytes", output.expectedSize())
.detail("BlockSize", rep.data.size())
.detail("BlockBytes", rep.data.expectedSize());
ASSERT(false);
}
TEST(true); // GetKeyValuesStreamReply.more in getRangeStream
// Make next request to the same shard with a beginning key just after the last key returned
if (reverse)
locations[shard].first =
KeyRangeRef(locations[shard].first.begin, output[output.size() - 1].key);
else
locations[shard].first =
KeyRangeRef(keyAfter(output[output.size() - 1].key), locations[shard].first.end);
}
if (locations[shard].first.empty()) {
output.more = false;
}
if (!output.more) {
const KeyRange& range = locations[shard].first;
if (shard == locations.size() - 1) {
KeyRef begin = reverse ? keys.begin : range.end;
KeyRef end = reverse ? range.begin : keys.end;
if (begin >= end) {
if (range.begin == allKeys.begin) {
output.readToBegin = true;
}
if (range.end == allKeys.end) {
output.readThroughEnd = true;
}
output.arena().dependsOn(keys.arena());
output.readThrough = reverse ? keys.begin : keys.end;
results->send(std::move(output));
results->finish();
return Void();
}
keys = KeyRangeRef(begin, end);
breakAgain = true;
} else {
++shard;
}
output.arena().dependsOn(range.arena());
output.readThrough = reverse ? range.begin : range.end;
results->send(std::move(output));
break;
}
ASSERT(output.size());
if (keys.begin == allKeys.begin && !reverse) {
output.readToBegin = true;
}
if (keys.end == allKeys.end && reverse) {
output.readThroughEnd = true;
}
results->send(std::move(output));
}
if (breakAgain) {
break;
}
} catch (Error& e) {
if (e.code() == error_code_actor_cancelled) {
throw;
}
if (e.code() == error_code_wrong_shard_server || e.code() == error_code_all_alternatives_failed ||
e.code() == error_code_connection_failed) {
const KeyRangeRef& range = locations[shard].first;
if (reverse)
keys = KeyRangeRef(keys.begin, range.end);
else
keys = KeyRangeRef(range.begin, keys.end);
cx->invalidateCache(keys);
wait(delay(CLIENT_KNOBS->WRONG_SHARD_SERVER_DELAY, info.taskID));
break;
} else {
results->sendError(e);
return Void();
}
}
}
}
}
ACTOR Future<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(Database cx, KeyRange keys, int64_t chunkSize);
static KeyRange intersect(KeyRangeRef lhs, KeyRangeRef rhs) {
return KeyRange(KeyRangeRef(std::max(lhs.begin, rhs.begin), std::min(lhs.end, rhs.end)));
}
// Divides the requested key range into 1MB fragments, create range streams for each fragment, and merges the results so
// the client get them in order
ACTOR Future<Void> getRangeStream(PromiseStream<RangeResult> _results,
Database cx,
Reference<TransactionLogInfo> trLogInfo,
Future<Version> fVersion,
KeySelector begin,
KeySelector end,
GetRangeLimits limits,
Promise<std::pair<Key, Key>> conflictRange,
bool snapshot,
bool reverse,
TransactionInfo info,
TagSet tags) {
state ParallelStream<RangeResult> results(_results, CLIENT_KNOBS->RANGESTREAM_BUFFERED_FRAGMENTS_LIMIT);
// FIXME: better handling to disable row limits
ASSERT(!limits.hasRowLimit());
state Span span("NAPI:getRangeStream"_loc, info.spanID);
state Version version = wait(fVersion);
cx->validateVersion(version);
Future<Key> fb = resolveKey(cx, begin, version, info, tags);
state Future<Key> fe = resolveKey(cx, end, version, info, tags);
state Key b = wait(fb);
state Key e = wait(fe);
if (!snapshot) {
// FIXME: this conflict range is too large, and should be updated continously as results are returned
conflictRange.send(std::make_pair(std::min(b, Key(begin.getKey(), begin.arena())),
std::max(e, Key(end.getKey(), end.arena()))));
}
if (b >= e) {
wait(results.finish());
return Void();
}
// if e is allKeys.end, we have read through the end of the database
// if b is allKeys.begin, we have either read through the beginning of the database,
// or allKeys.begin exists in the database and will be part of the conflict range anyways
state std::vector<Future<Void>> outstandingRequests;
while (b < e) {
state pair<KeyRange, Reference<LocationInfo>> ssi =
wait(getKeyLocation(cx, reverse ? e : b, &StorageServerInterface::getKeyValuesStream, info, reverse));
state KeyRange shardIntersection = intersect(ssi.first, KeyRangeRef(b, e));
state Standalone<VectorRef<KeyRef>> splitPoints =
wait(getRangeSplitPoints(cx, shardIntersection, CLIENT_KNOBS->RANGESTREAM_FRAGMENT_SIZE));
state std::vector<KeyRange> toSend;
// state std::vector<Future<std::list<KeyRangeRef>::iterator>> outstandingRequests;
if (!splitPoints.empty()) {
toSend.push_back(KeyRange(KeyRangeRef(shardIntersection.begin, splitPoints.front()), splitPoints.arena()));
for (int i = 0; i < splitPoints.size() - 1; ++i) {
toSend.push_back(KeyRange(KeyRangeRef(splitPoints[i], splitPoints[i + 1]), splitPoints.arena()));
}
toSend.push_back(KeyRange(KeyRangeRef(splitPoints.back(), shardIntersection.end), splitPoints.arena()));
} else {
toSend.push_back(KeyRange(KeyRangeRef(shardIntersection.begin, shardIntersection.end)));
}
state int idx = 0;
state int useIdx = 0;
for (; idx < toSend.size(); ++idx) {
useIdx = reverse ? toSend.size() - idx - 1 : idx;
if (toSend[useIdx].empty()) {
continue;
}
ParallelStream<RangeResult>::Fragment* fragment = wait(results.createFragment());
outstandingRequests.push_back(getRangeStreamFragment(
fragment, cx, trLogInfo, version, toSend[useIdx], limits, snapshot, reverse, info, tags, span.context));
}
if (reverse) {
e = shardIntersection.begin;
} else {
b = shardIntersection.end;
}
}
wait(waitForAll(outstandingRequests) && results.finish());
return Void();
}
Future<RangeResult> getRange(Database const& cx,
Future<Version> const& fVersion,
KeySelector const& begin,
@ -3830,6 +4153,67 @@ Future<RangeResult> Transaction::getRange(const KeySelector& begin,
return getRange(begin, end, GetRangeLimits(limit), snapshot, reverse);
}
// A method for streaming data from the storage server that is more efficient than getRange when reading large amounts
// of data
Future<Void> Transaction::getRangeStream(const PromiseStream<RangeResult>& results,
const KeySelector& begin,
const KeySelector& end,
GetRangeLimits limits,
bool snapshot,
bool reverse) {
++cx->transactionLogicalReads;
++cx->transactionGetRangeStreamRequests;
// FIXME: limits are not implemented yet, and this code has not be tested with reverse=true
ASSERT(!limits.hasByteLimit() && !limits.hasRowLimit() && !reverse);
KeySelector b = begin;
if (b.orEqual) {
TEST(true); // Native stream begin orEqual==true
b.removeOrEqual(b.arena());
}
KeySelector e = end;
if (e.orEqual) {
TEST(true); // Native stream end orEqual==true
e.removeOrEqual(e.arena());
}
if (b.offset >= e.offset && b.getKey() >= e.getKey()) {
TEST(true); // Native stream range inverted
results.sendError(end_of_stream());
return Void();
}
Promise<std::pair<Key, Key>> conflictRange;
if (!snapshot) {
extraConflictRanges.push_back(conflictRange.getFuture());
}
return forwardErrors(::getRangeStream(results,
cx,
trLogInfo,
getReadVersion(),
b,
e,
limits,
conflictRange,
snapshot,
reverse,
info,
options.readTags),
results);
}
Future<Void> Transaction::getRangeStream(const PromiseStream<RangeResult>& results,
const KeySelector& begin,
const KeySelector& end,
int limit,
bool snapshot,
bool reverse) {
return getRangeStream(results, begin, end, GetRangeLimits(limit), snapshot, reverse);
}
void Transaction::addReadConflictRange(KeyRangeRef const& keys) {
ASSERT(!keys.empty());
@ -5106,7 +5490,7 @@ Future<Version> Transaction::getReadVersion(uint32_t flags) {
return readVersion;
}
Optional<Version> Transaction::getCachedReadVersion() {
Optional<Version> Transaction::getCachedReadVersion() const {
if (readVersion.isValid() && readVersion.isReady() && !readVersion.isError()) {
return readVersion.get();
} else {
@ -5546,7 +5930,7 @@ ACTOR Future<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(Database cx, Key
state vector<pair<KeyRange, Reference<LocationInfo>>> locations =
wait(getKeyRangeLocations(cx,
keys,
100,
CLIENT_KNOBS->TOO_MANY,
false,
&StorageServerInterface::getRangeSplitPoints,
TransactionInfo(TaskPriority::DataDistribution, span.context)));
@ -5671,7 +6055,7 @@ Future<Standalone<VectorRef<KeyRef>>> Transaction::splitStorageMetrics(KeyRange
return ::splitStorageMetrics(cx, keys, limit, estimated);
}
void Transaction::checkDeferredError() {
void Transaction::checkDeferredError() const {
cx->checkDeferredError();
}

View File

@ -246,7 +246,7 @@ public:
void setVersion(Version v);
Future<Version> getReadVersion() { return getReadVersion(0); }
Future<Version> getRawReadVersion();
Optional<Version> getCachedReadVersion();
Optional<Version> getCachedReadVersion() const;
[[nodiscard]] Future<Optional<Value>> get(const Key& key, bool snapshot = false);
[[nodiscard]] Future<Void> watch(Reference<Watch> watch);
@ -283,6 +283,45 @@ public:
reverse);
}
// A method for streaming data from the storage server that is more efficient than getRange when reading large
// amounts of data
[[nodiscard]] Future<Void> getRangeStream(const PromiseStream<Standalone<RangeResultRef>>& results,
const KeySelector& begin,
const KeySelector& end,
int limit,
bool snapshot = false,
bool reverse = false);
[[nodiscard]] Future<Void> getRangeStream(const PromiseStream<Standalone<RangeResultRef>>& results,
const KeySelector& begin,
const KeySelector& end,
GetRangeLimits limits,
bool snapshot = false,
bool reverse = false);
[[nodiscard]] Future<Void> getRangeStream(const PromiseStream<Standalone<RangeResultRef>>& results,
const KeyRange& keys,
int limit,
bool snapshot = false,
bool reverse = false) {
return getRangeStream(results,
KeySelector(firstGreaterOrEqual(keys.begin), keys.arena()),
KeySelector(firstGreaterOrEqual(keys.end), keys.arena()),
limit,
snapshot,
reverse);
}
[[nodiscard]] Future<Void> getRangeStream(const PromiseStream<Standalone<RangeResultRef>>& results,
const KeyRange& keys,
GetRangeLimits limits,
bool snapshot = false,
bool reverse = false) {
return getRangeStream(results,
KeySelector(firstGreaterOrEqual(keys.begin), keys.arena()),
KeySelector(firstGreaterOrEqual(keys.end), keys.arena()),
limits,
snapshot,
reverse);
}
[[nodiscard]] Future<Standalone<VectorRef<const char*>>> getAddressesForKey(const Key& key);
void enableCheckWrites();
@ -320,7 +359,9 @@ public:
void setOption(FDBTransactionOptions::Option option, Optional<StringRef> value = Optional<StringRef>());
Version getCommittedVersion() { return committedVersion; } // May be called only after commit() returns success
Version getCommittedVersion() const {
return committedVersion;
} // May be called only after commit() returns success
[[nodiscard]] Future<Standalone<StringRef>>
getVersionstamp(); // Will be fulfilled only after commit() returns success
@ -352,7 +393,7 @@ public:
int apiVersionAtLeast(int minVersion) const;
void checkDeferredError();
void checkDeferredError() const;
Database getDatabase() const { return cx; }
static Reference<TransactionLogInfo> createTrLogInfoProbabilistically(const Database& cx);

View File

@ -0,0 +1,80 @@
/*
* ParallelStreamCorrectness.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <vector>
#include "fdbclient/ParallelStream.actor.h"
#include "flow/UnitTest.h"
#include "flow/actorcompiler.h" // This must be the last #include.
namespace ParallelStreamTest {
struct TestValue {
int x;
TestValue(int x) : x(x) {}
int expectedSize() const { return sizeof(int); }
};
ACTOR static Future<Void> produce(ParallelStream<ParallelStreamTest::TestValue>::Fragment* fragment,
ParallelStreamTest::TestValue value) {
wait(delay(deterministicRandom()->random01()));
fragment->send(value);
wait(delay(deterministicRandom()->random01()));
fragment->finish();
return Void();
}
ACTOR static Future<Void> consume(FutureStream<ParallelStreamTest::TestValue> stream, int expected) {
state int next;
try {
loop {
ParallelStreamTest::TestValue value = waitNext(stream);
ASSERT(value.x == next++);
}
} catch (Error& e) {
ASSERT(e.code() == error_code_end_of_stream);
ASSERT(next == expected);
return Void();
}
}
} // namespace ParallelStreamTest
TEST_CASE("/fdbclient/ParallelStream") {
state PromiseStream<ParallelStreamTest::TestValue> results;
state size_t bufferLimit = deterministicRandom()->randomInt(0, 21);
state size_t numProducers = deterministicRandom()->randomInt(1, 1001);
state ParallelStream<ParallelStreamTest::TestValue> parallelStream(results, bufferLimit);
state Future<Void> consumer = ParallelStreamTest::consume(results.getFuture(), numProducers);
state std::vector<Future<Void>> producers;
TraceEvent("StartingParallelStreamTest")
.detail("BufferLimit", bufferLimit)
.detail("NumProducers", numProducers);
state int i = 0;
for (; i < numProducers; ++i) {
ParallelStream<ParallelStreamTest::TestValue>::Fragment* fragment = wait(parallelStream.createFragment());
producers.push_back(ParallelStreamTest::produce(fragment, ParallelStreamTest::TestValue(i)));
}
wait(parallelStream.finish());
wait(consumer);
return Void();
}
void forceLinkParallelStreamTests() {}

View File

@ -0,0 +1,130 @@
/*
* ParallelStream.actor.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
// When actually compiled (NO_INTELLISENSE), include the generated version of this file. In intellisense use the source
// version.
#if defined(NO_INTELLISENSE) && !defined(FDBCLIENT_PARALLEL_STREAM_ACTOR_G_H)
#define FDBCLIENT_PARALLEL_STREAM_ACTOR_G_H
#include "fdbclient/ParallelStream.actor.g.h"
#elif !defined(FDBCLIENT_PARALLEL_STREAM_ACTOR_H)
#define FDBCLIENT_PARALLEL_STREAM_ACTOR_H
#include "flow/genericactors.actor.h"
#include "flow/actorcompiler.h" // must be last include
// ParallelStream is used to fetch data from multiple streams in parallel and then merge them back into a single stream
// in order.
template <class T>
class ParallelStream {
BoundedFlowLock semaphore;
struct FragmentConstructorTag {
explicit FragmentConstructorTag() = default;
};
public:
// A Fragment is a single stream that will get results to be merged back into the main output stream
class Fragment : public ReferenceCounted<Fragment> {
ParallelStream* parallelStream;
PromiseStream<T> stream;
BoundedFlowLock::Releaser releaser;
friend class ParallelStream;
public:
Fragment(ParallelStream* parallelStream, int64_t permitNumber, FragmentConstructorTag)
: parallelStream(parallelStream), releaser(&parallelStream->semaphore, permitNumber) {}
template <class U>
void send(U&& value) {
stream.send(std::forward<U>(value));
}
void sendError(Error e) { stream.sendError(e); }
void finish() {
releaser.release(); // Release before destruction to free up pending fragments
stream.sendError(end_of_stream());
}
Future<Void> onEmpty() { return stream.onEmpty(); }
};
private:
PromiseStream<Reference<Fragment>> fragments;
size_t fragmentsProcessed{ 0 };
PromiseStream<T> results;
Future<Void> flusher;
public:
// A background actor which take results from the oldest fragment and sends them to the main output stream
ACTOR static Future<Void> flushToClient(ParallelStream<T>* self) {
state const int messagesBetweenYields = 1000;
state int messagesSinceYield = 0;
try {
loop {
state Reference<Fragment> fragment = waitNext(self->fragments.getFuture());
loop {
try {
wait(self->results.onEmpty());
T value = waitNext(fragment->stream.getFuture());
self->results.send(value);
if (++messagesSinceYield == messagesBetweenYields) {
wait(yield());
messagesSinceYield = 0;
}
} catch (Error& e) {
if (e.code() == error_code_end_of_stream) {
fragment.clear();
break;
} else {
throw e;
}
}
}
}
} catch (Error& e) {
if (e.code() == error_code_actor_cancelled) {
throw;
}
self->results.sendError(e);
return Void();
}
}
ParallelStream(PromiseStream<T> results, size_t bufferLimit) : results(results), semaphore(1, bufferLimit) {
flusher = flushToClient(this);
}
// Creates a fragment to get merged into the main output stream
ACTOR static Future<Fragment*> createFragmentImpl(ParallelStream<T>* self) {
int64_t permitNumber = wait(self->semaphore.take());
auto fragment = makeReference<Fragment>(self, permitNumber, FragmentConstructorTag());
self->fragments.send(fragment);
return fragment.getPtr();
}
Future<Fragment*> createFragment() { return createFragmentImpl(this); }
Future<Void> finish() {
fragments.sendError(end_of_stream());
return flusher;
}
};
#include "flow/unactorcompiler.h"
#endif

View File

@ -0,0 +1,130 @@
/*
* PaxosConfigTransaction.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/PaxosConfigTransaction.h"
#include "flow/actorcompiler.h" // must be last include
class PaxosConfigTransactionImpl {};
Future<Version> PaxosConfigTransaction::getReadVersion() {
// TODO: Implement
return ::invalidVersion;
}
Optional<Version> PaxosConfigTransaction::getCachedReadVersion() const {
// TODO: Implement
return ::invalidVersion;
}
Future<Optional<Value>> PaxosConfigTransaction::get(Key const& key, bool snapshot) {
// TODO: Implement
return Optional<Value>{};
}
Future<Standalone<RangeResultRef>> PaxosConfigTransaction::getRange(KeySelector const& begin,
KeySelector const& end,
int limit,
bool snapshot,
bool reverse) {
// TODO: Implement
ASSERT(false);
return Standalone<RangeResultRef>{};
}
Future<Standalone<RangeResultRef>> PaxosConfigTransaction::getRange(KeySelector begin,
KeySelector end,
GetRangeLimits limits,
bool snapshot,
bool reverse) {
// TODO: Implememnt
ASSERT(false);
return Standalone<RangeResultRef>{};
}
void PaxosConfigTransaction::set(KeyRef const& key, ValueRef const& value) {
// TODO: Implememnt
ASSERT(false);
}
void PaxosConfigTransaction::clear(KeyRef const& key) {
// TODO: Implememnt
ASSERT(false);
}
Future<Void> PaxosConfigTransaction::commit() {
// TODO: Implememnt
ASSERT(false);
return Void();
}
Version PaxosConfigTransaction::getCommittedVersion() const {
// TODO: Implement
ASSERT(false);
return ::invalidVersion;
}
int64_t PaxosConfigTransaction::getApproximateSize() const {
// TODO: Implement
ASSERT(false);
return 0;
}
void PaxosConfigTransaction::setOption(FDBTransactionOptions::Option option, Optional<StringRef> value) {
// TODO: Implement
ASSERT(false);
}
Future<Void> PaxosConfigTransaction::onError(Error const& e) {
// TODO: Implement
ASSERT(false);
return Void();
}
void PaxosConfigTransaction::cancel() {
// TODO: Implement
ASSERT(false);
}
void PaxosConfigTransaction::reset() {
// TODO: Implement
ASSERT(false);
}
void PaxosConfigTransaction::fullReset() {
// TODO: Implement
ASSERT(false);
}
void PaxosConfigTransaction::debugTransaction(UID dID) {
// TODO: Implement
ASSERT(false);
}
void PaxosConfigTransaction::checkDeferredError() const {
// TODO: Implement
ASSERT(false);
}
PaxosConfigTransaction::PaxosConfigTransaction(Database const& cx) {
// TODO: Implement
ASSERT(false);
}
PaxosConfigTransaction::~PaxosConfigTransaction() = default;

View File

@ -0,0 +1,66 @@
/*
* PaxosConfigTransaction.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include <memory>
#include "fdbclient/IConfigTransaction.h"
/*
* Fault-tolerant configuration transaction implementation
*/
class PaxosConfigTransaction final : public IConfigTransaction, public FastAllocated<PaxosConfigTransaction> {
std::unique_ptr<class PaxosConfigTransactionImpl> _impl;
PaxosConfigTransactionImpl const& impl() const { return *_impl; }
PaxosConfigTransactionImpl& impl() { return *_impl; }
public:
PaxosConfigTransaction(Database const&);
~PaxosConfigTransaction();
Future<Version> getReadVersion() override;
Optional<Version> getCachedReadVersion() const override;
Future<Optional<Value>> get(Key const& key, bool snapshot = false) override;
Future<Standalone<RangeResultRef>> getRange(KeySelector const& begin,
KeySelector const& end,
int limit,
bool snapshot = false,
bool reverse = false) override;
Future<Standalone<RangeResultRef>> getRange(KeySelector begin,
KeySelector end,
GetRangeLimits limits,
bool snapshot = false,
bool reverse = false) override;
void set(KeyRef const& key, ValueRef const& value) override;
void clear(KeyRangeRef const&) override { throw client_invalid_operation(); }
void clear(KeyRef const&) override;
Future<Void> commit() override;
Version getCommittedVersion() const override;
int64_t getApproximateSize() const override;
void setOption(FDBTransactionOptions::Option option, Optional<StringRef> value = Optional<StringRef>()) override;
Future<Void> onError(Error const& e) override;
void cancel() override;
void reset() override;
void debugTransaction(UID dID) override;
void checkDeferredError() const override;
void fullReset();
};

View File

@ -1280,8 +1280,8 @@ public:
};
ReadYourWritesTransaction::ReadYourWritesTransaction(Database const& cx)
: cache(&arena), writes(&arena), tr(cx), retries(0), approximateSize(0), creationTime(now()), commitStarted(false),
options(tr), deferredError(cx->deferredError), versionStampFuture(tr.getVersionstamp()),
: ISingleThreadTransaction(cx->deferredError), cache(&arena), writes(&arena), tr(cx), retries(0), approximateSize(0),
creationTime(now()), commitStarted(false), options(tr), versionStampFuture(tr.getVersionstamp()),
specialKeySpaceWriteMap(std::make_pair(false, Optional<Value>()), specialKeys.end) {
std::copy(
cx.getTransactionDefaults().begin(), cx.getTransactionDefaults().end(), std::back_inserter(persistentOptions));
@ -1729,6 +1729,10 @@ void ReadYourWritesTransaction::getWriteConflicts(KeyRangeMap<bool>* result) {
}
}
void ReadYourWritesTransaction::preinitializeOnForeignThread() {
tr.preinitializeOnForeignThread();
}
void ReadYourWritesTransaction::setTransactionID(uint64_t id) {
tr.setTransactionID(id);
}
@ -2274,11 +2278,10 @@ void ReadYourWritesTransaction::operator=(ReadYourWritesTransaction&& r) noexcep
}
ReadYourWritesTransaction::ReadYourWritesTransaction(ReadYourWritesTransaction&& r) noexcept
: cache(std::move(r.cache)), writes(std::move(r.writes)), arena(std::move(r.arena)), reading(std::move(r.reading)),
retries(r.retries), approximateSize(r.approximateSize), creationTime(r.creationTime),
deferredError(std::move(r.deferredError)), timeoutActor(std::move(r.timeoutActor)),
resetPromise(std::move(r.resetPromise)), commitStarted(r.commitStarted), options(r.options),
transactionDebugInfo(r.transactionDebugInfo) {
: ISingleThreadTransaction(std::move(r.deferredError)), cache(std::move(r.cache)), writes(std::move(r.writes)),
arena(std::move(r.arena)), reading(std::move(r.reading)), retries(r.retries), approximateSize(r.approximateSize),
creationTime(r.creationTime), timeoutActor(std::move(r.timeoutActor)), resetPromise(std::move(r.resetPromise)),
commitStarted(r.commitStarted), options(r.options), transactionDebugInfo(r.transactionDebugInfo) {
cache.arena = &arena;
writes.arena = &arena;
tr = std::move(r.tr);

View File

@ -25,6 +25,7 @@
#include "fdbclient/NativeAPI.actor.h"
#include "fdbclient/KeyRangeMap.h"
#include "fdbclient/RYWIterator.h"
#include "fdbclient/ISingleThreadTransaction.h"
#include <list>
// SOMEDAY: Optimize getKey to avoid using getRange
@ -61,35 +62,31 @@ struct TransactionDebugInfo : public ReferenceCounted<TransactionDebugInfo> {
// keeping a reference to a value longer than its creating transaction would hold all of the memory generated by the
// transaction
class ReadYourWritesTransaction final : NonCopyable,
public ReferenceCounted<ReadYourWritesTransaction>,
public ISingleThreadTransaction,
public FastAllocated<ReadYourWritesTransaction> {
public:
static ReadYourWritesTransaction* allocateOnForeignThread() {
ReadYourWritesTransaction* tr =
(ReadYourWritesTransaction*)ReadYourWritesTransaction::operator new(sizeof(ReadYourWritesTransaction));
tr->tr.preinitializeOnForeignThread();
return tr;
}
explicit ReadYourWritesTransaction(Database const& cx);
~ReadYourWritesTransaction();
void setVersion(Version v) { tr.setVersion(v); }
Future<Version> getReadVersion();
Optional<Version> getCachedReadVersion() { return tr.getCachedReadVersion(); }
Future<Optional<Value>> get(const Key& key, bool snapshot = false);
Future<Key> getKey(const KeySelector& key, bool snapshot = false);
Future<RangeResult> getRange(const KeySelector& begin,
const KeySelector& end,
int limit,
bool snapshot = false,
bool reverse = false);
Future<RangeResult> getRange(KeySelector begin,
KeySelector end,
GetRangeLimits limits,
bool snapshot = false,
bool reverse = false);
Future<RangeResult> getRange(const KeyRange& keys, int limit, bool snapshot = false, bool reverse = false) {
void setVersion(Version v) override { tr.setVersion(v); }
Future<Version> getReadVersion() override;
Optional<Version> getCachedReadVersion() const override { return tr.getCachedReadVersion(); }
Future<Optional<Value>> get(const Key& key, bool snapshot = false) override;
Future<Key> getKey(const KeySelector& key, bool snapshot = false) override;
Future<Standalone<RangeResultRef>> getRange(const KeySelector& begin,
const KeySelector& end,
int limit,
bool snapshot = false,
bool reverse = false) override;
Future<Standalone<RangeResultRef>> getRange(KeySelector begin,
KeySelector end,
GetRangeLimits limits,
bool snapshot = false,
bool reverse = false) override;
Future<Standalone<RangeResultRef>> getRange(const KeyRange& keys,
int limit,
bool snapshot = false,
bool reverse = false) {
return getRange(KeySelector(firstGreaterOrEqual(keys.begin), keys.arena()),
KeySelector(firstGreaterOrEqual(keys.end), keys.arena()),
limit,
@ -107,39 +104,39 @@ public:
reverse);
}
[[nodiscard]] Future<Standalone<VectorRef<const char*>>> getAddressesForKey(const Key& key);
Future<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(const KeyRange& range, int64_t chunkSize);
Future<int64_t> getEstimatedRangeSizeBytes(const KeyRange& keys);
[[nodiscard]] Future<Standalone<VectorRef<const char*>>> getAddressesForKey(const Key& key) override;
Future<Standalone<VectorRef<KeyRef>>> getRangeSplitPoints(const KeyRange& range, int64_t chunkSize) override;
Future<int64_t> getEstimatedRangeSizeBytes(const KeyRange& keys) override;
void addReadConflictRange(KeyRangeRef const& keys);
void makeSelfConflicting() { tr.makeSelfConflicting(); }
void addReadConflictRange(KeyRangeRef const& keys) override;
void makeSelfConflicting() override { tr.makeSelfConflicting(); }
void atomicOp(const KeyRef& key, const ValueRef& operand, uint32_t operationType);
void set(const KeyRef& key, const ValueRef& value);
void clear(const KeyRangeRef& range);
void clear(const KeyRef& key);
void atomicOp(const KeyRef& key, const ValueRef& operand, uint32_t operationType) override;
void set(const KeyRef& key, const ValueRef& value) override;
void clear(const KeyRangeRef& range) override;
void clear(const KeyRef& key) override;
[[nodiscard]] Future<Void> watch(const Key& key);
[[nodiscard]] Future<Void> watch(const Key& key) override;
void addWriteConflictRange(KeyRangeRef const& keys);
void addWriteConflictRange(KeyRangeRef const& keys) override;
[[nodiscard]] Future<Void> commit();
Version getCommittedVersion() { return tr.getCommittedVersion(); }
int64_t getApproximateSize() { return approximateSize; }
[[nodiscard]] Future<Standalone<StringRef>> getVersionstamp();
[[nodiscard]] Future<Void> commit() override;
Version getCommittedVersion() const override { return tr.getCommittedVersion(); }
int64_t getApproximateSize() const override { return approximateSize; }
[[nodiscard]] Future<Standalone<StringRef>> getVersionstamp() override;
void setOption(FDBTransactionOptions::Option option, Optional<StringRef> value = Optional<StringRef>());
void setOption(FDBTransactionOptions::Option option, Optional<StringRef> value = Optional<StringRef>()) override;
[[nodiscard]] Future<Void> onError(Error const& e);
[[nodiscard]] Future<Void> onError(Error const& e) override;
// These are to permit use as state variables in actors:
ReadYourWritesTransaction() : cache(&arena), writes(&arena) {}
void operator=(ReadYourWritesTransaction&& r) noexcept;
ReadYourWritesTransaction(ReadYourWritesTransaction&& r) noexcept;
void cancel();
void reset();
void debugTransaction(UID dID) { tr.debugTransaction(dID); }
void cancel() override;
void reset() override;
void debugTransaction(UID dID) override { tr.debugTransaction(dID); }
Future<Void> debug_onIdle() { return reading; }
@ -148,16 +145,15 @@ public:
// Throws before the lifetime of this transaction ends
Future<Void> resetFuture() { return resetPromise.getFuture(); }
// Used by ThreadSafeTransaction for exceptions thrown in void methods
Error deferredError;
void checkDeferredError() {
void checkDeferredError() const override {
tr.checkDeferredError();
if (deferredError.code() != invalid_error_code)
throw deferredError;
}
void getWriteConflicts(KeyRangeMap<bool>* result);
void getWriteConflicts(KeyRangeMap<bool>* result) override;
void preinitializeOnForeignThread();
Database getDatabase() const { return tr.getDatabase(); }

View File

@ -0,0 +1,52 @@
/*
* ServerKnobCollection.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/ServerKnobCollection.h"
ServerKnobCollection::ServerKnobCollection(Randomize randomize, IsSimulated isSimulated)
: clientKnobCollection(randomize, isSimulated),
serverKnobs(randomize, &clientKnobCollection.getMutableClientKnobs(), isSimulated) {}
void ServerKnobCollection::initialize(Randomize randomize, IsSimulated isSimulated) {
clientKnobCollection.initialize(randomize, isSimulated);
serverKnobs.initialize(randomize, &clientKnobCollection.getMutableClientKnobs(), isSimulated);
}
void ServerKnobCollection::reset(Randomize randomize, IsSimulated isSimulated) {
clientKnobCollection.reset(randomize, isSimulated);
serverKnobs.reset(randomize, &clientKnobCollection.getMutableClientKnobs(), isSimulated);
}
Optional<KnobValue> ServerKnobCollection::tryParseKnobValue(std::string const& knobName,
std::string const& knobValue) const {
auto result = clientKnobCollection.tryParseKnobValue(knobName, knobValue);
if (result.present()) {
return result;
}
auto parsedKnobValue = serverKnobs.parseKnobValue(knobName, knobValue);
if (!std::holds_alternative<NoKnobFound>(parsedKnobValue)) {
return KnobValueRef::create(parsedKnobValue);
}
return {};
}
bool ServerKnobCollection::trySetKnob(std::string const& knobName, KnobValueRef const& knobValue) {
return clientKnobCollection.trySetKnob(knobName, knobValue) || knobValue.visitSetKnob(knobName, serverKnobs);
}

View File

@ -0,0 +1,44 @@
/*
* ServerKnobCollection.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbclient/ClientKnobCollection.h"
#include "fdbclient/IKnobCollection.h"
/*
* Stores both flow knobs, client knobs, and server knobs. Attempting to access test knobs
* results in a run-time error
*/
class ServerKnobCollection : public IKnobCollection {
ClientKnobCollection clientKnobCollection;
ServerKnobs serverKnobs;
public:
ServerKnobCollection(Randomize randomize, IsSimulated isSimulated);
void initialize(Randomize randomize, IsSimulated isSimulated) override;
void reset(Randomize randomize, IsSimulated isSimulated) override;
FlowKnobs const& getFlowKnobs() const override { return clientKnobCollection.getFlowKnobs(); }
ClientKnobs const& getClientKnobs() const override { return clientKnobCollection.getClientKnobs(); }
ServerKnobs const& getServerKnobs() const override { return serverKnobs; }
TestKnobs const& getTestKnobs() const override { throw internal_error(); }
Optional<KnobValue> tryParseKnobValue(std::string const& knobName, std::string const& knobValue) const override;
bool trySetKnob(std::string const& knobName, KnobValueRef const& knobValue) override;
};

View File

@ -1,5 +1,5 @@
/*
* Knobs.cpp
* ServerKnobs.cpp
*
* This source file is part of the FoundationDB open source project
*
@ -18,20 +18,17 @@
* limitations under the License.
*/
#include "fdbserver/Knobs.h"
#include "fdbrpc/Locality.h"
#include <cmath>
std::unique_ptr<ServerKnobs> globalServerKnobs = std::make_unique<ServerKnobs>();
ServerKnobs const* SERVER_KNOBS = globalServerKnobs.get();
#include "fdbclient/ServerKnobs.h"
#define init(knob, value) initKnob(knob, value, #knob)
ServerKnobs::ServerKnobs() {
initialize();
ServerKnobs::ServerKnobs(Randomize randomize, ClientKnobs* clientKnobs, IsSimulated isSimulated) {
initialize(randomize, clientKnobs, isSimulated);
}
void ServerKnobs::initialize(bool randomize, ClientKnobs* clientKnobs, bool isSimulated) {
void ServerKnobs::initialize(Randomize _randomize, ClientKnobs* clientKnobs, IsSimulated _isSimulated) {
bool const randomize = _randomize == Randomize::YES;
bool const isSimulated = _isSimulated == IsSimulated::YES;
// clang-format off
// Versions
init( VERSIONS_PER_SECOND, 1e6 );
@ -580,11 +577,14 @@ void ServerKnobs::initialize(bool randomize, ClientKnobs* clientKnobs, bool isSi
init( FUTURE_VERSION_DELAY, 1.0 );
init( STORAGE_LIMIT_BYTES, 500000 );
init( BUGGIFY_LIMIT_BYTES, 1000 );
init( FETCH_USING_STREAMING, true ); if( randomize && BUGGIFY ) FETCH_USING_STREAMING = false; //Determines if fetch keys uses streaming reads
init( FETCH_BLOCK_BYTES, 2e6 );
init( FETCH_KEYS_PARALLELISM_BYTES, 4e6 ); if( randomize && BUGGIFY ) FETCH_KEYS_PARALLELISM_BYTES = 3e6;
init( FETCH_KEYS_PARALLELISM, 2 );
init( FETCH_KEYS_LOWER_PRIORITY, 0 );
init( BUGGIFY_BLOCK_BYTES, 10000 );
init( STORAGE_COMMIT_BYTES, 10000000 ); if( randomize && BUGGIFY ) STORAGE_COMMIT_BYTES = 2000000;
init( STORAGE_FETCH_BYTES, 2500000 ); if( randomize && BUGGIFY ) STORAGE_FETCH_BYTES = 500000;
init( STORAGE_DURABILITY_LAG_REJECT_THRESHOLD, 0.25 );
init( STORAGE_DURABILITY_LAG_MIN_RATE, 0.1 );
init( STORAGE_COMMIT_INTERVAL, 0.5 ); if( randomize && BUGGIFY ) STORAGE_COMMIT_INTERVAL = 2.0;
@ -611,6 +611,7 @@ void ServerKnobs::initialize(bool randomize, ClientKnobs* clientKnobs, bool isSi
init( DD_METRICS_REPORT_INTERVAL, 30.0 );
init( FETCH_KEYS_TOO_LONG_TIME_CRITERIA, 300.0 );
init( MAX_STORAGE_COMMIT_TIME, 120.0 ); //The max fsync stall time on the storage server and tlog before marking a disk as failed
init( RANGESTREAM_LIMIT_BYTES, 2e6 ); if( randomize && BUGGIFY ) RANGESTREAM_LIMIT_BYTES = 1;
//Wait Failure
init( MAX_OUTSTANDING_WAIT_FAILURE_REQUESTS, 250 ); if( randomize && BUGGIFY ) MAX_OUTSTANDING_WAIT_FAILURE_REQUESTS = 2;

679
fdbclient/ServerKnobs.h Normal file
View File

@ -0,0 +1,679 @@
/*
* ServerKnobs.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "flow/Knobs.h"
#include "fdbrpc/fdbrpc.h"
#include "fdbrpc/Locality.h"
#include "fdbclient/ClientKnobs.h"
// Disk queue
static constexpr int _PAGE_SIZE = 4096;
class ServerKnobs : public KnobsImpl<ServerKnobs> {
public:
// Versions
int64_t VERSIONS_PER_SECOND;
int64_t MAX_VERSIONS_IN_FLIGHT;
int64_t MAX_VERSIONS_IN_FLIGHT_FORCED;
int64_t MAX_READ_TRANSACTION_LIFE_VERSIONS;
int64_t MAX_WRITE_TRANSACTION_LIFE_VERSIONS;
double MAX_COMMIT_BATCH_INTERVAL; // Each commit proxy generates a CommitTransactionBatchRequest at least this
// often, so that versions always advance smoothly
// TLogs
double TLOG_TIMEOUT; // tlog OR commit proxy failure - master's reaction time
double TLOG_SLOW_REJOIN_WARN_TIMEOUT_SECS; // Warns if a tlog takes too long to rejoin
double RECOVERY_TLOG_SMART_QUORUM_DELAY; // smaller might be better for bug amplification
double TLOG_STORAGE_MIN_UPDATE_INTERVAL;
double BUGGIFY_TLOG_STORAGE_MIN_UPDATE_INTERVAL;
int DESIRED_TOTAL_BYTES;
int DESIRED_UPDATE_BYTES;
double UPDATE_DELAY;
int MAXIMUM_PEEK_BYTES;
int APPLY_MUTATION_BYTES;
int RECOVERY_DATA_BYTE_LIMIT;
int BUGGIFY_RECOVERY_DATA_LIMIT;
double LONG_TLOG_COMMIT_TIME;
int64_t LARGE_TLOG_COMMIT_BYTES;
double BUGGIFY_RECOVER_MEMORY_LIMIT;
double BUGGIFY_WORKER_REMOVED_MAX_LAG;
int64_t UPDATE_STORAGE_BYTE_LIMIT;
int64_t REFERENCE_SPILL_UPDATE_STORAGE_BYTE_LIMIT;
double TLOG_PEEK_DELAY;
int LEGACY_TLOG_UPGRADE_ENTRIES_PER_VERSION;
int VERSION_MESSAGES_OVERHEAD_FACTOR_1024THS; // Multiplicative factor to bound total space used to store a version
// message (measured in 1/1024ths, e.g. a value of 2048 yields a
// factor of 2).
int64_t VERSION_MESSAGES_ENTRY_BYTES_WITH_OVERHEAD;
double TLOG_MESSAGE_BLOCK_OVERHEAD_FACTOR;
int64_t TLOG_MESSAGE_BLOCK_BYTES;
int64_t MAX_MESSAGE_SIZE;
int LOG_SYSTEM_PUSHED_DATA_BLOCK_SIZE;
double PEEK_TRACKER_EXPIRATION_TIME;
int PARALLEL_GET_MORE_REQUESTS;
int MULTI_CURSOR_PRE_FETCH_LIMIT;
int64_t MAX_QUEUE_COMMIT_BYTES;
int DESIRED_OUTSTANDING_MESSAGES;
double DESIRED_GET_MORE_DELAY;
int CONCURRENT_LOG_ROUTER_READS;
int LOG_ROUTER_PEEK_FROM_SATELLITES_PREFERRED; // 0==peek from primary, non-zero==peek from satellites
double DISK_QUEUE_ADAPTER_MIN_SWITCH_TIME;
double DISK_QUEUE_ADAPTER_MAX_SWITCH_TIME;
int64_t TLOG_SPILL_REFERENCE_MAX_PEEK_MEMORY_BYTES;
int64_t TLOG_SPILL_REFERENCE_MAX_BATCHES_PER_PEEK;
int64_t TLOG_SPILL_REFERENCE_MAX_BYTES_PER_BATCH;
int64_t DISK_QUEUE_FILE_EXTENSION_BYTES; // When we grow the disk queue, by how many bytes should it grow?
int64_t DISK_QUEUE_FILE_SHRINK_BYTES; // When we shrink the disk queue, by how many bytes should it shrink?
int64_t DISK_QUEUE_MAX_TRUNCATE_BYTES; // A truncate larger than this will cause the file to be replaced instead.
double TLOG_DEGRADED_DURATION;
int64_t MAX_CACHE_VERSIONS;
double TXS_POPPED_MAX_DELAY;
double TLOG_MAX_CREATE_DURATION;
int PEEK_LOGGING_AMOUNT;
double PEEK_LOGGING_DELAY;
double PEEK_RESET_INTERVAL;
double PEEK_MAX_LATENCY;
bool PEEK_COUNT_SMALL_MESSAGES;
double PEEK_STATS_INTERVAL;
double PEEK_STATS_SLOW_AMOUNT;
double PEEK_STATS_SLOW_RATIO;
double PUSH_RESET_INTERVAL;
double PUSH_MAX_LATENCY;
double PUSH_STATS_INTERVAL;
double PUSH_STATS_SLOW_AMOUNT;
double PUSH_STATS_SLOW_RATIO;
int TLOG_POP_BATCH_SIZE;
// Data distribution queue
double HEALTH_POLL_TIME;
double BEST_TEAM_STUCK_DELAY;
double BG_REBALANCE_POLLING_INTERVAL;
double BG_REBALANCE_SWITCH_CHECK_INTERVAL;
double DD_QUEUE_LOGGING_INTERVAL;
double RELOCATION_PARALLELISM_PER_SOURCE_SERVER;
int DD_QUEUE_MAX_KEY_SERVERS;
int DD_REBALANCE_PARALLELISM;
int DD_REBALANCE_RESET_AMOUNT;
double BG_DD_MAX_WAIT;
double BG_DD_MIN_WAIT;
double BG_DD_INCREASE_RATE;
double BG_DD_DECREASE_RATE;
double BG_DD_SATURATION_DELAY;
double INFLIGHT_PENALTY_HEALTHY;
double INFLIGHT_PENALTY_REDUNDANT;
double INFLIGHT_PENALTY_UNHEALTHY;
double INFLIGHT_PENALTY_ONE_LEFT;
bool USE_OLD_NEEDED_SERVERS;
// Higher priorities are executed first
// Priority/100 is the "priority group"/"superpriority". Priority inversion
// is possible within but not between priority groups; fewer priority groups
// mean better worst case time bounds
// Maximum allowable priority is 999.
int PRIORITY_RECOVER_MOVE;
int PRIORITY_REBALANCE_UNDERUTILIZED_TEAM;
int PRIORITY_REBALANCE_OVERUTILIZED_TEAM;
int PRIORITY_PERPETUAL_STORAGE_WIGGLE;
int PRIORITY_TEAM_HEALTHY;
int PRIORITY_TEAM_CONTAINS_UNDESIRED_SERVER;
int PRIORITY_TEAM_REDUNDANT;
int PRIORITY_MERGE_SHARD;
int PRIORITY_POPULATE_REGION;
int PRIORITY_TEAM_UNHEALTHY;
int PRIORITY_TEAM_2_LEFT;
int PRIORITY_TEAM_1_LEFT;
int PRIORITY_TEAM_FAILED; // Priority when a server in the team is excluded as failed
int PRIORITY_TEAM_0_LEFT;
int PRIORITY_SPLIT_SHARD;
// Data distribution
double RETRY_RELOCATESHARD_DELAY;
double DATA_DISTRIBUTION_FAILURE_REACTION_TIME;
int MIN_SHARD_BYTES, SHARD_BYTES_RATIO, SHARD_BYTES_PER_SQRT_BYTES, MAX_SHARD_BYTES, KEY_SERVER_SHARD_BYTES;
int64_t SHARD_MAX_BYTES_PER_KSEC, // Shards with more than this bandwidth will be split immediately
SHARD_MIN_BYTES_PER_KSEC, // Shards with more than this bandwidth will not be merged
SHARD_SPLIT_BYTES_PER_KSEC; // When splitting a shard, it is split into pieces with less than this bandwidth
double SHARD_MAX_READ_DENSITY_RATIO;
int64_t SHARD_READ_HOT_BANDWITH_MIN_PER_KSECONDS;
double SHARD_MAX_BYTES_READ_PER_KSEC_JITTER;
double STORAGE_METRIC_TIMEOUT;
double METRIC_DELAY;
double ALL_DATA_REMOVED_DELAY;
double INITIAL_FAILURE_REACTION_DELAY;
double CHECK_TEAM_DELAY;
double LOG_ON_COMPLETION_DELAY;
int BEST_TEAM_MAX_TEAM_TRIES;
int BEST_TEAM_OPTION_COUNT;
int BEST_OF_AMT;
double SERVER_LIST_DELAY;
double RECRUITMENT_IDLE_DELAY;
double STORAGE_RECRUITMENT_DELAY;
bool TSS_HACK_IDENTITY_MAPPING;
double TSS_RECRUITMENT_TIMEOUT;
double TSS_DD_CHECK_INTERVAL;
double DATA_DISTRIBUTION_LOGGING_INTERVAL;
double DD_ENABLED_CHECK_DELAY;
double DD_STALL_CHECK_DELAY;
double DD_LOW_BANDWIDTH_DELAY;
double DD_MERGE_COALESCE_DELAY;
double STORAGE_METRICS_POLLING_DELAY;
double STORAGE_METRICS_RANDOM_DELAY;
double AVAILABLE_SPACE_RATIO_CUTOFF;
int DESIRED_TEAMS_PER_SERVER;
int MAX_TEAMS_PER_SERVER;
int64_t DD_SHARD_SIZE_GRANULARITY;
int64_t DD_SHARD_SIZE_GRANULARITY_SIM;
int DD_MOVE_KEYS_PARALLELISM;
int DD_FETCH_SOURCE_PARALLELISM;
int DD_MERGE_LIMIT;
double DD_SHARD_METRICS_TIMEOUT;
int64_t DD_LOCATION_CACHE_SIZE;
double MOVEKEYS_LOCK_POLLING_DELAY;
double DEBOUNCE_RECRUITING_DELAY;
int REBALANCE_MAX_RETRIES;
int DD_OVERLAP_PENALTY;
int DD_EXCLUDE_MIN_REPLICAS;
bool DD_VALIDATE_LOCALITY;
int DD_CHECK_INVALID_LOCALITY_DELAY;
bool DD_ENABLE_VERBOSE_TRACING;
int64_t
DD_SS_FAILURE_VERSIONLAG; // Allowed SS version lag from the current read version before marking it as failed.
int64_t DD_SS_ALLOWED_VERSIONLAG; // SS will be marked as healthy if it's version lag goes below this value.
double DD_SS_STUCK_TIME_LIMIT; // If a storage server is not getting new versions for this amount of time, then it
// becomes undesired.
int DD_TEAMS_INFO_PRINT_INTERVAL;
int DD_TEAMS_INFO_PRINT_YIELD_COUNT;
int DD_TEAM_ZERO_SERVER_LEFT_LOG_DELAY;
int DD_STORAGE_WIGGLE_PAUSE_THRESHOLD; // How many unhealthy relocations are ongoing will pause storage wiggle
// TeamRemover to remove redundant teams
bool TR_FLAG_DISABLE_MACHINE_TEAM_REMOVER; // disable the machineTeamRemover actor
double TR_REMOVE_MACHINE_TEAM_DELAY; // wait for the specified time before try to remove next machine team
bool TR_FLAG_REMOVE_MT_WITH_MOST_TEAMS; // guard to select which machineTeamRemover logic to use
bool TR_FLAG_DISABLE_SERVER_TEAM_REMOVER; // disable the serverTeamRemover actor
double TR_REMOVE_SERVER_TEAM_DELAY; // wait for the specified time before try to remove next server team
double TR_REMOVE_SERVER_TEAM_EXTRA_DELAY; // serverTeamRemover waits for the delay and check DD healthyness again to
// ensure it runs after machineTeamRemover
// Remove wrong storage engines
double DD_REMOVE_STORE_ENGINE_DELAY; // wait for the specified time before remove the next batch
double DD_FAILURE_TIME;
double DD_ZERO_HEALTHY_TEAM_DELAY;
// Redwood Storage Engine
int PREFIX_TREE_IMMEDIATE_KEY_SIZE_LIMIT;
int PREFIX_TREE_IMMEDIATE_KEY_SIZE_MIN;
// KeyValueStore SQLITE
int CLEAR_BUFFER_SIZE;
double READ_VALUE_TIME_ESTIMATE;
double READ_RANGE_TIME_ESTIMATE;
double SET_TIME_ESTIMATE;
double CLEAR_TIME_ESTIMATE;
double COMMIT_TIME_ESTIMATE;
int CHECK_FREE_PAGE_AMOUNT;
double DISK_METRIC_LOGGING_INTERVAL;
int64_t SOFT_HEAP_LIMIT;
int SQLITE_PAGE_SCAN_ERROR_LIMIT;
int SQLITE_BTREE_PAGE_USABLE;
int SQLITE_BTREE_CELL_MAX_LOCAL;
int SQLITE_BTREE_CELL_MIN_LOCAL;
int SQLITE_FRAGMENT_PRIMARY_PAGE_USABLE;
int SQLITE_FRAGMENT_OVERFLOW_PAGE_USABLE;
double SQLITE_FRAGMENT_MIN_SAVINGS;
int SQLITE_CHUNK_SIZE_PAGES;
int SQLITE_CHUNK_SIZE_PAGES_SIM;
int SQLITE_READER_THREADS;
int SQLITE_WRITE_WINDOW_LIMIT;
double SQLITE_WRITE_WINDOW_SECONDS;
// KeyValueStoreSqlite spring cleaning
double SPRING_CLEANING_NO_ACTION_INTERVAL;
double SPRING_CLEANING_LAZY_DELETE_INTERVAL;
double SPRING_CLEANING_VACUUM_INTERVAL;
double SPRING_CLEANING_LAZY_DELETE_TIME_ESTIMATE;
double SPRING_CLEANING_VACUUM_TIME_ESTIMATE;
double SPRING_CLEANING_VACUUMS_PER_LAZY_DELETE_PAGE;
int SPRING_CLEANING_MIN_LAZY_DELETE_PAGES;
int SPRING_CLEANING_MAX_LAZY_DELETE_PAGES;
int SPRING_CLEANING_LAZY_DELETE_BATCH_SIZE;
int SPRING_CLEANING_MIN_VACUUM_PAGES;
int SPRING_CLEANING_MAX_VACUUM_PAGES;
// KeyValueStoreMemory
int64_t REPLACE_CONTENTS_BYTES;
// KeyValueStoreRocksDB
int ROCKSDB_BACKGROUND_PARALLELISM;
int ROCKSDB_READ_PARALLELISM;
int64_t ROCKSDB_MEMTABLE_BYTES;
bool ROCKSDB_UNSAFE_AUTO_FSYNC;
int64_t ROCKSDB_PERIODIC_COMPACTION_SECONDS;
int ROCKSDB_PREFIX_LEN;
int64_t ROCKSDB_BLOCK_CACHE_SIZE;
// Leader election
int MAX_NOTIFICATIONS;
int MIN_NOTIFICATIONS;
double NOTIFICATION_FULL_CLEAR_TIME;
double CANDIDATE_MIN_DELAY;
double CANDIDATE_MAX_DELAY;
double CANDIDATE_GROWTH_RATE;
double POLLING_FREQUENCY;
double HEARTBEAT_FREQUENCY;
// Commit CommitProxy
double START_TRANSACTION_BATCH_INTERVAL_MIN;
double START_TRANSACTION_BATCH_INTERVAL_MAX;
double START_TRANSACTION_BATCH_INTERVAL_LATENCY_FRACTION;
double START_TRANSACTION_BATCH_INTERVAL_SMOOTHER_ALPHA;
double START_TRANSACTION_BATCH_QUEUE_CHECK_INTERVAL;
double START_TRANSACTION_MAX_TRANSACTIONS_TO_START;
int START_TRANSACTION_MAX_REQUESTS_TO_START;
double START_TRANSACTION_RATE_WINDOW;
double START_TRANSACTION_MAX_EMPTY_QUEUE_BUDGET;
int START_TRANSACTION_MAX_QUEUE_SIZE;
int KEY_LOCATION_MAX_QUEUE_SIZE;
double COMMIT_TRANSACTION_BATCH_INTERVAL_FROM_IDLE;
double COMMIT_TRANSACTION_BATCH_INTERVAL_MIN;
double COMMIT_TRANSACTION_BATCH_INTERVAL_MAX;
double COMMIT_TRANSACTION_BATCH_INTERVAL_LATENCY_FRACTION;
double COMMIT_TRANSACTION_BATCH_INTERVAL_SMOOTHER_ALPHA;
int COMMIT_TRANSACTION_BATCH_COUNT_MAX;
int COMMIT_TRANSACTION_BATCH_BYTES_MIN;
int COMMIT_TRANSACTION_BATCH_BYTES_MAX;
double COMMIT_TRANSACTION_BATCH_BYTES_SCALE_BASE;
double COMMIT_TRANSACTION_BATCH_BYTES_SCALE_POWER;
int64_t COMMIT_BATCHES_MEM_BYTES_HARD_LIMIT;
double COMMIT_BATCHES_MEM_FRACTION_OF_TOTAL;
double COMMIT_BATCHES_MEM_TO_TOTAL_MEM_SCALE_FACTOR;
double RESOLVER_COALESCE_TIME;
int BUGGIFIED_ROW_LIMIT;
double PROXY_SPIN_DELAY;
double UPDATE_REMOTE_LOG_VERSION_INTERVAL;
int MAX_TXS_POP_VERSION_HISTORY;
double MIN_CONFIRM_INTERVAL;
double ENFORCED_MIN_RECOVERY_DURATION;
double REQUIRED_MIN_RECOVERY_DURATION;
bool ALWAYS_CAUSAL_READ_RISKY;
int MAX_COMMIT_UPDATES;
double MAX_PROXY_COMPUTE;
double MAX_COMPUTE_PER_OPERATION;
int PROXY_COMPUTE_BUCKETS;
double PROXY_COMPUTE_GROWTH_RATE;
int TXN_STATE_SEND_AMOUNT;
double REPORT_TRANSACTION_COST_ESTIMATION_DELAY;
bool PROXY_REJECT_BATCH_QUEUED_TOO_LONG;
int RESET_MASTER_BATCHES;
int RESET_RESOLVER_BATCHES;
double RESET_MASTER_DELAY;
double RESET_RESOLVER_DELAY;
// Master Server
double COMMIT_SLEEP_TIME;
double MIN_BALANCE_TIME;
int64_t MIN_BALANCE_DIFFERENCE;
double SECONDS_BEFORE_NO_FAILURE_DELAY;
int64_t MAX_TXS_SEND_MEMORY;
int64_t MAX_RECOVERY_VERSIONS;
double MAX_RECOVERY_TIME;
double PROVISIONAL_START_DELAY;
double PROVISIONAL_DELAY_GROWTH;
double PROVISIONAL_MAX_DELAY;
double SECONDS_BEFORE_RECRUIT_BACKUP_WORKER;
double CC_INTERFACE_TIMEOUT;
// Resolver
int64_t KEY_BYTES_PER_SAMPLE;
int64_t SAMPLE_OFFSET_PER_KEY;
double SAMPLE_EXPIRATION_TIME;
double SAMPLE_POLL_TIME;
int64_t RESOLVER_STATE_MEMORY_LIMIT;
// Backup Worker
double BACKUP_TIMEOUT; // master's reaction time for backup failure
double BACKUP_NOOP_POP_DELAY;
int BACKUP_FILE_BLOCK_BYTES;
int64_t BACKUP_LOCK_BYTES;
double BACKUP_UPLOAD_DELAY;
// Cluster Controller
double CLUSTER_CONTROLLER_LOGGING_DELAY;
double MASTER_FAILURE_REACTION_TIME;
double MASTER_FAILURE_SLOPE_DURING_RECOVERY;
int WORKER_COORDINATION_PING_DELAY;
double SIM_SHUTDOWN_TIMEOUT;
double SHUTDOWN_TIMEOUT;
double MASTER_SPIN_DELAY;
double CC_CHANGE_DELAY;
double CC_CLASS_DELAY;
double WAIT_FOR_GOOD_RECRUITMENT_DELAY;
double WAIT_FOR_GOOD_REMOTE_RECRUITMENT_DELAY;
double ATTEMPT_RECRUITMENT_DELAY;
double WAIT_FOR_DISTRIBUTOR_JOIN_DELAY;
double WAIT_FOR_RATEKEEPER_JOIN_DELAY;
double WORKER_FAILURE_TIME;
double CHECK_OUTSTANDING_INTERVAL;
double INCOMPATIBLE_PEERS_LOGGING_INTERVAL;
double VERSION_LAG_METRIC_INTERVAL;
int64_t MAX_VERSION_DIFFERENCE;
double FORCE_RECOVERY_CHECK_DELAY;
double RATEKEEPER_FAILURE_TIME;
double REPLACE_INTERFACE_DELAY;
double REPLACE_INTERFACE_CHECK_DELAY;
double COORDINATOR_REGISTER_INTERVAL;
double CLIENT_REGISTER_INTERVAL;
// Knobs used to select the best policy (via monte carlo)
int POLICY_RATING_TESTS; // number of tests per policy (in order to compare)
int POLICY_GENERATIONS; // number of policies to generate
int EXPECTED_MASTER_FITNESS;
int EXPECTED_TLOG_FITNESS;
int EXPECTED_LOG_ROUTER_FITNESS;
int EXPECTED_COMMIT_PROXY_FITNESS;
int EXPECTED_GRV_PROXY_FITNESS;
int EXPECTED_RESOLVER_FITNESS;
double RECRUITMENT_TIMEOUT;
int DBINFO_SEND_AMOUNT;
double DBINFO_BATCH_DELAY;
// Move Keys
double SHARD_READY_DELAY;
double SERVER_READY_QUORUM_INTERVAL;
double SERVER_READY_QUORUM_TIMEOUT;
double REMOVE_RETRY_DELAY;
int MOVE_KEYS_KRM_LIMIT;
int MOVE_KEYS_KRM_LIMIT_BYTES; // This must be sufficiently larger than CLIENT_KNOBS->KEY_SIZE_LIMIT
// (fdbclient/Knobs.h) to ensure that at least two entries will be returned from an
// attempt to read a key range map
int MAX_SKIP_TAGS;
double MAX_ADDED_SOURCES_MULTIPLIER;
// FdbServer
double MIN_REBOOT_TIME;
double MAX_REBOOT_TIME;
std::string LOG_DIRECTORY;
int64_t SERVER_MEM_LIMIT;
double SYSTEM_MONITOR_FREQUENCY;
// Ratekeeper
double SMOOTHING_AMOUNT;
double SLOW_SMOOTHING_AMOUNT;
double METRIC_UPDATE_RATE;
double DETAILED_METRIC_UPDATE_RATE;
double LAST_LIMITED_RATIO;
double RATEKEEPER_DEFAULT_LIMIT;
int64_t TARGET_BYTES_PER_STORAGE_SERVER;
int64_t SPRING_BYTES_STORAGE_SERVER;
int64_t AUTO_TAG_THROTTLE_STORAGE_QUEUE_BYTES;
int64_t TARGET_BYTES_PER_STORAGE_SERVER_BATCH;
int64_t SPRING_BYTES_STORAGE_SERVER_BATCH;
int64_t STORAGE_HARD_LIMIT_BYTES;
int64_t STORAGE_DURABILITY_LAG_HARD_MAX;
int64_t STORAGE_DURABILITY_LAG_SOFT_MAX;
int64_t LOW_PRIORITY_STORAGE_QUEUE_BYTES;
int64_t LOW_PRIORITY_DURABILITY_LAG;
int64_t TARGET_BYTES_PER_TLOG;
int64_t SPRING_BYTES_TLOG;
int64_t TARGET_BYTES_PER_TLOG_BATCH;
int64_t SPRING_BYTES_TLOG_BATCH;
int64_t TLOG_SPILL_THRESHOLD;
int64_t TLOG_HARD_LIMIT_BYTES;
int64_t TLOG_RECOVER_MEMORY_LIMIT;
double TLOG_IGNORE_POP_AUTO_ENABLE_DELAY;
int64_t MAX_MANUAL_THROTTLED_TRANSACTION_TAGS;
int64_t MAX_AUTO_THROTTLED_TRANSACTION_TAGS;
double MIN_TAG_COST;
double AUTO_THROTTLE_TARGET_TAG_BUSYNESS;
double AUTO_THROTTLE_RAMP_TAG_BUSYNESS;
double AUTO_TAG_THROTTLE_RAMP_UP_TIME;
double AUTO_TAG_THROTTLE_DURATION;
double TAG_THROTTLE_PUSH_INTERVAL;
double AUTO_TAG_THROTTLE_START_AGGREGATION_TIME;
double AUTO_TAG_THROTTLE_UPDATE_FREQUENCY;
double TAG_THROTTLE_EXPIRED_CLEANUP_INTERVAL;
bool AUTO_TAG_THROTTLING_ENABLED;
double MAX_TRANSACTIONS_PER_BYTE;
int64_t MIN_AVAILABLE_SPACE;
double MIN_AVAILABLE_SPACE_RATIO;
double TARGET_AVAILABLE_SPACE_RATIO;
double AVAILABLE_SPACE_UPDATE_DELAY;
double MAX_TL_SS_VERSION_DIFFERENCE; // spring starts at half this value
double MAX_TL_SS_VERSION_DIFFERENCE_BATCH;
int MAX_MACHINES_FALLING_BEHIND;
int MAX_TPS_HISTORY_SAMPLES;
int NEEDED_TPS_HISTORY_SAMPLES;
int64_t TARGET_DURABILITY_LAG_VERSIONS;
int64_t AUTO_TAG_THROTTLE_DURABILITY_LAG_VERSIONS;
int64_t TARGET_DURABILITY_LAG_VERSIONS_BATCH;
int64_t DURABILITY_LAG_UNLIMITED_THRESHOLD;
double INITIAL_DURABILITY_LAG_MULTIPLIER;
double DURABILITY_LAG_REDUCTION_RATE;
double DURABILITY_LAG_INCREASE_RATE;
double STORAGE_SERVER_LIST_FETCH_TIMEOUT;
// disk snapshot
int64_t MAX_FORKED_PROCESS_OUTPUT;
double SNAP_CREATE_MAX_TIMEOUT;
// Storage Metrics
double STORAGE_METRICS_AVERAGE_INTERVAL;
double STORAGE_METRICS_AVERAGE_INTERVAL_PER_KSECONDS;
double SPLIT_JITTER_AMOUNT;
int64_t IOPS_UNITS_PER_SAMPLE;
int64_t BANDWIDTH_UNITS_PER_SAMPLE;
int64_t BYTES_READ_UNITS_PER_SAMPLE;
int64_t READ_HOT_SUB_RANGE_CHUNK_SIZE;
int64_t EMPTY_READ_PENALTY;
bool READ_SAMPLING_ENABLED;
// Storage Server
double STORAGE_LOGGING_DELAY;
double STORAGE_SERVER_POLL_METRICS_DELAY;
double FUTURE_VERSION_DELAY;
int STORAGE_LIMIT_BYTES;
int BUGGIFY_LIMIT_BYTES;
bool FETCH_USING_STREAMING;
int FETCH_BLOCK_BYTES;
int FETCH_KEYS_PARALLELISM_BYTES;
int FETCH_KEYS_PARALLELISM;
int FETCH_KEYS_LOWER_PRIORITY;
int BUGGIFY_BLOCK_BYTES;
double STORAGE_DURABILITY_LAG_REJECT_THRESHOLD;
double STORAGE_DURABILITY_LAG_MIN_RATE;
int STORAGE_COMMIT_BYTES;
int STORAGE_FETCH_BYTES;
double STORAGE_COMMIT_INTERVAL;
double UPDATE_SHARD_VERSION_INTERVAL;
int BYTE_SAMPLING_FACTOR;
int BYTE_SAMPLING_OVERHEAD;
int MAX_STORAGE_SERVER_WATCH_BYTES;
int MAX_BYTE_SAMPLE_CLEAR_MAP_SIZE;
double LONG_BYTE_SAMPLE_RECOVERY_DELAY;
int BYTE_SAMPLE_LOAD_PARALLELISM;
double BYTE_SAMPLE_LOAD_DELAY;
double BYTE_SAMPLE_START_DELAY;
double UPDATE_STORAGE_PROCESS_STATS_INTERVAL;
double BEHIND_CHECK_DELAY;
int BEHIND_CHECK_COUNT;
int64_t BEHIND_CHECK_VERSIONS;
double WAIT_METRICS_WRONG_SHARD_CHANCE;
int64_t MIN_TAG_READ_PAGES_RATE;
int64_t MIN_TAG_WRITE_PAGES_RATE;
double TAG_MEASUREMENT_INTERVAL;
int64_t READ_COST_BYTE_FACTOR;
bool PREFIX_COMPRESS_KVS_MEM_SNAPSHOTS;
bool REPORT_DD_METRICS;
double DD_METRICS_REPORT_INTERVAL;
double FETCH_KEYS_TOO_LONG_TIME_CRITERIA;
double MAX_STORAGE_COMMIT_TIME;
int64_t RANGESTREAM_LIMIT_BYTES;
// Wait Failure
int MAX_OUTSTANDING_WAIT_FAILURE_REQUESTS;
double WAIT_FAILURE_DELAY_LIMIT;
// Worker
double WORKER_LOGGING_INTERVAL;
double HEAP_PROFILER_INTERVAL;
double UNKNOWN_CC_TIMEOUT;
double DEGRADED_RESET_INTERVAL;
double DEGRADED_WARNING_LIMIT;
double DEGRADED_WARNING_RESET_DELAY;
int64_t TRACE_LOG_FLUSH_FAILURE_CHECK_INTERVAL_SECONDS;
double TRACE_LOG_PING_TIMEOUT_SECONDS;
double MIN_DELAY_CC_WORST_FIT_CANDIDACY_SECONDS; // Listen for a leader for N seconds, and if not heard, then try to
// become the leader.
double MAX_DELAY_CC_WORST_FIT_CANDIDACY_SECONDS;
double DBINFO_FAILED_DELAY;
bool ENABLE_WORKER_HEALTH_MONITOR;
double WORKER_HEALTH_MONITOR_INTERVAL; // Interval between two health monitor health check.
int PEER_LATENCY_CHECK_MIN_POPULATION; // The minimum number of latency samples required to check a peer.
double PEER_LATENCY_DEGRADATION_PERCENTILE; // The percentile latency used to check peer health.
double PEER_LATENCY_DEGRADATION_THRESHOLD; // The latency threshold to consider a peer degraded.
double PEER_TIMEOUT_PERCENTAGE_DEGRADATION_THRESHOLD; // The percentage of timeout to consider a peer degraded.
// Test harness
double WORKER_POLL_DELAY;
// Coordination
double COORDINATED_STATE_ONCONFLICT_POLL_INTERVAL;
bool ENABLE_CROSS_CLUSTER_SUPPORT; // Allow a coordinator to serve requests whose connection string does not match
// the local descriptor
// Buggification
double BUGGIFIED_EVENTUAL_CONSISTENCY;
bool BUGGIFY_ALL_COORDINATION;
// Status
double STATUS_MIN_TIME_BETWEEN_REQUESTS;
double MAX_STATUS_REQUESTS_PER_SECOND;
int CONFIGURATION_ROWS_TO_FETCH;
bool DISABLE_DUPLICATE_LOG_WARNING;
double HISTOGRAM_REPORT_INTERVAL;
// IPager
int PAGER_RESERVED_PAGES;
// IndirectShadowPager
int FREE_PAGE_VACUUM_THRESHOLD;
int VACUUM_QUEUE_SIZE;
int VACUUM_BYTES_PER_SECOND;
// Timekeeper
int64_t TIME_KEEPER_DELAY;
int64_t TIME_KEEPER_MAX_ENTRIES;
// Fast Restore
// TODO: After 6.3, review FR knobs, remove unneeded ones and change default value
int64_t FASTRESTORE_FAILURE_TIMEOUT;
int64_t FASTRESTORE_HEARTBEAT_INTERVAL;
double FASTRESTORE_SAMPLING_PERCENT;
int64_t FASTRESTORE_NUM_LOADERS;
int64_t FASTRESTORE_NUM_APPLIERS;
// FASTRESTORE_TXN_BATCH_MAX_BYTES is target txn size used by appliers to apply mutations
double FASTRESTORE_TXN_BATCH_MAX_BYTES;
// FASTRESTORE_VERSIONBATCH_MAX_BYTES is the maximum data size in each version batch
double FASTRESTORE_VERSIONBATCH_MAX_BYTES;
// FASTRESTORE_VB_PARALLELISM is the number of concurrently running version batches
int64_t FASTRESTORE_VB_PARALLELISM;
int64_t FASTRESTORE_VB_MONITOR_DELAY; // How quickly monitor finished version batch
double FASTRESTORE_VB_LAUNCH_DELAY;
int64_t FASTRESTORE_ROLE_LOGGING_DELAY;
int64_t FASTRESTORE_UPDATE_PROCESS_STATS_INTERVAL; // How quickly to update process metrics for restore
int64_t FASTRESTORE_ATOMICOP_WEIGHT; // workload amplication factor for atomic op
int64_t FASTRESTORE_APPLYING_PARALLELISM; // number of outstanding txns writing to dest. DB
int64_t FASTRESTORE_MONITOR_LEADER_DELAY;
int64_t FASTRESTORE_STRAGGLER_THRESHOLD_SECONDS;
bool FASTRESTORE_TRACK_REQUEST_LATENCY; // true to track reply latency of each request in a request batch
bool FASTRESTORE_TRACK_LOADER_SEND_REQUESTS; // track requests of load send mutations to appliers?
int64_t FASTRESTORE_MEMORY_THRESHOLD_MB_SOFT; // threshold when pipelined actors should be delayed
int64_t FASTRESTORE_WAIT_FOR_MEMORY_LATENCY;
int64_t FASTRESTORE_HEARTBEAT_DELAY; // interval for master to ping loaders and appliers
int64_t
FASTRESTORE_HEARTBEAT_MAX_DELAY; // master claim a node is down if no heart beat from the node for this delay
int64_t FASTRESTORE_APPLIER_FETCH_KEYS_SIZE; // number of keys to fetch in a txn on applier
int64_t FASTRESTORE_LOADER_SEND_MUTATION_MSG_BYTES; // desired size of mutation message sent from loader to appliers
bool FASTRESTORE_GET_RANGE_VERSIONS_EXPENSIVE; // parse each range file to get (range, version) it has?
int64_t FASTRESTORE_REQBATCH_PARALLEL; // number of requests to wait on for getBatchReplies()
bool FASTRESTORE_REQBATCH_LOG; // verbose log information for getReplyBatches
int FASTRESTORE_TXN_CLEAR_MAX; // threshold to start tracking each clear op in a txn
int FASTRESTORE_TXN_RETRY_MAX; // threshold to start output error on too many retries
double FASTRESTORE_TXN_EXTRA_DELAY; // extra delay to avoid overwhelming fdb
bool FASTRESTORE_NOT_WRITE_DB; // do not write result to DB. Only for dev testing
bool FASTRESTORE_USE_RANGE_FILE; // use range file in backup
bool FASTRESTORE_USE_LOG_FILE; // use log file in backup
int64_t FASTRESTORE_SAMPLE_MSG_BYTES; // sample message desired size
double FASTRESTORE_SCHED_UPDATE_DELAY; // delay in seconds in updating process metrics
int FASTRESTORE_SCHED_TARGET_CPU_PERCENT; // release as many requests as possible when cpu usage is below the knob
int FASTRESTORE_SCHED_MAX_CPU_PERCENT; // max cpu percent when scheduler shall not release non-urgent requests
int FASTRESTORE_SCHED_INFLIGHT_LOAD_REQS; // number of inflight requests to load backup files
int FASTRESTORE_SCHED_INFLIGHT_SEND_REQS; // number of inflight requests for loaders to send mutations to appliers
int FASTRESTORE_SCHED_LOAD_REQ_BATCHSIZE; // number of load request to release at once
int FASTRESTORE_SCHED_INFLIGHT_SENDPARAM_THRESHOLD; // we can send future VB requests if it is less than this knob
int FASTRESTORE_SCHED_SEND_FUTURE_VB_REQS_BATCH; // number of future VB sendLoadingParam requests to process at once
int FASTRESTORE_NUM_TRACE_EVENTS;
bool FASTRESTORE_EXPENSIVE_VALIDATION; // when set true, performance will be heavily affected
double FASTRESTORE_WRITE_BW_MB; // target aggregated write bandwidth from all appliers
double FASTRESTORE_RATE_UPDATE_SECONDS; // how long to update appliers target write rate
int REDWOOD_DEFAULT_PAGE_SIZE; // Page size for new Redwood files
int REDWOOD_DEFAULT_EXTENT_SIZE; // Extent size for new Redwood files
int REDWOOD_DEFAULT_EXTENT_READ_SIZE; // Extent read size for Redwood files
int REDWOOD_EXTENT_CONCURRENT_READS; // Max number of simultaneous extent disk reads in progress.
int REDWOOD_KVSTORE_CONCURRENT_READS; // Max number of simultaneous point or range reads in progress.
int REDWOOD_COMMIT_CONCURRENT_READS; // Max number of concurrent reads done to support commit operations
double REDWOOD_PAGE_REBUILD_MAX_SLACK; // When rebuilding pages, max slack to allow in page
int REDWOOD_LAZY_CLEAR_BATCH_SIZE_PAGES; // Number of pages to try to pop from the lazy delete queue and process at
// once
int REDWOOD_LAZY_CLEAR_MIN_PAGES; // Minimum number of pages to free before ending a lazy clear cycle, unless the
// queue is empty
int REDWOOD_LAZY_CLEAR_MAX_PAGES; // Maximum number of pages to free before ending a lazy clear cycle, unless the
// queue is empty
int64_t REDWOOD_REMAP_CLEANUP_WINDOW; // Remap remover lag interval in which to coalesce page writes
double REDWOOD_REMAP_CLEANUP_LAG; // Maximum allowed remap remover lag behind the cleanup window as a multiple of
// the window size
double REDWOOD_LOGGING_INTERVAL;
// Server request latency measurement
int LATENCY_SAMPLE_SIZE;
double LATENCY_METRICS_LOGGING_INTERVAL;
ServerKnobs(Randomize, ClientKnobs*, IsSimulated);
void initialize(Randomize, ClientKnobs*, IsSimulated);
};

View File

@ -0,0 +1,299 @@
/*
* SimpleConfigTransaction.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <algorithm>
#include "fdbclient/CommitTransaction.h"
#include "fdbclient/DatabaseContext.h"
#include "fdbclient/IKnobCollection.h"
#include "fdbclient/SimpleConfigTransaction.h"
#include "fdbserver/Knobs.h"
#include "flow/Arena.h"
#include "flow/actorcompiler.h" // This must be the last #include.
class SimpleConfigTransactionImpl {
ConfigTransactionCommitRequest toCommit;
Future<Version> getVersionFuture;
ConfigTransactionInterface cti;
int numRetries{ 0 };
bool committed{ false };
Optional<UID> dID;
Database cx;
ACTOR static Future<Version> getReadVersion(SimpleConfigTransactionImpl* self) {
if (self->dID.present()) {
TraceEvent("SimpleConfigTransactionGettingReadVersion", self->dID.get());
}
ConfigTransactionGetVersionRequest req;
ConfigTransactionGetVersionReply reply =
wait(self->cti.getVersion.getReply(ConfigTransactionGetVersionRequest{}));
if (self->dID.present()) {
TraceEvent("SimpleConfigTransactionGotReadVersion", self->dID.get()).detail("Version", reply.version);
}
return reply.version;
}
ACTOR static Future<Optional<Value>> get(SimpleConfigTransactionImpl* self, KeyRef key) {
if (!self->getVersionFuture.isValid()) {
self->getVersionFuture = getReadVersion(self);
}
state ConfigKey configKey = ConfigKey::decodeKey(key);
Version version = wait(self->getVersionFuture);
if (self->dID.present()) {
TraceEvent("SimpleConfigTransactionGettingValue", self->dID.get())
.detail("ConfigClass", configKey.configClass)
.detail("KnobName", configKey.knobName);
}
ConfigTransactionGetReply reply =
wait(self->cti.get.getReply(ConfigTransactionGetRequest{ version, configKey }));
if (self->dID.present()) {
TraceEvent("SimpleConfigTransactionGotValue", self->dID.get())
.detail("Value", reply.value.get().toString());
}
if (reply.value.present()) {
return reply.value.get().toValue();
} else {
return {};
}
}
ACTOR static Future<Standalone<RangeResultRef>> getConfigClasses(SimpleConfigTransactionImpl* self) {
if (!self->getVersionFuture.isValid()) {
self->getVersionFuture = getReadVersion(self);
}
Version version = wait(self->getVersionFuture);
ConfigTransactionGetConfigClassesReply reply =
wait(self->cti.getClasses.getReply(ConfigTransactionGetConfigClassesRequest{ version }));
Standalone<RangeResultRef> result;
for (const auto& configClass : reply.configClasses) {
result.push_back_deep(result.arena(), KeyValueRef(configClass, ""_sr));
}
return result;
}
ACTOR static Future<Standalone<RangeResultRef>> getKnobs(SimpleConfigTransactionImpl* self,
Optional<Key> configClass) {
if (!self->getVersionFuture.isValid()) {
self->getVersionFuture = getReadVersion(self);
}
Version version = wait(self->getVersionFuture);
ConfigTransactionGetKnobsReply reply =
wait(self->cti.getKnobs.getReply(ConfigTransactionGetKnobsRequest{ version, configClass }));
Standalone<RangeResultRef> result;
for (const auto& knobName : reply.knobNames) {
result.push_back_deep(result.arena(), KeyValueRef(knobName, ""_sr));
}
return result;
}
ACTOR static Future<Void> commit(SimpleConfigTransactionImpl* self) {
if (!self->getVersionFuture.isValid()) {
self->getVersionFuture = getReadVersion(self);
}
wait(store(self->toCommit.version, self->getVersionFuture));
self->toCommit.annotation.timestamp = now();
wait(self->cti.commit.getReply(self->toCommit));
self->committed = true;
return Void();
}
public:
SimpleConfigTransactionImpl(Database const& cx) : cx(cx) {
auto coordinators = cx->getConnectionFile()->getConnectionString().coordinators();
std::sort(coordinators.begin(), coordinators.end());
cti = ConfigTransactionInterface(coordinators[0]);
}
SimpleConfigTransactionImpl(ConfigTransactionInterface const& cti) : cti(cti) {}
void set(KeyRef key, ValueRef value) {
if (key == configTransactionDescriptionKey) {
toCommit.annotation.description = KeyRef(toCommit.arena, value);
} else {
ConfigKey configKey = ConfigKeyRef::decodeKey(key);
auto knobValue = IKnobCollection::parseKnobValue(
configKey.knobName.toString(), value.toString(), IKnobCollection::Type::TEST);
toCommit.mutations.emplace_back_deep(toCommit.arena, configKey, knobValue.contents());
}
}
void clear(KeyRef key) {
if (key == configTransactionDescriptionKey) {
toCommit.annotation.description = ""_sr;
} else {
toCommit.mutations.emplace_back_deep(
toCommit.arena, ConfigKeyRef::decodeKey(key), Optional<KnobValueRef>{});
}
}
Future<Optional<Value>> get(KeyRef key) { return get(this, key); }
Future<Standalone<RangeResultRef>> getRange(KeyRangeRef keys) {
if (keys == configClassKeys) {
return getConfigClasses(this);
} else if (keys == globalConfigKnobKeys) {
return getKnobs(this, {});
} else if (configKnobKeys.contains(keys) && keys.singleKeyRange()) {
const auto configClass = keys.begin.removePrefix(configKnobKeys.begin);
return getKnobs(this, configClass);
} else {
throw invalid_config_db_range_read();
}
}
Future<Void> commit() { return commit(this); }
Future<Void> onError(Error const& e) {
// TODO: Improve this:
if (e.code() == error_code_transaction_too_old) {
reset();
return delay((1 << numRetries++) * 0.01 * deterministicRandom()->random01());
}
throw e;
}
Future<Version> getReadVersion() {
if (!getVersionFuture.isValid())
getVersionFuture = getReadVersion(this);
return getVersionFuture;
}
Optional<Version> getCachedReadVersion() const {
if (getVersionFuture.isValid() && getVersionFuture.isReady() && !getVersionFuture.isError()) {
return getVersionFuture.get();
} else {
return {};
}
}
Version getCommittedVersion() const { return committed ? getVersionFuture.get() : ::invalidVersion; }
void reset() {
getVersionFuture = Future<Version>{};
toCommit = {};
committed = false;
}
void fullReset() {
numRetries = 0;
dID = {};
reset();
}
size_t getApproximateSize() const { return toCommit.expectedSize(); }
void debugTransaction(UID dID) {
this->dID = dID;
}
void checkDeferredError(Error const& deferredError) const {
if (deferredError.code() != invalid_error_code) {
throw deferredError;
}
if (cx.getPtr()) {
cx->checkDeferredError();
}
}
}; // SimpleConfigTransactionImpl
Future<Version> SimpleConfigTransaction::getReadVersion() {
return impl().getReadVersion();
}
Optional<Version> SimpleConfigTransaction::getCachedReadVersion() const {
return impl().getCachedReadVersion();
}
Future<Optional<Value>> SimpleConfigTransaction::get(Key const& key, bool snapshot) {
return impl().get(key);
}
Future<Standalone<RangeResultRef>> SimpleConfigTransaction::getRange(KeySelector const& begin,
KeySelector const& end,
int limit,
bool snapshot,
bool reverse) {
return impl().getRange(KeyRangeRef(begin.getKey(), end.getKey()));
}
Future<Standalone<RangeResultRef>> SimpleConfigTransaction::getRange(KeySelector begin,
KeySelector end,
GetRangeLimits limits,
bool snapshot,
bool reverse) {
return impl().getRange(KeyRangeRef(begin.getKey(), end.getKey()));
}
void SimpleConfigTransaction::set(KeyRef const& key, ValueRef const& value) {
impl().set(key, value);
}
void SimpleConfigTransaction::clear(KeyRef const& key) {
impl().clear(key);
}
Future<Void> SimpleConfigTransaction::commit() {
return impl().commit();
}
Version SimpleConfigTransaction::getCommittedVersion() const {
return impl().getCommittedVersion();
}
int64_t SimpleConfigTransaction::getApproximateSize() const {
return impl().getApproximateSize();
}
void SimpleConfigTransaction::setOption(FDBTransactionOptions::Option option, Optional<StringRef> value) {
// TODO: Support using this option to determine atomicity
}
Future<Void> SimpleConfigTransaction::onError(Error const& e) {
return impl().onError(e);
}
void SimpleConfigTransaction::cancel() {
// TODO: Implement someday
throw client_invalid_operation();
}
void SimpleConfigTransaction::reset() {
return impl().reset();
}
void SimpleConfigTransaction::fullReset() {
return impl().fullReset();
}
void SimpleConfigTransaction::debugTransaction(UID dID) {
impl().debugTransaction(dID);
}
void SimpleConfigTransaction::checkDeferredError() const {
impl().checkDeferredError(deferredError);
}
SimpleConfigTransaction::SimpleConfigTransaction(Database const& cx)
: _impl(std::make_unique<SimpleConfigTransactionImpl>(cx)) {}
SimpleConfigTransaction::SimpleConfigTransaction(ConfigTransactionInterface const& cti)
: _impl(std::make_unique<SimpleConfigTransactionImpl>(cti)) {}
SimpleConfigTransaction::~SimpleConfigTransaction() = default;

View File

@ -0,0 +1,75 @@
/*
* SimpleConfigTransaction.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include <memory>
#include "fdbclient/CommitTransaction.h"
#include "fdbclient/ConfigTransactionInterface.h"
#include "fdbclient/CoordinationInterface.h"
#include "fdbclient/FDBTypes.h"
#include "fdbclient/IConfigTransaction.h"
#include "flow/Error.h"
#include "flow/flow.h"
/*
* A configuration transaction implementation that interacts with a simple (test-only) implementation of
* the configuration database. All configuration database data is assumed to live on a single node
* (the lowest coordinator by IP address), so there is no fault tolerance.
*/
class SimpleConfigTransaction final : public IConfigTransaction, public FastAllocated<SimpleConfigTransaction> {
std::unique_ptr<class SimpleConfigTransactionImpl> _impl;
SimpleConfigTransactionImpl const& impl() const { return *_impl; }
SimpleConfigTransactionImpl& impl() { return *_impl; }
public:
SimpleConfigTransaction(ConfigTransactionInterface const&);
SimpleConfigTransaction(Database const&);
~SimpleConfigTransaction();
Future<Version> getReadVersion() override;
Optional<Version> getCachedReadVersion() const override;
Future<Optional<Value>> get(Key const& key, bool snapshot = false) override;
Future<Standalone<RangeResultRef>> getRange(KeySelector const& begin,
KeySelector const& end,
int limit,
bool snapshot = false,
bool reverse = false) override;
Future<Standalone<RangeResultRef>> getRange(KeySelector begin,
KeySelector end,
GetRangeLimits limits,
bool snapshot = false,
bool reverse = false) override;
Future<Void> commit() override;
Version getCommittedVersion() const override;
void setOption(FDBTransactionOptions::Option option, Optional<StringRef> value = Optional<StringRef>()) override;
Future<Void> onError(Error const& e) override;
void cancel() override;
void reset() override;
void debugTransaction(UID dID) override;
void checkDeferredError() const override;
int64_t getApproximateSize() const override;
void set(KeyRef const&, ValueRef const&) override;
void clear(KeyRangeRef const&) override { throw client_invalid_operation(); }
void clear(KeyRef const&) override;
void fullReset();
};

View File

@ -76,6 +76,7 @@ struct StorageServerInterface {
RequestStream<struct WatchValueRequest> watchValue;
RequestStream<struct ReadHotSubRangeRequest> getReadHotRanges;
RequestStream<struct SplitRangeRequest> getRangeSplitPoints;
RequestStream<struct GetKeyValuesStreamRequest> getKeyValuesStream;
explicit StorageServerInterface(UID uid) : uniqueID(uid) {}
StorageServerInterface() : uniqueID(deterministicRandom()->randomUniqueID()) {}
@ -116,6 +117,8 @@ struct StorageServerInterface {
RequestStream<struct ReadHotSubRangeRequest>(getValue.getEndpoint().getAdjustedEndpoint(11));
getRangeSplitPoints =
RequestStream<struct SplitRangeRequest>(getValue.getEndpoint().getAdjustedEndpoint(12));
getKeyValuesStream =
RequestStream<struct GetKeyValuesStreamRequest>(getValue.getEndpoint().getAdjustedEndpoint(13));
}
} else {
ASSERT(Ar::isDeserializing);
@ -157,6 +160,7 @@ struct StorageServerInterface {
streams.push_back(watchValue.getReceiver());
streams.push_back(getReadHotRanges.getReceiver());
streams.push_back(getRangeSplitPoints.getReceiver());
streams.push_back(getKeyValuesStream.getReceiver(TaskPriority::LoadBalancedEndpoint));
FlowTransport::transport().addEndpoints(streams);
}
};
@ -293,6 +297,45 @@ struct GetKeyValuesRequest : TimedRequest {
}
};
struct GetKeyValuesStreamReply : public ReplyPromiseStreamReply {
constexpr static FileIdentifier file_identifier = 1783066;
Arena arena;
VectorRef<KeyValueRef, VecSerStrategy::String> data;
Version version; // useful when latestVersion was requested
bool more;
bool cached = false;
GetKeyValuesStreamReply() : version(invalidVersion), more(false), cached(false) {}
GetKeyValuesStreamReply(GetKeyValuesReply r)
: arena(r.arena), data(r.data), version(r.version), more(r.more), cached(r.cached) {}
int expectedSize() const { return sizeof(GetKeyValuesStreamReply) + data.expectedSize(); }
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, ReplyPromiseStreamReply::acknowledgeToken, data, version, more, cached, arena);
}
};
struct GetKeyValuesStreamRequest {
constexpr static FileIdentifier file_identifier = 6795746;
SpanID spanContext;
Arena arena;
KeySelectorRef begin, end;
Version version; // or latestVersion
int limit, limitBytes;
bool isFetchKeys;
Optional<TagSet> tags;
Optional<UID> debugID;
ReplyPromiseStream<GetKeyValuesStreamReply> reply;
GetKeyValuesStreamRequest() : isFetchKeys(false) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, begin, end, version, limit, limitBytes, isFetchKeys, tags, debugID, reply, spanContext, arena);
}
};
struct GetKeyReply : public LoadBalancedReply {
constexpr static FileIdentifier file_identifier = 11226513;
KeySelector sel;

View File

@ -1004,6 +1004,11 @@ const KeyRef writeRecoveryKey = LiteralStringRef("\xff/writeRecovery");
const ValueRef writeRecoveryKeyTrue = LiteralStringRef("1");
const KeyRef snapshotEndVersionKey = LiteralStringRef("\xff/snapshotEndVersion");
const KeyRef configTransactionDescriptionKey = "\xff\xff/description"_sr;
const KeyRange globalConfigKnobKeys = singleKeyRange("\xff\xff/globalKnobs"_sr);
const KeyRangeRef configKnobKeys("\xff\xff/knobs/"_sr, "\xff\xff/knobs0"_sr);
const KeyRangeRef configClassKeys("\xff\xff/configClasses/"_sr, "\xff\xff/configClasses0"_sr);
// for tests
void testSSISerdes(StorageServerInterface const& ssi, bool useFB) {
printf("ssi=\nid=%s\nlocality=%s\nisTss=%s\ntssId=%s\naddress=%s\ngetValue=%s\n\n\n",
@ -1059,4 +1064,4 @@ TEST_CASE("/SystemData/SerDes/SSI") {
printf("ssi serdes test complete\n");
return Void();
}
}

View File

@ -477,6 +477,12 @@ extern const ValueRef writeRecoveryKeyTrue;
// Allows incremental restore to read and set starting version for consistency.
extern const KeyRef snapshotEndVersionKey;
// Configuration database special keys
extern const KeyRef configTransactionDescriptionKey;
extern const KeyRange globalConfigKnobKeys;
extern const KeyRangeRef configKnobKeys;
extern const KeyRangeRef configClassKeys;
#pragma clang diagnostic pop
#endif

View File

@ -0,0 +1,76 @@
/*
* TestKnobCollection.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/TestKnobCollection.h"
TestKnobCollection::TestKnobCollection(Randomize randomize, IsSimulated isSimulated)
: serverKnobCollection(randomize, isSimulated) {
initialize(randomize, isSimulated);
}
void TestKnobCollection::initialize(Randomize randomize, IsSimulated isSimulated) {
serverKnobCollection.initialize(randomize, isSimulated);
testKnobs.initialize();
}
void TestKnobCollection::reset(Randomize randomize, IsSimulated isSimulated) {
serverKnobCollection.reset(randomize, isSimulated);
testKnobs.reset();
}
Optional<KnobValue> TestKnobCollection::tryParseKnobValue(std::string const& knobName,
std::string const& knobValue) const {
auto result = serverKnobCollection.tryParseKnobValue(knobName, knobValue);
if (result.present()) {
return result;
}
auto parsedKnobValue = testKnobs.parseKnobValue(knobName, knobValue);
if (!std::holds_alternative<NoKnobFound>(parsedKnobValue)) {
return KnobValueRef::create(parsedKnobValue);
}
return {};
}
bool TestKnobCollection::trySetKnob(std::string const& knobName, KnobValueRef const& knobValue) {
return serverKnobCollection.trySetKnob(knobName, knobValue) || knobValue.visitSetKnob(knobName, testKnobs);
}
#define init(knob, value) initKnob(knob, value, #knob)
TestKnobs::TestKnobs() {
initialize();
}
void TestKnobs::initialize() {
init(TEST_LONG, 0);
init(TEST_INT, 0);
init(TEST_DOUBLE, 0.0);
init(TEST_BOOL, false);
init(TEST_STRING, "");
}
bool TestKnobs::operator==(TestKnobs const& rhs) const {
return (TEST_LONG == rhs.TEST_LONG) && (TEST_INT == rhs.TEST_INT) && (TEST_DOUBLE == rhs.TEST_DOUBLE) &&
(TEST_BOOL == rhs.TEST_BOOL) && (TEST_STRING == rhs.TEST_STRING);
}
bool TestKnobs::operator!=(TestKnobs const& rhs) const {
return !(*this == rhs);
}

View File

@ -0,0 +1,57 @@
/*
* TestKnobCollection.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbclient/IKnobCollection.h"
#include "fdbclient/ServerKnobCollection.h"
class TestKnobs : public KnobsImpl<TestKnobs> {
public:
TestKnobs();
int64_t TEST_LONG;
int TEST_INT;
double TEST_DOUBLE;
bool TEST_BOOL;
std::string TEST_STRING;
bool operator==(TestKnobs const&) const;
bool operator!=(TestKnobs const&) const;
void initialize();
};
/*
* Stores both flow knobs, client knobs, server knobs, and test knobs. As the name implies, this class is only meant to
* be used for testing
*/
class TestKnobCollection : public IKnobCollection {
ServerKnobCollection serverKnobCollection;
TestKnobs testKnobs;
public:
TestKnobCollection(Randomize randomize, IsSimulated isSimulated);
void initialize(Randomize randomize, IsSimulated isSimulated) override;
void reset(Randomize randomize, IsSimulated isSimulated) override;
FlowKnobs const& getFlowKnobs() const override { return serverKnobCollection.getFlowKnobs(); }
ClientKnobs const& getClientKnobs() const override { return serverKnobCollection.getClientKnobs(); }
ServerKnobs const& getServerKnobs() const override { return serverKnobCollection.getServerKnobs(); }
TestKnobs const& getTestKnobs() const override { return testKnobs; }
Optional<KnobValue> tryParseKnobValue(std::string const& knobName, std::string const& knobValue) const override;
bool trySetKnob(std::string const& knobName, KnobValueRef const& knobValue) override;
};

View File

@ -19,7 +19,6 @@
*/
#include "fdbclient/ThreadSafeTransaction.h"
#include "fdbclient/ReadYourWrites.h"
#include "fdbclient/DatabaseContext.h"
#include "fdbclient/versions.h"
#include "fdbclient/NativeAPI.actor.h"
@ -46,7 +45,9 @@ ThreadFuture<Reference<IDatabase>> ThreadSafeDatabase::createFromExistingDatabas
}
Reference<ITransaction> ThreadSafeDatabase::createTransaction() {
return Reference<ITransaction>(new ThreadSafeTransaction(db));
auto type =
isConfigDB ? ISingleThreadTransaction::Type::SIMPLE_CONFIG : ISingleThreadTransaction::Type::RYW;
return Reference<ITransaction>(new ThreadSafeTransaction(db, type));
}
void ThreadSafeDatabase::setOption(FDBDatabaseOptions::Option option, Optional<StringRef> value) {
@ -57,6 +58,9 @@ void ThreadSafeDatabase::setOption(FDBDatabaseOptions::Option option, Optional<S
TraceEvent("UnknownDatabaseOption").detail("Option", option);
throw invalid_option();
}
if (itr->first == FDBDatabaseOptions::USE_CONFIG_DATABASE) {
isConfigDB = true;
}
DatabaseContext* db = this->db;
Standalone<Optional<StringRef>> passValue = value;
@ -134,7 +138,7 @@ ThreadSafeDatabase::~ThreadSafeDatabase() {
onMainThreadVoid([db]() { db->delref(); }, nullptr);
}
ThreadSafeTransaction::ThreadSafeTransaction(DatabaseContext* cx) {
ThreadSafeTransaction::ThreadSafeTransaction(DatabaseContext* cx, ISingleThreadTransaction::Type type) {
// Allocate memory for the transaction from this thread (so the pointer is known for subsequent method calls)
// but run its constructor on the main thread
@ -142,12 +146,12 @@ ThreadSafeTransaction::ThreadSafeTransaction(DatabaseContext* cx) {
// because the reference count of the DatabaseContext is solely managed from the main thread. If cx is destructed
// immediately after this call, it will defer the DatabaseContext::delref (and onMainThread preserves the order of
// these operations).
ReadYourWritesTransaction* tr = this->tr = ReadYourWritesTransaction::allocateOnForeignThread();
auto tr = this->tr = ISingleThreadTransaction::allocateOnForeignThread(type);
// No deferred error -- if the construction of the RYW transaction fails, we have no where to put it
onMainThreadVoid(
[tr, cx]() {
[tr, type, cx]() {
cx->addref();
new (tr) ReadYourWritesTransaction(Database(cx));
ISingleThreadTransaction::create(tr, type, Database(cx));
},
nullptr);
}
@ -159,23 +163,23 @@ ThreadSafeTransaction::ThreadSafeTransaction(ReadYourWritesTransaction* ryw) : t
}
ThreadSafeTransaction::~ThreadSafeTransaction() {
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
if (tr)
onMainThreadVoid([tr]() { tr->delref(); }, nullptr);
}
void ThreadSafeTransaction::cancel() {
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
onMainThreadVoid([tr]() { tr->cancel(); }, nullptr);
}
void ThreadSafeTransaction::setVersion(Version v) {
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
onMainThreadVoid([tr, v]() { tr->setVersion(v); }, &tr->deferredError);
}
ThreadFuture<Version> ThreadSafeTransaction::getReadVersion() {
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr]() -> Future<Version> {
tr->checkDeferredError();
return tr->getReadVersion();
@ -185,7 +189,7 @@ ThreadFuture<Version> ThreadSafeTransaction::getReadVersion() {
ThreadFuture<Optional<Value>> ThreadSafeTransaction::get(const KeyRef& key, bool snapshot) {
Key k = key;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr, k, snapshot]() -> Future<Optional<Value>> {
tr->checkDeferredError();
return tr->get(k, snapshot);
@ -195,7 +199,7 @@ ThreadFuture<Optional<Value>> ThreadSafeTransaction::get(const KeyRef& key, bool
ThreadFuture<Key> ThreadSafeTransaction::getKey(const KeySelectorRef& key, bool snapshot) {
KeySelector k = key;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr, k, snapshot]() -> Future<Key> {
tr->checkDeferredError();
return tr->getKey(k, snapshot);
@ -205,7 +209,7 @@ ThreadFuture<Key> ThreadSafeTransaction::getKey(const KeySelectorRef& key, bool
ThreadFuture<int64_t> ThreadSafeTransaction::getEstimatedRangeSizeBytes(const KeyRangeRef& keys) {
KeyRange r = keys;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr, r]() -> Future<int64_t> {
tr->checkDeferredError();
return tr->getEstimatedRangeSizeBytes(r);
@ -216,7 +220,7 @@ ThreadFuture<Standalone<VectorRef<KeyRef>>> ThreadSafeTransaction::getRangeSplit
int64_t chunkSize) {
KeyRange r = range;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr, r, chunkSize]() -> Future<Standalone<VectorRef<KeyRef>>> {
tr->checkDeferredError();
return tr->getRangeSplitPoints(r, chunkSize);
@ -231,7 +235,7 @@ ThreadFuture<RangeResult> ThreadSafeTransaction::getRange(const KeySelectorRef&
KeySelector b = begin;
KeySelector e = end;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr, b, e, limit, snapshot, reverse]() -> Future<RangeResult> {
tr->checkDeferredError();
return tr->getRange(b, e, limit, snapshot, reverse);
@ -246,7 +250,7 @@ ThreadFuture<RangeResult> ThreadSafeTransaction::getRange(const KeySelectorRef&
KeySelector b = begin;
KeySelector e = end;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr, b, e, limits, snapshot, reverse]() -> Future<RangeResult> {
tr->checkDeferredError();
return tr->getRange(b, e, limits, snapshot, reverse);
@ -256,7 +260,7 @@ ThreadFuture<RangeResult> ThreadSafeTransaction::getRange(const KeySelectorRef&
ThreadFuture<Standalone<VectorRef<const char*>>> ThreadSafeTransaction::getAddressesForKey(const KeyRef& key) {
Key k = key;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr, k]() -> Future<Standalone<VectorRef<const char*>>> {
tr->checkDeferredError();
return tr->getAddressesForKey(k);
@ -266,12 +270,12 @@ ThreadFuture<Standalone<VectorRef<const char*>>> ThreadSafeTransaction::getAddre
void ThreadSafeTransaction::addReadConflictRange(const KeyRangeRef& keys) {
KeyRange r = keys;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
onMainThreadVoid([tr, r]() { tr->addReadConflictRange(r); }, &tr->deferredError);
}
void ThreadSafeTransaction::makeSelfConflicting() {
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
onMainThreadVoid([tr]() { tr->makeSelfConflicting(); }, &tr->deferredError);
}
@ -279,7 +283,7 @@ void ThreadSafeTransaction::atomicOp(const KeyRef& key, const ValueRef& value, u
Key k = key;
Value v = value;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
onMainThreadVoid([tr, k, v, operationType]() { tr->atomicOp(k, v, operationType); }, &tr->deferredError);
}
@ -287,14 +291,14 @@ void ThreadSafeTransaction::set(const KeyRef& key, const ValueRef& value) {
Key k = key;
Value v = value;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
onMainThreadVoid([tr, k, v]() { tr->set(k, v); }, &tr->deferredError);
}
void ThreadSafeTransaction::clear(const KeyRangeRef& range) {
KeyRange r = range;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
onMainThreadVoid([tr, r]() { tr->clear(r); }, &tr->deferredError);
}
@ -302,7 +306,7 @@ void ThreadSafeTransaction::clear(const KeyRef& begin, const KeyRef& end) {
Key b = begin;
Key e = end;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
onMainThreadVoid(
[tr, b, e]() {
if (b > e)
@ -316,14 +320,14 @@ void ThreadSafeTransaction::clear(const KeyRef& begin, const KeyRef& end) {
void ThreadSafeTransaction::clear(const KeyRef& key) {
Key k = key;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
onMainThreadVoid([tr, k]() { tr->clear(k); }, &tr->deferredError);
}
ThreadFuture<Void> ThreadSafeTransaction::watch(const KeyRef& key) {
Key k = key;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr, k]() -> Future<Void> {
tr->checkDeferredError();
return tr->watch(k);
@ -333,12 +337,12 @@ ThreadFuture<Void> ThreadSafeTransaction::watch(const KeyRef& key) {
void ThreadSafeTransaction::addWriteConflictRange(const KeyRangeRef& keys) {
KeyRange r = keys;
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
onMainThreadVoid([tr, r]() { tr->addWriteConflictRange(r); }, &tr->deferredError);
}
ThreadFuture<Void> ThreadSafeTransaction::commit() {
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr]() -> Future<Void> {
tr->checkDeferredError();
return tr->commit();
@ -351,12 +355,12 @@ Version ThreadSafeTransaction::getCommittedVersion() {
}
ThreadFuture<int64_t> ThreadSafeTransaction::getApproximateSize() {
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr]() -> Future<int64_t> { return tr->getApproximateSize(); });
}
ThreadFuture<Standalone<StringRef>> ThreadSafeTransaction::getVersionstamp() {
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr]() -> Future<Standalone<StringRef>> { return tr->getVersionstamp(); });
}
@ -366,8 +370,7 @@ void ThreadSafeTransaction::setOption(FDBTransactionOptions::Option option, Opti
TraceEvent("UnknownTransactionOption").detail("Option", option);
throw invalid_option();
}
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
Standalone<Optional<StringRef>> passValue = value;
// ThreadSafeTransaction is not allowed to do anything with options except pass them through to RYW.
@ -375,7 +378,7 @@ void ThreadSafeTransaction::setOption(FDBTransactionOptions::Option option, Opti
}
ThreadFuture<Void> ThreadSafeTransaction::checkDeferredError() {
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr]() {
try {
tr->checkDeferredError();
@ -388,7 +391,7 @@ ThreadFuture<Void> ThreadSafeTransaction::checkDeferredError() {
}
ThreadFuture<Void> ThreadSafeTransaction::onError(Error const& e) {
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
return onMainThread([tr, e]() { return tr->onError(e); });
}
@ -403,7 +406,7 @@ ThreadSafeTransaction::ThreadSafeTransaction(ThreadSafeTransaction&& r) noexcept
}
void ThreadSafeTransaction::reset() {
ReadYourWritesTransaction* tr = this->tr;
ISingleThreadTransaction* tr = this->tr;
onMainThreadVoid([tr]() { tr->reset(); }, nullptr);
}

View File

@ -26,6 +26,7 @@
#include "flow/ThreadHelper.actor.h"
#include "fdbclient/ClusterInterface.h"
#include "fdbclient/IClientApi.h"
#include "fdbclient/ISingleThreadTransaction.h"
// An implementation of IDatabase that serializes operations onto the network thread and interacts with the lower-level
// client APIs exposed by NativeAPI and ReadYourWrites.
@ -58,6 +59,7 @@ public:
private:
friend class ThreadSafeTransaction;
bool isConfigDB { false };
DatabaseContext* db;
public: // Internal use only
@ -67,10 +69,10 @@ public: // Internal use only
};
// An implementation of ITransaction that serializes operations onto the network thread and interacts with the
// lower-level client APIs exposed by NativeAPI and ReadYourWrites.
// lower-level client APIs exposed by ISingleThreadTransaction
class ThreadSafeTransaction : public ITransaction, ThreadSafeReferenceCounted<ThreadSafeTransaction>, NonCopyable {
public:
explicit ThreadSafeTransaction(DatabaseContext* cx);
explicit ThreadSafeTransaction(DatabaseContext* cx, ISingleThreadTransaction::Type type);
~ThreadSafeTransaction() override;
// Note: used while refactoring fdbcli, need to be removed later
@ -145,7 +147,7 @@ public:
void delref() override { ThreadSafeReferenceCounted<ThreadSafeTransaction>::delref(); }
private:
ReadYourWritesTransaction* tr;
ISingleThreadTransaction* tr;
};
// An implementation of IClientApi that serializes operations onto the network thread and interacts with the lower-level

View File

@ -195,6 +195,8 @@ description is not currently required but encouraged.
<Option name="transaction_bypass_unreadable" code="700"
description="Allows ``get`` operations to read from sections of keyspace that have become unreadable because of versionstamp operations. This sets the ``bypass_unreadable`` option of each transaction created by this database. See the transaction option description for more information."
defaultFor="1100"/>
<Option name="use_config_database" code="800"
description="Use configuration database." />
</Scope>
<Scope name="TransactionOption">

View File

@ -14,6 +14,7 @@ set(FDBRPC_SRCS
genericactors.actor.cpp
HealthMonitor.actor.cpp
IAsyncFile.actor.cpp
LoadBalance.actor.cpp
LoadBalance.actor.h
Locality.cpp
Net2FileSystem.cpp

View File

@ -69,7 +69,7 @@ Future<Void> IFailureMonitor::onFailedFor(Endpoint const& endpoint, double susta
return waitForContinuousFailure(this, endpoint, sustainedFailureDuration, slope);
}
SimpleFailureMonitor::SimpleFailureMonitor() : endpointKnownFailed() {
SimpleFailureMonitor::SimpleFailureMonitor() {
// Mark ourselves as available in FailureMonitor
const auto& localAddresses = FlowTransport::transport().getLocalAddresses();
addressStatus[localAddresses.address] = FailureStatus(false);
@ -126,13 +126,20 @@ void SimpleFailureMonitor::endpointNotFound(Endpoint const& endpoint) {
.suppressFor(1.0)
.detail("Address", endpoint.getPrimaryAddress())
.detail("Token", endpoint.token);
failedEndpoints.insert(endpoint);
if (endpoint.getPrimaryAddress().isPublic()) {
if (failedEndpoints.size() > 100000) {
TraceEvent(SevWarnAlways, "TooManyFailedEndpoints").suppressFor(1.0);
failedEndpoints.clear();
}
failedEndpoints.insert(endpoint);
}
endpointKnownFailed.trigger(endpoint);
}
void SimpleFailureMonitor::notifyDisconnect(NetworkAddress const& address) {
//TraceEvent("NotifyDisconnect").detail("Address", address);
endpointKnownFailed.triggerRange(Endpoint({ address }, UID()), Endpoint({ address }, UID(-1, -1)));
disconnectTriggers.trigger(address);
}
Future<Void> SimpleFailureMonitor::onDisconnectOrFailure(Endpoint const& endpoint) {
@ -149,6 +156,10 @@ Future<Void> SimpleFailureMonitor::onDisconnectOrFailure(Endpoint const& endpoin
return endpointKnownFailed.onChange(endpoint);
}
Future<Void> SimpleFailureMonitor::onDisconnect(NetworkAddress const& address) {
return disconnectTriggers.onChange(address);
}
Future<Void> SimpleFailureMonitor::onStateChanged(Endpoint const& endpoint) {
// Wait on endpointKnownFailed if it is false, to pick up both endpointNotFound errors (which set it to true)
// and changes to addressStatus (which trigger a range). Don't wait on endpointKnownFailed if it is true, because

View File

@ -98,9 +98,12 @@ public:
// The next time the known status for the endpoint changes, returns the new status.
virtual Future<Void> onStateChanged(Endpoint const& endpoint) = 0;
// Returns when onFailed(endpoint) || transport().onDisconnect( endpoint.getPrimaryAddress() ), but more efficiently
// Returns when onFailed(endpoint) || transport().onDisconnect( endpoint.getPrimaryAddress() )
virtual Future<Void> onDisconnectOrFailure(Endpoint const& endpoint) = 0;
// Returns when transport().onDisconnect( address )
virtual Future<Void> onDisconnect(NetworkAddress const& address) = 0;
// Returns true if the endpoint is failed but the address of the endpoint is not failed.
virtual bool onlyEndpointFailed(Endpoint const& endpoint) const = 0;
@ -147,6 +150,7 @@ public:
FailureStatus getState(Endpoint const& endpoint) const override;
FailureStatus getState(NetworkAddress const& address) const override;
Future<Void> onDisconnectOrFailure(Endpoint const& endpoint) override;
Future<Void> onDisconnect(NetworkAddress const& address) override;
bool onlyEndpointFailed(Endpoint const& endpoint) const override;
bool permanentlyFailed(Endpoint const& endpoint) const override;
@ -155,6 +159,7 @@ public:
private:
std::unordered_map<NetworkAddress, FailureStatus> addressStatus;
YieldedAsyncMap<Endpoint, bool> endpointKnownFailed;
YieldedAsyncMap<NetworkAddress, bool> disconnectTriggers;
std::unordered_set<Endpoint> failedEndpoints;
friend class OnStateChangedActorActor;

View File

@ -1581,26 +1581,18 @@ TEST_CASE("/flow/flow/FlowMutex") {
}
error = e;
// Wait for all actors still running to finish their waits and try to take the mutex
// Some actors can still be running, waiting while locked or unlocked,
// but all should become ready, some with errors.
state int i;
if (verbose) {
printf("Waiting for completions\n");
printf("Waiting for completions. Future end states:\n");
}
wait(delay(2 * mutexTestDelay));
if (verbose) {
printf("Future end states:\n");
}
// All futures should be ready, some with errors.
bool allReady = true;
for (int i = 0; i < tests.size(); ++i) {
auto f = tests[i];
for (i = 0; i < tests.size(); ++i) {
ErrorOr<Void> f = wait(errorOr(tests[i]));
if (verbose) {
printf(
" %d: %s\n", i, f.isReady() ? (f.isError() ? f.getError().what() : "done") : "not ready");
printf(" %d: %s\n", i, f.isError() ? f.getError().what() : "done");
}
allReady = allReady && f.isReady();
}
ASSERT(allReady);
}
// If an error was caused, one should have been detected.

View File

@ -51,7 +51,7 @@ constexpr UID WLTOKEN_PING_PACKET(-1, 1);
constexpr int PACKET_LEN_WIDTH = sizeof(uint32_t);
const uint64_t TOKEN_STREAM_FLAG = 1;
const int WLTOKEN_COUNTS = 12; // number of wellKnownEndpoints
static constexpr int WLTOKEN_COUNTS = 20; // number of wellKnownEndpoints
class EndpointMap : NonCopyable {
public:
@ -158,7 +158,7 @@ const Endpoint& EndpointMap::insert(NetworkAddressList localAddresses,
NetworkMessageReceiver* EndpointMap::get(Endpoint::Token const& token) {
uint32_t index = token.second();
if (index < wellKnownEndpointCount && data[index].receiver == nullptr) {
TraceEvent(SevWarnAlways, "WellKnownEndpointNotAdded").detail("Token", token);
TraceEvent(SevWarnAlways, "WellKnownEndpointNotAdded").detail("Token", token).detail("Index", index).backtrace();
}
if (index < data.size() && data[index].token().first() == token.first() &&
((data[index].token().second() & 0xffffffff00000000LL) | index) == token.second())
@ -199,8 +199,9 @@ struct EndpointNotFoundReceiver final : NetworkMessageReceiver {
void receive(ArenaObjectReader& reader) override {
// Remote machine tells us it doesn't have endpoint e
Endpoint e;
reader.deserialize(e);
UID token;
reader.deserialize(token);
Endpoint e = FlowTransport::transport().loadedEndpoint(token);
IFailureMonitor::failureMonitor().endpointNotFound(e);
}
};
@ -624,7 +625,6 @@ ACTOR Future<Void> connectionKeeper(Reference<Peer> self,
IFailureMonitor::failureMonitor().getState(self->destination).isAvailable() ? "OK"
: "FAILED");
++self->connectOutgoingCount;
try {
choose {
when(Reference<IConnection> _conn =
@ -957,13 +957,13 @@ ACTOR static void deliver(TransportData* self,
if (destination.token.first() != -1) {
if (self->isLocalAddress(destination.getPrimaryAddress())) {
sendLocal(self,
SerializeSource<Endpoint>(Endpoint(self->localAddresses, destination.token)),
SerializeSource<UID>(destination.token),
Endpoint(destination.addresses, WLTOKEN_ENDPOINT_NOT_FOUND));
} else {
Reference<Peer> peer = self->getOrOpenPeer(destination.getPrimaryAddress());
sendPacket(self,
peer,
SerializeSource<Endpoint>(Endpoint(self->localAddresses, destination.token)),
SerializeSource<UID>(destination.token),
Endpoint(destination.addresses, WLTOKEN_ENDPOINT_NOT_FOUND),
false);
}
@ -1476,7 +1476,7 @@ Endpoint FlowTransport::loadedEndpoint(const UID& token) {
}
void FlowTransport::addPeerReference(const Endpoint& endpoint, bool isStream) {
if (!isStream || !endpoint.getPrimaryAddress().isValid())
if (!isStream || !endpoint.getPrimaryAddress().isValid() || !endpoint.getPrimaryAddress().isPublic())
return;
Reference<Peer> peer = self->getOrOpenPeer(endpoint.getPrimaryAddress());
@ -1488,7 +1488,7 @@ void FlowTransport::addPeerReference(const Endpoint& endpoint, bool isStream) {
}
void FlowTransport::removePeerReference(const Endpoint& endpoint, bool isStream) {
if (!isStream || !endpoint.getPrimaryAddress().isValid())
if (!isStream || !endpoint.getPrimaryAddress().isValid() || !endpoint.getPrimaryAddress().isPublic())
return;
Reference<Peer> peer = self->getPeer(endpoint.getPrimaryAddress());
if (peer) {
@ -1723,4 +1723,4 @@ void FlowTransport::createInstance(bool isClient, uint64_t transportId) {
HealthMonitor* FlowTransport::healthMonitor() {
return &self->healthMonitor;
}
}

View File

@ -64,7 +64,7 @@ public:
NetworkAddress getStableAddress() const { return addresses.getTLSAddress(); }
Endpoint getAdjustedEndpoint(uint32_t index) {
Endpoint getAdjustedEndpoint(uint32_t index) const {
uint32_t newIndex = token.second();
newIndex += index;
return Endpoint(

View File

@ -0,0 +1,52 @@
/*
* LoadBalance.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2021 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "flow/flow.h"
#include "flow/actorcompiler.h" // This must be the last #include.
// Throwing all_alternatives_failed will cause the client to issue a GetKeyLocationRequest to the proxy, so this actor
// attempts to limit the number of these errors thrown by a single client to prevent it from saturating the proxies with
// these requests
ACTOR Future<Void> allAlternativesFailedDelay(Future<Void> okFuture) {
if (now() - g_network->networkInfo.newestAlternativesFailure > FLOW_KNOBS->ALTERNATIVES_FAILURE_RESET_TIME) {
g_network->networkInfo.oldestAlternativesFailure = now();
}
double delay = FLOW_KNOBS->ALTERNATIVES_FAILURE_MIN_DELAY;
if (now() - g_network->networkInfo.lastAlternativesFailureSkipDelay > FLOW_KNOBS->ALTERNATIVES_FAILURE_SKIP_DELAY) {
g_network->networkInfo.lastAlternativesFailureSkipDelay = now();
} else {
double elapsed = now() - g_network->networkInfo.oldestAlternativesFailure;
delay = std::max(delay,
std::min(elapsed * FLOW_KNOBS->ALTERNATIVES_FAILURE_DELAY_RATIO,
FLOW_KNOBS->ALTERNATIVES_FAILURE_MAX_DELAY));
delay = std::max(delay,
std::min(elapsed * FLOW_KNOBS->ALTERNATIVES_FAILURE_SLOW_DELAY_RATIO,
FLOW_KNOBS->ALTERNATIVES_FAILURE_SLOW_MAX_DELAY));
}
g_network->networkInfo.newestAlternativesFailure = now();
choose {
when(wait(okFuture)) {}
when(wait(::delayJittered(delay))) { throw all_alternatives_failed(); }
}
return Void();
}

View File

@ -42,6 +42,8 @@
using std::vector;
ACTOR Future<Void> allAlternativesFailedDelay(Future<Void> okFuture);
struct ModelHolder : NonCopyable, public ReferenceCounted<ModelHolder> {
QueueModel* model;
bool released;
@ -527,43 +529,17 @@ Future<REPLY_TYPE(Request)> loadBalance(
FailureStatus(false));
}
Future<Void> okFuture = quorum(ok, 1);
if (!alternatives->alwaysFresh()) {
if (now() - g_network->networkInfo.newestAlternativesFailure >
FLOW_KNOBS->ALTERNATIVES_FAILURE_RESET_TIME) {
g_network->networkInfo.oldestAlternativesFailure = now();
}
double delay = FLOW_KNOBS->ALTERNATIVES_FAILURE_MIN_DELAY;
if (now() - g_network->networkInfo.lastAlternativesFailureSkipDelay >
FLOW_KNOBS->ALTERNATIVES_FAILURE_SKIP_DELAY) {
g_network->networkInfo.lastAlternativesFailureSkipDelay = now();
} else {
double elapsed = now() - g_network->networkInfo.oldestAlternativesFailure;
delay = std::max(delay,
std::min(elapsed * FLOW_KNOBS->ALTERNATIVES_FAILURE_DELAY_RATIO,
FLOW_KNOBS->ALTERNATIVES_FAILURE_MAX_DELAY));
delay = std::max(delay,
std::min(elapsed * FLOW_KNOBS->ALTERNATIVES_FAILURE_SLOW_DELAY_RATIO,
FLOW_KNOBS->ALTERNATIVES_FAILURE_SLOW_MAX_DELAY));
}
// Making this SevWarn means a lot of clutter
if (now() - g_network->networkInfo.newestAlternativesFailure > 1 ||
deterministicRandom()->random01() < 0.01) {
TraceEvent("AllAlternativesFailed")
.detail("Interval", FLOW_KNOBS->CACHE_REFRESH_INTERVAL_WHEN_ALL_ALTERNATIVES_FAILED)
.detail("Alternatives", alternatives->description())
.detail("Delay", delay);
}
g_network->networkInfo.newestAlternativesFailure = now();
choose {
when(wait(quorum(ok, 1))) {}
when(wait(::delayJittered(delay))) { throw all_alternatives_failed(); }
TraceEvent("AllAlternativesFailed").detail("Alternatives", alternatives->description());
}
wait(allAlternativesFailedDelay(okFuture));
} else {
wait(quorum(ok, 1));
wait(okFuture);
}
numAttempts = 0; // now that we've got a server back, reset the backoff

View File

@ -79,6 +79,8 @@ struct FlowReceiver : public NetworkMessageReceiver {
FlowTransport::transport().addWellKnownEndpoint(endpoint, this, taskID);
}
const Endpoint& getRawEndpoint() { return endpoint; }
private:
Optional<PeerCompatibilityPolicy> peerCompatibilityPolicy_;
Endpoint endpoint;
@ -251,6 +253,319 @@ void setReplyPriority(const ReplyPromise<Reply>& p, TaskPriority taskID) {
p.getEndpoint(taskID);
}
struct ReplyPromiseStreamReply {
Optional<UID> acknowledgeToken;
ReplyPromiseStreamReply() {}
};
struct AcknowledgementReply {
constexpr static FileIdentifier file_identifier = 1389929;
int64_t bytes;
AcknowledgementReply() : bytes(0) {}
explicit AcknowledgementReply(int64_t bytes) : bytes(bytes) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, bytes);
}
};
// Registered on the server to recieve acknowledgements that the client has received stream data. This prevents the
// server from sending too much data to the client if the client is not consuming it.
struct AcknowledgementReceiver final : FlowReceiver, FastAllocated<AcknowledgementReceiver> {
using FastAllocated<AcknowledgementReceiver>::operator new;
using FastAllocated<AcknowledgementReceiver>::operator delete;
int64_t bytesSent;
int64_t bytesAcknowledged;
int64_t bytesLimit;
Promise<Void> ready;
Future<Void> failures;
AcknowledgementReceiver() : bytesSent(0), bytesAcknowledged(0), bytesLimit(0), ready(nullptr) {}
AcknowledgementReceiver(const Endpoint& remoteEndpoint)
: FlowReceiver(remoteEndpoint, false), bytesSent(0), bytesAcknowledged(0), bytesLimit(0), ready(nullptr) {}
void receive(ArenaObjectReader& reader) override {
ErrorOr<AcknowledgementReply> message;
reader.deserialize(message);
if (message.isError()) {
// The client will send an operation_obsolete error on the acknowledgement stream when it cancels the
// ReplyPromiseStream
if (!ready.isValid()) {
ready = Promise<Void>();
}
ready.sendError(message.getError());
} else {
ASSERT(message.get().bytes > bytesAcknowledged);
bytesAcknowledged = message.get().bytes;
if (ready.isValid() && bytesSent - bytesAcknowledged < bytesLimit) {
Promise<Void> hold = ready;
ready = Promise<Void>(nullptr);
// Sending to this promise could cause the ready to be replaced, so we need to hold a local copy
hold.send(Void());
}
}
}
};
// A version of NetNotifiedQueue which adds support for acknowledgments.
template <class T>
struct NetNotifiedQueueWithAcknowledgements final : NotifiedQueue<T>,
FlowReceiver,
FastAllocated<NetNotifiedQueueWithAcknowledgements<T>> {
using FastAllocated<NetNotifiedQueueWithAcknowledgements<T>>::operator new;
using FastAllocated<NetNotifiedQueueWithAcknowledgements<T>>::operator delete;
AcknowledgementReceiver acknowledgements;
Endpoint requestStreamEndpoint;
bool sentError = false;
NetNotifiedQueueWithAcknowledgements(int futures, int promises) : NotifiedQueue<T>(futures, promises) {}
NetNotifiedQueueWithAcknowledgements(int futures, int promises, const Endpoint& remoteEndpoint)
: NotifiedQueue<T>(futures, promises), FlowReceiver(remoteEndpoint, true) {
// A ReplyPromiseStream will be terminated on the server side if the network connection with the client breaks
acknowledgements.failures = tagError<Void>(
makeDependent<T>(IFailureMonitor::failureMonitor()).onDisconnect(remoteEndpoint.getPrimaryAddress()),
operation_obsolete());
}
void destroy() override { delete this; }
void receive(ArenaObjectReader& reader) override {
this->addPromiseRef();
ErrorOr<EnsureTable<T>> message;
reader.deserialize(message);
if (message.isError()) {
if (message.getError().code() == error_code_broken_promise) {
ASSERT(requestStreamEndpoint.isValid());
// We will get a broken_promise on the client side only if the ReplyPromiseStream was cancelled without
// sending an error. In this case the storage server actor must have been cancelled so future
// GetKeyValuesStream requests on the same endpoint will fail
IFailureMonitor::failureMonitor().endpointNotFound(requestStreamEndpoint);
}
this->sendError(message.getError());
} else {
if (message.get().asUnderlyingType().acknowledgeToken.present()) {
acknowledgements = AcknowledgementReceiver(
FlowTransport::transport().loadedEndpoint(message.get().asUnderlyingType().acknowledgeToken.get()));
}
if (this->shouldFireImmediately()) {
// This message is going to be consumed by the client immediately (and therefore will not call pop()) so
// send an ack immediately
if (acknowledgements.getRawEndpoint().isValid()) {
acknowledgements.bytesAcknowledged += message.get().asUnderlyingType().expectedSize();
FlowTransport::transport().sendUnreliable(
SerializeSource<ErrorOr<AcknowledgementReply>>(
AcknowledgementReply(acknowledgements.bytesAcknowledged)),
acknowledgements.getEndpoint(TaskPriority::ReadSocket),
false);
}
}
this->send(std::move(message.get().asUnderlyingType()));
}
this->delPromiseRef();
}
T pop() override {
T res = this->popImpl();
// A reply that has been queued up is being consumed, so send an ack to the server
if (acknowledgements.getRawEndpoint().isValid()) {
acknowledgements.bytesAcknowledged += res.expectedSize();
FlowTransport::transport().sendUnreliable(SerializeSource<ErrorOr<AcknowledgementReply>>(
AcknowledgementReply(acknowledgements.bytesAcknowledged)),
acknowledgements.getEndpoint(TaskPriority::ReadSocket),
false);
}
return res;
}
~NetNotifiedQueueWithAcknowledgements() {
if (acknowledgements.getRawEndpoint().isValid() && acknowledgements.isRemoteEndpoint() && !this->hasError()) {
// Notify the server that a client is not using this ReplyPromiseStream anymore
FlowTransport::transport().sendUnreliable(
SerializeSource<ErrorOr<AcknowledgementReply>>(operation_obsolete()),
acknowledgements.getEndpoint(TaskPriority::ReadSocket),
false);
}
if (isRemoteEndpoint() && !sentError && !acknowledgements.failures.isReady()) {
// The ReplyPromiseStream was cancelled before sending an error, so the storage server must have died
FlowTransport::transport().sendUnreliable(SerializeSource<ErrorOr<EnsureTable<T>>>(broken_promise()),
getEndpoint(TaskPriority::ReadSocket),
false);
}
}
bool isStream() const override { return true; }
};
template <class T>
class ReplyPromiseStream {
public:
// The endpoints of a ReplyPromiseStream must be initialized at Task::ReadSocket, because with lower priorities a
// delay(0) in FlowTransport deliver can cause out of order delivery.
// stream.send( request )
// Unreliable at most once delivery: Delivers request unless there is a connection failure (zero or one times)
template <class U>
void send(U&& value) const {
if (queue->isRemoteEndpoint()) {
if (!queue->acknowledgements.getRawEndpoint().isValid()) {
value.acknowledgeToken = queue->acknowledgements.getEndpoint(TaskPriority::ReadSocket).token;
}
queue->acknowledgements.bytesSent += value.expectedSize();
FlowTransport::transport().sendUnreliable(
SerializeSource<ErrorOr<EnsureTable<T>>>(value), getEndpoint(), false);
} else {
queue->send(std::forward<U>(value));
}
}
template <class E>
void sendError(const E& exc) const {
if (queue->isRemoteEndpoint() && !queue->sentError) {
queue->sentError = true;
FlowTransport::transport().sendUnreliable(
SerializeSource<ErrorOr<EnsureTable<T>>>(exc), getEndpoint(), false);
} else {
queue->sendError(exc);
if (errors && errors->canBeSet()) {
errors->sendError(exc);
}
}
}
FutureStream<T> getFuture() const {
queue->addFutureRef();
return FutureStream<T>(queue);
}
ReplyPromiseStream() : queue(new NetNotifiedQueueWithAcknowledgements<T>(0, 1)), errors(new SAV<Void>(0, 1)) {}
ReplyPromiseStream(const ReplyPromiseStream& rhs) : queue(rhs.queue), errors(rhs.errors) {
queue->addPromiseRef();
if (errors) {
errors->addPromiseRef();
}
}
ReplyPromiseStream(ReplyPromiseStream&& rhs) noexcept : queue(rhs.queue), errors(rhs.errors) {
rhs.queue = nullptr;
rhs.errors = nullptr;
}
explicit ReplyPromiseStream(const Endpoint& endpoint)
: queue(new NetNotifiedQueueWithAcknowledgements<T>(0, 1, endpoint)), errors(nullptr) {}
// Used by endStreamOnDisconnect to detect when all references to the ReplyPromiseStream have been dropped
Future<Void> getErrorFutureAndDelPromiseRef() {
ASSERT(errors && errors->getPromiseReferenceCount() > 1);
errors->addFutureRef();
errors->delPromiseRef();
Future<Void> res(errors);
errors = nullptr;
return res;
}
void setRequestStreamEndpoint(const Endpoint& endpoint) { queue->requestStreamEndpoint = endpoint; }
~ReplyPromiseStream() {
if (queue)
queue->delPromiseRef();
if (errors)
errors->delPromiseRef();
}
const Endpoint& getEndpoint() const { return queue->getEndpoint(TaskPriority::ReadSocket); }
bool operator==(const ReplyPromiseStream<T>& rhs) const { return queue == rhs.queue; }
bool isEmpty() const { return !queue->isReady(); }
uint32_t size() const { return queue->size(); }
// Must be called on the server before sending results on the stream to ratelimit the amount of data outstanding to
// the client
Future<Void> onReady() {
ASSERT(queue->acknowledgements.bytesLimit > 0);
if (queue->acknowledgements.failures.isError()) {
return queue->acknowledgements.failures.getError();
}
if (queue->acknowledgements.ready.isValid() && queue->acknowledgements.ready.isSet()) {
return queue->acknowledgements.ready.getFuture().getError();
}
if (queue->acknowledgements.bytesSent - queue->acknowledgements.bytesAcknowledged <
queue->acknowledgements.bytesLimit) {
return Void();
}
if (!queue->acknowledgements.ready.isValid()) {
queue->acknowledgements.ready = Promise<Void>();
}
return queue->acknowledgements.ready.getFuture() || queue->acknowledgements.failures;
}
// Must be called on the server before using a ReplyPromiseStream to limit the amount of outstanding bytes to the
// client
void setByteLimit(int64_t byteLimit) { queue->acknowledgements.bytesLimit = byteLimit; }
void operator=(const ReplyPromiseStream& rhs) {
rhs.queue->addPromiseRef();
if (queue)
queue->delPromiseRef();
queue = rhs.queue;
if (rhs.errors)
rhs.errors->addPromiseRef();
if (errors)
errors->delPromiseRef();
errors = rhs.errors;
}
void operator=(ReplyPromiseStream&& rhs) noexcept {
if (queue != rhs.queue) {
if (queue)
queue->delPromiseRef();
queue = rhs.queue;
rhs.queue = 0;
}
if (errors != rhs.errors) {
if (errors)
errors->delPromiseRef();
errors = rhs.errors;
rhs.errors = 0;
}
}
private:
NetNotifiedQueueWithAcknowledgements<T>* queue;
SAV<Void>* errors;
};
template <class Ar, class T>
void save(Ar& ar, const ReplyPromiseStream<T>& value) {
auto const& ep = value.getEndpoint().token;
ar << ep;
}
template <class Ar, class T>
void load(Ar& ar, ReplyPromiseStream<T>& value) {
UID token;
ar >> token;
Endpoint endpoint = FlowTransport::transport().loadedEndpoint(token);
value = ReplyPromiseStream<T>(endpoint);
}
template <class T>
struct serializable_traits<ReplyPromiseStream<T>> : std::true_type {
template <class Archiver>
static void serialize(Archiver& ar, ReplyPromiseStream<T>& p) {
if constexpr (Archiver::isDeserializing) {
UID token;
serializer(ar, token);
auto endpoint = FlowTransport::transport().loadedEndpoint(token);
p = ReplyPromiseStream<T>(endpoint);
} else {
const auto& ep = p.getEndpoint().token;
serializer(ar, ep);
}
}
};
template <class T>
struct NetNotifiedQueue final : NotifiedQueue<T>, FlowReceiver, FastAllocated<NetNotifiedQueue<T>> {
using FastAllocated<NetNotifiedQueue<T>>::operator new;
@ -366,6 +681,30 @@ public:
}
}
// stream.getReplyStream( request )
// Unreliable at most once delivery.
// Registers the request with the remote endpoint which sends back a stream of replies, followed by an
// end_of_stream error. If the connection is ever broken the remote endpoint will stop attempting to send replies.
// The caller sends acknowledgements to the remote endpoint so that at most 2MB of replies is ever inflight.
template <class X>
ReplyPromiseStream<REPLYSTREAM_TYPE(X)> getReplyStream(const X& value) const {
if (queue->isRemoteEndpoint()) {
Future<Void> disc =
makeDependent<T>(IFailureMonitor::failureMonitor()).onDisconnectOrFailure(getEndpoint());
auto& p = getReplyPromiseStream(value);
Reference<Peer> peer =
FlowTransport::transport().sendUnreliable(SerializeSource<T>(value), getEndpoint(), true);
// FIXME: defer sending the message until we know the connection is established
endStreamOnDisconnect(disc, p, getEndpoint(), peer);
return p;
} else {
send(value);
auto& p = getReplyPromiseStream(value);
return p;
}
}
// stream.getReplyUnlessFailedFor( request, double sustainedFailureDuration, double sustainedFailureSlope )
// Reliable at least once delivery: Like getReply, delivers request at least once and returns one of the replies.
// However, if
@ -435,7 +774,7 @@ public:
// queue = (NetNotifiedQueue<T>*)0xdeadbeef;
}
Endpoint getEndpoint(TaskPriority taskID = TaskPriority::DefaultEndpoint) const {
const Endpoint& getEndpoint(TaskPriority taskID = TaskPriority::DefaultEndpoint) const {
return queue->getEndpoint(taskID);
}
void makeWellKnownEndpoint(Endpoint::Token token, TaskPriority taskID) {

View File

@ -197,6 +197,25 @@ struct PeerHolder {
}
};
// Implements getRepyStream, this a void actor with the same lifetime as the input ReplyPromiseStream.
// Because this actor holds a reference to the stream, normally it would be impossible to know when there are no other
// references. To get around this, there is a SAV inside the stream that has one less promise reference than it should
// (caused by getErrorFutureAndDelPromiseRef()). When that SAV gets a broken promise because no one besides this void
// actor is referencing it, this void actor will get a broken_promise dropping the final reference to the full
// ReplyPromiseStream
ACTOR template <class X>
void endStreamOnDisconnect(Future<Void> signal,
ReplyPromiseStream<X> stream,
Endpoint endpoint,
Reference<Peer> peer = Reference<Peer>()) {
state PeerHolder holder = PeerHolder(peer);
stream.setRequestStreamEndpoint(endpoint);
choose {
when(wait(signal)) { stream.sendError(connection_failed()); }
when(wait(stream.getErrorFutureAndDelPromiseRef())) {}
}
}
// Implements tryGetReply, getReplyUnlessFailedFor
ACTOR template <class X>
Future<ErrorOr<X>> waitValueOrSignal(Future<X> value,
@ -224,7 +243,6 @@ Future<ErrorOr<X>> waitValueOrSignal(Future<X> value,
// receiving the failure signal
if (e.code() != error_code_broken_promise || signal.isError())
return ErrorOr<X>(e);
IFailureMonitor::failureMonitor().endpointNotFound(endpoint);
value = Never();
}

View File

@ -6,6 +6,11 @@ set(FDBSERVER_SRCS
BackupProgress.actor.h
BackupWorker.actor.cpp
ClusterController.actor.cpp
ConfigBroadcaster.actor.cpp
ConfigBroadcaster.h
ConfigDatabaseUnitTests.actor.cpp
ConfigFollowerInterface.cpp
ConfigFollowerInterface.h
ConflictSet.h
CoordinatedState.actor.cpp
CoordinatedState.h
@ -23,6 +28,10 @@ set(FDBSERVER_SRCS
FDBExecHelper.actor.cpp
FDBExecHelper.actor.h
GrvProxyServer.actor.cpp
IConfigDatabaseNode.cpp
IConfigDatabaseNode.h
IConfigConsumer.cpp
IConfigConsumer.h
IDiskQueue.h
IKeyValueContainer.h
IKeyValueStore.h
@ -31,12 +40,13 @@ set(FDBSERVER_SRCS
KeyValueStoreMemory.actor.cpp
KeyValueStoreRocksDB.actor.cpp
KeyValueStoreSQLite.actor.cpp
Knobs.cpp
Knobs.h
LatencyBandConfig.cpp
LatencyBandConfig.h
LeaderElection.actor.cpp
LeaderElection.h
LocalConfiguration.actor.cpp
LocalConfiguration.h
LogProtocolMessage.h
LogRouter.actor.cpp
LogSystem.h
@ -58,7 +68,13 @@ set(FDBSERVER_SRCS
OldTLogServer_4_6.actor.cpp
OldTLogServer_6_0.actor.cpp
OldTLogServer_6_2.actor.cpp
OnDemandStore.actor.cpp
OnDemandStore.h
Orderer.actor.h
PaxosConfigConsumer.actor.cpp
PaxosConfigConsumer.h
PaxosConfigDatabaseNode.actor.cpp
PaxosConfigDatabaseNode.h
ProxyCommitData.actor.h
pubsub.actor.cpp
pubsub.h
@ -88,6 +104,9 @@ set(FDBSERVER_SRCS
ResolverInterface.h
ServerDBInfo.actor.h
ServerDBInfo.h
SimpleConfigConsumer.actor.cpp
SimpleConfigConsumer.h
SimpleConfigDatabaseNode.actor.cpp
SimulatedCluster.actor.cpp
SimulatedCluster.h
SkipList.cpp
@ -160,6 +179,7 @@ set(FDBSERVER_SRCS
workloads/FileSystem.actor.cpp
workloads/Fuzz.cpp
workloads/FuzzApiCorrectness.actor.cpp
workloads/GetRangeStream.actor.cpp
workloads/HealthMetricsApi.actor.cpp
workloads/IncrementalBackup.actor.cpp
workloads/Increment.actor.cpp

View File

@ -31,6 +31,7 @@
#include "fdbserver/CoordinationInterface.h"
#include "fdbserver/DataDistributorInterface.h"
#include "fdbserver/Knobs.h"
#include "fdbserver/ConfigBroadcaster.h"
#include "fdbserver/MoveKeys.actor.h"
#include "fdbserver/WorkerInterface.actor.h"
#include "fdbserver/LeaderElection.h"
@ -2765,7 +2766,9 @@ public:
Counter registerMasterRequests;
Counter statusRequests;
ClusterControllerData(ClusterControllerFullInterface const& ccInterface, LocalityData const& locality)
ClusterControllerData(ClusterControllerFullInterface const& ccInterface,
LocalityData const& locality,
ServerCoordinators const& coordinators)
: clusterControllerProcessId(locality.processId()), clusterControllerDcId(locality.dcId()), id(ccInterface.id()),
ac(false), outstandingRequestChecker(Void()), outstandingRemoteRequestChecker(Void()), gotProcessClasses(false),
gotFullyRecoveredConfig(false), startTime(now()), goodRecruitmentTime(Never()),
@ -2853,6 +2856,7 @@ ACTOR Future<Void> clusterWatchDatabase(ClusterControllerData* cluster, ClusterC
dbInfo.distributor = db->serverInfo->get().distributor;
dbInfo.ratekeeper = db->serverInfo->get().ratekeeper;
dbInfo.latencyBandConfig = db->serverInfo->get().latencyBandConfig;
dbInfo.configBroadcaster = db->serverInfo->get().configBroadcaster;
TraceEvent("CCWDB", cluster->id)
.detail("Lifetime", dbInfo.masterLifetime.toString())
@ -3669,7 +3673,8 @@ ACTOR Future<Void> timeKeeper(ClusterControllerData* self) {
ACTOR Future<Void> statusServer(FutureStream<StatusRequest> requests,
ClusterControllerData* self,
ServerCoordinators coordinators) {
ServerCoordinators coordinators,
ConfigBroadcaster const* configBroadcaster) {
// Seconds since the END of the last GetStatus executed
state double last_request_time = 0.0;
@ -3736,7 +3741,8 @@ ACTOR Future<Void> statusServer(FutureStream<StatusRequest> requests,
&self->db.clientStatus,
coordinators,
incompatibleConnections,
self->datacenterVersionDifference)));
self->datacenterVersionDifference,
configBroadcaster)));
if (result.isError() && result.getError().code() == error_code_actor_cancelled)
throw result.getError();
@ -4424,15 +4430,23 @@ ACTOR Future<Void> dbInfoUpdater(ClusterControllerData* self) {
ACTOR Future<Void> clusterControllerCore(ClusterControllerFullInterface interf,
Future<Void> leaderFail,
ServerCoordinators coordinators,
LocalityData locality) {
state ClusterControllerData self(interf, locality);
LocalityData locality,
UseConfigDB useConfigDB) {
state ClusterControllerData self(interf, locality, coordinators);
state ConfigBroadcaster configBroadcaster(coordinators, useConfigDB);
state Future<Void> coordinationPingDelay = delay(SERVER_KNOBS->WORKER_COORDINATION_PING_DELAY);
state uint64_t step = 0;
state Future<ErrorOr<Void>> error = errorOr(actorCollection(self.addActor.getFuture()));
if (useConfigDB != UseConfigDB::DISABLED) {
self.addActor.send(configBroadcaster.serve(self.db.serverInfo->get().configBroadcaster));
}
self.addActor.send(clusterWatchDatabase(&self, &self.db)); // Start the master database
self.addActor.send(self.updateWorkerList.init(self.db.db));
self.addActor.send(statusServer(interf.clientInterface.databaseStatus.getFuture(), &self, coordinators));
self.addActor.send(statusServer(interf.clientInterface.databaseStatus.getFuture(),
&self,
coordinators,
(useConfigDB == UseConfigDB::DISABLED) ? nullptr : &configBroadcaster));
self.addActor.send(timeKeeper(&self));
self.addActor.send(monitorProcessClasses(&self));
self.addActor.send(monitorServerInfoConfig(&self.db));
@ -4550,7 +4564,8 @@ ACTOR Future<Void> clusterController(ServerCoordinators coordinators,
Reference<AsyncVar<Optional<ClusterControllerFullInterface>>> currentCC,
bool hasConnected,
Reference<AsyncVar<ClusterControllerPriorityInfo>> asyncPriorityInfo,
LocalityData locality) {
LocalityData locality,
UseConfigDB useConfigDB) {
loop {
state ClusterControllerFullInterface cci;
state bool inRole = false;
@ -4577,7 +4592,7 @@ ACTOR Future<Void> clusterController(ServerCoordinators coordinators,
startRole(Role::CLUSTER_CONTROLLER, cci.id(), UID());
inRole = true;
wait(clusterControllerCore(cci, leaderFail, coordinators, locality));
wait(clusterControllerCore(cci, leaderFail, coordinators, locality, useConfigDB));
}
} catch (Error& e) {
if (inRole)
@ -4600,13 +4615,14 @@ ACTOR Future<Void> clusterController(Reference<ClusterConnectionFile> connFile,
Reference<AsyncVar<Optional<ClusterControllerFullInterface>>> currentCC,
Reference<AsyncVar<ClusterControllerPriorityInfo>> asyncPriorityInfo,
Future<Void> recoveredDiskFiles,
LocalityData locality) {
LocalityData locality,
UseConfigDB useConfigDB) {
wait(recoveredDiskFiles);
state bool hasConnected = false;
loop {
try {
ServerCoordinators coordinators(connFile);
wait(clusterController(coordinators, currentCC, hasConnected, asyncPriorityInfo, locality));
wait(clusterController(coordinators, currentCC, hasConnected, asyncPriorityInfo, locality, useConfigDB));
} catch (Error& e) {
if (e.code() != error_code_coordinators_changed)
throw; // Expected to terminate fdbserver

View File

@ -0,0 +1,139 @@
/*
* ConfigBroadcastFollowerInterface.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbserver/ConfigFollowerInterface.h"
class ConfigClassSet {
std::set<Key> classes;
public:
static constexpr FileIdentifier file_identifier = 9854021;
bool operator==(ConfigClassSet const& rhs) const { return classes == rhs.classes; }
bool operator!=(ConfigClassSet const& rhs) const { return !(*this == rhs); }
ConfigClassSet() = default;
ConfigClassSet(VectorRef<KeyRef> configClasses) {
for (const auto& configClass : configClasses) {
classes.insert(configClass);
}
}
bool contains(KeyRef configClass) const { return classes.count(configClass); }
std::set<Key> const& getClasses() const { return classes; }
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, classes);
}
};
template <>
struct Traceable<ConfigClassSet> : std::true_type {
static std::string toString(ConfigClassSet const& value) { return describe(value.getClasses()); }
};
struct ConfigBroadcastFollowerGetSnapshotReply {
static constexpr FileIdentifier file_identifier = 8701983;
Version version{ 0 };
std::map<ConfigKey, KnobValue> snapshot;
ConfigBroadcastFollowerGetSnapshotReply() = default;
template <class Snapshot>
explicit ConfigBroadcastFollowerGetSnapshotReply(Version version, Snapshot&& snapshot)
: version(version), snapshot(std::forward<Snapshot>(snapshot)) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, version, snapshot);
}
};
struct ConfigBroadcastFollowerGetSnapshotRequest {
static constexpr FileIdentifier file_identifier = 10911924;
ConfigClassSet configClassSet;
ReplyPromise<ConfigBroadcastFollowerGetSnapshotReply> reply;
ConfigBroadcastFollowerGetSnapshotRequest() = default;
explicit ConfigBroadcastFollowerGetSnapshotRequest(ConfigClassSet const& configClassSet)
: configClassSet(configClassSet) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, configClassSet, reply);
}
};
struct ConfigBroadcastFollowerGetChangesReply {
static constexpr FileIdentifier file_identifier = 4014927;
Version mostRecentVersion;
Standalone<VectorRef<VersionedConfigMutationRef>> changes;
ConfigBroadcastFollowerGetChangesReply()=default;
explicit ConfigBroadcastFollowerGetChangesReply(Version mostRecentVersion, Standalone<VectorRef<VersionedConfigMutationRef>> const& changes)
: mostRecentVersion(mostRecentVersion), changes(changes) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, mostRecentVersion, changes);
}
};
struct ConfigBroadcastFollowerGetChangesRequest {
static constexpr FileIdentifier file_identifier = 601280;
Version lastSeenVersion;
ConfigClassSet configClassSet;
ReplyPromise<ConfigBroadcastFollowerGetChangesReply> reply;
ConfigBroadcastFollowerGetChangesRequest() = default;
explicit ConfigBroadcastFollowerGetChangesRequest(Version lastSeenVersion, ConfigClassSet const& configClassSet)
: lastSeenVersion(lastSeenVersion), configClassSet(configClassSet) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, lastSeenVersion, configClassSet, reply);
}
};
/*
* The ConfigBroadcaster serves a ConfigBroadcastFollowerInterface which all
* workers use to fetch updates.
*/
class ConfigBroadcastFollowerInterface {
UID _id;
public:
static constexpr FileIdentifier file_identifier = 1984391;
RequestStream<ConfigBroadcastFollowerGetSnapshotRequest> getSnapshot;
RequestStream<ConfigBroadcastFollowerGetChangesRequest> getChanges;
ConfigBroadcastFollowerInterface() : _id(deterministicRandom()->randomUniqueID()) {}
bool operator==(ConfigBroadcastFollowerInterface const& rhs) const { return (_id == rhs._id); }
bool operator!=(ConfigBroadcastFollowerInterface const& rhs) const { return !(*this == rhs); }
UID id() const { return _id; }
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, _id, getSnapshot, getChanges);
}
};

View File

@ -0,0 +1,478 @@
/*
* ConfigBroadcaster.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <algorithm>
#include "fdbserver/ConfigBroadcaster.h"
#include "fdbserver/Knobs.h"
#include "fdbserver/IConfigConsumer.h"
#include "flow/UnitTest.h"
#include "flow/actorcompiler.h" // must be last include
namespace {
bool matchesConfigClass(ConfigClassSet const& configClassSet, Optional<KeyRef> configClass) {
return !configClass.present() || configClassSet.contains(configClass.get());
}
// Helper functions for STL containers, with flow-friendly error handling
template <class MapContainer, class K>
auto get(MapContainer& m, K const& k) -> decltype(m.at(k)) {
auto it = m.find(k);
ASSERT(it != m.end());
return it->second;
}
template <class Container, class K>
void remove(Container& container, K const& k) {
auto it = container.find(k);
ASSERT(it != container.end());
container.erase(it);
}
} // namespace
class ConfigBroadcasterImpl {
// PendingRequestStore stores a set of pending ConfigBroadcastFollowerGetChangesRequests,
// indexed by configuration class. When an update is received, replies are sent for all
// pending requests with affected configuration classes
class PendingRequestStore {
using Req = ConfigBroadcastFollowerGetChangesRequest;
std::map<Key, std::set<Endpoint::Token>> configClassToTokens;
std::map<Endpoint::Token, Req> tokenToRequest;
public:
void addRequest(Req const& req) {
auto token = req.reply.getEndpoint().token;
tokenToRequest[token] = req;
for (const auto& configClass : req.configClassSet.getClasses()) {
configClassToTokens[configClass].insert(token);
}
}
std::vector<Req> getRequestsToNotify(Standalone<VectorRef<VersionedConfigMutationRef>> const& changes) const {
std::set<Endpoint::Token> tokenSet;
for (const auto& change : changes) {
if (!change.mutation.getConfigClass().present()) {
// Update everything
for (const auto& [token, req] : tokenToRequest) {
if (req.lastSeenVersion < change.version) {
tokenSet.insert(token);
}
}
} else {
Key configClass = change.mutation.getConfigClass().get();
if (configClassToTokens.count(configClass)) {
auto tokens = get(configClassToTokens, Key(change.mutation.getConfigClass().get()));
for (const auto& token : tokens) {
auto req = get(tokenToRequest, token);
if (req.lastSeenVersion < change.version) {
tokenSet.insert(token);
} else {
TEST(true); // Worker is ahead of config broadcaster
}
}
}
}
}
std::vector<Req> result;
for (const auto& token : tokenSet) {
result.push_back(get(tokenToRequest, token));
}
return result;
}
std::vector<Req> getOutdatedRequests(Version newSnapshotVersion) {
std::vector<Req> result;
for (const auto& [token, req] : tokenToRequest) {
if (req.lastSeenVersion < newSnapshotVersion) {
result.push_back(req);
}
}
return result;
}
void removeRequest(Req const& req) {
auto token = req.reply.getEndpoint().token;
for (const auto& configClass : req.configClassSet.getClasses()) {
remove(get(configClassToTokens, configClass), token);
// TODO: Don't leak config classes
}
remove(tokenToRequest, token);
}
} pending;
std::map<ConfigKey, KnobValue> snapshot;
std::deque<VersionedConfigMutation> mutationHistory;
std::deque<VersionedConfigCommitAnnotation> annotationHistory;
Version lastCompactedVersion;
Version mostRecentVersion;
std::unique_ptr<IConfigConsumer> consumer;
ActorCollection actors{ false };
UID id;
CounterCollection cc;
Counter compactRequest;
mutable Counter successfulChangeRequest;
Counter failedChangeRequest;
Counter snapshotRequest;
Future<Void> logger;
template <class Changes>
void sendChangesReply(ConfigBroadcastFollowerGetChangesRequest const& req, Changes const& changes) const {
ASSERT_LT(req.lastSeenVersion, mostRecentVersion);
ConfigBroadcastFollowerGetChangesReply reply;
reply.mostRecentVersion = mostRecentVersion;
for (const auto& versionedMutation : changes) {
if (versionedMutation.version > req.lastSeenVersion &&
matchesConfigClass(req.configClassSet, versionedMutation.mutation.getConfigClass())) {
TraceEvent te(SevDebug, "ConfigBroadcasterSendingChangeMutation", id);
te.detail("Version", versionedMutation.version)
.detail("ReqLastSeenVersion", req.lastSeenVersion)
.detail("ConfigClass", versionedMutation.mutation.getConfigClass())
.detail("KnobName", versionedMutation.mutation.getKnobName());
if (versionedMutation.mutation.isSet()) {
te.detail("Op", "Set").detail("KnobValue", versionedMutation.mutation.getValue().toString());
} else {
te.detail("Op", "Clear");
}
reply.changes.push_back_deep(reply.changes.arena(), versionedMutation);
}
}
req.reply.send(reply);
++successfulChangeRequest;
}
ACTOR static Future<Void> serve(ConfigBroadcaster* self,
ConfigBroadcasterImpl* impl,
ConfigBroadcastFollowerInterface cbfi) {
impl->actors.add(impl->consumer->consume(*self));
loop {
choose {
when(ConfigBroadcastFollowerGetSnapshotRequest req = waitNext(cbfi.getSnapshot.getFuture())) {
++impl->snapshotRequest;
ConfigBroadcastFollowerGetSnapshotReply reply;
for (const auto& [key, value] : impl->snapshot) {
if (matchesConfigClass(req.configClassSet, key.configClass)) {
reply.snapshot[key] = value;
}
}
reply.version = impl->mostRecentVersion;
TraceEvent(SevDebug, "ConfigBroadcasterGotSnapshotRequest", impl->id)
.detail("Size", reply.snapshot.size())
.detail("Version", reply.version);
req.reply.send(reply);
}
when(ConfigBroadcastFollowerGetChangesRequest req = waitNext(cbfi.getChanges.getFuture())) {
if (req.lastSeenVersion < impl->lastCompactedVersion) {
req.reply.sendError(version_already_compacted());
++impl->failedChangeRequest;
continue;
}
if (req.lastSeenVersion < impl->mostRecentVersion) {
impl->sendChangesReply(req, impl->mutationHistory);
} else {
TEST(req.lastSeenVersion > impl->mostRecentVersion); // Worker is ahead of ConfigBroadcaster
TraceEvent(SevDebug, "ConfigBroadcasterRegisteringChangeRequest", impl->id)
.detail("Peer", req.reply.getEndpoint().getPrimaryAddress())
.detail("MostRecentVersion", impl->mostRecentVersion)
.detail("ReqLastSeenVersion", req.lastSeenVersion)
.detail("ConfigClass", req.configClassSet);
impl->pending.addRequest(req);
}
}
when(wait(impl->actors.getResult())) { ASSERT(false); }
}
}
}
ConfigBroadcasterImpl()
: id(deterministicRandom()->randomUniqueID()), lastCompactedVersion(0), mostRecentVersion(0),
cc("ConfigBroadcaster"), compactRequest("CompactRequest", cc),
successfulChangeRequest("SuccessfulChangeRequest", cc), failedChangeRequest("FailedChangeRequest", cc),
snapshotRequest("SnapshotRequest", cc) {
logger = traceCounters(
"ConfigBroadcasterMetrics", id, SERVER_KNOBS->WORKER_LOGGING_INTERVAL, &cc, "ConfigBroadcasterMetrics");
}
void notifyFollowers(Standalone<VectorRef<VersionedConfigMutationRef>> const& changes) {
auto toNotify = pending.getRequestsToNotify(changes);
TraceEvent(SevDebug, "ConfigBroadcasterNotifyingFollowers", id)
.detail("ChangesSize", changes.size())
.detail("ToNotify", toNotify.size());
for (auto& req : toNotify) {
sendChangesReply(req, changes);
pending.removeRequest(req);
}
}
void notifyOutdatedRequests() {
auto outdated = pending.getOutdatedRequests(mostRecentVersion);
for (auto& req : outdated) {
req.reply.sendError(version_already_compacted());
pending.removeRequest(req);
}
}
void addChanges(Standalone<VectorRef<VersionedConfigMutationRef>> const& changes,
Version mostRecentVersion,
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> const& annotations) {
this->mostRecentVersion = mostRecentVersion;
mutationHistory.insert(mutationHistory.end(), changes.begin(), changes.end());
annotationHistory.insert(annotationHistory.end(), annotations.begin(), annotations.end());
for (const auto& change : changes) {
const auto& mutation = change.mutation;
if (mutation.isSet()) {
snapshot[mutation.getKey()] = mutation.getValue();
} else {
snapshot.erase(mutation.getKey());
}
}
}
template <class Snapshot>
Future<Void> setSnapshot(Snapshot&& snapshot, Version snapshotVersion) {
this->snapshot = std::forward<Snapshot>(snapshot);
this->lastCompactedVersion = snapshotVersion;
return Void();
}
public:
Future<Void> serve(ConfigBroadcaster* self, ConfigBroadcastFollowerInterface const& cbfi) {
return serve(self, this, cbfi);
}
void applyChanges(Standalone<VectorRef<VersionedConfigMutationRef>> const& changes,
Version mostRecentVersion,
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> const& annotations) {
TraceEvent(SevDebug, "ConfigBroadcasterApplyingChanges", id)
.detail("ChangesSize", changes.size())
.detail("CurrentMostRecentVersion", this->mostRecentVersion)
.detail("NewMostRecentVersion", mostRecentVersion)
.detail("AnnotationsSize", annotations.size());
addChanges(changes, mostRecentVersion, annotations);
notifyFollowers(changes);
}
template <class Snapshot>
void applySnapshotAndChanges(Snapshot&& snapshot,
Version snapshotVersion,
Standalone<VectorRef<VersionedConfigMutationRef>> const& changes,
Version changesVersion,
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> const& annotations) {
TraceEvent(SevDebug, "ConfigBroadcasterApplyingSnapshotAndChanges", id)
.detail("CurrentMostRecentVersion", this->mostRecentVersion)
.detail("SnapshotSize", snapshot.size())
.detail("SnapshotVersion", snapshotVersion)
.detail("ChangesSize", changes.size())
.detail("ChangesVersion", changesVersion)
.detail("AnnotationsSize", annotations.size());
setSnapshot(std::forward<Snapshot>(snapshot), snapshotVersion);
addChanges(changes, changesVersion, annotations);
notifyOutdatedRequests();
}
ConfigBroadcasterImpl(ConfigFollowerInterface const& cfi) : ConfigBroadcasterImpl() {
consumer = IConfigConsumer::createTestSimple(cfi, 0.5, Optional<double>{});
TraceEvent(SevDebug, "ConfigBroadcasterStartingConsumer", id).detail("Consumer", consumer->getID());
}
ConfigBroadcasterImpl(ServerCoordinators const& coordinators, UseConfigDB useConfigDB) : ConfigBroadcasterImpl() {
if (useConfigDB != UseConfigDB::DISABLED) {
if (useConfigDB == UseConfigDB::SIMPLE) {
consumer = IConfigConsumer::createSimple(coordinators, 0.5, Optional<double>{});
} else {
consumer = IConfigConsumer::createPaxos(coordinators, 0.5, Optional<double>{});
}
TraceEvent(SevDebug, "BroadcasterStartingConsumer", id)
.detail("Consumer", consumer->getID())
.detail("UsingSimpleConsumer", useConfigDB == UseConfigDB::SIMPLE);
}
}
JsonBuilderObject getStatus() const {
JsonBuilderObject result;
JsonBuilderArray mutationsArray;
for (const auto& versionedMutation : mutationHistory) {
JsonBuilderObject mutationObject;
mutationObject["version"] = versionedMutation.version;
const auto& mutation = versionedMutation.mutation;
mutationObject["config_class"] = mutation.getConfigClass().orDefault("<global>"_sr);
mutationObject["knob_name"] = mutation.getKnobName();
mutationObject["knob_value"] = mutation.getValue().toString();
mutationsArray.push_back(std::move(mutationObject));
}
result["mutations"] = std::move(mutationsArray);
JsonBuilderArray commitsArray;
for (const auto& versionedAnnotation : annotationHistory) {
JsonBuilderObject commitObject;
commitObject["version"] = versionedAnnotation.version;
commitObject["description"] = versionedAnnotation.annotation.description;
commitObject["timestamp"] = versionedAnnotation.annotation.timestamp;
commitsArray.push_back(std::move(commitObject));
}
result["commits"] = std::move(commitsArray);
JsonBuilderObject snapshotObject;
std::map<Optional<Key>, std::vector<std::pair<Key, Value>>> snapshotMap;
for (const auto& [configKey, value] : snapshot) {
snapshotMap[configKey.configClass.castTo<Key>()].emplace_back(configKey.knobName, value.toString());
}
for (const auto& [configClass, kvs] : snapshotMap) {
JsonBuilderObject kvsObject;
for (const auto& [knobName, knobValue] : kvs) {
kvsObject[knobName] = knobValue;
}
snapshotObject[configClass.orDefault("<global>"_sr)] = std::move(kvsObject);
}
result["snapshot"] = std::move(snapshotObject);
result["last_compacted_version"] = lastCompactedVersion;
result["most_recent_version"] = mostRecentVersion;
return result;
}
void compact(Version compactionVersion) {
{
auto it = std::find_if(mutationHistory.begin(), mutationHistory.end(), [compactionVersion](const auto& vm) {
return vm.version > compactionVersion;
});
mutationHistory.erase(mutationHistory.begin(), it);
}
{
auto it = std::find_if(annotationHistory.begin(),
annotationHistory.end(),
[compactionVersion](const auto& va) { return va.version > compactionVersion; });
annotationHistory.erase(annotationHistory.begin(), it);
}
}
UID getID() const { return id; }
static void runPendingRequestStoreTest(bool includeGlobalMutation, int expectedMatches);
};
ConfigBroadcaster::ConfigBroadcaster(ConfigFollowerInterface const& cfi)
: _impl(std::make_unique<ConfigBroadcasterImpl>(cfi)) {}
ConfigBroadcaster::ConfigBroadcaster(ServerCoordinators const& coordinators, UseConfigDB useConfigDB)
: _impl(std::make_unique<ConfigBroadcasterImpl>(coordinators, useConfigDB)) {}
ConfigBroadcaster::ConfigBroadcaster(ConfigBroadcaster&&) = default;
ConfigBroadcaster& ConfigBroadcaster::operator=(ConfigBroadcaster&&) = default;
ConfigBroadcaster::~ConfigBroadcaster() = default;
Future<Void> ConfigBroadcaster::serve(ConfigBroadcastFollowerInterface const& cbfi) {
return impl().serve(this, cbfi);
}
void ConfigBroadcaster::applyChanges(Standalone<VectorRef<VersionedConfigMutationRef>> const& changes,
Version mostRecentVersion,
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> const& annotations) {
impl().applyChanges(changes, mostRecentVersion, annotations);
}
void ConfigBroadcaster::applySnapshotAndChanges(
std::map<ConfigKey, KnobValue> const& snapshot,
Version snapshotVersion,
Standalone<VectorRef<VersionedConfigMutationRef>> const& changes,
Version changesVersion,
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> const& annotations) {
impl().applySnapshotAndChanges(snapshot, snapshotVersion, changes, changesVersion, annotations);
}
void ConfigBroadcaster::applySnapshotAndChanges(
std::map<ConfigKey, KnobValue>&& snapshot,
Version snapshotVersion,
Standalone<VectorRef<VersionedConfigMutationRef>> const& changes,
Version changesVersion,
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> const& annotations) {
impl().applySnapshotAndChanges(std::move(snapshot), snapshotVersion, changes, changesVersion, annotations);
}
UID ConfigBroadcaster::getID() const {
return impl().getID();
}
JsonBuilderObject ConfigBroadcaster::getStatus() const {
return impl().getStatus();
}
void ConfigBroadcaster::compact(Version compactionVersion) {
impl().compact(compactionVersion);
}
namespace {
Standalone<VectorRef<VersionedConfigMutationRef>> getTestChanges(Version version, bool includeGlobalMutation) {
Standalone<VectorRef<VersionedConfigMutationRef>> changes;
if (includeGlobalMutation) {
ConfigKey key = ConfigKeyRef({}, "test_long"_sr);
auto value = KnobValue::create(int64_t{ 5 });
ConfigMutation mutation = ConfigMutationRef(key, value.contents());
changes.emplace_back_deep(changes.arena(), version, mutation);
}
{
ConfigKey key = ConfigKeyRef("class-A"_sr, "test_long"_sr);
auto value = KnobValue::create(int64_t{ 5 });
ConfigMutation mutation = ConfigMutationRef(key, value.contents());
changes.emplace_back_deep(changes.arena(), version, mutation);
}
return changes;
}
ConfigBroadcastFollowerGetChangesRequest getTestRequest(Version lastSeenVersion,
std::vector<KeyRef> const& configClasses) {
Standalone<VectorRef<KeyRef>> configClassesVector;
for (const auto& configClass : configClasses) {
configClassesVector.push_back_deep(configClassesVector.arena(), configClass);
}
return ConfigBroadcastFollowerGetChangesRequest{ lastSeenVersion, ConfigClassSet{ configClassesVector } };
}
} // namespace
void ConfigBroadcasterImpl::runPendingRequestStoreTest(bool includeGlobalMutation, int expectedMatches) {
PendingRequestStore pending;
for (Version v = 0; v < 5; ++v) {
pending.addRequest(getTestRequest(v, {}));
pending.addRequest(getTestRequest(v, { "class-A"_sr }));
pending.addRequest(getTestRequest(v, { "class-B"_sr }));
pending.addRequest(getTestRequest(v, { "class-A"_sr, "class-B"_sr }));
}
auto toNotify = pending.getRequestsToNotify(getTestChanges(0, includeGlobalMutation));
ASSERT_EQ(toNotify.size(), 0);
for (Version v = 1; v <= 5; ++v) {
auto toNotify = pending.getRequestsToNotify(getTestChanges(v, includeGlobalMutation));
ASSERT_EQ(toNotify.size(), expectedMatches);
for (const auto& req : toNotify) {
pending.removeRequest(req);
}
}
}
TEST_CASE("/fdbserver/ConfigDB/ConfigBroadcaster/Internal/PendingRequestStore/Simple") {
ConfigBroadcasterImpl::runPendingRequestStoreTest(false, 2);
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/ConfigBroadcaster/Internal/PendingRequestStore/GlobalMutation") {
ConfigBroadcasterImpl::runPendingRequestStoreTest(true, 4);
return Void();
}

View File

@ -0,0 +1,66 @@
/*
* ConfigBroadcaster.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbclient/CoordinationInterface.h"
#include "fdbclient/JsonBuilder.h"
#include "fdbserver/CoordinationInterface.h"
#include "fdbserver/ConfigBroadcastFollowerInterface.h"
#include "fdbserver/ConfigFollowerInterface.h"
#include "flow/flow.h"
#include <memory>
/*
* The configuration broadcaster runs on the cluster controller. The broadcaster listens uses
* an IConfigConsumer instantitation to consume updates from the configuration database, and broadcasts
* these updates to all workers' local configurations
*/
class ConfigBroadcaster {
std::unique_ptr<class ConfigBroadcasterImpl> _impl;
ConfigBroadcasterImpl& impl() { return *_impl; }
ConfigBroadcasterImpl const& impl() const { return *_impl; }
public:
explicit ConfigBroadcaster(ServerCoordinators const&, UseConfigDB);
ConfigBroadcaster(ConfigBroadcaster&&);
ConfigBroadcaster& operator=(ConfigBroadcaster&&);
~ConfigBroadcaster();
Future<Void> serve(ConfigBroadcastFollowerInterface const&);
void applyChanges(Standalone<VectorRef<VersionedConfigMutationRef>> const& changes,
Version mostRecentVersion,
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> const& annotations);
void applySnapshotAndChanges(std::map<ConfigKey, KnobValue> const& snapshot,
Version snapshotVersion,
Standalone<VectorRef<VersionedConfigMutationRef>> const& changes,
Version changesVersion,
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> const& annotations);
void applySnapshotAndChanges(std::map<ConfigKey, KnobValue>&& snapshot,
Version snapshotVersion,
Standalone<VectorRef<VersionedConfigMutationRef>> const& changes,
Version changesVersion,
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> const& annotations);
UID getID() const;
JsonBuilderObject getStatus() const;
void compact(Version compactionVersion);
public: // Testing
explicit ConfigBroadcaster(ConfigFollowerInterface const&);
};

View File

@ -0,0 +1,766 @@
/*
* ConfigDatabaseUnitTests.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/CoordinationInterface.h"
#include "fdbclient/IConfigTransaction.h"
#include "fdbclient/TestKnobCollection.h"
#include "fdbserver/ConfigBroadcaster.h"
#include "fdbserver/IConfigDatabaseNode.h"
#include "fdbserver/LocalConfiguration.h"
#include "fdbclient/Tuple.h"
#include "flow/UnitTest.h"
#include "flow/actorcompiler.h" // must be last include
namespace {
Key encodeConfigKey(Optional<KeyRef> configClass, KeyRef knobName) {
Tuple tuple;
if (configClass.present()) {
tuple.append(configClass.get());
} else {
tuple.appendNull();
}
tuple << knobName;
return tuple.pack();
}
void appendVersionedMutation(Standalone<VectorRef<VersionedConfigMutationRef>>& versionedMutations,
Version version,
Optional<KeyRef> configClass,
KeyRef knobName,
Optional<KnobValueRef> knobValue) {
auto configKey = ConfigKeyRef(configClass, knobName);
auto mutation = ConfigMutationRef(configKey, knobValue);
versionedMutations.emplace_back_deep(versionedMutations.arena(), version, mutation);
}
class WriteToTransactionEnvironment {
std::string dataDir;
ConfigTransactionInterface cti;
ConfigFollowerInterface cfi;
Reference<IConfigDatabaseNode> node;
Future<Void> ctiServer;
Future<Void> cfiServer;
Version lastWrittenVersion{ 0 };
static Value longToValue(int64_t v) {
auto s = format("%ld", v);
return StringRef(reinterpret_cast<uint8_t const*>(s.c_str()), s.size());
}
ACTOR template <class T>
static Future<Void> set(WriteToTransactionEnvironment* self,
Optional<KeyRef> configClass,
T value,
KeyRef knobName) {
state Reference<IConfigTransaction> tr = IConfigTransaction::createTestSimple(self->cti);
auto configKey = encodeConfigKey(configClass, knobName);
tr->set(configKey, longToValue(value));
wait(tr->commit());
self->lastWrittenVersion = tr->getCommittedVersion();
return Void();
}
ACTOR static Future<Void> clear(WriteToTransactionEnvironment* self, Optional<KeyRef> configClass) {
state Reference<IConfigTransaction> tr = IConfigTransaction::createTestSimple(self->cti);
auto configKey = encodeConfigKey(configClass, "test_long"_sr);
tr->clear(configKey);
wait(tr->commit());
self->lastWrittenVersion = tr->getCommittedVersion();
return Void();
}
void setup() {
ctiServer = node->serve(cti);
cfiServer = node->serve(cfi);
}
public:
WriteToTransactionEnvironment(std::string const& dataDir)
: dataDir(dataDir), node(IConfigDatabaseNode::createSimple(dataDir)) {
platform::eraseDirectoryRecursive(dataDir);
setup();
}
template <class T>
Future<Void> set(Optional<KeyRef> configClass, T value, KeyRef knobName = "test_long"_sr) {
return set(this, configClass, value, knobName);
}
Future<Void> clear(Optional<KeyRef> configClass) { return clear(this, configClass); }
Future<Void> compact() { return cfi.compact.getReply(ConfigFollowerCompactRequest{ lastWrittenVersion }); }
void restartNode() {
cfiServer.cancel();
ctiServer.cancel();
node = IConfigDatabaseNode::createSimple(dataDir);
setup();
}
ConfigTransactionInterface getTransactionInterface() const { return cti; }
ConfigFollowerInterface getFollowerInterface() const { return cfi; }
Future<Void> getError() const { return cfiServer || ctiServer; }
};
class ReadFromLocalConfigEnvironment {
UID id;
std::string dataDir;
LocalConfiguration localConfiguration;
Reference<IDependentAsyncVar<ConfigBroadcastFollowerInterface> const> cbfi;
Future<Void> consumer;
ACTOR static Future<Void> checkEventually(LocalConfiguration const* localConfiguration,
Optional<int64_t> expected) {
state double lastMismatchTime = now();
loop {
if (localConfiguration->getTestKnobs().TEST_LONG == expected.orDefault(0)) {
return Void();
}
if (now() > lastMismatchTime + 1.0) {
TraceEvent(SevWarn, "CheckEventuallyStillChecking")
.detail("Expected", expected.present() ? expected.get() : 0)
.detail("TestLong", localConfiguration->getTestKnobs().TEST_LONG);
lastMismatchTime = now();
}
wait(delayJittered(0.1));
}
}
ACTOR static Future<Void> setup(ReadFromLocalConfigEnvironment* self) {
wait(self->localConfiguration.initialize());
if (self->cbfi) {
self->consumer = self->localConfiguration.consume(self->cbfi);
}
return Void();
}
public:
ReadFromLocalConfigEnvironment(std::string const& dataDir,
std::string const& configPath,
std::map<std::string, std::string> const& manualKnobOverrides)
: dataDir(dataDir), localConfiguration(dataDir, configPath, manualKnobOverrides, IsTest::YES), consumer(Never()) {
}
Future<Void> setup() { return setup(this); }
Future<Void> restartLocalConfig(std::string const& newConfigPath) {
localConfiguration = LocalConfiguration(dataDir, newConfigPath, {}, IsTest::YES);
return setup();
}
void connectToBroadcaster(Reference<IDependentAsyncVar<ConfigBroadcastFollowerInterface> const> const& cbfi) {
ASSERT(!this->cbfi);
this->cbfi = cbfi;
consumer = localConfiguration.consume(cbfi);
}
void checkImmediate(Optional<int64_t> expected) const {
if (expected.present()) {
ASSERT_EQ(localConfiguration.getTestKnobs().TEST_LONG, expected.get());
} else {
ASSERT_EQ(localConfiguration.getTestKnobs().TEST_LONG, 0);
}
}
Future<Void> checkEventually(Optional<int64_t> expected) const {
return checkEventually(&localConfiguration, expected);
}
LocalConfiguration& getMutableLocalConfiguration() { return localConfiguration; }
Future<Void> getError() const { return consumer; }
};
class LocalConfigEnvironment {
ReadFromLocalConfigEnvironment readFrom;
Version lastWrittenVersion{ 0 };
Future<Void> addMutation(Optional<KeyRef> configClass, Optional<KnobValueRef> value) {
Standalone<VectorRef<VersionedConfigMutationRef>> versionedMutations;
appendVersionedMutation(versionedMutations, ++lastWrittenVersion, configClass, "test_long"_sr, value);
return readFrom.getMutableLocalConfiguration().addChanges(versionedMutations, lastWrittenVersion);
}
public:
LocalConfigEnvironment(std::string const& dataDir,
std::string const& configPath,
std::map<std::string, std::string> const& manualKnobOverrides = {})
: readFrom(dataDir, configPath, manualKnobOverrides) {}
Future<Void> setup() { return readFrom.setup(); }
Future<Void> restartLocalConfig(std::string const& newConfigPath) {
return readFrom.restartLocalConfig(newConfigPath);
}
Future<Void> getError() const { return Never(); }
Future<Void> clear(Optional<KeyRef> configClass) { return addMutation(configClass, {}); }
Future<Void> set(Optional<KeyRef> configClass, int64_t value) {
auto knobValue = KnobValueRef::create(value);
return addMutation(configClass, knobValue.contents());
}
void check(Optional<int64_t> value) const { return readFrom.checkImmediate(value); }
};
class BroadcasterToLocalConfigEnvironment {
ReadFromLocalConfigEnvironment readFrom;
Reference<AsyncVar<ConfigBroadcastFollowerInterface>> cbfi;
ConfigBroadcaster broadcaster;
Version lastWrittenVersion{ 0 };
Future<Void> broadcastServer;
ACTOR static Future<Void> setup(BroadcasterToLocalConfigEnvironment* self) {
wait(self->readFrom.setup());
self->readFrom.connectToBroadcaster(IDependentAsyncVar<ConfigBroadcastFollowerInterface>::create(self->cbfi));
self->broadcastServer = self->broadcaster.serve(self->cbfi->get());
return Void();
}
void addMutation(Optional<KeyRef> configClass, KnobValueRef value) {
Standalone<VectorRef<VersionedConfigMutationRef>> versionedMutations;
appendVersionedMutation(versionedMutations, ++lastWrittenVersion, configClass, "test_long"_sr, value);
broadcaster.applyChanges(versionedMutations, lastWrittenVersion, {});
}
public:
BroadcasterToLocalConfigEnvironment(std::string const& dataDir, std::string const& configPath)
: broadcaster(ConfigFollowerInterface{}), cbfi(makeReference<AsyncVar<ConfigBroadcastFollowerInterface>>()),
readFrom(dataDir, configPath, {}) {}
Future<Void> setup() { return setup(this); }
void set(Optional<KeyRef> configClass, int64_t value) {
auto knobValue = KnobValueRef::create(value);
addMutation(configClass, knobValue.contents());
}
void clear(Optional<KeyRef> configClass) { addMutation(configClass, {}); }
Future<Void> check(Optional<int64_t> value) const { return readFrom.checkEventually(value); }
void changeBroadcaster() {
broadcastServer.cancel();
cbfi->set(ConfigBroadcastFollowerInterface{});
broadcastServer = broadcaster.serve(cbfi->get());
}
Future<Void> restartLocalConfig(std::string const& newConfigPath) {
return readFrom.restartLocalConfig(newConfigPath);
}
void compact() { broadcaster.compact(lastWrittenVersion); }
Future<Void> getError() const { return readFrom.getError() || broadcastServer; }
};
class TransactionEnvironment {
WriteToTransactionEnvironment writeTo;
ACTOR static Future<Void> check(TransactionEnvironment* self,
Optional<KeyRef> configClass,
Optional<int64_t> expected) {
state Reference<IConfigTransaction> tr =
IConfigTransaction::createTestSimple(self->writeTo.getTransactionInterface());
state Key configKey = encodeConfigKey(configClass, "test_long"_sr);
state Optional<Value> value = wait(tr->get(configKey));
if (expected.present()) {
ASSERT_EQ(BinaryReader::fromStringRef<int64_t>(value.get(), Unversioned()), expected.get());
} else {
ASSERT(!value.present());
}
return Void();
}
ACTOR static Future<Standalone<VectorRef<KeyRef>>> getConfigClasses(TransactionEnvironment* self) {
state Reference<IConfigTransaction> tr =
IConfigTransaction::createTestSimple(self->writeTo.getTransactionInterface());
state KeySelector begin = firstGreaterOrEqual(configClassKeys.begin);
state KeySelector end = firstGreaterOrEqual(configClassKeys.end);
Standalone<RangeResultRef> range = wait(tr->getRange(begin, end, 1000));
Standalone<VectorRef<KeyRef>> result;
for (const auto& kv : range) {
result.push_back_deep(result.arena(), kv.key);
ASSERT(kv.value == ""_sr);
}
return result;
}
ACTOR static Future<Standalone<VectorRef<KeyRef>>> getKnobNames(TransactionEnvironment* self,
Optional<KeyRef> configClass) {
state Reference<IConfigTransaction> tr =
IConfigTransaction::createTestSimple(self->writeTo.getTransactionInterface());
state KeyRange keys = globalConfigKnobKeys;
if (configClass.present()) {
keys = singleKeyRange(configClass.get().withPrefix(configKnobKeys.begin));
}
KeySelector begin = firstGreaterOrEqual(keys.begin);
KeySelector end = firstGreaterOrEqual(keys.end);
Standalone<RangeResultRef> range = wait(tr->getRange(begin, end, 1000));
Standalone<VectorRef<KeyRef>> result;
for (const auto& kv : range) {
result.push_back_deep(result.arena(), kv.key);
ASSERT(kv.value == ""_sr);
}
return result;
}
ACTOR static Future<Void> badRangeRead(TransactionEnvironment* self) {
state Reference<IConfigTransaction> tr =
IConfigTransaction::createTestSimple(self->writeTo.getTransactionInterface());
KeySelector begin = firstGreaterOrEqual(normalKeys.begin);
KeySelector end = firstGreaterOrEqual(normalKeys.end);
wait(success(tr->getRange(begin, end, 1000)));
return Void();
}
public:
TransactionEnvironment(std::string const& dataDir) : writeTo(dataDir) {}
Future<Void> setup() { return Void(); }
void restartNode() { writeTo.restartNode(); }
template <class T>
Future<Void> set(Optional<KeyRef> configClass, T value, KeyRef knobName = "test_long"_sr) {
return writeTo.set(configClass, value, knobName);
}
Future<Void> clear(Optional<KeyRef> configClass) { return writeTo.clear(configClass); }
Future<Void> check(Optional<KeyRef> configClass, Optional<int64_t> expected) {
return check(this, configClass, expected);
}
Future<Void> badRangeRead() { return badRangeRead(this); }
Future<Standalone<VectorRef<KeyRef>>> getConfigClasses() { return getConfigClasses(this); }
Future<Standalone<VectorRef<KeyRef>>> getKnobNames(Optional<KeyRef> configClass) {
return getKnobNames(this, configClass);
}
Future<Void> compact() { return writeTo.compact(); }
Future<Void> getError() const { return writeTo.getError(); }
};
class TransactionToLocalConfigEnvironment {
WriteToTransactionEnvironment writeTo;
ReadFromLocalConfigEnvironment readFrom;
Reference<AsyncVar<ConfigBroadcastFollowerInterface>> cbfi;
ConfigBroadcaster broadcaster;
Future<Void> broadcastServer;
ACTOR static Future<Void> setup(TransactionToLocalConfigEnvironment* self) {
wait(self->readFrom.setup());
self->readFrom.connectToBroadcaster(IDependentAsyncVar<ConfigBroadcastFollowerInterface>::create(self->cbfi));
self->broadcastServer = self->broadcaster.serve(self->cbfi->get());
return Void();
}
public:
TransactionToLocalConfigEnvironment(std::string const& dataDir, std::string const& configPath)
: writeTo(dataDir), readFrom(dataDir, configPath, {}), broadcaster(writeTo.getFollowerInterface()),
cbfi(makeReference<AsyncVar<ConfigBroadcastFollowerInterface>>()) {}
Future<Void> setup() { return setup(this); }
void restartNode() { writeTo.restartNode(); }
void changeBroadcaster() {
broadcastServer.cancel();
cbfi->set(ConfigBroadcastFollowerInterface{});
broadcastServer = broadcaster.serve(cbfi->get());
}
Future<Void> restartLocalConfig(std::string const& newConfigPath) {
return readFrom.restartLocalConfig(newConfigPath);
}
Future<Void> compact() { return writeTo.compact(); }
template <class T>
Future<Void> set(Optional<KeyRef> configClass, T const& value) {
return writeTo.set(configClass, value);
}
Future<Void> clear(Optional<KeyRef> configClass) { return writeTo.clear(configClass); }
Future<Void> check(Optional<int64_t> value) const { return readFrom.checkEventually(value); }
Future<Void> getError() const { return writeTo.getError() || readFrom.getError() || broadcastServer; }
};
// These functions give a common interface to all environments, to improve code reuse
template <class Env, class... Args>
Future<Void> set(Env& env, Args&&... args) {
return waitOrError(env.set(std::forward<Args>(args)...), env.getError());
}
template <class... Args>
Future<Void> set(BroadcasterToLocalConfigEnvironment& env, Args&&... args) {
env.set(std::forward<Args>(args)...);
return Void();
}
template <class Env, class... Args>
Future<Void> clear(Env& env, Args&&... args) {
return waitOrError(env.clear(std::forward<Args>(args)...), env.getError());
}
template <class... Args>
Future<Void> clear(BroadcasterToLocalConfigEnvironment& env, Args&&... args) {
env.clear(std::forward<Args>(args)...);
return Void();
}
template <class Env, class... Args>
Future<Void> check(Env& env, Args&&... args) {
return waitOrError(env.check(std::forward<Args>(args)...), env.getError());
}
template <class... Args>
Future<Void> check(LocalConfigEnvironment& env, Args&&... args) {
env.check(std::forward<Args>(args)...);
return Void();
}
template <class Env>
Future<Void> compact(Env& env) {
return waitOrError(env.compact(), env.getError());
}
Future<Void> compact(BroadcasterToLocalConfigEnvironment& env) {
env.compact();
return Void();
}
ACTOR template <class Env>
Future<Void> testRestartLocalConfig(UnitTestParameters params) {
state Env env(params.getDataDir(), "class-A");
wait(env.setup());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
wait(check(env, int64_t{ 1 }));
wait(env.restartLocalConfig("class-A"));
wait(check(env, int64_t{ 1 }));
wait(set(env, "class-A"_sr, 2));
wait(check(env, int64_t{ 2 }));
return Void();
}
ACTOR template <class Env>
Future<Void> testRestartLocalConfigAndChangeClass(UnitTestParameters params) {
state Env env(params.getDataDir(), "class-A");
wait(env.setup());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
wait(check(env, int64_t{ 1 }));
wait(env.restartLocalConfig("class-B"));
wait(check(env, int64_t{ 0 }));
wait(set(env, "class-B"_sr, int64_t{ 2 }));
wait(check(env, int64_t{ 2 }));
return Void();
}
ACTOR template <class Env>
Future<Void> testSet(UnitTestParameters params) {
state Env env(params.getDataDir(), "class-A");
wait(env.setup());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
wait(check(env, int64_t{ 1 }));
return Void();
}
ACTOR template <class Env>
Future<Void> testClear(UnitTestParameters params) {
state Env env(params.getDataDir(), "class-A");
wait(env.setup());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
wait(clear(env, "class-A"_sr));
wait(check(env, Optional<int64_t>{}));
return Void();
}
ACTOR template <class Env>
Future<Void> testGlobalSet(UnitTestParameters params) {
state Env env(params.getDataDir(), "class-A");
wait(env.setup());
wait(set(env, Optional<KeyRef>{}, int64_t{ 1 }));
wait(check(env, int64_t{ 1 }));
wait(set(env, "class-A"_sr, int64_t{ 10 }));
wait(check(env, int64_t{ 10 }));
return Void();
}
ACTOR template <class Env>
Future<Void> testIgnore(UnitTestParameters params) {
state Env env(params.getDataDir(), "class-A");
wait(env.setup());
wait(set(env, "class-B"_sr, int64_t{ 1 }));
choose {
when(wait(delay(5))) {}
when(wait(check(env, int64_t{ 1 }))) { ASSERT(false); }
}
return Void();
}
ACTOR template <class Env>
Future<Void> testCompact(UnitTestParameters params) {
state Env env(params.getDataDir(), "class-A");
wait(env.setup());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
wait(compact(env));
wait(check(env, 1));
wait(set(env, "class-A"_sr, int64_t{ 2 }));
wait(check(env, 2));
return Void();
}
ACTOR template <class Env>
Future<Void> testChangeBroadcaster(UnitTestParameters params) {
state Env env(params.getDataDir(), "class-A");
wait(env.setup());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
wait(check(env, int64_t{ 1 }));
env.changeBroadcaster();
wait(set(env, "class-A"_sr, int64_t{ 2 }));
wait(check(env, int64_t{ 2 }));
return Void();
}
bool matches(Standalone<VectorRef<KeyRef>> const& vec, std::set<Key> const& compareTo) {
std::set<Key> s;
for (const auto& value : vec) {
s.insert(value);
}
return (s == compareTo);
}
ACTOR Future<Void> testGetConfigClasses(UnitTestParameters params, bool doCompact) {
state TransactionEnvironment env(params.getDataDir());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
wait(set(env, "class-B"_sr, int64_t{ 1 }));
if (doCompact) {
wait(compact(env));
}
Standalone<VectorRef<KeyRef>> configClasses = wait(env.getConfigClasses());
ASSERT(matches(configClasses, { "class-A"_sr, "class-B"_sr }));
return Void();
}
ACTOR Future<Void> testGetKnobs(UnitTestParameters params, bool global, bool doCompact) {
state TransactionEnvironment env(params.getDataDir());
state Optional<Key> configClass;
if (!global) {
configClass = "class-A"_sr;
}
wait(set(env, configClass.castTo<KeyRef>(), int64_t{ 1 }, "test_long"_sr));
wait(set(env, configClass.castTo<KeyRef>(), int{ 2 }, "test_int"_sr));
wait(set(env, "class-B"_sr, double{ 3.0 }, "test_double"_sr)); // ignored
if (doCompact) {
wait(compact(env));
}
Standalone<VectorRef<KeyRef>> knobNames =
wait(waitOrError(env.getKnobNames(configClass.castTo<KeyRef>()), env.getError()));
ASSERT(matches(knobNames, { "test_long"_sr, "test_int"_sr }));
return Void();
}
} // namespace
TEST_CASE("/fdbserver/ConfigDB/LocalConfiguration/Set") {
wait(testSet<LocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/LocalConfiguration/Restart") {
wait(testRestartLocalConfig<LocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/LocalConfiguration/RestartFresh") {
wait(testRestartLocalConfigAndChangeClass<LocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/LocalConfiguration/Clear") {
wait(testClear<LocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/LocalConfiguration/GlobalSet") {
wait(testGlobalSet<LocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/LocalConfiguration/ConflictingOverrides") {
state LocalConfigEnvironment env(params.getDataDir(), "class-A/class-B", {});
wait(env.setup());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
wait(set(env, "class-B"_sr, int64_t{ 10 }));
env.check(10);
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/LocalConfiguration/Manual") {
state LocalConfigEnvironment env(params.getDataDir(), "class-A", { { "test_long", "1000" } });
wait(env.setup());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
env.check(1000);
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/BroadcasterToLocalConfig/Set") {
wait(testSet<BroadcasterToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/BroadcasterToLocalConfig/Clear") {
wait(testClear<BroadcasterToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/BroadcasterToLocalConfig/Ignore") {
wait(testIgnore<BroadcasterToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/BroadcasterToLocalConfig/GlobalSet") {
wait(testGlobalSet<BroadcasterToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/BroadcasterToLocalConfig/ChangeBroadcaster") {
wait(testChangeBroadcaster<BroadcasterToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/BroadcasterToLocalConfig/RestartLocalConfig") {
wait(testRestartLocalConfig<BroadcasterToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/BroadcasterToLocalConfig/RestartLocalConfigAndChangeClass") {
wait(testRestartLocalConfigAndChangeClass<BroadcasterToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/BroadcasterToLocalConfig/Compact") {
wait(testCompact<BroadcasterToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/TransactionToLocalConfig/Set") {
wait(testSet<TransactionToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/TransactionToLocalConfig/Clear") {
wait(testClear<TransactionToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/TransactionToLocalConfig/GlobalSet") {
wait(testGlobalSet<TransactionToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/TransactionToLocalConfig/RestartNode") {
state TransactionToLocalConfigEnvironment env(params.getDataDir(), "class-A");
wait(env.setup());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
env.restartNode();
wait(check(env, int64_t{ 1 }));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/TransactionToLocalConfig/ChangeBroadcaster") {
wait(testChangeBroadcaster<TransactionToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/TransactionToLocalConfig/RestartLocalConfigAndChangeClass") {
wait(testRestartLocalConfigAndChangeClass<TransactionToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/TransactionToLocalConfig/CompactNode") {
wait(testCompact<TransactionToLocalConfigEnvironment>(params));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/Transaction/Set") {
state TransactionEnvironment env(params.getDataDir());
wait(env.setup());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
wait(check(env, "class-A"_sr, int64_t{ 1 }));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/Transaction/Clear") {
state TransactionEnvironment env(params.getDataDir());
wait(env.setup());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
wait(clear(env, "class-A"_sr));
wait(check(env, "class-A"_sr, Optional<int64_t>{}));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/Transaction/Restart") {
state TransactionEnvironment env(params.getDataDir());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
env.restartNode();
wait(check(env, "class-A"_sr, int64_t{ 1 }));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/Transaction/CompactNode") {
state TransactionEnvironment env(params.getDataDir());
wait(set(env, "class-A"_sr, int64_t{ 1 }));
wait(compact(env));
wait(check(env, "class-A"_sr, int64_t{ 1 }));
wait(set(env, "class-A"_sr, int64_t{ 2 }));
wait(check(env, "class-A"_sr, int64_t{ 2 }));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/Transaction/GetConfigClasses") {
wait(testGetConfigClasses(params, false));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/Transaction/CompactThenGetConfigClasses") {
wait(testGetConfigClasses(params, true));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/Transaction/GetKnobs") {
wait(testGetKnobs(params, false, false));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/Transaction/CompactThenGetKnobs") {
wait(testGetKnobs(params, false, true));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/Transaction/GetGlobalKnobs") {
wait(testGetKnobs(params, true, false));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/Transaction/CompactThenGetGlobalKnobs") {
wait(testGetKnobs(params, true, true));
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/Transaction/BadRangeRead") {
state TransactionEnvironment env(params.getDataDir());
try {
wait(env.badRangeRead() || env.getError());
ASSERT(false);
} catch (Error& e) {
ASSERT_EQ(e.code(), error_code_invalid_config_db_range_read);
}
return Void();
}

View File

@ -0,0 +1,46 @@
/*
* ConfigFollowerInterface.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "flow/IRandom.h"
#include "fdbserver/ConfigFollowerInterface.h"
#include "fdbserver/CoordinationInterface.h"
void ConfigFollowerInterface::setupWellKnownEndpoints() {
getSnapshotAndChanges.makeWellKnownEndpoint(WLTOKEN_CONFIGFOLLOWER_GETSNAPSHOTANDCHANGES,
TaskPriority::Coordination);
getChanges.makeWellKnownEndpoint(WLTOKEN_CONFIGFOLLOWER_GETCHANGES, TaskPriority::Coordination);
compact.makeWellKnownEndpoint(WLTOKEN_CONFIGFOLLOWER_COMPACT, TaskPriority::Coordination);
}
ConfigFollowerInterface::ConfigFollowerInterface() : _id(deterministicRandom()->randomUniqueID()) {}
ConfigFollowerInterface::ConfigFollowerInterface(NetworkAddress const& remote)
: _id(deterministicRandom()->randomUniqueID()),
getSnapshotAndChanges(Endpoint({ remote }, WLTOKEN_CONFIGFOLLOWER_GETSNAPSHOTANDCHANGES)),
getChanges(Endpoint({ remote }, WLTOKEN_CONFIGFOLLOWER_GETCHANGES)),
compact(Endpoint({ remote }, WLTOKEN_CONFIGFOLLOWER_COMPACT)) {}
bool ConfigFollowerInterface::operator==(ConfigFollowerInterface const& rhs) const {
return _id == rhs._id;
}
bool ConfigFollowerInterface::operator!=(ConfigFollowerInterface const& rhs) const {
return !(*this == rhs);
}

View File

@ -0,0 +1,175 @@
/*
* ConfigFollowerInterface.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbclient/CommitTransaction.h"
#include "fdbclient/ConfigKnobs.h"
#include "fdbclient/FDBTypes.h"
#include "fdbrpc/fdbrpc.h"
struct VersionedConfigMutationRef {
Version version;
ConfigMutationRef mutation;
VersionedConfigMutationRef() = default;
explicit VersionedConfigMutationRef(Arena& arena, Version version, ConfigMutationRef mutation)
: version(version), mutation(arena, mutation) {}
explicit VersionedConfigMutationRef(Arena& arena, VersionedConfigMutationRef const& rhs)
: version(rhs.version), mutation(arena, rhs.mutation) {}
size_t expectedSize() const { return mutation.expectedSize(); }
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, version, mutation);
}
};
using VersionedConfigMutation = Standalone<VersionedConfigMutationRef>;
struct VersionedConfigCommitAnnotationRef {
Version version;
ConfigCommitAnnotationRef annotation;
VersionedConfigCommitAnnotationRef() = default;
explicit VersionedConfigCommitAnnotationRef(Arena& arena, Version version, ConfigCommitAnnotationRef annotation)
: version(version), annotation(arena, annotation) {}
explicit VersionedConfigCommitAnnotationRef(Arena& arena, VersionedConfigCommitAnnotationRef rhs)
: version(rhs.version), annotation(arena, rhs.annotation) {}
size_t expectedSize() const { return annotation.expectedSize(); }
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, version, annotation);
}
};
using VersionedConfigCommitAnnotation = Standalone<VersionedConfigCommitAnnotationRef>;
struct ConfigFollowerGetSnapshotAndChangesReply {
static constexpr FileIdentifier file_identifier = 1734095;
Version snapshotVersion;
Version changesVersion;
std::map<ConfigKey, KnobValue> snapshot;
// TODO: Share arena
Standalone<VectorRef<VersionedConfigMutationRef>> changes;
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> annotations;
ConfigFollowerGetSnapshotAndChangesReply() = default;
template <class Snapshot>
explicit ConfigFollowerGetSnapshotAndChangesReply(
Version snapshotVersion,
Version changesVersion,
Snapshot&& snapshot,
Standalone<VectorRef<VersionedConfigMutationRef>> changes,
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> annotations)
: snapshotVersion(snapshotVersion), changesVersion(changesVersion), snapshot(std::forward<Snapshot>(snapshot)),
changes(changes), annotations(annotations) {
ASSERT_GE(changesVersion, snapshotVersion);
}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, snapshotVersion, changesVersion, snapshot, changes);
}
};
struct ConfigFollowerGetSnapshotAndChangesRequest {
static constexpr FileIdentifier file_identifier = 294811;
ReplyPromise<ConfigFollowerGetSnapshotAndChangesReply> reply;
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, reply);
}
};
struct ConfigFollowerGetChangesReply {
static constexpr FileIdentifier file_identifier = 234859;
Version mostRecentVersion;
// TODO: Share arena
Standalone<VectorRef<VersionedConfigMutationRef>> changes;
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> annotations;
ConfigFollowerGetChangesReply() : mostRecentVersion(0) {}
explicit ConfigFollowerGetChangesReply(Version mostRecentVersion,
Standalone<VectorRef<VersionedConfigMutationRef>> const& changes,
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> const& annotations)
: mostRecentVersion(mostRecentVersion), changes(changes), annotations(annotations) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, mostRecentVersion, changes, annotations);
}
};
struct ConfigFollowerGetChangesRequest {
static constexpr FileIdentifier file_identifier = 178935;
Version lastSeenVersion{ 0 };
ReplyPromise<ConfigFollowerGetChangesReply> reply;
ConfigFollowerGetChangesRequest() = default;
explicit ConfigFollowerGetChangesRequest(Version lastSeenVersion) : lastSeenVersion(lastSeenVersion) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, lastSeenVersion, reply);
}
};
struct ConfigFollowerCompactRequest {
static constexpr FileIdentifier file_identifier = 568910;
Version version{ 0 };
ReplyPromise<Void> reply;
ConfigFollowerCompactRequest() = default;
explicit ConfigFollowerCompactRequest(Version version) : version(version) {}
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, version, reply);
}
};
/*
* Configuration database nodes serve a ConfigFollowerInterface which contains well known endpoints,
* used by workers to receive configuration database updates
*/
class ConfigFollowerInterface {
UID _id;
public:
static constexpr FileIdentifier file_identifier = 7721102;
RequestStream<ConfigFollowerGetSnapshotAndChangesRequest> getSnapshotAndChanges;
RequestStream<ConfigFollowerGetChangesRequest> getChanges;
RequestStream<ConfigFollowerCompactRequest> compact;
ConfigFollowerInterface();
void setupWellKnownEndpoints();
ConfigFollowerInterface(NetworkAddress const& remote);
bool operator==(ConfigFollowerInterface const& rhs) const;
bool operator!=(ConfigFollowerInterface const& rhs) const;
UID id() const { return _id; }
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, _id, getSnapshotAndChanges, getChanges, compact);
}
};

View File

@ -18,9 +18,12 @@
* limitations under the License.
*/
#include "fdbclient/ConfigTransactionInterface.h"
#include "fdbserver/CoordinationInterface.h"
#include "fdbserver/IConfigDatabaseNode.h"
#include "fdbserver/IKeyValueStore.h"
#include "fdbserver/Knobs.h"
#include "fdbserver/OnDemandStore.h"
#include "fdbserver/WorkerInterface.actor.h"
#include "fdbserver/Status.h"
#include "flow/ActorCollection.h"
@ -70,56 +73,12 @@ LeaderElectionRegInterface::LeaderElectionRegInterface(INetwork* local) : Client
ServerCoordinators::ServerCoordinators(Reference<ClusterConnectionFile> cf) : ClientCoordinators(cf) {
ClusterConnectionString cs = ccf->getConnectionString();
for (auto s = cs.coordinators().begin(); s != cs.coordinators().end(); ++s) {
leaderElectionServers.push_back(LeaderElectionRegInterface(*s));
stateServers.push_back(GenerationRegInterface(*s));
leaderElectionServers.emplace_back(*s);
stateServers.emplace_back(*s);
configServers.emplace_back(*s);
}
}
// The coordination server wants to create its key value store only if it is actually used
struct OnDemandStore {
public:
OnDemandStore(std::string folder, UID myID) : folder(folder), store(nullptr), myID(myID) {}
~OnDemandStore() {
if (store)
store->close();
}
IKeyValueStore* get() {
if (!store)
open();
return store;
}
bool exists() {
if (store)
return true;
return fileExists(joinPath(folder, "coordination-0.fdq")) ||
fileExists(joinPath(folder, "coordination-1.fdq")) || fileExists(joinPath(folder, "coordination.fdb"));
}
IKeyValueStore* operator->() { return get(); }
Future<Void> getError() { return onErr(err.getFuture()); }
private:
std::string folder;
UID myID;
IKeyValueStore* store;
Promise<Future<Void>> err;
ACTOR static Future<Void> onErr(Future<Future<Void>> e) {
Future<Void> f = wait(e);
wait(f);
return Void();
}
void open() {
platform::createDirectory(folder);
store = keyValueStoreMemory(joinPath(folder, "coordination-"), myID, 500e6);
err.send(store->getError());
}
};
ACTOR Future<Void> localGenerationReg(GenerationRegInterface interf, OnDemandStore* pstore) {
state GenerationRegVal v;
state OnDemandStore& store = *pstore;
@ -176,8 +135,7 @@ ACTOR Future<Void> localGenerationReg(GenerationRegInterface interf, OnDemandSto
TEST_CASE("/fdbserver/Coordination/localGenerationReg/simple") {
state GenerationRegInterface reg;
state OnDemandStore store("simfdb/unittests/", //< FIXME
deterministicRandom()->randomUniqueID());
state OnDemandStore store(params.getDataDir(), deterministicRandom()->randomUniqueID(), "coordination-");
state Future<Void> actor = localGenerationReg(reg, &store);
state Key the_key(deterministicRandom()->randomAlphaNumeric(deterministicRandom()->randomInt(0, 10)));
@ -688,19 +646,36 @@ ACTOR Future<Void> leaderServer(LeaderElectionRegInterface interf,
}
}
ACTOR Future<Void> coordinationServer(std::string dataFolder, Reference<ClusterConnectionFile> ccf) {
ACTOR Future<Void> coordinationServer(std::string dataFolder,
Reference<ClusterConnectionFile> ccf,
UseConfigDB useConfigDB) {
state UID myID = deterministicRandom()->randomUniqueID();
state LeaderElectionRegInterface myLeaderInterface(g_network);
state GenerationRegInterface myInterface(g_network);
state OnDemandStore store(dataFolder, myID);
state OnDemandStore store(dataFolder, myID, "coordination-");
state ConfigTransactionInterface configTransactionInterface;
state ConfigFollowerInterface configFollowerInterface;
state Reference<IConfigDatabaseNode> configDatabaseNode;
state Future<Void> configDatabaseServer = Never();
TraceEvent("CoordinationServer", myID)
.detail("MyInterfaceAddr", myInterface.read.getEndpoint().getPrimaryAddress())
.detail("Folder", dataFolder);
if (useConfigDB != UseConfigDB::DISABLED) {
configTransactionInterface.setupWellKnownEndpoints();
configFollowerInterface.setupWellKnownEndpoints();
if (useConfigDB == UseConfigDB::SIMPLE) {
configDatabaseNode = IConfigDatabaseNode::createSimple(dataFolder);
} else {
configDatabaseNode = IConfigDatabaseNode::createPaxos(dataFolder);
}
configDatabaseServer =
configDatabaseNode->serve(configTransactionInterface) || configDatabaseNode->serve(configFollowerInterface);
}
try {
wait(localGenerationReg(myInterface, &store) || leaderServer(myLeaderInterface, &store, myID, ccf) ||
store.getError());
store.getError() || configDatabaseServer);
throw internal_error();
} catch (Error& e) {
TraceEvent("CoordinationServerError", myID).error(e, true);

View File

@ -23,6 +23,7 @@
#pragma once
#include "fdbclient/CoordinationInterface.h"
#include "fdbserver/ConfigFollowerInterface.h"
constexpr UID WLTOKEN_LEADERELECTIONREG_CANDIDACY(-1, 4);
constexpr UID WLTOKEN_LEADERELECTIONREG_ELECTIONRESULT(-1, 5);
@ -31,6 +32,10 @@ constexpr UID WLTOKEN_LEADERELECTIONREG_FORWARD(-1, 7);
constexpr UID WLTOKEN_GENERATIONREG_READ(-1, 8);
constexpr UID WLTOKEN_GENERATIONREG_WRITE(-1, 9);
constexpr UID WLTOKEN_CONFIGFOLLOWER_GETSNAPSHOTANDCHANGES(-1, 17);
constexpr UID WLTOKEN_CONFIGFOLLOWER_GETCHANGES(-1, 18);
constexpr UID WLTOKEN_CONFIGFOLLOWER_COMPACT(-1, 19);
struct GenerationRegInterface {
constexpr static FileIdentifier file_identifier = 16726744;
RequestStream<struct GenerationRegReadRequest> read;
@ -221,10 +226,13 @@ class ServerCoordinators : public ClientCoordinators {
public:
explicit ServerCoordinators(Reference<ClusterConnectionFile>);
vector<LeaderElectionRegInterface> leaderElectionServers;
vector<GenerationRegInterface> stateServers;
std::vector<LeaderElectionRegInterface> leaderElectionServers;
std::vector<GenerationRegInterface> stateServers;
std::vector<ConfigFollowerInterface> configServers;
};
Future<Void> coordinationServer(std::string const& dataFolder, Reference<ClusterConnectionFile> const& ccf);
Future<Void> coordinationServer(std::string const& dataFolder,
Reference<ClusterConnectionFile> const& ccf,
UseConfigDB const& useConfigDB);
#endif

View File

@ -0,0 +1,41 @@
/*
* IConfigConsumer.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbserver/IConfigConsumer.h"
#include "fdbserver/PaxosConfigConsumer.h"
#include "fdbserver/SimpleConfigConsumer.h"
std::unique_ptr<IConfigConsumer> IConfigConsumer::createTestSimple(ConfigFollowerInterface const& cfi,
double pollingInterval,
Optional<double> compactionInterval) {
return std::make_unique<SimpleConfigConsumer>(cfi, pollingInterval, compactionInterval);
}
std::unique_ptr<IConfigConsumer> IConfigConsumer::createSimple(ServerCoordinators const& coordinators,
double pollingInterval,
Optional<double> compactionInterval) {
return std::make_unique<SimpleConfigConsumer>(coordinators, pollingInterval, compactionInterval);
}
std::unique_ptr<IConfigConsumer> IConfigConsumer::createPaxos(ServerCoordinators const& coordinators,
double pollingInterval,
Optional<double> compactionInterval) {
return std::make_unique<PaxosConfigConsumer>(coordinators, pollingInterval, compactionInterval);
}

View File

@ -0,0 +1,50 @@
/*
* IConfigConsumer.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbclient/CoordinationInterface.h"
#include "fdbserver/ConfigBroadcaster.h"
#include "fdbserver/CoordinationInterface.h"
#include "fdbserver/ConfigFollowerInterface.h"
#include "flow/flow.h"
#include <memory>
/*
* An IConfigConsumer instantiation is used by the ConfigBroadcaster to consume
* updates from the configuration database nodes. A consumer sends mutations to a broadcaster
* once they are confirmed to be committed.
*/
class IConfigConsumer {
public:
virtual ~IConfigConsumer() = default;
virtual Future<Void> consume(ConfigBroadcaster& broadcaster) = 0;
virtual UID getID() const = 0;
static std::unique_ptr<IConfigConsumer> createTestSimple(ConfigFollowerInterface const& cfi,
double pollingInterval,
Optional<double> compactionInterval);
static std::unique_ptr<IConfigConsumer> createSimple(ServerCoordinators const& coordinators,
double pollingInterval,
Optional<double> compactionInterval);
static std::unique_ptr<IConfigConsumer> createPaxos(ServerCoordinators const& coordinators,
double pollingInterval,
Optional<double> compactionInterval);
};

View File

@ -0,0 +1,31 @@
/*
* IConfigDatabaseNode.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbserver/IConfigDatabaseNode.h"
#include "fdbserver/PaxosConfigDatabaseNode.h"
#include "fdbserver/SimpleConfigDatabaseNode.h"
Reference<IConfigDatabaseNode> IConfigDatabaseNode::createSimple(std::string const& folder) {
return makeReference<SimpleConfigDatabaseNode>(folder);
}
Reference<IConfigDatabaseNode> IConfigDatabaseNode::createPaxos(std::string const& folder) {
return makeReference<PaxosConfigDatabaseNode>(folder);
}

View File

@ -0,0 +1,40 @@
/*
* IConfigDatabaseNode.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbclient/ConfigTransactionInterface.h"
#include "fdbserver/ConfigFollowerInterface.h"
#include "flow/FastRef.h"
#include "flow/flow.h"
#include <memory>
/*
* Interface for a single node in the configuration database, run on coordinators
*/
class IConfigDatabaseNode : public ReferenceCounted<IConfigDatabaseNode> {
public:
virtual Future<Void> serve(ConfigTransactionInterface const&) = 0;
virtual Future<Void> serve(ConfigFollowerInterface const&) = 0;
static Reference<IConfigDatabaseNode> createSimple(std::string const& folder);
static Reference<IConfigDatabaseNode> createPaxos(std::string const& folder);
};

View File

@ -18,6 +18,7 @@
* limitations under the License.
*/
#include "fdbclient/Knobs.h"
#include "fdbclient/Notified.h"
#include "fdbclient/SystemData.h"
#include "fdbserver/DeltaTree.h"

View File

@ -18,659 +18,8 @@
* limitations under the License.
*/
#ifndef FDBSERVER_KNOBS_H
#define FDBSERVER_KNOBS_H
#pragma once
#include "flow/Knobs.h"
#include "fdbrpc/fdbrpc.h"
#include "fdbclient/Knobs.h"
#include "fdbclient/IKnobCollection.h"
// Disk queue
static const int _PAGE_SIZE = 4096;
class ServerKnobs : public Knobs {
public:
// Versions
int64_t VERSIONS_PER_SECOND;
int64_t MAX_VERSIONS_IN_FLIGHT;
int64_t MAX_VERSIONS_IN_FLIGHT_FORCED;
int64_t MAX_READ_TRANSACTION_LIFE_VERSIONS;
int64_t MAX_WRITE_TRANSACTION_LIFE_VERSIONS;
double MAX_COMMIT_BATCH_INTERVAL; // Each commit proxy generates a CommitTransactionBatchRequest at least this
// often, so that versions always advance smoothly
// TLogs
double TLOG_TIMEOUT; // tlog OR commit proxy failure - master's reaction time
double TLOG_SLOW_REJOIN_WARN_TIMEOUT_SECS; // Warns if a tlog takes too long to rejoin
double RECOVERY_TLOG_SMART_QUORUM_DELAY; // smaller might be better for bug amplification
double TLOG_STORAGE_MIN_UPDATE_INTERVAL;
double BUGGIFY_TLOG_STORAGE_MIN_UPDATE_INTERVAL;
int DESIRED_TOTAL_BYTES;
int DESIRED_UPDATE_BYTES;
double UPDATE_DELAY;
int MAXIMUM_PEEK_BYTES;
int APPLY_MUTATION_BYTES;
int RECOVERY_DATA_BYTE_LIMIT;
int BUGGIFY_RECOVERY_DATA_LIMIT;
double LONG_TLOG_COMMIT_TIME;
int64_t LARGE_TLOG_COMMIT_BYTES;
double BUGGIFY_RECOVER_MEMORY_LIMIT;
double BUGGIFY_WORKER_REMOVED_MAX_LAG;
int64_t UPDATE_STORAGE_BYTE_LIMIT;
int64_t REFERENCE_SPILL_UPDATE_STORAGE_BYTE_LIMIT;
double TLOG_PEEK_DELAY;
int LEGACY_TLOG_UPGRADE_ENTRIES_PER_VERSION;
int VERSION_MESSAGES_OVERHEAD_FACTOR_1024THS; // Multiplicative factor to bound total space used to store a version
// message (measured in 1/1024ths, e.g. a value of 2048 yields a
// factor of 2).
int64_t VERSION_MESSAGES_ENTRY_BYTES_WITH_OVERHEAD;
double TLOG_MESSAGE_BLOCK_OVERHEAD_FACTOR;
int64_t TLOG_MESSAGE_BLOCK_BYTES;
int64_t MAX_MESSAGE_SIZE;
int LOG_SYSTEM_PUSHED_DATA_BLOCK_SIZE;
double PEEK_TRACKER_EXPIRATION_TIME;
int PARALLEL_GET_MORE_REQUESTS;
int MULTI_CURSOR_PRE_FETCH_LIMIT;
int64_t MAX_QUEUE_COMMIT_BYTES;
int DESIRED_OUTSTANDING_MESSAGES;
double DESIRED_GET_MORE_DELAY;
int CONCURRENT_LOG_ROUTER_READS;
int LOG_ROUTER_PEEK_FROM_SATELLITES_PREFERRED; // 0==peek from primary, non-zero==peek from satellites
double DISK_QUEUE_ADAPTER_MIN_SWITCH_TIME;
double DISK_QUEUE_ADAPTER_MAX_SWITCH_TIME;
int64_t TLOG_SPILL_REFERENCE_MAX_PEEK_MEMORY_BYTES;
int64_t TLOG_SPILL_REFERENCE_MAX_BATCHES_PER_PEEK;
int64_t TLOG_SPILL_REFERENCE_MAX_BYTES_PER_BATCH;
int64_t DISK_QUEUE_FILE_EXTENSION_BYTES; // When we grow the disk queue, by how many bytes should it grow?
int64_t DISK_QUEUE_FILE_SHRINK_BYTES; // When we shrink the disk queue, by how many bytes should it shrink?
int64_t DISK_QUEUE_MAX_TRUNCATE_BYTES; // A truncate larger than this will cause the file to be replaced instead.
double TLOG_DEGRADED_DURATION;
int64_t MAX_CACHE_VERSIONS;
double TXS_POPPED_MAX_DELAY;
double TLOG_MAX_CREATE_DURATION;
int PEEK_LOGGING_AMOUNT;
double PEEK_LOGGING_DELAY;
double PEEK_RESET_INTERVAL;
double PEEK_MAX_LATENCY;
bool PEEK_COUNT_SMALL_MESSAGES;
double PEEK_STATS_INTERVAL;
double PEEK_STATS_SLOW_AMOUNT;
double PEEK_STATS_SLOW_RATIO;
double PUSH_RESET_INTERVAL;
double PUSH_MAX_LATENCY;
double PUSH_STATS_INTERVAL;
double PUSH_STATS_SLOW_AMOUNT;
double PUSH_STATS_SLOW_RATIO;
int TLOG_POP_BATCH_SIZE;
// Data distribution queue
double HEALTH_POLL_TIME;
double BEST_TEAM_STUCK_DELAY;
double BG_REBALANCE_POLLING_INTERVAL;
double BG_REBALANCE_SWITCH_CHECK_INTERVAL;
double DD_QUEUE_LOGGING_INTERVAL;
double RELOCATION_PARALLELISM_PER_SOURCE_SERVER;
int DD_QUEUE_MAX_KEY_SERVERS;
int DD_REBALANCE_PARALLELISM;
int DD_REBALANCE_RESET_AMOUNT;
double BG_DD_MAX_WAIT;
double BG_DD_MIN_WAIT;
double BG_DD_INCREASE_RATE;
double BG_DD_DECREASE_RATE;
double BG_DD_SATURATION_DELAY;
double INFLIGHT_PENALTY_HEALTHY;
double INFLIGHT_PENALTY_REDUNDANT;
double INFLIGHT_PENALTY_UNHEALTHY;
double INFLIGHT_PENALTY_ONE_LEFT;
bool USE_OLD_NEEDED_SERVERS;
// Higher priorities are executed first
// Priority/100 is the "priority group"/"superpriority". Priority inversion
// is possible within but not between priority groups; fewer priority groups
// mean better worst case time bounds
// Maximum allowable priority is 999.
int PRIORITY_RECOVER_MOVE;
int PRIORITY_REBALANCE_UNDERUTILIZED_TEAM;
int PRIORITY_REBALANCE_OVERUTILIZED_TEAM;
int PRIORITY_PERPETUAL_STORAGE_WIGGLE;
int PRIORITY_TEAM_HEALTHY;
int PRIORITY_TEAM_CONTAINS_UNDESIRED_SERVER;
int PRIORITY_TEAM_REDUNDANT;
int PRIORITY_MERGE_SHARD;
int PRIORITY_POPULATE_REGION;
int PRIORITY_TEAM_UNHEALTHY;
int PRIORITY_TEAM_2_LEFT;
int PRIORITY_TEAM_1_LEFT;
int PRIORITY_TEAM_FAILED; // Priority when a server in the team is excluded as failed
int PRIORITY_TEAM_0_LEFT;
int PRIORITY_SPLIT_SHARD;
// Data distribution
double RETRY_RELOCATESHARD_DELAY;
double DATA_DISTRIBUTION_FAILURE_REACTION_TIME;
int MIN_SHARD_BYTES, SHARD_BYTES_RATIO, SHARD_BYTES_PER_SQRT_BYTES, MAX_SHARD_BYTES, KEY_SERVER_SHARD_BYTES;
int64_t SHARD_MAX_BYTES_PER_KSEC, // Shards with more than this bandwidth will be split immediately
SHARD_MIN_BYTES_PER_KSEC, // Shards with more than this bandwidth will not be merged
SHARD_SPLIT_BYTES_PER_KSEC; // When splitting a shard, it is split into pieces with less than this bandwidth
double SHARD_MAX_READ_DENSITY_RATIO;
int64_t SHARD_READ_HOT_BANDWITH_MIN_PER_KSECONDS;
double SHARD_MAX_BYTES_READ_PER_KSEC_JITTER;
double STORAGE_METRIC_TIMEOUT;
double METRIC_DELAY;
double ALL_DATA_REMOVED_DELAY;
double INITIAL_FAILURE_REACTION_DELAY;
double CHECK_TEAM_DELAY;
double LOG_ON_COMPLETION_DELAY;
int BEST_TEAM_MAX_TEAM_TRIES;
int BEST_TEAM_OPTION_COUNT;
int BEST_OF_AMT;
double SERVER_LIST_DELAY;
double RECRUITMENT_IDLE_DELAY;
double STORAGE_RECRUITMENT_DELAY;
bool TSS_HACK_IDENTITY_MAPPING;
double TSS_RECRUITMENT_TIMEOUT;
double TSS_DD_CHECK_INTERVAL;
double DATA_DISTRIBUTION_LOGGING_INTERVAL;
double DD_ENABLED_CHECK_DELAY;
double DD_STALL_CHECK_DELAY;
double DD_LOW_BANDWIDTH_DELAY;
double DD_MERGE_COALESCE_DELAY;
double STORAGE_METRICS_POLLING_DELAY;
double STORAGE_METRICS_RANDOM_DELAY;
double AVAILABLE_SPACE_RATIO_CUTOFF;
int DESIRED_TEAMS_PER_SERVER;
int MAX_TEAMS_PER_SERVER;
int64_t DD_SHARD_SIZE_GRANULARITY;
int64_t DD_SHARD_SIZE_GRANULARITY_SIM;
int DD_MOVE_KEYS_PARALLELISM;
int DD_FETCH_SOURCE_PARALLELISM;
int DD_MERGE_LIMIT;
double DD_SHARD_METRICS_TIMEOUT;
int64_t DD_LOCATION_CACHE_SIZE;
double MOVEKEYS_LOCK_POLLING_DELAY;
double DEBOUNCE_RECRUITING_DELAY;
int REBALANCE_MAX_RETRIES;
int DD_OVERLAP_PENALTY;
int DD_EXCLUDE_MIN_REPLICAS;
bool DD_VALIDATE_LOCALITY;
int DD_CHECK_INVALID_LOCALITY_DELAY;
bool DD_ENABLE_VERBOSE_TRACING;
int64_t
DD_SS_FAILURE_VERSIONLAG; // Allowed SS version lag from the current read version before marking it as failed.
int64_t DD_SS_ALLOWED_VERSIONLAG; // SS will be marked as healthy if it's version lag goes below this value.
double DD_SS_STUCK_TIME_LIMIT; // If a storage server is not getting new versions for this amount of time, then it
// becomes undesired.
int DD_TEAMS_INFO_PRINT_INTERVAL;
int DD_TEAMS_INFO_PRINT_YIELD_COUNT;
int DD_TEAM_ZERO_SERVER_LEFT_LOG_DELAY;
int DD_STORAGE_WIGGLE_PAUSE_THRESHOLD; // How many unhealthy relocations are ongoing will pause storage wiggle
// TeamRemover to remove redundant teams
bool TR_FLAG_DISABLE_MACHINE_TEAM_REMOVER; // disable the machineTeamRemover actor
double TR_REMOVE_MACHINE_TEAM_DELAY; // wait for the specified time before try to remove next machine team
bool TR_FLAG_REMOVE_MT_WITH_MOST_TEAMS; // guard to select which machineTeamRemover logic to use
bool TR_FLAG_DISABLE_SERVER_TEAM_REMOVER; // disable the serverTeamRemover actor
double TR_REMOVE_SERVER_TEAM_DELAY; // wait for the specified time before try to remove next server team
double TR_REMOVE_SERVER_TEAM_EXTRA_DELAY; // serverTeamRemover waits for the delay and check DD healthyness again to
// ensure it runs after machineTeamRemover
// Remove wrong storage engines
double DD_REMOVE_STORE_ENGINE_DELAY; // wait for the specified time before remove the next batch
double DD_FAILURE_TIME;
double DD_ZERO_HEALTHY_TEAM_DELAY;
// KeyValueStore SQLITE
int CLEAR_BUFFER_SIZE;
double READ_VALUE_TIME_ESTIMATE;
double READ_RANGE_TIME_ESTIMATE;
double SET_TIME_ESTIMATE;
double CLEAR_TIME_ESTIMATE;
double COMMIT_TIME_ESTIMATE;
int CHECK_FREE_PAGE_AMOUNT;
double DISK_METRIC_LOGGING_INTERVAL;
int64_t SOFT_HEAP_LIMIT;
int SQLITE_PAGE_SCAN_ERROR_LIMIT;
int SQLITE_BTREE_PAGE_USABLE;
int SQLITE_BTREE_CELL_MAX_LOCAL;
int SQLITE_BTREE_CELL_MIN_LOCAL;
int SQLITE_FRAGMENT_PRIMARY_PAGE_USABLE;
int SQLITE_FRAGMENT_OVERFLOW_PAGE_USABLE;
double SQLITE_FRAGMENT_MIN_SAVINGS;
int SQLITE_CHUNK_SIZE_PAGES;
int SQLITE_CHUNK_SIZE_PAGES_SIM;
int SQLITE_READER_THREADS;
int SQLITE_WRITE_WINDOW_LIMIT;
double SQLITE_WRITE_WINDOW_SECONDS;
// KeyValueStoreSqlite spring cleaning
double SPRING_CLEANING_NO_ACTION_INTERVAL;
double SPRING_CLEANING_LAZY_DELETE_INTERVAL;
double SPRING_CLEANING_VACUUM_INTERVAL;
double SPRING_CLEANING_LAZY_DELETE_TIME_ESTIMATE;
double SPRING_CLEANING_VACUUM_TIME_ESTIMATE;
double SPRING_CLEANING_VACUUMS_PER_LAZY_DELETE_PAGE;
int SPRING_CLEANING_MIN_LAZY_DELETE_PAGES;
int SPRING_CLEANING_MAX_LAZY_DELETE_PAGES;
int SPRING_CLEANING_LAZY_DELETE_BATCH_SIZE;
int SPRING_CLEANING_MIN_VACUUM_PAGES;
int SPRING_CLEANING_MAX_VACUUM_PAGES;
// KeyValueStoreMemory
int64_t REPLACE_CONTENTS_BYTES;
// KeyValueStoreRocksDB
int ROCKSDB_BACKGROUND_PARALLELISM;
int ROCKSDB_READ_PARALLELISM;
int64_t ROCKSDB_MEMTABLE_BYTES;
bool ROCKSDB_UNSAFE_AUTO_FSYNC;
int64_t ROCKSDB_PERIODIC_COMPACTION_SECONDS;
int ROCKSDB_PREFIX_LEN;
int64_t ROCKSDB_BLOCK_CACHE_SIZE;
// Leader election
int MAX_NOTIFICATIONS;
int MIN_NOTIFICATIONS;
double NOTIFICATION_FULL_CLEAR_TIME;
double CANDIDATE_MIN_DELAY;
double CANDIDATE_MAX_DELAY;
double CANDIDATE_GROWTH_RATE;
double POLLING_FREQUENCY;
double HEARTBEAT_FREQUENCY;
// Commit CommitProxy
double START_TRANSACTION_BATCH_INTERVAL_MIN;
double START_TRANSACTION_BATCH_INTERVAL_MAX;
double START_TRANSACTION_BATCH_INTERVAL_LATENCY_FRACTION;
double START_TRANSACTION_BATCH_INTERVAL_SMOOTHER_ALPHA;
double START_TRANSACTION_BATCH_QUEUE_CHECK_INTERVAL;
double START_TRANSACTION_MAX_TRANSACTIONS_TO_START;
int START_TRANSACTION_MAX_REQUESTS_TO_START;
double START_TRANSACTION_RATE_WINDOW;
double START_TRANSACTION_MAX_EMPTY_QUEUE_BUDGET;
int START_TRANSACTION_MAX_QUEUE_SIZE;
int KEY_LOCATION_MAX_QUEUE_SIZE;
double COMMIT_TRANSACTION_BATCH_INTERVAL_FROM_IDLE;
double COMMIT_TRANSACTION_BATCH_INTERVAL_MIN;
double COMMIT_TRANSACTION_BATCH_INTERVAL_MAX;
double COMMIT_TRANSACTION_BATCH_INTERVAL_LATENCY_FRACTION;
double COMMIT_TRANSACTION_BATCH_INTERVAL_SMOOTHER_ALPHA;
int COMMIT_TRANSACTION_BATCH_COUNT_MAX;
int COMMIT_TRANSACTION_BATCH_BYTES_MIN;
int COMMIT_TRANSACTION_BATCH_BYTES_MAX;
double COMMIT_TRANSACTION_BATCH_BYTES_SCALE_BASE;
double COMMIT_TRANSACTION_BATCH_BYTES_SCALE_POWER;
int64_t COMMIT_BATCHES_MEM_BYTES_HARD_LIMIT;
double COMMIT_BATCHES_MEM_FRACTION_OF_TOTAL;
double COMMIT_BATCHES_MEM_TO_TOTAL_MEM_SCALE_FACTOR;
double RESOLVER_COALESCE_TIME;
int BUGGIFIED_ROW_LIMIT;
double PROXY_SPIN_DELAY;
double UPDATE_REMOTE_LOG_VERSION_INTERVAL;
int MAX_TXS_POP_VERSION_HISTORY;
double MIN_CONFIRM_INTERVAL;
double ENFORCED_MIN_RECOVERY_DURATION;
double REQUIRED_MIN_RECOVERY_DURATION;
bool ALWAYS_CAUSAL_READ_RISKY;
int MAX_COMMIT_UPDATES;
double MAX_PROXY_COMPUTE;
double MAX_COMPUTE_PER_OPERATION;
int PROXY_COMPUTE_BUCKETS;
double PROXY_COMPUTE_GROWTH_RATE;
int TXN_STATE_SEND_AMOUNT;
double REPORT_TRANSACTION_COST_ESTIMATION_DELAY;
bool PROXY_REJECT_BATCH_QUEUED_TOO_LONG;
int RESET_MASTER_BATCHES;
int RESET_RESOLVER_BATCHES;
double RESET_MASTER_DELAY;
double RESET_RESOLVER_DELAY;
// Master Server
double COMMIT_SLEEP_TIME;
double MIN_BALANCE_TIME;
int64_t MIN_BALANCE_DIFFERENCE;
double SECONDS_BEFORE_NO_FAILURE_DELAY;
int64_t MAX_TXS_SEND_MEMORY;
int64_t MAX_RECOVERY_VERSIONS;
double MAX_RECOVERY_TIME;
double PROVISIONAL_START_DELAY;
double PROVISIONAL_DELAY_GROWTH;
double PROVISIONAL_MAX_DELAY;
double SECONDS_BEFORE_RECRUIT_BACKUP_WORKER;
double CC_INTERFACE_TIMEOUT;
// Resolver
int64_t KEY_BYTES_PER_SAMPLE;
int64_t SAMPLE_OFFSET_PER_KEY;
double SAMPLE_EXPIRATION_TIME;
double SAMPLE_POLL_TIME;
int64_t RESOLVER_STATE_MEMORY_LIMIT;
// Backup Worker
double BACKUP_TIMEOUT; // master's reaction time for backup failure
double BACKUP_NOOP_POP_DELAY;
int BACKUP_FILE_BLOCK_BYTES;
int64_t BACKUP_LOCK_BYTES;
double BACKUP_UPLOAD_DELAY;
// Cluster Controller
double CLUSTER_CONTROLLER_LOGGING_DELAY;
double MASTER_FAILURE_REACTION_TIME;
double MASTER_FAILURE_SLOPE_DURING_RECOVERY;
int WORKER_COORDINATION_PING_DELAY;
double SIM_SHUTDOWN_TIMEOUT;
double SHUTDOWN_TIMEOUT;
double MASTER_SPIN_DELAY;
double CC_CHANGE_DELAY;
double CC_CLASS_DELAY;
double WAIT_FOR_GOOD_RECRUITMENT_DELAY;
double WAIT_FOR_GOOD_REMOTE_RECRUITMENT_DELAY;
double ATTEMPT_RECRUITMENT_DELAY;
double WAIT_FOR_DISTRIBUTOR_JOIN_DELAY;
double WAIT_FOR_RATEKEEPER_JOIN_DELAY;
double WORKER_FAILURE_TIME;
double CHECK_OUTSTANDING_INTERVAL;
double INCOMPATIBLE_PEERS_LOGGING_INTERVAL;
double VERSION_LAG_METRIC_INTERVAL;
int64_t MAX_VERSION_DIFFERENCE;
double FORCE_RECOVERY_CHECK_DELAY;
double RATEKEEPER_FAILURE_TIME;
double REPLACE_INTERFACE_DELAY;
double REPLACE_INTERFACE_CHECK_DELAY;
double COORDINATOR_REGISTER_INTERVAL;
double CLIENT_REGISTER_INTERVAL;
// Knobs used to select the best policy (via monte carlo)
int POLICY_RATING_TESTS; // number of tests per policy (in order to compare)
int POLICY_GENERATIONS; // number of policies to generate
int EXPECTED_MASTER_FITNESS;
int EXPECTED_TLOG_FITNESS;
int EXPECTED_LOG_ROUTER_FITNESS;
int EXPECTED_COMMIT_PROXY_FITNESS;
int EXPECTED_GRV_PROXY_FITNESS;
int EXPECTED_RESOLVER_FITNESS;
double RECRUITMENT_TIMEOUT;
int DBINFO_SEND_AMOUNT;
double DBINFO_BATCH_DELAY;
// Move Keys
double SHARD_READY_DELAY;
double SERVER_READY_QUORUM_INTERVAL;
double SERVER_READY_QUORUM_TIMEOUT;
double REMOVE_RETRY_DELAY;
int MOVE_KEYS_KRM_LIMIT;
int MOVE_KEYS_KRM_LIMIT_BYTES; // This must be sufficiently larger than CLIENT_KNOBS->KEY_SIZE_LIMIT
// (fdbclient/Knobs.h) to ensure that at least two entries will be returned from an
// attempt to read a key range map
int MAX_SKIP_TAGS;
double MAX_ADDED_SOURCES_MULTIPLIER;
// FdbServer
double MIN_REBOOT_TIME;
double MAX_REBOOT_TIME;
std::string LOG_DIRECTORY;
int64_t SERVER_MEM_LIMIT;
double SYSTEM_MONITOR_FREQUENCY;
// Ratekeeper
double SMOOTHING_AMOUNT;
double SLOW_SMOOTHING_AMOUNT;
double METRIC_UPDATE_RATE;
double DETAILED_METRIC_UPDATE_RATE;
double LAST_LIMITED_RATIO;
double RATEKEEPER_DEFAULT_LIMIT;
int64_t TARGET_BYTES_PER_STORAGE_SERVER;
int64_t SPRING_BYTES_STORAGE_SERVER;
int64_t AUTO_TAG_THROTTLE_STORAGE_QUEUE_BYTES;
int64_t TARGET_BYTES_PER_STORAGE_SERVER_BATCH;
int64_t SPRING_BYTES_STORAGE_SERVER_BATCH;
int64_t STORAGE_HARD_LIMIT_BYTES;
int64_t STORAGE_DURABILITY_LAG_HARD_MAX;
int64_t STORAGE_DURABILITY_LAG_SOFT_MAX;
int64_t LOW_PRIORITY_STORAGE_QUEUE_BYTES;
int64_t LOW_PRIORITY_DURABILITY_LAG;
int64_t TARGET_BYTES_PER_TLOG;
int64_t SPRING_BYTES_TLOG;
int64_t TARGET_BYTES_PER_TLOG_BATCH;
int64_t SPRING_BYTES_TLOG_BATCH;
int64_t TLOG_SPILL_THRESHOLD;
int64_t TLOG_HARD_LIMIT_BYTES;
int64_t TLOG_RECOVER_MEMORY_LIMIT;
double TLOG_IGNORE_POP_AUTO_ENABLE_DELAY;
int64_t MAX_MANUAL_THROTTLED_TRANSACTION_TAGS;
int64_t MAX_AUTO_THROTTLED_TRANSACTION_TAGS;
double MIN_TAG_COST;
double AUTO_THROTTLE_TARGET_TAG_BUSYNESS;
double AUTO_THROTTLE_RAMP_TAG_BUSYNESS;
double AUTO_TAG_THROTTLE_RAMP_UP_TIME;
double AUTO_TAG_THROTTLE_DURATION;
double TAG_THROTTLE_PUSH_INTERVAL;
double AUTO_TAG_THROTTLE_START_AGGREGATION_TIME;
double AUTO_TAG_THROTTLE_UPDATE_FREQUENCY;
double TAG_THROTTLE_EXPIRED_CLEANUP_INTERVAL;
bool AUTO_TAG_THROTTLING_ENABLED;
double MAX_TRANSACTIONS_PER_BYTE;
int64_t MIN_AVAILABLE_SPACE;
double MIN_AVAILABLE_SPACE_RATIO;
double TARGET_AVAILABLE_SPACE_RATIO;
double AVAILABLE_SPACE_UPDATE_DELAY;
double MAX_TL_SS_VERSION_DIFFERENCE; // spring starts at half this value
double MAX_TL_SS_VERSION_DIFFERENCE_BATCH;
int MAX_MACHINES_FALLING_BEHIND;
int MAX_TPS_HISTORY_SAMPLES;
int NEEDED_TPS_HISTORY_SAMPLES;
int64_t TARGET_DURABILITY_LAG_VERSIONS;
int64_t AUTO_TAG_THROTTLE_DURABILITY_LAG_VERSIONS;
int64_t TARGET_DURABILITY_LAG_VERSIONS_BATCH;
int64_t DURABILITY_LAG_UNLIMITED_THRESHOLD;
double INITIAL_DURABILITY_LAG_MULTIPLIER;
double DURABILITY_LAG_REDUCTION_RATE;
double DURABILITY_LAG_INCREASE_RATE;
double STORAGE_SERVER_LIST_FETCH_TIMEOUT;
// disk snapshot
int64_t MAX_FORKED_PROCESS_OUTPUT;
double SNAP_CREATE_MAX_TIMEOUT;
// Storage Metrics
double STORAGE_METRICS_AVERAGE_INTERVAL;
double STORAGE_METRICS_AVERAGE_INTERVAL_PER_KSECONDS;
double SPLIT_JITTER_AMOUNT;
int64_t IOPS_UNITS_PER_SAMPLE;
int64_t BANDWIDTH_UNITS_PER_SAMPLE;
int64_t BYTES_READ_UNITS_PER_SAMPLE;
int64_t READ_HOT_SUB_RANGE_CHUNK_SIZE;
int64_t EMPTY_READ_PENALTY;
bool READ_SAMPLING_ENABLED;
// Storage Server
double STORAGE_LOGGING_DELAY;
double STORAGE_SERVER_POLL_METRICS_DELAY;
double FUTURE_VERSION_DELAY;
int STORAGE_LIMIT_BYTES;
int BUGGIFY_LIMIT_BYTES;
int FETCH_BLOCK_BYTES;
int FETCH_KEYS_PARALLELISM_BYTES;
int FETCH_KEYS_LOWER_PRIORITY;
int BUGGIFY_BLOCK_BYTES;
double STORAGE_DURABILITY_LAG_REJECT_THRESHOLD;
double STORAGE_DURABILITY_LAG_MIN_RATE;
int STORAGE_COMMIT_BYTES;
double STORAGE_COMMIT_INTERVAL;
double UPDATE_SHARD_VERSION_INTERVAL;
int BYTE_SAMPLING_FACTOR;
int BYTE_SAMPLING_OVERHEAD;
int MAX_STORAGE_SERVER_WATCH_BYTES;
int MAX_BYTE_SAMPLE_CLEAR_MAP_SIZE;
double LONG_BYTE_SAMPLE_RECOVERY_DELAY;
int BYTE_SAMPLE_LOAD_PARALLELISM;
double BYTE_SAMPLE_LOAD_DELAY;
double BYTE_SAMPLE_START_DELAY;
double UPDATE_STORAGE_PROCESS_STATS_INTERVAL;
double BEHIND_CHECK_DELAY;
int BEHIND_CHECK_COUNT;
int64_t BEHIND_CHECK_VERSIONS;
double WAIT_METRICS_WRONG_SHARD_CHANCE;
int64_t MIN_TAG_READ_PAGES_RATE;
int64_t MIN_TAG_WRITE_PAGES_RATE;
double TAG_MEASUREMENT_INTERVAL;
int64_t READ_COST_BYTE_FACTOR;
bool PREFIX_COMPRESS_KVS_MEM_SNAPSHOTS;
bool REPORT_DD_METRICS;
double DD_METRICS_REPORT_INTERVAL;
double FETCH_KEYS_TOO_LONG_TIME_CRITERIA;
double MAX_STORAGE_COMMIT_TIME;
// Wait Failure
int MAX_OUTSTANDING_WAIT_FAILURE_REQUESTS;
double WAIT_FAILURE_DELAY_LIMIT;
// Worker
double WORKER_LOGGING_INTERVAL;
double HEAP_PROFILER_INTERVAL;
double UNKNOWN_CC_TIMEOUT;
double DEGRADED_RESET_INTERVAL;
double DEGRADED_WARNING_LIMIT;
double DEGRADED_WARNING_RESET_DELAY;
int64_t TRACE_LOG_FLUSH_FAILURE_CHECK_INTERVAL_SECONDS;
double TRACE_LOG_PING_TIMEOUT_SECONDS;
double MIN_DELAY_CC_WORST_FIT_CANDIDACY_SECONDS; // Listen for a leader for N seconds, and if not heard, then try to
// become the leader.
double MAX_DELAY_CC_WORST_FIT_CANDIDACY_SECONDS;
double DBINFO_FAILED_DELAY;
bool ENABLE_WORKER_HEALTH_MONITOR;
double WORKER_HEALTH_MONITOR_INTERVAL; // Interval between two health monitor health check.
int PEER_LATENCY_CHECK_MIN_POPULATION; // The minimum number of latency samples required to check a peer.
double PEER_LATENCY_DEGRADATION_PERCENTILE; // The percentile latency used to check peer health.
double PEER_LATENCY_DEGRADATION_THRESHOLD; // The latency threshold to consider a peer degraded.
double PEER_TIMEOUT_PERCENTAGE_DEGRADATION_THRESHOLD; // The percentage of timeout to consider a peer degraded.
// Test harness
double WORKER_POLL_DELAY;
// Coordination
double COORDINATED_STATE_ONCONFLICT_POLL_INTERVAL;
bool ENABLE_CROSS_CLUSTER_SUPPORT; // Allow a coordinator to serve requests whose connection string does not match
// the local descriptor
// Buggification
double BUGGIFIED_EVENTUAL_CONSISTENCY;
bool BUGGIFY_ALL_COORDINATION;
// Status
double STATUS_MIN_TIME_BETWEEN_REQUESTS;
double MAX_STATUS_REQUESTS_PER_SECOND;
int CONFIGURATION_ROWS_TO_FETCH;
bool DISABLE_DUPLICATE_LOG_WARNING;
double HISTOGRAM_REPORT_INTERVAL;
// IPager
int PAGER_RESERVED_PAGES;
// IndirectShadowPager
int FREE_PAGE_VACUUM_THRESHOLD;
int VACUUM_QUEUE_SIZE;
int VACUUM_BYTES_PER_SECOND;
// Timekeeper
int64_t TIME_KEEPER_DELAY;
int64_t TIME_KEEPER_MAX_ENTRIES;
// Fast Restore
// TODO: After 6.3, review FR knobs, remove unneeded ones and change default value
int64_t FASTRESTORE_FAILURE_TIMEOUT;
int64_t FASTRESTORE_HEARTBEAT_INTERVAL;
double FASTRESTORE_SAMPLING_PERCENT;
int64_t FASTRESTORE_NUM_LOADERS;
int64_t FASTRESTORE_NUM_APPLIERS;
// FASTRESTORE_TXN_BATCH_MAX_BYTES is target txn size used by appliers to apply mutations
double FASTRESTORE_TXN_BATCH_MAX_BYTES;
// FASTRESTORE_VERSIONBATCH_MAX_BYTES is the maximum data size in each version batch
double FASTRESTORE_VERSIONBATCH_MAX_BYTES;
// FASTRESTORE_VB_PARALLELISM is the number of concurrently running version batches
int64_t FASTRESTORE_VB_PARALLELISM;
int64_t FASTRESTORE_VB_MONITOR_DELAY; // How quickly monitor finished version batch
double FASTRESTORE_VB_LAUNCH_DELAY;
int64_t FASTRESTORE_ROLE_LOGGING_DELAY;
int64_t FASTRESTORE_UPDATE_PROCESS_STATS_INTERVAL; // How quickly to update process metrics for restore
int64_t FASTRESTORE_ATOMICOP_WEIGHT; // workload amplication factor for atomic op
int64_t FASTRESTORE_APPLYING_PARALLELISM; // number of outstanding txns writing to dest. DB
int64_t FASTRESTORE_MONITOR_LEADER_DELAY;
int64_t FASTRESTORE_STRAGGLER_THRESHOLD_SECONDS;
bool FASTRESTORE_TRACK_REQUEST_LATENCY; // true to track reply latency of each request in a request batch
bool FASTRESTORE_TRACK_LOADER_SEND_REQUESTS; // track requests of load send mutations to appliers?
int64_t FASTRESTORE_MEMORY_THRESHOLD_MB_SOFT; // threshold when pipelined actors should be delayed
int64_t FASTRESTORE_WAIT_FOR_MEMORY_LATENCY;
int64_t FASTRESTORE_HEARTBEAT_DELAY; // interval for master to ping loaders and appliers
int64_t
FASTRESTORE_HEARTBEAT_MAX_DELAY; // master claim a node is down if no heart beat from the node for this delay
int64_t FASTRESTORE_APPLIER_FETCH_KEYS_SIZE; // number of keys to fetch in a txn on applier
int64_t FASTRESTORE_LOADER_SEND_MUTATION_MSG_BYTES; // desired size of mutation message sent from loader to appliers
bool FASTRESTORE_GET_RANGE_VERSIONS_EXPENSIVE; // parse each range file to get (range, version) it has?
int64_t FASTRESTORE_REQBATCH_PARALLEL; // number of requests to wait on for getBatchReplies()
bool FASTRESTORE_REQBATCH_LOG; // verbose log information for getReplyBatches
int FASTRESTORE_TXN_CLEAR_MAX; // threshold to start tracking each clear op in a txn
int FASTRESTORE_TXN_RETRY_MAX; // threshold to start output error on too many retries
double FASTRESTORE_TXN_EXTRA_DELAY; // extra delay to avoid overwhelming fdb
bool FASTRESTORE_NOT_WRITE_DB; // do not write result to DB. Only for dev testing
bool FASTRESTORE_USE_RANGE_FILE; // use range file in backup
bool FASTRESTORE_USE_LOG_FILE; // use log file in backup
int64_t FASTRESTORE_SAMPLE_MSG_BYTES; // sample message desired size
double FASTRESTORE_SCHED_UPDATE_DELAY; // delay in seconds in updating process metrics
int FASTRESTORE_SCHED_TARGET_CPU_PERCENT; // release as many requests as possible when cpu usage is below the knob
int FASTRESTORE_SCHED_MAX_CPU_PERCENT; // max cpu percent when scheduler shall not release non-urgent requests
int FASTRESTORE_SCHED_INFLIGHT_LOAD_REQS; // number of inflight requests to load backup files
int FASTRESTORE_SCHED_INFLIGHT_SEND_REQS; // number of inflight requests for loaders to send mutations to appliers
int FASTRESTORE_SCHED_LOAD_REQ_BATCHSIZE; // number of load request to release at once
int FASTRESTORE_SCHED_INFLIGHT_SENDPARAM_THRESHOLD; // we can send future VB requests if it is less than this knob
int FASTRESTORE_SCHED_SEND_FUTURE_VB_REQS_BATCH; // number of future VB sendLoadingParam requests to process at once
int FASTRESTORE_NUM_TRACE_EVENTS;
bool FASTRESTORE_EXPENSIVE_VALIDATION; // when set true, performance will be heavily affected
double FASTRESTORE_WRITE_BW_MB; // target aggregated write bandwidth from all appliers
double FASTRESTORE_RATE_UPDATE_SECONDS; // how long to update appliers target write rate
int REDWOOD_DEFAULT_PAGE_SIZE; // Page size for new Redwood files
int REDWOOD_DEFAULT_EXTENT_SIZE; // Extent size for new Redwood files
int REDWOOD_DEFAULT_EXTENT_READ_SIZE; // Extent read size for Redwood files
int REDWOOD_EXTENT_CONCURRENT_READS; // Max number of simultaneous extent disk reads in progress.
int REDWOOD_KVSTORE_CONCURRENT_READS; // Max number of simultaneous point or range reads in progress.
double REDWOOD_PAGE_REBUILD_MAX_SLACK; // When rebuilding pages, max slack to allow in page
int REDWOOD_LAZY_CLEAR_BATCH_SIZE_PAGES; // Number of pages to try to pop from the lazy delete queue and process at
// once
int REDWOOD_LAZY_CLEAR_MIN_PAGES; // Minimum number of pages to free before ending a lazy clear cycle, unless the
// queue is empty
int REDWOOD_LAZY_CLEAR_MAX_PAGES; // Maximum number of pages to free before ending a lazy clear cycle, unless the
// queue is empty
int64_t REDWOOD_REMAP_CLEANUP_WINDOW; // Remap remover lag interval in which to coalesce page writes
double REDWOOD_REMAP_CLEANUP_LAG; // Maximum allowed remap remover lag behind the cleanup window as a multiple of
// the window size
double REDWOOD_LOGGING_INTERVAL;
// Server request latency measurement
int LATENCY_SAMPLE_SIZE;
double LATENCY_METRICS_LOGGING_INTERVAL;
ServerKnobs();
void initialize(bool randomize = false, ClientKnobs* clientKnobs = nullptr, bool isSimulated = false);
};
extern std::unique_ptr<ServerKnobs> globalServerKnobs;
extern ServerKnobs const* SERVER_KNOBS;
#endif
#define SERVER_KNOBS (&IKnobCollection::getGlobalKnobCollection().getServerKnobs())

View File

@ -0,0 +1,487 @@
/*
* LocalConfiguration.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbclient/IKnobCollection.h"
#include "fdbrpc/Stats.h"
#include "fdbserver/ConfigBroadcastFollowerInterface.h"
#include "fdbserver/IKeyValueStore.h"
#include "fdbserver/LocalConfiguration.h"
#include "fdbserver/OnDemandStore.h"
#include "flow/UnitTest.h"
#include "flow/actorcompiler.h" // This must be the last #include.
namespace {
const KeyRef configPathKey = "configPath"_sr;
const KeyRef lastSeenVersionKey = "lastSeenVersion"_sr;
const KeyRangeRef knobOverrideKeys = KeyRangeRef("knobOverride/"_sr, "knobOverride0"_sr);
KeyRef stringToKeyRef(std::string const& s) {
return StringRef(reinterpret_cast<uint8_t const*>(s.c_str()), s.size());
}
} // namespace
class LocalConfigurationImpl {
UID id;
OnDemandStore kvStore;
Future<Void> initFuture;
Version lastSeenVersion{ 0 };
std::unique_ptr<IKnobCollection> testKnobCollection;
class ConfigKnobOverrides {
Standalone<VectorRef<KeyRef>> configPath;
std::map<Optional<Key>, std::map<Key, KnobValue>> configClassToKnobToValue;
public:
ConfigKnobOverrides() = default;
explicit ConfigKnobOverrides(std::string const& paramString) {
configClassToKnobToValue[{}] = {};
if (std::all_of(paramString.begin(), paramString.end(), [](char c) {
return isalpha(c) || isdigit(c) || c == '/' || c == '-';
})) {
StringRef s = stringToKeyRef(paramString);
while (s.size()) {
configPath.push_back_deep(configPath.arena(), s.eat("/"_sr));
configClassToKnobToValue[configPath.back()] = {};
}
} else {
TEST(true); // Invalid configuration path
if (!g_network->isSimulated()) {
fprintf(stderr, "WARNING: Invalid configuration path: `%s'\n", paramString.c_str());
}
throw invalid_config_path();
}
}
ConfigClassSet getConfigClassSet() const { return ConfigClassSet(configPath); }
void set(Optional<KeyRef> configClass, KeyRef knobName, KnobValueRef value) {
configClassToKnobToValue[configClass.castTo<Key>()][knobName] = value;
}
void remove(Optional<KeyRef> configClass, KeyRef knobName) {
configClassToKnobToValue[configClass.castTo<Key>()].erase(knobName);
}
void update(IKnobCollection& knobCollection) const {
// Apply global overrides
const auto& knobToValue = configClassToKnobToValue.at({});
for (const auto& [knobName, knobValue] : knobToValue) {
try {
knobCollection.setKnob(knobName.toString(), knobValue);
} catch (Error& e) {
if (e.code() == error_code_invalid_option_value) {
TEST(true); // invalid knob in configuration database
TraceEvent(SevWarnAlways, "InvalidKnobOptionValue")
.detail("KnobName", knobName)
.detail("KnobValue", knobValue.toString());
} else {
throw e;
}
}
}
// Apply specific overrides
for (const auto& configClass : configPath) {
const auto& knobToValue = configClassToKnobToValue.at(configClass);
for (const auto& [knobName, knobValue] : knobToValue) {
knobCollection.setKnob(knobName.toString(), knobValue);
}
}
}
bool hasSameConfigPath(ConfigKnobOverrides const& other) const { return configPath == other.configPath; }
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, configPath);
}
} configKnobOverrides;
class ManualKnobOverrides {
std::map<Key, KnobValue> overrides;
public:
explicit ManualKnobOverrides(std::map<std::string, std::string> const& overrides) {
for (const auto& [knobName, knobValueString] : overrides) {
try {
auto knobValue =
IKnobCollection::parseKnobValue(knobName, knobValueString, IKnobCollection::Type::TEST);
this->overrides[stringToKeyRef(knobName)] = knobValue;
} catch (Error& e) {
if (e.code() == error_code_invalid_option) {
TEST(true); // Attempted to manually set invalid knob option
if (!g_network->isSimulated()) {
fprintf(stderr, "WARNING: Unrecognized knob option '%s'\n", knobName.c_str());
}
TraceEvent(SevWarnAlways, "UnrecognizedKnobOption").detail("Knob", printable(knobName));
} else if (e.code() == error_code_invalid_option_value) {
TEST(true); // Invalid manually set knob value
if (!g_network->isSimulated()) {
fprintf(stderr,
"WARNING: Invalid value '%s' for knob option '%s'\n",
knobValueString.c_str(),
knobName.c_str());
}
TraceEvent(SevWarnAlways, "InvalidKnobValue")
.detail("Knob", printable(knobName))
.detail("Value", printable(knobValueString));
} else {
throw e;
}
}
}
}
void update(IKnobCollection& knobCollection) {
for (const auto& [knobName, knobValue] : overrides) {
knobCollection.setKnob(knobName.toString(), knobValue);
}
}
} manualKnobOverrides;
IKnobCollection& getKnobs() {
return testKnobCollection ? *testKnobCollection : IKnobCollection::getMutableGlobalKnobCollection();
}
IKnobCollection const& getKnobs() const {
return testKnobCollection ? *testKnobCollection : IKnobCollection::getGlobalKnobCollection();
}
CounterCollection cc;
Counter broadcasterChanges;
Counter snapshots;
Counter changeRequestsFetched;
Counter mutations;
Future<Void> logger;
ACTOR static Future<Void> saveConfigPath(LocalConfigurationImpl* self) {
self->kvStore->set(
KeyValueRef(configPathKey, BinaryWriter::toValue(self->configKnobOverrides, IncludeVersion())));
wait(self->kvStore->commit());
return Void();
}
ACTOR static Future<Void> clearKVStore(LocalConfigurationImpl* self) {
self->kvStore->clear(singleKeyRange(configPathKey));
self->kvStore->clear(knobOverrideKeys);
wait(self->kvStore->commit());
return Void();
}
ACTOR static Future<Version> getLastSeenVersion(LocalConfigurationImpl* self) {
state Version result = 0;
state Optional<Value> lastSeenVersionValue = wait(self->kvStore->readValue(lastSeenVersionKey));
if (!lastSeenVersionValue.present()) {
self->kvStore->set(KeyValueRef(lastSeenVersionKey, BinaryWriter::toValue(result, IncludeVersion())));
wait(self->kvStore->commit());
} else {
result = BinaryReader::fromStringRef<Version>(lastSeenVersionValue.get(), IncludeVersion());
}
return result;
}
ACTOR static Future<Void> initialize(LocalConfigurationImpl* self) {
state Version lastSeenVersion = wait(getLastSeenVersion(self));
state Optional<Value> storedConfigPathValue = wait(self->kvStore->readValue(configPathKey));
if (!storedConfigPathValue.present()) {
wait(saveConfigPath(self));
self->updateInMemoryState(lastSeenVersion);
return Void();
}
state ConfigKnobOverrides storedConfigPath =
BinaryReader::fromStringRef<ConfigKnobOverrides>(storedConfigPathValue.get(), IncludeVersion());
if (!storedConfigPath.hasSameConfigPath(self->configKnobOverrides)) {
TEST(true); // All local information is outdated
wait(clearKVStore(self));
wait(saveConfigPath(self));
self->updateInMemoryState(lastSeenVersion);
return Void();
}
Standalone<RangeResultRef> range = wait(self->kvStore->readRange(knobOverrideKeys));
for (const auto& kv : range) {
auto configKey =
BinaryReader::fromStringRef<ConfigKey>(kv.key.removePrefix(knobOverrideKeys.begin), IncludeVersion());
self->configKnobOverrides.set(configKey.configClass,
configKey.knobName,
ObjectReader::fromStringRef<KnobValue>(kv.value, IncludeVersion()));
}
self->updateInMemoryState(lastSeenVersion);
return Void();
}
void updateInMemoryState(Version lastSeenVersion) {
this->lastSeenVersion = lastSeenVersion;
// TODO: Support randomization?
getKnobs().reset(Randomize::NO, g_network->isSimulated() ? IsSimulated::YES : IsSimulated::NO);
configKnobOverrides.update(getKnobs());
manualKnobOverrides.update(getKnobs());
// Must reinitialize in order to update dependent knobs
getKnobs().initialize(Randomize::NO, g_network->isSimulated() ? IsSimulated::YES : IsSimulated::NO);
}
ACTOR static Future<Void> setSnapshot(LocalConfigurationImpl* self,
std::map<ConfigKey, KnobValue> snapshot,
Version snapshotVersion) {
// TODO: Concurrency control?
ASSERT(self->initFuture.isValid() && self->initFuture.isReady());
++self->snapshots;
self->kvStore->clear(knobOverrideKeys);
for (const auto& [configKey, knobValue] : snapshot) {
self->configKnobOverrides.set(configKey.configClass, configKey.knobName, knobValue);
self->kvStore->set(
KeyValueRef(BinaryWriter::toValue(configKey, IncludeVersion()).withPrefix(knobOverrideKeys.begin),
ObjectWriter::toValue(knobValue, IncludeVersion())));
}
ASSERT_GE(snapshotVersion, self->lastSeenVersion);
self->kvStore->set(KeyValueRef(lastSeenVersionKey, BinaryWriter::toValue(snapshotVersion, IncludeVersion())));
wait(self->kvStore->commit());
self->updateInMemoryState(snapshotVersion);
return Void();
}
ACTOR static Future<Void> addChanges(LocalConfigurationImpl* self,
Standalone<VectorRef<VersionedConfigMutationRef>> changes,
Version mostRecentVersion) {
// TODO: Concurrency control?
ASSERT(self->initFuture.isValid() && self->initFuture.isReady());
++self->changeRequestsFetched;
for (const auto& versionedMutation : changes) {
ASSERT_GT(versionedMutation.version, self->lastSeenVersion);
++self->mutations;
const auto& mutation = versionedMutation.mutation;
auto serializedKey = BinaryWriter::toValue(mutation.getKey(), IncludeVersion());
if (mutation.isSet()) {
self->kvStore->set(KeyValueRef(serializedKey.withPrefix(knobOverrideKeys.begin),
ObjectWriter::toValue(mutation.getValue(), IncludeVersion())));
self->configKnobOverrides.set(mutation.getConfigClass(), mutation.getKnobName(), mutation.getValue());
} else {
self->kvStore->clear(singleKeyRange(serializedKey.withPrefix(knobOverrideKeys.begin)));
self->configKnobOverrides.remove(mutation.getConfigClass(), mutation.getKnobName());
}
}
self->kvStore->set(KeyValueRef(lastSeenVersionKey, BinaryWriter::toValue(mostRecentVersion, IncludeVersion())));
wait(self->kvStore->commit());
self->updateInMemoryState(mostRecentVersion);
return Void();
}
ACTOR static Future<Void> consumeInternal(LocalConfigurationImpl* self,
ConfigBroadcastFollowerInterface broadcaster) {
loop {
try {
state ConfigBroadcastFollowerGetChangesReply changesReply =
wait(broadcaster.getChanges.getReply(ConfigBroadcastFollowerGetChangesRequest{
self->lastSeenVersion, self->configKnobOverrides.getConfigClassSet() }));
TraceEvent(SevDebug, "LocalConfigGotChanges", self->id)
.detail("Size", changesReply.changes.size())
.detail("Version", changesReply.mostRecentVersion);
wait(self->addChanges(changesReply.changes, changesReply.mostRecentVersion));
} catch (Error& e) {
if (e.code() == error_code_version_already_compacted) {
state ConfigBroadcastFollowerGetSnapshotReply snapshotReply = wait(broadcaster.getSnapshot.getReply(
ConfigBroadcastFollowerGetSnapshotRequest{ self->configKnobOverrides.getConfigClassSet() }));
ASSERT_GT(snapshotReply.version, self->lastSeenVersion);
++self->snapshots;
wait(setSnapshot(self, std::move(snapshotReply.snapshot), snapshotReply.version));
} else {
throw e;
}
}
wait(yield()); // Necessary to not immediately trigger retry?
}
}
ACTOR static Future<Void> consume(
LocalConfigurationImpl* self,
Reference<IDependentAsyncVar<ConfigBroadcastFollowerInterface> const> broadcaster) {
ASSERT(self->initFuture.isValid() && self->initFuture.isReady());
loop {
choose {
when(wait(brokenPromiseToNever(consumeInternal(self, broadcaster->get())))) { ASSERT(false); }
when(wait(broadcaster->onChange())) { ++self->broadcasterChanges; }
when(wait(self->kvStore->getError())) { ASSERT(false); }
}
}
}
public:
LocalConfigurationImpl(std::string const& dataFolder,
std::string const& configPath,
std::map<std::string, std::string> const& manualKnobOverrides,
IsTest isTest)
: id(deterministicRandom()->randomUniqueID()), kvStore(dataFolder, id, "localconf-"), cc("LocalConfiguration"),
broadcasterChanges("BroadcasterChanges", cc), snapshots("Snapshots", cc),
changeRequestsFetched("ChangeRequestsFetched", cc), mutations("Mutations", cc), configKnobOverrides(configPath),
manualKnobOverrides(manualKnobOverrides) {
if (isTest == IsTest::YES) {
testKnobCollection = IKnobCollection::create(IKnobCollection::Type::TEST,
Randomize::NO,
g_network->isSimulated() ? IsSimulated::YES : IsSimulated::NO);
}
logger = traceCounters(
"LocalConfigurationMetrics", id, SERVER_KNOBS->WORKER_LOGGING_INTERVAL, &cc, "LocalConfigurationMetrics");
}
Future<Void> initialize() {
ASSERT(!initFuture.isValid());
initFuture = initialize(this);
return initFuture;
}
Future<Void> addChanges(Standalone<VectorRef<VersionedConfigMutationRef>> changes, Version mostRecentVersion) {
return addChanges(this, changes, mostRecentVersion);
}
FlowKnobs const& getFlowKnobs() const {
ASSERT(initFuture.isValid() && initFuture.isReady());
return getKnobs().getFlowKnobs();
}
ClientKnobs const& getClientKnobs() const {
ASSERT(initFuture.isValid() && initFuture.isReady());
return getKnobs().getClientKnobs();
}
ServerKnobs const& getServerKnobs() const {
ASSERT(initFuture.isValid() && initFuture.isReady());
return getKnobs().getServerKnobs();
}
TestKnobs const& getTestKnobs() const {
ASSERT(initFuture.isValid() && initFuture.isReady());
return getKnobs().getTestKnobs();
}
Future<Void> consume(Reference<IDependentAsyncVar<ConfigBroadcastFollowerInterface> const> const& broadcaster) {
return consume(this, broadcaster);
}
UID getID() const { return id; }
static void testManualKnobOverridesInvalidName() {
std::map<std::string, std::string> invalidOverrides;
invalidOverrides["knob_name_that_does_not_exist"] = "";
// Should only trace and not throw an error:
ManualKnobOverrides manualKnobOverrides(invalidOverrides);
}
static void testManualKnobOverridesInvalidValue() {
std::map<std::string, std::string> invalidOverrides;
invalidOverrides["test_int"] = "not_an_int";
// Should only trace and not throw an error:
ManualKnobOverrides manualKnobOverrides(invalidOverrides);
}
static void testConfigKnobOverridesInvalidConfigPath() {
try {
ConfigKnobOverrides configKnobOverrides("#invalid_config_path");
ASSERT(false);
} catch (Error& e) {
ASSERT_EQ(e.code(), error_code_invalid_config_path);
}
}
static void testConfigKnobOverridesInvalidName() {
ConfigKnobOverrides configKnobOverrides;
configKnobOverrides.set(
{}, "knob_name_that_does_not_exist"_sr, KnobValueRef::create(ParsedKnobValue(int{ 1 })));
auto testKnobCollection = IKnobCollection::create(IKnobCollection::Type::TEST, Randomize::NO, IsSimulated::NO);
// Should only trace and not throw an error:
configKnobOverrides.update(*testKnobCollection);
}
static void testConfigKnobOverridesInvalidValue() {
ConfigKnobOverrides configKnobOverrides;
configKnobOverrides.set({}, "test_int"_sr, KnobValueRef::create(ParsedKnobValue("not_an_int")));
auto testKnobCollection = IKnobCollection::create(IKnobCollection::Type::TEST, Randomize::NO, IsSimulated::NO);
// Should only trace and not throw an error:
configKnobOverrides.update(*testKnobCollection);
}
};
LocalConfiguration::LocalConfiguration(std::string const& dataFolder,
std::string const& configPath,
std::map<std::string, std::string> const& manualKnobOverrides,
IsTest isTest)
: _impl(std::make_unique<LocalConfigurationImpl>(dataFolder, configPath, manualKnobOverrides, isTest)) {}
LocalConfiguration::LocalConfiguration(LocalConfiguration&&) = default;
LocalConfiguration& LocalConfiguration::operator=(LocalConfiguration&&) = default;
LocalConfiguration::~LocalConfiguration() = default;
Future<Void> LocalConfiguration::initialize() {
return impl().initialize();
}
FlowKnobs const& LocalConfiguration::getFlowKnobs() const {
return impl().getFlowKnobs();
}
ClientKnobs const& LocalConfiguration::getClientKnobs() const {
return impl().getClientKnobs();
}
ServerKnobs const& LocalConfiguration::getServerKnobs() const {
return impl().getServerKnobs();
}
TestKnobs const& LocalConfiguration::getTestKnobs() const {
return impl().getTestKnobs();
}
Future<Void> LocalConfiguration::consume(
Reference<IDependentAsyncVar<ConfigBroadcastFollowerInterface> const> const& broadcaster) {
return impl().consume(broadcaster);
}
Future<Void> LocalConfiguration::addChanges(Standalone<VectorRef<VersionedConfigMutationRef>> changes,
Version mostRecentVersion) {
return impl().addChanges(changes, mostRecentVersion);
}
UID LocalConfiguration::getID() const {
return impl().getID();
}
TEST_CASE("/fdbserver/ConfigDB/ManualKnobOverrides/InvalidName") {
LocalConfigurationImpl::testManualKnobOverridesInvalidName();
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/ManualKnobOverrides/InvalidValue") {
LocalConfigurationImpl::testManualKnobOverridesInvalidValue();
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/ConfigKnobOverrides/InvalidConfigPath") {
LocalConfigurationImpl::testConfigKnobOverridesInvalidConfigPath();
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/ConfigKnobOverrides/InvalidName") {
LocalConfigurationImpl::testConfigKnobOverridesInvalidName();
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/ConfigKnobOverrides/InvalidValue") {
LocalConfigurationImpl::testConfigKnobOverridesInvalidValue();
return Void();
}

View File

@ -0,0 +1,70 @@
/*
* LocalConfiguration.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include <string>
#include "fdbclient/ConfigKnobs.h"
#include "fdbclient/IKnobCollection.h"
#include "fdbserver/ConfigBroadcastFollowerInterface.h"
#include "fdbserver/Knobs.h"
#include "flow/Arena.h"
#include "flow/Knobs.h"
// To be used effectively as a boolean parameter with added type safety
enum class IsTest { NO, YES };
/*
* Each worker maintains a LocalConfiguration object used to update its knob collection.
* When a worker starts, the following steps are executed:
* - Apply manual knob updates
* - Read the local configuration file (with "localconf" prefix)
* - If the stored configuration path does not match the current configuration path, delete the local configuration
* file
* - Otherwise, apply knob updates from the local configuration file (without overriding manual knob overrides)
* - Register with the broadcaster to receive new updates for the relevant configuration classes
* - Persist these updates when received, and restart if necessary
*/
class LocalConfiguration {
std::unique_ptr<class LocalConfigurationImpl> _impl;
LocalConfigurationImpl& impl() { return *_impl; }
LocalConfigurationImpl const& impl() const { return *_impl; }
public:
LocalConfiguration(std::string const& dataFolder,
std::string const& configPath,
std::map<std::string, std::string> const& manualKnobOverrides,
IsTest isTest = IsTest::NO);
LocalConfiguration(LocalConfiguration&&);
LocalConfiguration& operator=(LocalConfiguration&&);
~LocalConfiguration();
Future<Void> initialize();
FlowKnobs const& getFlowKnobs() const;
ClientKnobs const& getClientKnobs() const;
ServerKnobs const& getServerKnobs() const;
TestKnobs const& getTestKnobs() const;
Future<Void> consume(Reference<IDependentAsyncVar<ConfigBroadcastFollowerInterface> const> const& broadcaster);
UID getID() const;
public: // Testing
Future<Void> addChanges(Standalone<VectorRef<VersionedConfigMutationRef>> versionedMutations,
Version mostRecentVersion);
};

View File

@ -28,6 +28,7 @@
struct NetworkTestInterface {
RequestStream<struct NetworkTestRequest> test;
RequestStream<struct NetworkTestStreamingRequest> testStream;
NetworkTestInterface() {}
NetworkTestInterface(NetworkAddress remote);
NetworkTestInterface(INetwork* local);
@ -57,6 +58,29 @@ struct NetworkTestRequest {
}
};
struct NetworkTestStreamingReply : ReplyPromiseStreamReply {
constexpr static FileIdentifier file_identifier = 3726830;
int index = 0;
NetworkTestStreamingReply() = default;
explicit NetworkTestStreamingReply(int index) : index(index) {}
size_t expectedSize() const { return 4e6; /*sizeof(*this);*/ }
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, ReplyPromiseStreamReply::acknowledgeToken, index);
}
};
struct NetworkTestStreamingRequest {
constexpr static FileIdentifier file_identifier = 2794452;
ReplyPromiseStream<struct NetworkTestStreamingReply> reply;
template <class Ar>
void serialize(Ar& ar) {
serializer(ar, reply);
}
};
Future<Void> networkTestServer();
Future<Void> networkTestClient(std::string const& testServers);

View File

@ -0,0 +1,63 @@
/*
* OnDemandStore.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbserver/OnDemandStore.h"
#include "flow/actorcompiler.h" // must be last include
ACTOR static Future<Void> onErr(Future<Future<Void>> e) {
Future<Void> f = wait(e);
wait(f);
return Void();
}
void OnDemandStore::open() {
platform::createDirectory(folder);
store = keyValueStoreMemory(joinPath(folder, prefix), myID, 500e6);
err.send(store->getError());
}
OnDemandStore::OnDemandStore(std::string const& folder, UID myID, std::string const& prefix)
: folder(folder), prefix(prefix), store(nullptr), myID(myID) {}
OnDemandStore::~OnDemandStore() {
if (store) {
store->close();
}
}
IKeyValueStore* OnDemandStore::get() {
if (!store) {
open();
}
return store;
}
bool OnDemandStore::exists() const {
return store || fileExists(joinPath(folder, prefix + "0.fdq")) || fileExists(joinPath(folder, prefix + "1.fdq")) ||
fileExists(joinPath(folder, prefix + ".fdb"));
}
IKeyValueStore* OnDemandStore::operator->() {
return get();
}
Future<Void> OnDemandStore::getError() const {
return onErr(err.getFuture());
}

44
fdbserver/OnDemandStore.h Normal file
View File

@ -0,0 +1,44 @@
/*
* OnDemandStore.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "flow/Arena.h"
#include "flow/IRandom.h"
#include "flow/Platform.h"
#include "fdbserver/IKeyValueStore.h"
// Create a key value store if and only if it is actually used
class OnDemandStore : NonCopyable {
std::string folder;
UID myID;
IKeyValueStore* store;
Promise<Future<Void>> err;
std::string prefix;
void open();
public:
OnDemandStore(std::string const& folder, UID myID, std::string const& prefix);
~OnDemandStore();
IKeyValueStore* get();
bool exists() const;
IKeyValueStore* operator->();
Future<Void> getError() const;
};

View File

@ -0,0 +1,44 @@
/*
* PaxosConfigConsumer.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbserver/PaxosConfigConsumer.h"
class PaxosConfigConsumerImpl {};
PaxosConfigConsumer::PaxosConfigConsumer(ServerCoordinators const& cfi,
Optional<double> pollingInterval,
Optional<double> compactionInterval) {
// TODO: Implement
ASSERT(false);
}
PaxosConfigConsumer::~PaxosConfigConsumer() = default;
Future<Void> PaxosConfigConsumer::consume(ConfigBroadcaster& broadcaster) {
// TODO: Implement
ASSERT(false);
return Void();
}
UID PaxosConfigConsumer::getID() const {
// TODO: Implement
ASSERT(false);
return {};
}

View File

@ -0,0 +1,40 @@
/*
* PaxosConfigConsumer.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbserver/IConfigConsumer.h"
/*
* A fault-tolerant configuration database consumer implementation
*/
class PaxosConfigConsumer : public IConfigConsumer {
std::unique_ptr<class PaxosConfigConsumerImpl> _impl;
PaxosConfigConsumerImpl const& impl() const { return *_impl; }
PaxosConfigConsumerImpl& impl() { return *_impl; }
public:
PaxosConfigConsumer(ServerCoordinators const& cfi,
Optional<double> pollingInterval,
Optional<double> compactionInterval);
~PaxosConfigConsumer();
Future<Void> consume(ConfigBroadcaster& broadcaster) override;
UID getID() const override;
};

View File

@ -0,0 +1,42 @@
/*
* PaxosConfigDatabaseNode.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbserver/PaxosConfigDatabaseNode.h"
class PaxosConfigDatabaseNodeImpl {};
PaxosConfigDatabaseNode::PaxosConfigDatabaseNode(std::string const& folder) {
// TODO: Implement
ASSERT(false);
}
PaxosConfigDatabaseNode::~PaxosConfigDatabaseNode() = default;
Future<Void> PaxosConfigDatabaseNode::serve(ConfigTransactionInterface const& cti) {
// TODO: Implement
ASSERT(false);
return Void();
}
Future<Void> PaxosConfigDatabaseNode::serve(ConfigFollowerInterface const& cfi) {
// TODO: Implement
ASSERT(false);
return Void();
}

View File

@ -0,0 +1,36 @@
/*
* PaxosConfigDatabaseNode.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbserver/IConfigDatabaseNode.h"
/*
* Fault-tolerant configuration database node implementation
*/
class PaxosConfigDatabaseNode : public IConfigDatabaseNode {
std::unique_ptr<class PaxosConfigDatabaseNodeImpl> impl;
public:
PaxosConfigDatabaseNode(std::string const& folder);
~PaxosConfigDatabaseNode();
Future<Void> serve(ConfigTransactionInterface const&) override;
Future<Void> serve(ConfigFollowerInterface const&) override;
};

View File

@ -26,6 +26,7 @@
#define FDBSERVER_SERVERDBINFO_H
#pragma once
#include "fdbserver/ConfigBroadcastFollowerInterface.h"
#include "fdbserver/DataDistributorInterface.h"
#include "fdbserver/MasterInterface.h"
#include "fdbserver/LogSystemConfig.h"
@ -62,6 +63,7 @@ struct ServerDBInfo {
// which need to stay alive in case this recovery fails
Optional<LatencyBandConfig> latencyBandConfig;
int64_t infoGeneration;
ConfigBroadcastFollowerInterface configBroadcaster;
ServerDBInfo()
: recoveryCount(0), recoveryState(RecoveryState::UNINITIALIZED), logSystemConfig(0), infoGeneration(0) {}
@ -85,7 +87,8 @@ struct ServerDBInfo {
logSystemConfig,
priorCommittedLogServers,
latencyBandConfig,
infoGeneration);
infoGeneration,
configBroadcaster);
}
};

View File

@ -0,0 +1,150 @@
/*
* SimpleConfigConsumer.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include "fdbserver/ConfigBroadcastFollowerInterface.h"
#include "fdbserver/SimpleConfigConsumer.h"
class SimpleConfigConsumerImpl {
ConfigFollowerInterface cfi;
Version lastSeenVersion{ 0 };
double pollingInterval;
Optional<double> compactionInterval;
UID id;
CounterCollection cc;
Counter compactRequest;
Counter successfulChangeRequest;
Counter failedChangeRequest;
Counter snapshotRequest;
Future<Void> logger;
ACTOR static Future<Void> compactor(SimpleConfigConsumerImpl* self, ConfigBroadcaster* broadcaster) {
if (!self->compactionInterval.present()) {
wait(Never());
return Void();
}
loop {
state Version compactionVersion = self->lastSeenVersion;
wait(delayJittered(self->compactionInterval.get()));
wait(self->cfi.compact.getReply(ConfigFollowerCompactRequest{ compactionVersion }));
++self->compactRequest;
broadcaster->compact(compactionVersion);
}
}
ACTOR static Future<Void> fetchChanges(SimpleConfigConsumerImpl* self, ConfigBroadcaster* broadcaster) {
wait(getSnapshotAndChanges(self, broadcaster));
loop {
try {
ConfigFollowerGetChangesReply reply =
wait(self->cfi.getChanges.getReply(ConfigFollowerGetChangesRequest{ self->lastSeenVersion }));
++self->successfulChangeRequest;
for (const auto& versionedMutation : reply.changes) {
TraceEvent te(SevDebug, "ConsumerFetchedMutation", self->id);
te.detail("Version", versionedMutation.version)
.detail("ConfigClass", versionedMutation.mutation.getConfigClass())
.detail("KnobName", versionedMutation.mutation.getKnobName());
if (versionedMutation.mutation.isSet()) {
te.detail("Op", "Set").detail("KnobValue", versionedMutation.mutation.getValue().toString());
} else {
te.detail("Op", "Clear");
}
}
ASSERT_GE(reply.mostRecentVersion, self->lastSeenVersion);
if (reply.mostRecentVersion > self->lastSeenVersion) {
self->lastSeenVersion = reply.mostRecentVersion;
broadcaster->applyChanges(reply.changes, reply.mostRecentVersion, reply.annotations);
}
wait(delayJittered(self->pollingInterval));
} catch (Error& e) {
++self->failedChangeRequest;
if (e.code() == error_code_version_already_compacted) {
TEST(true); // SimpleConfigConsumer get version_already_compacted error
wait(getSnapshotAndChanges(self, broadcaster));
} else {
throw e;
}
}
}
}
ACTOR static Future<Void> getSnapshotAndChanges(SimpleConfigConsumerImpl* self, ConfigBroadcaster* broadcaster) {
ConfigFollowerGetSnapshotAndChangesReply reply =
wait(self->cfi.getSnapshotAndChanges.getReply(ConfigFollowerGetSnapshotAndChangesRequest{}));
++self->snapshotRequest;
TraceEvent(SevDebug, "ConfigConsumerGotSnapshotAndChanges", self->id)
.detail("SnapshotVersion", reply.snapshotVersion)
.detail("SnapshotSize", reply.snapshot.size())
.detail("ChangesVersion", reply.changesVersion)
.detail("ChangesSize", reply.changes.size())
.detail("AnnotationsSize", reply.annotations.size());
broadcaster->applySnapshotAndChanges(
std::move(reply.snapshot), reply.snapshotVersion, reply.changes, reply.changesVersion, reply.annotations);
ASSERT_GE(reply.changesVersion, self->lastSeenVersion);
self->lastSeenVersion = reply.changesVersion;
return Void();
}
static ConfigFollowerInterface getConfigFollowerInterface(ConfigFollowerInterface const& cfi) { return cfi; }
static ConfigFollowerInterface getConfigFollowerInterface(ServerCoordinators const& coordinators) {
return ConfigFollowerInterface(coordinators.configServers[0]);
}
public:
template <class ConfigSource>
SimpleConfigConsumerImpl(ConfigSource const& configSource,
double const& pollingInterval,
Optional<double> const& compactionInterval)
: pollingInterval(pollingInterval), compactionInterval(compactionInterval),
id(deterministicRandom()->randomUniqueID()), cc("ConfigConsumer"), compactRequest("CompactRequest", cc),
successfulChangeRequest("SuccessfulChangeRequest", cc), failedChangeRequest("FailedChangeRequest", cc),
snapshotRequest("SnapshotRequest", cc) {
cfi = getConfigFollowerInterface(configSource);
logger = traceCounters(
"ConfigConsumerMetrics", id, SERVER_KNOBS->WORKER_LOGGING_INTERVAL, &cc, "ConfigConsumerMetrics");
}
Future<Void> consume(ConfigBroadcaster& broadcaster) {
return fetchChanges(this, &broadcaster) || compactor(this, &broadcaster);
}
UID getID() const { return id; }
};
SimpleConfigConsumer::SimpleConfigConsumer(ConfigFollowerInterface const& cfi,
double pollingInterval,
Optional<double> compactionInterval)
: _impl(std::make_unique<SimpleConfigConsumerImpl>(cfi, pollingInterval, compactionInterval)) {}
SimpleConfigConsumer::SimpleConfigConsumer(ServerCoordinators const& coordinators,
double pollingInterval,
Optional<double> compactionInterval)
: _impl(std::make_unique<SimpleConfigConsumerImpl>(coordinators, pollingInterval, compactionInterval)) {}
Future<Void> SimpleConfigConsumer::consume(ConfigBroadcaster& broadcaster) {
return impl().consume(broadcaster);
}
SimpleConfigConsumer::~SimpleConfigConsumer() = default;
UID SimpleConfigConsumer::getID() const {
return impl().getID();
}

View File

@ -0,0 +1,49 @@
/*
* SimpleConfigConsumer.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbserver/IConfigConsumer.h"
#include "fdbserver/LocalConfiguration.h"
#include <memory>
/*
* A test-only configuration database consumer implementation that interacts with a single-node
* configuration database. It assumed that a single coordinator (the lowest coordinator by IP address)
* stores all data, so there is no fault tolerance.
*/
class SimpleConfigConsumer : public IConfigConsumer {
std::unique_ptr<class SimpleConfigConsumerImpl> _impl;
SimpleConfigConsumerImpl const& impl() const { return *_impl; }
SimpleConfigConsumerImpl& impl() { return *_impl; }
public:
SimpleConfigConsumer(ServerCoordinators const& coordinators,
double pollingInterval,
Optional<double> compactionInterval);
~SimpleConfigConsumer();
Future<Void> consume(ConfigBroadcaster& broadcaster) override;
UID getID() const override;
public: // Testing
SimpleConfigConsumer(ConfigFollowerInterface const& cfi,
double pollingInterval,
Optional<double> compactionInterval);
};

View File

@ -0,0 +1,495 @@
/*
* SimpleConfigDatabaseNode.actor.cpp
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <map>
#include "fdbclient/SystemData.h"
#include "fdbserver/SimpleConfigDatabaseNode.h"
#include "fdbserver/IKeyValueStore.h"
#include "fdbserver/OnDemandStore.h"
#include "flow/Arena.h"
#include "flow/genericactors.actor.h"
#include "flow/UnitTest.h"
#include "flow/actorcompiler.h" // This must be the last #include.
namespace {
const KeyRef lastCompactedVersionKey = "lastCompactedVersion"_sr;
const KeyRef liveTransactionVersionKey = "liveTransactionVersion"_sr;
const KeyRef committedVersionKey = "committedVersion"_sr;
const KeyRangeRef kvKeys = KeyRangeRef("kv/"_sr, "kv0"_sr);
const KeyRangeRef mutationKeys = KeyRangeRef("mutation/"_sr, "mutation0"_sr);
const KeyRangeRef annotationKeys = KeyRangeRef("annotation/"_sr, "annotation0"_sr);
Key versionedAnnotationKey(Version version) {
ASSERT_GE(version, 0);
return BinaryWriter::toValue(bigEndian64(version), IncludeVersion()).withPrefix(annotationKeys.begin);
}
Version getVersionFromVersionedAnnotationKey(KeyRef versionedAnnotationKey) {
return fromBigEndian64(BinaryReader::fromStringRef<uint64_t>(
versionedAnnotationKey.removePrefix(annotationKeys.begin), IncludeVersion()));
}
Key versionedMutationKey(Version version, uint32_t index) {
ASSERT_GE(version, 0);
BinaryWriter bw(IncludeVersion());
bw << bigEndian64(version);
bw << bigEndian32(index);
return bw.toValue().withPrefix(mutationKeys.begin);
}
Version getVersionFromVersionedMutationKey(KeyRef versionedMutationKey) {
uint64_t bigEndianResult;
ASSERT(versionedMutationKey.startsWith(mutationKeys.begin));
BinaryReader br(versionedMutationKey.removePrefix(mutationKeys.begin), IncludeVersion());
br >> bigEndianResult;
return fromBigEndian64(bigEndianResult);
}
} //namespace
TEST_CASE("/fdbserver/ConfigDB/SimpleConfigDatabaseNode/Internal/versionedMutationKeys") {
std::vector<Key> keys;
for (Version version = 0; version < 1000; ++version) {
for (int index = 0; index < 5; ++index) {
keys.push_back(versionedMutationKey(version, index));
}
}
for (int i = 0; i < 5000; ++i) {
ASSERT(getVersionFromVersionedMutationKey(keys[i]) == i / 5);
}
return Void();
}
TEST_CASE("/fdbserver/ConfigDB/SimpleConfigDatabaseNode/Internal/versionedMutationKeyOrdering") {
Standalone<VectorRef<KeyRef>> keys;
for (Version version = 0; version < 1000; ++version) {
for (auto index = 0; index < 5; ++index) {
keys.push_back_deep(keys.arena(), versionedMutationKey(version, index));
}
}
for (auto index = 0; index < 1000; ++index) {
keys.push_back_deep(keys.arena(), versionedMutationKey(1000, index));
}
ASSERT(std::is_sorted(keys.begin(), keys.end()));
return Void();
}
class SimpleConfigDatabaseNodeImpl {
UID id;
OnDemandStore kvStore;
CounterCollection cc;
// Follower counters
Counter compactRequests;
Counter successfulChangeRequests;
Counter failedChangeRequests;
Counter snapshotRequests;
// Transaction counters
Counter successfulCommits;
Counter failedCommits;
Counter setMutations;
Counter clearMutations;
Counter getValueRequests;
Counter newVersionRequests;
Future<Void> logger;
ACTOR static Future<Version> getLiveTransactionVersion(SimpleConfigDatabaseNodeImpl *self) {
Optional<Value> value = wait(self->kvStore->readValue(liveTransactionVersionKey));
state Version liveTransactionVersion = 0;
if (value.present()) {
liveTransactionVersion = BinaryReader::fromStringRef<Version>(value.get(), IncludeVersion());
} else {
self->kvStore->set(KeyValueRef(liveTransactionVersionKey, BinaryWriter::toValue(liveTransactionVersion, IncludeVersion())));
wait(self->kvStore->commit());
}
return liveTransactionVersion;
}
ACTOR static Future<Version> getCommittedVersion(SimpleConfigDatabaseNodeImpl *self) {
Optional<Value> value = wait(self->kvStore->readValue(committedVersionKey));
state Version committedVersion = 0;
if (value.present()) {
committedVersion = BinaryReader::fromStringRef<Version>(value.get(), IncludeVersion());
} else {
self->kvStore->set(KeyValueRef(committedVersionKey, BinaryWriter::toValue(committedVersion, IncludeVersion())));
wait(self->kvStore->commit());
}
return committedVersion;
}
ACTOR static Future<Version> getLastCompactedVersion(SimpleConfigDatabaseNodeImpl* self) {
Optional<Value> value = wait(self->kvStore->readValue(lastCompactedVersionKey));
state Version lastCompactedVersion = 0;
if (value.present()) {
lastCompactedVersion = BinaryReader::fromStringRef<Version>(value.get(), IncludeVersion());
} else {
self->kvStore->set(
KeyValueRef(lastCompactedVersionKey, BinaryWriter::toValue(lastCompactedVersion, IncludeVersion())));
wait(self->kvStore->commit());
}
return lastCompactedVersion;
}
// Returns all commit annotations between for commits with version in [startVersion, endVersion]
ACTOR static Future<Standalone<VectorRef<VersionedConfigCommitAnnotationRef>>>
getAnnotations(SimpleConfigDatabaseNodeImpl* self, Version startVersion, Version endVersion) {
Key startKey = versionedAnnotationKey(startVersion);
Key endKey = versionedAnnotationKey(endVersion + 1);
state KeyRangeRef keys(startKey, endKey);
Standalone<RangeResultRef> range = wait(self->kvStore->readRange(keys));
Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> result;
for (const auto& kv : range) {
auto version = getVersionFromVersionedAnnotationKey(kv.key);
ASSERT_LE(version, endVersion);
auto annotation = BinaryReader::fromStringRef<ConfigCommitAnnotation>(kv.value, IncludeVersion());
result.emplace_back_deep(result.arena(), version, annotation);
}
return result;
}
// Returns all mutations with version in [startVersion, endVersion]
ACTOR static Future<Standalone<VectorRef<VersionedConfigMutationRef>>>
getMutations(SimpleConfigDatabaseNodeImpl* self, Version startVersion, Version endVersion) {
Key startKey = versionedMutationKey(startVersion, 0);
Key endKey = versionedMutationKey(endVersion + 1, 0);
state KeyRangeRef keys(startKey, endKey);
Standalone<RangeResultRef> range = wait(self->kvStore->readRange(keys));
Standalone<VectorRef<VersionedConfigMutationRef>> result;
for (const auto &kv : range) {
auto version = getVersionFromVersionedMutationKey(kv.key);
ASSERT_LE(version, endVersion);
auto mutation = ObjectReader::fromStringRef<ConfigMutation>(kv.value, IncludeVersion());
result.emplace_back_deep(result.arena(), version, mutation);
}
return result;
}
ACTOR static Future<Void> getChanges(SimpleConfigDatabaseNodeImpl *self, ConfigFollowerGetChangesRequest req) {
Version lastCompactedVersion = wait(getLastCompactedVersion(self));
if (req.lastSeenVersion < lastCompactedVersion) {
++self->failedChangeRequests;
req.reply.sendError(version_already_compacted());
return Void();
}
state Version committedVersion = wait(getCommittedVersion(self));
state Standalone<VectorRef<VersionedConfigMutationRef>> versionedMutations =
wait(getMutations(self, req.lastSeenVersion + 1, committedVersion));
state Standalone<VectorRef<VersionedConfigCommitAnnotationRef>> versionedAnnotations =
wait(getAnnotations(self, req.lastSeenVersion + 1, committedVersion));
TraceEvent(SevDebug, "ConfigDatabaseNodeSendingChanges")
.detail("ReqLastSeenVersion", req.lastSeenVersion)
.detail("CommittedVersion", committedVersion)
.detail("NumMutations", versionedMutations.size())
.detail("NumCommits", versionedAnnotations.size());
++self->successfulChangeRequests;
req.reply.send(ConfigFollowerGetChangesReply{ committedVersion, versionedMutations, versionedAnnotations });
return Void();
}
// New transactions increment the database's current live version. This effectively serves as a lock, providing
// serializability
ACTOR static Future<Void> getNewVersion(SimpleConfigDatabaseNodeImpl* self, ConfigTransactionGetVersionRequest req) {
state Version currentVersion = wait(getLiveTransactionVersion(self));
self->kvStore->set(KeyValueRef(liveTransactionVersionKey, BinaryWriter::toValue(++currentVersion, IncludeVersion())));
wait(self->kvStore->commit());
req.reply.send(ConfigTransactionGetVersionReply(currentVersion));
return Void();
}
ACTOR static Future<Void> get(SimpleConfigDatabaseNodeImpl* self, ConfigTransactionGetRequest req) {
Version currentVersion = wait(getLiveTransactionVersion(self));
if (req.version != currentVersion) {
req.reply.sendError(transaction_too_old());
return Void();
}
state Optional<Value> serializedValue =
wait(self->kvStore->readValue(BinaryWriter::toValue(req.key, IncludeVersion()).withPrefix(kvKeys.begin)));
state Optional<KnobValue> value;
if (serializedValue.present()) {
value = ObjectReader::fromStringRef<KnobValue>(serializedValue.get(), IncludeVersion());
}
Standalone<VectorRef<VersionedConfigMutationRef>> versionedMutations = wait(getMutations(self, 0, req.version));
for (const auto &versionedMutation : versionedMutations) {
const auto &mutation = versionedMutation.mutation;
if (mutation.getKey() == req.key) {
if (mutation.isSet()) {
value = mutation.getValue();
} else {
value = {};
}
}
}
req.reply.send(ConfigTransactionGetReply{ value });
return Void();
}
// Retrieve all configuration classes that contain explicitly defined knobs
// TODO: Currently it is possible that extra configuration classes may be returned, we
// may want to fix this to clean up the contract
ACTOR static Future<Void> getConfigClasses(SimpleConfigDatabaseNodeImpl* self,
ConfigTransactionGetConfigClassesRequest req) {
Version currentVersion = wait(getLiveTransactionVersion(self));
if (req.version != currentVersion) {
req.reply.sendError(transaction_too_old());
return Void();
}
state Standalone<RangeResultRef> snapshot = wait(self->kvStore->readRange(kvKeys));
state std::set<Key> configClassesSet;
for (const auto& kv : snapshot) {
auto configKey =
BinaryReader::fromStringRef<ConfigKey>(kv.key.removePrefix(kvKeys.begin), IncludeVersion());
if (configKey.configClass.present()) {
configClassesSet.insert(configKey.configClass.get());
}
}
state Version lastCompactedVersion = wait(getLastCompactedVersion(self));
state Standalone<VectorRef<VersionedConfigMutationRef>> mutations =
wait(getMutations(self, lastCompactedVersion + 1, req.version));
for (const auto& versionedMutation : mutations) {
auto configClass = versionedMutation.mutation.getConfigClass();
if (configClass.present()) {
configClassesSet.insert(configClass.get());
}
}
Standalone<VectorRef<KeyRef>> configClasses;
for (const auto& configClass : configClassesSet) {
configClasses.push_back_deep(configClasses.arena(), configClass);
}
req.reply.send(ConfigTransactionGetConfigClassesReply{ configClasses });
return Void();
}
// Retrieve all knobs explicitly defined for the specified configuration class
ACTOR static Future<Void> getKnobs(SimpleConfigDatabaseNodeImpl* self, ConfigTransactionGetKnobsRequest req) {
Version currentVersion = wait(getLiveTransactionVersion(self));
if (req.version != currentVersion) {
req.reply.sendError(transaction_too_old());
return Void();
}
// FIXME: Filtering after reading from disk is very inefficient
state Standalone<RangeResultRef> snapshot = wait(self->kvStore->readRange(kvKeys));
state std::set<Key> knobSet;
for (const auto& kv : snapshot) {
auto configKey =
BinaryReader::fromStringRef<ConfigKey>(kv.key.removePrefix(kvKeys.begin), IncludeVersion());
if (configKey.configClass.template castTo<Key>() == req.configClass) {
knobSet.insert(configKey.knobName);
}
}
state Version lastCompactedVersion = wait(getLastCompactedVersion(self));
state Standalone<VectorRef<VersionedConfigMutationRef>> mutations =
wait(getMutations(self, lastCompactedVersion + 1, req.version));
for (const auto& versionedMutation : mutations) {
if (versionedMutation.mutation.getConfigClass().template castTo<Key>() == req.configClass) {
if (versionedMutation.mutation.isSet()) {
knobSet.insert(versionedMutation.mutation.getKnobName());
} else {
knobSet.erase(versionedMutation.mutation.getKnobName());
}
}
}
Standalone<VectorRef<KeyRef>> knobNames;
for (const auto& knobName : knobSet) {
knobNames.push_back_deep(knobNames.arena(), knobName);
}
req.reply.send(ConfigTransactionGetKnobsReply{ knobNames });
return Void();
}
ACTOR static Future<Void> commit(SimpleConfigDatabaseNodeImpl* self, ConfigTransactionCommitRequest req) {
Version currentVersion = wait(getLiveTransactionVersion(self));
if (req.version != currentVersion) {
++self->failedCommits;
req.reply.sendError(transaction_too_old());
return Void();
}
int index = 0;
for (const auto &mutation : req.mutations) {
Key key = versionedMutationKey(req.version, index++);
Value value = ObjectWriter::toValue(mutation, IncludeVersion());
if (mutation.isSet()) {
TraceEvent("SimpleConfigDatabaseNodeSetting")
.detail("ConfigClass", mutation.getConfigClass())
.detail("KnobName", mutation.getKnobName())
.detail("Value", mutation.getValue().toString())
.detail("Version", req.version);
++self->setMutations;
} else {
++self->clearMutations;
}
self->kvStore->set(KeyValueRef(key, value));
}
self->kvStore->set(
KeyValueRef(versionedAnnotationKey(req.version), BinaryWriter::toValue(req.annotation, IncludeVersion())));
self->kvStore->set(KeyValueRef(committedVersionKey, BinaryWriter::toValue(req.version, IncludeVersion())));
wait(self->kvStore->commit());
++self->successfulCommits;
req.reply.send(Void());
return Void();
}
ACTOR static Future<Void> serve(SimpleConfigDatabaseNodeImpl* self, ConfigTransactionInterface const* cti) {
loop {
choose {
when(ConfigTransactionGetVersionRequest req = waitNext(cti->getVersion.getFuture())) {
++self->newVersionRequests;
wait(getNewVersion(self, req));
}
when(ConfigTransactionGetRequest req = waitNext(cti->get.getFuture())) {
++self->getValueRequests;
wait(get(self, req));
}
when(ConfigTransactionCommitRequest req = waitNext(cti->commit.getFuture())) {
wait(commit(self, req));
}
when(ConfigTransactionGetConfigClassesRequest req = waitNext(cti->getClasses.getFuture())) {
wait(getConfigClasses(self, req));
}
when(ConfigTransactionGetKnobsRequest req = waitNext(cti->getKnobs.getFuture())) {
wait(getKnobs(self, req));
}
when(wait(self->kvStore->getError())) { ASSERT(false); }
}
}
}
ACTOR static Future<Void> getSnapshotAndChanges(SimpleConfigDatabaseNodeImpl* self,
ConfigFollowerGetSnapshotAndChangesRequest req) {
state ConfigFollowerGetSnapshotAndChangesReply reply;
Standalone<RangeResultRef> data = wait(self->kvStore->readRange(kvKeys));
for (const auto& kv : data) {
reply
.snapshot[BinaryReader::fromStringRef<ConfigKey>(kv.key.removePrefix(kvKeys.begin), IncludeVersion())] =
ObjectReader::fromStringRef<KnobValue>(kv.value, IncludeVersion());
}
wait(store(reply.snapshotVersion, getLastCompactedVersion(self)));
wait(store(reply.changesVersion, getCommittedVersion(self)));
wait(store(reply.changes, getMutations(self, reply.snapshotVersion + 1, reply.changesVersion)));
wait(store(reply.annotations, getAnnotations(self, reply.snapshotVersion + 1, reply.changesVersion)));
TraceEvent(SevDebug, "ConfigDatabaseNodeGettingSnapshot", self->id)
.detail("SnapshotVersion", reply.snapshotVersion)
.detail("ChangesVersion", reply.changesVersion)
.detail("SnapshotSize", reply.snapshot.size())
.detail("ChangesSize", reply.changes.size())
.detail("AnnotationsSize", reply.annotations.size());
req.reply.send(reply);
return Void();
}
// Apply mutations from the WAL in mutationKeys into the kvKeys key space.
// Periodic compaction prevents the database from growing too large, and improve read performance.
// However, commit annotations for compacted mutations are lost
ACTOR static Future<Void> compact(SimpleConfigDatabaseNodeImpl* self, ConfigFollowerCompactRequest req) {
state Version lastCompactedVersion = wait(getLastCompactedVersion(self));
TraceEvent(SevDebug, "ConfigDatabaseNodeCompacting", self->id)
.detail("Version", req.version)
.detail("LastCompacted", lastCompactedVersion);
if (req.version <= lastCompactedVersion) {
req.reply.send(Void());
return Void();
}
Standalone<VectorRef<VersionedConfigMutationRef>> versionedMutations =
wait(getMutations(self, lastCompactedVersion + 1, req.version));
self->kvStore->clear(
KeyRangeRef(versionedMutationKey(lastCompactedVersion + 1, 0), versionedMutationKey(req.version + 1, 0)));
self->kvStore->clear(
KeyRangeRef(versionedAnnotationKey(lastCompactedVersion + 1), versionedAnnotationKey(req.version + 1)));
for (const auto& versionedMutation : versionedMutations) {
const auto& version = versionedMutation.version;
const auto& mutation = versionedMutation.mutation;
if (version > req.version) {
break;
} else {
TraceEvent(SevDebug, "ConfigDatabaseNodeCompactionApplyingMutation", self->id)
.detail("IsSet", mutation.isSet())
.detail("MutationVersion", version)
.detail("LastCompactedVersion", lastCompactedVersion)
.detail("ReqVersion", req.version);
auto serializedKey = BinaryWriter::toValue(mutation.getKey(), IncludeVersion());
if (mutation.isSet()) {
self->kvStore->set(KeyValueRef(serializedKey.withPrefix(kvKeys.begin),
ObjectWriter::toValue(mutation.getValue(), IncludeVersion())));
} else {
self->kvStore->clear(singleKeyRange(serializedKey.withPrefix(kvKeys.begin)));
}
lastCompactedVersion = version;
}
}
self->kvStore->set(
KeyValueRef(lastCompactedVersionKey, BinaryWriter::toValue(lastCompactedVersion, IncludeVersion())));
wait(self->kvStore->commit());
req.reply.send(Void());
return Void();
}
ACTOR static Future<Void> serve(SimpleConfigDatabaseNodeImpl* self, ConfigFollowerInterface const* cfi) {
loop {
choose {
when(ConfigFollowerGetSnapshotAndChangesRequest req =
waitNext(cfi->getSnapshotAndChanges.getFuture())) {
++self->snapshotRequests;
wait(getSnapshotAndChanges(self, req));
}
when(ConfigFollowerGetChangesRequest req = waitNext(cfi->getChanges.getFuture())) {
wait(getChanges(self, req));
}
when(ConfigFollowerCompactRequest req = waitNext(cfi->compact.getFuture())) {
++self->compactRequests;
wait(compact(self, req));
}
when(wait(self->kvStore->getError())) { ASSERT(false); }
}
}
}
public:
SimpleConfigDatabaseNodeImpl(std::string const& folder)
: id(deterministicRandom()->randomUniqueID()), kvStore(folder, id, "globalconf-"), cc("ConfigDatabaseNode"),
compactRequests("CompactRequests", cc), successfulChangeRequests("SuccessfulChangeRequests", cc),
failedChangeRequests("FailedChangeRequests", cc), snapshotRequests("SnapshotRequests", cc),
successfulCommits("SuccessfulCommits", cc), failedCommits("FailedCommits", cc),
setMutations("SetMutations", cc), clearMutations("ClearMutations", cc),
getValueRequests("GetValueRequests", cc), newVersionRequests("NewVersionRequests", cc) {
logger = traceCounters(
"ConfigDatabaseNodeMetrics", id, SERVER_KNOBS->WORKER_LOGGING_INTERVAL, &cc, "ConfigDatabaseNode");
TraceEvent(SevDebug, "StartingSimpleConfigDatabaseNode", id).detail("KVStoreAlreadyExists", kvStore.exists());
}
Future<Void> serve(ConfigTransactionInterface const& cti) { return serve(this, &cti); }
Future<Void> serve(ConfigFollowerInterface const& cfi) { return serve(this, &cfi); }
};
SimpleConfigDatabaseNode::SimpleConfigDatabaseNode(std::string const& folder)
: _impl(std::make_unique<SimpleConfigDatabaseNodeImpl>(folder)) {}
SimpleConfigDatabaseNode::~SimpleConfigDatabaseNode() = default;
Future<Void> SimpleConfigDatabaseNode::serve(ConfigTransactionInterface const& cti) {
return impl().serve(cti);
}
Future<Void> SimpleConfigDatabaseNode::serve(ConfigFollowerInterface const& cfi) {
return impl().serve(cfi);
}

View File

@ -0,0 +1,40 @@
/*
* SimpleConfigDatabaseNode.h
*
* This source file is part of the FoundationDB open source project
*
* Copyright 2013-2018 Apple Inc. and the FoundationDB project authors
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "fdbserver/IConfigDatabaseNode.h"
/*
* A test-only configuration database node implementation that assumes all data is stored on a single coordinator.
* As such, there is no need to handle rolling forward or rolling back mutations, because this one node is considered
* the source of truth.
*/
class SimpleConfigDatabaseNode : public IConfigDatabaseNode {
std::unique_ptr<class SimpleConfigDatabaseNodeImpl> _impl;
SimpleConfigDatabaseNodeImpl const& impl() const { return *_impl; }
SimpleConfigDatabaseNodeImpl& impl() { return *_impl; }
public:
SimpleConfigDatabaseNode(std::string const& folder);
~SimpleConfigDatabaseNode();
Future<Void> serve(ConfigTransactionInterface const&) override;
Future<Void> serve(ConfigFollowerInterface const&) override;
};

View File

@ -503,7 +503,10 @@ ACTOR Future<ISimulator::KillType> simulatedFDBDRebooter(Reference<ClusterConnec
"",
"",
-1,
whitelistBinPaths));
whitelistBinPaths,
"",
{},
UseConfigDB::DISABLED));
}
if (runBackupAgents != AgentNone) {
futures.push_back(runBackup(connFile));
@ -1125,6 +1128,7 @@ ACTOR Future<Void> restartSimulatedSystem(vector<Future<Void>>* systemActors,
struct SimulationConfig {
explicit SimulationConfig(const TestConfig& testConfig);
int extraDB;
bool generateFearless;
DatabaseConfiguration db;
@ -1132,11 +1136,23 @@ struct SimulationConfig {
// Simulation layout
int datacenters;
int replication_type;
int machine_count; // Total, not per DC.
int processes_per_machine;
int coordinators;
private:
void setRandomConfig();
void setSimpleConfig();
void setSpecificConfig(const TestConfig& testConfig);
void setDatacenters(const TestConfig& testConfig);
void setStorageEngine(const TestConfig& testConfig);
void setRegions(const TestConfig& testConfig);
void setReplicationType(const TestConfig& testConfig);
void setMachineCount(const TestConfig& testConfig);
void setCoordinators(const TestConfig& testConfig);
void setProcessesPerMachine(const TestConfig& testConfig);
void setTss(const TestConfig& testConfig);
void generateNormalConfig(const TestConfig& testConfig);
};
@ -1156,16 +1172,68 @@ void SimulationConfig::set_config(std::string config) {
StringRef StringRefOf(const char* s) {
return StringRef((uint8_t*)s, strlen(s));
}
// Generates and sets an appropriate configuration for the database according to
// the provided testConfig. Some attributes are randomly generated for more coverage
// of different combinations
void SimulationConfig::generateNormalConfig(const TestConfig& testConfig) {
set_config("new");
// generateMachineTeamTestConfig set up the number of servers per machine and the number of machines such that
// if we do not remove the surplus server and machine teams, the simulation test will report error.
// This is needed to make sure the number of server (and machine) teams is no larger than the desired number.
bool generateMachineTeamTestConfig = BUGGIFY_WITH_PROB(0.1) ? true : false;
bool generateFearless =
// Set the randomly generated options of the config. Compiled here to easily observe and trace random options
void SimulationConfig::setRandomConfig() {
if (deterministicRandom()->random01() < 0.25) {
db.desiredTLogCount = deterministicRandom()->randomInt(1, 7);
}
if (deterministicRandom()->random01() < 0.25) {
db.commitProxyCount = deterministicRandom()->randomInt(1, 7);
}
if (deterministicRandom()->random01() < 0.25) {
db.grvProxyCount = deterministicRandom()->randomInt(1, 4);
}
if (deterministicRandom()->random01() < 0.25) {
db.resolverCount = deterministicRandom()->randomInt(1, 7);
}
// TraceEvent("SimulatedConfigRandom")
// .detail("DesiredTLogCount", db.desiredTLogCount)
// .detail("CommitProxyCount", db.commitProxyCount)
// .detail("GRVProxyCount", db.grvProxyCount)
// .detail("ResolverCount", db.resolverCount);
if (deterministicRandom()->random01() < 0.5) {
// TraceEvent("SimulatedConfigRandom").detail("PerpetualWiggle", 0);
set_config("perpetual_storage_wiggle=0");
} else {
// TraceEvent("SimulatedConfigRandom").detail("PerpetualWiggle", 1);
set_config("perpetual_storage_wiggle=1");
}
if (deterministicRandom()->random01() < 0.5) {
set_config("backup_worker_enabled:=1");
}
}
// Overwrite DB with simple options, used when simpleConfig is true in the TestConfig
void SimulationConfig::setSimpleConfig() {
db.desiredTLogCount = 1;
db.commitProxyCount = 1;
db.grvProxyCount = 1;
db.resolverCount = 1;
}
// Overwrite previous options with ones specified by TestConfig
void SimulationConfig::setSpecificConfig(const TestConfig& testConfig) {
if (testConfig.desiredTLogCount.present()) {
db.desiredTLogCount = testConfig.desiredTLogCount.get();
}
if (testConfig.commitProxyCount.present()) {
db.commitProxyCount = testConfig.commitProxyCount.get();
}
if (testConfig.grvProxyCount.present()) {
db.grvProxyCount = testConfig.grvProxyCount.get();
}
if (testConfig.resolverCount.present()) {
db.resolverCount = testConfig.resolverCount.get();
}
}
// Sets generateFearless and number of dataCenters based on testConfig details
// The number of datacenters may be overwritten in setRegions
void SimulationConfig::setDatacenters(const TestConfig& testConfig) {
generateFearless =
testConfig.simpleConfig ? false : (testConfig.minimumRegions > 1 || deterministicRandom()->random01() < 0.5);
if (testConfig.generateFearless.present()) {
// overwrite whatever decision we made before
@ -1176,32 +1244,15 @@ void SimulationConfig::generateNormalConfig(const TestConfig& testConfig) {
? 1
: (generateFearless ? (testConfig.minimumReplication > 0 || deterministicRandom()->random01() < 0.5 ? 4 : 6)
: deterministicRandom()->randomInt(1, 4));
// Overwrite with specific option if present
if (testConfig.datacenters.present()) {
datacenters = testConfig.datacenters.get();
}
if (testConfig.desiredTLogCount.present()) {
db.desiredTLogCount = testConfig.desiredTLogCount.get();
} else if (deterministicRandom()->random01() < 0.25) {
db.desiredTLogCount = deterministicRandom()->randomInt(1, 7);
}
}
if (testConfig.commitProxyCount.present()) {
db.commitProxyCount = testConfig.commitProxyCount.get();
} else if (deterministicRandom()->random01() < 0.25) {
db.commitProxyCount = deterministicRandom()->randomInt(1, 7);
}
if (testConfig.grvProxyCount.present()) {
db.grvProxyCount = testConfig.grvProxyCount.get();
} else if (deterministicRandom()->random01() < 0.25) {
db.grvProxyCount = deterministicRandom()->randomInt(1, 4);
}
if (testConfig.resolverCount.present()) {
db.resolverCount = testConfig.resolverCount.get();
} else if (deterministicRandom()->random01() < 0.25) {
db.resolverCount = deterministicRandom()->randomInt(1, 7);
}
// Sets storage engine based on testConfig details
void SimulationConfig::setStorageEngine(const TestConfig& testConfig) {
int storage_engine_type = deterministicRandom()->randomInt(0, 4);
if (testConfig.storageEngineType.present()) {
storage_engine_type = testConfig.storageEngineType.get();
@ -1238,38 +1289,15 @@ void SimulationConfig::generateNormalConfig(const TestConfig& testConfig) {
default:
ASSERT(false); // Programmer forgot to adjust cases.
}
}
int tssCount = 0;
if (!testConfig.simpleConfig && !testConfig.disableTss && deterministicRandom()->random01() < 0.25) {
// 1 or 2 tss
tssCount = deterministicRandom()->randomInt(1, 3);
}
// if (deterministicRandom()->random01() < 0.5) {
// set_config("ssd");
// } else {
// set_config("memory");
// }
// set_config("memory");
// set_config("memory-radixtree-beta");
if (deterministicRandom()->random01() < 0.5) {
set_config("perpetual_storage_wiggle=0");
} else {
set_config("perpetual_storage_wiggle=1");
}
// set_config("perpetual_storage_wiggle=1");
if (testConfig.simpleConfig) {
db.desiredTLogCount = 1;
db.commitProxyCount = 1;
db.grvProxyCount = 1;
db.resolverCount = 1;
}
int replication_type = testConfig.simpleConfig
? 1
: (std::max(testConfig.minimumReplication,
datacenters > 4 ? deterministicRandom()->randomInt(1, 3)
: std::min(deterministicRandom()->randomInt(0, 6), 3)));
// Sets replication type and TLogSpillType and Version
void SimulationConfig::setReplicationType(const TestConfig& testConfig) {
replication_type = testConfig.simpleConfig
? 1
: (std::max(testConfig.minimumReplication,
datacenters > 4 ? deterministicRandom()->randomInt(1, 3)
: std::min(deterministicRandom()->randomInt(0, 6), 3)));
if (testConfig.config.present()) {
set_config(testConfig.config.get());
} else {
@ -1330,211 +1358,213 @@ void SimulationConfig::generateNormalConfig(const TestConfig& testConfig) {
if (deterministicRandom()->random01() < 0.5)
set_config(format("log_spill:=%d", TLogSpillType::DEFAULT));
}
if (deterministicRandom()->random01() < 0.5) {
set_config("backup_worker_enabled:=1");
}
}
}
if (generateFearless || (datacenters == 2 && deterministicRandom()->random01() < 0.5)) {
// The kill region workload relies on the fact that all "0", "2", and "4" are all of the possible primary dcids.
StatusObject primaryObj;
StatusObject primaryDcObj;
primaryDcObj["id"] = "0";
primaryDcObj["priority"] = 2;
StatusArray primaryDcArr;
primaryDcArr.push_back(primaryDcObj);
// Set the regions of the config, including the primary and remote options
// This will also determine the replication types used for satellite and remote.
void SimulationConfig::setRegions(const TestConfig& testConfig) {
// The kill region workload relies on the fact that all "0", "2", and "4" are all of the possible primary dcids.
StatusObject primaryObj;
StatusObject primaryDcObj;
primaryDcObj["id"] = "0";
primaryDcObj["priority"] = 2;
StatusArray primaryDcArr;
primaryDcArr.push_back(primaryDcObj);
StatusObject remoteObj;
StatusObject remoteDcObj;
remoteDcObj["id"] = "1";
remoteDcObj["priority"] = 1;
StatusArray remoteDcArr;
remoteDcArr.push_back(remoteDcObj);
StatusObject remoteObj;
StatusObject remoteDcObj;
remoteDcObj["id"] = "1";
remoteDcObj["priority"] = 1;
StatusArray remoteDcArr;
remoteDcArr.push_back(remoteDcObj);
bool needsRemote = generateFearless;
if (generateFearless) {
if (datacenters > 4) {
// FIXME: we cannot use one satellite replication with more than one satellite per region because
// canKillProcesses does not respect usable_dcs
int satellite_replication_type = deterministicRandom()->randomInt(0, 3);
switch (satellite_replication_type) {
case 0: {
TEST(true); // Simulated cluster using no satellite redundancy mode (>4 datacenters)
break;
}
case 1: {
TEST(true); // Simulated cluster using two satellite fast redundancy mode
primaryObj["satellite_redundancy_mode"] = "two_satellite_fast";
remoteObj["satellite_redundancy_mode"] = "two_satellite_fast";
break;
}
case 2: {
TEST(true); // Simulated cluster using two satellite safe redundancy mode
primaryObj["satellite_redundancy_mode"] = "two_satellite_safe";
remoteObj["satellite_redundancy_mode"] = "two_satellite_safe";
break;
}
default:
ASSERT(false); // Programmer forgot to adjust cases.
}
} else {
int satellite_replication_type = deterministicRandom()->randomInt(0, 5);
switch (satellite_replication_type) {
case 0: {
// FIXME: implement
TEST(true); // Simulated cluster using custom satellite redundancy mode
break;
}
case 1: {
TEST(true); // Simulated cluster using no satellite redundancy mode (<4 datacenters)
break;
}
case 2: {
TEST(true); // Simulated cluster using single satellite redundancy mode
primaryObj["satellite_redundancy_mode"] = "one_satellite_single";
remoteObj["satellite_redundancy_mode"] = "one_satellite_single";
break;
}
case 3: {
TEST(true); // Simulated cluster using double satellite redundancy mode
primaryObj["satellite_redundancy_mode"] = "one_satellite_double";
remoteObj["satellite_redundancy_mode"] = "one_satellite_double";
break;
}
case 4: {
TEST(true); // Simulated cluster using triple satellite redundancy mode
primaryObj["satellite_redundancy_mode"] = "one_satellite_triple";
remoteObj["satellite_redundancy_mode"] = "one_satellite_triple";
break;
}
default:
ASSERT(false); // Programmer forgot to adjust cases.
}
}
if (deterministicRandom()->random01() < 0.25)
primaryObj["satellite_logs"] = deterministicRandom()->randomInt(1, 7);
if (deterministicRandom()->random01() < 0.25)
remoteObj["satellite_logs"] = deterministicRandom()->randomInt(1, 7);
// We cannot run with a remote DC when MAX_READ_TRANSACTION_LIFE_VERSIONS is too small, because the log
// routers will not be able to keep up.
if (testConfig.minimumRegions <= 1 &&
(deterministicRandom()->random01() < 0.25 ||
SERVER_KNOBS->MAX_READ_TRANSACTION_LIFE_VERSIONS < SERVER_KNOBS->VERSIONS_PER_SECOND)) {
TEST(true); // Simulated cluster using one region
needsRemote = false;
} else {
TEST(true); // Simulated cluster using two regions
db.usableRegions = 2;
}
int remote_replication_type = deterministicRandom()->randomInt(0, datacenters > 4 ? 4 : 5);
switch (remote_replication_type) {
bool needsRemote = generateFearless;
if (generateFearless) {
if (datacenters > 4) {
// FIXME: we cannot use one satellite replication with more than one satellite per region because
// canKillProcesses does not respect usable_dcs
int satellite_replication_type = deterministicRandom()->randomInt(0, 3);
switch (satellite_replication_type) {
case 0: {
// FIXME: implement
TEST(true); // Simulated cluster using custom remote redundancy mode
TEST(true); // Simulated cluster using no satellite redundancy mode (>4 datacenters)
break;
}
case 1: {
TEST(true); // Simulated cluster using default remote redundancy mode
TEST(true); // Simulated cluster using two satellite fast redundancy mode
primaryObj["satellite_redundancy_mode"] = "two_satellite_fast";
remoteObj["satellite_redundancy_mode"] = "two_satellite_fast";
break;
}
case 2: {
TEST(true); // Simulated cluster using single remote redundancy mode
set_config("remote_single");
break;
}
case 3: {
TEST(true); // Simulated cluster using double remote redundancy mode
set_config("remote_double");
break;
}
case 4: {
TEST(true); // Simulated cluster using triple remote redundancy mode
set_config("remote_triple");
TEST(true); // Simulated cluster using two satellite safe redundancy mode
primaryObj["satellite_redundancy_mode"] = "two_satellite_safe";
remoteObj["satellite_redundancy_mode"] = "two_satellite_safe";
break;
}
default:
ASSERT(false); // Programmer forgot to adjust cases.
}
if (deterministicRandom()->random01() < 0.25)
db.desiredLogRouterCount = deterministicRandom()->randomInt(1, 7);
if (deterministicRandom()->random01() < 0.25)
db.remoteDesiredTLogCount = deterministicRandom()->randomInt(1, 7);
bool useNormalDCsAsSatellites =
datacenters > 4 && testConfig.minimumRegions < 2 && deterministicRandom()->random01() < 0.3;
StatusObject primarySatelliteObj;
primarySatelliteObj["id"] = useNormalDCsAsSatellites ? "1" : "2";
primarySatelliteObj["priority"] = 1;
primarySatelliteObj["satellite"] = 1;
if (deterministicRandom()->random01() < 0.25)
primarySatelliteObj["satellite_logs"] = deterministicRandom()->randomInt(1, 7);
primaryDcArr.push_back(primarySatelliteObj);
StatusObject remoteSatelliteObj;
remoteSatelliteObj["id"] = useNormalDCsAsSatellites ? "0" : "3";
remoteSatelliteObj["priority"] = 1;
remoteSatelliteObj["satellite"] = 1;
if (deterministicRandom()->random01() < 0.25)
remoteSatelliteObj["satellite_logs"] = deterministicRandom()->randomInt(1, 7);
remoteDcArr.push_back(remoteSatelliteObj);
if (datacenters > 4) {
StatusObject primarySatelliteObjB;
primarySatelliteObjB["id"] = useNormalDCsAsSatellites ? "2" : "4";
primarySatelliteObjB["priority"] = 1;
primarySatelliteObjB["satellite"] = 1;
if (deterministicRandom()->random01() < 0.25)
primarySatelliteObjB["satellite_logs"] = deterministicRandom()->randomInt(1, 7);
primaryDcArr.push_back(primarySatelliteObjB);
StatusObject remoteSatelliteObjB;
remoteSatelliteObjB["id"] = useNormalDCsAsSatellites ? "2" : "5";
remoteSatelliteObjB["priority"] = 1;
remoteSatelliteObjB["satellite"] = 1;
if (deterministicRandom()->random01() < 0.25)
remoteSatelliteObjB["satellite_logs"] = deterministicRandom()->randomInt(1, 7);
remoteDcArr.push_back(remoteSatelliteObjB);
}
if (useNormalDCsAsSatellites) {
datacenters = 3;
}
}
primaryObj["datacenters"] = primaryDcArr;
remoteObj["datacenters"] = remoteDcArr;
StatusArray regionArr;
regionArr.push_back(primaryObj);
if (needsRemote || deterministicRandom()->random01() < 0.5) {
regionArr.push_back(remoteObj);
}
if (needsRemote) {
g_simulator.originalRegions = "regions=" + json_spirit::write_string(json_spirit::mValue(regionArr),
json_spirit::Output_options::none);
StatusArray disablePrimary = regionArr;
disablePrimary[0].get_obj()["datacenters"].get_array()[0].get_obj()["priority"] = -1;
g_simulator.disablePrimary = "regions=" + json_spirit::write_string(json_spirit::mValue(disablePrimary),
json_spirit::Output_options::none);
StatusArray disableRemote = regionArr;
disableRemote[1].get_obj()["datacenters"].get_array()[0].get_obj()["priority"] = -1;
g_simulator.disableRemote = "regions=" + json_spirit::write_string(json_spirit::mValue(disableRemote),
json_spirit::Output_options::none);
} else {
// In order to generate a starting configuration with the remote disabled, do not apply the region
// configuration to the DatabaseConfiguration until after creating the starting conf string.
set_config("regions=" +
json_spirit::write_string(json_spirit::mValue(regionArr), json_spirit::Output_options::none));
int satellite_replication_type = deterministicRandom()->randomInt(0, 5);
switch (satellite_replication_type) {
case 0: {
// FIXME: implement
TEST(true); // Simulated cluster using custom satellite redundancy mode
break;
}
case 1: {
TEST(true); // Simulated cluster using no satellite redundancy mode (<4 datacenters)
break;
}
case 2: {
TEST(true); // Simulated cluster using single satellite redundancy mode
primaryObj["satellite_redundancy_mode"] = "one_satellite_single";
remoteObj["satellite_redundancy_mode"] = "one_satellite_single";
break;
}
case 3: {
TEST(true); // Simulated cluster using double satellite redundancy mode
primaryObj["satellite_redundancy_mode"] = "one_satellite_double";
remoteObj["satellite_redundancy_mode"] = "one_satellite_double";
break;
}
case 4: {
TEST(true); // Simulated cluster using triple satellite redundancy mode
primaryObj["satellite_redundancy_mode"] = "one_satellite_triple";
remoteObj["satellite_redundancy_mode"] = "one_satellite_triple";
break;
}
default:
ASSERT(false); // Programmer forgot to adjust cases.
}
}
if (deterministicRandom()->random01() < 0.25)
primaryObj["satellite_logs"] = deterministicRandom()->randomInt(1, 7);
if (deterministicRandom()->random01() < 0.25)
remoteObj["satellite_logs"] = deterministicRandom()->randomInt(1, 7);
// We cannot run with a remote DC when MAX_READ_TRANSACTION_LIFE_VERSIONS is too small, because the log
// routers will not be able to keep up.
if (testConfig.minimumRegions <= 1 &&
(deterministicRandom()->random01() < 0.25 ||
SERVER_KNOBS->MAX_READ_TRANSACTION_LIFE_VERSIONS < SERVER_KNOBS->VERSIONS_PER_SECOND)) {
TEST(true); // Simulated cluster using one region
needsRemote = false;
} else {
TEST(true); // Simulated cluster using two regions
db.usableRegions = 2;
}
int remote_replication_type = deterministicRandom()->randomInt(0, datacenters > 4 ? 4 : 5);
switch (remote_replication_type) {
case 0: {
// FIXME: implement
TEST(true); // Simulated cluster using custom remote redundancy mode
break;
}
case 1: {
TEST(true); // Simulated cluster using default remote redundancy mode
break;
}
case 2: {
TEST(true); // Simulated cluster using single remote redundancy mode
set_config("remote_single");
break;
}
case 3: {
TEST(true); // Simulated cluster using double remote redundancy mode
set_config("remote_double");
break;
}
case 4: {
TEST(true); // Simulated cluster using triple remote redundancy mode
set_config("remote_triple");
break;
}
default:
ASSERT(false); // Programmer forgot to adjust cases.
}
if (deterministicRandom()->random01() < 0.25)
db.desiredLogRouterCount = deterministicRandom()->randomInt(1, 7);
if (deterministicRandom()->random01() < 0.25)
db.remoteDesiredTLogCount = deterministicRandom()->randomInt(1, 7);
bool useNormalDCsAsSatellites =
datacenters > 4 && testConfig.minimumRegions < 2 && deterministicRandom()->random01() < 0.3;
StatusObject primarySatelliteObj;
primarySatelliteObj["id"] = useNormalDCsAsSatellites ? "1" : "2";
primarySatelliteObj["priority"] = 1;
primarySatelliteObj["satellite"] = 1;
if (deterministicRandom()->random01() < 0.25)
primarySatelliteObj["satellite_logs"] = deterministicRandom()->randomInt(1, 7);
primaryDcArr.push_back(primarySatelliteObj);
StatusObject remoteSatelliteObj;
remoteSatelliteObj["id"] = useNormalDCsAsSatellites ? "0" : "3";
remoteSatelliteObj["priority"] = 1;
remoteSatelliteObj["satellite"] = 1;
if (deterministicRandom()->random01() < 0.25)
remoteSatelliteObj["satellite_logs"] = deterministicRandom()->randomInt(1, 7);
remoteDcArr.push_back(remoteSatelliteObj);
if (datacenters > 4) {
StatusObject primarySatelliteObjB;
primarySatelliteObjB["id"] = useNormalDCsAsSatellites ? "2" : "4";
primarySatelliteObjB["priority"] = 1;
primarySatelliteObjB["satellite"] = 1;
if (deterministicRandom()->random01() < 0.25)
primarySatelliteObjB["satellite_logs"] = deterministicRandom()->randomInt(1, 7);
primaryDcArr.push_back(primarySatelliteObjB);
StatusObject remoteSatelliteObjB;
remoteSatelliteObjB["id"] = useNormalDCsAsSatellites ? "2" : "5";
remoteSatelliteObjB["priority"] = 1;
remoteSatelliteObjB["satellite"] = 1;
if (deterministicRandom()->random01() < 0.25)
remoteSatelliteObjB["satellite_logs"] = deterministicRandom()->randomInt(1, 7);
remoteDcArr.push_back(remoteSatelliteObjB);
}
if (useNormalDCsAsSatellites) {
datacenters = 3;
}
}
primaryObj["datacenters"] = primaryDcArr;
remoteObj["datacenters"] = remoteDcArr;
StatusArray regionArr;
regionArr.push_back(primaryObj);
if (needsRemote || deterministicRandom()->random01() < 0.5) {
regionArr.push_back(remoteObj);
}
if (needsRemote) {
g_simulator.originalRegions =
"regions=" + json_spirit::write_string(json_spirit::mValue(regionArr), json_spirit::Output_options::none);
StatusArray disablePrimary = regionArr;
disablePrimary[0].get_obj()["datacenters"].get_array()[0].get_obj()["priority"] = -1;
g_simulator.disablePrimary = "regions=" + json_spirit::write_string(json_spirit::mValue(disablePrimary),
json_spirit::Output_options::none);
StatusArray disableRemote = regionArr;
disableRemote[1].get_obj()["datacenters"].get_array()[0].get_obj()["priority"] = -1;
g_simulator.disableRemote = "regions=" + json_spirit::write_string(json_spirit::mValue(disableRemote),
json_spirit::Output_options::none);
} else {
// In order to generate a starting configuration with the remote disabled, do not apply the region
// configuration to the DatabaseConfiguration until after creating the starting conf string.
set_config("regions=" +
json_spirit::write_string(json_spirit::mValue(regionArr), json_spirit::Output_options::none));
}
}
// Sets the machine count based on the testConfig. May be overwritten later
// if the end result is not a viable config.
void SimulationConfig::setMachineCount(const TestConfig& testConfig) {
if (testConfig.machineCount.present()) {
machine_count = testConfig.machineCount.get();
} else if (generateFearless && testConfig.minimumReplication > 1) {
@ -1551,6 +1581,10 @@ void SimulationConfig::generateNormalConfig(const TestConfig& testConfig) {
((db.minDatacentersRequired() > 0) ? datacenters : 1) *
std::max(3, db.minZonesRequiredPerDatacenter()));
machine_count = deterministicRandom()->randomInt(machine_count, std::max(machine_count + 1, extraDB ? 6 : 10));
// generateMachineTeamTestConfig set up the number of servers per machine and the number of machines such that
// if we do not remove the surplus server and machine teams, the simulation test will report error.
// This is needed to make sure the number of server (and machine) teams is no larger than the desired number.
bool generateMachineTeamTestConfig = BUGGIFY_WITH_PROB(0.1) ? true : false;
if (generateMachineTeamTestConfig) {
// When DESIRED_TEAMS_PER_SERVER is set to 1, the desired machine team number is 5
// while the max possible machine team number is 10.
@ -1559,7 +1593,11 @@ void SimulationConfig::generateNormalConfig(const TestConfig& testConfig) {
machine_count = std::max(machine_count, deterministicRandom()->randomInt(5, extraDB ? 6 : 10));
}
}
}
// Sets the coordinator count based on the testConfig. May be overwritten later
// if the end result is not a viable config.
void SimulationConfig::setCoordinators(const TestConfig& testConfig) {
if (testConfig.coordinators.present()) {
coordinators = testConfig.coordinators.get();
} else {
@ -1569,14 +1607,10 @@ void SimulationConfig::generateNormalConfig(const TestConfig& testConfig) {
? deterministicRandom()->randomInt(1, std::max(machine_count, 2))
: 1;
}
}
if (testConfig.minimumReplication > 1 && datacenters == 3) {
// low latency tests in 3 data hall mode need 2 other data centers with 2 machines each to avoid waiting for
// logs to recover.
machine_count = std::max(machine_count, 6);
coordinators = 3;
}
// Sets the processes per machine based on the testConfig.
void SimulationConfig::setProcessesPerMachine(const TestConfig& testConfig) {
if (testConfig.processesPerMachine.present()) {
processes_per_machine = testConfig.processesPerMachine.get();
} else if (generateFearless) {
@ -1584,6 +1618,16 @@ void SimulationConfig::generateNormalConfig(const TestConfig& testConfig) {
} else {
processes_per_machine = deterministicRandom()->randomInt(1, (extraDB ? 14 : 28) / machine_count + 2);
}
}
// Sets the TSS configuration based on the testConfig.
// Also configures the cluster behaviour through setting some flags on the simulator.
void SimulationConfig::setTss(const TestConfig& testConfig) {
int tssCount = 0;
if (!testConfig.simpleConfig && !testConfig.disableTss && deterministicRandom()->random01() < 0.25) {
// 1 or 2 tss
tssCount = deterministicRandom()->randomInt(1, 3);
}
// reduce tss to half of extra non-seed servers that can be recruited in usable regions.
tssCount =
@ -1608,6 +1652,43 @@ void SimulationConfig::generateNormalConfig(const TestConfig& testConfig) {
}
}
// Generates and sets an appropriate configuration for the database according to
// the provided testConfig. Some attributes are randomly generated for more coverage
// of different combinations
void SimulationConfig::generateNormalConfig(const TestConfig& testConfig) {
set_config("new");
// Some of these options will overwrite one another so the ordering is important.
// This is a bit inefficient but separates the different types of option setting paths for better readability.
setDatacenters(testConfig);
// These 3 sets will only change the settings with trivial logic and low coupling with
// other portions of the configuration. The parameters that are more involved and use
// complex logic will be found in their respective "set----" methods following after.
setRandomConfig();
if (testConfig.simpleConfig) {
setSimpleConfig();
}
setSpecificConfig(testConfig);
setStorageEngine(testConfig);
setReplicationType(testConfig);
if (generateFearless || (datacenters == 2 && deterministicRandom()->random01() < 0.5)) {
setRegions(testConfig);
}
setMachineCount(testConfig);
setCoordinators(testConfig);
if (testConfig.minimumReplication > 1 && datacenters == 3) {
// low latency tests in 3 data hall mode need 2 other data centers with 2 machines each to avoid waiting for
// logs to recover.
machine_count = std::max(machine_count, 6);
coordinators = 3;
}
setProcessesPerMachine(testConfig);
setTss(testConfig);
}
// Configures the system according to the given specifications in order to run
// simulation under the correct conditions
void setupSimulatedSystem(vector<Future<Void>>* systemActors,

View File

@ -2675,7 +2675,8 @@ ACTOR Future<StatusReply> clusterGetStatus(
std::map<NetworkAddress, std::pair<double, OpenDatabaseRequest>>* clientStatus,
ServerCoordinators coordinators,
std::vector<NetworkAddress> incompatibleConnections,
Version datacenterVersionDifference) {
Version datacenterVersionDifference,
ConfigBroadcaster const* configBroadcaster) {
state double tStart = timer();
state JsonBuilderArray messages;
@ -2907,6 +2908,10 @@ ACTOR Future<StatusReply> clusterGetStatus(
statusObj["workload"] = workerStatuses[1];
statusObj["layers"] = workerStatuses[2];
if (configBroadcaster) {
// TODO: Read from coordinators for more up-to-date config database status?
statusObj["configuration_database"] = configBroadcaster->getStatus();
}
// Add qos section if it was populated
if (!qos.empty())

View File

@ -23,6 +23,7 @@
#pragma once
#include "fdbrpc/fdbrpc.h"
#include "fdbserver/ConfigBroadcaster.h"
#include "fdbserver/WorkerInterface.actor.h"
#include "fdbserver/MasterInterface.h"
#include "fdbclient/ClusterInterface.h"
@ -42,6 +43,7 @@ Future<StatusReply> clusterGetStatus(
std::map<NetworkAddress, std::pair<double, OpenDatabaseRequest>>* const& clientStatus,
ServerCoordinators const& coordinators,
std::vector<NetworkAddress> const& incompatibleConnections,
Version const& datacenterVersionDifference);
Version const& datacenterVersionDifference,
ConfigBroadcaster const* const& conifgBroadcaster);
#endif

View File

@ -156,7 +156,7 @@ struct ClusterControllerFullInterface {
bool operator==(ClusterControllerFullInterface const& r) const { return id() == r.id(); }
bool operator!=(ClusterControllerFullInterface const& r) const { return id() != r.id(); }
bool hasMessage() {
bool hasMessage() const {
return clientInterface.hasMessage() || recruitFromConfiguration.getFuture().isReady() ||
recruitRemoteFromConfiguration.getFuture().isReady() || recruitStorage.getFuture().isReady() ||
registerWorker.getFuture().isReady() || getWorkers.getFuture().isReady() ||
@ -828,13 +828,17 @@ ACTOR Future<Void> fdbd(Reference<ClusterConnectionFile> ccf,
std::string metricsConnFile,
std::string metricsPrefix,
int64_t memoryProfilingThreshold,
std::string whitelistBinPaths);
std::string whitelistBinPaths,
std::string configPath,
std::map<std::string, std::string> manualKnobOverrides,
UseConfigDB useConfigDB);
ACTOR Future<Void> clusterController(Reference<ClusterConnectionFile> ccf,
Reference<AsyncVar<Optional<ClusterControllerFullInterface>>> currentCC,
Reference<AsyncVar<ClusterControllerPriorityInfo>> asyncPriorityInfo,
Future<Void> recoveredDiskFiles,
LocalityData locality);
LocalityData locality,
UseConfigDB useConfigDB);
// These servers are started by workerServer
class IKeyValueStore;

View File

@ -35,6 +35,7 @@
#include <boost/algorithm/string.hpp>
#include <boost/interprocess/managed_shared_memory.hpp>
#include "fdbclient/IKnobCollection.h"
#include "fdbclient/NativeAPI.actor.h"
#include "fdbclient/SystemData.h"
#include "fdbclient/versions.h"
@ -91,7 +92,7 @@ enum {
OPT_DCID, OPT_MACHINE_CLASS, OPT_BUGGIFY, OPT_VERSION, OPT_BUILD_FLAGS, OPT_CRASHONERROR, OPT_HELP, OPT_NETWORKIMPL, OPT_NOBUFSTDOUT, OPT_BUFSTDOUTERR,
OPT_TRACECLOCK, OPT_NUMTESTERS, OPT_DEVHELP, OPT_ROLLSIZE, OPT_MAXLOGS, OPT_MAXLOGSSIZE, OPT_KNOB, OPT_UNITTESTPARAM, OPT_TESTSERVERS, OPT_TEST_ON_SERVERS, OPT_METRICSCONNFILE,
OPT_METRICSPREFIX, OPT_LOGGROUP, OPT_LOCALITY, OPT_IO_TRUST_SECONDS, OPT_IO_TRUST_WARN_ONLY, OPT_FILESYSTEM, OPT_PROFILER_RSS_SIZE, OPT_KVFILE,
OPT_TRACE_FORMAT, OPT_WHITELIST_BINPATH, OPT_BLOB_CREDENTIAL_FILE
OPT_TRACE_FORMAT, OPT_WHITELIST_BINPATH, OPT_BLOB_CREDENTIAL_FILE, OPT_CONFIG_PATH, OPT_USE_TEST_CONFIG_DB,
};
CSimpleOpt::SOption g_rgOptions[] = {
@ -174,6 +175,8 @@ CSimpleOpt::SOption g_rgOptions[] = {
{ OPT_TRACE_FORMAT , "--trace_format", SO_REQ_SEP },
{ OPT_WHITELIST_BINPATH, "--whitelist_binpath", SO_REQ_SEP },
{ OPT_BLOB_CREDENTIAL_FILE, "--blob_credential_file", SO_REQ_SEP },
{ OPT_CONFIG_PATH, "--config_path", SO_REQ_SEP },
{ OPT_USE_TEST_CONFIG_DB, "--use_test_config_db", SO_NONE },
#ifndef TLS_DISABLED
TLS_OPTION_FLAGS
@ -954,7 +957,7 @@ struct CLIOptions {
NetworkAddressList publicAddresses, listenAddresses;
const char* targetKey = nullptr;
uint64_t memLimit =
int64_t memLimit =
8LL << 30; // Nice to maintain the same default value for memLimit and SERVER_KNOBS->SERVER_MEM_LIMIT and
// SERVER_KNOBS->COMMIT_BATCHES_MEM_BYTES_HARD_LIMIT
uint64_t storageMemLimit = 1LL << 30;
@ -965,6 +968,7 @@ struct CLIOptions {
bool useNet2 = true;
bool useThreadPool = false;
std::vector<std::pair<std::string, std::string>> knobs;
std::map<std::string, std::string> manualKnobOverrides;
LocalityData localities;
int minTesterCount = 1;
bool testOnServers = false;
@ -976,6 +980,9 @@ struct CLIOptions {
std::vector<std::string> blobCredentials; // used for fast restore workers & backup workers
const char* blobCredsFromENV = nullptr;
std::string configPath;
UseConfigDB useConfigDB{ UseConfigDB::DISABLED };
Reference<ClusterConnectionFile> connectionFile;
Standalone<StringRef> machineId;
UnitTestParameters testParams;
@ -1051,6 +1058,7 @@ private:
}
syn = syn.substr(7);
knobs.emplace_back(syn, args.OptionArg());
manualKnobOverrides[syn] = args.OptionArg();
break;
}
case OPT_UNITTESTPARAM: {
@ -1429,6 +1437,12 @@ private:
} while (t.size() != 0);
}
break;
case OPT_CONFIG_PATH:
configPath = args.OptionArg();
break;
case OPT_USE_TEST_CONFIG_DB:
useConfigDB = UseConfigDB::SIMPLE;
break;
#ifndef TLS_DISABLED
case TLSConfig::OPT_TLS_PLUGIN:
@ -1626,46 +1640,44 @@ int main(int argc, char* argv[]) {
enableBuggify(opts.buggifyEnabled, BuggifyType::General);
if (!globalServerKnobs->setKnob("log_directory", opts.logFolder))
ASSERT(false);
IKnobCollection::setGlobalKnobCollection(IKnobCollection::Type::SERVER,
Randomize::YES,
role == ServerRole::Simulation ? IsSimulated::YES : IsSimulated::NO);
IKnobCollection::getMutableGlobalKnobCollection().setKnob("log_directory", KnobValue::create(opts.logFolder));
if (role != ServerRole::Simulation) {
if (!globalServerKnobs->setKnob("commit_batches_mem_bytes_hard_limit", std::to_string(opts.memLimit)))
ASSERT(false);
IKnobCollection::getMutableGlobalKnobCollection().setKnob("commit_batches_mem_bytes_hard_limit",
KnobValue::create(int64_t{ opts.memLimit }));
}
for (auto k = opts.knobs.begin(); k != opts.knobs.end(); ++k) {
for (const auto& [knobName, knobValueString] : opts.knobs) {
try {
if (!globalFlowKnobs->setKnob(k->first, k->second) &&
!globalClientKnobs->setKnob(k->first, k->second) &&
!globalServerKnobs->setKnob(k->first, k->second)) {
fprintf(stderr, "WARNING: Unrecognized knob option '%s'\n", k->first.c_str());
TraceEvent(SevWarnAlways, "UnrecognizedKnobOption").detail("Knob", printable(k->first));
}
auto& g_knobs = IKnobCollection::getMutableGlobalKnobCollection();
auto knobValue = g_knobs.parseKnobValue(knobName, knobValueString);
g_knobs.setKnob(knobName, knobValue);
} catch (Error& e) {
if (e.code() == error_code_invalid_option_value) {
fprintf(stderr,
"WARNING: Invalid value '%s' for knob option '%s'\n",
k->second.c_str(),
k->first.c_str());
knobName.c_str(),
knobValueString.c_str());
TraceEvent(SevWarnAlways, "InvalidKnobValue")
.detail("Knob", printable(k->first))
.detail("Value", printable(k->second));
.detail("Knob", printable(knobName))
.detail("Value", printable(knobValueString));
} else {
fprintf(stderr, "ERROR: Failed to set knob option '%s': %s\n", k->first.c_str(), e.what());
fprintf(stderr, "ERROR: Failed to set knob option '%s': %s\n", knobName.c_str(), e.what());
TraceEvent(SevError, "FailedToSetKnob")
.detail("Knob", printable(k->first))
.detail("Value", printable(k->second))
.detail("Knob", printable(knobName))
.detail("Value", printable(knobValueString))
.error(e);
throw;
}
}
}
if (!globalServerKnobs->setKnob("server_mem_limit", std::to_string(opts.memLimit)))
ASSERT(false);
IKnobCollection::getMutableGlobalKnobCollection().setKnob("server_mem_limit",
KnobValue::create(int64_t{ opts.memLimit }));
// Reinitialize knobs in order to update knobs that are dependent on explicitly set knobs
globalFlowKnobs->initialize(true, role == ServerRole::Simulation);
globalClientKnobs->initialize(true);
globalServerKnobs->initialize(true, globalClientKnobs.get(), role == ServerRole::Simulation);
IKnobCollection::getMutableGlobalKnobCollection().initialize(
Randomize::YES, role == ServerRole::Simulation ? IsSimulated::YES : IsSimulated::NO);
// evictionPolicyStringToEnum will throw an exception if the string is not recognized as a valid
EvictablePageCache::evictionPolicyStringToEnum(FLOW_KNOBS->CACHE_EVICTION_POLICY);
@ -1786,21 +1798,6 @@ int main(int argc, char* argv[]) {
.detail("MemoryLimit", opts.memLimit)
.trackLatest("ProgramStart");
// Test for TraceEvent length limits
/*std::string foo(4096, 'x');
TraceEvent("TooLongDetail").detail("Contents", foo);
TraceEvent("TooLongEvent")
.detail("Contents1", foo)
.detail("Contents2", foo)
.detail("Contents3", foo)
.detail("Contents4", foo)
.detail("Contents5", foo)
.detail("Contents6", foo)
.detail("Contents7", foo)
.detail("Contents8", foo)
.detail("ExtraTest", 1776);*/
Error::init();
std::set_new_handler(&platform::outOfMemory);
setMemoryQuota(opts.memLimit);
@ -1980,7 +1977,10 @@ int main(int argc, char* argv[]) {
opts.metricsConnFile,
opts.metricsPrefix,
opts.rsssize,
opts.whitelistBinPaths));
opts.whitelistBinPaths,
opts.configPath,
opts.manualKnobOverrides,
opts.useConfigDB));
actors.push_back(histogramReport());
// actors.push_back( recurring( []{}, .001 ) ); // for ASIO latency measurement

View File

@ -267,7 +267,7 @@ struct MasterData : NonCopyable, ReferenceCounted<MasterData> {
safeLocality(tagLocalityInvalid), primaryLocality(tagLocalityInvalid), neverCreated(false),
lastEpochEnd(invalidVersion), liveCommittedVersion(invalidVersion), databaseLocked(false),
minKnownCommittedVersion(invalidVersion), recoveryTransactionVersion(invalidVersion), lastCommitTime(0),
registrationCount(0), version(invalidVersion), lastVersionTime(0), txnStateStore(0), memoryLimit(2e9),
registrationCount(0), version(invalidVersion), lastVersionTime(0), txnStateStore(nullptr), memoryLimit(2e9),
addActor(addActor), hasConfiguration(false), recruitmentStalled(makeReference<AsyncVar<bool>>(false)),
cc("Master", dbgid.toString()), changeCoordinatorsRequests("ChangeCoordinatorsRequests", cc),
getCommitVersionRequests("GetCommitVersionRequests", cc),
@ -294,7 +294,9 @@ ACTOR Future<Void> newCommitProxies(Reference<MasterData> self, RecruitFromConfi
req.recoveryCount = self->cstate.myDBState.recoveryCount + 1;
req.recoveryTransactionVersion = self->recoveryTransactionVersion;
req.firstProxy = i == 0;
TraceEvent("CommitProxyReplies", self->dbgid).detail("WorkerID", recr.commitProxies[i].id());
TraceEvent("CommitProxyReplies", self->dbgid)
.detail("WorkerID", recr.commitProxies[i].id())
.detail("FirstProxy", req.firstProxy ? "True" : "False");
initializationReplies.push_back(
transformErrors(throwErrorOr(recr.commitProxies[i].commitProxy.getReplyUnlessFailedFor(
req, SERVER_KNOBS->TLOG_TIMEOUT, SERVER_KNOBS->MASTER_FAILURE_SLOPE_DURING_RECOVERY)),
@ -953,6 +955,10 @@ ACTOR Future<Void> sendInitialCommitToResolvers(Reference<MasterData> self) {
wait(yield());
}
wait(waitForAll(txnReplies));
TraceEvent("RecoveryInternal", self->dbgid)
.detail("StatusCode", RecoveryStatus::recovery_transaction)
.detail("Status", RecoveryStatus::names[RecoveryStatus::recovery_transaction])
.detail("Step", "SentTxnStateStoreToCommitProxies");
vector<Future<ResolveTransactionBatchReply>> replies;
for (auto& r : self->resolvers) {
@ -965,6 +971,10 @@ ACTOR Future<Void> sendInitialCommitToResolvers(Reference<MasterData> self) {
}
wait(waitForAll(replies));
TraceEvent("RecoveryInternal", self->dbgid)
.detail("StatusCode", RecoveryStatus::recovery_transaction)
.detail("Status", RecoveryStatus::names[RecoveryStatus::recovery_transaction])
.detail("Step", "InitializedAllResolvers");
return Void();
}

View File

@ -92,6 +92,48 @@ ACTOR Future<Void> networkTestServer() {
}
}
ACTOR Future<Void> networkTestStreamingServer() {
state NetworkTestInterface interf( g_network );
state Future<Void> logging = delay( 1.0 );
state double lastTime = now();
state int sent = 0;
state LatencyStats latency;
loop {
try {
choose {
when(state NetworkTestStreamingRequest req = waitNext(interf.testStream.getFuture())) {
state LatencyStats::sample sample = latency.tick();
state int i = 0;
for (; i < 100; ++i) {
wait(req.reply.onReady());
req.reply.send(NetworkTestStreamingReply{ i });
}
req.reply.sendError(end_of_stream());
latency.tock(sample);
sent++;
}
when( wait( logging ) ) {
auto spd = sent / (now() - lastTime);
if (FLOW_KNOBS->NETWORK_TEST_SCRIPT_MODE) {
fprintf(stderr, "%f\t%.3f\t%.3f\n", spd, latency.mean() * 1e6, latency.stddev() * 1e6);
} else {
fprintf(stderr, "responses per second: %f (%f us)\n", spd, latency.mean() * 1e6);
}
latency.reset();
lastTime = now();
sent = 0;
logging = delay( 1.0 );
}
}
} catch (Error &e) {
if(e.code() != error_code_operation_obsolete) {
throw e;
}
}
}
}
static bool moreRequestsPending(int count) {
if (count == -1) {
return false;
@ -128,6 +170,32 @@ ACTOR Future<Void> testClient(std::vector<NetworkTestInterface> interfs,
return Void();
}
ACTOR Future<Void> testClientStream(std::vector<NetworkTestInterface> interfs, int* sent, int* completed,
LatencyStats* latency) {
state std::string request_payload(FLOW_KNOBS->NETWORK_TEST_REQUEST_SIZE, '.');
state LatencyStats::sample sample;
while (moreRequestsPending(*sent)) {
(*sent)++;
sample = latency->tick();
state ReplyPromiseStream<NetworkTestStreamingReply> stream =
interfs[deterministicRandom()->randomInt(0, interfs.size())].testStream.getReplyStream(
NetworkTestStreamingRequest{});
state int j = 0;
try {
loop {
NetworkTestStreamingReply rep = waitNext(stream.getFuture());
ASSERT(rep.index == j++);
}
} catch (Error& e) {
ASSERT(e.code() == error_code_end_of_stream || e.code() == error_code_connection_failed);
}
latency->tock(sample);
(*completed)++;
}
return Void();
}
ACTOR Future<Void> logger(int* sent, int* completed, LatencyStats* latency) {
state double lastTime = now();
state int logged = 0;

View File

@ -639,6 +639,8 @@ public:
FlowLock durableVersionLock;
FlowLock fetchKeysParallelismLock;
int64_t fetchKeysBytesBudget;
AsyncVar<bool> fetchKeysBudgetUsed;
vector<Promise<FetchInjectionInfo*>> readyFetchKeys;
int64_t instanceID;
@ -730,8 +732,8 @@ public:
struct Counters {
CounterCollection cc;
Counter allQueries, getKeyQueries, getValueQueries, getRangeQueries, finishedQueries, lowPriorityQueries,
rowsQueried, bytesQueried, watchQueries, emptyQueries;
Counter allQueries, getKeyQueries, getValueQueries, getRangeQueries, getRangeStreamQueries, finishedQueries,
lowPriorityQueries, rowsQueried, bytesQueried, watchQueries, emptyQueries;
// Bytes of the mutations that have been added to the memory of the storage server. When the data is durable
// and cleared from the memory, we do not subtract it but add it to bytesDurable.
@ -764,21 +766,22 @@ public:
Counters(StorageServer* self)
: cc("StorageServer", self->thisServerID.toString()), getKeyQueries("GetKeyQueries", cc),
getValueQueries("GetValueQueries", cc), getRangeQueries("GetRangeQueries", cc),
allQueries("QueryQueue", cc), finishedQueries("FinishedQueries", cc),
lowPriorityQueries("LowPriorityQueries", cc), rowsQueried("RowsQueried", cc),
bytesQueried("BytesQueried", cc), watchQueries("WatchQueries", cc), emptyQueries("EmptyQueries", cc),
bytesInput("BytesInput", cc), bytesDurable("BytesDurable", cc), bytesFetched("BytesFetched", cc),
mutationBytes("MutationBytes", cc), sampledBytesCleared("SampledBytesCleared", cc),
kvFetched("KVFetched", cc), mutations("Mutations", cc), setMutations("SetMutations", cc),
clearRangeMutations("ClearRangeMutations", cc), atomicMutations("AtomicMutations", cc),
updateBatches("UpdateBatches", cc), updateVersions("UpdateVersions", cc), loops("Loops", cc),
fetchWaitingMS("FetchWaitingMS", cc), fetchWaitingCount("FetchWaitingCount", cc),
fetchExecutingMS("FetchExecutingMS", cc), fetchExecutingCount("FetchExecutingCount", cc),
readsRejected("ReadsRejected", cc), fetchedVersions("FetchedVersions", cc),
fetchesFromLogs("FetchesFromLogs", cc), readLatencySample("ReadLatencyMetrics",
self->thisServerID,
SERVER_KNOBS->LATENCY_METRICS_LOGGING_INTERVAL,
SERVER_KNOBS->LATENCY_SAMPLE_SIZE),
getRangeStreamQueries("GetRangeStreamQueries", cc), allQueries("QueryQueue", cc),
finishedQueries("FinishedQueries", cc), lowPriorityQueries("LowPriorityQueries", cc),
rowsQueried("RowsQueried", cc), bytesQueried("BytesQueried", cc), watchQueries("WatchQueries", cc),
emptyQueries("EmptyQueries", cc), bytesInput("BytesInput", cc), bytesDurable("BytesDurable", cc),
bytesFetched("BytesFetched", cc), mutationBytes("MutationBytes", cc),
sampledBytesCleared("SampledBytesCleared", cc), kvFetched("KVFetched", cc), mutations("Mutations", cc),
setMutations("SetMutations", cc), clearRangeMutations("ClearRangeMutations", cc),
atomicMutations("AtomicMutations", cc), updateBatches("UpdateBatches", cc),
updateVersions("UpdateVersions", cc), loops("Loops", cc), fetchWaitingMS("FetchWaitingMS", cc),
fetchWaitingCount("FetchWaitingCount", cc), fetchExecutingMS("FetchExecutingMS", cc),
fetchExecutingCount("FetchExecutingCount", cc), readsRejected("ReadsRejected", cc),
fetchedVersions("FetchedVersions", cc), fetchesFromLogs("FetchesFromLogs", cc),
readLatencySample("ReadLatencyMetrics",
self->thisServerID,
SERVER_KNOBS->LATENCY_METRICS_LOGGING_INTERVAL,
SERVER_KNOBS->LATENCY_SAMPLE_SIZE),
readLatencyBands("ReadLatencyBands", self->thisServerID, SERVER_KNOBS->STORAGE_LOGGING_DELAY) {
specialCounter(cc, "LastTLogVersion", [self]() { return self->lastTLogVersion; });
specialCounter(cc, "Version", [self]() { return self->version.get(); });
@ -789,17 +792,13 @@ public:
specialCounter(cc, "LocalRate", [self] { return int64_t(self->currentRate() * 100); });
specialCounter(cc, "BytesReadSampleCount", [self]() { return self->metrics.bytesReadSample.queue.size(); });
specialCounter(
cc, "FetchKeysFetchActive", [self]() { return self->fetchKeysParallelismLock.activePermits(); });
specialCounter(cc, "FetchKeysWaiting", [self]() { return self->fetchKeysParallelismLock.waiters(); });
specialCounter(cc, "QueryQueueMax", [self]() { return self->getAndResetMaxQueryQueueSize(); });
specialCounter(cc, "BytesStored", [self]() { return self->metrics.byteSample.getEstimate(allKeys); });
specialCounter(cc, "ActiveWatches", [self]() { return self->numWatches; });
specialCounter(cc, "WatchBytes", [self]() { return self->watchBytes; });
specialCounter(cc, "KvstoreSizeTotal", [self]() { return std::get<0>(self->storage.getSize()); });
specialCounter(cc, "KvstoreNodeTotal", [self]() { return std::get<1>(self->storage.getSize()); });
specialCounter(cc, "KvstoreInlineKey", [self]() { return std::get<2>(self->storage.getSize()); });
@ -813,7 +812,8 @@ public:
db(db), actors(false), lastTLogVersion(0), lastVersionWithData(0), restoredVersion(0),
rebootAfterDurableVersion(std::numeric_limits<Version>::max()), durableInProgress(Void()), versionLag(0),
primaryLocality(tagLocalityInvalid), updateEagerReads(0), shardChangeCounter(0),
fetchKeysParallelismLock(SERVER_KNOBS->FETCH_KEYS_PARALLELISM_BYTES), shuttingDown(false),
fetchKeysParallelismLock(SERVER_KNOBS->FETCH_KEYS_PARALLELISM),
fetchKeysBytesBudget(SERVER_KNOBS->STORAGE_FETCH_BYTES), fetchKeysBudgetUsed(false), shuttingDown(false),
debug_inApplyUpdate(false), debug_lastValidateTime(0), watchBytes(0), numWatches(0), logProtocol(0),
counters(this), tag(invalidTag), maxQueryQueue(0), thisServerID(ssi.id()), tssInQuarantine(false),
readQueueSizeMetric(LiteralStringRef("StorageServer.ReadQueueSize")), behind(false), versionBehind(false),
@ -1268,12 +1268,13 @@ ACTOR Future<Void> getValueQ(StorageServer* data, GetValueRequest req) {
DEBUG_MUTATION("ShardGetValue",
version,
MutationRef(MutationRef::DebugKey, req.key, v.present() ? v.get() : LiteralStringRef("<null>")));
DEBUG_MUTATION(
"ShardGetPath",
version,
MutationRef(MutationRef::DebugKey,
req.key,
path == 0 ? LiteralStringRef("0") : path == 1 ? LiteralStringRef("1") : LiteralStringRef("2")));
DEBUG_MUTATION("ShardGetPath",
version,
MutationRef(MutationRef::DebugKey,
req.key,
path == 0 ? LiteralStringRef("0")
: path == 1 ? LiteralStringRef("1")
: LiteralStringRef("2")));
/*
StorageMetrics m;
@ -2072,6 +2073,186 @@ ACTOR Future<Void> getKeyValuesQ(StorageServer* data, GetKeyValuesRequest req)
return Void();
}
ACTOR Future<Void> getKeyValuesStreamQ(StorageServer* data, GetKeyValuesStreamRequest req)
// Throws a wrong_shard_server if the keys in the request or result depend on data outside this server OR if a large
// selector offset prevents all data from being read in one range read
{
state Span span("SS:getKeyValuesStream"_loc, { req.spanContext });
state int64_t resultSize = 0;
req.reply.setByteLimit(SERVER_KNOBS->RANGESTREAM_LIMIT_BYTES);
++data->counters.getRangeStreamQueries;
++data->counters.allQueries;
++data->readQueueSizeMetric;
data->maxQueryQueue = std::max<int>(
data->maxQueryQueue, data->counters.allQueries.getValue() - data->counters.finishedQueries.getValue());
// Active load balancing runs at a very high priority (to obtain accurate queue lengths)
// so we need to downgrade here
if (SERVER_KNOBS->FETCH_KEYS_LOWER_PRIORITY && req.isFetchKeys) {
wait(delay(0, TaskPriority::FetchKeys));
} else {
wait(delay(0, TaskPriority::DefaultEndpoint));
}
try {
if (req.debugID.present())
g_traceBatch.addEvent(
"TransactionDebug", req.debugID.get().first(), "storageserver.getKeyValuesStream.Before");
state Version version = wait(waitForVersion(data, req.version, span.context));
state uint64_t changeCounter = data->shardChangeCounter;
// try {
state KeyRange shard = getShardKeyRange(data, req.begin);
if (req.debugID.present())
g_traceBatch.addEvent(
"TransactionDebug", req.debugID.get().first(), "storageserver.getKeyValuesStream.AfterVersion");
//.detail("ShardBegin", shard.begin).detail("ShardEnd", shard.end);
//} catch (Error& e) { TraceEvent("WrongShardServer", data->thisServerID).detail("Begin",
//req.begin.toString()).detail("End", req.end.toString()).detail("Version", version).detail("Shard",
//"None").detail("In", "getKeyValues>getShardKeyRange"); throw e; }
if (!selectorInRange(req.end, shard) && !(req.end.isFirstGreaterOrEqual() && req.end.getKey() == shard.end)) {
// TraceEvent("WrongShardServer1", data->thisServerID).detail("Begin",
//req.begin.toString()).detail("End", req.end.toString()).detail("Version", version).detail("ShardBegin",
//shard.begin).detail("ShardEnd", shard.end).detail("In", "getKeyValues>checkShardExtents");
throw wrong_shard_server();
}
state int offset1;
state int offset2;
state Future<Key> fBegin = req.begin.isFirstGreaterOrEqual()
? Future<Key>(req.begin.getKey())
: findKey(data, req.begin, version, shard, &offset1, span.context);
state Future<Key> fEnd = req.end.isFirstGreaterOrEqual()
? Future<Key>(req.end.getKey())
: findKey(data, req.end, version, shard, &offset2, span.context);
state Key begin = wait(fBegin);
state Key end = wait(fEnd);
if (req.debugID.present())
g_traceBatch.addEvent(
"TransactionDebug", req.debugID.get().first(), "storageserver.getKeyValuesStream.AfterKeys");
//.detail("Off1",offset1).detail("Off2",offset2).detail("ReqBegin",req.begin.getKey()).detail("ReqEnd",req.end.getKey());
// Offsets of zero indicate begin/end keys in this shard, which obviously means we can answer the query
// An end offset of 1 is also OK because the end key is exclusive, so if the first key of the next shard is the
// end the last actual key returned must be from this shard. A begin offset of 1 is also OK because then either
// begin is past end or equal to end (so the result is definitely empty)
if ((offset1 && offset1 != 1) || (offset2 && offset2 != 1)) {
TEST(true); // wrong_shard_server due to offset in rangeStream
// We could detect when offset1 takes us off the beginning of the database or offset2 takes us off the end,
// and return a clipped range rather than an error (since that is what the NativeAPI.getRange will do anyway
// via its "slow path"), but we would have to add some flags to the response to encode whether we went off
// the beginning and the end, since it needs that information.
//TraceEvent("WrongShardServer2", data->thisServerID).detail("Begin", req.begin.toString()).detail("End", req.end.toString()).detail("Version", version).detail("ShardBegin", shard.begin).detail("ShardEnd", shard.end).detail("In", "getKeyValues>checkOffsets").detail("BeginKey", begin).detail("EndKey", end).detail("BeginOffset", offset1).detail("EndOffset", offset2);
throw wrong_shard_server();
}
if (begin >= end) {
if (req.debugID.present())
g_traceBatch.addEvent(
"TransactionDebug", req.debugID.get().first(), "storageserver.getKeyValuesStream.Send");
//.detail("Begin",begin).detail("End",end);
GetKeyValuesStreamReply none;
none.version = version;
none.more = false;
data->checkChangeCounter(changeCounter,
KeyRangeRef(std::min<KeyRef>(req.begin.getKey(), req.end.getKey()),
std::max<KeyRef>(req.begin.getKey(), req.end.getKey())));
req.reply.send(none);
req.reply.sendError(end_of_stream());
} else {
loop {
wait(req.reply.onReady());
if (version < data->oldestVersion.get()) {
throw transaction_too_old();
}
state int byteLimit = CLIENT_KNOBS->REPLY_BYTE_LIMIT;
GetKeyValuesReply _r =
wait(readRange(data, version, KeyRangeRef(begin, end), req.limit, &byteLimit, span.context));
GetKeyValuesStreamReply r(_r);
if (req.debugID.present())
g_traceBatch.addEvent("TransactionDebug",
req.debugID.get().first(),
"storageserver.getKeyValuesStream.AfterReadRange");
//.detail("Begin",begin).detail("End",end).detail("SizeOf",r.data.size());
data->checkChangeCounter(
changeCounter,
KeyRangeRef(std::min<KeyRef>(begin, std::min<KeyRef>(req.begin.getKey(), req.end.getKey())),
std::max<KeyRef>(end, std::max<KeyRef>(req.begin.getKey(), req.end.getKey()))));
if (EXPENSIVE_VALIDATION) {
for (int i = 0; i < r.data.size(); i++)
ASSERT(r.data[i].key >= begin && r.data[i].key < end);
ASSERT(r.data.size() <= std::abs(req.limit));
}
/*for( int i = 0; i < r.data.size(); i++ ) {
StorageMetrics m;
m.bytesPerKSecond = r.data[i].expectedSize();
m.iosPerKSecond = 1; //FIXME: this should be 1/r.data.size(), but we cannot do that because it is an int
data->metrics.notify(r.data[i].key, m);
}*/
// For performance concerns, the cost of a range read is billed to the start key and end key of the
// range.
int64_t totalByteSize = 0;
for (int i = 0; i < r.data.size(); i++) {
totalByteSize += r.data[i].expectedSize();
}
if (totalByteSize > 0 && SERVER_KNOBS->READ_SAMPLING_ENABLED) {
int64_t bytesReadPerKSecond = std::max(totalByteSize, SERVER_KNOBS->EMPTY_READ_PENALTY) / 2;
data->metrics.notifyBytesReadPerKSecond(r.data[0].key, bytesReadPerKSecond);
data->metrics.notifyBytesReadPerKSecond(r.data[r.data.size() - 1].key, bytesReadPerKSecond);
}
req.reply.send(r);
data->counters.rowsQueried += r.data.size();
if (r.data.size() == 0) {
++data->counters.emptyQueries;
}
if (!r.more) {
req.reply.sendError(end_of_stream());
break;
}
ASSERT(r.data.size());
if (req.limit >= 0) {
begin = keyAfter(r.data.back().key);
} else {
end = r.data.back().key;
}
if (SERVER_KNOBS->FETCH_KEYS_LOWER_PRIORITY && req.isFetchKeys) {
wait(delay(0, TaskPriority::FetchKeys));
} else {
wait(delay(0, TaskPriority::DefaultEndpoint));
}
data->transactionTagCounter.addRequest(req.tags, resultSize);
}
}
} catch (Error& e) {
if (e.code() != error_code_operation_obsolete) {
if (!canReplyWith(e))
throw;
req.reply.sendError(e);
}
}
data->transactionTagCounter.addRequest(req.tags, resultSize);
++data->counters.finishedQueries;
--data->readQueueSizeMetric;
return Void();
}
ACTOR Future<Void> getKeyQ(StorageServer* data, GetKeyRequest req) {
state Span span("SS:getKey"_loc, { req.spanContext });
state int64_t resultSize = 0;
@ -2504,72 +2685,6 @@ void coalesceShards(StorageServer* data, KeyRangeRef keys) {
}
}
ACTOR Future<RangeResult> tryGetRange(Database cx,
Version version,
KeyRangeRef keys,
GetRangeLimits limits,
bool* isTooOld) {
state Transaction tr(cx);
state RangeResult output;
state KeySelectorRef begin = firstGreaterOrEqual(keys.begin);
state KeySelectorRef end = firstGreaterOrEqual(keys.end);
if (*isTooOld)
throw transaction_too_old();
ASSERT(!cx->switchable);
tr.setVersion(version);
tr.info.taskID = TaskPriority::FetchKeys;
limits.minRows = 0;
try {
loop {
RangeResult rep = wait(tr.getRange(begin, end, limits, true));
limits.decrement(rep);
if (limits.isReached() || !rep.more) {
if (output.size()) {
output.arena().dependsOn(rep.arena());
output.append(output.arena(), rep.begin(), rep.size());
if (limits.isReached() && rep.readThrough.present())
output.readThrough = rep.readThrough.get();
} else {
output = rep;
}
output.more = limits.isReached();
return output;
} else if (rep.readThrough.present()) {
output.arena().dependsOn(rep.arena());
if (rep.size()) {
output.append(output.arena(), rep.begin(), rep.size());
ASSERT(rep.readThrough.get() > rep.end()[-1].key);
} else {
ASSERT(rep.readThrough.get() > keys.begin);
}
begin = firstGreaterOrEqual(rep.readThrough.get());
} else {
output.arena().dependsOn(rep.arena());
output.append(output.arena(), rep.begin(), rep.size());
begin = firstGreaterThan(output.end()[-1].key);
}
}
} catch (Error& e) {
if (begin.getKey() != keys.begin &&
(e.code() == error_code_transaction_too_old || e.code() == error_code_future_version ||
e.code() == error_code_process_behind)) {
if (e.code() == error_code_transaction_too_old)
*isTooOld = true;
output.more = true;
if (begin.isFirstGreaterOrEqual())
output.readThrough = begin.getKey();
return output;
}
throw;
}
}
template <class T>
void addMutation(T& target, Version version, MutationRef const& mutation) {
target.addMutation(version, mutation);
@ -2667,13 +2782,46 @@ public:
}
};
ACTOR Future<Void> tryGetRange(PromiseStream<RangeResult> results, Transaction* tr, KeyRange keys) {
state KeySelectorRef begin = firstGreaterOrEqual(keys.begin);
state KeySelectorRef end = firstGreaterOrEqual(keys.end);
try {
loop {
GetRangeLimits limits(GetRangeLimits::ROW_LIMIT_UNLIMITED, SERVER_KNOBS->FETCH_BLOCK_BYTES);
limits.minRows = 0;
state RangeResult rep = wait(tr->getRange(begin, end, limits, true));
if (!rep.more) {
rep.readThrough = keys.end;
}
results.send(rep);
if (!rep.more) {
results.sendError(end_of_stream());
return Void();
}
if (rep.readThrough.present()) {
begin = firstGreaterOrEqual(rep.readThrough.get());
} else {
begin = firstGreaterThan(rep.end()[-1].key);
}
}
} catch (Error& e) {
if (e.code() == error_code_actor_cancelled) {
throw;
}
results.sendError(e);
throw;
}
}
ACTOR Future<Void> fetchKeys(StorageServer* data, AddingShard* shard) {
state const UID fetchKeysID = deterministicRandom()->randomUniqueID();
state TraceInterval interval("FetchKeys");
state KeyRange keys = shard->keys;
state Future<Void> warningLogger = logFetchKeysWarning(shard);
state const double startTime = now();
state int fetchBlockBytes = BUGGIFY ? SERVER_KNOBS->BUGGIFY_BLOCK_BYTES : SERVER_KNOBS->FETCH_BLOCK_BYTES;
state FetchKeysMetricReporter metricReporter(fetchKeysID,
startTime,
keys,
@ -2714,8 +2862,8 @@ ACTOR Future<Void> fetchKeys(StorageServer* data, AddingShard* shard) {
TraceEvent(SevDebug, "FetchKeysVersionSatisfied", data->thisServerID).detail("FKID", interval.pairID);
wait(data->fetchKeysParallelismLock.take(TaskPriority::DefaultYield, fetchBlockBytes));
state FlowLock::Releaser holdingFKPL(data->fetchKeysParallelismLock, fetchBlockBytes);
wait(data->fetchKeysParallelismLock.take(TaskPriority::DefaultYield));
state FlowLock::Releaser holdingFKPL(data->fetchKeysParallelismLock);
state double executeStart = now();
++data->counters.fetchWaitingCount;
@ -2727,7 +2875,6 @@ ACTOR Future<Void> fetchKeys(StorageServer* data, AddingShard* shard) {
wait(data->durableVersionLock.take());
shard->phase = AddingShard::Fetching;
state Version fetchVersion = data->version.get();
data->durableVersionLock.release();
@ -2747,107 +2894,78 @@ ACTOR Future<Void> fetchKeys(StorageServer* data, AddingShard* shard) {
data->cx->invalidateCache(keys);
loop {
state Transaction tr(data->cx);
state Version fetchVersion = data->version.get();
while (!shard->updates.empty() && shard->updates[0].version <= fetchVersion)
shard->updates.pop_front();
tr.setVersion(fetchVersion);
tr.info.taskID = TaskPriority::FetchKeys;
state PromiseStream<RangeResult> results;
state Future<Void> hold = SERVER_KNOBS->FETCH_USING_STREAMING
? tr.getRangeStream(results, keys, GetRangeLimits(), true)
: tryGetRange(results, &tr, keys);
state Key nfk = keys.begin;
try {
TEST(true); // Fetching keys for transferred shard
loop {
TEST(true); // Fetching keys for transferred shard
while (data->fetchKeysBudgetUsed.get()) {
wait(data->fetchKeysBudgetUsed.onChange());
}
state RangeResult this_block = waitNext(results.getFuture());
state RangeResult this_block =
wait(tryGetRange(data->cx,
fetchVersion,
keys,
GetRangeLimits(GetRangeLimits::ROW_LIMIT_UNLIMITED, fetchBlockBytes),
&isTooOld));
state int expectedBlockSize =
(int)this_block.expectedSize() + (8 - (int)sizeof(KeyValueRef)) * this_block.size();
int expectedSize = (int)this_block.expectedSize() + (8 - (int)sizeof(KeyValueRef)) * this_block.size();
TraceEvent(SevDebug, "FetchKeysBlock", data->thisServerID)
.detail("FKID", interval.pairID)
.detail("BlockRows", this_block.size())
.detail("BlockBytes", expectedBlockSize)
.detail("KeyBegin", keys.begin)
.detail("KeyEnd", keys.end)
.detail("Last", this_block.size() ? this_block.end()[-1].key : std::string())
.detail("Version", fetchVersion)
.detail("More", this_block.more);
DEBUG_KEY_RANGE("fetchRange", fetchVersion, keys);
for (auto k = this_block.begin(); k != this_block.end(); ++k)
DEBUG_MUTATION("fetch", fetchVersion, MutationRef(MutationRef::SetValue, k->key, k->value));
TraceEvent(SevDebug, "FetchKeysBlock", data->thisServerID)
.detail("FKID", interval.pairID)
.detail("BlockRows", this_block.size())
.detail("BlockBytes", expectedSize)
.detail("KeyBegin", keys.begin)
.detail("KeyEnd", keys.end)
.detail("Last", this_block.size() ? this_block.end()[-1].key : std::string())
.detail("Version", fetchVersion)
.detail("More", this_block.more);
DEBUG_KEY_RANGE("fetchRange", fetchVersion, keys);
for (auto k = this_block.begin(); k != this_block.end(); ++k)
DEBUG_MUTATION("fetch", fetchVersion, MutationRef(MutationRef::SetValue, k->key, k->value));
metricReporter.addFetchedBytes(expectedBlockSize, this_block.size());
metricReporter.addFetchedBytes(expectedSize, this_block.size());
// Write this_block to storage
state KeyValueRef* kvItr = this_block.begin();
for (; kvItr != this_block.end(); ++kvItr) {
data->storage.writeKeyValue(*kvItr);
wait(yield());
}
if (fetchBlockBytes > expectedSize) {
holdingFKPL.release(fetchBlockBytes - expectedSize);
}
kvItr = this_block.begin();
for (; kvItr != this_block.end(); ++kvItr) {
data->byteSampleApplySet(*kvItr, invalidVersion);
wait(yield());
}
// Wait for permission to proceed
// wait( data->fetchKeysStorageWriteLock.take() );
// state FlowLock::Releaser holdingFKSWL( data->fetchKeysStorageWriteLock );
ASSERT(this_block.readThrough.present() || this_block.size());
nfk = this_block.readThrough.present() ? this_block.readThrough.get()
: keyAfter(this_block.end()[-1].key);
this_block = RangeResult();
// Write this_block directly to storage, bypassing update() which write to MVCC in memory.
state KeyValueRef* kvItr = this_block.begin();
for (; kvItr != this_block.end(); ++kvItr) {
data->storage.writeKeyValue(*kvItr);
wait(yield());
}
kvItr = this_block.begin();
for (; kvItr != this_block.end(); ++kvItr) {
data->byteSampleApplySet(*kvItr, invalidVersion);
wait(yield());
}
if (this_block.more) {
Key nfk = this_block.readThrough.present() ? this_block.readThrough.get()
: keyAfter(this_block.end()[-1].key);
if (nfk != keys.end) {
std::deque<Standalone<VerUpdateRef>> updatesToSplit = std::move(shard->updates);
// This actor finishes committing the keys [keys.begin,nfk) that we already fetched.
// The remaining unfetched keys [nfk,keys.end) will become a separate AddingShard with its own
// fetchKeys.
shard->server->addShard(ShardInfo::addingSplitLeft(KeyRangeRef(keys.begin, nfk), shard));
shard->server->addShard(ShardInfo::newAdding(data, KeyRangeRef(nfk, keys.end)));
shard = data->shards.rangeContaining(keys.begin).value()->adding.get();
warningLogger = logFetchKeysWarning(shard);
AddingShard* otherShard = data->shards.rangeContaining(nfk).value()->adding.get();
keys = shard->keys;
// Split our prior updates. The ones that apply to our new, restricted key range will go back
// into shard->updates, and the ones delivered to the new shard will be discarded because it is
// in WaitPrevious phase (hasn't chosen a fetchVersion yet). What we are doing here is expensive
// and could get more expensive if we started having many more blocks per shard. May need
// optimization in the future.
std::deque<Standalone<VerUpdateRef>>::iterator u = updatesToSplit.begin();
for (; u != updatesToSplit.end(); ++u) {
splitMutations(data, data->shards, *u);
}
TEST(true); // fetchkeys has more
TEST(shard->updates.size()); // Shard has updates
ASSERT(otherShard->updates.empty());
data->fetchKeysBytesBudget -= expectedBlockSize;
if (data->fetchKeysBytesBudget <= 0) {
data->fetchKeysBudgetUsed.set(true);
}
}
this_block = RangeResult();
if (BUGGIFY)
wait(delay(1));
break;
} catch (Error& e) {
TraceEvent("FKBlockFail", data->thisServerID)
.error(e, true)
.suppressFor(1.0)
.detail("FKID", interval.pairID);
if (e.code() == error_code_transaction_too_old) {
TEST(true); // A storage server has forgotten the history data we are fetching
Version lastFV = fetchVersion;
fetchVersion = data->version.get();
isTooOld = false;
// Throw away deferred updates from before fetchVersion, since we don't need them to use blocks
// fetched at that version
while (!shard->updates.empty() && shard->updates[0].version <= fetchVersion)
shard->updates.pop_front();
if (e.code() != error_code_end_of_stream && e.code() != error_code_connection_failed &&
e.code() != error_code_transaction_too_old && e.code() != error_code_future_version &&
e.code() != error_code_process_behind) {
throw;
}
if (nfk == keys.begin) {
TraceEvent("FKBlockFail", data->thisServerID)
.error(e, true)
.suppressFor(1.0)
.detail("FKID", interval.pairID);
// FIXME: remove when we no longer support upgrades from 5.X
if (debug_getRangeRetries >= 100) {
@ -2861,17 +2979,40 @@ ACTOR Future<Void> fetchKeys(StorageServer* data, AddingShard* shard) {
TraceEvent(SevWarn, "FetchPast", data->thisServerID)
.detail("TotalAttempts", debug_getRangeRetries)
.detail("FKID", interval.pairID)
.detail("V", lastFV)
.detail("N", fetchVersion)
.detail("E", data->version.get());
}
} else if (e.code() == error_code_future_version || e.code() == error_code_process_behind) {
TEST(true); // fetchKeys got future_version or process_behind, so there must be a huge storage lag
// somewhere. Keep trying.
} else {
throw;
wait(delayJittered(FLOW_KNOBS->PREVENT_FAST_SPIN_DELAY));
continue;
}
wait(delayJittered(FLOW_KNOBS->PREVENT_FAST_SPIN_DELAY));
if (nfk < keys.end) {
std::deque<Standalone<VerUpdateRef>> updatesToSplit = std::move(shard->updates);
// This actor finishes committing the keys [keys.begin,nfk) that we already fetched.
// The remaining unfetched keys [nfk,keys.end) will become a separate AddingShard with its own
// fetchKeys.
shard->server->addShard(ShardInfo::addingSplitLeft(KeyRangeRef(keys.begin, nfk), shard));
shard->server->addShard(ShardInfo::newAdding(data, KeyRangeRef(nfk, keys.end)));
shard = data->shards.rangeContaining(keys.begin).value()->adding.get();
warningLogger = logFetchKeysWarning(shard);
AddingShard* otherShard = data->shards.rangeContaining(nfk).value()->adding.get();
keys = shard->keys;
// Split our prior updates. The ones that apply to our new, restricted key range will go back into
// shard->updates, and the ones delivered to the new shard will be discarded because it is in
// WaitPrevious phase (hasn't chosen a fetchVersion yet). What we are doing here is expensive and
// could get more expensive if we started having many more blocks per shard. May need optimization
// in the future.
std::deque<Standalone<VerUpdateRef>>::iterator u = updatesToSplit.begin();
for (; u != updatesToSplit.end(); ++u) {
splitMutations(data, data->shards, *u);
}
TEST(true); // fetchkeys has more
TEST(shard->updates.size()); // Shard has updates
ASSERT(otherShard->updates.empty());
}
break;
}
}
@ -2891,8 +3032,8 @@ ACTOR Future<Void> fetchKeys(StorageServer* data, AddingShard* shard) {
// being recovered. Instead we wait for the updateStorage loop to commit something (and consequently also what
// we have written)
wait(data->durableVersion.whenAtLeast(data->storageVersion() + 1));
holdingFKPL.release();
wait(data->durableVersion.whenAtLeast(data->storageVersion() + 1));
TraceEvent(SevDebug, "FKAfterFinalCommit", data->thisServerID)
.detail("FKID", interval.pairID)
@ -3922,8 +4063,11 @@ ACTOR Future<Void> updateStorage(StorageServer* data) {
data->ssDurableVersionUpdateLatencyHistogram->sampleSeconds(now() - beforeSSDurableVersionUpdate);
//TraceEvent("StorageServerDurable", data->thisServerID).detail("Version", newOldestVersion);
wait(durableDelay);
data->fetchKeysBytesBudget = SERVER_KNOBS->STORAGE_FETCH_BYTES;
data->fetchKeysBudgetUsed.set(false);
if (!data->fetchKeysBudgetUsed.get()) {
wait(durableDelay || data->fetchKeysBudgetUsed.onChange());
}
}
}
@ -4651,6 +4795,17 @@ ACTOR Future<Void> serveGetKeyValuesRequests(StorageServer* self, FutureStream<G
}
}
ACTOR Future<Void> serveGetKeyValuesStreamRequests(StorageServer* self,
FutureStream<GetKeyValuesStreamRequest> getKeyValuesStream) {
loop {
GetKeyValuesStreamRequest req = waitNext(getKeyValuesStream);
// Warning: This code is executed at extremely high priority (TaskPriority::LoadBalancedEndpoint), so downgrade
// before doing real work
// FIXME: add readGuard again
self->actors.add(getKeyValuesStreamQ(self, req));
}
}
ACTOR Future<Void> serveGetKeyRequests(StorageServer* self, FutureStream<GetKeyRequest> getKey) {
loop {
GetKeyRequest req = waitNext(getKey);
@ -4810,6 +4965,7 @@ ACTOR Future<Void> storageServerCore(StorageServer* self, StorageServerInterface
self->actors.add(checkBehind(self));
self->actors.add(serveGetValueRequests(self, ssi.getValue.getFuture()));
self->actors.add(serveGetKeyValuesRequests(self, ssi.getKeyValues.getFuture()));
self->actors.add(serveGetKeyValuesStreamRequests(self, ssi.getKeyValuesStream.getFuture()));
self->actors.add(serveGetKeyRequests(self, ssi.getKey.getFuture()));
self->actors.add(serveWatchValueRequests(self, ssi.watchValue.getFuture()));
self->actors.add(traceRole(Role::STORAGE_SERVER, ssi.id()));

View File

@ -43,6 +43,7 @@
#include "fdbserver/ServerDBInfo.h"
#include "fdbserver/FDBExecHelper.actor.h"
#include "fdbserver/CoordinationInterface.h"
#include "fdbserver/LocalConfiguration.h"
#include "fdbclient/MonitorLeader.h"
#include "fdbclient/ClientWorkerInterface.h"
#include "flow/Profiler.h"
@ -473,8 +474,9 @@ std::vector<DiskStore> getDiskStores(std::string folder,
store.tLogOptions.version = TLogVersion::V2;
store.tLogOptions.spillType = TLogSpillType::VALUE;
prefix = fileLogDataPrefix;
} else
} else {
continue;
}
store.storeID = UID::fromString(files[idx].substr(prefix.size(), 32));
store.filename = filenameFromSample(type, folder, files[idx]);
@ -941,6 +943,7 @@ ACTOR Future<Void> storageServerRollbackRebooter(std::set<std::pair<UID, KeyValu
DUMPTOKEN(recruited.getQueuingMetrics);
DUMPTOKEN(recruited.getKeyValueStoreType);
DUMPTOKEN(recruited.watchValue);
DUMPTOKEN(recruited.getKeyValuesStream);
prevStorageServer =
storageServer(store, recruited, db, folder, Promise<Void>(), Reference<ClusterConnectionFile>(nullptr));
@ -1303,6 +1306,7 @@ ACTOR Future<Void> workerServer(Reference<ClusterConnectionFile> connFile,
DUMPTOKEN(recruited.getQueuingMetrics);
DUMPTOKEN(recruited.getKeyValueStoreType);
DUMPTOKEN(recruited.watchValue);
DUMPTOKEN(recruited.getKeyValuesStream);
Promise<Void> recovery;
Future<Void> f = storageServer(kv, recruited, dbInfo, folder, recovery, connFile);
@ -1710,6 +1714,7 @@ ACTOR Future<Void> workerServer(Reference<ClusterConnectionFile> connFile,
DUMPTOKEN(recruited.getQueuingMetrics);
DUMPTOKEN(recruited.getKeyValueStoreType);
DUMPTOKEN(recruited.watchValue);
DUMPTOKEN(recruited.getKeyValuesStream);
// printf("Recruited as storageServer\n");
std::string filename =
@ -2191,7 +2196,8 @@ ACTOR Future<Void> monitorLeaderRemotelyWithDelayedCandidacy(
Reference<AsyncVar<ClusterControllerPriorityInfo>> asyncPriorityInfo,
Future<Void> recoveredDiskFiles,
LocalityData locality,
Reference<AsyncVar<ServerDBInfo>> dbInfo) {
Reference<AsyncVar<ServerDBInfo>> dbInfo,
UseConfigDB useConfigDB) {
state Future<Void> monitor = monitorLeaderRemotely(connFile, currentCC);
state Future<Void> timeout;
@ -2217,7 +2223,8 @@ ACTOR Future<Void> monitorLeaderRemotelyWithDelayedCandidacy(
: Never())) {}
when(wait(timeout.isValid() ? timeout : Never())) {
monitor.cancel();
wait(clusterController(connFile, currentCC, asyncPriorityInfo, recoveredDiskFiles, locality));
wait(clusterController(
connFile, currentCC, asyncPriorityInfo, recoveredDiskFiles, locality, useConfigDB));
return Void();
}
}
@ -2243,9 +2250,17 @@ ACTOR Future<Void> fdbd(Reference<ClusterConnectionFile> connFile,
std::string metricsConnFile,
std::string metricsPrefix,
int64_t memoryProfileThreshold,
std::string whitelistBinPaths) {
std::string whitelistBinPaths,
std::string configPath,
std::map<std::string, std::string> manualKnobOverrides,
UseConfigDB useConfigDB) {
state vector<Future<Void>> actors;
state Promise<Void> recoveredDiskFiles;
state LocalConfiguration localConfig(dataFolder, configPath, manualKnobOverrides);
if (useConfigDB != UseConfigDB::DISABLED) {
wait(localConfig.initialize());
}
actors.push_back(serveProtocolInfo());
@ -2267,7 +2282,7 @@ ACTOR Future<Void> fdbd(Reference<ClusterConnectionFile> connFile,
if (coordFolder.size()) {
// SOMEDAY: remove the fileNotFound wrapper and make DiskQueue construction safe from errors setting up
// their files
actors.push_back(fileNotFoundToNever(coordinationServer(coordFolder, coordinators.ccf)));
actors.push_back(fileNotFoundToNever(coordinationServer(coordFolder, coordinators.ccf, useConfigDB)));
}
state UID processIDUid = wait(createAndLockProcessIdFile(dataFolder));
@ -2281,19 +2296,26 @@ ACTOR Future<Void> fdbd(Reference<ClusterConnectionFile> connFile,
makeReference<AsyncVar<ClusterControllerPriorityInfo>>(getCCPriorityInfo(fitnessFilePath, processClass));
auto dbInfo = makeReference<AsyncVar<ServerDBInfo>>();
if (useConfigDB != UseConfigDB::DISABLED) {
actors.push_back(
reportErrors(localConfig.consume(IDependentAsyncVar<ConfigBroadcastFollowerInterface>::create(
dbInfo, [](auto const& info) { return info.configBroadcaster; })),
"LocalConfiguration"));
}
actors.push_back(reportErrors(monitorAndWriteCCPriorityInfo(fitnessFilePath, asyncPriorityInfo),
"MonitorAndWriteCCPriorityInfo"));
if (processClass.machineClassFitness(ProcessClass::ClusterController) == ProcessClass::NeverAssign) {
actors.push_back(reportErrors(monitorLeader(connFile, cc), "ClusterController"));
} else if (processClass.machineClassFitness(ProcessClass::ClusterController) == ProcessClass::WorstFit &&
SERVER_KNOBS->MAX_DELAY_CC_WORST_FIT_CANDIDACY_SECONDS > 0) {
actors.push_back(
reportErrors(monitorLeaderRemotelyWithDelayedCandidacy(
connFile, cc, asyncPriorityInfo, recoveredDiskFiles.getFuture(), localities, dbInfo),
"ClusterController"));
actors.push_back(reportErrors(
monitorLeaderRemotelyWithDelayedCandidacy(
connFile, cc, asyncPriorityInfo, recoveredDiskFiles.getFuture(), localities, dbInfo, useConfigDB),
"ClusterController"));
} else {
actors.push_back(reportErrors(
clusterController(connFile, cc, asyncPriorityInfo, recoveredDiskFiles.getFuture(), localities),
clusterController(
connFile, cc, asyncPriorityInfo, recoveredDiskFiles.getFuture(), localities, useConfigDB),
"ClusterController"));
}
actors.push_back(reportErrors(extractClusterInterface(cc, ci), "ExtractClusterInterface"));

Some files were not shown because too many files have changed in this diff Show More