Specifying the `--no-config-db` option when changing coordinators
through fdbcli will prevent the command from hanging when the
configuration database is not active. Failing to specify this option
when the configuration database is not active will not affect the
correctness of the command, but it will hang instead of returning.
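A minimal sketch of scripting this coordinator change from Python. It assumes `fdbcli` is on `PATH` and uses its `-C` (cluster file) and `--exec` options. The helper names are hypothetical, and the subprocess timeout is an extra safeguard, not part of this change.

```python
import subprocess
from typing import List, Optional


def coordinator_change_argv(coordinators: str,
                            cluster_file: Optional[str] = None,
                            no_config_db: bool = True) -> List[str]:
    """Build the fdbcli argument list for a coordinator change (hypothetical helper)."""
    argv = ["fdbcli"]
    if cluster_file is not None:
        argv += ["-C", cluster_file]
    if no_config_db:
        # Do not wait on the configuration database, which may not be active.
        argv.append("--no-config-db")
    argv += ["--exec", "coordinators " + coordinators]
    return argv


def change_coordinators(coordinators: str,
                        cluster_file: Optional[str] = None,
                        timeout_s: float = 60.0) -> subprocess.CompletedProcess:
    """Run the change; raises subprocess.TimeoutExpired if fdbcli hangs anyway."""
    argv = coordinator_change_argv(coordinators, cluster_file)
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s)
```

For example, `change_coordinators("auto")` runs `fdbcli --no-config-db --exec "coordinators auto"`.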
* Add the verify option for \xff\xff/worker_interfaces
* Remove unused code
* update documentation
* update documentation
* address comments from review
* update some of the comments to be clearer
* Add getRange test coverage for SpecialKeyRangeAsyncImpl
* Fix the bug in SpecialKeyRangeAsyncImpl found by the test
* Refactor ConflictingKeysImpl::getRange to use containedRanges to simplify the code
* Fix file format
* Initialize SpecialKeyRangeAsyncImpl cache with correct end key
* Add release notes
* Revert "Refactor ConflictingKeysImpl::getRange to use containedRanges to simplify the code"
This reverts commit fdd298f469.
* Add extra validation to special key space reads in simulation
* Fix bugs turned up by validating subrange reads
* Change to validateSpecialSubrangeRead
It is generally not safe to expect a read from the special key space
to return the same results if performed again, since the transaction
may be modified concurrently.
* Add comment
* Add comment
* proof of concept
* use code-probe instead of test
* code probe working on gcc
* code probe implemented
* renamed TestProbe to CodeProbe
* fixed refactoring typo
* support filtered output
* print probes at end of simulation
* fix missed probes print
* fix deduplication
* Fix refactoring issues
* revert bad refactor
* make sure file paths are relative
* fix more wrong refactor changes
* Add an internal C API to support memory connection records
* Track shared state in the client using a unique and immutable cluster ID from the cluster
* Add missing code to store the clusterId in the database state object
* Update some arguments to pass by const&
* extract tenant management into its own file and namespace
* rename the tenant management workload source file
* extract tenant special keys functions to a separate file
* extract some helper functions to GenericTransactionHelper.h
* convert StringRef -> TenantNameRef
* move some TenantMapEntry implementation into the cpp file
* add some helper functions to decode/encode a tenant mode
If the version is attempting to track wall clock time (which is the case
if the version epoch is set), it doesn't make sense to allow arbitrarily
setting the version through the `advanceversion` fdbcli command.
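The guard described above can be sketched as follows. This is a hypothetical illustration: the function and exception names are invented, and the version epoch is modeled as an optional integer that is `None` when unset.

```python
from typing import Optional


class AdvanceVersionRejected(Exception):
    """Raised when advanceversion is not allowed (hypothetical error type)."""


def check_advance_version(version_epoch: Optional[int], target_version: int) -> int:
    """Return the requested version, or reject it while a version epoch is set.

    With a version epoch set, versions track wall clock time, so manually
    setting an arbitrary version is refused.
    """
    if version_epoch is not None:
        raise AdvanceVersionRejected(
            "version epoch is set; versions track wall clock time")
    return target_version
```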
* coordinatorsKey should not store IP addresses.
Currently, when we commit a coordinator change, we always convert hostnames to IP addresses and store the converted results in coordinatorsKey (\xff/coordinators). As a result, ForwardRequest also sends IP addresses, and receivers update their cluster files with IPs, so the dynamic IP feature is lost.
* Remove the legacy coordinators() function.
* Update async_resolve().
ip::basic_resolver::async_resolve(const query & q, ResolveHandler && handler) is deprecated.
* Clean code format.
* Fix typo.
* Remove SpecifiedQuorumChange and NoQuorumChange.
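For the coordinatorsKey change above, a minimal sketch of the idea: store coordinator entries exactly as given (hostnames included) and resolve them only when connecting. The comma-joined storage format and both helpers are simplifications for illustration, not the real `\xff/coordinators` encoding.

```python
import socket
from typing import List, Tuple


def encode_coordinators(coordinators: List[str]) -> str:
    """Keep entries such as "db1.example.com:4500" as-is (hypothetical format)."""
    return ",".join(coordinators)


def resolve_coordinator(entry: str) -> List[Tuple[str, int]]:
    """Resolve one "host:port" entry at connect time.

    Simplified: ignores ":tls" suffixes and bracketed IPv6 addresses.
    """
    host, _, port = entry.rpartition(":")
    infos = socket.getaddrinfo(host, int(port), type=socket.SOCK_STREAM)
    # Each result's sockaddr starts with (address, port); dedupe and sort them.
    return sorted({(info[4][0], info[4][1]) for info in infos})
```

Because resolution happens on each connection attempt, a hostname whose IP changes keeps working, which is the dynamic IP behavior the change preserves.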
The special keys `\xff\xff/management/profiling/client_txn_sample_rate`
and `\xff\xff/management/profiling/client_txn_size_limit` are deprecated
in FDB 7.2. However, GlobalConfig was introduced in 7.0, and reading and
writing these keys through the special key space was broken in 7.0+.
This change modifies the profiling special keys to use GlobalConfig
behind the scenes, fixing the broken special keys.
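A hypothetical sketch of the translation the profiling special keys perform. Only the `client_txn_sample_rate` key pair is included because it is the one used in the test script below; the size limit key is left out rather than guessed. The real GlobalConfig value is tuple-encoded (for example `fdb.tuple.pack((5.0,))`), which this sketch skips by returning a plain float.

```python
from typing import Optional, Tuple

PROFILING_TO_GLOBAL_CONFIG = {
    b'\xff\xff/management/profiling/client_txn_sample_rate':
        b'\xff\xff/global_config/config/fdb_client_info/client_txn_sample_rate',
}


def translate_profiling_write(key: bytes, value: bytes) -> Tuple[str, bytes, Optional[float]]:
    """Map a profiling special-key write onto a GlobalConfig operation (hypothetical)."""
    gc_key = PROFILING_TO_GLOBAL_CONFIG[key]
    if value == b'default':
        # Writing 'default' to the profiling key clears the GlobalConfig entry.
        return ('clear', gc_key, None)
    # Any other value is parsed as a number and set in GlobalConfig.
    return ('set', gc_key, float(value.decode('ascii')))
```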
The following Python script was used to make sure both GlobalConfig and
the profiling special key can be used to read/write/clear profiling
data:
```
import fdb
import time
fdb.api_version(710)
@fdb.transactional
def set_sample_rate(tr):
    tr.options.set_special_key_space_enable_writes()
    # Alternative way to write the key
    #tr[b'\xff\xff/global_config/config/fdb_client_info/client_txn_sample_rate'] = fdb.tuple.pack((5.0,))
    # Values are bytes so the script also works under Python 3
    tr[b'\xff\xff/management/profiling/client_txn_sample_rate'] = b'5.0'

@fdb.transactional
def clear_sample_rate(tr):
    tr.options.set_special_key_space_enable_writes()
    # Alternative way to clear the key
    #tr.clear(b'\xff\xff/global_config/config/fdb_client_info/client_txn_sample_rate')
    tr[b'\xff\xff/management/profiling/client_txn_sample_rate'] = b'default'

@fdb.transactional
def get_sample_rate(tr):
    print(tr.get(b'\xff\xff/global_config/config/fdb_client_info/client_txn_sample_rate'))
    # Alternative way to read the key
    #print(tr.get(b'\xff\xff/management/profiling/client_txn_sample_rate'))

fdb.options.set_trace_enable()
fdb.options.set_trace_format('json')
db = fdb.open()
get_sample_rate(db) # None (or 'default')
set_sample_rate(db)
time.sleep(1) # Allow time for global config changes to propagate
get_sample_rate(db) # 5.0
clear_sample_rate(db)
time.sleep(1)
get_sample_rate(db) # None (or 'default')
```
It can be run with `PYTHONPATH=./bindings/python/ python profiling.py`,
and reads the `fdb.cluster` file in the current directory.
```
$ PYTHONPATH=./bindings/python/ python profiling.py
None
5.000000
None
```
Currently, GlobalConfig is a singleton, meaning there is only one
GlobalConfig object per process. This is a bug from the client's perspective,
since a client can keep connections to several databases. This patch tracks
GlobalConfig for each database using an unordered_map in flowGlobals.
We discovered this bug while testing the multi-version client, where the client
got stuck. This was lucky, since normally it would just have written the config
to the wrong database.
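A Python analogue of the per-database map described above, as a sketch only: the class names are hypothetical, and the database handle is modeled as any hashable object used as the map key.

```python
from typing import Any, Dict


class GlobalConfigSketch:
    """Stand-in for one database's GlobalConfig state (hypothetical)."""

    def __init__(self) -> None:
        self.values: Dict[bytes, Any] = {}


class GlobalConfigRegistry:
    """One GlobalConfig per database instead of one per process."""

    def __init__(self) -> None:
        # Keyed by the database handle itself (identity-hashed by default).
        self._configs: Dict[object, GlobalConfigSketch] = {}

    def for_database(self, db: object) -> GlobalConfigSketch:
        config = self._configs.get(db)
        if config is None:
            config = GlobalConfigSketch()
            self._configs[db] = config
        return config
```

With a process-wide singleton, a write intended for one database would land in the state shared by all of them; keying by database keeps each connection's config separate.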
* Initialize cluster version at wall-clock time
Previously, new clusters would begin at version 0. After this change,
clusters will initialize at a version matching wall-clock time. Instead
of using the Unix epoch (or Windows epoch), FDB clusters will use a new
epoch, defaulting to January 1, 2010, 01:00:00+00:00. In the future,
this base epoch will be modifiable through fdbcli, allowing
administrators to advance the cluster version.
Basing the version off of time allows different FDB clusters to share
data without running into version issues.
* Send version epoch to master
* Cleanup
* Update fdbserver/storageserver.actor.cpp
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
* Jump directly to expected version if possible
* Fix initial version issue on storage servers
* Add random recovery offset to start version in simulation
* Type fixes
* Disable reference time by default
Enable on a cluster using the fdbcli command `versionepoch add 0`.
* Use correct recoveryTransactionVersion when recovering
* Allow version epoch to be adjusted forwards (to decrease the version)
* Set version epoch in simulation
* Add quiet database check to ensure small version offset
* Fix initial version issue on storage servers
* Disable reference time by default
Enable on a cluster using the fdbcli command `versionepoch add 0`.
* Add fdbcli command to read/write version epoch
* Cause recovery when version epoch is set
* Handle optional version epoch key
* Add ability to clear the version epoch
This causes version advancement to revert to the old methodology, where
versions attempt to advance by about a million versions per second
instead of trying to match the clock.
* Update transaction access
* Modify version epoch to use microseconds instead of seconds
* Modify fdbcli version target API
Move commands from `versionepoch` to `targetversion` top level command.
* Add fdbcli tests for
* Temporarily disable targetversion cli tests
* Fix version epoch fetch issue
* Fix Arena issue
* Reduce max version jump in simulation to 1,000,000
* Rework fdbcli API
It now requires two commands to fully switch a cluster to using the
version epoch. First, enable the version epoch with `versionepoch
enable` or `versionepoch set <versionepoch>`. At this point, versions
will be given out at a faster or slower rate in an attempt to reach the
expected version. Then, run `versionepoch commit` to perform a one-time
jump to the expected version. This is essentially irreversible.
* Temporarily disable old targetversion tests
* Cleanup
* Move version epoch buggify to sequencer
This will cause some issues with the QuietDatabase check for the version
offset: namely, it won't do anything, since the version epoch is not
being written to the txnStateStore in simulation. This will get fixed in
the future.
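The version epoch behavior described in the list above can be sketched as follows, assuming one version per microsecond (matching "about a million versions per second"). `next_version` models the period after `versionepoch enable`, when versions are given out faster or slower to approach the expected version, and `commit_jump` models `versionepoch commit`. The function names and the 10% skew bound are hypothetical; the actual sequencer logic is not described here.

```python
from datetime import datetime, timedelta, timezone

# Default epoch from the description above.
DEFAULT_VERSION_EPOCH = datetime(2010, 1, 1, 1, 0, 0, tzinfo=timezone.utc)


def expected_version(now: datetime, epoch: datetime = DEFAULT_VERSION_EPOCH) -> int:
    """Version matching wall-clock time: microseconds elapsed since the epoch."""
    return (now - epoch) // timedelta(microseconds=1)


def next_version(prev: int, expected: int, elapsed_us: int,
                 max_skew: float = 0.1) -> int:
    """Advance by elapsed time, nudged toward `expected` by at most
    max_skew of the normal step (hypothetical bound)."""
    normal = elapsed_us
    bound = int(normal * max_skew)
    gap = expected - (prev + normal)
    adjust = max(-bound, min(bound, gap))
    # Versions handed out must still strictly increase.
    return max(prev + 1, prev + normal + adjust)


def commit_jump(prev: int, expected: int) -> int:
    """One-time jump to the expected version, never moving backwards."""
    return max(prev + 1, expected)
```

Moving the epoch ten seconds later lowers `expected_version` by 10,000,000 for the same wall-clock time, which is why adjusting the epoch forwards decreases the version.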
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>