312 Commits

Author SHA1 Message Date
FoundationDB CI 86d6106dc1
format source code after switch to clang 15 2022-12-08 17:26:45 +00:00
Chaoguang Lin 4dbfa01fbf
Add a new robustness workload for testing special keys (#8957)
* Add a new robustness workload for testing special keys

* Fix a few robustness related issues and remove duplicate tests

* Add comments
2022-12-05 14:05:26 -08:00
Markus Pilman 23edfd0d59 Fix formatting 2022-10-04 18:33:30 -06:00
Markus Pilman 550488b020 Merge remote-tracking branch 'origin/main' into bugfixes/open-for-ide
# Conflicts:
#	bindings/c/CMakeLists.txt
#	fdbclient/include/fdbclient/GetEncryptCipherKeys.actor.h
#	fdbserver/BackupWorker.actor.cpp
#	fdbserver/BlobWorker.actor.cpp
#	fdbserver/CommitProxyServer.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/StorageCache.actor.cpp
#	fdbserver/include/fdbserver/GetEncryptCipherKeys.actor.h
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/PhysicalShardMove.actor.cpp
#	flow/CMakeLists.txt
2022-10-04 18:27:48 -06:00
Jingyu Zhou 772a9ab9fc
Merge pull request #8185 from vishesh/issue-8127-exclude
Make exclusion less pessimistic when warning about low space usage
2022-09-27 14:14:25 -07:00
Chaoguang Lin 9628561235
Add DataDistributionMetrics workload into correctness packages, (#8237)
which makes the code probe hit in nightly tests.
2022-09-20 15:33:15 -07:00
Vishesh Yadav ddd3217471 Make exclusion less pessimistic when warning about low space usage
Tries to account for shared disks by looking at disk_id if available.
2022-09-20 11:02:19 -07:00
Chaoguang Lin 125137b987
Change the special key space correctness workload to hit code probe (#8214) 2022-09-19 15:01:21 -07:00
A.J. Beamon 4fd64630e8 Convert literal string ref instances to use _sr suffix 2022-09-19 11:35:58 -07:00
Lukas Joswiak 1a33515934 Add `--no-config-db` option to fdbcli coordinators command
Specifying the `--no-config-db` option when changing coordinators
through fdbcli will prevent the command from hanging when the
configuration database is not active. Failing to specify this option
when the configuration database is not active will not affect the
correctness of the command, but it will hang instead of returning.
2022-09-13 16:53:54 -07:00
Jon Fu dbb6357371 add conflict range tests and change tenant prefix code to work with RYW 2022-09-06 16:55:57 -07:00
Jon Fu 479d774e79 set raw access for certain management API functions and update special key test 2022-08-29 11:45:56 -07:00
Jon Fu ee707b8f87 initial commit for explicit tenant support in special key space 2022-08-25 12:44:10 -07:00
Chaoguang Lin 3fed0456ca
Add a verify option for \xff\xff/worker_interfaces special keys (#7873)
* Add the verify option for \xff\xff/worker_interfaces

* Remove unused code

* update documentations

* update documentations

* solve comments from review

* update some of the comments to be more clear
2022-08-15 14:05:07 -04:00
Lukas Joswiak a4c4eadffc Write tracing and ALP special key errors as JSON 2022-08-08 10:31:10 -07:00
Chaoguang Lin 48e46cbc81
Add test coverage for SpecialKeyRangeAsyncImpl::getRange (#7671)
* Add getRange test coverage for SpecialKeyRangeAsyncImpl

* Fix the bug in SpecialKeyRangeAsyncImpl found by the test

* Refactor ConflictingKeysImpl::getRange to use containedRanges to simplify the code

* Fix file format

* Initialize SpecialKeyRangeAsyncImpl cache with correct end key

* Add release notes

* Revert "Refactor ConflictingKeysImpl::getRange to use containedRanges to simplify the code"

This reverts commit fdd298f469.
2022-08-02 12:04:40 -07:00
Andrew Noyes 89141d4b3a
Validate subrange reads in simulation (#7597)
* Add extra validation to special key space reads in simulation

* Fix bugs turned up by validating subrange reads

* Change to validateSpecialSubrangeRead

It is in general not safe to expect that a read from the special key
space returns the same results if performed again, since the
transaction may be being modified concurrently.

* Add comment

* Add comment
2022-07-21 14:42:08 -07:00
A.J. Beamon 410f27412b
Merge pull request #7620 from sfc-gh-ajbeamon/make-tuple
Add a Tuple::makeTuple function to easily construct a tuple
2022-07-20 17:09:10 -07:00
A.J. Beamon 190ad8c7e9 Convert existing tuple usages to use Tuple::makeTuple() 2022-07-19 13:45:59 -07:00
Markus Pilman 1de37afd52
Make TEST macros C++ only (#7558)
* proof of concept

* use code-probe instead of test

* code probe working on gcc

* code probe implemented

* renamed TestProbe to CodeProbe

* fixed refactoring typo

* support filtered output

* print probes at end of simulation

* fix missed probes print

* fix deduplication

* Fix refactoring issues

* revert bad refactor

* make sure file paths are relative

* fix more wrong refactor changes
2022-07-19 13:15:51 -07:00
A.J. Beamon 1b81e72604 Add a Tuple::makeTuple function to easily construct a tuple. Update Tuple to allow all types to be passed via .append() so they can be used with makeTuple. 2022-07-19 11:50:58 -07:00
A.J. Beamon c4b0f6eaae
Add an internal C API to support connection to a cluster using a connection string (#7438)
* Add an internal C API to support memory connection records

* Track shared state in the client using a unique and immutable cluster ID from the cluster

* Add missing code to store the clusterId in the database state object

* Update some arguments to pass by const&
2022-07-07 10:12:49 +02:00
A.J. Beamon 2f67328a0c Update the tenant special keys submodule to support multiple sub-ranges. This will enable future work that allows configuring tenants at the same time as creating them. 2022-06-30 15:03:37 -07:00
A.J. Beamon 4bafe77889 Some refactoring of tenant code:
* extract tenant management into its own file and namespace
* rename the tenant management workload source file
* extract tenant special keys functions to a separate file
* extract some helper functions to GenericTransactionHelper.h
* convert StringRef -> TenantNameRef
* move some TenantMapEntry implementation into the cpp file
* add some helper functions to decode/encode a tenant mode
2022-06-27 12:32:49 -07:00
A.J. Beamon 90625ba20d Update the create tenant transaction to take the ID as a parameter. Generate unique IDs for multiple creations in the same transaction. Don't set lock aware options inside the tenant transaction code. 2022-06-06 09:45:14 -07:00
Lukas Joswiak 43ece953e1 Remove actor for client profiling get range
Since the client profiling special key range now uses global config,
there is no longer any `wait` required.
2022-05-27 10:25:29 -07:00
Jingyu Zhou ae5818afa8
Merge pull request #7240 from jzhou77/fix-7109
CC sends recovery txn version during TLog recruitment
2022-05-27 09:27:19 -07:00
Jingyu Zhou 0aea52d640 Disable verbose NormalizeKeySelector event
Events of this type can exceed 0.5M in one of the tests and cause
SpecialKeySpaceCorrectness.toml to fail.
2022-05-25 13:23:43 -07:00
Xiaoxi Wang fd35fde481 Merge branch 'main' of https://github.com/apple/foundationdb into readaware 2022-05-23 15:09:03 -07:00
Lukas Joswiak d4471e7fe7 Disable `advanceversion` if the version epoch is set
If the version is attempting to track wall clock time (which is the case
if the version epoch is set), it doesn't make sense to allow arbitrarily
setting the version through the `advanceversion` fdbcli command.
2022-05-23 11:45:18 -07:00
Renxuan Wang df4e0deb4d
coordinatorsKey should not always store IP addresses. (#7204)
* coordinatorsKey should not store IP addresses.

Currently, when we commit a coordinator change, we always convert hostnames to IP addresses and store the converted results in coordinatorsKey (\xff/coordinators). This results in ForwardRequest also sending IP addresses, so receivers update their cluster files with IPs and we lose the dynamic IP feature.

* Remove the legacy coordinators() function.

* Update async_resolve().

ip::basic_resolver::async_resolve(const query & q, ResolveHandler && handler) is deprecated.

* Clean code format.

* Fix typo.

* Remove SpecifiedQuorumChange and NoQuorumChange.
2022-05-23 11:42:56 -07:00
Xiaoxi Wang 382f0fc4a2 merge upstream/main 2022-05-17 10:20:51 -07:00
Lukas Joswiak 7972ef48d6 Refactor profiling special keys to use GlobalConfig
The special keys `\xff\xff/management/profiling/client_txn_sample_rate`
and `\xff\xff/management/profiling/client_txn_size_limit` are deprecated
in FDB 7.2. However, GlobalConfig was introduced in 7.0, and reading and
writing these keys through the special key space was broken in 7.0+.
This change modifies the profiling special keys to use GlobalConfig
behind the scenes, fixing the broken special keys.

The following Python script was used to make sure both GlobalConfig and
the profiling special key can be used to read/write/clear profiling
data:

```
import fdb
import time

fdb.api_version(710)

@fdb.transactional
def set_sample_rate(tr):
    tr.options.set_special_key_space_enable_writes()
    # Alternative way to write the key
    #tr[b'\xff\xff/global_config/config/fdb_client_info/client_txn_sample_rate'] = fdb.tuple.pack((5.0,))
    tr[b'\xff\xff/management/profiling/client_txn_sample_rate'] = b'5.0'

@fdb.transactional
def clear_sample_rate(tr):
    tr.options.set_special_key_space_enable_writes()
    # Alternative way to clear the key
    #tr.clear(b'\xff\xff/global_config/config/fdb_client_info/client_txn_sample_rate')
    tr[b'\xff\xff/management/profiling/client_txn_sample_rate'] = b'default'

@fdb.transactional
def get_sample_rate(tr):
    print(tr.get(b'\xff\xff/global_config/config/fdb_client_info/client_txn_sample_rate'))
    # Alternative way to read the key
    #print(tr.get(b'\xff\xff/management/profiling/client_txn_sample_rate'))

fdb.options.set_trace_enable()
fdb.options.set_trace_format('json')
db = fdb.open()

get_sample_rate(db) # None (or 'default')
set_sample_rate(db)
time.sleep(1)       # Allow time for global config changes to propagate
get_sample_rate(db) # 5.0
clear_sample_rate(db)
time.sleep(1)
get_sample_rate(db) # None (or 'default')
```

It can be run with `PYTHONPATH=./bindings/python/ python profiling.py`,
and reads the `fdb.cluster` file in the current directory.

```
$ PYTHONPATH=./bindings/python/ python sps.py
None
5.000000
None
```
2022-05-10 10:51:08 -07:00
Vishesh Yadav f88cbf9309 Address review comments 2022-05-09 14:54:51 -07:00
Vishesh Yadav 9173e2e19b Move GlobalConfig to DatabaseContext 2022-05-09 14:54:51 -07:00
Vishesh Yadav 7578d5ebc7 Create GlobalConfig object for each database instance
Currently, GlobalConfig is a singleton, which means that each process has only
one GlobalConfig object. This is a bug from the client's perspective, as a client
can keep connections to several databases. This patch tracks GlobalConfig for
each database using an unordered_map in flowGlobals.

We discovered this bug while testing multi-version client, where the client got
stuck. This was lucky, as normally it'd just write down config to the wrong
database.
2022-05-09 14:54:51 -07:00
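
The singleton-versus-per-database distinction in the commit above can be illustrated with a small sketch. This is not FoundationDB code: `GlobalConfig`, `GlobalConfigRegistry`, `for_database`, and the database identifiers below are hypothetical Python stand-ins for the C++ objects and the unordered_map the commit message mentions.

```python
class GlobalConfig:
    """Hypothetical stand-in for one database's configuration object."""

    def __init__(self):
        self._values = {}

    def set(self, key, value):
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)


class GlobalConfigRegistry:
    """Keeps one GlobalConfig per database instead of one per process."""

    def __init__(self):
        # Maps a database identifier (an assumed, illustrative key) to its config.
        self._configs = {}

    def for_database(self, database_id):
        # Create the config lazily the first time a database is seen.
        if database_id not in self._configs:
            self._configs[database_id] = GlobalConfig()
        return self._configs[database_id]


registry = GlobalConfigRegistry()
config_a = registry.for_database("cluster-a")
config_b = registry.for_database("cluster-b")

# A write for one database no longer leaks into another database's config.
config_a.set("client_txn_sample_rate", 5.0)
print(config_a.get("client_txn_sample_rate"))
print(config_b.get("client_txn_sample_rate"))
```

With a single process-wide object, the second lookup would have seen the value written for the first database, which is the wrong-database behavior the commit message describes.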
Xiaoxi Wang 269d85daa8 Merge branch 'main' of https://github.com/apple/foundationdb into readaware 2022-05-03 13:37:56 -07:00
Ray Jenkins dc9e782ccc
OpenTelemetry Tracing Perf Fixes (#6990) 2022-05-02 14:56:51 -05:00
Xiaoxi Wang 69985ba251 Merge branch 'main' of https://github.com/apple/foundationdb into readaware 2022-05-02 10:53:22 -07:00
Renxuan Wang c69a07a858
Check in the new Hostname logic. (#6926)
* Revert #6655.

20220407-031010-renxuan-c101052c21da8346           compressed=True data_size=31004844 duration=4310801 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=1:04:15 sanity=False started=100047 stopped=20220407-041425 submitted=20220407-031010 timeout=5400 username=renxuan

* Revert #6271.

20220407-051532-renxuan-470f0fe6aac1c217           compressed=True data_size=30982370 duration=3491067 ended=100002 fail_fast=10 max_runs=100000 pass=100002 priority=100 remaining=0 runtime=0:59:57 sanity=False started=100141 stopped=20220407-061529 submitted=20220407-051532 timeout=5400 username=renxuan

* Revert #6266.

Remove resolving-related functionality from the connection string. The connection string will be used for storage only and is immutable.

20220407-175119-renxuan-55d30ee1a4b42c2f           compressed=True data_size=30970443 duration=5437659 ended=100000 fail_fast=10 max_runs=100000 pass=100000 priority=100 remaining=0 runtime=0:59:31 sanity=False started=100154 stopped=20220407-185050 submitted=20220407-175119 timeout=5400 username=renxuan

* Add hostname to coordinator interfaces.

* Turn on the new hostname logic.

* Add the corresponding change in config txns.

The most notable change is that, before calling basicLoadBalance(), we need to call tryInitializeRequestStream() to initialize the request streams.

Passed correctness tests.

* Return error when hostnames cannot be resolved in coordinators command.

* Minor fixes.
2022-04-27 21:54:13 -07:00
Xiaoxi Wang e8477e15ce Merge remote-tracking branch 'upstream/main' into readaware 2022-04-25 09:55:52 -07:00
Ray Jenkins 1c5bf135d5
Revert "Migrate to OpenTelemetry tracing. (#6855)" (#6941)
This reverts commit 5df3bac110.
2022-04-25 09:29:56 -05:00
Xiaoxi Wang 0639810b66 merge upstream/main 2022-04-22 11:09:15 -07:00
A.J. Beamon 1352083d4c
Merge pull request #6884 from sfc-gh-clin/deprecate-speical-keys
Remove the client profiling special keys and update related documentations
2022-04-21 21:51:16 -07:00
Xiaoxi Wang a0aac83085 Merge remote-tracking branch 'upstream/main' into readaware 2022-04-21 09:24:08 -07:00
Ray Jenkins 5df3bac110
Migrate to OpenTelemetry tracing. (#6855) 2022-04-20 09:26:37 -05:00
Chaoguang Lin c0264a8522 Remove the client profiling special keys and update related documentations 2022-04-18 17:54:50 -07:00
Xiaoxi Wang ed97a35dc0 Merge branch 'main' into readaware 2022-04-12 16:47:15 -07:00
Lukas Joswiak 73a7c32982
Add fdbcli command to read/write version epoch (#6480)
* Initialize cluster version at wall-clock time

Previously, new clusters would begin at version 0. After this change,
clusters will initialize at a version matching wall-clock time. Instead
of using the Unix epoch (or Windows epoch), FDB clusters will use a new
epoch, defaulting to January 1, 2010, 01:00:00+00:00. In the future,
this base epoch will be modifiable through fdbcli, allowing
administrators to advance the cluster version.

Basing the version off of time allows different FDB clusters to share
data without running into version issues.

* Send version epoch to master

* Cleanup

* Update fdbserver/storageserver.actor.cpp

Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>

* Jump directly to expected version if possible

* Fix initial version issue on storage servers

* Add random recovery offset to start version in simulation

* Type fixes

* Disable reference time by default

Enable on a cluster using the fdbcli command `versionepoch add 0`.

* Use correct recoveryTransactionVersion when recovering

* Allow version epoch to be adjusted forwards (to decrease the version)

* Set version epoch in simulation

* Add quiet database check to ensure small version offset

* Fix initial version issue on storage servers

* Disable reference time by default

Enable on a cluster using the fdbcli command `versionepoch add 0`.

* Add fdbcli command to read/write version epoch

* Cause recovery when version epoch is set

* Handle optional version epoch key

* Add ability to clear the version epoch

This causes version advancement to revert to the old methodology, in which
versions attempt to advance by about a million versions per second
instead of trying to match the clock.

* Update transaction access

* Modify version epoch to use microseconds instead of seconds

* Modify fdbcli version target API

Move commands from `versionepoch` to `targetversion` top level command.

* Add fdbcli tests for

* Temporarily disable targetversion cli tests

* Fix version epoch fetch issue

* Fix Arena issue

* Reduce max version jump in simulation to 1,000,000

* Rework fdbcli API

It now requires two commands to fully switch a cluster to using the
version epoch. First, enable the version epoch with `versionepoch
enable` or `versionepoch set <versionepoch>`. At this point, versions
will be given out at a faster or slower rate in an attempt to reach the
expected version. Then, run `versionepoch commit` to perform a one time
jump to the expected version. This is essentially irreversible.

* Temporarily disable old targetversion tests

* Cleanup

* Move version epoch buggify to sequencer

This will cause some issues with the QuietDatabase check for the version
offset - namely, it won't do anything, since the version epoch is not
being written to the txnStateStore in simulation. This will get fixed in
the future.

Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2022-04-08 12:33:19 -07:00
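
The relationship between wall-clock time and the expected cluster version described in the commit above comes down to simple arithmetic, sketched here. This is an illustration only, not FoundationDB code: it assumes the default epoch quoted in the message (January 1, 2010, 01:00:00+00:00) and exactly one million versions per second, and `expected_version`, `DEFAULT_EPOCH`, and `VERSIONS_PER_SECOND` are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

# Assumption: one version per microsecond, i.e. a million versions per second.
VERSIONS_PER_SECOND = 1_000_000

# Default epoch as quoted in the commit message.
DEFAULT_EPOCH = datetime(2010, 1, 1, 1, 0, 0, tzinfo=timezone.utc)


def expected_version(now, epoch=DEFAULT_EPOCH):
    """Return the version a cluster would target at time `now`."""
    delta = now - epoch
    # Use integer arithmetic to avoid floating-point rounding on large values.
    elapsed_micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return elapsed_micros * VERSIONS_PER_SECOND // 1_000_000


# One second after the epoch corresponds to one million versions.
one_second_in = expected_version(DEFAULT_EPOCH + timedelta(seconds=1))
print(one_second_in)

# The version a cluster would be aiming for right now under these assumptions.
print(expected_version(datetime.now(timezone.utc)))
```

Moving the epoch forwards makes the elapsed time smaller, which matches the message's note that adjusting the epoch forwards decreases the version.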
Xiaoxi Wang 2bc67a4f1d Merge branch 'main' of https://github.com/apple/foundationdb into readaware 2022-03-28 14:20:46 -07:00