Meng Xu
ad9b3fb4a8
DD: Add trace for detailed relocate shard info
2020-02-29 13:45:10 -08:00
Evan Tschannen
f04e311a1e
Merge commit 'b46d6e25e24993ab5a5f04091fd3235050b7cd09' into feature-boost-ssl
# Conflicts:
# fdbserver/SimulatedCluster.actor.cpp
# flow/Net2.actor.cpp
2020-02-20 17:36:38 -08:00
Evan Tschannen
08c318d28a
re-added the connect lock in the fdbcli so that the timeout is not spent before a connection has been initiated (because of the handshake lock)
2020-02-20 10:43:34 -08:00
Evan Tschannen
53d0867a17
limit the number of connections a process can attempt to establish in parallel
2020-02-04 18:15:10 -08:00
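The commit above caps how many connections a process tries to open at once. A minimal sketch of that idea, assuming an asyncio-style runtime rather than FoundationDB's Flow actors, uses a counting semaphore. The names `connect_all`, `connect`, and `max_parallel` are hypothetical and do not come from the FoundationDB code base.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def connect_all(
    addresses: Iterable[str],
    connect: Callable[[str], Awaitable[T]],
    max_parallel: int,
) -> List[T]:
    """Attempt a connection to every address, with at most max_parallel in flight."""
    if max_parallel < 1:
        raise ValueError("max_parallel must be at least 1")
    # The semaphore caps how many connection attempts run at the same time.
    limit = asyncio.Semaphore(max_parallel)

    async def guarded(address: str) -> T:
        # Wait for a free slot before starting this attempt.
        async with limit:
            return await connect(address)

    # gather() returns results in the same order as the input addresses.
    return list(await asyncio.gather(*(guarded(a) for a in addresses)))
```

Attempts beyond the cap queue on the semaphore instead of failing, so every address is still tried eventually.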
Evan Tschannen
73ad702d14
Clients which fetch status should not disconnect from the coordinators and cluster controller between each retrieval
2020-01-22 15:41:22 -08:00
Evan Tschannen
c93ca04ea6
Do not merge more than 100 shards together to avoid creating untrackable shards
2020-01-15 09:33:27 -08:00
Evan Tschannen
8475da359c
Merge pull request #2527 from etschannen/feature-region-fixes
A database could perform poorly while a remote region catches up to the primary
2020-01-10 17:26:43 -08:00
Evan Tschannen
855f03a41f
ratekeeper needed to check remoteDC in another location
the storage server scoped a transaction incorrectly
2020-01-10 15:58:36 -08:00
Evan Tschannen
9a3dfec7c5
open a connection with processes before attempting to kill them to improve the reliability of the kill process
secondaryAddresses are included in the list of processes which can be killed
2020-01-03 16:10:44 -08:00
A.J. Beamon
5bd4bf357f
Limit length of delays in timeout, repeating them as necessary.
2019-11-13 12:49:07 -08:00
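The commit message does not say why individual delays are capped, only that a long timeout is now expressed as repeated shorter delays. A minimal sketch of that pattern, assuming a hypothetical `split_delay` helper and a hypothetical `max_delay` cap (neither name comes from FoundationDB), follows.

```python
import asyncio
from typing import List


def split_delay(total_seconds: float, max_delay: float) -> List[float]:
    """Break one long timeout into consecutive delays no longer than max_delay."""
    if max_delay <= 0:
        raise ValueError("max_delay must be positive")
    steps: List[float] = []
    remaining = total_seconds
    while remaining > 0:
        step = min(remaining, max_delay)
        steps.append(step)
        remaining -= step
    return steps


async def timeout_after(total_seconds: float, max_delay: float) -> None:
    """Wait total_seconds in total, never sleeping longer than max_delay at once."""
    for step in split_delay(total_seconds, max_delay):
        await asyncio.sleep(step)
```

For example, a 10-second timeout with a 4-second cap becomes delays of 4, 4, and 2 seconds.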
Evan Tschannen
5fbd9f2ed5
added logging to TaskBucket
2019-11-12 19:15:56 -08:00
Evan Tschannen
ef01ad2ed8
optimized log range clearing to clear everything for each possible hash (256 clears) if that would be more efficient than one clear per second that has elapsed
aborting a DR without the --cleanup flag will still attempt to clean up for 30 seconds before giving up
added a cleanup command to fdbbackup which can remove mutations from orphaned DRs which were stopped without the --cleanup flag
2019-09-27 18:32:27 -07:00
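The first line of the commit above describes a cost comparison: clearing the whole range under each of the 256 possible hash values is a fixed 256 clears, while the alternative issues one clear per elapsed second. A minimal sketch that counts clears only, assuming a hypothetical `plan_log_range_clears` function, follows.

```python
from typing import Tuple

HASH_PREFIXES = 256  # one clear per possible hash value, per the commit message


def plan_log_range_clears(elapsed_seconds: int) -> Tuple[str, int]:
    """Return the cheaper clearing strategy and how many clears it issues."""
    if elapsed_seconds > HASH_PREFIXES:
        # Clearing everything under each hash prefix costs a fixed 256 clears.
        return ("per-hash", HASH_PREFIXES)
    # Otherwise one clear per elapsed second is no more expensive.
    return ("per-second", elapsed_seconds)
```

With 1000 elapsed seconds, the per-hash plan issues 256 clears instead of 1000.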
Evan Tschannen
8425f53fc5
clients only connect to three proxies
2019-07-28 23:52:29 -07:00
Evan Tschannen
be5d144b8b
added status information on connected clients
2019-07-25 17:15:31 -07:00
Evan Tschannen
8b73a1c998
removed verbose trace messages
2019-07-24 15:07:41 -07:00
Evan Tschannen
4a866290b7
Clients keep a persistent connection open with coordinators to get updates to the list of proxies
Status still needs to be updated with client information from the coordinators
2019-07-23 19:22:44 -07:00
Jingyu Zhou
9c2257a0e5
Add transaction size option
2019-06-19 07:45:23 -07:00
Balachandar Namasivayam
04e9aa6afd
For small clusters that are growing quickly, the rateLimit could be set to a low value, making it take a very long time to read the entire database. Fix this by setting the rateLimit to the maximum allowed value if reading the entire database is taking a long time.
2019-04-10 17:13:37 -07:00
Evan Tschannen
e3400c13ae
fixed a performance regression related to broadcasting a read version to too many transactions simultaneously
2019-03-22 18:37:39 -07:00
Meng Xu
e30e2af1f3
ClientKnobs: Add CHECK_CONNECTED_COORDINATOR_NUM_DELAY
2019-03-13 16:54:56 -07:00
Evan Tschannen
f1897f3eb6
Merge branch 'master' into feature-metadata-version
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
2019-03-04 21:06:16 -08:00
Trevor Clinkenbeard
56ae46f89e
Client lazily fetches health metrics from proxies
2019-03-04 14:16:39 -08:00
Evan Tschannen
075fdef31a
Merge branch 'master' into feature-metadata-version
# Conflicts:
# fdbclient/DatabaseContext.h
2019-03-03 22:58:45 -08:00
Evan Tschannen
3da85f3acd
implemented the \xff/metadataVersion key, which can be used by layers to help them cheaply cache metadata and know when their cache is invalid
2019-02-28 17:45:00 -08:00
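The commit above says layers can use `\xff/metadataVersion` to cache metadata cheaply and detect when the cache is stale. A minimal sketch of the layer-side cache, assuming the caller has already read the key's current value in its transaction and passes it in, follows. The `MetadataCache` class and its `load` callback are hypothetical, and the sketch deliberately avoids any FoundationDB client API.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class MetadataCache(Generic[T]):
    """Hold layer metadata until the observed metadata version changes."""

    def __init__(self, load: Callable[[], T]) -> None:
        self._load = load  # expensive read of the layer's own metadata
        self._loaded = False
        self._version: Optional[bytes] = None
        self._value: Optional[T] = None

    def get(self, current_version: Optional[bytes]) -> T:
        # current_version stands in for the value read from \xff/metadataVersion;
        # None models a key that has never been set.
        if not self._loaded or current_version != self._version:
            self._value = self._load()
            self._version = current_version
            self._loaded = True
        return self._value  # type: ignore[return-value]
```

Checking freshness costs one key comparison, so the expensive `load` runs only after the metadata version has changed.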
Trevor Clinkenbeard
d2bde4e55b
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-02-27 16:30:25 -08:00
Balachandar Namasivayam
f44f26c232
Dynamically rate limit consistency check.
2019-02-07 16:08:39 -08:00
Trevor Clinkenbeard
7b0aa104d4
Added UPDATE_HEALTH_METRICS_INTERVAL and UPDATE_DETAILED_HEALTH_METRICS_INTERVAL client knobs
2019-02-01 11:10:43 -08:00
Stephen Atherton
d005594bdd
Added optional support for sending a unique id per request in a header for logging/tracking purposes.
2019-01-08 14:48:47 -08:00
Evan Tschannen
18509ac4dd
account for the overhead of tags on a message when batching transactions into a version
increased the default location cache size
2018-11-02 12:52:34 -07:00
Evan Tschannen
e60c668853
The cluster controller will increase its failure monitoring delay after there have been many unfinishedRecoveries
2018-08-31 10:51:55 -07:00
Evan Tschannen
1dce97f28c
Merge branch 'release-5.2'
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/SimulatedCluster.actor.cpp
# packaging/msi/FDBInstaller.wxs
# versions.target
2018-06-21 17:05:11 -07:00
Stephen Atherton
e9e1e194f0
Added operation-specific rate controls to blob store interface.
2018-06-20 20:34:34 -07:00
Balachandar Namasivayam
529d0497f1
Proxy going OOM when high volumes of writes are applied to it, particularly when they arrive suddenly, before ratekeeper can control the workload.
Address this issue by proactively monitoring the memory used by commit batches and dropping requests if a certain memory limit is exceeded.
2018-06-01 15:21:40 -07:00
yichic
ede5cab192
Merge pull request #89 from yichic/share-log-mutations-5.2
Share log mutations 5.2
2018-03-19 12:01:26 -07:00
Yichi Chiang
d6559b144f
Share log mutations between backups and DRs which have the same backup range
2018-03-19 11:32:50 -07:00
Yichi Chiang
26b93ff920
Share log mutations between backups and DRs which have the same backup range
2018-03-16 18:09:23 -07:00
Evan Tschannen
91bb8faa45
Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa'
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-03-09 14:47:03 -08:00
Stephen Atherton
cb68885328
If backup expiration determines that force is required but the force parameter is not set, it will no longer throw an error unless the backup contains data from prior to the expire_before_version.
2018-03-08 11:27:15 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Evan Tschannen
79d94214a4
Merge commit 'f4ffc9752b5ec66ac47f5f684a5d8be06a7eae6e' into feature-remote-logs
2018-01-25 10:12:06 -08:00
Stephen Atherton
aebbe1dcfd
Changed core_versionspersecond knob to int64_t type to avoid integer overflow. Cleaned up backup TraceEvent suppression. Added backtrace to LargeTransaction TraceEvent to make it easier to find the source of large commits in applications using NativeAPI directly.
2018-01-24 11:59:37 -08:00
Evan Tschannen
698ef4117e
Merge branch 'master' into feature-remote-logs
2018-01-20 10:34:30 -08:00
Stephen Atherton
897ff6f676
Added new knob for how many tasks to add per transaction in backup dispatch, instead of using the value for restore which has much lower overhead per task.
2018-01-16 10:45:21 -08:00
Evan Tschannen
3ec45d38a0
Merge branch 'master' into feature-remote-logs
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Stephen Atherton
96cb06cbc7
Bug fixes. Fdbbackup delete was broken. Blobstore backup container deletion would do too much listing before deletions began due to list operations queueing up ahead of and starving the delete operations. Created a new knob and blob endpoint limit for concurrent list operations to fix this. Increased blob request timeout default because some requests were taking longer. Crash fixes in blobstore doRequest(), which wasn't checking that the response object was valid before using it in error conditions. Filesystem-like backup container class (covering blobstore and local dirs) now ignores unrecognized filenames for describe() and expire() operations.
2018-01-05 23:06:39 -08:00
Evan Tschannen
5ac4f73978
Merge branch 'release-5.1' into feature-remote-logs
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/Locality.h
# fdbrpc/simulator.h
# fdbserver/ApplyMetadataMutation.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/masterserver.actor.cpp
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Stephen Atherton
7caa012fbf
Added snapshot interval option to "fdbbackup start" which defaults to a new knob's value. Added snapshot info to backup status text. Improvements to fdbbackup help.
2017-12-20 00:49:08 -08:00
Stephen Atherton
e0d9cea008
Merge branch 'master' into continuous-backup
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
# fdbrpc/BlobStore.actor.cpp
2017-12-19 23:02:14 -08:00
Stephen Atherton
33f9f1a95c
Added SnapshotDispatch task for writing snapshots in random order over a specified period of time and adapting speed to a growing or shrinking database. TaskBucket now supports scheduling tasks. TaskFuture now correctly recognizes multiple tasks in its callback space. TaskBucket extendTimeout() now supports specifying the new timeout version. Submitting a backup now requires a snapshot duration.
2017-12-14 01:44:38 -08:00
Evan Tschannen
482ac38ca6
added knobs so that the client failure monitoring update rate and the server failure monitoring update rate are separate knobs
2017-12-01 13:04:32 -08:00
Stephen Atherton
a77162b53d
Merge branch 'master' into backup-container-refactor
# Conflicts:
# fdbclient/BackupAgent.h
# fdbclient/FileBackupAgent.actor.cpp
# fdbclient/KeyBackedTypes.h
2017-11-15 08:14:47 -08:00
Stephen Atherton
3dfaf13b67
IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored are now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi-level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius, as the previous backup container was just a thin wrapper that presented a single-level list of files and offered no methods for managing or interpreting the file structure, so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse.
Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup. The addition of IBackupFile and its finish() method simplified the log and range writer tasks. Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes. Added KeyBackedSet<T> type. Moved JSONDoc to its own header. Added platform::findFilesRecursively().
Still to do: update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.
2017-11-14 23:33:17 -08:00
A.J. Beamon
cd085764f1
Do not automatically change a cluster file that does not match what you expect.
2017-11-10 14:12:45 -08:00
Stephen Atherton
3afc85881e
Merge branch 'master' into backup-container-refactor
# Conflicts:
# fdbrpc/BlobStore.actor.cpp
2017-10-20 21:38:28 -07:00
Evan Tschannen
e2c1e87df6
made a large number of fixes to make fearless DR correctness clean.
2017-10-19 15:36:32 -07:00
Stephen Atherton
ef84e52127
Improved error handling and memory usage in AsyncFileBlobStoreWrite. Writes will now fail if any upload has already failed, rather than buffering unboundedly until sync() is called to complete the file. There is also a configurable limit on how many uploads can be pending before writes will stall waiting for one to finish.
2017-10-18 05:51:30 -07:00
Stephen Atherton
e934604f67
Added DNS resolution. Interface is INetworkConnections::resolveTCPEndpoint() to resolve, or for convenience INetworkConnections::connect(host, service) will resolve host and service (port number or service name like http) and connect to one of the addresses at random.
BlobStoreEndpoint now only accepts hostnames and an optional service, so this update is not compatible with the previous URL formats having many IP addresses.
2017-10-15 21:51:11 -07:00
FDB Dev Team
a674cb4ef4
Initial repository commit
2017-05-25 13:48:44 -07:00