64 Commits

Author SHA1 Message Date
Young Liu 3728ed03dd Resolve comments 2020-09-05 18:55:09 -07:00
Young Liu 1ad5e17458 add support for comparing original and current impls 2020-09-05 11:14:59 -07:00
Meng Xu 1ba9b6b07f DD:Change SendRelocateToDDQx100 to SendRelocateToDDQueue 2020-07-17 14:10:17 -07:00
Meng Xu 42a7b91647 Turn clang-format off for error_definition.h 2020-07-17 11:11:30 -07:00
Meng Xu 098cdfb558 Replace actor_cancelled error with dd_cancelled 2020-07-16 20:26:07 -07:00
Meng Xu 59132e2cc8
Merge pull request #3237 from ajbeamon/tag-throttling-documentation
Add documentation for the tag throttling feature
2020-05-27 09:33:30 -07:00
A.J. Beamon 530da59d62 Add documentation for the tag throttling feature 2020-05-26 15:27:06 -07:00
A.J. Beamon 86f712657e Eliminate the undefined behavior of calling run_network twice, instead returning an error. 2020-05-22 13:31:06 -07:00
Andrew Noyes 8bd5dcaff8 Merge branch 'master' into atn34/special-key-versioning 2020-05-09 15:34:20 -07:00
Andrew Noyes 6b35b1b686 Disallow no-module read by default 2020-05-08 05:37:37 +00:00
Andrew Noyes 1d6209e304 Check for cross-module reads 2020-05-08 05:37:37 +00:00
A.J. Beamon aed97a9f20 Merge branch 'master' into transaction-tagging 2020-05-07 14:52:22 -07:00
tclinken 4ec652f59f Fixed backup_invalid_info error message 2020-05-01 10:33:13 -07:00
A.J. Beamon dfec896438 Enforce a throttle limit. Don't count transaction tags on RK if the proxy hasn't updated us in a while. 2020-04-17 11:48:02 -07:00
A.J. Beamon ebeca10bce Change the serialization of tags sent in some messages. Add communication of the sampling rate from cluster to clients. 2020-04-09 16:55:56 -07:00
A.J. Beamon 26b7e02d4c Some initial work to support tagging transactions and passing them around. 2020-03-20 11:23:11 -07:00
Xin Dong e21426d12a Send error back to the GRV requests with batch priority when the cluster is saturated, instead of blindly enqueuing the requests and letting the client time out. 2020-01-30 14:13:56 -08:00
Evan Tschannen 6c0b934dda
Merge pull request #2242 from alexmiller-apple/fix-10min-stall-again
Fix the 10min multi-region recovery stall again
2020-01-23 17:53:02 -08:00
Jingyu Zhou a4d6ebe79e Recruit backup worker in newEpoch 2020-01-22 19:37:48 -08:00
Alex Miller 1cb311fcb8 Add an ASSERT_WE_THINK that peek cursors don't get timed_out()
This should prevent us from regressing and having multi-region
recoveries hang for 10min again.
2020-01-21 17:07:37 -08:00
sramamoorthy c59168fd07 error msg: Snapshot error -> Disk Snapshot error 2019-08-30 08:46:36 -07:00
sramamoorthy 5d87443323 improved error msgs for snapshot cmd 2019-08-27 16:43:52 -07:00
Evan Tschannen 7a932479dd throw away state if we ever read popped data from the disk queue adapter 2019-07-30 10:14:39 -07:00
Vishesh Yadav 867986cdea fdbrpc: Reduced connection monitoring from clients
This patch makes two changes to connection monitoring:

1. Connection monitoring at client side will check if the connection
has stayed idle for some time. If the connection is unused for a
while, we close the connection. There is some weirdness involved here
as ping messages are by themselves connection traffic. We get over
this by making it a two-phase process, first checking for idle
reliable traffic, followed by disabling pings and then checking for
idle unreliable traffic.

2. Connection monitoring of clients from server will no longer send
pings to clients. Instead, it keeps monitoring the received bytes and
closes the connection after a certain period of inactivity.
2019-07-09 14:24:16 -07:00
Vishesh Yadav 6fa7081a21 net: Don't make FailureMonitoring requests from client
This patch removes the need for clients to continuously contact
cluster coordinator for failure monitoring information. Instead, it
uses the FlowTransport to monitor the statuses of peers and update
FailureMonitor accordingly.
2019-06-09 00:43:38 -07:00
sramamoorthy b17ad85497 exec op not supported when log_anti_quorum > 0 2019-05-28 22:07:46 -07:00
sramamoorthy bb474dc323 if recovery < fully_recovered then fail the exec
Will do more cleanup, pushing it for a test run in CI
2019-05-28 22:07:46 -07:00
sramamoorthy 925499954b New status cluster_not_fully_recovered 2019-05-28 22:07:46 -07:00
sramamoorthy 898bed66c1 Allow only whitelisted binary path for exec op 2019-05-28 22:07:46 -07:00
Nikolas Ioannou d6170210e7 AsyncFileCached: throw (new) exception, through helper static fn, if cache eviction policy not recognized. 2019-05-06 10:11:46 +02:00
Evan Tschannen 8e05713a5d do not log a SevError trace event if we cannot deserialize the connect packet 2019-04-10 17:41:02 -07:00
Trevor Clinkenbeard dc2b740415 Added server_overloaded error and client message 2019-01-25 13:15:19 -08:00
Evan Tschannen 684a22a52b Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/HTTP.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/BackupCorrectness.actor.cpp
#	versions.target
2019-01-09 16:14:46 -08:00
Stephen Atherton 0ec216a1fa Added X-Request-ID header to HTTP requests and verification of matching ID in response, if present. 2019-01-07 17:56:38 -08:00
Stephen Atherton f64d5321e9 Backup describe, expire, and delete will now clearly indicate when there is no backup at the given URL. 2018-12-20 18:05:23 -08:00
Stephen Atherton 5d9cd9acdc Correctness test now has additional random reader which doesn't do verification but isn't stopped when the btree is closed. Fixed bug exposed by this where pager snapshots will still try to read pages after the pager has been shut down or even destroyed. Added new error type, shutdown_in_progress. 2018-10-04 23:46:37 -07:00
Stephen Atherton ce3f01a0cf Added concept of type to JsonString. Appending single items or key/value pairs is now type-safe and only allowed in certain cases. JsonString will refuse to produce invalid JSON. All duplicative functions have been replaced with templates. Encoding of values uses json_spirit's value writer which should be no worse performance than format() and it will escape everything properly. Final string form is now built directly using knowledge of type, such as when an instance becomes an Array or Object the appropriate opening character is written. This avoids a full copy just to prepend the opening character later. Index interface for key/value pairs no longer makes a temporary copy of the key string. JsonString is now only needed by Status.actor.cpp. Still more work to be done here. 2018-09-08 07:15:28 -07:00
Alec Grieser be9c34c6f8
Merge remote-tracking branch 'upstream/release-5.2' into merge-release-5.2 2018-07-10 10:04:48 -07:00
Stephen Atherton 3ce7c78d36 If an HTTP request fails due to a connection failure or a timeout, do not convert the error to the more generic http_request_failed. 2018-07-09 18:58:33 -07:00
A.J. Beamon 0ca51989bb Merge branch 'master' into trace-log-refactor
# Conflicts:
#	fdbserver/QuietDatabase.actor.cpp
#	fdbserver/Status.actor.cpp
#	flow/Trace.cpp
2018-06-08 13:24:30 -07:00
Balachandar Namasivayam 529d0497f1 Proxy goes OOM when high volumes of writes are applied to it, particularly in a sudden fashion before ratekeeper can control the workload.
Address this issue by proactively monitoring the memory used by commit batches and dropping requests if a certain memory limit is exceeded.
2018-06-01 15:21:40 -07:00
A.J. Beamon ce0c991e78 Refactor trace events to store a vector of fields that aren't encoded until write time. Better support for pre-network trace events. Rework how trace events are queried. Some initial work towards pluggable formatting of logs. 2018-05-02 10:44:38 -07:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Stephen Atherton 69425a303b Improved error handling for cases where blob account credentials are either not found in the provided credentials sources and/or some of the credentials sources provided are not readable or parseable. 2018-02-07 21:50:43 -08:00
Stephen Atherton 66de9d392b New error code, http_auth_failed, which is used when blob authentication fails instead of the previous generic http_request_failed. 2018-01-22 14:58:56 -08:00
Stephen Atherton 93b34a945f Major usability and performance improvements to backup management. Backup descriptions now calculate and display timestamps using TimeKeeper data (if given a cluster) and restorability of snapshots. Expire now requires a --force option to leave a backup unrestorable or unrestorable after a given point in time, specified by version or timestamp. BackupContainerFilesystem now maintains metadata on key version boundaries in order to avoid large list operations for describe and expire operations. Blob parallel recursive list operations can now take a path (aka prefix) filter function. New describe and expire options are available in fdbbackup. 2018-01-17 04:09:43 -08:00
A.J. Beamon 653a46f12f Update error string for cluster_version_changed error 2018-01-04 15:06:09 -08:00
Stephen Atherton aeebe711ce TaskBucket’s saveAndExtend() is now accomplished through extendTimeout() with an option to save parameters. SaveAndExtendIncrementally() has been removed as it is no longer needed because TaskBucket’s normal execution loop calls extendTimeout() periodically as long as the TaskFunc’s execute() actor has not finished or thrown. If a TaskFunc wants to save changes to task parameters, checkpointing progress so that task restarts can benefit from it, it can call extendTimeout() explicitly with the updateParams flag set to true. 2017-11-30 17:18:57 -08:00
Stephen Atherton 3dfaf13b67 IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored are now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi-level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius as the previous backup container was just a thin wrapper that presented a single-level list of files and offered no methods for managing or interpreting the file structure, so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse.
Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup.  The addition of IBackupFile and its finish() method simplified the log and range writer tasks.  Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes.  Added KeyBackedSet<T> type.  Moved JSONDoc to its own header.  Added platform::findFilesRecursively().

Still to do:  update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.
2017-11-14 23:33:17 -08:00
Stephen Atherton 3afc85881e Merge branch 'master' into backup-container-refactor
# Conflicts:
#	fdbrpc/BlobStore.actor.cpp
2017-10-20 21:38:28 -07:00