Commit Graph

713 Commits

Author SHA1 Message Date
Alec Grieser 2868908c14 make use of Tuple.pack(prefix) in java tests 2017-10-09 15:28:52 -07:00
Alec Grieser 152e10eba1 added hasIncompleteVersionstamp utility method to tuples 2017-10-09 13:52:00 -07:00
Alec Grieser a9cc7af79e added versionstamps to java tuples 2017-10-09 11:07:34 -07:00
Evan Tschannen 5e6eba365b fix: always set confChange, because popVersion is not deterministic across proxies, and confChange needs to be set deterministically 2017-10-06 18:37:08 -07:00
Evan Tschannen 93b3d0e4e7 fix: toMap didn’t report logs, proxies, and resolvers 2017-10-06 15:55:50 -07:00
Alex Miller a21c8a820b Move cpuProfilerRequest from WorkerInterface to ClientWorkerInterface.
A way to access this stream is required if we wish to be able to toggle
profiling from fdbcli.  There are two ways to do this:

1. Use `monitorLeader()` to get a `ClusterControllerFullInterface`, and use
`getWorkers` from there to get a list of `WorkerInterface`s, from which we can
access cpuProfilerRequest.
2. Move cpuProfilerRequest to ClientWorkerInterface and use the existing code
in the client that can fetch a list of all `ClientWorkerInterface`s.

The split between WorkerInterface and ClientWorkerInterface appears to be
what a client might need to call versus what is fdbserver-internal (and
thus no client should even want to call). Thus, it seems to make more sense to
acknowledge that profiling is useful to be able to toggle from a client, and go
with option (2).
2017-10-05 14:08:28 -07:00
Yichi Chiang 3edc2824a9 Add initialClass to RegisterWorkerRequest 2 2017-10-05 11:03:25 -07:00
John Brownlee 6ad9e389dc Merge pull request #168 from cie/fdbmonitor-fork-retry-support
Add support for retrying a process if fork fails. The HUP signal now …
2017-10-05 10:19:43 -07:00
Alvin Moore 0c899c167a Merge branch 'master' of github.com:apple/foundationdb 2017-10-05 08:54:37 -07:00
A.J. Beamon c1bc355306 Add support for retrying a process if fork fails. The HUP signal now causes configuration to be reloaded and timeouts to be reset. A little refactoring to make this easier. 2017-10-05 08:23:52 -07:00
Alvin Moore de8f875038 Fixed call to IsClear
Changed killMachine and killDataCenter interface to return final killtype
Updated TESTs for DataCenter to ensure that DataCenter was killed
Added assertion to ensure that failed DC kills were not downgrades
2017-10-05 03:07:20 -07:00
Yichi Chiang 05f7626e39 Add initialClass to RegisterWorkerRequest 2017-10-04 17:11:12 -07:00
Yichi Chiang 3c70df57b5 Fix cluster controller review comments 2017-10-04 15:48:55 -07:00
A.J. Beamon 63570ccb05 Merge pull request #163 from cie/alexmiller/txnprofcli
Allow client profiling to be configured from fdbcli.
2017-10-04 14:35:14 -07:00
Alex Miller 2e662b6bb6 Fixing review comments.
* parse_with_units found a proper home in flow.h while this was pending
* atof->strtod for error checking
2017-10-04 14:00:38 -07:00
Alex Miller e55cc447d2 Address code review comments.
* Fixed memory corruption with SystemData key constants
* Removed duplication in ClusterController
* Reworked fdbcli actions to better represent explicit vs default assignments
2017-10-04 13:36:18 -07:00
Alex Miller 80fa597422 Allow client profiling to be configured from fdbcli.
This adds the following commands:
* profile client status
* profile client on 0.001 100MB
* profile client off
2017-10-04 13:36:18 -07:00
A.J. Beamon 5063793f36 Revert line ending change 2017-10-04 11:19:19 -07:00
A.J. Beamon d886b95628 Merge pull request #131 from cie/33300740-with-shutdown-hooks
<rdar://problem/33300740> Java: support callbacks from external multi-version client threads
2017-10-04 09:17:25 -07:00
Alex Miller 706427ee62 Fix potential division by zero issues via RPC.
A carefully crafted SplitMetricRequest could have caused division by zero.
It's not really great to offer Division By Zero As A Service, so let's just
return an error instead.
2017-10-03 22:11:08 -07:00
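The fix described here comes down to validating a divisor that arrives in a request before using it, and replying with an error instead of dividing. Below is a minimal C++ sketch, assuming a hypothetical `SplitRequest` struct and a hypothetical `InvalidRequest` error type; the real `SplitMetricRequest` fields and the error it returns are not shown in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical request shape; NOT the real SplitMetricRequest.
struct SplitRequest {
    int64_t totalBytes;
    int64_t bytesPerShard;  // comes from the remote caller, so it is untrusted
};

// Hypothetical error standing in for an RPC error reply.
struct InvalidRequest : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Returns how many shards of bytesPerShard are needed to cover totalBytes.
int64_t shardCount(const SplitRequest& req) {
    // Reject a zero (or negative) divisor instead of dividing by it.
    if (req.bytesPerShard <= 0) throw InvalidRequest("bytesPerShard must be positive");
    // Ceiling division: (a + b - 1) / b for positive a and b.
    return (req.totalBytes + req.bytesPerShard - 1) / req.bytesPerShard;
}
```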
Alex Miller 0ac868ad5d "Simplify" IndexedSet's insert and addMetric API.
The existing code tried to work around the complexities of optionally using
rvalue references' move capabilities if they exist.  As seen in the previous
MapPair, there's a combinatorial explosion of prototypes to declare as the
parameter length increases.  Because of this, addMetric ended up with a strange
API, and there was a wrapper to make a copy for insert.

Instead, we can apply the idiom of using universal/forwarding references and
std::forward to allow the compiler to instantiate the combinations that are
needed.  There's a TagData struct with no copy constructor that validates that
move constructors can still be called properly.

I measured a 12-byte difference between before and after this change, so no
template bloat was introduced.
2017-10-03 20:15:12 -07:00
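The forwarding-reference idiom this commit applies can be sketched as follows. `MetricSet` and `MoveOnly` are hypothetical stand-ins, not the actual `IndexedSet` or `TagData` code: one `insert` template accepts lvalues and rvalues for both parameters, and `std::forward` preserves each argument's value category, so rvalue keys are moved and lvalue keys are copied without writing a separate overload for each combination.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical move-only payload, similar in spirit to the TagData test struct:
// the copy constructor is deleted, so any accidental copy fails to compile.
struct MoveOnly {
    std::string value;
    explicit MoveOnly(std::string v) : value(std::move(v)) {}
    MoveOnly(const MoveOnly&) = delete;
    MoveOnly& operator=(const MoveOnly&) = delete;
    MoveOnly(MoveOnly&&) = default;
    MoveOnly& operator=(MoveOnly&&) = default;
    bool operator<(const MoveOnly& o) const { return value < o.value; }
};

// Hypothetical container; NOT the real flow IndexedSet API.
template <class T, class Metric>
class MetricSet {
public:
    // A single forwarding-reference template replaces the const& / && overload
    // combinations that would otherwise be needed for each parameter.
    template <class T_, class M_>
    void insert(T_&& data, M_&& metric) {
        total_ += metric;  // read the metric before it is forwarded
        items_.emplace(std::forward<T_>(data), std::forward<M_>(metric));
    }
    Metric total() const { return total_; }
    std::size_t size() const { return items_.size(); }

private:
    std::map<T, Metric> items_;
    Metric total_{};
};
```

Because `MoveOnly` deletes its copy constructor, the `MetricSet<MoveOnly, int>` calls compile only if `std::forward` really passes rvalues through as rvalues, which mirrors the check the commit attributes to `TagData`.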
Evan Tschannen 3a2ddcc84a Add destinations that are read-write to the source list, so that cancelled data movement can contribute to copying the data for the next movement. 2017-10-03 17:39:08 -07:00
Stephen Atherton fd5fe3a000 Add slightly better handling of HTTP 503 in blob client. Previously it would end the blob request loop and the task doing the blob action would see a failure, but now the blob request attempt loop will continue to back off and retry. This is better because previously the task that saw the failure would be re-run quickly. 2017-10-03 15:25:49 -07:00
Stephen Atherton 03c4cea511 Added rate-controlled TraceEvents for blob http connection attempts and failures. 2017-10-03 15:21:40 -07:00
Yichi Chiang 25870189ae Merge pull request #165 from cie/fix-connection-count
Fix connection count
2017-10-03 14:03:54 -07:00
Yichi Chiang 284e35204a Fix connection count 2017-10-03 10:54:20 -07:00
Alvin Moore 5257b99d3f Fixed problem with machines RebootedAndCleared not being considered dead in availability consideration 2017-10-03 10:48:16 -07:00
Evan Tschannen 7818a7972b fix: read_lock_aware had the same code as used_during_commit_protection_disable 2017-10-03 09:37:45 -07:00
Balachandar Namasivayam 0e153cdd35 Throttle spammy logs. Three knobs are added.
TraceEvents are sampled and cached with an expiration set. Every TraceEvent above SevDebug is checked against this cache to see if it has exceeded a set threshold. If so, the TraceEvent is throttled.
If a TraceEvent is throttled, a warning message is logged.
2017-10-02 18:43:11 -07:00
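A minimal C++ sketch of the throttling scheme described above, assuming a per-event-type counter that expires after a fixed window and a fixed threshold. `TraceThrottler`, its constructor parameters, and the `Severity` values are hypothetical and illustrative, not the knobs the commit added; unlike the commit, this sketch counts every event instead of sampling them.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Illustrative severity levels; only their ordering matters here.
enum Severity { SevDebug = 5, SevInfo = 10, SevWarn = 20, SevError = 40 };

// Hypothetical throttler keyed by event type.
class TraceThrottler {
public:
    TraceThrottler(double expirationSeconds, int threshold)
        : expiration_(expirationSeconds), threshold_(threshold) {}

    // Returns true if the event should be logged, false if it is throttled.
    bool shouldLog(const std::string& type, Severity sev, double now) {
        if (sev <= SevDebug) return true;  // only events above SevDebug are checked
        Entry& e = cache_[type];
        if (now >= e.expiresAt) {  // new or expired entry: start a fresh window
            e.count = 0;
            e.expiresAt = now + expiration_;
            e.warned = false;
        }
        if (++e.count <= threshold_) return true;
        if (!e.warned) {  // warn once per window that this type is being throttled
            e.warned = true;
            ++warningsLogged_;  // stand-in for emitting the warning itself
        }
        return false;
    }
    int warningsLogged() const { return warningsLogged_; }

private:
    struct Entry {
        int count = 0;
        double expiresAt = 0;
        bool warned = false;
    };
    double expiration_;
    int threshold_;
    std::unordered_map<std::string, Entry> cache_;
    int warningsLogged_ = 0;
};
```

With a window of 1.0 second and a threshold of 2, the third `SevWarn` event of the same type inside the window is suppressed and a single warning is counted; once the window expires, that type is logged again.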
Alvin Moore d099656557 Merge branch 'release-5.0' 2017-10-02 12:05:24 -07:00
Alvin Moore 25513d8e2c Added tests for DataCenter kills 2017-10-02 12:04:28 -07:00
Evan Tschannen 6ea9903c82 Merge branch 'release-5.0'
# Conflicts:
#	fdbbackup/backup.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	versions.target
2017-10-01 18:46:44 -07:00
Evan Tschannen 8250528756 updated versions.target for 5.0.6 2017-10-01 18:42:16 -07:00
Evan Tschannen 1a73e31959 updated wix guid 2017-10-01 18:37:22 -07:00
Evan Tschannen 5f82bfe533 updated release notes for 5.0.5 2017-10-01 17:45:56 -07:00
Stephen Atherton fe7530ed53 Merge branch 'release-5.0' of github.com:apple/foundationdb into release-5.0 2017-10-01 16:45:56 -07:00
Stephen Atherton ad9de674ac Knob change, blob requests should be allowed more time. 2017-10-01 16:45:47 -07:00
Evan Tschannen 0949c4be65 Revert "Fixed problem with master being recruited on excluded servers"
This reverts commit 1f7b624734a8ad6e896dd3f01f9cdf334ca62486.
2017-10-01 16:30:19 -07:00
Evan Tschannen 696d432462 Revert "fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded)"
This reverts commit 83b2ce68c8e1a29fc1559598cc38d3ef7eb46101.
2017-10-01 16:29:32 -07:00
Evan Tschannen 0dde15f1d2 fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded)
fix: better master exists did not use exclusions because the configuration was reset
2017-10-01 16:26:58 -07:00
Stephen Atherton 058300be16 Each blobstore request will again select a random remote address. This used to happen before recent load balancing improvements related to focusing too much load on consistently up endpoints after others have recovered from being down. 2017-10-01 16:17:38 -07:00
Stephen Atherton 13a79482d8 Added comments for clarity. 2017-10-01 16:03:12 -07:00
Stephen Atherton a95107417f Improved behavior of slow writes during backup. KeyRange and Log backup tasks now use TaskBucket::saveAndExtend() to keep the task alive until flushing the file finishes or fails with an error (blob uploads fail after a limited number of retries). This prevents blob uploads from being retried too often if the destination is slow since a task abort and retry would start the backoff counters back at zero. Also removed a debugging behavior that was accidentally checked in. 2017-10-01 16:01:24 -07:00
Stephen Atherton a098919b20 Bug fix, releaser declared in wrong place, and lots of whitespace cleanup from try blocks that were no longer needed. 2017-10-01 11:25:50 -07:00
Stephen Atherton af87ac301d Removed wait never used for debugging which was accidentally included in bug fix. 2017-10-01 11:19:38 -07:00
Stephen Atherton 6000cafde1 Bug fix, locks were being taken inside try/catch so release would be done even if the take threw an error. Changed to using a Releaser. 2017-10-01 10:46:55 -07:00
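The bug and fix described in this commit follow the usual RAII pattern: acquire first, then create the object whose destructor releases, so a failed acquire never triggers a release. A minimal C++ sketch, assuming hypothetical `CountingLock`, `Releaser`, and `doWork` types rather than the actual flow classes used by the blob client:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical counting lock; take() throws if not enough permits are available.
class CountingLock {
public:
    explicit CountingLock(int permits) : available_(permits) {}
    void take(int n) {
        if (n > available_) throw std::runtime_error("lock unavailable");
        available_ -= n;
    }
    void release(int n) { available_ += n; }
    int available() const { return available_; }

private:
    int available_;
};

// Hypothetical RAII releaser: constructed only after take() succeeds,
// so release() runs exactly once for permits that were actually acquired.
class Releaser {
public:
    Releaser(CountingLock& lock, int n) : lock_(lock), n_(n) {}
    ~Releaser() { lock_.release(n_); }
    Releaser(const Releaser&) = delete;
    Releaser& operator=(const Releaser&) = delete;

private:
    CountingLock& lock_;
    int n_;
};

// Takes permits, then does work that may throw; the releaser returns them either way.
void doWork(CountingLock& lock, int n, bool fail) {
    lock.take(n);                // if this throws, nothing was taken and nothing is released
    Releaser releaser(lock, n);  // declared immediately after a successful take
    if (fail) throw std::runtime_error("work failed");
}
```

Releasing from a `catch` block that also covers `take()` would instead add permits back after a failed take, inflating the count; in this sketch a failed `take(5)` on a three-permit lock leaves it at three.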
Evan Tschannen f84e7252e8 fix: there was a reference counting cycle in asyncFileBlobStore and asyncFileReadAhead 2017-09-29 19:13:08 -07:00
A.J. Beamon 38616424f6 Report a couple of error cases in blobstore URL parsing when dealing with numbers. 2017-09-29 17:58:49 -07:00
Alex Miller 440437f190 Merge pull request #156 from cie/alexmiller/drtime
Make versionstamped operations always have a version less than the current database version
2017-09-29 17:30:53 -07:00
Yichi Chiang 636ce4a131 Replace leader when a better one is found 2017-09-29 16:34:55 -07:00