57 Commits

Author SHA1 Message Date
Evan Tschannen 3f9d9d8b84 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	cmake/FlowCommands.cmake
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/StorageServerInterface.h
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/fdbserver.actor.cpp
#	flow/Knobs.h
#	flow/Platform.cpp
#	versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen fd5705a451 fixed capitalization 2020-01-15 09:35:57 -08:00
Evan Tschannen c93ca04ea6 Do not merge more than 100 shards together to avoid creating untrackable shards 2020-01-15 09:33:27 -08:00
Evan Tschannen b331c5dafe wantsToMerge was created before the shardEvaluator has a chance to update it based on shardSize changes 2020-01-10 17:23:56 -08:00
Evan Tschannen fde53cbeef HasBeenTrueFor was ready immediately after a previous shard merge 2020-01-10 16:28:56 -08:00
Evan Tschannen 9b80498180 Added a trace event to warn if a shard is merged before enough time has elapsed from becoming low bandwidth 2020-01-10 14:58:38 -08:00
Evan Tschannen e4fa4ad0c9 Data distribution will not merge a shard unless it has been low bandwidth for 5 minutes 2020-01-09 17:02:49 -08:00
Evan Tschannen 4de60fc437 Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/TLogServer.actor.cpp
2019-11-01 15:48:04 -07:00
Evan Tschannen 8f0348d5e0 fix: merges which cross over systemKeys.begin did not properly decrement the systemSizeEstimate 2019-10-31 16:38:33 -07:00
Meng Xu e676348710
Merge pull request #1955 from fzhjon/mark-ss-failed
Add fdbcli and API command to mark storage servers as permanently failed
2019-10-22 23:36:30 -07:00
A.J. Beamon 29a0014b41 Fix "bandwith" typo 2019-10-22 09:51:59 -07:00
Xin Dong fca9aab17a
Merge pull request #2046 from dongxinEric/feature/hot-read-key-detection
Added metrics for read hot key detection
2019-10-21 14:31:48 -07:00
Jon Fu d2b6626d5c Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-10-21 13:47:06 -07:00
Xin Dong 9a81948843
Accept review suggestions.
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-21 10:08:43 -07:00
Xin Dong 6a40ef25e5 Credit to Evan for pointing out the missing line which costs me weeks debugging some weird behaviors. 2019-10-18 16:46:19 -07:00
Jon Fu b1fd6b4443 addressed review comments 2019-10-18 09:43:25 -07:00
Evan Tschannen 86bcb84b45 Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards 2019-10-11 17:50:43 -07:00
Xin Dong 41aae9cbd9 Fix compiler errors 2019-10-10 13:08:59 -07:00
Xin Dong 795ce59fbb Resolved conflict with master 2019-10-09 16:45:11 -07:00
Xin Dong 62ffdd54a3 Updated some comments to reflect the correct knob value and also used a more appropriate value for read bandwidth. Set the default value for read bandwidth in some cases. 2019-10-09 16:42:42 -07:00
Xin Dong cd4757b06c Address review comments 2019-10-09 16:42:42 -07:00
Xin Dong 6b0f771cc0 Fixed a typo in knobs. Addressed some review comments. Added code for actual metric collecting. 2019-10-09 16:42:42 -07:00
Xin Dong 12293d5497 Added metrics for read hot key detection 2019-10-09 16:42:42 -07:00
A.J. Beamon 909855bcec Fix: the keys argument to changeSizes was passed as a reference, but when used after the first wait(), it may no longer be valid. 2019-10-09 14:07:48 -07:00
Jon Fu d96a7b2c69 Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-10-03 09:47:45 -07:00
Evan Tschannen 045175bd0e added tracking for the size of the system keyspace 2019-09-27 22:39:19 -07:00
Evan Tschannen 3bb62e008c lowered the priority of some delays in data distribution so that the process will prefer other work 2019-09-27 18:33:13 -07:00
Jon Fu 00c2025d4b fixed removeKeys impl, adjusted test workload, and introduced extra safety checks to NativeAPI and proxy 2019-08-27 14:39:44 -07:00
Jon Fu 66bba51988 Implemented direct removal of failed storage server from system keyspace 2019-08-27 14:39:43 -07:00
Meng Xu b7478f5dd3 DD:Add comments to help understand code
Add comments to explain the functionalities of some code.
2019-07-22 11:23:16 -07:00
Alex Miller 7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and make it easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
mpilman d01cbf3455 Addressed code review comments 2019-04-05 13:12:20 -07:00
mpilman 1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
anoyes 981426bac9 More ide fixes 2019-03-05 18:03:57 -08:00
Jingyu Zhou c38b2a8c38 Change masterId to distributorId in tracker.
This reflects the change of moving data distribution out of master server.
2019-02-14 16:37:16 -08:00
Evan Tschannen 4e54690005 Merge branch 'release-6.0'
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MoveKeys.actor.cpp
2018-11-12 20:26:58 -08:00
Evan Tschannen cd188a351e fix: if a destination team became unhealthy and then healthy again, it would lower the priority of a move even though the source servers we are moving from are still unhealthy
fix: badTeams were not accounted for when checking priorities
2018-11-11 12:33:31 -08:00
Evan Tschannen 4b5d0b4e2c Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/AsyncFileBlobStore.actor.cpp
#	fdbclient/AsyncFileBlobStore.actor.h
#	fdbclient/BlobStore.actor.cpp
#	fdbclient/BlobStore.h
#	fdbclient/HTTP.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/LoadBalance.actor.h
#	fdbrpc/batcher.actor.h
#	fdbrpc/fdbrpc.vcxproj
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen e68c07ae35 fix: trackShardBytes was called with the incorrect range, resulting in incorrect shard sizes
reduced the size of shard tracker actors by removing unnecessary state variable. Because we have a large number of these actors these extra state variables add up to a lot of memory
2018-11-02 13:03:01 -07:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
A.J. Beamon 2a97139d5d This is the first step in eliminating the usage of database names in our code. The C API remains the same, but underneath that all usage of database names is eliminated. 2018-08-16 10:24:12 -07:00
Alex Miller fb31a6999f Rewrite all files to have #include actorcompiler.h as the last include. 2018-08-14 15:50:26 -07:00
Alex Miller 535b5701e5 Rewrite all `Void _ = wait(...)` -> `wait(...)`.
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen 6f02ea843a prevented a slow task when too many shards were sent to the data distribution queue after switching to a fearless deployment 2018-08-09 12:37:46 -07:00
Evan Tschannen 1c29275672 call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details. 2018-08-01 14:30:57 -07:00
Evan Tschannen 392c73affb fixed a few slow tasks 2018-07-12 14:06:59 -07:00
A.J. Beamon e5488419cc Attempt to normalize trace events:
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.

Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.

This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Evan Tschannen fa7eaea7cf fix: shards affected by team failure did not properly handle separate teams for the remote and primary data centers 2018-03-08 10:50:05 -08:00
Evan Tschannen 37a6a81634 Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
#	fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00