Evan Tschannen
3f9d9d8b84
Merge branch 'release-6.2'
...
# Conflicts:
# CMakeLists.txt
# cmake/FlowCommands.cmake
# documentation/sphinx/source/release-notes.rst
# fdbclient/StorageServerInterface.h
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# flow/Knobs.h
# flow/Platform.cpp
# versions.target
2020-01-16 18:37:47 -08:00
Evan Tschannen
fd5705a451
fixed capitalization
2020-01-15 09:35:57 -08:00
Evan Tschannen
c93ca04ea6
Do not merge more than 100 shards together to avoid creating untrackable shards
2020-01-15 09:33:27 -08:00
Evan Tschannen
b331c5dafe
wantsToMerge was created before the shardEvaluator has a chance to update it based on shardSize changes
2020-01-10 17:23:56 -08:00
Evan Tschannen
fde53cbeef
HasBeenTrueFor was ready immediately after a previous shard merge
2020-01-10 16:28:56 -08:00
Evan Tschannen
9b80498180
Added a trace event to warn if a shard is merged before enough time has elapses from becoming low bandwidth
2020-01-10 14:58:38 -08:00
Evan Tschannen
e4fa4ad0c9
Data distribution will not merge a shard unless it has been low bandwidth for 5 minutes
2020-01-09 17:02:49 -08:00
Evan Tschannen
4de60fc437
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/TLogServer.actor.cpp
2019-11-01 15:48:04 -07:00
Evan Tschannen
8f0348d5e0
fix: merges which cross over systemKeys.begin did not properly decrement the systemSizeEstimate
2019-10-31 16:38:33 -07:00
Meng Xu
e676348710
Merge pull request #1955 from fzhjon/mark-ss-failed
...
Add fdbcli and API command to mark storage servers as permanently failed
2019-10-22 23:36:30 -07:00
A.J. Beamon
29a0014b41
Fix "bandwith" typo
2019-10-22 09:51:59 -07:00
Xin Dong
fca9aab17a
Merge pull request #2046 from dongxinEric/feature/hot-read-key-detection
...
Added metrics for read hot key detection
2019-10-21 14:31:48 -07:00
Jon Fu
d2b6626d5c
Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed
2019-10-21 13:47:06 -07:00
Xin Dong
9a81948843
Accept review suggestions.
...
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-10-21 10:08:43 -07:00
Xin Dong
6a40ef25e5
Credit to Evan for pointing out the missing line which costs me weeks debugging some weird behaviors.
2019-10-18 16:46:19 -07:00
Jon Fu
b1fd6b4443
addressed review comments
2019-10-18 09:43:25 -07:00
Evan Tschannen
86bcb84b45
Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards
2019-10-11 17:50:43 -07:00
Xin Dong
41aae9cbd9
Fix compiler errors
2019-10-10 13:08:59 -07:00
Xin Dong
795ce59fbb
Resolved conflict with master
2019-10-09 16:45:11 -07:00
Xin Dong
62ffdd54a3
Updated some comments to reflect the correct knob value and also used a more appropiate value for read bandwidth. Set the default value for read bandwidth in some cases.
2019-10-09 16:42:42 -07:00
Xin Dong
cd4757b06c
Address review comments
2019-10-09 16:42:42 -07:00
Xin Dong
6b0f771cc0
Fixex a typo in knobs. Addressed some review comments. Added code for actual metric collecting.
2019-10-09 16:42:42 -07:00
Xin Dong
12293d5497
Added metrics for read hot key detection
2019-10-09 16:42:42 -07:00
A.J. Beamon
909855bcec
Fix: the keys argument to changeSizes was passed as a reference, but when used after the first wait(), it may no longer be valid.
2019-10-09 14:07:48 -07:00
Jon Fu
d96a7b2c69
Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed
2019-10-03 09:47:45 -07:00
Evan Tschannen
045175bd0e
added tracking for the size of the system keyspace
2019-09-27 22:39:19 -07:00
Evan Tschannen
3bb62e008c
lowered the priority of some delays in data distribution so that the process will prefer other work
2019-09-27 18:33:13 -07:00
Jon Fu
00c2025d4b
fixed removeKeys impl, adjusted test workload, and introduced extra safety checks to NativeAPI and proxy
2019-08-27 14:39:44 -07:00
Jon Fu
66bba51988
Implemented direct removal of failed storage server from system keyspace
2019-08-27 14:39:43 -07:00
Meng Xu
b7478f5dd3
DD:Add comments to help understand code
...
Add comments to explain the functionalities of some code.
2019-07-22 11:23:16 -07:00
Alex Miller
7a500cd37f
A giant translation of TaskFooPriority -> TaskPriority::Foo
...
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
mpilman
d01cbf3455
Addressed code review comments
2019-04-05 13:12:20 -07:00
mpilman
1c16f87a4e
Remove trace-calls to printable (in non-workloads)
2019-04-05 13:12:19 -07:00
anoyes
981426bac9
More ide fixes
2019-03-05 18:03:57 -08:00
Jingyu Zhou
c38b2a8c38
Change masterId to distributorId in tracker.
...
This reflects the change of moving data distribution out of master server.
2019-02-14 16:37:16 -08:00
Evan Tschannen
4e54690005
Merge branch 'release-6.0'
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/MoveKeys.actor.cpp
2018-11-12 20:26:58 -08:00
Evan Tschannen
cd188a351e
fix: if a destination team became unhealthy and then healthy again, it would lower the priority of a move even though the source servers we are moving from are still unhealthy
...
fix: badTeams were not accounted for when checking priorities
2018-11-11 12:33:31 -08:00
Evan Tschannen
4b5d0b4e2c
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/AsyncFileBlobStore.actor.cpp
# fdbclient/AsyncFileBlobStore.actor.h
# fdbclient/BlobStore.actor.cpp
# fdbclient/BlobStore.h
# fdbclient/HTTP.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/LoadBalance.actor.h
# fdbrpc/batcher.actor.h
# fdbrpc/fdbrpc.vcxproj
# fdbrpc/sim2.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen
e68c07ae35
fix: trackShardBytes was called with the incorrect range, resulting in incorrect shard sizes
...
reduced the size of shard tracker actors by removing unnecessary state variable. Because we have a large number of these actors these extra state variables add up to a lot of memory
2018-11-02 13:03:01 -07:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
A.J. Beamon
2a97139d5d
This is the first step in eliminating the usage of database names in our code. The C API remains the same, but underneath that all usage of database names is eliminated.
2018-08-16 10:24:12 -07:00
Alex Miller
fb31a6999f
Rewrite all files to have #include actorcompiler.h as the last include.
2018-08-14 15:50:26 -07:00
Alex Miller
535b5701e5
Rewrite all `Void _ = wait(...)` -> `wait(...)`.
...
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen
6f02ea843a
prevented a slow task when too many shards were sent to the data distribution queue after switching to a fearless deployment
2018-08-09 12:37:46 -07:00
Evan Tschannen
1c29275672
call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.
2018-08-01 14:30:57 -07:00
Evan Tschannen
392c73affb
fixed a few slow tasks
2018-07-12 14:06:59 -07:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
...
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Evan Tschannen
fa7eaea7cf
fix: shards affected by team failure did not properly handle separate teams for the remote and primary data centers
2018-03-08 10:50:05 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00