sfc-gh-tclinkenbeard
6235d087a6
Prevent shardTracker or trackShardBytes from accidentally unsafely accessing DataDistributionTracker
2020-11-16 12:46:21 -08:00
sfc-gh-tclinkenbeard
ca8ea3b6ff
Fix memory issues caused by cancelling data distribution tracker
2020-11-15 23:52:36 -08:00
Meng Xu
222da17558
Merge branch 'release-6.2' into mengxu/ha-code-read
2020-11-12 13:39:27 -08:00
Meng Xu
c2dd7d1d38
Remove unresolved questions
2020-11-11 22:39:11 -08:00
Andrew Noyes
f467524e06
Don't dereference self on broken_promise
2020-11-11 00:24:23 +00:00
Meng Xu
063700e4d6
Add comments and questions to HA and tLog code reading
...
The correctness of the comments needs to be confirmed by reviewers.
2020-10-30 12:14:57 -07:00
sfc-gh-tclinkenbeard
91a8367acb
Avoid slow task in ~DataDistributionTracker
2020-10-01 11:44:55 -07:00
Evan Tschannen
7adc916e18
Merge pull request #2806 from ajbeamon/improve-team-request-performance
...
Improve performance of get team requests.
2020-03-16 11:56:45 -07:00
Evan Tschannen
12f2b32770
added additional logging in data distribution
2020-03-13 15:19:33 -07:00
A.J. Beamon
555db50cd1
Avoid calling into SABTF so frequently. Use a cheaper call that only checks that shards exist.
2020-03-12 11:22:03 -07:00
Evan Tschannen
fd5705a451
fixed capitalization
2020-01-15 09:35:57 -08:00
Evan Tschannen
c93ca04ea6
Do not merge more than 100 shards together to avoid creating untrackable shards
2020-01-15 09:33:27 -08:00
Evan Tschannen
b331c5dafe
wantsToMerge was created before the shardEvaluator has a chance to update it based on shardSize changes
2020-01-10 17:23:56 -08:00
Evan Tschannen
fde53cbeef
HasBeenTrueFor was ready immediately after a previous shard merge
2020-01-10 16:28:56 -08:00
Evan Tschannen
9b80498180
Added a trace event to warn if a shard is merged before enough time has elapsed since becoming low bandwidth
2020-01-10 14:58:38 -08:00
Evan Tschannen
e4fa4ad0c9
Data distribution will not merge a shard unless it has been low bandwidth for 5 minutes
2020-01-09 17:02:49 -08:00
Evan Tschannen
8f0348d5e0
fix: merges which cross over systemKeys.begin did not properly decrement the systemSizeEstimate
2019-10-31 16:38:33 -07:00
Evan Tschannen
86bcb84b45
Raised the data distribution priority of splitting shards above restoring fault tolerance to avoid hot write shards
2019-10-11 17:50:43 -07:00
A.J. Beamon
909855bcec
Fix: the keys argument to changeSizes was passed as a reference, but when used after the first wait(), it may no longer be valid.
2019-10-09 14:07:48 -07:00
Evan Tschannen
045175bd0e
added tracking for the size of the system keyspace
2019-09-27 22:39:19 -07:00
Evan Tschannen
3bb62e008c
lowered the priority of some delays in data distribution so that the process will prefer other work
2019-09-27 18:33:13 -07:00
Meng Xu
b7478f5dd3
DD:Add comments to help understand code
...
Add comments to explain the functionalities of some code.
2019-07-22 11:23:16 -07:00
Alex Miller
7a500cd37f
A giant translation of TaskFooPriority -> TaskPriority::Foo
...
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
mpilman
d01cbf3455
Addressed code review comments
2019-04-05 13:12:20 -07:00
mpilman
1c16f87a4e
Remove trace-calls to printable (in non-workloads)
2019-04-05 13:12:19 -07:00
anoyes
981426bac9
More ide fixes
2019-03-05 18:03:57 -08:00
Jingyu Zhou
c38b2a8c38
Change masterId to distributorId in tracker.
...
This reflects the change of moving data distribution out of master server.
2019-02-14 16:37:16 -08:00
Evan Tschannen
4e54690005
Merge branch 'release-6.0'
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/MoveKeys.actor.cpp
2018-11-12 20:26:58 -08:00
Evan Tschannen
cd188a351e
fix: if a destination team became unhealthy and then healthy again, it would lower the priority of a move even though the source servers we are moving from are still unhealthy
...
fix: badTeams were not accounted for when checking priorities
2018-11-11 12:33:31 -08:00
Evan Tschannen
4b5d0b4e2c
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/AsyncFileBlobStore.actor.cpp
# fdbclient/AsyncFileBlobStore.actor.h
# fdbclient/BlobStore.actor.cpp
# fdbclient/BlobStore.h
# fdbclient/HTTP.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/LoadBalance.actor.h
# fdbrpc/batcher.actor.h
# fdbrpc/fdbrpc.vcxproj
# fdbrpc/sim2.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen
e68c07ae35
fix: trackShardBytes was called with the incorrect range, resulting in incorrect shard sizes
...
reduced the size of shard tracker actors by removing unnecessary state variable. Because we have a large number of these actors these extra state variables add up to a lot of memory
2018-11-02 13:03:01 -07:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
A.J. Beamon
2a97139d5d
This is the first step in eliminating the usage of database names in our code. The C API remains the same, but underneath that all usage of database names is eliminated.
2018-08-16 10:24:12 -07:00
Alex Miller
fb31a6999f
Rewrite all files to have #include actorcompiler.h as the last include.
2018-08-14 15:50:26 -07:00
Alex Miller
535b5701e5
Rewrite all `Void _ = wait(...)` -> `wait(...)`.
...
This takes advantage of the new actorcompiler functionality to avoid
having duplicate definitions of `Void _` when trying to feed the
un-actorcompiled source through clang.
2018-08-14 15:50:26 -07:00
Evan Tschannen
6f02ea843a
prevented a slow task when too many shards were sent to the data distribution queue after switching to a fearless deployment
2018-08-09 12:37:46 -07:00
Evan Tschannen
1c29275672
call all methods which could disable a trace event before it is initialized. In practice this means calling .error first, then .suppressFor, then all your details.
2018-08-01 14:30:57 -07:00
Evan Tschannen
392c73affb
fixed a few slow tasks
2018-07-12 14:06:59 -07:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
...
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Evan Tschannen
fa7eaea7cf
fix: shards affected by team failure did not properly handle separate teams for the remote and primary data centers
2018-03-08 10:50:05 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Evan Tschannen
ebd94bb654
removed a separately configurable storage team size for the remote data center, because it did not make sense
...
fix: the master did not monitor for the failure of remote logs
stop merge attempts when a data center is failed
fixed a variety of other problems with data distribution when a data center is failed
2018-02-02 11:46:04 -08:00
Evan Tschannen
c3918d892a
do not use bandwidth splitting on the keyServer shard, lots of sets and clears to this shard generally means you do not want to create additional data distribution work
2017-11-30 18:28:16 -08:00
Evan Tschannen
aa0c2ae317
only increase the max shard size if the shard begins in the keyServer keyspace, do not increase the minimum shard size
2017-10-27 14:22:26 -07:00
Evan Tschannen
3a4078bdda
the keyservers shards are always a fixed large size
2017-10-27 11:52:11 -07:00
Yichi Chiang
53e1ae9f60
shard system keyspace
2017-07-26 13:47:31 -07:00
FDB Dev Team
a674cb4ef4
Initial repository commit
2017-05-25 13:48:44 -07:00