Xin Dong
0b0414fb94
Addressed review comments. Change the issue reporting from 'ITraceLogWriter' to a more generic mechanism.
2020-02-25 15:37:53 -08:00
Xin Dong
034dfe5e42
The inability to flush trace logs is now reported to both 'stderr' and the status json object.
...
- Starting from the first flush failure, if the accumulated consecutive failure count exceeds the value defined in knobs, the current worker process reports this issue via the 'GetServerDBInfo' interface of the cluster controller
- A successful flush will reset the accumulated counter.
Note that the current solution does not take time into account. The assumption is that flush failures tend to happen in clusters. Intermittent but short periods of flush failures are not considered a problem, since the memory pressure they build up should be negligible.
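The counting rule described above can be sketched as follows. This is a minimal illustration and not the code from this commit: the class name FlushFailureTracker, its methods, and the threshold parameter (standing in for the knob value) are all hypothetical.

```cpp
// Minimal sketch of a consecutive-failure counter. All names are hypothetical
// and chosen for illustration only; this is not the FoundationDB code.
class FlushFailureTracker {
public:
    // maxConsecutiveFailures stands in for the value defined in knobs.
    explicit FlushFailureTracker(int maxConsecutiveFailures)
        : threshold(maxConsecutiveFailures) {}

    // Record the outcome of one flush attempt.
    // Returns true when the consecutive failure count exceeds the threshold,
    // i.e. when the issue should be reported.
    bool recordFlush(bool succeeded) {
        if (succeeded) {
            consecutiveFailures = 0; // a successful flush resets the counter
            return false;
        }
        ++consecutiveFailures;
        return consecutiveFailures > threshold;
    }

    int failures() const { return consecutiveFailures; }

private:
    int threshold;
    int consecutiveFailures = 0;
};
```

Because only consecutive failures are counted, a single success between two short bursts of failures keeps the issue from being reported, which matches the assumption that short, intermittent failures are harmless.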
2020-02-25 15:37:32 -08:00
Evan Tschannen
96258b9809
Merge branch 'release-6.2'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbcli/fdbcli.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/FlowTransport.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistribution.actor.h
# fdbserver/DataDistributionQueue.actor.cpp
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/SkipList.cpp
# fdbserver/StorageMetrics.actor.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/fdbserver.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KVStoreTest.actor.cpp
# flow/CMakeLists.txt
# flow/Knobs.cpp
# flow/Knobs.h
# flow/genericactors.actor.cpp
# flow/serialize.h
2020-02-21 19:09:16 -08:00
Andrew Noyes
17660fb18d
Fix simulation test
2020-02-11 13:49:19 -08:00
Meng Xu
3b57bf1781
Merge branch 'master' into mengxu/fast-restore-agent-PR
2020-02-03 17:23:54 -08:00
Meng Xu
559b95c61a
FastRestore:RestoreRole:Mimic how fdbd starts
2020-02-01 10:23:48 -08:00
Vishesh Yadav
6e6cfaff16
Cleanup old Failure Monitoring code
2020-01-07 15:53:32 -08:00
tclinken
8f84fbc4b9
Only print 'waiting for DD to end...' if test actually waits
2019-11-03 16:13:32 -08:00
Evan Tschannen
58b984b846
Merge pull request #2047 from tclinken/lock-aware-db-ping
...
Use lock aware transaction for pingDatabase
2019-10-31 10:24:20 -07:00
Meng Xu
0b785e5c1c
DD:getTeam may fail to get a team when it can
...
Due to randomness, when unhealthy teams are the majority while healthy
teams still exist, the getTeam function may be unlucky and fail to find
any feasible (ok) team, which leads to a BestTeamStuck situation.
This commit increases the number of tries from 10 to 20.
A long-term solution may first find all feasible teams and choose a random
one from them. Since this can affect the statistics of which team is picked,
it is not included in this commit.
Non-functional change: This commit removes unneeded printf introduced by
fast restore PR 1404.
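The two strategies discussed in this message can be contrasted with a small sketch. It is illustrative only and is not the getTeam implementation: the Team struct and both function names are hypothetical, and std::mt19937 stands in for whatever random source the real code uses. It requires C++17 for std::optional.

```cpp
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

// Hypothetical stand-in for a data distribution team.
struct Team {
    int id;
    bool healthy;
};

// Bounded random probing: sample up to maxTries teams and return the first
// healthy one. This can return nothing even when a healthy team exists.
std::optional<Team> pickByRandomTries(const std::vector<Team>& teams,
                                      int maxTries, std::mt19937& rng) {
    if (teams.empty()) return std::nullopt;
    std::uniform_int_distribution<std::size_t> dist(0, teams.size() - 1);
    for (int i = 0; i < maxTries; ++i) {
        const Team& candidate = teams[dist(rng)];
        if (candidate.healthy) return candidate;
    }
    return std::nullopt;
}

// Filter first, then pick uniformly among the feasible teams. This never
// misses a healthy team, but it changes which teams tend to be picked.
std::optional<Team> pickFromFeasible(const std::vector<Team>& teams,
                                     std::mt19937& rng) {
    std::vector<Team> feasible;
    for (const Team& t : teams) {
        if (t.healthy) feasible.push_back(t);
    }
    if (feasible.empty()) return std::nullopt;
    std::uniform_int_distribution<std::size_t> dist(0, feasible.size() - 1);
    return feasible[dist(rng)];
}
```

As a rough guide, assuming independent uniform sampling with a fraction p of healthy teams, the chance that n tries all miss is (1 - p)^n. For an illustrative p = 0.1 this is about 0.35 with 10 tries and about 0.12 with 20, so raising the limit reduces the problem without eliminating it.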
2019-09-07 20:08:58 -07:00
Trevor Clinkenbeard
2f92cf8c96
Use lock aware transaction for pingDatabase
2019-09-07 12:25:44 -07:00
Meng Xu
c2355f721e
Merge branch 'master' into mengxu/performant-restore-PR
2019-09-04 17:11:42 -07:00
Andrew Noyes
6aa0ada7b1
Replace scalar root types with proper messages
2019-08-28 14:40:50 -07:00
Meng Xu
3b54363780
FastRestore:Apply Clang-format
2019-08-01 18:09:12 -07:00
Meng Xu
7ccaeddf05
Merge branch 'master' into mengxu/performant-restore-PR
2019-08-01 13:23:17 -07:00
Evan Tschannen
90e3b50213
Merge branch 'master' into feature-coordinator-connection
...
# Conflicts:
# fdbclient/DatabaseContext.h
# fdbclient/NativeAPI.actor.cpp
# fdbclient/NativeAPI.actor.h
# fdbserver/workloads/KillRegion.actor.cpp
2019-07-26 15:05:02 -07:00
Meng Xu
45083edf74
Merge branch 'master' into mengxu/performant-restore-PR
...
Fix conflicts as well.
2019-07-25 10:46:11 -07:00
sramamoorthy
c18558cf55
enable DD mode in restore based on test spec
2019-07-24 15:36:28 -07:00
sramamoorthy
62c14dae72
disable dd during snap and enable in restore
2019-07-24 15:36:28 -07:00
Evan Tschannen
4a866290b7
Clients keep a persistent connection open with coordinators to get updates to the list of proxies
...
Status still needs to be updated with client information from the coordinators
2019-07-23 19:22:44 -07:00
A.J. Beamon
a5a6f8431c
Add a random UID to TransactionMetrics in case a client opens multiple connections and also a field to indicate whether the connection is internal. Convert some of the metrics to our Counter object instead of running totals.
2019-07-08 14:01:04 -07:00
A.J. Beamon
603721e125
Merge branch 'master' into thread-safe-random-number-generation
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/AsyncFileCached.actor.h
# fdbrpc/genericactors.actor.cpp
# fdbrpc/sim2.actor.cpp
# fdbserver/DiskQueue.actor.cpp
# fdbserver/workloads/BulkSetup.actor.h
# flow/ActorCollection.actor.cpp
# flow/Net2.actor.cpp
# flow/Trace.cpp
# flow/flow.cpp
2019-05-23 08:35:47 -07:00
mpilman
f6fbad5061
Fix memory bug
2019-05-13 14:15:23 -07:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
Austin Seipp
bf378952cb
fdbserver: fix some print/scan format warnings
...
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Meng Xu
529ce66b6c
Merge branch 'apple/master' into mengxu/performant-restore-PR
2019-04-18 18:02:45 -07:00
Andrew Noyes
ef04471a66
Fix more unused-variable warnings
2019-04-17 16:04:10 -07:00
mpilman
1c16f87a4e
Remove trace-calls to printable (in non-workloads)
2019-04-05 13:12:19 -07:00
Meng Xu
70d7c289f4
Merge branch 'master' into mengxu/restore/parallel-v7
2019-03-30 22:13:10 -07:00
Balachandar Namasivayam
0bbdc15f71
The multi-test process waits until a timeout if any of the tester processes restarts. Use getReplyUnlessFailedFor instead of getReply to detect restarts and fail quickly instead of waiting for the timeout, which is usually large.
2019-03-28 17:05:30 -07:00
Evan Tschannen
710a64dc4e
replaced std::pair<WorkerInterface,ProcessClass> with a struct named WorkerDetails
2019-03-08 11:25:07 -05:00
A.J. Beamon
e2bcecb08f
Merge branch 'master' into ratekeeper-batch-priority-limits
2019-02-28 12:52:44 -08:00
Evan Tschannen
fafb66b0a8
Merge pull request #1126 from bnamasivayam/ratelimit-consistencycheck
...
Dynamically rate limit consistency check.
2019-02-27 14:43:09 -08:00
A.J. Beamon
a051055caf
Initial implementation of adding separate limits for batch priority in ratekeeper
2019-02-27 10:31:56 -08:00
mpilman
999ea09bfd
Use correct fwd decls in TesterInterface
...
Also TesterInterface.h -> TesterInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
699216f713
Use fwd decls in workloads
...
Also workloads.h -> workloads.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3f0fd2a20c
Use fwd decls in WorkerInterface
...
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
0bb60e5a3b
Use proper fwd decl in NativeAPI
...
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3cb2391b58
use proper fwd declarations in ManagementAPI
...
Also ManagementAPI.h -> ManagementAPI.actor.h
2019-02-19 15:16:59 -08:00
Vishesh Yadav
907446d0ce
Merge remote-tracking branch 'apple/master' into task/tls-upgrade
2019-02-14 11:37:38 -08:00
Balachandar Namasivayam
f44f26c232
Dynamically rate limit consistency check.
2019-02-07 16:08:39 -08:00
Meng Xu
550f2e2682
Merge with master to use the latest backup container
...
In fdb 6.0.15, the backup container changed how it organizes the backup data.
A backup made by fdb > 6.0.15 has to be restored with fdb > 6.0.15.
Merge with master so that the fast restore uses fdb > 6.0.15
2019-01-30 12:05:15 -08:00
Evan Tschannen
1d7fec3074
Merge commit '048bfc5c368063d9e009513078dab88be0cbd5b0' into task/tls-upgrade-2
...
# Conflicts:
# .gitignore
2019-01-24 17:43:06 -08:00
mpilman
79637f07ac
Fixed several minor code issues
...
These will become a problem as soon as we
switch to C++17
2019-01-24 14:43:12 -08:00
Meng Xu
c91d143504
BugFix: master should wait until at least 2 workers have registered their interfaces
...
otherwise, when master proceeds to distribute workload, it will find 0 loaders or appliers, which violates the invariant
2019-01-10 19:56:19 -08:00
Balachandar Namasivayam
baeaa490e4
Add some sanity checks to tester.actor.cpp
2019-01-10 11:05:50 -08:00
Vishesh Yadav
3eb9b23024
Listen to multiple addresses and start using vector<NetworkAddress> in Endpoint
...
- This patch makes FDB listen to multiple addresses given via the
command line. Although we'll still use the first address in most places,
this patch starts using vector<NetworkAddress> in Endpoint in some basic
places.
- When sending packets to an endpoint, pick a random network address in
endpoints
- Renames Endpoint::address to Endpoint::addresses since it
now holds a vector of addresses.
2018-12-13 13:36:52 -08:00
Vishesh Yadav
43e5a46f9b
Change Endpoint::address(NetworkAddress) to vector<NetworkAddress>
...
Extend `Endpoint` class to take multiple NetworkAddresses instead of
just one. Hence, to talk to an endpoint instead of one IP:PORT, we'll
have multiple IP:PORT pairs.
This patch simply adds the field and makes the changes needed to compile
the codebase. The first element of the `address` field is used everywhere.
Hence the way we talk to an endpoint remains the same with this patch.
NOTE:
Directly accessing the first member of Endpoint::address is unsafe,
as Endpoint() doesn't enforce a non-empty address list. However, since
the correctness tests pass for now and we are replacing all those
unsafe accesses with ones that consider the whole vector anyway, this
patch does not make those accesses safe.
2018-12-13 13:36:52 -08:00
Meng Xu
f27a7f20ac
print timeout value
2018-12-07 21:24:28 -08:00
Meng Xu
80b2f75187
debug why restore did not restore the complete data
2018-12-03 19:29:17 -08:00