Commit Graph

233 Commits

Author SHA1 Message Date
Evan Tschannen 3325980c03 Merge branch 'release-6.2'
# Conflicts:
#	CMakeLists.txt
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/OldTLogServer_6_0.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/WorkerInterface.actor.h
#	fdbserver/worker.actor.cpp
#	versions.target
2019-10-24 17:38:15 -07:00
A.J. Beamon a1bed51d34 Ignore batch priority GRVs for latency band tracking 2019-10-23 10:29:58 -07:00
Evan Tschannen 12c517ab16 limit the number of committed version updates in progress simultaneously to prevent running out of memory 2019-10-21 16:01:45 -07:00
Jon Fu d2b6626d5c Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-10-21 13:47:06 -07:00
Evan Tschannen 688940b685 merge 6.2 into master 2019-10-21 11:43:46 -07:00
Evan Tschannen ac28e96bbf added a yield on the proxy to remove a slow task when processing large transactions 2019-10-16 14:31:59 -07:00
Jon Fu 450a09e117 Code Review Changes 2019-09-24 15:48:50 -07:00
Jon Fu 40ad6f0931 created new reply types to work with flatbuffer serialization requirements 2019-09-18 13:40:18 -07:00
Jon Fu 471e283128 Merge branch 'master' of https://github.com/apple/foundationdb into mark-ss-failed 2019-09-18 11:49:07 -07:00
Jingyu Zhou 02230e4ef6 Refactor: moving addition of backup mutations out of commitBatch
commitBatch() is too complicated and should be simplified.
2019-09-12 18:37:08 -07:00
Evan Tschannen dcbb19816b
Merge pull request #1999 from ajbeamon/fix-proxy-grv-budgeting
The master proxy was too slow to erase a GRV budget deficit if no GRV requests were coming in.
2019-08-30 13:32:54 -07:00
sramamoorthy 5d87443323 improved error msgs for snapshot cmd 2019-08-27 16:43:52 -07:00
Jon Fu 3666c0c776 added more trace lines and added timeout to safety check in test workload 2019-08-27 14:39:44 -07:00
Jon Fu 04d514c483 added a wait to check for master proxies changed and put in a few more trace events 2019-08-27 14:39:44 -07:00
Jon Fu e515691d7d do not continuously loop if maybe_request_delivered 2019-08-27 14:39:44 -07:00
Jon Fu 00c2025d4b fixed removeKeys impl, adjusted test workload, and introduced extra safety checks to NativeAPI and proxy 2019-08-27 14:39:44 -07:00
Jon Fu 5a877d6b14 added safety check on client to prevent removing all servers from a team 2019-08-27 14:39:43 -07:00
A.J. Beamon 0b1fc91a9c Revert "Don't grow the budget deficit once it's exceeded some number of seconds of transactions. Decay the deficit if the rate changes and it exceeds the new limit."
This reverts commit 90cb73d472.
2019-08-22 10:05:29 -07:00
A.J. Beamon 90cb73d472 Don't grow the budget deficit once it's exceeded some number of seconds of transactions. Decay the deficit if the rate changes and it exceeds the new limit. 2019-08-19 14:56:59 -07:00
A.J. Beamon b2af17fb08 Simplify logic by removing an unneeded condition. 2019-08-15 08:23:13 -07:00
A.J. Beamon 717ede25b3 Fix: the master proxy was too slow to erase a GRV budget deficit if no GRV requests were coming in. 2019-08-14 15:01:09 -07:00
Evan Tschannen ba54508c47 code cleanup 2019-08-06 16:30:30 -07:00
Evan Tschannen 4c9a392f05 the master checks the popped version of the txsTag before recovering the txnStateStore, to avoid restoring data that is later found to be popped 2019-08-05 17:01:48 -07:00
Andrew Noyes 1bad0fd44e Make requestTime private 2019-07-31 17:59:35 -07:00
Evan Tschannen 6dbaddd0a7 Added a knob to always use CAUSAL_READ_RISKY for GRV 2019-07-30 18:21:46 -07:00
sramamoorthy 63941e0d96 disable DD with a in-memory flag and use in snapv2 2019-07-30 17:04:51 -07:00
Evan Tschannen 5c98dcce6d revert the proxy forwarding path, because it is no longer necessary as clients keep a persistent connection open with coordinators 2019-07-27 16:46:22 -07:00
Evan Tschannen b509a441e7 Merge branch 'master' into feature-skip-confirm
# Conflicts:
#	bindings/flow/tester/Tester.actor.cpp
#	bindings/go/src/_stacktester/stacktester.go
#	bindings/java/src/test/com/apple/foundationdb/test/AsyncStackTester.java
#	bindings/java/src/test/com/apple/foundationdb/test/StackTester.java
#	bindings/python/tests/tester.py
#	bindings/ruby/tests/tester.rb
#	documentation/sphinx/source/api-c.rst
#	documentation/sphinx/source/api-python.rst
#	documentation/sphinx/source/api-ruby.rst
#	documentation/sphinx/source/data-modeling.rst
#	documentation/sphinx/source/developer-guide.rst
#	fdbclient/vexillographer/fdb.options
#	fdbserver/MasterProxyServer.actor.cpp
2019-07-27 15:08:13 -07:00
sramamoorthy 9afd162e2f remove snap v1 related code 2019-07-25 17:29:31 -07:00
sramamoorthy a65c9f92ed get rid of all timeouts and other changes 2019-07-24 15:36:28 -07:00
sramamoorthy a2f2ad96ff code review comments and merge to master changes 2019-07-24 15:36:28 -07:00
sramamoorthy 869f77aef1 Few cosmetic edits and fixes 2019-07-24 15:36:28 -07:00
sramamoorthy 62c14dae72 disable dd during snap and enable in restore 2019-07-24 15:36:28 -07:00
sramamoorthy 95d6807740 tryGetReply instead of getReply for ddSnapReq 2019-07-24 15:36:28 -07:00
sramamoorthy d0793f5ca2 snap v2: master proxy related changes 2019-07-24 15:36:28 -07:00
Jingyu Zhou 63e37aebaf Reorder include files. 2019-07-19 11:24:26 -07:00
Jingyu Zhou d8fb1ea2d3 Log large transactions at proxy
This can help debugging where large transactions are coming from.
2019-07-19 11:10:48 -07:00
Evan Tschannen d4c332e2cf update lastCommitTime when doing GRV’s without causal read risky 2019-07-14 14:46:00 -07:00
Evan Tschannen 02de53160d only skip confirm epoch live if CAUSAL_READ_RISKY is enabled
time checked on the proxy should be less than the time waited by the master to account for clock speed differences
setting REQUIRED_MIN_RECOVERY_DURATION and ENFORCED_MIN_RECOVERY_DURATION to 0 will go back to the old behavior
2019-07-12 17:58:16 -07:00
Evan Tschannen a63969afb3 enforce a minimum recovery duration, which allows proxies to avoid checking if the epoch is alive as long as its last commit has been less than MINIMUM_RECOVERY_DURATION ago 2019-07-12 13:10:21 -07:00
Evan Tschannen d8948c8be1 Merge branch 'master' into feature-fast-txs-recovery
# Conflicts:
#	fdbserver/TagPartitionedLogSystem.actor.cpp
2019-07-10 13:59:52 -07:00
Evan Tschannen 001abec29d fixed a compiler error, buggified a new knob 2019-07-09 16:50:59 -07:00
Evan Tschannen 64aee73c4f we only need to hold the ReplyPromise for messages that we are going to forward to new proxies 2019-07-09 16:47:56 -07:00
Evan Tschannen c348b3da51 After a proxy dies, it will remain alive for an additional 10 seconds to forward clients to the new proxies 2019-07-08 12:53:40 -07:00
Jingyu Zhou 50e7593c5b
Merge pull request #1796 from ajbeamon/remove-trace-event-underscores
Remove trace event underscores
2019-07-05 21:45:55 -07:00
Evan Tschannen 15e894c724 Merge in master 2019-07-05 15:49:24 -07:00
A.J. Beamon 9f4b6fd770 Remove additional underscores 2019-07-05 08:12:25 -07:00
Alex Miller 7a500cd37f A giant translation of TaskFooPriority -> TaskPriority::Foo
This is so that APIs that take priorities don't take ints, which are
common and easy to accidentally pass the wrong thing.
2019-06-25 02:47:35 -07:00
Evan Tschannen e0be631414 shard the txs tag so that more transaction logs are involved in its recovery 2019-06-19 18:15:09 -07:00
sramamoorthy 2a68b28590 rebase related changes 2019-05-28 22:07:46 -07:00
sramamoorthy b17ad85497 exec op not supported when log_anti_quorum > 0 2019-05-28 22:07:46 -07:00
sramamoorthy d3a179b6f9 Multiple bug fixes
- wait for snapTLogFailKeys in a loop, otherwise in some race
  condition it can cause a false assert
- in single region, there does not seem to be a guarantee of
  tagLocalityListKey for a given DC ID, avoiding that assert for now
- to find the workers that are coordinators, looking up by primary
  address is not sufficient in some cases, hence looking by both
  primary and secondary address
- test make files to reflect the location of the new test cases
2019-05-28 22:07:46 -07:00
sramamoorthy bb474dc323 if recovery < fully_recovered then fail the exec
Will do more cleanup, pushing it for a test run in CI
2019-05-28 22:07:46 -07:00
sramamoorthy 925499954b New status cluster_not_fully_recovered 2019-05-28 22:07:46 -07:00
sramamoorthy 6f42337c09 TransactionNotPermitted instead of conflict error
When the cluster has not recovered completely, return op not
permitted instead of conflict error
2019-05-28 22:07:46 -07:00
sramamoorthy 4083af0b01 Avoid using trackLatest for TLog pop test cases 2019-05-28 22:07:46 -07:00
sramamoorthy 936ffc2dde rebase related changes 2019-05-28 22:07:46 -07:00
sramamoorthy ec7834e2f7 code re-orgnaization and address comments 2019-05-28 22:07:46 -07:00
sramamoorthy c76cc84ded execute coordinators code reorganized 2019-05-28 22:07:46 -07:00
sramamoorthy 61e93a9304 Address review comments and minor fixes 2019-05-28 22:07:46 -07:00
sramamoorthy 00ccee8a6c workaround for log giving remote log and others
logSystemConfig.allLocalLogs() sometimes returns remote TLog interface
and a workaround is implemented here. Other minor cleanup.
2019-05-28 22:07:46 -07:00
sramamoorthy 89b7a052f5 Bug fixes for snapping coordinators 2019-05-28 22:07:46 -07:00
sramamoorthy 17ecba8313 trace cleanup and other indentation changes 2019-05-28 22:07:46 -07:00
sramamoorthy 898bed66c1 Allow only whitelisted binary path for exec op 2019-05-28 22:07:46 -07:00
sramamoorthy d282016f93 Exec op to tag only local storage nodes 2019-05-28 22:07:46 -07:00
sramamoorthy 6431513ad0 Fail exec req until the cluster is fully_recovered 2019-05-28 22:07:46 -07:00
sramamoorthy 4016f16c76 Fix few compilation and bugs in rebase 2019-05-28 22:07:46 -07:00
sramamoorthy 4bc4c615da exec op to all tlog, restore change in test &other
- exec operation to go to all the TLogs
- minor bug fix in tlog
- restore implementation for the simulator
- restore snap UID to be stored in restartInfo.ini
- test cases added
- indentation and trace file fixes
2019-05-28 22:07:46 -07:00
sramamoorthy 72dd067173 Trace message changes and fix few FIXMEs 2019-05-28 22:07:46 -07:00
sramamoorthy 69edefe68b Snapshot based backup and resotre implementation 2019-05-28 22:07:46 -07:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
Jingyu Zhou e193cac5ef Merge remote-tracking branch 'apple/master' into tlog
Resolve Conflicts: fdbserver/MasterProxyServer.actor.cpp
2019-05-01 17:18:00 -07:00
Evan Tschannen 2d5043c665 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-04-30 18:27:04 -07:00
Evan Tschannen 6c77864731 separate GetStorageServerRejoinInfoRequest from GetKeyServerLocationsRequest, to avoid yielding for the rejoin requests 2019-04-25 17:07:35 -07:00
Jingyu Zhou 439d5a3843 Use emplace_back instead of push_back in Proxy 2019-04-22 14:03:48 -07:00
Jingyu Zhou d2b215b926 Refactor tag population of ServerCacheInfo 2019-04-22 11:55:04 -07:00
Jingyu Zhou 966ec30fcc Add pseudoLocalities for special tag consumers 2019-04-21 10:41:07 -07:00
Jingyu Zhou b4e7e7a85b Refactor StorageCache updates 2019-04-21 10:41:07 -07:00
Evan Tschannen 6220a5ce0f
Merge pull request #1370 from jzhou77/fix-unreferenced
Remove unused functions
2019-04-09 11:49:45 -07:00
mpilman 1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
Jingyu Zhou 47b4b82628
Merge branch 'master' into fix-unreferenced 2019-04-01 14:07:19 -07:00
Jingyu Zhou f7f8ddd894 Fix warnings on unused variables
Found by -Wunused-variable flag.
2019-04-01 14:00:20 -07:00
Evan Tschannen b6008558d3 renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>()
eliminated an unnecessary copy from the proxy commit path
eliminated an unnecessary copy from buffered peek cursor
2019-03-28 11:52:50 -07:00
Evan Tschannen 87e2a1a029 The proxy budget is implemented to let one request over its limit through, and then pay back what was over the limit in the next update 2019-03-18 16:09:57 -07:00
Evan Tschannen ec6c843124 increased the GRV client batch size, similarly increased the proxy limits related to the number of transactions started in a batch 2019-03-16 16:18:58 -07:00
Evan Tschannen a7e45cff91
Merge pull request #1176 from jzhou77/ratekeeper
Make Ratekeeper a separate role
2019-03-12 15:58:59 -07:00
Jingyu Zhou dc129207a9 Minor fix after rebase. 2019-03-07 13:16:20 -08:00
Jingyu Zhou 3c86643822 Separate Ratekeeper from data distribution.
Add a new role for ratekeeper.

Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
Evan Tschannen 988add9fb5 cache the metadataVersion for commits, so that doing setVersion with the commit version of a different transaction will 2019-03-04 16:48:34 -08:00
Evan Tschannen 075fdef31a Merge branch 'master' into feature-metadata-version
# Conflicts:
#	fdbclient/DatabaseContext.h
2019-03-03 22:58:45 -08:00
Trevor Clinkenbeard 39f612d132 Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics 2019-03-02 17:07:00 -08:00
Evan Tschannen 841eb10af0
Replace g_network->now() with now()
Co-Authored-By: tclinken <trevorclinkenbeard@gmail.com>
2019-03-02 08:15:56 -08:00
Evan Tschannen a37a65a7ff
Replace g_network->now() with now()
Co-Authored-By: tclinken <trevorclinkenbeard@gmail.com>
2019-03-02 08:15:05 -08:00
Evan Tschannen 3da85f3acd implemented the \xff/metadataVersion key, which can be used by layers to help them cheaply cache metadata and know when their cache is invalid 2019-02-28 17:45:00 -08:00
A.J. Beamon 3e6a6a6569 Update status schema for correctness. Send the count of batch transactions started back to ratekeeper so that it can be logged with other ratekeeper metrics. 2019-02-28 12:00:58 -08:00
A.J. Beamon d48a2d7b54 Fix problem caused by merge 2019-02-27 12:21:07 -08:00
A.J. Beamon a051055caf Initial implementation of adding separate limits for batch priority in ratekeeper 2019-02-27 10:31:56 -08:00
Trevor Clinkenbeard abfe057805 Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics 2019-02-25 13:47:16 -08:00
Trevor Clinkenbeard 07f800eeee Got rid of detailed field in GetRateInfoReply message 2019-02-23 17:52:11 -08:00
Trevor Clinkenbeard f3a73963b4 Got rid of detailedLeaseDuration in GetRateInfoReply message 2019-02-23 16:42:11 -08:00