Commit Graph

152 Commits

Author SHA1 Message Date
Evan Tschannen e0be631414 shard the txs tag so that more transaction logs are involved in its recovery 2019-06-19 18:15:09 -07:00
Alex Miller 51fd42a4d2 Merge remote-tracking branch 'upstream/master' into spilled-only-peek 2019-06-18 17:33:52 -07:00
sramamoorthy 1190f2f33d rebased related changes 2019-05-28 22:07:46 -07:00
sramamoorthy b43c100e57 TLog bug fixes 2019-05-28 22:07:46 -07:00
sramamoorthy 3877f87481 comment change in tLogCommit 2019-05-28 22:07:46 -07:00
sramamoorthy 31b6c86650 ignorePopDeadline to have high limit in simulator
- ignorePopDeadline to have highier limit in simulator
to accommdate for the buggify delays and make snapshot succeed.

- introduce a new knob for auto resetting the disabling of tlog pop
2019-05-28 22:07:46 -07:00
sramamoorthy b1b96946af logData->stop check right after execOpHold wait 2019-05-28 22:07:46 -07:00
sramamoorthy 5749e220bd use FlowLock for implementing critical section
Instead of using Promises and future to implement
critcal section use FlowLock
2019-05-28 22:07:46 -07:00
sramamoorthy e6c0b87a4d remove unused variable 2019-05-28 22:07:46 -07:00
sramamoorthy f27a40f118 execProcessingHelper made synchronous
tLogCommit exects no blocking between duplicate check and
setting of the new version, that constraint was broken
when synchronous execProcessingHelper was introduced.
As a fix, execProcessingHelper was made asynchronous.
2019-05-28 22:07:46 -07:00
sramamoorthy d3a179b6f9 Multiple bug fixes
- wait for snapTLogFailKeys in a loop, otherwise in some race
  condition it can cause a false assert
- in single region, there does not seem to be a guarantee of
  tagLocalityListKey for a given DC ID, avoiding that assert for now
- to find the workers that are coordinators, looking up by primary
  address is not sufficient in some cases, hence looking by both
  primary and secondary address
- test make files to reflect the location of the new test cases
2019-05-28 22:07:46 -07:00
sramamoorthy dcd2d96751 make spawnProcess predictable in the simulator 2019-05-28 22:07:46 -07:00
sramamoorthy 4083af0b01 Avoid using trackLatest for TLog pop test cases 2019-05-28 22:07:46 -07:00
sramamoorthy ec7834e2f7 code re-orgnaization and address comments 2019-05-28 22:07:46 -07:00
sramamoorthy b6e037ffbc Replace fork with boost::process::child 2019-05-28 22:07:46 -07:00
sramamoorthy e91c76834e tlog: move snap create part to indepdendent funcs 2019-05-28 22:07:46 -07:00
sramamoorthy 61e93a9304 Address review comments and minor fixes 2019-05-28 22:07:46 -07:00
sramamoorthy 9e3104c2d4 Fix: races in async exec leading to bad backup 2019-05-28 22:07:46 -07:00
sramamoorthy cfdad0c5e6 tlog to snapshot exactly at exec version 2019-05-28 22:07:46 -07:00
sramamoorthy 539e65efad Skip parsing mutations if it is tagged for TxsTag
In Tlog, if a mutation is targetted for TxsTag then skip from
parsing them.
2019-05-28 22:07:46 -07:00
sramamoorthy 17ecba8313 trace cleanup and other indentation changes 2019-05-28 22:07:46 -07:00
sramamoorthy aa79480d69 changes to make fdbfork asynchronous 2019-05-28 22:07:46 -07:00
sramamoorthy 4016f16c76 Fix few compilation and bugs in rebase 2019-05-28 22:07:46 -07:00
sramamoorthy 3d5998e9dd tlog: when pops are disabled, store them & replay
In Tlogs, disable pop is done whlie taking snapshots. Earlier, tlogs
were ignoring the pops if it got pop requests when pops were
disabled. In this change, instead of ignoring the pop - it remembers
the list of pops in-memory and plays them once the popping is
enabled.
2019-05-28 22:07:46 -07:00
sramamoorthy 4bc4c615da exec op to all tlog, restore change in test &other
- exec operation to go to all the TLogs
- minor bug fix in tlog
- restore implementation for the simulator
- restore snap UID to be stored in restartInfo.ini
- test cases added
- indentation and trace file fixes
2019-05-28 22:07:46 -07:00
sramamoorthy 72dd067173 Trace message changes and fix few FIXMEs 2019-05-28 22:07:46 -07:00
sramamoorthy 69edefe68b Snapshot based backup and resotre implementation 2019-05-28 22:07:46 -07:00
A.J. Beamon f417e60264 Merge branch 'merge-release-6.1-into-master' into thread-safe-random-number-generation
# Conflicts:
#	fdbserver/QuietDatabase.actor.cpp
2019-05-23 09:52:00 -07:00
A.J. Beamon d29c7e4c9b Merge branch 'release-6.1' into merge-release-6.1-into-master
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/QuietDatabase.actor.cpp
#	versions.target
2019-05-23 09:28:45 -07:00
Evan Tschannen 4059d68348 fix: the tlog would not pop data from the disk queue after a storage server was removed, because the tag still exists in memory on the logs
fix: we could incorrectly make data durable if eraseMessagesFromMemory was in progress while running updatePersistentData
the quiet database check now ensure that tlogs have no more than 30 seconds of versions unpopped from the disk queue
2019-05-20 23:58:45 -07:00
Alex Miller 4eb4c03ce5 Save TLog resources by letting peek request only spilled data.
If a peek is entirely fulfilled from spilled data, then it's likely that
the next peek will be also.  It is thus wasteful for each of these peeks
to call peekMessagesFromMemory, which memcpy's excessively, and then
throw all that data away without using it.

Now, TLogs will give a hint back to peek cursors about if the provided
reply was served entirely from the spilled data, which peek curors then
feed back as the hint into their next request.

At some point, a cursor will send a request for only spilled data, get
an incomplete response, and then be told to send its next request as one
that peeks from memory as well, and then it will fully catch up.
2019-05-14 15:38:48 -10:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
Evan Tschannen 22499666d0 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/LogRouter.actor.cpp
#	flow/Trace.cpp
#	versions.target
2019-05-08 18:19:35 -07:00
Evan Tschannen 93eb2a9395
Merge pull request #1527 from alexmiller-apple/tstlog-6.1
Spill-by-reference knob + TLog6.0 Spilled Peek deprioritization
2019-05-03 17:19:45 -07:00
Alex Miller c918b21137 Deprioritize spilled peeks in spill-by-value, and improve its logic.
This deprioritizes before calling peekMessagesFromMemory, which should
improve the memory usage of the TLog, and makes sure to keep txsTag
peeks at a high priority to help recoveries stay fast.
2019-05-03 15:27:11 -07:00
Evan Tschannen 8590b710bf added additional logging on the logs and log routers 2019-05-02 17:24:39 -07:00
Jingyu Zhou 8b5449e608 Fix review comments for PR #1473 2019-04-29 16:45:42 -07:00
Jingyu Zhou 5462f560e7 Add pseudo locality for log routers and tlogs
This changes the logic of pop operations from log routers (LG):
- LG pops tagLocalityLogRouterMapped from TLogs;
- TLog converts tagLocalityLogRouterMapped back to tagLocalityLogRouter before
  popping.

Later when we add more psuedo localities, the same pattern can be used.
2019-04-23 21:35:56 -07:00
Jingyu Zhou 0b1984978a Small code refactoring. 2019-04-21 10:41:07 -07:00
Jingyu Zhou ec1bc5cfca Add LogSystemType enum 2019-04-21 10:41:07 -07:00
mpilman 1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
Evan Tschannen 31ed73d9f5 Ported the bug fix https://github.com/apple/foundationdb/pull/1379 to OldTLogServer_6_0 2019-04-02 15:27:37 -07:00
Evan Tschannen b6008558d3 renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>()
eliminated an unnecessary copy from the proxy commit path
eliminated an unnecessary copy from buffered peek cursor
2019-03-28 11:52:50 -07:00
Evan Tschannen 1c6ad6d307 fix: change the location where stopped is checked, because a yield could cause cause stopped to be set after the existing check 2019-03-20 19:33:09 -07:00
Evan Tschannen 5873705228 tlog commits very rarely take an additional 6 seconds 2019-03-11 12:11:17 -07:00
Evan Tschannen 80c3f2f8e2 added status fields detailing which processes are degraded, and also the total number of degraded processes 2019-03-10 22:58:15 -07:00
Evan Tschannen 044b6b4f8a Merge branch 'master' into feature-degraded-tlog
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
2019-03-08 22:50:41 -05:00
Evan Tschannen 53f16b5347 when a tlog queue commit takes longer than 5 seconds, its process is marked as degraded 2019-03-08 11:46:34 -05:00
Alex Miller c6a65389ae Remove noexcept macro and replace with BOOST_NOEXCEPT.
BOOST_NOEXCEPT does what the noexcept macro was supposed to do, but in a
way that is correctly maintained over time.
2019-03-05 22:06:12 -08:00
Alex Miller 2aa527c0ef Fix a bug resulting from concurrent TLog changes.
TLogServer was forked into OldTLogServer_6_0 at the same time that
3247d594 modified TLogServer, so the modification never made it into
OldTLogServer_6_0, resulting in a rare failure.

Manual code inspection revealed that there was also
78976161 that concurrently modified TLogServer, so that change was
copied to OldTLogServer_6_0 as well.
2019-03-04 01:42:38 -08:00
Alex Miller fa1bfbc0c5 Replace TLogSpillType with TLogVersion in worker and filenames. 2019-02-19 22:30:15 -08:00
Alex Miller df61bd07db Save an (unused) copy of the previous TLog. 2019-02-19 22:18:10 -08:00