foundationdb

Commit Graph

Author	SHA1	Message	Date
Alex Miller	6742222084	Make TLogServer able to spill by value and by reference ...and test it in simulation, but not combined yet. It turns out that because of txsTag, we basically had to support spill-by-value anyway. Thus, if we treat all tags like txsTag when spilling and peeking, then we have an easy way to bring the two spilling types back into one implementation.	2019-10-03 01:45:10 -07:00
Alex Miller	d38a96ab73	Make LogData aware of the spill type it was created to perform. The spilling type is now pulled out of the request, and then stored on LogData for later access, and persisted in the tlog metadata per tlog generation. It turns out that serializing types as Unversioned is a bit wonky.	2019-10-03 01:45:10 -07:00
Evan Tschannen	b495cc697b	Merge branch 'release-6.2' # Conflicts: # CMakeLists.txt # documentation/sphinx/source/release-notes.rst # versions.target	2019-09-13 09:25:08 -07:00
Alex Miller	53bcf41805	Fix the build.	2019-09-12 18:46:30 -07:00
Alex Miller	befa0646b3	Merge remote-tracking branch 'upstream/release-6.2' into faster-remote-dc	2019-09-12 18:46:03 -07:00
Evan Tschannen	6a7f109788	added logging on the TLog for the tag with smallest popped version	2019-09-12 16:22:01 -07:00
Alex Miller	99843bd4ba	Add parallel peek support to log routers	2019-09-12 14:26:37 -07:00
Evan Tschannen	94668c6f1f	Merge pull request #2063 from jzhou77/clang Refactor deserialization of on-wire buffer with TagsAndMessage	2019-09-09 16:34:56 -07:00
Jingyu Zhou	2d5ebebb7b	Use TagsAndMessage for deserialization in TLogServer	2019-09-05 16:53:10 -07:00
Jingyu Zhou	2723922f5f	Replace -1 as VERSION_HEADER constant for serialization	2019-09-05 12:45:39 -07:00
Meng Xu	c2355f721e	Merge branch 'master' into mengxu/performant-restore-PR	2019-09-04 17:11:42 -07:00
Jingyu Zhou	cd3f1e33d4	Refactor deserialization of TagsAndMessages Consolidate deserialization of TagsAndMessages in the structure itself and change both TLog and ServerPeekCursor to use it.	2019-09-04 14:55:05 -07:00
Evan Tschannen	24aad14f06	Merge branch 'release-6.2' # Conflicts: # CMakeLists.txt # documentation/sphinx/source/release-notes.rst # versions.target	2019-08-30 17:23:58 -07:00
Evan Tschannen	dc1d055b27	Merge pull request #2042 from senthil-ram/snap_cli_fix fix fdbcli --exec 'snapshot create.sh' failure	2019-08-30 13:40:38 -07:00
sramamoorthy	b3277f2982	Fix #2009 posix compliant args for snapshot binary	2019-08-30 12:54:09 -07:00
Andrew Noyes	6aa0ada7b1	Replace scalar root types with proper messages	2019-08-28 14:40:50 -07:00
Jingyu Zhou	4a63de16e9	Merge pull request #1945 from xumengpanda/mengxu/tLog-code-read-v2 Add comments to DiskQueue and tLog	2019-08-08 13:24:32 -07:00
Meng Xu	c9c50ceff8	Comments: Add comments to DiskQueue. No functional change.	2019-08-01 15:20:01 -07:00
Meng Xu	7ccaeddf05	Merge branch 'master' into mengxu/performant-restore-PR	2019-08-01 13:23:17 -07:00
Evan Tschannen	3774ff55b0	There were still use cases where these checks are necessary	2019-07-31 17:45:21 -07:00
Evan Tschannen	854ee75664	we no longer need to special case for txs tag, because it will be initialized by createTagData	2019-07-31 17:13:15 -07:00
Evan Tschannen	ff171e293e	fix: always make sure to add txsTags to localTags for remote logs	2019-07-31 16:04:35 -07:00
Evan Tschannen	9f11f2ec53	Merge branch 'master' of github.com:apple/foundationdb	2019-07-30 16:55:56 -07:00
Evan Tschannen	aaeeb605b2	Changes to degraded can cause master recoveries, which are not supposed to happen when speedUpSimulation is true	2019-07-30 16:33:40 -07:00
Evan Tschannen	6977e7d2e8	do not return recovered version as popped for txsTags because it could cause recovery to start over. Optimized how the buffered peek cursor discards popped data	2019-07-30 12:21:48 -07:00
Evan Tschannen	13203da199	fix: do not set the popped version of txsTag because it could be copied over at the recoveredAt version	2019-07-27 22:36:06 -07:00
Evan Tschannen	28df2c35bb	Merge pull request #1855 from alexmiller-apple/sharded-txs-safe-upgrade Make sharded txsTag upgradeable and downgradeable	2019-07-26 13:29:39 -07:00
Meng Xu	1706aaf199	Merge branch 'master' into mengxu/performant-restore-PR Fix conflict in TLogServer.actor.cpp by accepting master changes	2019-07-26 11:46:27 -07:00
sramamoorthy	9afd162e2f	remove snap v1 related code	2019-07-25 17:29:31 -07:00
Meng Xu	45083edf74	Merge branch 'master' into mengxu/performant-restore-PR Fix conflicts as well.	2019-07-25 10:46:11 -07:00
sramamoorthy	a65c9f92ed	get rid of all timeouts and other changes	2019-07-24 15:36:28 -07:00
sramamoorthy	a2f2ad96ff	code review comments and merge to master changes	2019-07-24 15:36:28 -07:00
sramamoorthy	31c010b393	few minor fixes	2019-07-24 15:36:28 -07:00
sramamoorthy	c73bdfad9f	do not pop txsTag	2019-07-24 15:36:28 -07:00
sramamoorthy	a335ed2011	includeCancelled for tLogSnapCreate	2019-07-24 15:36:28 -07:00
sramamoorthy	61cd690add	enable/disable pop req with UID mis-match to fail	2019-07-24 15:36:28 -07:00
sramamoorthy	f4e257e464	snap v2: TLog related changes	2019-07-24 15:36:28 -07:00
Evan Tschannen	6d694cc2ce	Merge pull request #1818 from alexmiller-apple/peek-cursor-timeout-bug Fix parallel peek stalling for 10min when a TLog generation is destroyed	2019-07-19 16:39:31 -07:00
Alex Miller	9863ace96c	Replace usages with initialization lists. But C++ needs a bit of help to infer through the templates.	2019-07-18 22:27:36 -07:00
Alex Miller	55258709a0	Remove an ASSERT from testing and now inaccurate comment.	2019-07-17 01:30:01 -07:00
Alex Miller	e9684a1f63	Fix issues configuring from sharded txs tag to not. This is an intermingling of what should be two commits: 1. Rely on TLogVersion instead of txsTags==0 2. Copy and index sharded txsTags between KCV and RV as txsTag when configuring log_version 4->3.	2019-07-17 01:25:09 -07:00
Alex Miller	812ce37bcd	Remove buggify and unneeded safeguards. The buggify was actually incorrect and broke an invariant, which I then fixed on the other side, but this work was actually unneeded in total. The real issue being fixed was returnIfBlock not sending an error, as well as the other error cases.	2019-07-16 15:58:02 -07:00
Alex Miller	4cc60dc9b8	Merge remote-tracking branch 'upstream/master' into peek-cursor-timeout-bug	2019-07-15 17:05:39 -07:00
Alex Miller	2cbc05fc72	Address more issues that cause peek cursors to time out. There were error cases that would cause a peek to terminate early or be cancelled without sending anything to the next peek in line. We would thus end up with the first peek in a sequence waiting on its future, and nothing that exists that would send to that future.	2019-07-15 16:03:37 -07:00
Alex Miller	c8e94e601a	Merge pull request #1729 from etschannen/feature-fast-txs-recovery Improve the recovery speed of the txnStateStore	2019-07-15 13:27:41 -07:00
Vishesh Yadav	2606794df6	Merge pull request #1812 from alexmiller-apple/improve-only-spilled Improve the behavior of parallelPeekMore+onlySpilled.	2019-07-10 17:15:19 -07:00
Evan Tschannen	d8948c8be1	Merge branch 'master' into feature-fast-txs-recovery # Conflicts: # fdbserver/TagPartitionedLogSystem.actor.cpp	2019-07-10 13:59:52 -07:00
Evan Tschannen	49121172ea	Merge pull request #1795 from alexmiller-apple/peek-from-satellites Log Routers will prefer to peek from satellite logs.	2019-07-09 17:38:57 -07:00
Alex Miller	fd769ad878	Fix parallel peek stalling for 10min when a TLog generation is destroyed. `peekTracker` was held on the Shared TLog (TLogData), whereas peeks are received and replied to as part of a TLog instance (LogData). When a peek was received on a TLog, it was registered into peekTracker along with the ReplyPromise. If the TLog was then removed as part of a no-longer-needed generation of TLogs, there is nothing left to reply to the request, but by holding onto the ReplyPromise in peekTracker, we leave the remote end with an expectation that we will reply. Then, 10min later, peekTrackerCleanup runs and finally times out the peek cursor, thus preventing FDB from being completely stuck. Now, each TLog generation has its own `peekTracker`, and when a TLog is destroyed, it times out all of the pending peek cursors that are still expecting a response. This will then trigger the client to re-issue them to the next generation of TLogs, thus removing the 10min gap to do so.	2019-07-09 17:27:36 -07:00
Alex Miller	44f11702a8	Log Routers will prefer to peek from satellite logs. Formerly, they would prefer to peek from the primary's logs. Testing of a failed region rejoining the cluster revealed that this becomes quite a strain on the primary logs when extremely large volumes of peek requests are coming from the Log Routers. It happens that we have satellites that contain the same mutations with Log Router tags, that have no other peeking load, so we can prefer to use the satellite to peek rather than the primary to distribute load across TLogs better. Unfortunately, this revealed a latent bug in how tagged mutations in the KnownCommittedVersion->RecoveryVersion gap were copied across generations when the number of log router tags was decreased. Satellite TLogs would be assigned log router tags using the team-building based logic in getPushLocations(), whereas TLogs would internally re-index tags according to tag.id%logRouterTags. This mismatch would mean that we could have:
    Log0  -2:0 ----- -2:0          Log 0
    Log1  -2:1 \
                >--- -2:1,-2:0     Log 1   (-2:2 mod 2 becomes -2:0)
    Log2  -2:2 /
And now we have data that's tagged as -2:0 on a TLog that's not the preferred location for -2:0, and therefore a BestLocationOnly cursor would miss the mutations. This was never noticed before, as we never used a satellite as a preferred location to peek from. Merge cursors always peek from all locations, and thus a peek for -2:0 that needed data from the satellites would have gone to both TLogs and merged the results. We now take this mod-based re-indexing into account when assigning which TLogs need to recover which tags from the previous generation, to make sure that tag.id%logRouterTags always results in the assigned TLog being the preferred location. Unfortunately, existing clusters may already have satellites with log router tags indexed incorrectly, so this transition needs to be gated on a `log_version` transition.
Old LogSets will have an old LogVersion, and we won't prefer the satellite for peeking. Log Sets post-6.2 (opt-in) or post-6.3 (default) will be indexed correctly, and therefore we can safely offload peeking onto the satellites.	2019-07-08 22:25:01 -07:00


346 Commits