20382 Commits

Author SHA1 Message Date
Xiaoxi Wang 22eafcf7a2 rename trace event 2022-05-04 14:27:42 -07:00
Xiaoxi Wang ae66ed6c16 fix DataDistributionQueue time_out; reset the rebalance poll time 2022-05-04 14:11:20 -07:00
Xiaoxi Wang a3d0b005dc reset several method use getShardMetrics 2022-05-04 00:00:03 -07:00
Xiaoxi Wang 1723bee639 add fetchTopKShardMetrics to dd tracker 2022-05-03 23:42:09 -07:00
Xiaoxi Wang d848441cdd simulate ReadSkewReadWrite spec 2022-05-03 23:39:05 -07:00
Xiaoxi Wang 7c37d172b9 solve some comments 2022-05-03 17:21:08 -07:00
Xiaoxi Wang 269d85daa8 Merge branch 'main' of https://github.com/apple/foundationdb into readaware 2022-05-03 13:37:56 -07:00
A.J. Beamon be0c7a8884
Merge pull request #7037 from sfc-gh-ajbeamon/fdbcli-generator-refactor
Move fdbcli command and hint generators into the CommandFactory
2022-05-03 12:52:29 -07:00
Hao Fu 97eb12381b
implement equals and hashCode in MappedKeyValue (#7041) 2022-05-03 12:24:26 -07:00
Johannes Scheuermann 9665786785
Merge pull request #7011 from johscheuer/add-is-present-method-sidecar
Add sidecar method to check if a file is present
2022-05-03 07:22:23 +01:00
Steve Atherton fa02f17932
Merge pull request #7050 from sfc-gh-satherton/redwood-shutdown-hang-fix
Bug fix: Redwood shutdown would wait for pending IO success
2022-05-02 21:34:13 -07:00
Steve Atherton 9a279c24ae Bug fix: Redwood shutdown would wait for pending IO success so if any of them failed the shutdown would never complete. 2022-05-02 19:26:44 -07:00
Andrew Noyes 90dae38d04
Update RYWIterator test to match #6993 (#7046)
There's a test which checks behavior against a reference implementation,
and so the reference implementation needs to be updated as well.
2022-05-02 18:22:59 -07:00
Jingyu Zhou 05e63bc703
Fix orphaned storage server due to force recovery (#6914)
* Fix orphaned storage server due to force recovery

The force recovery can roll back the transaction that adds a storage server.
However, the storage server may now be at version B > A, the recovery version.
As a result, its peek to the buddy TLog won't return TLogPeekReply::popped to
trigger its exit, and instead gets a higher version C > B back. To the
storage server, this means the message is empty, so it does not remove itself
and keeps peeking.

The fix is to use the recovery transaction version, which is the first
transaction after the recovery, as the popped version for the SS instead of
the recovery version. Force recovery bumps this version to a much higher
version than the SS's version, so the TLog would set TLogPeekReply::popped to
trigger the storage server exit.

* Fix tlog peek to disallow returning an empty message between recoveredAt and recovery txn version

This contract is not explicitly set today and can cause the storage server to
fail with assertion "rollbackVersion >= data->storageVersion()". This is because
if such an empty version is returned, the SS may advance its storage version to
a value larger than the rollback version set in the recovery transaction.

The fix is to block the peek reply until the recovery transaction has been received.

* Move recoveryTxnReceived to be per LogData

This is because a shared TLog can have a first generation TLog that has already
set the promise, so later generations won't wait for the recovery version.
For the current generation, all peeks need to wait, while for older generations,
there is no need to wait (by checking if they are stopped).

* For initial commit, poppedVersion needs to be at least 2

To get rid of the previous unsuccessful recovery's recruited seed
storage servers.
2022-05-02 17:17:37 -07:00
Hao Fu fa2e85f1d3
Add comment about getMappedRange parameters (#7044) 2022-05-02 15:17:14 -07:00
Andrew Noyes 7ed82c1ac5
Mac m1 has 16k pages (#7038)
Previously the page guard implementation assumed that the page size was
4k. Also check for mmap and mprotect returning errors.
2022-05-02 14:24:43 -07:00
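The idea in the commit message above (query the page size at run time and check `mmap` and `mprotect` for errors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the FoundationDB change: `systemPageSize`, `GuardedRegion`, `allocateGuarded`, and `freeGuarded` are hypothetical names, and only standard POSIX calls are used.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical helper: ask the OS for the page size instead of assuming 4 KiB.
inline std::size_t systemPageSize() {
    const long size = sysconf(_SC_PAGESIZE);
    if (size <= 0) throw std::runtime_error("sysconf(_SC_PAGESIZE) failed");
    return static_cast<std::size_t>(size);
}

// Hypothetical description of a mapping whose first page is a guard page.
struct GuardedRegion {
    void* base;              // start of the whole mapping (the guard page)
    std::size_t total;       // total mapped bytes, guard page included
    char* usable;            // first usable byte, right after the guard page
    std::size_t usableBytes; // usable size, rounded up to whole pages
};

// Maps a read/write region preceded by one inaccessible guard page.
// Both mmap and mprotect results are checked and reported as exceptions.
inline GuardedRegion allocateGuarded(std::size_t requestedBytes) {
    assert(requestedBytes > 0);
    const std::size_t page = systemPageSize();
    const std::size_t usable = (requestedBytes + page - 1) / page * page;
    const std::size_t total = usable + page;

    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        throw std::runtime_error(std::string("mmap failed: ") + std::strerror(errno));

    // Make the first page inaccessible so that underruns fault immediately.
    if (mprotect(base, page, PROT_NONE) != 0) {
        const int err = errno;
        munmap(base, total);
        throw std::runtime_error(std::string("mprotect failed: ") + std::strerror(err));
    }
    return GuardedRegion{ base, total, static_cast<char*>(base) + page, usable };
}

// Releases the whole mapping, guard page included.
inline void freeGuarded(const GuardedRegion& region) {
    munmap(region.base, region.total);
}
```

Because the guard size comes from `sysconf`, the same sketch works unchanged on systems with 4 KiB and 16 KiB pages; a hard-coded 4096 would pass a length to `mprotect` that does not cover a whole 16 KiB page.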
Andrew Noyes 0a4b364379
Fix operation_failed thrown incorrectly from transactions (#6993)
* Add a test demonstrating the issue

If you write a versionstamped value after a set, then reading throws
operation_failed.

* Treat SetVersionstampedValue as independent in coalesce and mutate
2022-05-02 13:49:42 -07:00
Rajiv Ranganath cf6e39af79 docs: add `GET_RANGE_SPLIT_POINTS`
Add `GET_RANGE_SPLIT_POINTS` instruction documentation.
2022-05-02 13:31:20 -07:00
Ray Jenkins dc9e782ccc
OpenTelemetry Tracing Perf Fixes (#6990) 2022-05-02 14:56:51 -05:00
Josh Slocum 8284ec5712 Fixing memory leak when handling FDBResult in multi version client 2022-05-02 12:56:05 -05:00
Josh Slocum 57e1b487f1 Fixing ASAN alloc-dealloc-mismatch 2022-05-02 12:56:05 -05:00
Xiaoxi Wang 69985ba251 Merge branch 'main' of https://github.com/apple/foundationdb into readaware 2022-05-02 10:53:22 -07:00
Markus Pilman d9aee5c253
Merge pull request #7012 from sfc-gh-vgasiunas/vgasiunas-upgrade-tests
Improve robustness of upgrade tests
2022-05-02 10:30:21 -06:00
Xiaodong Zhang a7a5b3e273 fix bug in tpcc workload 2022-05-02 09:28:23 -07:00
A.J. Beamon 75fc526697
Merge pull request #7020 from sfc-gh-ajbeamon/fix-dd-team-removal-health
Mark a team as unhealthy when it is removed
2022-05-02 08:45:55 -07:00
A.J. Beamon 43c2ca35a5 Move fdbcli command and hint generators into the files implementing the command. 2022-05-02 08:39:59 -07:00
Markus Pilman f5570ba49e
Merge pull request #7035 from sfc-gh-jshim/fix-token-sign-arena
Fix TokenSign copying and using uninitialized arena
2022-05-02 08:52:19 -06:00
Junhyun Shim 41d1c73b9c Fix TokenSign copying and using uninitialized arena
TokenSign was copying the unused Arena held by the Standalone instead of referring to it.
An Arena has to be used at least once before it holds a valid, copyable reference.
Otherwise the copied Arena has its own lifecycle, which is not shared with the original.
Thus, when the copied Arena went out of scope,
the memory supposed to be held by the returned Standalone was also released.

Fix: instead of copying, refer to Standalone's arena.
2022-05-02 09:48:43 +02:00
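The lifetime problem described in the commit message above can be illustrated with a toy model. This is a sketch under assumptions, not FoundationDB's `Arena`: `ToyArena` is a hypothetical class whose backing storage is created lazily on first allocation, so a copy taken before any allocation has nothing to share.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for an arena (NOT FoundationDB's Arena class).
// The shared storage handle stays null until the first allocation.
class ToyArena {
    using Storage = std::vector<std::unique_ptr<char[]>>;
    std::shared_ptr<Storage> storage_;

public:
    // Allocates a block; creates the shared storage on first use.
    char* allocate(std::size_t bytes) {
        assert(bytes > 0);
        if (!storage_) storage_ = std::make_shared<Storage>();
        storage_->push_back(std::make_unique<char[]>(bytes));
        return storage_->back().get();
    }

    // True once this arena has been used at least once.
    bool hasStorage() const { return storage_ != nullptr; }

    // True only if both arenas refer to the same, already created storage.
    bool sharesStorageWith(const ToyArena& other) const {
        return storage_ != nullptr && storage_ == other.storage_;
    }
};
```

In this model, copying an unused `ToyArena` and then allocating from the copy gives the copy its own storage, which is freed when the copy goes out of scope. Copying after the first allocation shares the storage, and holding a reference to the original arena avoids the question entirely.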
Jingyu Zhou 0ca9761088 Fix IDE build warnings and errors 2022-05-01 16:20:57 -07:00
Steve Atherton 74b82205df
Merge pull request #7024 from sfc-gh-etschannen/fix-cluster-id
fix: prevent a storage server from attempting to read the cluster id from itself due to a stale cache entry
2022-04-29 22:33:07 -07:00
Steve Atherton 1ab5c21967
Merge pull request #6979 from sfc-gh-jslocum/speedup_tail_latency
Don't do huge tail latencies for network requests when speed up simul…
2022-04-29 22:31:35 -07:00
Steve Atherton b7bc0a3aff
Merge pull request #6911 from sfc-gh-jslocum/min_shard_bytes_main
Decreasing MIN_SHARD_BYTES knob
2022-04-29 22:31:21 -07:00
Evan Tschannen 3bab26c01b fix: prevent a storage server from attempting to read the cluster id from itself due to a stale cache entry 2022-04-29 14:56:43 -07:00
Josh Slocum e5840d3a38 Merge branch 'main' into speedup_tail_latency 2022-04-29 16:05:12 -05:00
Steve Atherton 165d9fa6b1
Merge pull request #7013 from sfc-gh-jslocum/writeduringread_keysize_main
Fix for WriteDuringRead workload key sizes with useSystemKeys=true bu…
2022-04-29 14:01:44 -07:00
Steve Atherton 338d2304d7
Bug fix: Killing a machine process would not wait for AsyncFileNonDurable close operations to finish, causing a later reopen of the same file in a new process to hang forever. Renamed AsyncFileNonDurable::deleteFile to closeFile for clarity. Renamed Machine deletingFiles to deletingOrClosingFiles for clarity. (#7007) 2022-04-29 14:01:18 -07:00
Steve Atherton 504400c1b3
Merge pull request #7017 from sfc-gh-jslocum/tssq_cc_fix
Allow TSS failures in consistency check when fault injection is enabled
2022-04-29 14:01:06 -07:00
Josh Slocum 674e6a8fdc Merge branch 'main' into min_shard_bytes_main 2022-04-29 16:00:27 -05:00
Steve Atherton 2887429f42
Merge pull request #6991 from sfc-gh-etschannen/fix-recovery-version
fix: when more tlogs are absent than the replication factor we would access invalid memory
2022-04-29 14:00:15 -07:00
Steve Atherton 2bcbff2809
Merge pull request #6965 from sfc-gh-tclinkenbeard/increase-snapshot-lower-bound
Increase lower bound for snapshot restart tests to 7.1.0
2022-04-29 13:59:37 -07:00
Steve Atherton 2678546c00
Merge pull request #6950 from sfc-gh-jslocum/cf_delete_race
Fixing change feed deleted from multiple sources race
2022-04-29 13:58:22 -07:00
A.J. Beamon 5eedcafcfb Mark a team as unhealthy when it is removed 2022-04-29 12:40:41 -07:00
Josh Slocum 7d94b0b442 Allow TSS failures in consistency check when fault injection is enabled 2022-04-29 13:24:54 -05:00
Josh Slocum aa20eefe7b Fix for WriteDuringRead workload key sizes with useSystemKeys=true but writing to normal key space 2022-04-29 11:33:54 -05:00
Vaidas Gasiunas 1ef33db1ef Upgrade Tests: Create the destination directory before copying a client library from the local repository 2022-04-29 16:41:03 +02:00
Vaidas Gasiunas 449d5aec61 Upgrade Tests: Fix paths for accessing binaries from the local repo 2022-04-29 15:54:47 +02:00
Vaidas Gasiunas e33d7455a5 Upgrade Tests: Retry on download errors 2022-04-29 15:32:47 +02:00
Vaidas Gasiunas 27c3d7a953 Upgrade Tests: Use old binaries from the Docker image if available 2022-04-29 14:53:22 +02:00
Johannes M. Scheuermann d1c71a7903 Add sidecar method to check if a file is present 2022-04-29 13:10:05 +01:00
Vaidas Gasiunas 27d7b2e409 Upgrade Test: Avoid blocking on opening named pipes in case the tester fails to initialize 2022-04-29 14:04:27 +02:00