Commit Graph

25355 Commits

Author SHA1 Message Date
Jingyu Zhou 3d9f37d1d1
Merge pull request #9530 from sfc-gh-jslocum/more_bg_correctness_fixes
More bg correctness fixes
2023-03-02 11:29:01 -08:00
Xiaoxi Wang 2dcacea7e1
Merge pull request #9517 from sfc-gh-xwang/feature/main/hotRangeDetect
Hot range detection improvement
2023-03-02 11:19:48 -08:00
Xiaoxi Wang 010d5590e3 Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/hotRangeDetect 2023-03-02 10:07:17 -08:00
Jingyu Zhou de89d2cca1
Merge pull request #9521 from sfc-gh-ajbeamon/fix-metacluster-issues
Fix some metacluster issues
2023-03-02 10:07:11 -08:00
Jingyu Zhou ad1468a7b9
Merge pull request #9515 from sfc-gh-ajbeamon/transaction-debug-logging
Add the ability to attach debug printing or tracing to transactions
2023-03-02 10:06:24 -08:00
Jingyu Zhou 0dd948d41b
Merge pull request #9523 from sfc-gh-ajbeamon/transaction-not-found-for-conflicted-transactions
Check transaction validity with respect to tenants even if it will fail with a conflict
2023-03-02 10:03:09 -08:00
A.J. Beamon 557b778053 Merge branch 'main' into transaction-not-found-for-conflicted-transactions 2023-03-02 08:46:33 -08:00
A.J. Beamon 9abca2b3de Merge branch 'main' into fix-metacluster-issues 2023-03-02 08:45:15 -08:00
A.J. Beamon d7fe58b0ca Merge branch 'main' into transaction-debug-logging 2023-03-02 08:37:26 -08:00
Jingyu Zhou 84e0d19a72
Merge pull request #9532 from yao-xiao-github/main 2023-03-02 07:03:12 -08:00
Xiaoxi Wang 314f86c668 fix invert range bug 2023-03-01 22:41:50 -08:00
Xiaoxi Wang 2d78b126f6 rename splitCount to chunkCount 2023-03-01 21:51:51 -08:00
Xiaoxi Wang c7e2eae88c add unit test for new readHotDetect implementation and add comments 2023-03-01 21:42:22 -08:00
Yao Xiao 2c10e5401e Disable empty range check in unit tests. 2023-03-01 16:11:03 -08:00
Xiaoxi Wang 179f0ba71c new version of getReadHotRanges 2023-03-01 15:55:29 -08:00
A.J. Beamon 533f83b05e Fix a few more issues in metacluster code and tests:
1. Some additional idempotency problems in metacluster tests
2. An assertion that checked that a rename had expected values could fail during concurrent restores, but it would only happen if the transaction itself would fail to commit
3. Tweak the parameters of the MetaclusterRecovery test to try to avoid rare cases of logging too many trace events
2023-03-01 15:31:36 -08:00
Hui Liu fd561bd36a
Merge pull request #9503 from sfc-gh-huliu/fix
BlobRestore - add deepScan option when describing backup
2023-03-01 14:20:53 -08:00
A.J. Beamon 3367cfabce Merge branch 'main' into transaction-not-found-for-conflicted-transactions 2023-03-01 10:12:30 -08:00
A.J. Beamon a8b9197d7e Merge branch 'main' into fix-metacluster-issues 2023-03-01 10:11:09 -08:00
A.J. Beamon 544890a6cd Merge branch 'main' into transaction-debug-logging 2023-03-01 10:09:17 -08:00
Jingyu Zhou 1eae9270ff
Merge pull request #9519 from sfc-gh-jslocum/bg_ctest_fixes 2023-03-01 10:00:24 -08:00
Hui Liu 1fd47bbe0d BlobRestore Add deepScan when describing backup 2023-03-01 09:47:59 -08:00
Josh Slocum d64e55a4a2 adding platform error to list of acceptable blob manager purge errors 2023-03-01 10:55:30 -06:00
A.J. Beamon a00cdc8396 Rename template type and variable for TraceEvent::moveTo 2023-03-01 08:43:18 -08:00
Josh Slocum 546a8879c2 Fix feed fetch and lock race 2023-03-01 10:37:54 -06:00
Josh Slocum 7dc6de7aee fixing invalid feed assertion in restarting tests 2023-03-01 10:36:41 -06:00
Junhyun Shim b6f0d2095a
Cases where newBuf == buf assertion fails have been observed in Valgrind (#9528) 2023-03-01 15:49:11 +01:00
A.J. Beamon a714c0d4cd Fix missing initialization of the err variable 2023-02-28 16:55:57 -08:00
A.J. Beamon 55b752edf1 Check transaction validity with respect to tenants even if it will fail with a conflict. This allows us to report the appropriate non-retryable error instead. 2023-02-28 15:47:00 -08:00
A.J. Beamon 2898a95c81 Fix two metacluster issues:
1. When retrying the transaction to register a restoring cluster, don't choose a new ID if the current ID matches the one recorded for the restoring cluster
2. A metacluster test was incorrectly handling the case where a transaction was retried with unknown result and had committed successfully
2023-02-28 15:40:04 -08:00
Junhyun Shim 6b26f5a6da
Fix transaction option consistency in TagThrottleInfo getter (#9513)
* Fix transaction option consistency in TagThrottleInfo getter

A subroutine of the getter actor function for throttled and recommended tags
was, upon retry, resetting the transaction object that the caller also uses.
This cleared the transaction option and caused a key_outside_legal_range error in the caller.

Also, allowing a subroutine to conditionally, non-trivially modify the passed object
(i.e. transaction reset) is a risky pattern.

Fix: confine subroutine's responsibility to "attempting to" fetch and parse
"autoThrottlingEnabled" key. Let the calling function reset the object if needed.

* Apply Clang format
2023-02-28 23:47:26 +01:00
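The ownership rule described in this commit message can be illustrated with a minimal Python sketch. It is not the FoundationDB code (which is written in C++/Flow): `FakeTransaction`, `RetryableError`, `try_read_flag`, `get_throttle_info`, and the option label `"read_system_keys"` are all hypothetical names chosen for illustration. The point shown is only that the helper attempts the read and leaves the transaction untouched, while the caller owns the reset and re-applies its options on each attempt.

```python
class RetryableError(Exception):
    """Hypothetical error type standing in for a retryable transaction failure."""


class FakeTransaction:
    """Hypothetical stand-in for a transaction object; not the FoundationDB API."""

    def __init__(self):
        self.options = set()
        self.reset_count = 0

    def set_option(self, name):
        self.options.add(name)

    def reset(self):
        # Resetting discards every option previously set on the transaction.
        self.options.clear()
        self.reset_count += 1


def try_read_flag(tr, read_fn):
    # The helper only attempts to fetch and parse the value.
    # It never resets tr, so the caller's options survive a failure.
    return read_fn(tr) == b"1"


def get_throttle_info(tr, read_fn, max_attempts=3):
    # The caller owns the retry loop and the reset.
    for _ in range(max_attempts):
        # Options are re-applied on every attempt because reset() clears them.
        tr.set_option("read_system_keys")
        try:
            return try_read_flag(tr, read_fn)
        except RetryableError:
            tr.reset()
    raise RetryableError("gave up after max_attempts")
```

Keeping the reset in one place means a reader of the calling function can see every point at which the transaction's options may be lost.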
A.J. Beamon 310fc2ff4e Merge branch 'main' into transaction-debug-logging 2023-02-28 14:18:51 -08:00
Josh Slocum 61173b9b91 blob granule upgrade tests also take longer with more granules, bumping timeout for now 2023-02-28 16:00:17 -06:00
Xiaoxi Wang 26237a291d update read range reply field 2023-02-28 13:18:57 -08:00
Josh Slocum 09122a9eb0 removing extra wait in blob manager failure detection if unneeded 2023-02-28 14:39:35 -06:00
Jingyu Zhou 739144da6d
Merge pull request #9432 from sfc-gh-jslocum/api_tester_more_granules
decreasing granule size knobs to enable more granules in local cluster tests
2023-02-28 12:33:01 -08:00
Jingyu Zhou 40b24c3dbb
Merge pull request #9493 from sfc-gh-jslocum/bg_delete_tenant_test
added unit test for bg tenant deletion to make sure nothing breaks
2023-02-28 12:32:37 -08:00
Jingyu Zhou 6c955080e9
Merge pull request #9207 from sfc-gh-jslocum/disable_feed_coalesce
disabling feed coalesce for now
2023-02-28 12:32:01 -08:00
Jingyu Zhou a350a929b9
Merge pull request #9494 from sfc-gh-jslocum/bg_cp_improvements
addressing review comments and fixmes in bg commit proxy code
2023-02-28 12:30:58 -08:00
Markus Pilman 8fe9d31907
Merge pull request #9516 from sfc-gh-jslocum/simkms_check
adding simulation check guard
2023-02-28 13:29:13 -07:00
Jingyu Zhou 7963e6ef69
Merge pull request #9480 from apple/sfc-gh-dadkins/tlog-queue-metrics
tlog: Measure time spent waiting for previous versions separately from actual commit time
2023-02-28 12:24:26 -08:00
Xiaoxi Wang 8cb2a1553a add read ops sampler 2023-02-28 12:03:42 -08:00
Josh Slocum f4308a0f6c Merge branch 'main' into disable_feed_coalesce 2023-02-28 13:57:21 -06:00
A.J. Beamon 87ac857aeb Make debug logging functions pure virtual on the transaction interfaces. Rename the function on TraceEvent to be more generic. 2023-02-28 11:11:06 -08:00
Markus Pilman 5bebb5b4aa
Merge pull request #9492 from sfc-gh-vgasiunas/vgasiunas-api-version-defs
Centralize definition of API Version for Java, Python and C API
2023-02-28 12:04:02 -07:00
Josh Slocum 36430d32ae adding simulation check guard 2023-02-28 12:39:20 -06:00
Josh Slocum c5e73bfd22
Blob Granule correctness fixes (#9514)
* handling new race with reassign and force purge

* handling error race causing flow lock leak
2023-02-28 12:08:07 -06:00
Josh Slocum 301f2fd201 disabling feed coalesce for now 2023-02-28 12:07:12 -06:00
Josh Slocum 39a7625152 fixing bg knobs to only be added to conf file when bg is enabled, and refactoring bg + encryption knobs 2023-02-28 11:49:12 -06:00
Lukas Joswiak 47fc53ed6e Adds more detailed mutation logging to commit proxy
The commit proxy writes a `ProxyMetrics` trace every 5 seconds. This
event contains a lot of useful information, such as the number of commit
batches that arrived and exited, the number of mutations processed, the
number of bytes those mutations made up, etc. However, it is difficult
to tell what the workload pattern looks like within these 5 second
intervals when the metrics are being calculated.

This PR adds a new trace, `ProxyDetailedMetrics`, which logs itself
every 100ms. It currently only writes the number of mutations and the
number of mutation bytes that arrived during the 100ms time period. But
it should be easy to add more metrics in the future.

It's possible this increased logging could cause issues. Based on a
simulation run of the `WriteDuringRead` test, I got the following
results:

```
$ rg ProxyDetailedMetrics trace.json | wc -l
    6877
$ rg "Roles\": \".*CP.*\"" trace.json | wc -l
   11402
$ wc -l trace.json
   96147 trace.json
```

So on processes running as a commit proxy, this approximately doubled
the number of lines logged. But relative to the cluster overall, it only
added about 5% overhead.

If we want to reduce this number, one possibility would be to not write
a trace if all the values being written are 0. I'm not sure if this
would help much in production, but in simulation the large majority of
the traces (99%+) consist of zero values.
2023-02-28 09:48:39 -08:00
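The line counts quoted in the commit message above can be turned into ratios with a short Python sketch. It assumes that every `ProxyDetailedMetrics` line was logged by a process whose `Roles` field includes `CP`, which the message implies but does not state. The second part sketches the suggested "skip all-zero traces" check; `should_emit` and the field names `mutations` and `mutation_bytes` are hypothetical and do not come from the FoundationDB source.

```python
# Line counts copied from the rg / wc output in the commit message.
detailed_lines = 6877   # ProxyDetailedMetrics events
cp_lines = 11402        # lines from processes with a CP role
total_lines = 96147     # all lines in trace.json

# Share of the whole trace file taken up by the new event.
share_of_all = detailed_lines / total_lines

# Growth of commit-proxy logging, assuming all new events come from CP processes.
cp_growth = cp_lines / (cp_lines - detailed_lines)

print(f"share of all lines: {share_of_all:.1%}")
print(f"commit proxy line growth: {cp_growth:.2f}x")


def should_emit(metrics):
    """Return True only if at least one metric value is non-zero.

    A hypothetical guard for the idea of not writing a trace
    when all the values being written are 0.
    """
    return any(value != 0 for value in metrics.values())
```

For example, `should_emit({"mutations": 0, "mutation_bytes": 0})` is false, so an idle 100ms interval would log nothing under this guard.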