Commit Graph

25385 Commits

Author SHA1 Message Date
Jingyu Zhou 1eae9270ff Merge pull request #9519 from sfc-gh-jslocum/bg_ctest_fixes 2023-03-01 10:00:24 -08:00
Hui Liu 1fd47bbe0d BlobRestore Add deepScan when describing backup 2023-03-01 09:47:59 -08:00
Josh Slocum d64e55a4a2 adding platform error to list of acceptable blob manager purge errors 2023-03-01 10:55:30 -06:00
A.J. Beamon a00cdc8396 Rename template type and variable for TraceEvent::moveTo 2023-03-01 08:43:18 -08:00
Josh Slocum 546a8879c2 Fix feed fetch and lock race 2023-03-01 10:37:54 -06:00
Josh Slocum 7dc6de7aee fixing invalid feed assertion in restarting tests 2023-03-01 10:36:41 -06:00
Junhyun Shim b6f0d2095a Cases where newBuf == buf assertion fails have been observed in Valgrind (#9528) 2023-03-01 15:49:11 +01:00
A.J. Beamon a714c0d4cd Fix missing initialization of the err variable 2023-02-28 16:55:57 -08:00
A.J. Beamon 55b752edf1 Check transaction validity with respect to tenants even if it will fail with a conflict. This allows us to report the appropriate non-retryable error instead. 2023-02-28 15:47:00 -08:00
A.J. Beamon 2898a95c81 Fix two metacluster issues:
1. When retrying the transaction to register a restoring cluster, don't choose a new ID if the current ID matches the one recorded for the restoring cluster
2. A metacluster test was incorrectly handling the case where a transaction was retried with unknown result and had committed successfully
2023-02-28 15:40:04 -08:00
Junhyun Shim 6b26f5a6da Fix transaction option consistency in TagThrottleInfo getter (#9513)
* Fix transaction option consistency in TagThrottleInfo getter

The subroutine of the getter actor function for throttled and recommended tags
was, upon retry, resetting the transaction object that the caller also uses.
This cleared the transaction options and caused a key_outside_legal_range error in the caller.

Also, allowing a subroutine to conditionally, non-trivially modify the passed object
(i.e., a transaction reset) is a risky pattern.

Fix: confine the subroutine's responsibility to "attempting to" fetch and parse the
"autoThrottlingEnabled" key. Let the calling function reset the object if needed.

* Apply Clang format
2023-02-28 23:47:26 +01:00
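The division of labor this fix describes (the helper only fetches and parses; the caller owns reset and retry, so it can re-apply its options) can be sketched in Python. `FakeTransaction`, `RetryableError`, the `"1"` value encoding, and the option name are hypothetical stand-ins, not the FoundationDB client API; the actual change is in the C++ actor code.

```python
class RetryableError(Exception):
    """Hypothetical stand-in for a retryable transaction error."""


class FakeTransaction:
    """Hypothetical transaction whose reset() clears caller-set options."""

    def __init__(self, data: dict, failures: int = 0) -> None:
        self.data = data
        self.options: set = set()
        self.failures = failures  # get() calls that fail before one succeeds
        self.resets = 0

    def set_option(self, option: str) -> None:
        self.options.add(option)

    def reset(self) -> None:
        self.options.clear()
        self.resets += 1

    def get(self, key: str):
        if self.failures > 0:
            self.failures -= 1
            raise RetryableError(key)
        return self.data.get(key)


def try_fetch_auto_throttling_enabled(tr: FakeTransaction) -> bool:
    # Only *attempts* the read and parse (value encoding assumed here);
    # errors propagate and the caller's transaction is never reset here.
    return tr.get("autoThrottlingEnabled") == "1"


def get_auto_throttling_enabled(tr: FakeTransaction,
                                option: str = "read_system_keys") -> bool:
    # The caller owns the retry loop: after a reset it re-applies its
    # options before trying again. The option name is illustrative only.
    while True:
        tr.set_option(option)
        try:
            return try_fetch_auto_throttling_enabled(tr)
        except RetryableError:
            tr.reset()
```

Because the reset happens in the same function that sets the option, a retry can never leave the transaction in a state the caller did not configure.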
A.J. Beamon 310fc2ff4e Merge branch 'main' into transaction-debug-logging 2023-02-28 14:18:51 -08:00
Josh Slocum 61173b9b91 something else: blob granule upgrade tests also take longer with more granules, bumping timeout for now 2023-02-28 16:00:17 -06:00
Xiaoxi Wang 26237a291d update read range reply field 2023-02-28 13:18:57 -08:00
Josh Slocum 09122a9eb0 removing extra wait in blob manager failure detection if unneeded 2023-02-28 14:39:35 -06:00
Jingyu Zhou 739144da6d Merge pull request #9432 from sfc-gh-jslocum/api_tester_more_granules
decreasing granule size knobs to enable more granules in local cluster tests
2023-02-28 12:33:01 -08:00
Jingyu Zhou 40b24c3dbb Merge pull request #9493 from sfc-gh-jslocum/bg_delete_tenant_test
added unit test for bg tenant deletion to make sure nothing breaks
2023-02-28 12:32:37 -08:00
Jingyu Zhou 6c955080e9 Merge pull request #9207 from sfc-gh-jslocum/disable_feed_coalesce
disabling feed coalesce for now
2023-02-28 12:32:01 -08:00
Jingyu Zhou a350a929b9 Merge pull request #9494 from sfc-gh-jslocum/bg_cp_improvements
addressing review comments and fixmes in bg commit proxy code
2023-02-28 12:30:58 -08:00
Markus Pilman 8fe9d31907 Merge pull request #9516 from sfc-gh-jslocum/simkms_check
adding simulation check guard
2023-02-28 13:29:13 -07:00
Jingyu Zhou 7963e6ef69 Merge pull request #9480 from apple/sfc-gh-dadkins/tlog-queue-metrics
tlog: Measure time spent waiting for previous versions separately from actual commit time
2023-02-28 12:24:26 -08:00
Xiaoxi Wang 8cb2a1553a add read ops sampler 2023-02-28 12:03:42 -08:00
Josh Slocum f4308a0f6c Merge branch 'main' into disable_feed_coalesce 2023-02-28 13:57:21 -06:00
A.J. Beamon 87ac857aeb Make debug logging functions pure virtual on the transaction interfaces. Rename the function on TraceEvent to be more generic. 2023-02-28 11:11:06 -08:00
Markus Pilman 5bebb5b4aa Merge pull request #9492 from sfc-gh-vgasiunas/vgasiunas-api-version-defs
Centralize definition of API Version for Java, Python and C API
2023-02-28 12:04:02 -07:00
Josh Slocum 36430d32ae adding simulation check guard 2023-02-28 12:39:20 -06:00
Josh Slocum c5e73bfd22 Blob Granule correctness fixes (#9514)
* handling new race with reassign and force purge

* handling error race causing flow lock leak
2023-02-28 12:08:07 -06:00
Josh Slocum 301f2fd201 disabling feed coalesce for now 2023-02-28 12:07:12 -06:00
Josh Slocum 39a7625152 fixing bg knobs to only be added to conf file when bg is enabled, and refactoring bg + encryption knobs 2023-02-28 11:49:12 -06:00
Lukas Joswiak 47fc53ed6e Adds more detailed mutation logging to commit proxy
The commit proxy writes a `ProxyMetrics` trace every 5 seconds. This
event contains a lot of useful information, such as the number of commit
batches that arrived and exited, the number of mutations processed, the
number of bytes those mutations made up, etc. However, it is difficult
to tell what the workload pattern looks like within these 5 second
intervals when the metrics are being calculated.

This PR adds a new trace, `ProxyDetailedMetrics`, which logs itself
every 100ms. It currently only writes the number of mutations and the
number of mutation bytes that arrived during the 100ms time period. But
it should be easy to add more metrics in the future.

It's possible this increased logging could cause issues. Based on a
simulation run of the `WriteDuringRead` test, I got the following
results:

```
$ rg ProxyDetailedMetrics trace.json | wc -l
    6877
$ rg "Roles\": \".*CP.*\"" trace.json | wc -l
   11402
$ wc -l trace.json
   96147 trace.json
```

So on processes running as a commit proxy, this approximately doubled
the number of lines logged. But relative to the cluster overall, it only
added about 5% overhead.
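The ratios behind these overhead estimates can be recomputed directly from the counts above. This is a small arithmetic check, assuming (as the `rg` patterns suggest but do not guarantee) that the CP-role count already includes the `ProxyDetailedMetrics` lines themselves; the function name is illustrative.

```python
def logging_overhead(detailed: int, cp_lines: int, total: int) -> dict:
    """Ratios implied by trace line counts; assumes cp_lines includes `detailed`."""
    baseline_cp = cp_lines - detailed  # CP-process lines without the new event
    return {
        "share_of_cp_lines": detailed / cp_lines,
        "cp_growth_factor": cp_lines / baseline_cp,
        "share_of_all_lines": detailed / total,
    }


# Counts taken from the simulation run quoted above.
ratios = logging_overhead(detailed=6877, cp_lines=11402, total=96147)
for name, value in ratios.items():
    print(f"{name}: {value:.3f}")
```

Under that assumption, the new event makes up about 60% of commit proxy lines (roughly 2.5x the previous volume) and about 7% of all lines in the file. If the CP-role pattern did not match the new lines, the growth factor would instead be about 1.6x.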

If we want to reduce this number, one possibility would be to not write
a trace if all the values being written are 0. I'm not sure if this
would help much in production, but in simulation the large majority of
the traces (99%+) consist of zero values.
2023-02-28 09:48:39 -08:00
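The suggestion to skip all-zero traces can be sketched as a per-interval accumulator that is flushed on a fixed period. This Python model is hypothetical, not the commit proxy's Flow/C++ code: the class, the field names `Mutations` and `MutationBytes`, and the simulated clock are assumptions. Only the 100 ms period comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

INTERVAL_SECONDS = 0.1  # 100 ms logging period from the PR description


@dataclass
class DetailedMetrics:
    """Hypothetical per-interval counters for mutations arriving at a proxy."""
    mutations: int = 0
    mutation_bytes: int = 0
    emitted: List[Dict[str, float]] = field(default_factory=list)

    def record_batch(self, mutation_sizes: List[int]) -> None:
        self.mutations += len(mutation_sizes)
        self.mutation_bytes += sum(mutation_sizes)

    def flush(self, now: float, skip_if_zero: bool = True) -> Optional[Dict[str, float]]:
        # Called once per interval; emits an event and resets the counters.
        if skip_if_zero and self.mutations == 0 and self.mutation_bytes == 0:
            return None  # nothing arrived, so no trace line is written
        event = {"Time": now, "Mutations": self.mutations,
                 "MutationBytes": self.mutation_bytes}
        self.emitted.append(event)
        self.mutations = 0
        self.mutation_bytes = 0
        return event


def simulate(batches_per_tick: List[List[int]], skip_if_zero: bool = True):
    # Drive the accumulator with a simulated clock instead of real timers.
    metrics = DetailedMetrics()
    for tick, sizes in enumerate(batches_per_tick):
        metrics.record_batch(sizes)
        metrics.flush(now=round((tick + 1) * INTERVAL_SECONDS, 3),
                      skip_if_zero=skip_if_zero)
    return metrics.emitted
```

When most intervals are idle, as in simulation, skipping them removes most lines. The trade-off is that a missing interval becomes ambiguous between "idle" and "not logged".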
Dan Adkins 035314b277 Merge branch 'main' of github.com:apple/foundationdb into sfc-gh-dadkins/tlog-queue-metrics 2023-02-28 09:26:03 -08:00
A.J. Beamon cb66a76d80 Merge branch 'main' into transaction-debug-logging 2023-02-28 09:21:30 -08:00
Jingyu Zhou 4ea70b1f59 Merge pull request #9512 from sfc-gh-mpilman/bugfixes/remove-lockid-from-txnstatestore
Don't store lockid in txnStateStore
2023-02-28 09:16:45 -08:00
Evan Tschannen 1a87364286 Merge pull request #9498 from sfc-gh-jslocum/blobbify_purge_race_fix
waiting for known ranges in force purge to be blobbified before purging
2023-02-28 09:16:17 -08:00
Markus Pilman 04eb52bfc4 Merge pull request #9500 from sfc-gh-jslocum/bg_kms_missing_tenants
Handling missing tenants and errors in blob metadata kms fetch
2023-02-28 10:09:11 -07:00
Markus Pilman 537996b567 Merge pull request #9491 from sfc-gh-jslocum/blob_management_authz
adding authz blob management tests
2023-02-28 10:08:57 -07:00
A.J. Beamon 0abb33a9a5 Add the ability to print messages or log trace events based on a transaction's result 2023-02-28 09:06:54 -08:00
Markus Pilman f0b079bc85 Fix mockkms build with disabled go 2023-02-28 09:00:36 -07:00
Markus Pilman f07de7beb6 Fix unrelated warning 2023-02-28 08:59:18 -07:00
Josh Slocum ad5fe22f4a decreasing granule size knobs to enable more granules in local cluster tests 2023-02-28 08:57:01 -06:00
Vaidas Gasiunas 93c5147e03 Add documentation for fdb_database_get_client_status (#9471)
* Documentation for fdb_database_get_client_status

* Update documentation of fdb_database_get_client_status
2023-02-28 10:08:04 +01:00
Ata E Husain Bohra 2db1da26d9 EaR: Update ApiWorkload to validate encryption at-rest guarantees (#9466)
* EaR: Update ApiWorkload to validate encryption at-rest guarantees

Description

FDB encryption at rest guarantees that, if the cluster is configured with the feature
enabled, all data written to persistent disks is "encrypted". Because FDB
maintains multiple persistent stores during the lifecycle of the data, this patch
proposes a scheme to validate that invariant via "simulation testing".

This patch updates the ApiCorrectness workload to do the following:
1. Client-supplied parameters and/or random selection enable the validation feature.
2. When validation is enabled, a known "marker string" is injected
into the workload-generated key and value data patterns.
3. On shutdown, if validation is enabled, all test files are
scanned for the known "marker" pattern.

Simulation tests are already capable of doing the following:
1. Randomly select TenantMode (disabled/optional/required)
2. Randomly select EncryptionAtRestMode (cluster_aware/domain_aware)

Hence, the updated test validates all possible combinations. Also,
'defaultTenant' is present to cover 'domain_aware' encryption use cases.

Testing
devRunCorrectness
devRetryCorrectness - ApiCorrectness & EncryptedBackupCorrectness
2023-02-27 21:40:46 -08:00
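The shutdown check described in this commit reduces to scanning every file the simulated cluster wrote for a known plaintext marker. If encryption at rest works, the marker never appears on disk. This is a minimal Python sketch, assuming a hypothetical marker value and a plain directory of data files. The actual ApiCorrectness workload is C++, and its marker string and file enumeration are not shown in this log.

```python
import tempfile
from pathlib import Path
from typing import List

MARKER = b"HYPOTHETICAL_EAR_MARKER"  # illustrative; the real marker string is not given


def file_contains(path: Path, marker: bytes, chunk_size: int = 1 << 20) -> bool:
    """Stream a file and report whether `marker` occurs, even across chunk edges."""
    keep = len(marker) - 1  # bytes carried over so a split marker is still found
    tail = b""
    with path.open("rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            window = tail + chunk
            if marker in window:
                return True
            tail = window[-keep:] if keep > 0 else b""


def scan_for_marker(root: Path, marker: bytes = MARKER) -> List[Path]:
    # Any hit means plaintext reached disk, i.e. the at-rest invariant was violated.
    return sorted(p for p in root.rglob("*") if p.is_file() and file_contains(p, marker))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "leaky.sqlite").write_bytes(b"key=" + MARKER + b";value=1")
        (root / "sealed.sqlite").write_bytes(bytes(range(256)))
        print([p.name for p in scan_for_marker(root)])
```

Streaming with a carried-over tail keeps memory bounded for large storage files while still catching a marker that straddles two reads.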
Markus Pilman 871a9676ea Don't store lock id in txnStateStore 2023-02-27 21:25:42 -07:00
Markus Pilman 20874d8575 Merge pull request #9502 from sfc-gh-ajbeamon/metacluster-tenant-lock-support
Metacluster tenant lock support
2023-02-27 21:19:03 -07:00
Jingyu Zhou 5ac526a3e5 Merge pull request #9474 from sfc-gh-xwang/feature/main/readaware
enable read-aware DD by default and write release notes/doc
2023-02-27 20:04:04 -08:00
Jingyu Zhou 842d485862 Merge pull request #9402 from yao-xiao-github/main
Add shard consistency validation.
2023-02-27 17:05:30 -08:00
Jingyu Zhou f414cd0ed8 Merge pull request #9486 from sfc-gh-ajbeamon/metacluster-management-concurrency-restore-support
Add restore to the metacluster management concurrency workload
2023-02-27 17:05:05 -08:00
A.J. Beamon 469e77158f Add metacluster support for tenant locking 2023-02-27 16:53:13 -08:00
Jingyu Zhou 29a406948a Merge pull request #9370 from sfc-gh-mpilman/features/tenant-lock-fdbcli
fdbcli commands for tenant lock
2023-02-27 16:18:51 -08:00
Dan Adkins b8c9c8b0f4 Add metric for tlog commit time minus time spent waiting in the queue. 2023-02-27 15:40:22 -08:00