Commit Graph

12377 Commits

Author SHA1 Message Date
Josh Slocum 0c9397ef22 BW metric improvements for reads and file blocking 2023-03-02 10:57:51 -06:00
A.J. Beamon 557b778053 Merge branch 'main' into transaction-not-found-for-conflicted-transactions 2023-03-02 08:46:33 -08:00
A.J. Beamon 9abca2b3de Merge branch 'main' into fix-metacluster-issues 2023-03-02 08:45:15 -08:00
Xiaoxi Wang 314f86c668 fix invert range bug 2023-03-01 22:41:50 -08:00
Xiaoxi Wang 2d78b126f6 rename splitCount to chunkCount 2023-03-01 21:51:51 -08:00
Xiaoxi Wang c7e2eae88c add unit test for new readHotDetect implementation and add comments 2023-03-01 21:42:22 -08:00
Xiaoxi Wang 179f0ba71c new version of getReadHotRanges 2023-03-01 15:55:29 -08:00
A.J. Beamon 533f83b05e Fix a few more issues in metacluster code and tests:
1. Some additional idempotency problems in metacluster tests
2. An assertion that checked that a rename had expected values could fail during concurrent restores, but it would only happen if the transaction itself would fail to commit
3. Tweak the parameters of the MetaclusterRecovery test to try to avoid rare cases of logging too many trace events
2023-03-01 15:31:36 -08:00
Hui Liu fd561bd36a
Merge pull request #9503 from sfc-gh-huliu/fix
BlobRestore - add deepScan option when describing backup
2023-03-01 14:20:53 -08:00
A.J. Beamon 3367cfabce Merge branch 'main' into transaction-not-found-for-conflicted-transactions 2023-03-01 10:12:30 -08:00
A.J. Beamon a8b9197d7e Merge branch 'main' into fix-metacluster-issues 2023-03-01 10:11:09 -08:00
Hui Liu 1fd47bbe0d BlobRestore Add deepScan when describing backup 2023-03-01 09:47:59 -08:00
Josh Slocum d64e55a4a2 adding platform error to list of acceptable blob manager purge errors 2023-03-01 10:55:30 -06:00
Josh Slocum 546a8879c2 Fix feed fetch and lock race 2023-03-01 10:37:54 -06:00
Josh Slocum 7dc6de7aee fixing invalid feed assertion in restarting tests 2023-03-01 10:36:41 -06:00
A.J. Beamon 55b752edf1 Check transaction validity with respect to tenants even if it will fail with a conflict. This allows us to report the appropriate non-retryable error instead. 2023-02-28 15:47:00 -08:00
A.J. Beamon 2898a95c81 Fix two metacluster issues:
1. When retrying the transaction to register a restoring cluster, don't choose a new ID if the current ID matches the one recorded for the restoring cluster
2. A metacluster test was incorrectly handling the case where a transaction was retried with unknown result and had committed successfully
2023-02-28 15:40:04 -08:00
Xiaoxi Wang 26237a291d update read range reply field 2023-02-28 13:18:57 -08:00
Josh Slocum 09122a9eb0 removing extra wait in blob manager failure detection if unneeded 2023-02-28 14:39:35 -06:00
Jingyu Zhou 40b24c3dbb
Merge pull request #9493 from sfc-gh-jslocum/bg_delete_tenant_test
added unit test for bg tenant deletion to make sure nothing breaks
2023-02-28 12:32:37 -08:00
Jingyu Zhou a350a929b9
Merge pull request #9494 from sfc-gh-jslocum/bg_cp_improvements
addressing review comments and fixmes in bg commit proxy code
2023-02-28 12:30:58 -08:00
Markus Pilman 8fe9d31907
Merge pull request #9516 from sfc-gh-jslocum/simkms_check
adding simulation check guard
2023-02-28 13:29:13 -07:00
Jingyu Zhou 7963e6ef69
Merge pull request #9480 from apple/sfc-gh-dadkins/tlog-queue-metrics
tlog: Measure time spent waiting for previous versions separately from actual commit time
2023-02-28 12:24:26 -08:00
Xiaoxi Wang 8cb2a1553a add read ops sampler 2023-02-28 12:03:42 -08:00
Josh Slocum 36430d32ae adding simulation check guard 2023-02-28 12:39:20 -06:00
Josh Slocum c5e73bfd22
Blob Granule correctness fixes (#9514)
* handling new race with reassign and force purge

* handling error race causing flow lock leak
2023-02-28 12:08:07 -06:00
Lukas Joswiak 47fc53ed6e Adds more detailed mutation logging to commit proxy
The commit proxy writes a `ProxyMetrics` trace every 5 seconds. This
event contains a lot of useful information, such as the number of commit
batches that arrived and exited, the number of mutations processed, the
number of bytes those mutations made up, etc. However, it is difficult
to tell what the workload pattern looks like within these 5 second
intervals when the metrics are being calculated.

This PR adds a new trace, `ProxyDetailedMetrics`, which logs itself
every 100ms. It currently only writes the number of mutations and the
number of mutation bytes that arrived during the 100ms time period. But
it should be easy to add more metrics in the future.

It's possible this increased logging could cause issues. Based off a
simulation run of the `WriteDuringRead` test, I got the following
results:

```
$ rg ProxyDetailedMetrics trace.json | wc -l
    6877
$ rg "Roles\": \".*CP.*\"" trace.json | wc -l
   11402
$ wc -l trace.json
   96147 trace.json
```

So on processes running as a commit proxy, this approximately doubled
the number of lines logged. But relative to the cluster overall, it only
added about 5% overhead.

If we want to reduce this number, one possibility would be to not write
a trace if all the values being written are 0. I'm not sure if this
would help much in production, but in simulation the large majority of
the traces (99%+) consist of zero values.
2023-02-28 09:48:39 -08:00
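
The overhead estimate in the `ProxyDetailedMetrics` commit message above can be re-derived from the three line counts it quotes. The sketch below is only that arithmetic, under the assumption (not stated in the message) that the commit-proxy line count already includes the new trace lines. `should_emit` is a hypothetical helper illustrating the suggested "skip all-zero traces" idea; it is not code from the patch.

```python
# Line counts quoted in the commit message (WriteDuringRead simulation run).
detailed_lines = 6877       # lines containing ProxyDetailedMetrics
commit_proxy_lines = 11402  # lines logged by processes with a CP role
total_lines = 96147         # all lines in trace.json

# Assumption: the commit-proxy count already includes the new trace lines.
cp_lines_before = commit_proxy_lines - detailed_lines
cp_growth = commit_proxy_lines / cp_lines_before
share_of_total = detailed_lines / total_lines

print(f"commit proxy lines grew {cp_growth:.2f}x")
print(f"new trace is {share_of_total:.1%} of all lines")


def should_emit(sample: dict) -> bool:
    """Hypothetical filter: skip a trace whose counters are all zero."""
    return any(value != 0 for value in sample.values())
```

By these counts the new trace accounts for roughly 7% of all logged lines, the same order of magnitude as the "about 5%" quoted in the message.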
Dan Adkins 035314b277 Merge branch 'main' of github.com:apple/foundationdb into sfc-gh-dadkins/tlog-queue-metrics 2023-02-28 09:26:03 -08:00
Jingyu Zhou 4ea70b1f59
Merge pull request #9512 from sfc-gh-mpilman/bugfixes/remove-lockid-from-txnstatestore
Don't store lockid in txnStateStore
2023-02-28 09:16:45 -08:00
Evan Tschannen 1a87364286
Merge pull request #9498 from sfc-gh-jslocum/blobbify_purge_race_fix
waiting for known ranges in force purge to be blobbified before purging
2023-02-28 09:16:17 -08:00
Markus Pilman 04eb52bfc4
Merge pull request #9500 from sfc-gh-jslocum/bg_kms_missing_tenants
Handling missing tenants and errors in blob metadata kms fetch
2023-02-28 10:09:11 -07:00
Markus Pilman 537996b567
Merge pull request #9491 from sfc-gh-jslocum/blob_management_authz
adding authz blob management tests
2023-02-28 10:08:57 -07:00
Markus Pilman f07de7beb6 Fix unrelated warning 2023-02-28 08:59:18 -07:00
Xiaoxi Wang 613391f40f Make DDQueue and DDShardTracker reference counted 2023-02-27 22:36:31 -08:00
Ata E Husain Bohra 2db1da26d9
EaR: Update ApiWorkload to validate encryption at-rest guarantees (#9466)
* EaR: Update ApiWorkload to validate encryption at-rest guarantees

Description

FDB encryption of data at rest guarantees that, if the cluster is configured with
the feature enabled, all data written to persistent disks is "encrypted". Given that FDB
maintains multiple persistent storages during the lifecycle of the data, the patch
proposes a scheme to validate the invariant via "simulation testing".

Patch proposes updating ApiCorrectness workload to do the following:
1. Enable the validation feature via client-supplied params and/or randomly.
2. When enabled, validation injects a known "marker string"
into workload-generated Key and Value data patterns.
3. On shutdown, if the validation is enabled, all test files are
scanned for the known "marker" pattern.

Simulation tests are already capable of doing the following:
1. Randomly select TenantMode (disabled/optional/required)
2. Randomly select EncryptionAtRestMode (cluster_aware/domain_aware)

Hence, with these updates all possible combinations are validated. Also,
'defaultTenant' is present to cover 'domain_aware' encryption use cases.

Testing
devRunCorrectness
devRetryCorrectness - ApiCorrectness & EncryptedBackupCorrectness
2023-02-27 21:40:46 -08:00
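
The marker-string scheme described in the ApiWorkload commit message above can be illustrated with a small standalone sketch. Everything here is hypothetical: the marker value, the helper names, and the XOR "encryption" stand-in are assumptions for illustration and are not taken from the patch. The idea is only that plaintext carrying a known marker should never be findable in files written with encryption enabled.

```python
import os
import tempfile

MARKER = b"EaR_MARKER_0xFDB"  # hypothetical marker, not the one used by the patch


def with_marker(payload: bytes) -> bytes:
    """Embed the known marker in workload-generated key or value bytes."""
    return MARKER + payload


def files_containing_marker(root: str, marker: bytes = MARKER) -> list:
    """Scan every file under root and return the paths where the marker appears."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                if marker in f.read():
                    hits.append(path)
    return sorted(hits)


def fake_encrypt(data: bytes) -> bytes:
    # Stand-in for real encryption: XOR every byte so the marker is not visible.
    return bytes(b ^ 0x5A for b in data)


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "encrypted.bin"), "wb") as f:
        f.write(fake_encrypt(with_marker(b"value-1")))
    with open(os.path.join(root, "leaky.bin"), "wb") as f:
        f.write(with_marker(b"value-2"))
    leaks = [os.path.basename(p) for p in files_containing_marker(root)]

print(leaks)  # only the file written without encryption is reported
```

A non-empty result on shutdown would indicate that some persistent file still holds plaintext.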
Markus Pilman 20874d8575
Merge pull request #9502 from sfc-gh-ajbeamon/metacluster-tenant-lock-support
Metacluster tenant lock support
2023-02-27 21:19:03 -07:00
Jingyu Zhou 842d485862
Merge pull request #9402 from yao-xiao-github/main
Add shard consistency validation.
2023-02-27 17:05:30 -08:00
Jingyu Zhou f414cd0ed8
Merge pull request #9486 from sfc-gh-ajbeamon/metacluster-management-concurrency-restore-support
Add restore to the metacluster management concurrency workload
2023-02-27 17:05:05 -08:00
A.J. Beamon 469e77158f Add metacluster support for tenant locking 2023-02-27 16:53:13 -08:00
Dan Adkins b8c9c8b0f4 Add metric for tlog commit time minus time spent waiting in the queue. 2023-02-27 15:40:22 -08:00
Dan Adkins 37b6804f88 Add metric for queue wait time in tlog. 2023-02-27 15:40:22 -08:00
Jingyu Zhou d59b341108
Merge pull request #9501 from jzhou77/fix
Fix a memory bug in ClogTlog workload
2023-02-27 13:46:55 -08:00
Russell Sears bcc05b1058 Improve support for prebuilt boost 2023-02-27 15:38:58 -06:00
Junhyun Shim b811881f41
Allow unthrottled, unsuppressed traces for security-related events (#9459)
* Define API for unsuppressable TraceEvent types

Add trace checking tests for authz trace events

* Revert temporary configurations used for debugging

* Simplify/Modernize flow audit logging API

- Do event type whitelist checks at compile time
- Use ""_audit literal API instead of a tag struct
- Replace int with a lightweight struct for tracking/modifying TraceEvent enablement

* Revert installing signal handler for SIGTERM and refactor test script

Move trace checker to local_cluster.py

* Lengthen public key refresh interval and add more audited events

* Try and make MSVC and Mac build happy

* consteval > constexpr

'inline consteval' still causes link errors in Mac builds
2023-02-27 21:51:13 +01:00
Steve Atherton 80eb84de3c
Merge pull request #9488 from sfc-gh-satherton/redwood-first-commit-record
Fix correctness failures in encryption enforcement related to Redwood instances killed during initialization.
2023-02-27 12:10:02 -08:00
Jingyu Zhou 1f9b44480e Fix a memory bug in ClogTlog workload 2023-02-27 11:39:50 -08:00
Josh Slocum 48bcf2b25f adding code probes 2023-02-27 13:12:51 -06:00
Josh Slocum 1ef6971f27 Handling missing tenants and errors in blob metadata fetch 2023-02-27 13:07:54 -06:00
Josh Slocum 411ff6831d adding missing tenant or error injection to blob metadata kms fetch 2023-02-27 12:08:59 -06:00
Josh Slocum 5dcf236296 waiting for known ranges in force purge to be blobbified before purging 2023-02-27 11:35:35 -06:00
Josh Slocum 716a9c3817 addressing review comments and fixmes in bg commit proxy code 2023-02-27 10:51:47 -06:00
Josh Slocum 4fa528f3fa added unit test for bg tenant deletion 2023-02-27 10:21:26 -06:00
Josh Slocum 1e7368aa21 removing double counted counter 2023-02-27 10:18:50 -06:00
Josh Slocum faf8ec6281 adding authz blob management tests to ensure tenant-scoped clients cannot do blob management 2023-02-27 08:06:36 -06:00
Steve Atherton cad7596e2d Apply clang-format. 2023-02-27 01:07:35 -08:00
Steve Atherton 674c105050
Merge pull request #9473 from sfc-gh-etschannen/feature-change-feed-lock
Replace fetchKeysParallelismFullLock to speed up fetch keys in idle clusters
2023-02-26 23:18:43 -08:00
Steve Atherton 01f89d1245 Merge commit '97c355085c0e22e727a5c3e35ace926a58b16f5d' into redwood-first-commit-record 2023-02-26 21:19:17 -08:00
Steve Atherton 92bfa1d578 Throw an error if reopened Redwood file is recovered but to a point prior to when the BTree was created if the Encryption Mode of the cluster is not known. 2023-02-26 21:15:48 -08:00
Steve Atherton 29444252fc Prevent Redwood Pager from recovering to a point before the user commit record (ie the BTree CommitRecord) was initialized as this isn't a useful recovery point. 2023-02-26 01:38:24 -08:00
A.J. Beamon b5b7d51213 Fix maybe committed issue in tenant lock test 2023-02-25 20:24:16 -08:00
A.J. Beamon 846b3c731d Update metacluster consistency check to handle tenants that are in the process of being renamed correctly 2023-02-25 19:52:07 -08:00
A.J. Beamon 040d44927b Store the rename destination for tenant movements in the cluster tenant index with an ID of -1. Use this to filter out tenant aliases when modifying the tenant count during a tenant purge. 2023-02-25 19:51:21 -08:00
A.J. Beamon fddec2b573 Add restore support to the metacluster management concurrency workload 2023-02-25 19:27:04 -08:00
A.J. Beamon 8c3ee768a2 Add an option to allow exceeding the tenant group capacity limit when changing tenant configuration 2023-02-24 21:01:36 -08:00
A.J. Beamon 1c71056e26
Merge pull request #9479 from sfc-gh-nwijetunga/nim/enforce-metacluster-tenant-mode
Enforce Disabled Tenant Mode in Metacluster
2023-02-24 19:27:57 -08:00
Jingyu Zhou 6b121de6a6
Merge pull request #9464 from jzhou77/fix
Add exclude to fdbcli's configure command
2023-02-24 16:31:02 -08:00
Nim Wijetunga c8b7cff10c fix api test 2023-02-24 15:26:59 -08:00
Josh Slocum 6187811f71
Reworking getBlobGranuleRanges to also use commit proxy rpc for authz, and adding test (#9470) 2023-02-24 17:15:32 -06:00
Jingyu Zhou 204b31bc87
Merge pull request #9472 from sfc-gh-ajbeamon/fix-invalid-iterator-use
Fix memory error from using invalidated iterator
2023-02-24 14:55:28 -08:00
Nim Wijetunga eca98afcb0 metacluster check tenant mode 2023-02-24 13:59:54 -08:00
Evan Tschannen 8872e5a462
Merge pull request #9347 from sfc-gh-etschannen/feature-change-feed-cache
added a disk to blob workers
2023-02-24 13:59:03 -08:00
Josh Slocum 17bcbaeb03
add tenant reopen to ensure no races with tenant opening (#9476) 2023-02-24 15:34:57 -06:00
Dan Adkins e3a61b9b22
Add metrics to understand tail commit latency (#9435)
* Add server-side latency metrics for Resolver requests.

* Add separate resolver latency metrics for queue wait and compute time.

* Add histogram for queue depth observed on resolver (during metrics interval).

* Fix tlog latency measurement to use timer() instead of now().
2023-02-24 14:13:12 -05:00
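
The split between queue wait and compute time that the latency-metrics commit above describes can be sketched generically. This is not the resolver or tlog code: `Request`, `drain`, and the field names are hypothetical, and `time.perf_counter()` is used here as a stand-in for reading a real clock at each measurement point rather than a cached timestamp.

```python
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Request:
    payload: int
    # Timestamp taken when the request enters the queue.
    enqueued_at: float = field(default_factory=time.perf_counter)


def drain(queue: deque, handler) -> list:
    """Process queued requests, recording queue wait and compute time separately."""
    samples = []
    while queue:
        req = queue.popleft()
        started = time.perf_counter()   # request leaves the queue
        handler(req.payload)
        finished = time.perf_counter()  # work for this request is done
        samples.append({
            "queue_wait": started - req.enqueued_at,
            "compute": finished - started,
            "total": finished - req.enqueued_at,
        })
    return samples


pending = deque(Request(i) for i in range(3))
samples = drain(pending, lambda n: sum(range(10_000 * (n + 1))))
for s in samples:
    print(f"wait={s['queue_wait']:.6f}s compute={s['compute']:.6f}s")
```

Reporting the two components separately makes it possible to tell whether tail latency comes from requests sitting behind earlier work or from the work itself.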
Evan Tschannen f3673d808b Replaced the fetchKeysParallelismFullLock with a lock specifically for change feeds to avoid blocking fetches on idle clusters 2023-02-24 10:59:35 -08:00
A.J. Beamon 2d936acbdd Fix memory error from using invalidated iterator 2023-02-24 10:25:38 -08:00
A.J. Beamon 03fbc59bb1
Merge pull request #9461 from sfc-gh-ajbeamon/metacluster-concurrent-restore-testing
Metacluster concurrent restore testing
2023-02-24 09:13:51 -08:00
Josh Slocum 910965a5a6
Adding additional blob granule authz tests (#9443)
* added granule location authz tests

* added authz test for blob worker endpoint

* addressing comments

* fixing ide build
2023-02-24 09:32:05 -06:00
A.J. Beamon 344f6977c9 Restores can now throw a cluster_already_exists error in the metacluster management workload if we timeout a restore and have to retry 2023-02-23 20:32:18 -08:00
Nim Wijetunga 29819b0645
Change Feed Bug Fix + Encryption Asserts (#9457)
* add encryption asserts

* modify function name

* address pr comments

* address pr comments

* Trigger Build
2023-02-23 19:33:25 -08:00
Xiaoxi Wang 7e14324cc1 remove dataDistributionQueue actor and replace it with class DDQueue 2023-02-23 17:15:33 -08:00
A.J. Beamon 2b25cfef8b Merge branch 'main' into metacluster-concurrent-restore-testing 2023-02-23 16:06:47 -08:00
Jingyu Zhou 0b2e02c402 Fix rare test failures
Unclog after DB is recovered, otherwise another recovery may become stuck again.
2023-02-23 15:42:33 -08:00
Xiaoxi Wang 9cda7ebe5a encapsulate DDQueue constructor params into DDQueueInitParams 2023-02-23 15:38:48 -08:00
Jon Fu 33f8e90f9f
Split tenant group metadata (#9446)
* initial commit to split tenant group metadata

* attempt to fix merge errors

* fix compile errors and adjust existing tests

* fix infinite loop and extra ACTOR tag

* direct assignment instead of store

* direct assign instead of store (missed a few)
2023-02-23 18:11:49 -05:00
A.J. Beamon 3ac7e17b79 Fix create tenant usage in tenant management workload 2023-02-23 15:08:52 -08:00
Jingyu Zhou 65443b6541 Fix compiling errors 2023-02-23 15:02:44 -08:00
Jingyu Zhou ecae81882c Change to only clog once for a particular tlog
If we repeat clogging, different tlogs may be excluded, which can cause the
recovery to get stuck.
2023-02-23 14:31:39 -08:00
Jingyu Zhou 6055f752c2 Exclude failed tlog if recovery stuck more than 30s
Because the tlog is clogged, recovery can get stuck in initializing_transaction_servers.
This exclude allows the recovery to complete.
2023-02-23 14:31:32 -08:00
Jingyu Zhou c4773b7cc8 Update clogTlog workload to be single region 2023-02-23 14:31:24 -08:00
Jingyu Zhou 955826f2fe Add ClogTlog workload 2023-02-23 14:31:12 -08:00
Jingyu Zhou 792950dbdc
Merge pull request #9434 from sfc-gh-huliu/splitmetrics
Implement SplitMetric pagination in blob migrator
2023-02-23 14:10:27 -08:00
A.J. Beamon b828f3f257 Add missing change to explicit TenantMapEntry conversion 2023-02-23 13:38:04 -08:00
A.J. Beamon 06fe00544a Remove TenantMapEntry <-> MetaclusterTenantMapEntry conversion constructors and use named functions instead 2023-02-23 13:28:10 -08:00
A.J. Beamon dcae48cbbd Add concurrent restore testing to the metacluster restore workload 2023-02-23 13:27:20 -08:00
Xiaoxi Wang cdfaf7d4e0 move DDRelocationQueue to header file 2023-02-23 13:12:38 -08:00
A.J. Beamon e151a2d363
Merge pull request #9451 from sfc-gh-ajbeamon/metacluster-management-workload-restore-support
Improve restore support in the metacluster management workload
2023-02-23 13:10:31 -08:00
Markus Pilman efc5bf9ee8
Merge pull request #9456 from sfc-gh-ajbeamon/smaller-tenant-in-txn-state-store
Store a smaller tenant object in the txn state store
2023-02-23 14:00:12 -07:00
A.J. Beamon 9e9a31c0f1 Use error variable consistently 2023-02-23 11:27:53 -08:00
Xiaoxi Wang 021f7c1b43 remove unused headers and outdated comments 2023-02-23 11:00:21 -08:00
Xiaoxi Wang a6feafc371 remove dataDistributionTracker actor and use DataDistributionTracker class instead 2023-02-23 10:18:28 -08:00
Evan Tschannen cf3a4e6161 Merge branch 'main' into feature-change-feed-cache 2023-02-23 10:16:13 -08:00
Evan Tschannen a581a55452 ensure a worker cannot run multiple blob worker roles 2023-02-23 09:51:26 -08:00
A.J. Beamon dd650215d4 Store a smaller tenant object in the txn state store 2023-02-23 09:29:33 -08:00
Ata E Husain Bohra 7d079690d4 Merge branch 'main' into ahusain-misc-fixes 2023-02-22 18:11:11 -08:00
A.J. Beamon a76af5c696 Improve restore support in the metacluster management workload 2023-02-22 17:42:36 -08:00
Ata E Husain Bohra 1f7ee9437f EaR: RESTClient and EKP changes to handle unreachable external KMS
Description

Two major changes proposed are:

I)
Used following setup for testing:
1. Run `fdbserver` locally.
2. Run a mock python based HTTP server (encryption endpoints not implemented)

The expectation was that the RESTClient code would loop, trying to establish a connection
to the desired encryption endpoint. However, the code was observed to loop for
one cycle and then SEGV on the following cycle while printing a log using the RESTUrl object, which
is obtained as a 'pointer' from the caller. Update the code to use a RESTUrl object
instead of the pointer.

II) In above setup, KMSConnector would throw 'encrypt_key_fetch_failed' error
which wasn't handled by EKProxy, hence, causing the service to terminate. Add
code to re-throw the error to the caller.

Testing
2023-02-22 17:15:34 -08:00
A.J. Beamon 9b906d9b3d
Merge pull request #9447 from sfc-gh-ajbeamon/metacluster-restore-fixes
Metacluster restore fixes
2023-02-22 17:07:19 -08:00
Hui Liu 0fba65a3cd Implement SplitMetric pagination in blob migrator 2023-02-22 16:00:49 -08:00
Xiaoxi Wang 35a3d2f15f Merge branch 'main' of https://github.com/apple/foundationdb into fix/main/anyExisted 2023-02-22 15:36:37 -08:00
Xiaoxi Wang 16404c5bdc make DataDistributionTracker constructor use DataDistributionTrackerInitParams 2023-02-22 15:36:15 -08:00
A.J. Beamon 87cb21be06
Merge pull request #9310 from sfc-gh-mpilman/features/tenant-lock2
Tenant Lock
2023-02-22 15:18:14 -08:00
A.J. Beamon 33431f062d Add some trace events, use a more appropriate error, and improve a check of allocated tenant groups 2023-02-22 14:39:51 -08:00
Markus Pilman 4e31cd7582 Fix compilation error due to TenantLockState being moved to a different namespace 2023-02-22 15:21:47 -07:00
A.J. Beamon a5a8c57a38 Fix some merge issues and missing updates to use new boolean parameters 2023-02-22 12:52:58 -08:00
A.J. Beamon f06ed85044 Update the tenant and metacluster consistency checks to account for the possibility of renamed tenants or partially registered clusters 2023-02-22 12:29:32 -08:00
A.J. Beamon ec79ecce73 Add a boolean parameter for ForceRemove; rename ForceJoinNewMetacluster to ForceJoin 2023-02-22 12:26:24 -08:00
Markus Pilman 8695fc15fc Merge remote-tracking branch 'origin/main' into features/tenant-lock2 2023-02-22 13:12:23 -07:00
A.J. Beamon 006a2ead6f Merge branch 'main' into check-metacluster-restore-dryrun 2023-02-22 11:16:45 -08:00
A.J. Beamon 935cd70600 Use MetaclusterData and TenantData in the MetaclusterConsistencyCheck and TenantConsistencyCheck 2023-02-22 11:04:42 -08:00
Josh Slocum 9cd0f32e87
Fixing several metrics issues in blob workers (#9426)
* fixing int vs int64 data type, and fixing cause of incorrect request counter

* fixing incorrect count of mutation bytes buffered on granule cancel
2023-02-22 11:08:12 -06:00
Steve Atherton 23df46773d
Merge pull request #9422 from sfc-gh-satherton/client-read-options
Add transaction option definitions for read priority and read cache
2023-02-22 09:00:25 -08:00
Josh Slocum 354bfef099
Added blob granules authz test, and fixed a bug it found (#9433)
* Adding blob granules authz test

* Fixing tenant race found by authz test
2023-02-22 10:53:45 -06:00
A.J. Beamon cb57541c98 Add testing that the metacluster restore dry-run mode doesn't change anything 2023-02-22 08:41:16 -08:00
Steve Atherton a21b2fe9f9 Simplified read priority option to three separate options for normal/low/high priority. 2023-02-21 22:48:38 -08:00
Steve Atherton 5969616af8 Merge commit '6de85e7cd8e9dd74a571de9e04679e669bcbb5b6' into client-read-options 2023-02-21 20:46:20 -08:00
Yao Xiao 36d79e0d2d resolve comments 2023-02-21 15:40:44 -08:00
Jon Fu ff7174065f Merge branch 'main' of github.com:apple/foundationdb into split-tenant-metadata 2023-02-21 14:16:13 -08:00
Steve Atherton bb4fb3d81d
Merge pull request #9419 from sfc-gh-satherton/page-rebuild-fix
Optimize/fix node rebuild vs update trigger in Redwood
2023-02-21 13:49:14 -08:00
Evan Tschannen 9047edd14f added a comment 2023-02-21 13:24:17 -08:00
Josh Slocum 33c0b35ee6
No RK throttling on blob workers if no blob ranges (#9425) 2023-02-21 15:23:40 -06:00
Markus Pilman 15d8548c0e Merge remote-tracking branch 'origin/main' into features/tenant-lock2
# Conflicts:
#	fdbserver/ApplyMetadataMutation.cpp
#	fdbserver/storageserver.actor.cpp
2023-02-21 13:39:35 -07:00
Evan Tschannen 8129381689 merge in main 2023-02-21 12:06:35 -08:00
Jon Fu 2d74d10a91 use MetaclusterAPI namespace over TenantAPI for TenantState 2023-02-21 11:33:36 -08:00
Josh Slocum 958f3b531b
Plumbing blob worker mapping through commit proxy like storage server (#9401)
* Plumbing blob worker mapping through commit proxy like storage server mapping

* review comments

* formatting
2023-02-21 13:21:44 -06:00
Jon Fu c0b332501b update code to match style 2023-02-21 11:17:25 -08:00
Hui Liu 2eeb29132c
Merge pull request #9413 from sfc-gh-huliu/counters
Add stats counters for manifest dump
2023-02-21 11:13:36 -08:00
Jon Fu f27c3e24a8 fix code formatting 2023-02-21 11:09:14 -08:00
Jingyu Zhou 81f8c360db
Merge pull request #8811 from sfc-gh-tclinkenbeard/expose-tag-throttled-duration 2023-02-21 10:47:35 -08:00
Jon Fu 9e01cffef0 fix some merge conflicts and address review comments 2023-02-21 10:29:36 -08:00
Jon Fu 428eb07766 Merge branch 'main' of github.com:apple/foundationdb into split-tenant-metadata 2023-02-21 10:11:35 -08:00
Junhyun Shim 2497aa5701
Clamp GetKeyServerLocations result to tenant prefix (#9424) 2023-02-21 18:55:24 +01:00
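
Clamping a returned key range to a tenant's prefix, as the commit title above describes, amounts to intersecting two byte ranges. The sketch below assumes half-open `[begin, end)` ranges of raw bytes and an 8-byte example prefix; the function names are hypothetical and this is not the code from the patch.

```python
def prefix_end(prefix: bytes) -> bytes:
    """Return the first key that sorts after every key starting with prefix."""
    stripped = prefix.rstrip(b"\xff")
    if not stripped:
        raise ValueError("prefix has no upper bound")
    return stripped[:-1] + bytes([stripped[-1] + 1])


def clamp_to_prefix(begin: bytes, end: bytes, prefix: bytes):
    """Intersect the half-open range [begin, end) with the range covered by prefix."""
    clamped_begin = max(begin, prefix)
    clamped_end = min(end, prefix_end(prefix))
    if clamped_begin >= clamped_end:
        return None  # the range does not overlap the tenant's keyspace
    return clamped_begin, clamped_end


tenant_prefix = b"\x00" * 7 + b"\x07"  # example prefix, chosen for illustration
shard_begin = b"\x00" * 7 + b"\x06zzz"  # shard starts in a neighbouring keyspace
shard_end = tenant_prefix + b"m"        # and ends inside this tenant

clamped = clamp_to_prefix(shard_begin, shard_end, tenant_prefix)
print(clamped)
```

With clamping, a caller scoped to one tenant only sees boundaries that lie inside that tenant's own keyspace.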
Steve Atherton 246fd1dd4e Remove auto cache option since there is no meaningful implementation of this yet. Change places using trState in a native Transaction to set cache mode or Low/Normal/High priority to use the new transaction options instead. 2023-02-21 02:50:30 -08:00
Ata E Husain Bohra fa60f1b4fa
RESTClient: Initialize RESTClient connection pool instance (#9414)
Description

Patch fixes an issue where new connection for a corresponding
'connectKey' isn't getting added to the connectionPoolMap.

Testing

Standalone fdbserver triggering RESTClient connection path
2023-02-20 19:32:10 -08:00
sfc-gh-tclinkenbeard 398079db3a Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2023-02-20 17:54:06 -08:00
Evan Tschannen 4f9e86b0a4 fixed two bugs that prevented the blob manager from properly loading worker affinity 2023-02-20 16:47:26 -08:00
Steve Atherton a5238eaeb3
Allow fetch keys to do more work when mutation stream is infrequent and low throughput. (#9398) 2023-02-20 13:38:04 -08:00
Steve Atherton e169e65021 Fix to BTree node rebuild logic - rebuild when imbalance hits a limit controlled by a new knob. 2023-02-19 16:40:28 -08:00
A.J. Beamon 487b444da9
Merge pull request #9416 from sfc-gh-ajbeamon/fix-tester-tenant-delete-bug
When clearing tenants at the end of a simulation test, reset the transaction when looping
2023-02-18 20:25:15 -08:00
Yi Wu eac757d186
EaR: cleanup encryption knobs (#9386)
Changes:
* Cleanup all encryption knobs 
* Update simulated cluster to randomly enable encryption with higher probability
2023-02-18 13:18:20 -08:00
A.J. Beamon 3163201201 Restore ID fixes: we weren't generating a restore ID; we weren't setting the restore ID on the management cluster in some restore modes; it is possible in some test scenarios to encounter a restore conflict, in which case we need to retry. 2023-02-17 21:20:06 -08:00
A.J. Beamon 95156198b5 When clearing tenants at the end of a simulation test, reset the transaction when looping 2023-02-17 21:13:53 -08:00
sfc-gh-tclinkenbeard 1aef6cb5f7 Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2023-02-17 20:41:59 -08:00
Hui Liu 323a257f1f Add stats counters for manifest dump 2023-02-17 16:22:45 -08:00
Jon Fu 8e6800663b add missing txn reset 2023-02-17 15:22:04 -08:00
Jon Fu 762cbcdc5d unconditionally set restore id 2023-02-17 14:41:41 -08:00
Jon Fu edb7a51b7e Revert "let client supply restore id"
This reverts commit 5fe32b8503.
2023-02-17 14:37:22 -08:00
Jon Fu 5fe32b8503 let client supply restore id 2023-02-17 14:01:58 -08:00
Jingyu Zhou 5d12b05090
Merge pull request #9369 from sfc-gh-ahusain/ahusain-fdbcore-3951 2023-02-17 13:11:00 -08:00
Hui Liu bdba85a86f
Merge pull request #9303 from sfc-gh-huliu/logtruncation
Truncate mutation logs after flushing blob granules
2023-02-17 12:52:45 -08:00
Ata E Husain Bohra b3d3889328 EaR: Fix ASAN OOM failure - EncryptionOps
Description

Fix ASAN build OOM failure while running the EncryptionOps test.
The patch addresses the issue by avoiding a global arena, shortening
the lifetime of allocations to a single iteration. The test was recently extended
to cover the 'configurable encryption' codepath, hence the memory
allocated per iteration doubled.

Testing

EncryptionOps - ASAN compatible
2023-02-17 11:49:16 -08:00
Ankita Kejriwal 253934bb82
Merge pull request #9395 from sfc-gh-akejriwal/tenantmap
Assert that `checkTenantEntry()` on storage server is reading tenantMap at a legal version
2023-02-17 11:42:37 -08:00
Yi Wu 653a5eee28
EaR: Configurable encryption support for Redwood (#9359)
Support configurable encryption for Redwood, which allows switching to a different encryption algorithm and having a variable-size encryption header.

Currently, to support both the old (non-configurable) and new (configurable) encryption headers, the PR assumes a fixed-size encryption header (104 bytes) that is large enough to fit both kinds of encryption header. Moving forward, the plan is to update the IPager interface to support a variable-size encoding header, and when Redwood tries to update a page in place but the reserved encoding header buffer is not large enough, to rebuild the page instead.
2023-02-17 10:58:48 -08:00
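
The forward-looking plan in the Redwood commit message above (update in place when the new header fits the reserved space, otherwise rebuild the page) reduces to a size check. The sketch below is a hedged illustration only: the 104-byte figure comes from the message, while the function name and return values are hypothetical and do not reflect the IPager interface.

```python
RESERVED_HEADER_BYTES = 104  # fixed reservation quoted in the commit message


def plan_page_update(new_header: bytes, reserved: int = RESERVED_HEADER_BYTES) -> str:
    """Decide whether a page can be updated in place or has to be rebuilt."""
    if len(new_header) <= reserved:
        return "update_in_place"  # header fits in the space already reserved
    return "rebuild_page"         # header outgrew the reservation


print(plan_page_update(bytes(64)))   # smaller header
print(plan_page_update(bytes(104)))  # exactly the reserved size
print(plan_page_update(bytes(160)))  # larger than the reservation
```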
Jon Fu 0d7b6d626b update restoreCluster test to account for conflicting_restore 2023-02-17 10:36:28 -08:00
Josh Slocum 6c2fb13173
adding wait parameter to blobbify api (#9360)
* adding wait parameter to blobbify api

* formatting

* fixing comment style

* fixing bug and adding debugging

* adding blob ranges unit test

* testing both blobbify cases in cancel

* formatting

* switch to explicit blocking api instead of boolean flag

* remove comments

* format
2023-02-17 12:20:53 -06:00
Hui Liu aa1d983132 Truncate logs after force-flushing cold blob granules 2023-02-17 10:17:04 -08:00
Steve Atherton aba2188491
Merge pull request #9399 from sfc-gh-satherton/read-after-shutdown-protection
Shut down PriorityMultiLocks more gracefully
2023-02-17 10:04:51 -08:00
Steve Atherton bca42ed232 Merge commit '99b23ac04d302b1edc6db04f1488a20f8f772ae1' into redwood-used-bytes-update 2023-02-16 22:07:44 -08:00
Ata E Husain Bohra 99b23ac04d
EaR: Configurable encryption support for Tlog mutations (#9394)
* EaR: Configurable encryption support for TLog mutations

Description

  diff-1 : Address review comments

Major changes include:
1. Update the code involved in ensuring Tlog mutation encryption to be
compliant with "configurable encryption" feature.
2. Update ENABLE_CONFIGURABLE_ENCRYPTION flag to be 'true' by default
and BUGGIFY it.

Testing

devRunCorrectness - 100K
2023-02-16 19:01:59 -08:00
Ankita Kejriwal da7d3c1129 Merge branch 'main' of github.com:apple/foundationdb into tenantmap 2023-02-16 16:06:10 -08:00
Nim Wijetunga fd231e3f14
Configurable Encryption Support for TxnStateStore (#9387)
Configurable encryption for Transaction State Store
2023-02-16 15:20:14 -08:00
Josh Slocum c26831ec04
adding version metadata to blob granule file pointers (#9392) 2023-02-16 17:11:11 -06:00
Nim Wijetunga e03eca778c
Configurable Encryption Support for Backup (#9375)
Snapshot backup configurable encryption support
2023-02-16 15:03:27 -08:00
Jon Fu 53fc43a3a6 Merge branch 'main' of github.com:apple/foundationdb into split-tenant-metadata 2023-02-16 14:46:03 -08:00
Yanqin Jin c3d6ae0213
Throw invalid_tenant_configuration when changing assigned cluster (#9350)
Since we currently do not support tenant movement, we should as well explicitly disallow changing the assigned
cluster of a tenant during configuration by throwing `invalid_tenant_configuration` for now.

Test plan:
- add coverage for changing assigned cluster during tenant configuration
- fdbcli
- simulation tests
2023-02-16 14:20:59 -08:00
Yao Xiao a3b2324816 add validation 2023-02-16 13:52:27 -08:00
Jon Fu 036078234e fix some leftover issues from branch merge 2023-02-16 12:54:45 -08:00
Ankita Kejriwal b92aeca869 Merge branch 'main' of github.com:apple/foundationdb into tenantmap 2023-02-16 12:00:34 -08:00
Jon Fu ab15478ef9 Merge branch 'main' of github.com:apple/foundationdb into split-tenant-metadata 2023-02-16 10:04:07 -08:00
Jingyu Zhou 437cdb37ee
Merge pull request #9388 from jzhou77/fix
Fix exclusion in repairDeadDatacenter to be remote only
2023-02-16 09:42:42 -08:00
Jingyu Zhou 6b5d4fcb36
Merge pull request #9390 from sfc-gh-ajbeamon/clear-tenants-after-test
When clearing the database after a test, delete all tenants except the default tenant
2023-02-16 09:40:30 -08:00
Junhyun Shim d9c126a2d9
Introduce WipedString for Arena block holding AuthZ tokens (#9381)
* Enable secure allocation mode in Arena

This mode allows zeroing out blocks holding sensitive data after use

* Introduce WipedString to all token-holding memory

Also introduce an option flag "sensitive"

* Make pointer equivalency a hard requirement for non-ASAN builds

So that we can detect when Arena/malloc/memory-wipe behavior changes
2023-02-16 10:44:32 +01:00
Steve Atherton a1c804bc87 Added PriorityMultiLock::halt() and used that instead of kill() which avoids broken_promise errors if the lock user does not stop all of its lock-taking actors before destructing the lock and itself. 2023-02-15 23:32:17 -08:00
Steve Atherton 1d01444383 Update StorageBytes::used reported by Redwood to be the physical file size to match the behavior of other storage engines. 2023-02-15 19:02:56 -08:00
Ankita Kejriwal 56bdffa9dc Assert that storage server is reading tenantMap at a legal version 2023-02-15 18:47:17 -08:00
Jon Fu 8f4aec7d7f address review comments and fix some test errors 2023-02-15 15:58:19 -08:00
Jon Fu 1af6d8ff92 fix various correctness issues and listTenant test 2023-02-15 15:01:17 -08:00
Jingyu Zhou 99a8bfda11 Reorder trace events 2023-02-15 12:27:17 -08:00
A.J. Beamon 84142ab48d When clearing the database after a test, delete all tenants except the default tenant 2023-02-15 11:57:25 -08:00
Jingyu Zhou 02cdd0e1db Fix exclusion in repairDeadDatacenter to be remote only
If the primary is excluded, the recovery will become stuck because no servers can
be recruited the next time.
2023-02-15 11:15:18 -08:00
Jingyu Zhou 63814c9145
Merge pull request #9380 from sfc-gh-ajbeamon/fix-assigned-cluster-in-test
Try again to fix the logic in choosing the assigned cluster in the metacluster management test
2023-02-15 11:05:10 -08:00
Hui Liu fa9c80e17f
Merge pull request #9373 from sfc-gh-huliu/fixurl
Fix manifest file list issue on s3
2023-02-15 10:16:38 -08:00
Josh Slocum eefc889389
Add tenant and encryption support to new bg file apis (#9315)
* Add tenant and encryption support to new bg file apis

* formatting

* fixing comment style for linter
2023-02-15 11:48:40 -06:00
Ata E Husain Bohra 8c94b340ce
EaR: Update encryption methods to make 'cipherHeaderKey' optional (#9378)
* EaR: Update encryption methods to make 'cipherHeaderKey' optional

Description

 diff-1: Address review comments

Major changes include:
1. Update BlobCipher Encrypt/Decrypt classes to make 'headerCipher' optional
2. Update GetEncryptionCipherKeys actor methods to make 'headerCipherKey' optional
3. Update the usage across all encryption participant methods

Testing

BlobCipherUnitTest
EncryptedBackupCorrectness
BlobGranuleCorrectness*

devRunCorrectness - 100K
2023-02-15 08:56:11 -08:00
A.J. Beamon 545e08e1af Try again to fix the logic in choosing the assigned cluster in the metacluster management test 2023-02-14 21:39:42 -08:00
Jingyu Zhou b97ee87eba
Merge pull request #9367 from jzhou77/fix
Enable RocksDB restarting tests
2023-02-14 20:47:31 -08:00
Jingyu Zhou a5ca96ddf6
Merge pull request #9342 from sfc-gh-ajbeamon/metacluster-mgmt-restore
Metacluster restore support
2023-02-14 20:41:20 -08:00
Yi Wu 3d882a99c5
EaR: Refactor encryption header std::variant serializer and versioning (#9345)
Changes:
1. Make the binary serializer natively support `std::variant`. The serialized size is 1 byte (the type index, i.e. `std::variant::index()`), plus the serialized size of the actual type stored in the `std::variant`. Update `BlobCipherEncryptHeaderRef` to use the `std::variant` binary serializer.
3. Remove `flagsVersion` and `algoHeaderVersion` from `BlobCipherEncryptHeaderRef`. The former is replaced by `flags.index() + 1`, and the latter is moved into each of the algorithm-specific sub-headers. Each sub-header type will have nested version-specific subtypes to handle serialization of that specific version (e.g. `AesCtrNoAuth` has an `AesCtrNoAuthV1` subtype).
2023-02-14 20:19:27 -08:00
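The wire layout described in commit 3d882a99c5 above (a one-byte type index followed by the payload of the stored alternative) can be sketched roughly as follows. This is a language-neutral illustration in Python, not the FoundationDB serializer: the `ALTERNATIVES` table, the little-endian integer payloads, and the helper names `serialize`, `deserialize`, and `flags_version` are all assumptions made for the example.

```python
import struct
from typing import Tuple

# Hypothetical alternatives standing in for the types a variant can hold.
# The list position plays the role of std::variant::index().
# Little-endian fixed-width integers are an assumption of this sketch.
ALTERNATIVES = [
    ("u32", "<I"),  # index 0: 4-byte payload
    ("u64", "<Q"),  # index 1: 8-byte payload
]


def serialize(index: int, value: int) -> bytes:
    """Encode as 1 byte of type index followed by that alternative's payload."""
    _, fmt = ALTERNATIVES[index]
    return struct.pack("<B", index) + struct.pack(fmt, value)


def deserialize(buf: bytes) -> Tuple[int, int]:
    """Read the index byte, then decode the payload using the matching format."""
    index = buf[0]
    if index >= len(ALTERNATIVES):
        raise ValueError("unknown alternative index: %d" % index)
    _, fmt = ALTERNATIVES[index]
    (value,) = struct.unpack_from(fmt, buf, 1)
    return index, value


def flags_version(index: int) -> int:
    """Mirror the commit's note that the version is derived as index + 1."""
    return index + 1
```

Under these assumptions the encoded size is always one byte larger than the payload, and a reader can recover a version number from the index alone instead of storing it as a separate field.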
Nim Wijetunga bf85c9f8af
Backup Mutation Log Separates Tenant Map Modifications During Restore (#9292)
mutation log separates tenant map modifications
2023-02-14 16:46:09 -08:00
A.J. Beamon adc4e932c4
Merge pull request #9364 from sfc-gh-ajbeamon/fix-get-mapped-range-assertion
Fix get mapped range test assertion
2023-02-14 15:56:41 -08:00
Hui Liu d9adbdcc66 Fix manifest file list issue on s3 2023-02-14 14:56:09 -08:00
Jingyu Zhou 4184239e72
Merge pull request #9372 from sfc-gh-xwang/fix/main/anyExisted
fix anyExisted when beginTenant==endTenant
2023-02-14 14:48:47 -08:00
Yi Wu fe18c87ac6
EaR: commit proxy fetch additional cipher keys post-resolution (#9308)
Commit proxy needs to fetch additional cipher keys post-resolution, since tenant ids for raw access requests and cross-tenant clear ranges are calculated after resolution.
2023-02-14 13:05:51 -08:00
A.J. Beamon 7284e691fb Fix a few minor restore bugs and add a dry-run mode. Some improvements to the fdbcli output. 2023-02-14 12:28:55 -08:00
Evan Tschannen c0597cc614 the blob worker uses affinity when assigning ranges on startup or after a failure 2023-02-14 11:16:59 -08:00
Jingyu Zhou d28f253182 Save shard_encode_location_metadata knob value for restarting tests
This is needed so that sharded rocks use consistent knob values.
2023-02-14 09:57:08 -08:00
Xiaoxi Wang 53f105eec5 fix anyExisted when beginTenant==endTenant 2023-02-14 09:07:09 -08:00
Ata E Husain Bohra 401b9c8918
EaR: Helper routines to support configurable encryption (#9368)
* EaR: Helper routines to support configurable encryption

Description

Add helper methods to BlobCipherEncryptHeaderRef enabling:
1. Extract 'IV' abstracting out underlying algorithm header
2. Extract 'cipherDetails' abstracting out underlying algorithm header

Testing

BlobCipherUnitTest & EncryptionOps are updated - 100K loop
2023-02-14 08:34:41 -08:00
Jingyu Zhou a4d3035f64 Enable RocksDB restarting tests
Disable sharded rocks storage for downgrade tests. We
need to keep the knob "shard_encode_location_metadata" so that downgrade tests
can pass the second phase.
2023-02-13 20:12:19 -08:00
Evan Tschannen a1576a890c fix formatting 2023-02-13 15:57:16 -08:00
A.J. Beamon f3b58a063f Fix some merge issues and review comments 2023-02-13 15:32:44 -08:00
Evan Tschannen a646d15326 use Redwood as the storage engine for the blob worker cache 2023-02-13 15:28:41 -08:00
Jingyu Zhou 05e73b7836
Merge pull request #9361 from apple/sfc-gh-dadkins/disk-failure-attrition2
Disable machine attrition in DiskFailure workload.
2023-02-13 14:07:24 -08:00
Jon Fu ad210072c0 address code review 2023-02-13 13:29:19 -08:00
A.J. Beamon 958ff862e0 Fix some merge issues 2023-02-13 12:59:48 -08:00
Evan Tschannen 20bc868ee0 merge in main 2023-02-13 12:41:31 -08:00
A.J. Beamon 98407809d9 Merge branch 'main' into metacluster-mgmt-restore
# Conflicts:
#	fdbcli/MetaclusterCommands.actor.cpp
#	fdbclient/Metacluster.cpp
#	fdbclient/include/fdbclient/MetaclusterManagement.actor.h
#	fdbserver/workloads/MetaclusterManagementWorkload.actor.cpp
#	tests/CMakeLists.txt
2023-02-13 12:30:33 -08:00
A.J. Beamon 0127dd4b5a
Merge pull request #9356 from sfc-gh-ajbeamon/metacluster-concurrency-testing
Add metacluster concurrency test and fix various bugs that it found
2023-02-13 11:57:47 -08:00
A.J. Beamon 473dd33a1f Fix get mapped range test assertion to account for the possibility of a range terminating early when it reaches the end of a shard 2023-02-13 11:53:47 -08:00
A.J. Beamon 93b1e04aa9
Merge pull request #9355 from sfc-gh-ajbeamon/fix-assigned-cluster-test-retry-logic
Fix logic in MetaclusterManagementWorkload when retrying a tenant creation with an invalid assigned cluster
2023-02-13 10:43:09 -08:00
Markus Pilman 0172ca9615 Address review comments 2023-02-13 18:27:18 +01:00
Markus Pilman 8a47a484c9
Update fdbserver/storageserver.actor.cpp
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2023-02-13 18:25:03 +01:00
Jingyu Zhou fa7f15e46a
Merge pull request #9353 from sfc-gh-yiwu/redwood_restart
Redwood: fix restart test failure with XOR encoding
2023-02-13 09:24:57 -08:00
Markus Pilman 3b4bd1692a
Remove unused function
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2023-02-13 18:22:25 +01:00
Dan Adkins 5ede2d439c Disable machine attrition in DiskFailure workload.
The machine attrition logic doesn't take into account the possibility
that a disk corruption could cause an unrecoverable failure in the cluster.

Before disabling attrition during the DiskFailure workload, the failure
rate was >10/100,000 in the DiskFailureCycle test. Afterwards, there
were no failures in 100,000 runs.
2023-02-13 08:53:58 -08:00
Markus Pilman 3017e15448 Merge remote-tracking branch 'origin/main' into features/tenant-lock2
# Conflicts:
#	fdbclient/include/fdbclient/TenantManagement.actor.h
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/TenantManagementWorkload.actor.cpp
2023-02-13 11:43:07 +01:00
Steve Atherton 41fa3eada9
Merge branch 'main' into add-redwood-slack-knob 2023-02-12 19:31:20 -08:00
Evan Tschannen add21bbc3c fixed formatting 2023-02-12 10:50:17 -08:00
Evan Tschannen 73691467ee register previous affinity from the blob worker 2023-02-12 10:45:30 -08:00
Evan Tschannen bad8b2fad4 blob workers reboot with a different ID and register in the database their previous ID 2023-02-12 10:44:53 -08:00
A.J. Beamon a261c1d94c Run tenant management concurrency alongside metacluster management concurrency. Fix a few issues where performing tenant operations returned undesirable errors when the associated cluster was removed. 2023-02-11 19:46:47 -08:00
Xiaoxi Wang 93f892c085
Merge pull request #9340 from sfc-gh-xwang/fix/main/tenantList
fix the way verifyListFilter detects tenant state change
2023-02-11 17:20:46 -08:00
A.J. Beamon e6021f8326 Add Jon's metacluster concurrency test and fix various bugs that it found 2023-02-11 15:15:32 -08:00
Xiaoxi Wang 21a2378de5
Merge pull request #9298 from sfc-gh-xwang/feature/main/clearRange
Split raw clear ranges across tenants in required mode
2023-02-11 14:29:46 -08:00
Xiaoxi Wang a9c7632c83 Merge branch 'main' of https://github.com/apple/foundationdb into fix/main/tenantList 2023-02-11 13:54:27 -08:00
Xiaoxi Wang ac1ddc81b0 remove debug trace; change function comment 2023-02-11 13:17:59 -08:00
A.J. Beamon b4f45a0a87 Fix logic in MetaclusterManagementWorkload when retrying a tenant creation with an invalid assigned cluster 2023-02-11 12:09:17 -08:00
A.J. Beamon ee1b48323d
Merge pull request #9346 from sfc-gh-nwijetunga/nim/global-tenant-ids
Support for Two Byte Prefix for Tenant IDs
2023-02-11 11:31:24 -08:00
A.J. Beamon 4579a4319d Merge branch 'main' into storage-quota-in-tenant-metadata-space 2023-02-11 09:04:15 -08:00
Xiaoxi Wang a0f7943fc3 simplify implementation of lowerBoundTenantId and withinSingleTenant 2023-02-10 22:14:59 -08:00
Yi Wu a37d8f757c Redwood: fix restart test failure with xor encoding 2023-02-10 21:01:52 -08:00
Nim Wijetunga 640f1afd77 address pr comments 2023-02-10 16:39:06 -08:00
Nim Wijetunga 9e5c61e127 address pr comments 2023-02-10 15:56:41 -08:00
Nim Wijetunga de9eef72ff address pr comments 2023-02-10 13:49:15 -08:00
Jon Fu 5c68c95a60 fix assertion 2023-02-10 13:12:24 -08:00
Xiaoxi Wang ffc5733e9c add comments 2023-02-10 12:51:13 -08:00
Evan Tschannen 487189738f change reassignment after reboot to be the blob manager's responsibility 2023-02-10 12:47:03 -08:00
Xiaoxi Wang bb8d96c026 Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/clearRange 2023-02-10 12:30:16 -08:00
Xiaoxi Wang ffadea08cb change isSingleTenant check; add unit tests 2023-02-10 12:29:38 -08:00
Jon Fu bf508f5642 adjust tenantconsistency workload to account for lastTenantId on data clusters 2023-02-10 12:25:29 -08:00
A.J. Beamon a6b47c1da4 Fix merge issue 2023-02-10 11:12:36 -08:00