
7112 Commits

Author SHA1 Message Date
Zhe Wu 40dc54223c Add GC generation test, and make all simulation tests pass 2023-03-27 11:46:13 -07:00
Zhe Wu 2da86c37aa Add a knob to guard track tlog recovery 2023-03-27 11:42:27 -07:00
Zhe Wu 78bef8110b Track tlog recovery: tlog side implementation 2023-03-27 11:42:27 -07:00
Jay Zhuang cb389bf026
Merge pull request #9610 from sfc-gh-jazhuang/encrypt_inplace
Add inplace encryption and decryption API
which avoids the memory allocation and memcpy.
2023-03-27 11:21:06 -07:00
Zhe Wu 4a7f7cdfce
Merge pull request #9803 from halfprice/zhewu/exclude-check-existance
Do not update exclude/failed system metadata in excludeServers if the input list is already excluded/failed.
2023-03-27 09:38:04 -07:00
Zhe Wu 6e1bb08677 Update documentation 2023-03-24 15:29:47 -07:00
Zhe Wu 8211b5d097 Add a check in excludeServer function that if the exclusion list already exists, don't need to issue new writes. 2023-03-24 14:57:31 -07:00
Dan Adkins 02f0a44987
Avoid divide-by-zero in isKeyValueInSample. (#9799)
In the pathological case that both key size and value size are 0,
the probability of choosing that key-value pair is 0, and we divide
by zero when computing the sampledSize.

This change adds documentation to that function, which was quite
difficult to understand. In addition, we add `probability` to the
returned values, since one of the callers was backing it out from
sampledSize and itself in danger of dividing by zero.
2023-03-24 16:05:26 -04:00
Xiaoge Su 88eeb5a526 Remove WolfSSL support in FoundationDB 2023-03-23 20:17:18 -07:00
Jay Zhuang dba3555635 fix inplaceEncrypt() unittest issue 2023-03-23 15:26:22 -07:00
Jay Zhuang d9b37e527c Replace EncryptFinal() with CTX_reset() 2023-03-23 15:26:22 -07:00
Jay Zhuang 0efd403e59 Add inplace encryption/decryption API 2023-03-23 15:26:22 -07:00
Jingyu Zhou 18a0fa0263
Merge pull request #9468 from johscheuer/dont-block-exclusion-stateless-processes
Don't block the exclusion of stateless processes by the free capacity check
2023-03-23 08:59:43 -07:00
Johannes M. Scheuermann 694263ae5f Format code and update comment 2023-03-22 16:31:04 +01:00
neethuhaneesha 1e656210e1 Adding rocksdb bloom filter knobs. 2023-03-21 13:10:40 -07:00
neethuhaneesha 06657e1e0e
Rocksdb knob changes. (#9389) 2023-03-21 12:03:43 -07:00
He Liu 81c3cb8c50
Psm checkpoint (#9636)
* Update NativeAPI getCheckpointForRange().

* Implemented checkpoint in SS.

* clean up.

* Disabled StorageServerCheckpointTest.

* Serialized checkpoint creation and deletion.

Simplified checkpoint GC, via deleting CheckpointMetaData::dir.

* Fixed PhysicalShardMove test, where the fetchCheckpoint target range was set incorrectly.

* Minor improvements on CheckpointMetaData and DataMoveMetaData.

* fmt.

* Optimized PhysicalShardMove test

cleanup.

* dismiss operation_obsolete, and throw actor_cancelled.

* fmt.

* Resolved comments.
2023-03-21 09:14:10 -07:00
A.J. Beamon d324afe1bd
Merge pull request #9753 from sfc-gh-ajbeamon/fix-tenant-list-infinite-loop
Do not list renaming tenants twice when listing tenant metadata
2023-03-20 16:05:56 -07:00
Evan Tschannen a258d775c3
Merge pull request #9710 from sfc-gh-etschannen/feature-custom-dd
Added the ability to manually create a shard and also increase its replication factor
2023-03-20 15:35:10 -07:00
Zhongxing Zhang d2c1b3124e
add a field to show average data movement bytes in MovingData trace (#9591)
* add a field to show average data movement bytes in MovingData trace

* change variable type

* Make changes to variable/function naming and add more comments

* move rolling window struct to a new file; deal with corner case: dd startup, elements are full

* format

* simplify codes

* modify file/struct name, universal to moving window

* fix typo

* add a new Knob to control MovingWindow::Deque size

* make the general use of dequeSize limit

* format

* format, use space rather than tab
2023-03-20 14:33:32 -07:00
A.J. Beamon 6becf12ecd Merge branch 'main' into fix-tenant-list-infinite-loop 2023-03-20 14:11:16 -07:00
A.J. Beamon 690a0a81ae Reading a list of tenant metadata ordered by tenant name would occasionally get stuck in an infinite loop if the last tenant in a batch was being renamed and has the same ID as the tenant read in the previous batch. This change removes rename destinations from the list and avoids this problem. 2023-03-20 13:30:27 -07:00
Evan Tschannen 8e4eb83ba7 addressed review comments 2023-03-20 11:41:23 -07:00
Xiaoxi Wang dc1eb1375b add a missing healthy_perpetual_wiggle enum 2023-03-20 09:46:36 -07:00
Xiaoxi Wang ef706e551f Add more details into priority comments. 2023-03-20 09:46:36 -07:00
Xiaoxi Wang e48fd10d8d add perpetual wiggle to .team_tracker field 2023-03-20 09:46:36 -07:00
Xiaoxi Wang f89a483f3d add informal classification of priority 2023-03-20 09:46:36 -07:00
Xiaoxi Wang c73577de7d Add team priority comments and document. 2023-03-20 09:46:36 -07:00
Steve Atherton 216d0be2cf
Add processID, networkAddress, and locality to layer status JSON for Backup Agents. (#9736)
* Add processID, networkAddress, and locality to layer status JSON for Backup Agents.

* Backup/dr agent determines network address to report in Layer Status only once, when the status updater loop begins, since it is a blocking call which connects to the cluster.  And lots of code cleanup.
2023-03-17 18:07:03 -07:00
A.J. Beamon dc2bd78aa7 The consistency check should retry if it couldn't find all the commit proxies when getting key server locations 2023-03-17 12:00:47 -07:00
Evan Tschannen 73767501d4 Merge branch 'main' into feature-custom-dd
# Conflicts:
#	fdbserver/tester.actor.cpp
2023-03-17 10:33:38 -07:00
Ata E Husain Bohra c492f83bf4
EaR: Avoid appending `tls` to the URL (#9734)
Description

Patch proposes two changes:

1. Avoid appending tls as part of URI for secure connections
2. RefreshEKs recurring task can be skipped if there are no keys to be refreshed

Testing

EncryptionOps.toml
EncryptKeyProxyTest.toml
devRunCorrectness 
devRunCorrectnessFiltered 'Encrypt*'
2023-03-16 22:52:51 -07:00
He Liu 0f5e75b34b
Added newDataMoveId(). (#9647)
* Added newDataMoveId().

* Added `ENABLE_DD_PHYSICAL_SHARD_MOVE`

* fmt.

* Replace `teamId` with `shardId`.
2023-03-16 18:06:06 -07:00
A.J. Beamon 484a414117 Increase the buggified tag measurement interval to reduce trace spam 2023-03-16 11:53:45 -07:00
Evan Tschannen ac54962533 code cleanup 2023-03-16 09:47:21 -07:00
A.J. Beamon 6d5ffa11f9 Merge branch 'main' into fix-tenant-id-increment 2023-03-15 17:56:42 -07:00
Josh Slocum b4eb665f1d
fixing copy constructor error and adding test for it (#9711) 2023-03-15 15:33:16 -07:00
A.J. Beamon 3881f1ccc6 More carefully validate tenant increments to avoid overflows 2023-03-15 14:56:12 -07:00
Ata E Husain Bohra dbcab0b1bd
Revert "Refactor GetEncryptCipherKeys (#9600)" (#9708)
This reverts commit 2702665e35.
2023-03-15 12:10:08 -07:00
Evan Tschannen aaf7b9b32b Added the ability to manually create a shard and also increase its replication factor 2023-03-15 11:26:15 -07:00
Markus Pilman aa09baadab
Merge pull request #9635 from sfc-gh-etschannen/fix-consistency-check
Fix: the consistency check did not properly report failed tests
2023-03-15 11:21:44 -07:00
Evan Tschannen 6c1d02a14f
Merge pull request #9703 from sfc-gh-jslocum/bg_file_logical_size
adding blob granule logical size
2023-03-15 09:59:57 -07:00
Evan Tschannen 2f96627d43 merge in main 2023-03-15 09:26:22 -07:00
Evan Tschannen 0a8435b742
Merge pull request #9702 from sfc-gh-jslocum/dbg_bg_ctest_timeout
fixing 2 bugs related to high delta file waitCommitted latency
2023-03-15 08:52:35 -07:00
Johannes M. Scheuermann b317928646 Only consider newly excluded processes 2023-03-15 15:36:15 +01:00
Josh Slocum a5b4212990 adding blob granule logical size 2023-03-15 08:54:49 -05:00
Josh Slocum 52c0dc56cc fixing 2 bugs related to high delta file waitCommitted latency 2023-03-15 08:39:42 -05:00
Evan Tschannen c435e8336a no message 2023-03-14 16:40:50 -07:00
He Liu a0a3f4bff3
Fetch byte sample file (#9657) 2023-03-14 16:24:08 -07:00
Zhe Wang 7d2766b692
Fix KeyRangeRef::isCovered() (#9675)
* fix KeyRangeRef::isCovered()

* reproduce bug

* more unit test added

* fmt
2023-03-14 12:41:18 -07:00
Jingyu Zhou c5e9bdc6e4
Merge pull request #9684 from sfc-gh-ahusain/ahusain-fix-rest-test
Fix RestUtilUnit test
2023-03-14 09:16:39 -07:00
A.J. Beamon d39cda610a Merge branch 'main' into metacluster-improvements
# Conflicts:
#	fdbcli/TenantCommands.actor.cpp
2023-03-13 15:58:39 -07:00
Ata E Husain Bohra aae8b131cb Remove 'printf'
Description

Testing
2023-03-13 15:50:04 -07:00
Ata E Husain Bohra a196f2fd75 Fix RestUtilUnit test
Description

Fix RestUtilUnit test

Testing

RESTUtilUnits.toml
2023-03-13 15:46:15 -07:00
A.J. Beamon 45056370b8 Merge branch 'main' into metacluster-improvements 2023-03-13 13:14:09 -07:00
A.J. Beamon 18cf523f49
Merge pull request #9660 from sfc-gh-ajbeamon/tenant-id-restore-safety
Disallow repopulating a management cluster from a data cluster with matching tenant ID prefix
2023-03-13 13:12:30 -07:00
Ata E Husain Bohra ea796eb3ec
EaR: REST kms misc fixes (#9664)
* EaR: REST kms misc fixes

Description

Patch addresses following issues:
1. Fix "return connection" routine, it fixes a regression introduced by
an earlier fix.
2. Update RESTConnectionPool::connectionPoolMap to an "unordered_map"
for O(1) lookups
3. Improve logging
4. Make RESTUrl parsing handle extra '/' for 'resource'

Testing

Standalone fdbserver connecting to external KMS and database create
2023-03-13 13:11:05 -07:00
A.J. Beamon cbc330697c Disallow repopulating a management cluster from a data cluster with matching tenant ID prefix unless forced. Remember the largest used tenant ID on the data cluster and use it to update the management cluster tenant ID when force repopulating the same ID. 2023-03-10 15:36:37 -08:00
Jingyu Zhou b755e668bf
Merge pull request #9601 from jzhou77/fix-head
Allow log router to detect slow peeks and to switch DC for peeking
2023-03-09 15:34:24 -08:00
Ata E Husain Bohra b227007ab0
EaR: Fix knob name (#9630)
Description

Knob 'REST_KMS_ALLOW_NOT_SECURE_CONNECTION' was renamed in a recent
patch; however, there are other places that need an update too.

Testing

devRunCorrectness - 100K
RESTUtilUnits.toml
RESTKmsConnectorUnits.toml
2023-03-08 17:37:39 -08:00
Nim Wijetunga 2702665e35
Refactor GetEncryptCipherKeys (#9600)
* initial commit

* address pr comments
2023-03-08 17:05:03 -08:00
Evan Tschannen 4a17ed363a Fix: the consistency check did not properly report failed tests 2023-03-08 16:56:23 -08:00
Nim Wijetunga 218ed4519f
Strengthen Snapshot Backup/Restore Asserts (#9552)
strengthen backup/restore asserts for encryption
2023-03-08 15:24:02 -08:00
Ata E Husain Bohra d0eec9d0ba
EaR: REST KMS fixes - encryption integration testing (#9598)
* EaR: REST KMS fixes - encryption integration testing

Description

Major changes:
1. Multiple fixes observed while performing integration end-to-end
testing for Encryption at-rest feature.
2. Improve REST module logging. Introduced FLOW_KNOBS->REST_LOG_LEVEL
to have more granular control of feature logging disconnected from
the cluster log level.

Testing

Integration testbed:
1. Run fdbserver standalone
2. Run external KMS http-server to serve encryption key fetch requests
2023-03-08 09:49:43 -08:00
Hui Liu c43f8b3fdc
Refactor - introduce BlobRestoreController for APIs to manage restore state (#9616) 2023-03-08 07:50:30 -08:00
Johannes M. Scheuermann c6eca3f398 Format code 2023-03-08 08:33:19 +01:00
Johannes M. Scheuermann 1550f3c596 Make use of precomputed exclude check 2023-03-08 08:19:42 +01:00
Johannes M. Scheuermann bae627f016 Fix syntax 2023-03-08 08:19:42 +01:00
Johannes M. Scheuermann db8c60c80f Don't block the exclusion of stateless processes by the free capacity check 2023-03-08 08:19:41 +01:00
A.J. Beamon de5f2c0fee Disallow cluster names that start with the `\xff` byte 2023-03-07 11:46:34 -08:00
Steve Atherton 5ff0bc3f87
Merge pull request #9576 from sfc-gh-satherton/storage-configure-refactor
Storage and log engine configuration support / refactor a few things.
2023-03-07 02:10:14 -08:00
Steve Atherton f6747adf89 Move implementations to cpp file. 2023-03-06 18:43:26 -08:00
Jingyu Zhou 0259a243ae Switch DC if log router peek becomes stuck
Try a different DC if this happens.
2023-03-06 17:41:56 -08:00
Ata E Husain Bohra a45de70003
EaR: RESTClient HTTP compliance, fix json request content type (#9544)
* EaR: RESTClient HTTP compliance, fix json request content type

Description

  diff-1: Address review comments

RESTClient is responsible to handle FDB <-> KMS communication
for Encryption and other usecases. By design, it only supports
"secure connection" i.e. "https"; however, it seems there is a
need to expand the module to support "http" connection,
for instance, in test and dev deployments.

However, given RESTClient gets involved in handling high
sensitive contents such as: plaintext "encryption cipher
from a KMS", the feature is guarded using
CLIENT_KNOB->REST_KMS_ENABLE_NOT_SECURE_CONNECTION which is
settable using FDBServer command line argument
"--kms-rest-enable_not_secure_connection" (boolean)

Testing

Deployed a standalone fdbserver and communicate with a
simple "http" server
2023-03-06 16:06:03 -08:00
Jingyu Zhou 0d8bde9dcd
Merge pull request #9505 from jzhou77/fix
Support multiple key prefix filters for fdbdecode
2023-03-06 15:57:03 -08:00
Chaoguang Lin 7273723a43
Add the hotrange fdbcli command (#9570)
* Add the hotrange fdbcli command

* Remove the unnecessary state

* Add the doc about the hotrange command
2023-03-06 14:46:52 -08:00
Jingyu Zhou 7a0b3c05b9
Merge pull request #9540 from sfc-gh-huliu/timestamp
Report restore phase start time and eta
2023-03-06 14:06:23 -08:00
A.J. Beamon 85c3cf702c
Merge pull request #9584 from sfc-gh-ajbeamon/fix-metacluster-create-error-msg
Fix metacluster create error message
2023-03-06 10:30:03 -08:00
A.J. Beamon ea907f10f5 Print the tenant mode string rather than integer value when reporting that we couldn't create a metacluster 2023-03-06 09:25:50 -08:00
Josh Slocum e1b620135b Merge branch 'main' into bg_latency_fixes 2023-03-06 09:23:11 -06:00
Steve Atherton 50d567b5a5 Refactored some parts of database configuration to support log_engine=<name> and storage_engine=<name> and generate these when converting a DatabaseConfig JSON object to a `configure` command. Refactored `fileconfigure` and simulation setup to use the same JSON -> configure function as the same code was copy/pasted to both places but only one has been kept up to date with new features. Renamed Redwood to `ssd-redwood-1` canonically but the experimental name is still supported for backward compatibility. 2023-03-04 20:52:31 -08:00
Jingyu Zhou df53bcd844 Merge branch 'main' of https://github.com/apple/foundationdb into fix 2023-03-03 20:32:29 -08:00
Hui Liu b2d497a3b2 Report restore phase start timestamp 2023-03-03 18:09:51 -08:00
Jingyu Zhou 8847e70be0
Merge pull request #9306 from kakaiu/add-physical-shard-meta-data-to-checkpoint
Dump checkpoint metadata to sst file
2023-03-03 14:45:50 -08:00
Jingyu Zhou ca00c9485b Merge branch 'main' of https://github.com/apple/foundationdb into fix 2023-03-03 11:12:40 -08:00
Xiaoxi Wang b13b586b71
Merge pull request #9547 from sfc-gh-xwang/feature/main/minReadBand
add knob for min read rebalance shard bandwidth
2023-03-03 09:37:23 -08:00
Jingyu Zhou ee5154f478 Refactor decoder to read file as a whole once
To reduce the number of network requests.
2023-03-03 09:32:12 -08:00
Zhe Wang 1766f412bb address comments 2023-03-03 09:04:26 -08:00
Zhe Wang e8aced0961 add sampled sample bytes to sst file 2023-03-03 09:04:26 -08:00
Zhe Wang 83a0547281 address comments and add test 2023-03-03 09:04:26 -08:00
Zhe Wang 338e0971a9 address comments 2023-03-03 09:04:25 -08:00
Zhe Wang e283e067d3 clean and address comments 2023-03-03 09:04:25 -08:00
Zhe Wang 2e68b44579 dump-checkpoint-meta-data-to-sstfile 2023-03-03 09:04:25 -08:00
Josh Slocum c6a21245d8 also using other GRVs BW already gets for committed version checking 2023-03-02 17:14:17 -06:00
Josh Slocum 57b120e702 Adding grv history so delta files can wait for less time to determine that they're committed 2023-03-02 17:14:17 -06:00
Xiaoxi Wang 26ffcf6b4a add knob for min read rebalance shard bandwidth 2023-03-02 11:26:27 -08:00
Xiaoxi Wang 010d5590e3 Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/hotRangeDetect 2023-03-02 10:07:17 -08:00
Jingyu Zhou de89d2cca1
Merge pull request #9521 from sfc-gh-ajbeamon/fix-metacluster-issues
Fix some metacluster issues
2023-03-02 10:07:11 -08:00
Jingyu Zhou ad778cbe5e Merge branch 'main' of https://github.com/apple/foundationdb into fix 2023-03-02 09:56:30 -08:00
Josh Slocum 0c9397ef22 BW metric improvements for reads and file blocking 2023-03-02 10:57:51 -06:00
Josh Slocum 3861a2249b increasing bw request timeout 2023-03-02 09:36:47 -06:00
Josh Slocum 40ed365303 fixing checkBlobSubRange to not increase version every retry 2023-03-02 09:34:55 -06:00
Xiaoxi Wang 2d78b126f6 rename splitCount to chunkCount 2023-03-01 21:51:51 -08:00
Xiaoxi Wang 179f0ba71c new version of getReadHotRanges 2023-03-01 15:55:29 -08:00
A.J. Beamon 533f83b05e Fix a few more issues in metacluster code and tests:
1. Some additional idempotency problems in metacluster tests
2. An assertion that checked that a rename had expected values could fail during concurrent restores, but it would only happen if the transaction itself would fail to commit
3. Tweak the parameters of the MetaclusterRecovery test to try to avoid rare cases of logging too many trace events
2023-03-01 15:31:36 -08:00
A.J. Beamon 544890a6cd Merge branch 'main' into transaction-debug-logging 2023-03-01 10:09:17 -08:00
A.J. Beamon 2898a95c81 Fix two metacluster issues:
1. When retrying the transaction to register a restoring cluster, don't choose a new ID if the current ID matches the one recorded for the restoring cluster
2. A metacluster test was incorrectly handling the case where a transaction was retried with unknown result and had committed successfully
2023-02-28 15:40:04 -08:00
Junhyun Shim 6b26f5a6da
Fix transaction option consistency in TagThrottleInfo getter (#9513)
* Fix transaction option consistency in TagThrottleInfo getter

Subroutine of getter actor function for throttled and recommended tags
was, upon retry, resetting the transaction object which the caller also uses,
resetting the transaction option and causing a key_outside_legal_range by caller

Also, allowing a subroutine to conditionally, non-trivially modify the passed object
(i.e. transaction reset) is a risky pattern.

Fix: confine subroutine's responsibility to "attempting to" fetch and parse
"autoThrottlingEnabled" key. Let the calling function reset the object if needed.

* Apply Clang format
2023-02-28 23:47:26 +01:00
A.J. Beamon 310fc2ff4e Merge branch 'main' into transaction-debug-logging 2023-02-28 14:18:51 -08:00
Xiaoxi Wang 26237a291d update read range reply field 2023-02-28 13:18:57 -08:00
Jingyu Zhou 6c955080e9
Merge pull request #9207 from sfc-gh-jslocum/disable_feed_coalesce
disabling feed coalesce for now
2023-02-28 12:32:01 -08:00
Jingyu Zhou a350a929b9
Merge pull request #9494 from sfc-gh-jslocum/bg_cp_improvements
addressing review comments and fixmes in bg commit proxy code
2023-02-28 12:30:58 -08:00
Xiaoxi Wang 8cb2a1553a add read ops sampler 2023-02-28 12:03:42 -08:00
Josh Slocum f4308a0f6c Merge branch 'main' into disable_feed_coalesce 2023-02-28 13:57:21 -06:00
A.J. Beamon 87ac857aeb Make debug logging functions pure virtual on the transaction interfaces. Rename the function on TraceEvent to be more generic. 2023-02-28 11:11:06 -08:00
Markus Pilman 5bebb5b4aa
Merge pull request #9492 from sfc-gh-vgasiunas/vgasiunas-api-version-defs
Centralize definition of API Version for Java, Python and C API
2023-02-28 12:04:02 -07:00
Josh Slocum 301f2fd201 disabling feed coalesce for now 2023-02-28 12:07:12 -06:00
Lukas Joswiak 47fc53ed6e Adds more detailed mutation logging to commit proxy
The commit proxy writes a `ProxyMetrics` trace every 5 seconds. This
event contains a lot of useful information, such as the number of commit
batches that arrived and exited, the number of mutations processed, the
number of bytes those mutations made up, etc. However, it is difficult
to tell what the workload pattern looks like within these 5 second
intervals when the metrics are being calculated.

This PR adds a new trace, `ProxyDetailedMetrics`, which logs itself
every 100ms. It currently only writes the number of mutations and the
number of mutation bytes that arrived during the 100ms time period. But
it should be easy to add more metrics in the future.

It's possible this increased logging could cause issues. Based off a
simulation run of the `WriteDuringRead` test, I got the following
results:

```
$ rg ProxyDetailedMetrics trace.json | wc -l
    6877
$ rg "Roles\": \".*CP.*\"" trace.json | wc -l
   11402
$ wc -l trace.json
   96147 trace.json
```

So on processes running as a commit proxy, this approximately doubled
the number of lines logged. But relative to the cluster overall, it only
added about 5% overhead.

If we want to reduce this number, one possibility would be to not write
a trace if all the values being written are 0. I'm not sure if this
would help much in production, but in simulation the large majority of
the traces (99%+) consist of zero values.
2023-02-28 09:48:39 -08:00
A.J. Beamon cb66a76d80 Merge branch 'main' into transaction-debug-logging 2023-02-28 09:21:30 -08:00
Jingyu Zhou 4ea70b1f59
Merge pull request #9512 from sfc-gh-mpilman/bugfixes/remove-lockid-from-txnstatestore
Don't store lockid in txnStateStore
2023-02-28 09:16:45 -08:00
A.J. Beamon 0abb33a9a5 Add the ability to print messages or log trace events based on a transaction's result 2023-02-28 09:06:54 -08:00
Ata E Husain Bohra 2db1da26d9
EaR: Update ApiWorkload to validate encryption at-rest guarantees (#9466)
* EaR: Update ApiWorkload to validate encryption at-rest guarantees

Description

FDB encryption data at-rest guarantees if cluster is configured with feature
enabled, all data written to persistent disks shall be "encrypted". Given FDB
maintains multiple persistent storages during lifecycle of the data, the patch
proposes a scheme to validate the invariant via "simulation testing"

Patch proposes updating ApiCorrectness workload to do the following:
1. Client-supplied params and/or random selection enable the validation feature.
2. Validation when enabled, allows injecting a known "marker string"
to workload generated Key and Value data patterns.
3. On shutdown, if the validation is enabled, all test files are
scanned for the known "marker" pattern.

Simulation tests are already capable of doing the following:
1. Randomly select TenantMode (disabled/optional/required)
2. Randomly select EncryptionAtRestMode (cluster_aware/domain_aware)

Hence, the updated tests validate all possible combinations. Also,
'defaultTenant' is present to cover 'domain_aware' encryption use cases.

Testing
devRunCorrectness
devRetryCorrectness - ApiCorrectness & EncryptedBackupCorrectness
2023-02-27 21:40:46 -08:00
Markus Pilman 871a9676ea Don't store lock id in txnStateStore 2023-02-27 21:25:42 -07:00
Markus Pilman 20874d8575
Merge pull request #9502 from sfc-gh-ajbeamon/metacluster-tenant-lock-support
Metacluster tenant lock support
2023-02-27 21:19:03 -07:00
Jingyu Zhou 5ac526a3e5
Merge pull request #9474 from sfc-gh-xwang/feature/main/readaware
enable read-aware DD by default and write release notes/doc
2023-02-27 20:04:04 -08:00
Jingyu Zhou 1313a7fa25 Use KeyspaceSnapshotFile to filter range files 2023-02-27 19:41:08 -08:00
Jingyu Zhou 842d485862
Merge pull request #9402 from yao-xiao-github/main
Add shard consistency validation.
2023-02-27 17:05:30 -08:00
Jingyu Zhou f414cd0ed8
Merge pull request #9486 from sfc-gh-ajbeamon/metacluster-management-concurrency-restore-support
Add restore to the metacluster management concurrency workload
2023-02-27 17:05:05 -08:00
A.J. Beamon 469e77158f Add metacluster support for tenant locking 2023-02-27 16:53:13 -08:00
Jingyu Zhou 29a406948a
Merge pull request #9370 from sfc-gh-mpilman/features/tenant-lock-fdbcli
fdbcli commands for tenant lock
2023-02-27 16:18:51 -08:00
Xiaoxi Wang eeade33c30 Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/readaware 2023-02-27 14:18:44 -08:00
Russell Sears bcc05b1058 Improve support for prebuilt boost 2023-02-27 15:38:58 -06:00
Zhe Wu 7304e5cad0
Merge pull request #9485 from halfprice/zhewu/backup-size-estimate
Enhance fdbbackup query command to estimate data processing from a specific snapshot to a target version
2023-02-27 13:35:54 -08:00
Vishesh Yadav dd0ea8b0cf Clang-format 2023-02-27 13:10:19 -08:00
Vishesh Yadav 3e6e31ad0b Use the RangeMapFilters 2023-02-27 13:08:55 -08:00
Jingyu Zhou dd4bc82862 Refactor code 2023-02-27 13:06:01 -08:00
Jingyu Zhou 46fce2710e Use RangeMap for backup agent filtering
This is more efficient than going through ranges one by one.
2023-02-27 12:21:52 -08:00
A.J. Beamon a44c4c2e2e
Merge pull request #9478 from sfc-gh-ajbeamon/assert-comparison-all-types
Allow performing assert comparisons with any traceable type
2023-02-27 11:03:27 -08:00
Xiaoxi Wang da7d441436 Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/readaware 2023-02-27 09:09:35 -08:00
Josh Slocum 716a9c3817 addressing review comments and fixmes in bg commit proxy code 2023-02-27 10:51:47 -06:00
Ankita Kejriwal f7108958bf
Merge pull request #9449 from sfc-gh-akejriwal/exclusion
Improve space estimation in checkExclusion()
2023-02-27 08:31:52 -08:00
Markus Pilman a0e347c7ba Merge remote-tracking branch 'origin/main' into features/tenant-lock-fdbcli 2023-02-27 09:09:20 -07:00
Vaidas Gasiunas ba726fac87 Replace hardcoded API version checks for 720 and 730 2023-02-27 16:18:01 +01:00
Steve Atherton 674c105050
Merge pull request #9473 from sfc-gh-etschannen/feature-change-feed-lock
Replace fetchKeysParallelismFullLock to speed up fetch keys in idle clusters
2023-02-26 23:18:43 -08:00
Zhe Wu ffa3467098 Explicitly using min and max restorable version from backup description in query command instead of going through snapshots 2023-02-26 12:17:07 -08:00
A.J. Beamon 364bf062cb A few fixes to prevent a simultaneous restore and removal from both making progress:
1. Change the cluster ID each time the restore registers the cluster
2. Handle commit unknown result in the removal purge
3. Delete the active restore ID when a removal is first recorded rather than at the end of the removal
4. Delete any existing active restore IDs when registering a cluster in the management cluster
2023-02-25 22:52:02 -08:00
A.J. Beamon e6a4b0489e Run a couple restore transactions using the correct runRestoreManagamentTransaction function in order to verify that the restore is still valid 2023-02-25 19:59:59 -08:00
A.J. Beamon 8e231f8809 During a dry-run restore, it is possible that the tenants being restored are modified concurrently. Handle this case with an output message. 2023-02-25 19:59:29 -08:00
A.J. Beamon b869f5a6ac Don't validate the configuration sequence number of a tenant during tenant reconciliation until we've started a transaction and confirmed the restore is still valid 2023-02-25 19:58:41 -08:00
A.J. Beamon b1111ce112 When renaming a tenant during a restore, use the rename destination for the tenant if it has one 2023-02-25 19:57:59 -08:00
A.J. Beamon 3aca47e600 When reconciling tenants during a restore, if a tenant is in the removing state on the management cluster, remove it from the data cluster 2023-02-25 19:54:30 -08:00
A.J. Beamon 040d44927b Store the rename destination for tenant movements in the cluster tenant index with an ID of -1. Use this to filter out tenant aliases when modifying the tenant count during a tenant purge. 2023-02-25 19:51:21 -08:00
Zhe Wu 8a88df0e91 Query backup size from a specific snapshot 2023-02-25 17:38:27 -08:00
Zhe Wu a94dd3a430 Fix fdbbackup query returning earliest version 2023-02-25 16:44:45 -08:00
A.J. Beamon 8c3ee768a2 Add an option to allow exceeding the tenant group capacity limit when changing tenant configuration 2023-02-24 21:01:36 -08:00
A.J. Beamon 1c71056e26
Merge pull request #9479 from sfc-gh-nwijetunga/nim/enforce-metacluster-tenant-mode
Enforce Disabled Tenant Mode in Metacluster
2023-02-24 19:27:57 -08:00
Ankita Kejriwal 99a1fb52e3 Prevent division by 0 2023-02-24 18:36:55 -08:00
Jingyu Zhou 6b121de6a6
Merge pull request #9464 from jzhou77/fix
Add exclude to fdbcli's configure command
2023-02-24 16:31:02 -08:00
Nim Wijetunga c1087187d1 format change 2023-02-24 15:43:40 -08:00
Nim Wijetunga c8b7cff10c fix api test 2023-02-24 15:26:59 -08:00
Josh Slocum 6187811f71
Reworking getBlobGranuleRanges to also use commit proxy rpc for authz, and adding test (#9470) 2023-02-24 17:15:32 -06:00
Nim Wijetunga eca98afcb0 metacluster check tenant mode 2023-02-24 13:59:54 -08:00
Evan Tschannen 8872e5a462
Merge pull request #9347 from sfc-gh-etschannen/feature-change-feed-cache
added a disk to blob workers
2023-02-24 13:59:03 -08:00
A.J. Beamon 4a38bb4c3f Allow performing assert comparisons (e.g. ASSERT_EQ) with any traceable type 2023-02-24 12:53:01 -08:00
Xiaoxi Wang 998a5b7c0e enable read-aware DD by default and write release notes/doc 2023-02-24 11:11:25 -08:00
Evan Tschannen f3673d808b Replaced the fetchKeysParallelismFullLock with a lock specifically for change feeds to avoid blocking fetches on idle clusters 2023-02-24 10:59:35 -08:00
Jingyu Zhou 9a257a60a4 Address review comments 2023-02-24 10:47:32 -08:00
Markus Pilman 6c15506c36 Fixed tests 2023-02-24 11:32:37 -07:00
A.J. Beamon 03fbc59bb1
Merge pull request #9461 from sfc-gh-ajbeamon/metacluster-concurrent-restore-testing
Metacluster concurrent restore testing
2023-02-24 09:13:51 -08:00
Markus Pilman ee9d511d16
Merge pull request #9463 from sfc-gh-mpilman/buildcop/2023-02-23/bugfixes/arm-awssdk
Fix build issue with awssdk_target
2023-02-24 09:20:08 -07:00
Nim Wijetunga 29819b0645
Change Feed Bug Fix + Encryption Asserts (#9457)
* add encryption asserts

* modify function name

* address pr comments

* address pr comments

* Trigger Build
2023-02-23 19:33:25 -08:00
A.J. Beamon 2b25cfef8b Merge branch 'main' into metacluster-concurrent-restore-testing 2023-02-23 16:06:47 -08:00
Jon Fu 33f8e90f9f
Split tenant group metadata (#9446)
* initial commit to split tenant group metadata

* attempt to fix merge errors

* fix compile errors and adjust existing tests

* fix infinite loop and extra ACTOR tag

* direct assignment instead of store

* direct assign instead of store (missed a few)
2023-02-23 18:11:49 -05:00
Jingyu Zhou 1f1dc5e768 Allow a comma separated list of excluded addresses 2023-02-23 14:29:08 -08:00
Jingyu Zhou 6ac8720364 Add exclude to fdbcli's configure command
Right now this only allows one server address being excluded. This is useful
when the database is unavailable but we want the recruitment to skip some
particular processes.

Manually tested the concept works with a loopback cluster.
2023-02-23 14:28:20 -08:00
Markus Pilman c1f80fe471 Merge remote-tracking branch 'origin/main' into features/tenant-lock-fdbcli 2023-02-23 15:16:14 -07:00
Jingyu Zhou 792950dbdc
Merge pull request #9434 from sfc-gh-huliu/splitmetrics
Implement SplitMetric pagination in blob migrator
2023-02-23 14:10:27 -08:00
Markus Pilman 1862e65415 Fix build issue with awssdk_target 2023-02-23 15:05:17 -07:00
Markus Pilman 8759fd8f12 Fix refactoring mistake 2023-02-23 14:41:27 -07:00
A.J. Beamon 54955d54f2 Don't allow repopulating from a management cluster if there is another ID registered for the same cluster. Instead, the cluster must be unregistered first before repopulating from it. Also improves a trace event. 2023-02-23 13:28:10 -08:00
A.J. Beamon c2d28377af Set the restore ID in the data cluster after marking the cluster restoring in the management cluster 2023-02-23 13:28:10 -08:00
A.J. Beamon 6adccdafa9 Add a conflict range on the active restore ID when setting it 2023-02-23 13:28:10 -08:00
A.J. Beamon 537834ef00 Properly initialize API version of simulated MVC clusters when calling openDatabase 2023-02-23 13:28:10 -08:00
A.J. Beamon 06fe00544a Remove TenantMapEntry <-> MetaclusterTenantMapEntry conversion constructors and use named functions instead 2023-02-23 13:28:10 -08:00
Markus Pilman 193e517cc4 Address review comments and move lock ID into TenantMapEntry 2023-02-23 14:25:36 -07:00
Markus Pilman efc5bf9ee8
Merge pull request #9456 from sfc-gh-ajbeamon/smaller-tenant-in-txn-state-store
Store a smaller tenant object in the txn state store
2023-02-23 14:00:12 -07:00
Evan Tschannen cf3a4e6161 Merge branch 'main' into feature-change-feed-cache 2023-02-23 10:16:13 -08:00
Jingyu Zhou 3d8b8a2a05
Merge pull request #9450 from sfc-gh-ahusain/ahusain-misc-fixes
EaR: RESTClient and EKP changes to handle unreachable external KMS
2023-02-23 10:04:12 -08:00
Evan Tschannen a581a55452 ensure a worker cannot run multiple blob worker roles 2023-02-23 09:51:26 -08:00
A.J. Beamon dd650215d4 Store a smaller tenant object in the txn state store 2023-02-23 09:29:33 -08:00
Vaidas Gasiunas 402f618180
Default transaction options for report_conflicting_keys and used_during_commit_protection_disable (#9441)
* Introducing default transaction options for report_conflicting_keys and used_during_commit_protection_disable, set the latter option always in Java bindings

* Reformatting TransactionIntegrationTest.java

* Update description of transaction_report_conflicting_keys option

* Remove dependency between mock and real database implementation in RangeQueryTest.java

* Update generated.go after changing description of an option

* Small improvements of the TransactionIntegrationTest code
2023-02-23 18:05:01 +01:00
Ata E Husain Bohra 7d079690d4 Merge branch 'main' into ahusain-misc-fixes 2023-02-22 18:11:11 -08:00
Ata E Husain Bohra 1f7ee9437f EaR: RESTClient and EKP changes to handle unreachable external KMS
Description

Two major changes are proposed:

I)
The following setup was used for testing:
1. Run `fdbserver` locally.
2. Run a mock Python-based HTTP server (encryption endpoints not implemented).

The expectation was that the RESTClient code would loop, trying to establish a
connection to the desired encryption endpoint. Instead, the code looped for one
cycle, and the follow-up cycle hit a SEGV while printing a log using the RESTUrl
object, which is obtained as a 'pointer' from the caller. Update the code to use
a RESTUrl object instead of the pointer.

II) In the above setup, KMSConnector would throw an 'encrypt_key_fetch_failed'
error, which wasn't handled by EKProxy, causing the service to terminate. Add
code to re-throw the error to the caller.

Testing
2023-02-22 17:15:34 -08:00
Ankita Kejriwal 64ac92bd4b Improve comments as per review 2023-02-22 17:13:44 -08:00
A.J. Beamon 9b906d9b3d
Merge pull request #9447 from sfc-gh-ajbeamon/metacluster-restore-fixes
Metacluster restore fixes
2023-02-22 17:07:19 -08:00
Hui Liu 0fba65a3cd Implement SplitMetric pagination in blob migrator 2023-02-22 16:00:49 -08:00
Ankita Kejriwal 8aafbfe6cc Improve space estimation in checkExclusion() 2023-02-22 15:58:25 -08:00
A.J. Beamon 87cb21be06
Merge pull request #9310 from sfc-gh-mpilman/features/tenant-lock2
Tenant Lock
2023-02-22 15:18:14 -08:00
A.J. Beamon 33431f062d Add some trace events, use a more appropriate error, and improve a check of allocated tenant groups 2023-02-22 14:39:51 -08:00
A.J. Beamon 91df95e816 If registering a cluster fails on the data cluster, try to rollback the registration on the management cluster 2023-02-22 12:30:50 -08:00
A.J. Beamon 587c47832c Swap the register and remove cluster operations (no change to the code) 2023-02-22 12:30:50 -08:00
A.J. Beamon 6dad4c5c60 Improve trace event 2023-02-22 12:30:50 -08:00
A.J. Beamon c4ef32c77a Check for registration tombstones when registering a cluster during restores 2023-02-22 12:30:50 -08:00
A.J. Beamon 92011b9339 When erasing tenants during a force removal, tenants being renamed were double counted against the tenant count 2023-02-22 12:30:50 -08:00
A.J. Beamon 1b9d4a8d5a Registering a restoring data cluster didn't detect cases where it should fail, such as if the cluster already existed 2023-02-22 12:30:50 -08:00
A.J. Beamon 7da247cde2 Prevent operations on clusters that are restoring 2023-02-22 12:29:32 -08:00
A.J. Beamon ec79ecce73 Add a boolean parameter for ForceRemove; rename ForceJoinNewMetacluster to ForceJoin 2023-02-22 12:26:24 -08:00
Markus Pilman 8695fc15fc Merge remote-tracking branch 'origin/main' into features/tenant-lock2 2023-02-22 13:12:23 -07:00
A.J. Beamon 006a2ead6f Merge branch 'main' into check-metacluster-restore-dryrun 2023-02-22 11:16:45 -08:00
A.J. Beamon 18ae6bda12 Fix formatting issue 2023-02-22 11:12:42 -08:00
Josh Slocum 9cd0f32e87
Fixing several metrics issues in blob workers (#9426)
* fixing int vs int64 data type, and fixing cause of incorrect request counter

* fixing incorrect count of mutation bytes buffered on granule cancel
2023-02-22 11:08:12 -06:00
Steve Atherton 23df46773d
Merge pull request #9422 from sfc-gh-satherton/client-read-options
Add transaction option definitions for read priority and read cache
2023-02-22 09:00:25 -08:00
Josh Slocum bf97c3dbce
adding java tenant blob management test and fixing bug it found (#9428) 2023-02-22 10:52:26 -06:00
A.J. Beamon d18fffd251 Fix one item that the metacluster restore dry-run testing turned up as changing 2023-02-22 08:41:16 -08:00
A.J. Beamon cb57541c98 Add testing that the metacluster restore dry-run mode doesn't change anything 2023-02-22 08:41:16 -08:00
Steve Atherton a21b2fe9f9 Simplified read priority option to three separate options for normal/low/high priority. 2023-02-21 22:48:38 -08:00
Steve Atherton 5969616af8 Merge commit '6de85e7cd8e9dd74a571de9e04679e669bcbb5b6' into client-read-options 2023-02-21 20:46:20 -08:00
Jon Fu ff7174065f Merge branch 'main' of github.com:apple/foundationdb into split-tenant-metadata 2023-02-21 14:16:13 -08:00
Steve Atherton bb4fb3d81d
Merge pull request #9419 from sfc-gh-satherton/page-rebuild-fix
Optimize/fix node rebuild vs update trigger in Redwood
2023-02-21 13:49:14 -08:00
Markus Pilman 15d8548c0e Merge remote-tracking branch 'origin/main' into features/tenant-lock2
# Conflicts:
#	fdbserver/ApplyMetadataMutation.cpp
#	fdbserver/storageserver.actor.cpp
2023-02-21 13:39:35 -07:00
Evan Tschannen e383cdfddd fixed merge conflict 2023-02-21 12:18:07 -08:00
Evan Tschannen 8129381689 merge in main 2023-02-21 12:06:35 -08:00
Jon Fu 37fa579587 fix compile issues 2023-02-21 11:46:11 -08:00
Ata E Husain Bohra 4652eaf85d
EaR: Reduce logging level for RESTClient (#9429)
Description

Reduce the logging level to SevDebug for RESTClient operation

Testing

compiles
2023-02-21 11:43:28 -08:00
Jon Fu 2d74d10a91 use MetaclusterAPI namespace over TenantAPI for TenantState 2023-02-21 11:33:36 -08:00
Josh Slocum 958f3b531b
Plumbing blob worker mapping through commit proxy like storage server (#9401)
* Plumbing blob worker mapping through commit proxy like storage server mapping

* review comments

* formatting
2023-02-21 13:21:44 -06:00
Jon Fu da688f9b77 update documentation 2023-02-21 11:13:35 -08:00
Jingyu Zhou 81f8c360db
Merge pull request #8811 from sfc-gh-tclinkenbeard/expose-tag-throttled-duration 2023-02-21 10:47:35 -08:00
Jon Fu 9e01cffef0 fix some merge conflicts and address review comments 2023-02-21 10:29:36 -08:00
Jon Fu 428eb07766 Merge branch 'main' of github.com:apple/foundationdb into split-tenant-metadata 2023-02-21 10:11:35 -08:00
Junhyun Shim 2497aa5701
Clamp GetKeyServerLocations result to tenant prefix (#9424) 2023-02-21 18:55:24 +01:00
Steve Atherton 246fd1dd4e Remove auto cache option since there is no meaningful implementation of this yet. Change places using trState in a native Transaction to set cache mode or Low/Normal/High priority to use the new transaction options instead. 2023-02-21 02:50:30 -08:00
Steve Atherton 9bf28899d4 Add transaction option definitions for read priority and read server side cache mode. 2023-02-21 00:54:35 -08:00
Ata E Husain Bohra fa60f1b4fa
RESTClient: Initialize RESTClient connection pool instance (#9414)
Description

Patch fixes an issue where new connection for a corresponding
'connectKey' isn't getting added to the connectionPoolMap.

Testing

Standalone fdbserver triggering RESTClient connection path
2023-02-20 19:32:10 -08:00
sfc-gh-tclinkenbeard 398079db3a Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2023-02-20 17:54:06 -08:00
Evan Tschannen 4f9e86b0a4 fixed two bugs that prevented the blob manager from properly loading worker affinity 2023-02-20 16:47:26 -08:00
Josh Slocum bfb3ffc509
added c and java apis for granule flush (#9412) 2023-02-20 10:28:11 -06:00
Steve Atherton e169e65021 Fix to BTree node rebuild logic - rebuild when imbalance hits a limit controlled by a new knob. 2023-02-19 16:40:28 -08:00
Yi Wu eac757d186
EaR: cleanup encryption knobs (#9386)
Changes:
* Cleanup all encryption knobs
* Update simulated cluster to randomly enable encryption with higher probability
2023-02-18 13:18:20 -08:00
A.J. Beamon c2ca21cdb4
Merge pull request #9417 from sfc-gh-ajbeamon/fix-restore-id-logic
Fixes to metacluster restore ID
2023-02-18 06:35:13 -08:00
A.J. Beamon 3163201201 Restore ID fixes: we weren't generating a restore ID; we weren't setting the restore ID on the management cluster in some restore modes; it is possible in some test scenarios to encounter a restore conflict, in which case we need to retry. 2023-02-17 21:20:06 -08:00
Ankita Kejriwal b74a35a986 Enable STORAGE_QUOTA_ENABLED knob by default 2023-02-17 21:14:46 -08:00
sfc-gh-tclinkenbeard 1aef6cb5f7 Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration 2023-02-17 20:41:59 -08:00
Jon Fu 762cbcdc5d unconditionally set restore id 2023-02-17 14:41:41 -08:00
Jon Fu edb7a51b7e Revert "let client supply restore id"
This reverts commit 5fe32b8503.
2023-02-17 14:37:22 -08:00
Jon Fu 5fe32b8503 let client supply restore id 2023-02-17 14:01:58 -08:00
Hui Liu bdba85a86f
Merge pull request #9303 from sfc-gh-huliu/logtruncation
Truncate mutation logs after flushing blob granules
2023-02-17 12:52:45 -08:00
Jon Fu 0d7b6d626b update restoreCluster test to account for conflicting_restore 2023-02-17 10:36:28 -08:00
Josh Slocum 6c2fb13173
adding wait parameter to blobbify api (#9360)
* adding wait parameter to blobbify api

* formatting

* fixing comment style

* fixing bug and adding debugging

* adding blob ranges unit test

* testing both blobbify cases in cancel

* formatting

* switch to explicit blocking api instead of boolean flag

* remove comments

* format
2023-02-17 12:20:53 -06:00
Hui Liu aa1d983132 Truncate logs after force-flushing cold blob granules 2023-02-17 10:17:04 -08:00