Zhe Wu
40dc54223c
Add GC generation test, and make all simulation tests pass
2023-03-27 11:46:13 -07:00
Zhe Wu
2da86c37aa
Add a knob to guard tlog recovery tracking
2023-03-27 11:42:27 -07:00
Zhe Wu
78bef8110b
Track tlog recovery: tlog side implementation
2023-03-27 11:42:27 -07:00
Jay Zhuang
cb389bf026
Merge pull request #9610 from sfc-gh-jazhuang/encrypt_inplace
...
Add inplace encryption and decryption API
which avoids the memory allocation and memcpy.
2023-03-27 11:21:06 -07:00
Zhe Wu
4a7f7cdfce
Merge pull request #9803 from halfprice/zhewu/exclude-check-existance
...
Do not update exclude/failed system metadata in excludeServers if the input list is already excluded/failed.
2023-03-27 09:38:04 -07:00
Zhe Wu
6e1bb08677
Update documentation
2023-03-24 15:29:47 -07:00
Zhe Wu
8211b5d097
Add a check in the excludeServer function: if the exclusion list already exists, there is no need to issue new writes.
2023-03-24 14:57:31 -07:00
Dan Adkins
02f0a44987
Avoid divide-by-zero in isKeyValueInSample. ( #9799 )
...
In the pathological case that both key size and value size are 0,
the probability of choosing that key-value pair is 0, and we divide
by zero when computing the sampledSize.
This change adds documentation to that function, which was quite
difficult to understand. In addition, we add `probability` to the
returned values, since one of the callers was backing it out from
sampledSize and itself in danger of dividing by zero.
2023-03-24 16:05:26 -04:00
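The zero-size case described in the commit above can be illustrated with a minimal sketch. This is not the actual `isKeyValueInSample` code: the struct, the function name, the `unitsPerSample` constant, and the probability formula are all assumptions chosen only to show the guard. The idea is to return the probability alongside the sampled size, and to return early when the probability is zero so that nothing divides by it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical result type; field names are illustrative, not FoundationDB's.
struct SampleInfoSketch {
    bool inSample;        // whether this key-value pair was chosen
    double probability;   // probability with which it is chosen
    int64_t sampledSize;  // size scaled up by 1/probability, or 0 if never chosen
};

// `unitsPerSample` is an assumed tuning constant (> 0) and `draw` is assumed to be
// a uniform random number in [0, 1) supplied by the caller.
SampleInfoSketch sampleInfoSketch(int64_t keySize, int64_t valueSize, double unitsPerSample, double draw) {
    assert(unitsPerSample > 0.0);
    const int64_t size = keySize + valueSize;
    const double probability = std::min(1.0, static_cast<double>(size) / unitsPerSample);
    if (probability <= 0.0) {
        // Pathological case: key and value are both empty. The pair is never
        // sampled, and we must not divide by the zero probability.
        return SampleInfoSketch{ false, 0.0, 0 };
    }
    const bool inSample = draw < probability;
    const int64_t sampledSize = static_cast<int64_t>(size / probability);
    return SampleInfoSketch{ inSample, probability, sampledSize };
}
```

Because the probability is part of the result, a caller in this sketch never has to recover it by dividing `size` by `sampledSize`, which is the second place a divide-by-zero could otherwise occur.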
Xiaoge Su
88eeb5a526
Remove WolfSSL support in FoundationDB
2023-03-23 20:17:18 -07:00
Jay Zhuang
dba3555635
fix inplaceEncrypt() unittest issue
2023-03-23 15:26:22 -07:00
Jay Zhuang
d9b37e527c
Replace EncryptFinal() with CTX_reset()
2023-03-23 15:26:22 -07:00
Jay Zhuang
0efd403e59
Add inplace encryption/decryption API
2023-03-23 15:26:22 -07:00
Jingyu Zhou
18a0fa0263
Merge pull request #9468 from johscheuer/dont-block-exclusion-stateless-processes
...
Don't block the exclusion of stateless processes by the free capacity check
2023-03-23 08:59:43 -07:00
Johannes M. Scheuermann
694263ae5f
Format code and update comment
2023-03-22 16:31:04 +01:00
neethuhaneesha
1e656210e1
Adding rocksdb bloom filter knobs.
2023-03-21 13:10:40 -07:00
neethuhaneesha
06657e1e0e
Rocksdb knob changes. ( #9389 )
2023-03-21 12:03:43 -07:00
He Liu
81c3cb8c50
Psm checkpoint ( #9636 )
...
* Update NativeAPI getCheckpointForRange().
* Implemented checkpoint in SS.
* clean up.
* Disabled StorageServerCheckpointTest.
* Serialized checkpoint creation and deletion.
Simplified checkpoint GC, via deleting CheckpointMetaData::dir.
* Fixed PhysicalShardMove test, where the fetchCheckpoint target range was set incorrectly.
* Minor improvements on CheckpointMetaData and DataMoveMetaData.
* fmt.
* Optimized PhysicalShardMove test
cleanup.
* dismiss operation_obsolete, and throw actor_cancelled.
* fmt.
* Resolved comments.
2023-03-21 09:14:10 -07:00
A.J. Beamon
d324afe1bd
Merge pull request #9753 from sfc-gh-ajbeamon/fix-tenant-list-infinite-loop
...
Do not list renaming tenants twice when listing tenant metadata
2023-03-20 16:05:56 -07:00
Evan Tschannen
a258d775c3
Merge pull request #9710 from sfc-gh-etschannen/feature-custom-dd
...
Added the ability to manually create a shard and also increase its replication factor
2023-03-20 15:35:10 -07:00
Zhongxing Zhang
d2c1b3124e
add a field to show average data movement bytes in MovingData trace ( #9591 )
...
* add a field to show average data movement bytes in MovingData trace
* change variable type
* Make changes to variable/function naming and add more comments
* move rolling window struct to a new file; deal with corner case: dd startup, elements are full
* format
* simplify codes
* modify file/struct name, universal to moving window
* fix typo
* add a new Knob to control MovingWindow::Deque size
* make the general use of dequeSize limit
* format
* format, use space rather than tab
2023-03-20 14:33:32 -07:00
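A rolling window with a bounded deque, as the commit above describes, might look roughly like the following sketch. The class name, its members, and the eviction policy are assumptions made for illustration; the real `MovingWindow` struct and its knob may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <utility>

// Hypothetical moving-window average over (timestamp, bytes) samples.
class MovingWindowSketch {
public:
    // `maxSamples` stands in for a knob that caps the deque size.
    MovingWindowSketch(double windowSeconds, std::size_t maxSamples)
      : windowSeconds_(windowSeconds), maxSamples_(maxSamples), total_(0) {
        assert(windowSeconds > 0.0);
    }

    void addSample(double now, int64_t bytes) {
        samples_.emplace_back(now, bytes);
        total_ += bytes;
        evict(now);
    }

    // Average bytes per second over the window ending at `now`.
    double averagePerSecond(double now) {
        evict(now);
        return static_cast<double>(total_) / windowSeconds_;
    }

    std::size_t size() const { return samples_.size(); }

private:
    // Drop the oldest samples while the deque is over its cap or the oldest
    // sample has fallen out of the time window.
    void evict(double now) {
        while (!samples_.empty() &&
               (samples_.size() > maxSamples_ || samples_.front().first < now - windowSeconds_)) {
            total_ -= samples_.front().second;
            samples_.pop_front();
        }
    }

    double windowSeconds_;
    std::size_t maxSamples_;
    std::deque<std::pair<double, int64_t>> samples_;
    int64_t total_;
};
```

Keeping a running total means each update costs amortized constant time, and the size cap bounds memory even if samples arrive much faster than they age out.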
A.J. Beamon
6becf12ecd
Merge branch 'main' into fix-tenant-list-infinite-loop
2023-03-20 14:11:16 -07:00
A.J. Beamon
690a0a81ae
Reading a list of tenant metadata ordered by tenant name would occasionally get stuck in an infinite loop if the last tenant in a batch was being renamed and has the same ID as the tenant read in the previous batch. This change removes rename destinations from the list and avoids this problem.
2023-03-20 13:30:27 -07:00
Evan Tschannen
8e4eb83ba7
addressed review comments
2023-03-20 11:41:23 -07:00
Xiaoxi Wang
dc1eb1375b
add a missing healthy_perpetual_wiggle enum
2023-03-20 09:46:36 -07:00
Xiaoxi Wang
ef706e551f
Add more details into priority comments.
2023-03-20 09:46:36 -07:00
Xiaoxi Wang
e48fd10d8d
add perpetual wiggle to .team_tracker field
2023-03-20 09:46:36 -07:00
Xiaoxi Wang
f89a483f3d
add informal classification of priority
2023-03-20 09:46:36 -07:00
Xiaoxi Wang
c73577de7d
Add team priority comments and document.
2023-03-20 09:46:36 -07:00
Steve Atherton
216d0be2cf
Add processID, networkAddress, and locality to layer status JSON for Backup Agents. ( #9736 )
...
* Add processID, networkAddress, and locality to layer status JSON for Backup Agents.
* Backup/dr agent determines network address to report in Layer Status only once, when the status updater loop begins, since it is a blocking call which connects to the cluster. And lots of code cleanup.
2023-03-17 18:07:03 -07:00
A.J. Beamon
dc2bd78aa7
The consistency check should retry if it couldn't find all the commit proxies when getting key server locations
2023-03-17 12:00:47 -07:00
Evan Tschannen
73767501d4
Merge branch 'main' into feature-custom-dd
...
# Conflicts:
# fdbserver/tester.actor.cpp
2023-03-17 10:33:38 -07:00
Ata E Husain Bohra
c492f83bf4
EaR: Avoid appending `tls` to the URL ( #9734 )
...
Description
Patch proposes two changes:
1. Avoid appending tls as part of URI for secure connections
2. RefreshEKs recurring task can be skipped if there are no keys to be refreshed
Testing
EncryptionOps.toml
EncryptKeyProxyTest.toml
devRunCorrectness
devRunCorrectnessFiltered 'Encrypt*'
2023-03-16 22:52:51 -07:00
He Liu
0f5e75b34b
Added newDataMoveId(). ( #9647 )
...
* Added newDataMoveId().
* Added `ENABLE_DD_PHYSICAL_SHARD_MOVE`
* fmt.
* Replace `teamId` with `shardId`.
2023-03-16 18:06:06 -07:00
A.J. Beamon
484a414117
Increase the buggified tag measurement interval to reduce trace spam
2023-03-16 11:53:45 -07:00
Evan Tschannen
ac54962533
code cleanup
2023-03-16 09:47:21 -07:00
A.J. Beamon
6d5ffa11f9
Merge branch 'main' into fix-tenant-id-increment
2023-03-15 17:56:42 -07:00
Josh Slocum
b4eb665f1d
fixing copy constructor error and adding test for it ( #9711 )
2023-03-15 15:33:16 -07:00
A.J. Beamon
3881f1ccc6
More carefully validate tenant increments to avoid overflows
2023-03-15 14:56:12 -07:00
Ata E Husain Bohra
dbcab0b1bd
Revert "Refactor GetEncryptCipherKeys ( #9600 )" ( #9708 )
...
This reverts commit 2702665e35.
2023-03-15 12:10:08 -07:00
Evan Tschannen
aaf7b9b32b
Added the ability to manually create a shard and also increase its replication factor
2023-03-15 11:26:15 -07:00
Markus Pilman
aa09baadab
Merge pull request #9635 from sfc-gh-etschannen/fix-consistency-check
...
Fix: the consistency check did not properly report failed tests
2023-03-15 11:21:44 -07:00
Evan Tschannen
6c1d02a14f
Merge pull request #9703 from sfc-gh-jslocum/bg_file_logical_size
...
adding blob granule logical size
2023-03-15 09:59:57 -07:00
Evan Tschannen
2f96627d43
merge in main
2023-03-15 09:26:22 -07:00
Evan Tschannen
0a8435b742
Merge pull request #9702 from sfc-gh-jslocum/dbg_bg_ctest_timeout
...
fixing 2 bugs related to high delta file waitCommitted latency
2023-03-15 08:52:35 -07:00
Johannes M. Scheuermann
b317928646
Only consider newly excluded processes
2023-03-15 15:36:15 +01:00
Josh Slocum
a5b4212990
adding blob granule logical size
2023-03-15 08:54:49 -05:00
Josh Slocum
52c0dc56cc
fixing 2 bugs related to high delta file waitCommitted latency
2023-03-15 08:39:42 -05:00
Evan Tschannen
c435e8336a
no message
2023-03-14 16:40:50 -07:00
He Liu
a0a3f4bff3
Fetch byte sample file ( #9657 )
2023-03-14 16:24:08 -07:00
Zhe Wang
7d2766b692
Fix KeyRangeRef::isCovered() ( #9675 )
...
* fix KeyRangeRef::isCovered()
* reproduce bug
* more unit test added
* fmt
2023-03-14 12:41:18 -07:00
Jingyu Zhou
c5e9bdc6e4
Merge pull request #9684 from sfc-gh-ahusain/ahusain-fix-rest-test
...
Fix RestUtilUnit test
2023-03-14 09:16:39 -07:00
A.J. Beamon
d39cda610a
Merge branch 'main' into metacluster-improvements
...
# Conflicts:
# fdbcli/TenantCommands.actor.cpp
2023-03-13 15:58:39 -07:00
Ata E Husain Bohra
aae8b131cb
Remove 'printf'
...
Description
Testing
2023-03-13 15:50:04 -07:00
Ata E Husain Bohra
a196f2fd75
Fix RestUtilUnit test
...
Description
Fix RestUtilUnit test
Testing
RESTUtilUnits.toml
2023-03-13 15:46:15 -07:00
A.J. Beamon
45056370b8
Merge branch 'main' into metacluster-improvements
2023-03-13 13:14:09 -07:00
A.J. Beamon
18cf523f49
Merge pull request #9660 from sfc-gh-ajbeamon/tenant-id-restore-safety
...
Disallow repopulating a management cluster from a data cluster with matching tenant ID prefix
2023-03-13 13:12:30 -07:00
Ata E Husain Bohra
ea796eb3ec
EaR: REST kms misc fixes ( #9664 )
...
* EaR: REST kms misc fixes
Description
Patch addresses following issues:
1. Fix "return connection" routine, it fixes a regression introduced by
an earlier fix.
2. Update RESTConnectionPool::connectionPoolMap to an "unordered_map"
for O(1) lookups
3. Improve logging
4. Make RESTUrl parsing handle extra '/' for 'resource'
Testing
Standalone fdbserver connecting to external KMS and database create
2023-03-13 13:11:05 -07:00
A.J. Beamon
cbc330697c
Disallow repopulating a management cluster from a data cluster with matching tenant ID prefix unless forced. Remember the largest used tenant ID on the data cluster and use it to update the management cluster tenant ID when force repopulating the same ID.
2023-03-10 15:36:37 -08:00
Jingyu Zhou
b755e668bf
Merge pull request #9601 from jzhou77/fix-head
...
Allow log router to detect slow peeks and to switch DC for peeking
2023-03-09 15:34:24 -08:00
Ata E Husain Bohra
b227007ab0
EaR: Fix knob name ( #9630 )
...
Description
Knob 'REST_KMS_ALLOW_NOT_SECURE_CONNECTION' got renamed in a recent
patch; however, there are other places that need an update too.
Testing
devRunCorrectness - 100K
RESTUtilUnits.toml
RESTKmsConnectorUnits.toml
2023-03-08 17:37:39 -08:00
Nim Wijetunga
2702665e35
Refactor GetEncryptCipherKeys ( #9600 )
...
* initial commit
* address pr comments
2023-03-08 17:05:03 -08:00
Evan Tschannen
4a17ed363a
Fix: the consistency check did not properly report failed tests
2023-03-08 16:56:23 -08:00
Nim Wijetunga
218ed4519f
Strengthen Snapshot Backup/Restore Asserts ( #9552 )
...
strengthen backup/restore asserts for encryption
2023-03-08 15:24:02 -08:00
Ata E Husain Bohra
d0eec9d0ba
EaR: REST KMS fixes - encryption integration testing ( #9598 )
...
* EaR: REST KMS fixes - encryption integration testing
Description
Major changes:
1. Multiple fixes observed while performing integration end-to-end
testing for Encryption at-rest feature.
2. Improve REST module logging. Introduced FLOW_KNOBS->REST_LOG_LEVEL
to have more granular control of feature logging disconnected from
the cluster log level.
Testing
Integration testbed:
1. Run fdbserver standalone
2. Run external KMS http-server to serve encryption key fetch requests
2023-03-08 09:49:43 -08:00
Hui Liu
c43f8b3fdc
Refactor - introduce BlobRestoreController for APIs to manage restore state ( #9616 )
2023-03-08 07:50:30 -08:00
Johannes M. Scheuermann
c6eca3f398
Format code
2023-03-08 08:33:19 +01:00
Johannes M. Scheuermann
1550f3c596
Make use of precomputed exclude check
2023-03-08 08:19:42 +01:00
Johannes M. Scheuermann
bae627f016
Fix syntax
2023-03-08 08:19:42 +01:00
Johannes M. Scheuermann
db8c60c80f
Don't block the exclusion of stateless processes by the free capacity check
2023-03-08 08:19:41 +01:00
A.J. Beamon
de5f2c0fee
Disallow cluster names that start with the `\xff` byte
2023-03-07 11:46:34 -08:00
Steve Atherton
5ff0bc3f87
Merge pull request #9576 from sfc-gh-satherton/storage-configure-refactor
...
Storage and log engine configuration support / refactor a few things.
2023-03-07 02:10:14 -08:00
Steve Atherton
f6747adf89
Move implementations to cpp file.
2023-03-06 18:43:26 -08:00
Jingyu Zhou
0259a243ae
Switch DC if log router peek becomes stuck
...
Try a different DC if this happens.
2023-03-06 17:41:56 -08:00
Ata E Husain Bohra
a45de70003
EaR: RESTClient HTTP compliance, fix json request content type ( #9544 )
...
* EaR: RESTClient HTTP compliance, fix json request content type
Description
diff-1: Address review comments
RESTClient is responsible for handling FDB <-> KMS communication
for Encryption and other use cases. By design, it only supports
"secure connection" i.e. "https"; however, it seems there is a
need to expand the module to support "http" connections,
for instance in test and dev deployments.
However, given that RESTClient handles highly
sensitive content such as the plaintext "encryption cipher
from a KMS", the feature is guarded using
CLIENT_KNOB->REST_KMS_ENABLE_NOT_SECURE_CONNECTION which is
settable using FDBServer command line argument
"--kms-rest-enable_not_secure_connection" (boolean)
Testing
Deployed a standalone fdbserver and communicate with a
simple "http" server
2023-03-06 16:06:03 -08:00
Jingyu Zhou
0d8bde9dcd
Merge pull request #9505 from jzhou77/fix
...
Support multiple key prefix filters for fdbdecode
2023-03-06 15:57:03 -08:00
Chaoguang Lin
7273723a43
Add the hotrange fdbcli command ( #9570 )
...
* Add the hotrange fdbcli command
* Remove the unnecessary state
* Add the doc about the hotrange command
2023-03-06 14:46:52 -08:00
Jingyu Zhou
7a0b3c05b9
Merge pull request #9540 from sfc-gh-huliu/timestamp
...
Report restore phase start time and eta
2023-03-06 14:06:23 -08:00
A.J. Beamon
85c3cf702c
Merge pull request #9584 from sfc-gh-ajbeamon/fix-metacluster-create-error-msg
...
Fix metacluster create error message
2023-03-06 10:30:03 -08:00
A.J. Beamon
ea907f10f5
Print the tenant mode string rather than integer value when reporting that we couldn't create a metacluster
2023-03-06 09:25:50 -08:00
Josh Slocum
e1b620135b
Merge branch 'main' into bg_latency_fixes
2023-03-06 09:23:11 -06:00
Steve Atherton
50d567b5a5
Refactored some parts of database configuration to support log_engine=<name> and storage_engine=<name> and generate these when converting a DatabaseConfig JSON object to a `configure` command. Refactored `fileconfigure` and simulation setup to use the same JSON -> configure function as the same code was copy/pasted to both places but only one has been kept up to date with new features. Renamed Redwood to `ssd-redwood-1` canonically but the experimental name is still supported for backward compatibility.
2023-03-04 20:52:31 -08:00
Jingyu Zhou
df53bcd844
Merge branch 'main' of https://github.com/apple/foundationdb into fix
2023-03-03 20:32:29 -08:00
Hui Liu
b2d497a3b2
Report restore phase start timestamp
2023-03-03 18:09:51 -08:00
Jingyu Zhou
8847e70be0
Merge pull request #9306 from kakaiu/add-physical-shard-meta-data-to-checkpoint
...
Dump checkpoint metadata to sst file
2023-03-03 14:45:50 -08:00
Jingyu Zhou
ca00c9485b
Merge branch 'main' of https://github.com/apple/foundationdb into fix
2023-03-03 11:12:40 -08:00
Xiaoxi Wang
b13b586b71
Merge pull request #9547 from sfc-gh-xwang/feature/main/minReadBand
...
add knob for min read rebalance shard bandwidth
2023-03-03 09:37:23 -08:00
Jingyu Zhou
ee5154f478
Refactor decoder to read file as a whole once
...
To reduce the number of network requests.
2023-03-03 09:32:12 -08:00
Zhe Wang
1766f412bb
address comments
2023-03-03 09:04:26 -08:00
Zhe Wang
e8aced0961
add sampled sample bytes to sst file
2023-03-03 09:04:26 -08:00
Zhe Wang
83a0547281
address comments and add test
2023-03-03 09:04:26 -08:00
Zhe Wang
338e0971a9
address comments
2023-03-03 09:04:25 -08:00
Zhe Wang
e283e067d3
clean and address comments
2023-03-03 09:04:25 -08:00
Zhe Wang
2e68b44579
dump-checkpoint-meta-data-to-sstfile
2023-03-03 09:04:25 -08:00
Josh Slocum
c6a21245d8
also using other GRVs BW already gets for committed version checking
2023-03-02 17:14:17 -06:00
Josh Slocum
57b120e702
Adding grv history so delta files can wait for less time to determine that they're committed
2023-03-02 17:14:17 -06:00
Xiaoxi Wang
26ffcf6b4a
add knob for min read rebalance shard bandwidth
2023-03-02 11:26:27 -08:00
Xiaoxi Wang
010d5590e3
Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/hotRangeDetect
2023-03-02 10:07:17 -08:00
Jingyu Zhou
de89d2cca1
Merge pull request #9521 from sfc-gh-ajbeamon/fix-metacluster-issues
...
Fix some metacluster issues
2023-03-02 10:07:11 -08:00
Jingyu Zhou
ad778cbe5e
Merge branch 'main' of https://github.com/apple/foundationdb into fix
2023-03-02 09:56:30 -08:00
Josh Slocum
0c9397ef22
BW metric improvements for reads and file blocking
2023-03-02 10:57:51 -06:00
Josh Slocum
3861a2249b
increasing bw request timeout
2023-03-02 09:36:47 -06:00
Josh Slocum
40ed365303
fixing checkBlobSubRange to not increase version every retry
2023-03-02 09:34:55 -06:00
Xiaoxi Wang
2d78b126f6
rename splitCount to chunkCount
2023-03-01 21:51:51 -08:00
Xiaoxi Wang
179f0ba71c
new version of getReadHotRanges
2023-03-01 15:55:29 -08:00
A.J. Beamon
533f83b05e
Fix a few more issues in metacluster code and tests:
...
1. Some additional idempotency problems in metacluster tests
2. An assertion that checked that a rename had expected values could fail during concurrent restores, but it would only happen if the transaction itself would fail to commit
3. Tweak the parameters of the MetaclusterRecovery test to try to avoid rare cases of logging too many trace events
2023-03-01 15:31:36 -08:00
A.J. Beamon
544890a6cd
Merge branch 'main' into transaction-debug-logging
2023-03-01 10:09:17 -08:00
A.J. Beamon
2898a95c81
Fix two metacluster issues:
...
1. When retrying the transaction to register a restoring cluster, don't choose a new ID if the current ID matches the one recorded for the restoring cluster
2. A metacluster test was incorrectly handling the case where a transaction was retried with unknown result and had committed successfully
2023-02-28 15:40:04 -08:00
Junhyun Shim
6b26f5a6da
Fix transaction option consistency in TagThrottleInfo getter ( #9513 )
...
* Fix transaction option consistency in TagThrottleInfo getter
Subroutine of getter actor function for throttled and recommended tags
was, upon retry, resetting the transaction object which the caller also uses,
resetting the transaction option and causing a key_outside_legal_range by caller
Also, allowing a subroutine to conditionally, non-trivially modify the passed object
(i.e. transaction reset) is a risky pattern.
Fix: confine subroutine's responsibility to "attempting to" fetch and parse
"autoThrottlingEnabled" key. Let the calling function reset the object if needed.
* Apply Clang format
2023-02-28 23:47:26 +01:00
A.J. Beamon
310fc2ff4e
Merge branch 'main' into transaction-debug-logging
2023-02-28 14:18:51 -08:00
Xiaoxi Wang
26237a291d
update read range reply field
2023-02-28 13:18:57 -08:00
Jingyu Zhou
6c955080e9
Merge pull request #9207 from sfc-gh-jslocum/disable_feed_coalesce
...
disabling feed coalesce for now
2023-02-28 12:32:01 -08:00
Jingyu Zhou
a350a929b9
Merge pull request #9494 from sfc-gh-jslocum/bg_cp_improvements
...
addressing review comments and fixmes in bg commit proxy code
2023-02-28 12:30:58 -08:00
Xiaoxi Wang
8cb2a1553a
add read ops sampler
2023-02-28 12:03:42 -08:00
Josh Slocum
f4308a0f6c
Merge branch 'main' into disable_feed_coalesce
2023-02-28 13:57:21 -06:00
A.J. Beamon
87ac857aeb
Make debug logging functions pure virtual on the transaction interfaces. Rename the function on TraceEvent to be more generic.
2023-02-28 11:11:06 -08:00
Markus Pilman
5bebb5b4aa
Merge pull request #9492 from sfc-gh-vgasiunas/vgasiunas-api-version-defs
...
Centralize definition of API Version for Java, Python and C API
2023-02-28 12:04:02 -07:00
Josh Slocum
301f2fd201
disabling feed coalesce for now
2023-02-28 12:07:12 -06:00
Lukas Joswiak
47fc53ed6e
Adds more detailed mutation logging to commit proxy
...
The commit proxy writes a `ProxyMetrics` trace every 5 seconds. This
event contains a lot of useful information, such as the number of commit
batches that arrived and exited, the number of mutations processed, the
number of bytes those mutations made up, etc. However, it is difficult
to tell what the workload pattern looks like within these 5 second
intervals when the metrics are being calculated.
This PR adds a new trace, `ProxyDetailedMetrics`, which logs itself
every 100ms. It currently only writes the number of mutations and the
number of mutation bytes that arrived during the 100ms time period. But
it should be easy to add more metrics in the future.
It's possible this increased logging could cause issues. Based off a
simulation run of the `WriteDuringRead` test, I got the following
results:
```
$ rg ProxyDetailedMetrics trace.json | wc -l
6877
$ rg "Roles\": \".*CP.*\"" trace.json | wc -l
11402
$ wc -l trace.json
96147 trace.json
```
So on processes running as a commit proxy, this approximately doubled
the number of lines logged. But relative to the cluster overall, it only
added about 5% overhead.
If we want to reduce this number, one possibility would be to not write
a trace if all the values being written are 0. I'm not sure if this
would help much in production, but in simulation the large majority of
the traces (99%+) consist of zero values.
2023-02-28 09:48:39 -08:00
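The possibility raised at the end of the commit message above, skipping a trace when every value is zero, could be sketched as follows. The types and names here are hypothetical and the `sink` vector is only a stand-in for writing a trace event; this is not the commit proxy's actual logging code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-interval counters; names are illustrative only.
struct IntervalCountersSketch {
    int64_t mutations = 0;
    int64_t mutationBytes = 0;
    bool allZero() const { return mutations == 0 && mutationBytes == 0; }
};

// Accumulates counters and emits them once per interval, skipping empty intervals.
class IntervalLoggerSketch {
public:
    void recordMutation(int64_t bytes) {
        assert(bytes >= 0);
        ++current_.mutations;
        current_.mutationBytes += bytes;
    }

    // Intended to be called once per interval (for example every 100ms).
    // Appends to `sink` only if something was counted, then resets the counters.
    // Returns true if an entry was emitted.
    bool flush(std::vector<IntervalCountersSketch>& sink) {
        const bool emit = !current_.allZero();
        if (emit) {
            sink.push_back(current_);
        }
        current_ = IntervalCountersSketch{};
        return emit;
    }

private:
    IntervalCountersSketch current_;
};
```

One trade-off of this design is that a missing entry becomes ambiguous: a reader of the log can no longer tell an idle interval from a process that stopped logging, so the choice depends on how the traces are consumed.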
A.J. Beamon
cb66a76d80
Merge branch 'main' into transaction-debug-logging
2023-02-28 09:21:30 -08:00
Jingyu Zhou
4ea70b1f59
Merge pull request #9512 from sfc-gh-mpilman/bugfixes/remove-lockid-from-txnstatestore
...
Don't store lockid in txnStateStore
2023-02-28 09:16:45 -08:00
A.J. Beamon
0abb33a9a5
Add the ability to print messages or log trace events based on a transaction's result
2023-02-28 09:06:54 -08:00
Ata E Husain Bohra
2db1da26d9
EaR: Update ApiWorkload to validate encryption at-rest guarantees ( #9466 )
...
* EaR: Update ApiWorkload to validate encryption at-rest guarantees
Description
FDB encryption of data at rest guarantees that, if the cluster is configured with
the feature enabled, all data written to persistent disks shall be "encrypted". Given FDB
maintains multiple persistent storages during the lifecycle of the data, the patch
proposes a scheme to validate the invariant via "simulation testing"
Patch proposes updating ApiCorrectness workload to do the following:
1. Client-supplied params and/or random selection enable the validation feature.
2. Validation when enabled, allows injecting a known "marker string"
to workload generated Key and Value data patterns.
3. On shutdown, if the validation is enabled, all test files are
scanned for the known "marker" pattern.
Simulation tests are already capable of doing the following:
1. Randomly select TenantMode (disabled/optional/required)
2. Randomly select EncryptionAtRestMode (cluster_aware/domain_aware)
Hence, the update validates all possible combinations. Also,
'defaultTenant' is present to cover 'domain_aware' encryption use cases.
Testing
devRunCorrectness
devRetryCorrectness - ApiCorrectness & EncryptedBackupCorrectness
2023-02-27 21:40:46 -08:00
Markus Pilman
871a9676ea
Don't store lock id in txnStateStore
2023-02-27 21:25:42 -07:00
Markus Pilman
20874d8575
Merge pull request #9502 from sfc-gh-ajbeamon/metacluster-tenant-lock-support
...
Metacluster tenant lock support
2023-02-27 21:19:03 -07:00
Jingyu Zhou
5ac526a3e5
Merge pull request #9474 from sfc-gh-xwang/feature/main/readaware
...
enable read-aware DD by default and write release notes/doc
2023-02-27 20:04:04 -08:00
Jingyu Zhou
1313a7fa25
Use KeyspaceSnapshotFile to filter range files
2023-02-27 19:41:08 -08:00
Jingyu Zhou
842d485862
Merge pull request #9402 from yao-xiao-github/main
...
Add shard consistency validation.
2023-02-27 17:05:30 -08:00
Jingyu Zhou
f414cd0ed8
Merge pull request #9486 from sfc-gh-ajbeamon/metacluster-management-concurrency-restore-support
...
Add restore to the metacluster management concurrency workload
2023-02-27 17:05:05 -08:00
A.J. Beamon
469e77158f
Add metacluster support for tenant locking
2023-02-27 16:53:13 -08:00
Jingyu Zhou
29a406948a
Merge pull request #9370 from sfc-gh-mpilman/features/tenant-lock-fdbcli
...
fdbcli commands for tenant lock
2023-02-27 16:18:51 -08:00
Xiaoxi Wang
eeade33c30
Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/readaware
2023-02-27 14:18:44 -08:00
Russell Sears
bcc05b1058
Improve support for prebuilt boost
2023-02-27 15:38:58 -06:00
Zhe Wu
7304e5cad0
Merge pull request #9485 from halfprice/zhewu/backup-size-estimate
...
Enhance fdbbackup query command to estimate data processing from a specific snapshot to a target version
2023-02-27 13:35:54 -08:00
Vishesh Yadav
dd0ea8b0cf
Clang-format
2023-02-27 13:10:19 -08:00
Vishesh Yadav
3e6e31ad0b
Use the RangeMapFilters
2023-02-27 13:08:55 -08:00
Jingyu Zhou
dd4bc82862
Refactor code
2023-02-27 13:06:01 -08:00
Jingyu Zhou
46fce2710e
Use RangeMap for backup agent filtering
...
This is more efficient than going through ranges one by one.
2023-02-27 12:21:52 -08:00
A.J. Beamon
a44c4c2e2e
Merge pull request #9478 from sfc-gh-ajbeamon/assert-comparison-all-types
...
Allow performing assert comparisons with any traceable type
2023-02-27 11:03:27 -08:00
Xiaoxi Wang
da7d441436
Merge branch 'main' of https://github.com/apple/foundationdb into feature/main/readaware
2023-02-27 09:09:35 -08:00
Josh Slocum
716a9c3817
addressing review comments and fixmes in bg commit proxy code
2023-02-27 10:51:47 -06:00
Ankita Kejriwal
f7108958bf
Merge pull request #9449 from sfc-gh-akejriwal/exclusion
...
Improve space estimation in checkExclusion()
2023-02-27 08:31:52 -08:00
Markus Pilman
a0e347c7ba
Merge remote-tracking branch 'origin/main' into features/tenant-lock-fdbcli
2023-02-27 09:09:20 -07:00
Vaidas Gasiunas
ba726fac87
Replace hardcoded API version checks for 720 and 730
2023-02-27 16:18:01 +01:00
Steve Atherton
674c105050
Merge pull request #9473 from sfc-gh-etschannen/feature-change-feed-lock
...
Replace fetchKeysParallelismFullLock to speed up fetch keys in idle clusters
2023-02-26 23:18:43 -08:00
Zhe Wu
ffa3467098
Explicitly use the min and max restorable versions from the backup description in the query command instead of going through snapshots
2023-02-26 12:17:07 -08:00
A.J. Beamon
364bf062cb
A few fixes to prevent a simultaneous restore and removal from both making progress:
...
1. Change the cluster ID each time the restore registers the cluster
2. Handle commit unknown result in the removal purge
3. Delete the active restore ID when a removal is first recorded rather than at the end of the removal
4. Delete any existing active restore IDs when registering a cluster in the management cluster
2023-02-25 22:52:02 -08:00
A.J. Beamon
e6a4b0489e
Run a couple restore transactions using the correct runRestoreManagamentTransaction function in order to verify that the restore is still valid
2023-02-25 19:59:59 -08:00
A.J. Beamon
8e231f8809
During a dry-run restore, it is possible that the tenants being restored are modified concurrently. Handle this case with an output message.
2023-02-25 19:59:29 -08:00
A.J. Beamon
b869f5a6ac
Don't validate the configuration sequence number of a tenant during tenant reconciliation until we've started a transaction and confirmed the restore is still valid
2023-02-25 19:58:41 -08:00
A.J. Beamon
b1111ce112
When renaming a tenant during a restore, use the rename destination for the tenant if it has one
2023-02-25 19:57:59 -08:00
A.J. Beamon
3aca47e600
When reconciling tenants during a restore, if a tenant is in the removing state on the management cluster, remove it from the data cluster
2023-02-25 19:54:30 -08:00
A.J. Beamon
040d44927b
Store the rename destination for tenant movements in the cluster tenant index with an ID of -1. Use this to filter out tenant aliases when modifying the tenant count during a tenant purge.
2023-02-25 19:51:21 -08:00
Zhe Wu
8a88df0e91
Query backup size from a specific snapshot
2023-02-25 17:38:27 -08:00
Zhe Wu
a94dd3a430
Fix fdbbackup query returning earliest version
2023-02-25 16:44:45 -08:00
A.J. Beamon
8c3ee768a2
Add an option to allow exceeding the tenant group capacity limit when changing tenant configuration
2023-02-24 21:01:36 -08:00
A.J. Beamon
1c71056e26
Merge pull request #9479 from sfc-gh-nwijetunga/nim/enforce-metacluster-tenant-mode
...
Enforce Disabled Tenant Mode in Metacluster
2023-02-24 19:27:57 -08:00
Ankita Kejriwal
99a1fb52e3
Prevent division by 0
2023-02-24 18:36:55 -08:00
Jingyu Zhou
6b121de6a6
Merge pull request #9464 from jzhou77/fix
...
Add exclude to fdbcli's configure command
2023-02-24 16:31:02 -08:00
Nim Wijetunga
c1087187d1
format change
2023-02-24 15:43:40 -08:00
Nim Wijetunga
c8b7cff10c
fix api test
2023-02-24 15:26:59 -08:00
Josh Slocum
6187811f71
Reworking getBlobGranuleRanges to also use commit proxy rpc for authz, and adding test ( #9470 )
2023-02-24 17:15:32 -06:00
Nim Wijetunga
eca98afcb0
metacluster check tenant mode
2023-02-24 13:59:54 -08:00
Evan Tschannen
8872e5a462
Merge pull request #9347 from sfc-gh-etschannen/feature-change-feed-cache
...
added a disk to blob workers
2023-02-24 13:59:03 -08:00
A.J. Beamon
4a38bb4c3f
Allow performing assert comparisons (e.g. ASSERT_EQ) with any traceable type
2023-02-24 12:53:01 -08:00
Xiaoxi Wang
998a5b7c0e
enable read-aware DD by default and write release notes/doc
2023-02-24 11:11:25 -08:00
Evan Tschannen
f3673d808b
Replaced the fetchKeysParallelismFullLock with a lock specifically for change feeds to avoid blocking fetches on idle clusters
2023-02-24 10:59:35 -08:00
Jingyu Zhou
9a257a60a4
Address review comments
2023-02-24 10:47:32 -08:00
Markus Pilman
6c15506c36
Fixed tests
2023-02-24 11:32:37 -07:00
A.J. Beamon
03fbc59bb1
Merge pull request #9461 from sfc-gh-ajbeamon/metacluster-concurrent-restore-testing
...
Metacluster concurrent restore testing
2023-02-24 09:13:51 -08:00
Markus Pilman
ee9d511d16
Merge pull request #9463 from sfc-gh-mpilman/buildcop/2023-02-23/bugfixes/arm-awssdk
...
Fix build issue with awssdk_target
2023-02-24 09:20:08 -07:00
Nim Wijetunga
29819b0645
Change Feed Bug Fix + Encryption Asserts ( #9457 )
...
* add encryption asserts
* modify function name
* address pr comments
* address pr comments
* Trigger Build
2023-02-23 19:33:25 -08:00
A.J. Beamon
2b25cfef8b
Merge branch 'main' into metacluster-concurrent-restore-testing
2023-02-23 16:06:47 -08:00
Jon Fu
33f8e90f9f
Split tenant group metadata ( #9446 )
...
* initial commit to split tenant group metadata
* attempt to fix merge errors
* fix compile errors and adjust existing tests
* fix infinite loop and extra ACTOR tag
* direct assignment instead of store
* direct assign instead of store (missed a few)
2023-02-23 18:11:49 -05:00
Jingyu Zhou
1f1dc5e768
Allow a comma separated list of excluded addresses
2023-02-23 14:29:08 -08:00
Jingyu Zhou
6ac8720364
Add exclude to fdbcli's configure command
...
Right now this only allows one server address to be excluded. This is useful
when the database is unavailable but we want the recruitment to skip some
particular processes.
Manually tested the concept works with a loopback cluster.
2023-02-23 14:28:20 -08:00
Markus Pilman
c1f80fe471
Merge remote-tracking branch 'origin/main' into features/tenant-lock-fdbcli
2023-02-23 15:16:14 -07:00
Jingyu Zhou
792950dbdc
Merge pull request #9434 from sfc-gh-huliu/splitmetrics
...
Implement SplitMetric pagination in blob migrator
2023-02-23 14:10:27 -08:00
Markus Pilman
1862e65415
Fix build issue with awssdk_target
2023-02-23 15:05:17 -07:00
Markus Pilman
8759fd8f12
Fix refactoring mistake
2023-02-23 14:41:27 -07:00
A.J. Beamon
54955d54f2
Don't allow repopulating from a management cluster if there is another ID registered for the same cluster. Instead, the cluster must be unregistered first before repopulating from it. Also improves a trace event.
2023-02-23 13:28:10 -08:00
A.J. Beamon
c2d28377af
Set the restore ID in the data cluster after marking the cluster restoring in the management cluster
2023-02-23 13:28:10 -08:00
A.J. Beamon
6adccdafa9
Add a conflict range on the active restore ID when setting it
2023-02-23 13:28:10 -08:00
A.J. Beamon
537834ef00
Properly initialize API version of simulated MVC clusters when calling openDatabase
2023-02-23 13:28:10 -08:00
A.J. Beamon
06fe00544a
Remove TenantMapEntry <-> MetaclusterTenantMapEntry conversion constructors and use named functions instead
2023-02-23 13:28:10 -08:00
Markus Pilman
193e517cc4
Address review comments and move lock ID into TenantMapEntry
2023-02-23 14:25:36 -07:00
Markus Pilman
efc5bf9ee8
Merge pull request #9456 from sfc-gh-ajbeamon/smaller-tenant-in-txn-state-store
...
Store a smaller tenant object in the txn state store
2023-02-23 14:00:12 -07:00
Evan Tschannen
cf3a4e6161
Merge branch 'main' into feature-change-feed-cache
2023-02-23 10:16:13 -08:00
Jingyu Zhou
3d8b8a2a05
Merge pull request #9450 from sfc-gh-ahusain/ahusain-misc-fixes
...
EaR: RESTClient and EKP changes to handle unreachable external KMS
2023-02-23 10:04:12 -08:00
Evan Tschannen
a581a55452
ensure a worker cannot run multiple blob worker roles
2023-02-23 09:51:26 -08:00
A.J. Beamon
dd650215d4
Store a smaller tenant object in the txn state store
2023-02-23 09:29:33 -08:00
Vaidas Gasiunas
402f618180
Default transaction options for report_conflicting_keys and used_during_commit_protection_disable ( #9441 )
...
* Introducing default transaction options for report_conflicting_keys and used_during_commit_protection_disable, set the latter option always in Java bindings
* Reformatting TransactionIntegrationTest.java
* Update description of transaction_report_conflicting_keys option
* Remove dependency between mock and real database implementation in RangeQueryTest.java
* Update generated.go after changing description of an option
* Small improvements of the TransactionIntegrationTest code
2023-02-23 18:05:01 +01:00
Ata E Husain Bohra
7d079690d4
Merge branch 'main' into ahusain-misc-fixes
2023-02-22 18:11:11 -08:00
Ata E Husain Bohra
1f7ee9437f
EaR: RESTClient and EKP changes to handle unreachable external KMS
...
Description
Two major changes are proposed:
I) The following setup was used for testing:
1. Run `fdbserver` locally.
2. Run a mock Python-based HTTP server (encryption endpoints not implemented).
The expectation was that the RESTClient code would loop, trying to establish a
connection to the desired encryption endpoint. However, the code looped for one
cycle and then hit a SEGV on the following cycle while printing a log using a
RESTUrl object that was obtained as a 'pointer' from the caller. Update the code
to use a RESTUrl object instead of the pointer.
II) In the above setup, KMSConnector would throw an 'encrypt_key_fetch_failed'
error that wasn't handled by EKProxy, causing the service to terminate. Add
code to re-throw the error to the caller.
Testing
2023-02-22 17:15:34 -08:00
Ankita Kejriwal
64ac92bd4b
Improve comments as per review
2023-02-22 17:13:44 -08:00
A.J. Beamon
9b906d9b3d
Merge pull request #9447 from sfc-gh-ajbeamon/metacluster-restore-fixes
...
Metacluster restore fixes
2023-02-22 17:07:19 -08:00
Hui Liu
0fba65a3cd
Implement SplitMetric pagination in blob migrator
2023-02-22 16:00:49 -08:00
Ankita Kejriwal
8aafbfe6cc
Improve space estimation in checkExclusion()
2023-02-22 15:58:25 -08:00
A.J. Beamon
87cb21be06
Merge pull request #9310 from sfc-gh-mpilman/features/tenant-lock2
...
Tenant Lock
2023-02-22 15:18:14 -08:00
A.J. Beamon
33431f062d
Add some trace events, use a more appropriate error, and improve a check of allocated tenant groups
2023-02-22 14:39:51 -08:00
A.J. Beamon
91df95e816
If registering a cluster fails on the data cluster, try to rollback the registration on the management cluster
2023-02-22 12:30:50 -08:00
A.J. Beamon
587c47832c
Swap the register and remove cluster operations (no change to the code)
2023-02-22 12:30:50 -08:00
A.J. Beamon
6dad4c5c60
Improve trace event
2023-02-22 12:30:50 -08:00
A.J. Beamon
c4ef32c77a
Check for registration tombstones when registering a cluster during restores
2023-02-22 12:30:50 -08:00
A.J. Beamon
92011b9339
When erasing tenants during a force removal, tenants being renamed were double counted against the tenant count
2023-02-22 12:30:50 -08:00
A.J. Beamon
1b9d4a8d5a
Registering a restoring data cluster didn't detect cases where it should fail, such as if the cluster already existed
2023-02-22 12:30:50 -08:00
A.J. Beamon
7da247cde2
Prevent operations on clusters that are restoring
2023-02-22 12:29:32 -08:00
A.J. Beamon
ec79ecce73
Add a boolean parameter for ForceRemove; rename ForceJoinNewMetacluster to ForceJoin
2023-02-22 12:26:24 -08:00
Markus Pilman
8695fc15fc
Merge remote-tracking branch 'origin/main' into features/tenant-lock2
2023-02-22 13:12:23 -07:00
A.J. Beamon
006a2ead6f
Merge branch 'main' into check-metacluster-restore-dryrun
2023-02-22 11:16:45 -08:00
A.J. Beamon
18ae6bda12
Fix formatting issue
2023-02-22 11:12:42 -08:00
Josh Slocum
9cd0f32e87
Fixing several metrics issues in blob workers ( #9426 )
...
* fixing int vs int64 data type, and fixing cause of incorrect request counter
* fixing incorrect count of mutation bytes buffered on granule cancel
2023-02-22 11:08:12 -06:00
Steve Atherton
23df46773d
Merge pull request #9422 from sfc-gh-satherton/client-read-options
...
Add transaction option definitions for read priority and read cache
2023-02-22 09:00:25 -08:00
Josh Slocum
bf97c3dbce
adding java tenant blob management test and fixing bug it found ( #9428 )
2023-02-22 10:52:26 -06:00
A.J. Beamon
d18fffd251
Fix one item that the metacluster restore dry-run testing turned up as changing
2023-02-22 08:41:16 -08:00
A.J. Beamon
cb57541c98
Add testing that the metacluster restore dry-run mode doesn't change anything
2023-02-22 08:41:16 -08:00
Steve Atherton
a21b2fe9f9
Simplified read priority option to three separate options for normal/low/high priority.
2023-02-21 22:48:38 -08:00
Steve Atherton
5969616af8
Merge commit '6de85e7cd8e9dd74a571de9e04679e669bcbb5b6' into client-read-options
2023-02-21 20:46:20 -08:00
Jon Fu
ff7174065f
Merge branch 'main' of github.com:apple/foundationdb into split-tenant-metadata
2023-02-21 14:16:13 -08:00
Steve Atherton
bb4fb3d81d
Merge pull request #9419 from sfc-gh-satherton/page-rebuild-fix
...
Optimize/fix node rebuild vs update trigger in Redwood
2023-02-21 13:49:14 -08:00
Markus Pilman
15d8548c0e
Merge remote-tracking branch 'origin/main' into features/tenant-lock2
...
# Conflicts:
# fdbserver/ApplyMetadataMutation.cpp
# fdbserver/storageserver.actor.cpp
2023-02-21 13:39:35 -07:00
Evan Tschannen
e383cdfddd
fixed merge conflict
2023-02-21 12:18:07 -08:00
Evan Tschannen
8129381689
merge in main
2023-02-21 12:06:35 -08:00
Jon Fu
37fa579587
fix compile issues
2023-02-21 11:46:11 -08:00
Ata E Husain Bohra
4652eaf85d
EaR: Reduce logging level for RESTClient ( #9429 )
...
Description
Reduce the logging level to SevDebug for RESTClient operation
Testing
compiles
2023-02-21 11:43:28 -08:00
Jon Fu
2d74d10a91
use MetaclusterAPI namespace over TenantAPI for TenantState
2023-02-21 11:33:36 -08:00
Josh Slocum
958f3b531b
Plumbing blob worker mapping through commit proxy like storage server ( #9401 )
...
* Plumbing blob worker mapping through commit proxy like storage server mapping
* review comments
* formatting
2023-02-21 13:21:44 -06:00
Jon Fu
da688f9b77
update documentation
2023-02-21 11:13:35 -08:00
Jingyu Zhou
81f8c360db
Merge pull request #8811 from sfc-gh-tclinkenbeard/expose-tag-throttled-duration
2023-02-21 10:47:35 -08:00
Jon Fu
9e01cffef0
fix some merge conflicts and address review comments
2023-02-21 10:29:36 -08:00
Jon Fu
428eb07766
Merge branch 'main' of github.com:apple/foundationdb into split-tenant-metadata
2023-02-21 10:11:35 -08:00
Junhyun Shim
2497aa5701
Clamp GetKeyServerLocations result to tenant prefix ( #9424 )
2023-02-21 18:55:24 +01:00
Steve Atherton
246fd1dd4e
Remove auto cache option since there is no meaningful implementation of this yet. Change places using trState in a native Transaction to set cache mode or Low/Normal/High priority to use the new transaction options instead.
2023-02-21 02:50:30 -08:00
Steve Atherton
9bf28899d4
Add transaction option definitions for read priority and read server side cache mode.
2023-02-21 00:54:35 -08:00
Ata E Husain Bohra
fa60f1b4fa
RESTClient: Initialize RESTClient connection pool instance ( #9414 )
...
Description
This patch fixes an issue where a new connection for the corresponding
'connectKey' wasn't being added to the connectionPoolMap.
Testing
Standalone fdbserver triggering RESTClient connection path
2023-02-20 19:32:10 -08:00
sfc-gh-tclinkenbeard
398079db3a
Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration
2023-02-20 17:54:06 -08:00
Evan Tschannen
4f9e86b0a4
fixed two bugs that prevented the blob manager from properly loading worker affinity
2023-02-20 16:47:26 -08:00
Josh Slocum
bfb3ffc509
added c and java apis for granule flush ( #9412 )
2023-02-20 10:28:11 -06:00
Steve Atherton
e169e65021
Fix to BTree node rebuild logic - rebuild when imbalance hits a limit controlled by a new knob.
2023-02-19 16:40:28 -08:00
Yi Wu
eac757d186
EaR: cleanup encryption knobs ( #9386 )
...
Changes:
* Cleanup all encryption knobs
* Update simulated cluster to randomly enable encryption with higher probability
2023-02-18 13:18:20 -08:00
A.J. Beamon
c2ca21cdb4
Merge pull request #9417 from sfc-gh-ajbeamon/fix-restore-id-logic
...
Fixes to metacluster restore ID
2023-02-18 06:35:13 -08:00
A.J. Beamon
3163201201
Restore ID fixes: we weren't generating a restore ID; we weren't setting the restore ID on the management cluster in some restore modes; it is possible in some test scenarios to encounter a restore conflict, in which case we need to retry.
2023-02-17 21:20:06 -08:00
Ankita Kejriwal
b74a35a986
Enable STORAGE_QUOTA_ENABLED knob by default
2023-02-17 21:14:46 -08:00
sfc-gh-tclinkenbeard
1aef6cb5f7
Merge remote-tracking branch 'origin/main' into expose-tag-throttled-duration
2023-02-17 20:41:59 -08:00
Jon Fu
762cbcdc5d
unconditionally set restore id
2023-02-17 14:41:41 -08:00
Jon Fu
edb7a51b7e
Revert "let client supply restore id"
...
This reverts commit 5fe32b8503.
2023-02-17 14:37:22 -08:00
Jon Fu
5fe32b8503
let client supply restore id
2023-02-17 14:01:58 -08:00
Hui Liu
bdba85a86f
Merge pull request #9303 from sfc-gh-huliu/logtruncation
...
Truncate mutation logs after flushing blob granules
2023-02-17 12:52:45 -08:00
Jon Fu
0d7b6d626b
update restoreCluster test to account for conflicting_restore
2023-02-17 10:36:28 -08:00
Josh Slocum
6c2fb13173
adding wait parameter to blobbify api ( #9360 )
...
* adding wait parameter to blobbify api
* formatting
* fixing comment style
* fixing bug and adding debugging
* adding blob ranges unit test
* testing both blobbify cases in cancel
* formatting
* switch to explicit blocking api instead of boolean flag
* remove comments
* format
2023-02-17 12:20:53 -06:00
Hui Liu
aa1d983132
Truncate logs after force-flushing cold blob granules
2023-02-17 10:17:04 -08:00