Hui Liu
788f3420c8
Fix tenant map update race when applying mutations (#10557)
2023-06-27 10:31:58 -07:00
Jingyu Zhou
321b77d809
Add 7.3 protocol version to transaction_profiling_analyzer.py
Manually tested that the tool works.
2023-06-27 10:19:10 -07:00
Yi Wu
92fc089eb0
FDBCORE-5211: Fix worker server handleIOErrors heap-use-after-free (#380) (#10555)
2023-06-26 16:35:33 -07:00
Jingyu Zhou
e5728fcb84
Fix the version for deprecated keys (#10553)
2023-06-26 13:49:41 -07:00
A.J. Beamon
da382372f0
Merge pull request #10550 from sfc-gh-ajbeamon/speedup-backup-correctness-clean
Speed up BackupCorrectnessClean
2023-06-26 09:48:18 -07:00
A.J. Beamon
ecf8a5b3f6
Merge pull request #10542 from sfc-gh-ajbeamon/reduce-mutation-log-reader-correctness-duration
Decrease the number of records in MutationLogReaderCorrectness
2023-06-26 08:55:34 -07:00
He Liu
6337125712
Several minor improvements for ShardedRocksDB (#10520)
* Terminate DD if SHARD_ENCODE_LOCATION_METADATA is not enabled and storage_engine_type is ShardedRocksDB.
* Fixed Error in non-main thread.
* Minor improvements.
2023-06-24 16:07:14 -07:00
Zhe Wang
37689af3f2
Detect inconsistency of KeyServers and ServerKeys in real time (#10484)
* add framework
* add audit logic
* refactor audit loc metadata
* address comments
* add realtime audit timeout, add post validation logic
* fix input empty range to compareKeyServersAndServerKeys
* add context for auditKeyServersAndServerKeysInRealTime
* focus on moveShard
* remove space
* address comments
* cleanup
* add audit cleanup
* make validateRangeAssignment simple
* change trace name
* add shardAssigned
* stop DD when inconsistency detected
* fix ci
* small fix
* revert ss and auditUtl and simplify rt audit
* cleanup ss
* tiny change
* address comments and refactor code
* make auditLocationMetadataPreCheck retriable
* handle actor cancel in auditLocationMetadataPreCheck
* rm timeout and add new protection for failure of audit
* fix bugs
* import dataMoveId to validation
* improve trace event
* carefully propagate error and stop DD
* tiny fix
* small change
* remove a state var
* nit
* clean comments
* fmt
2023-06-23 17:40:21 -07:00
A.J. Beamon
155c03f6fe
Decrease transaction rate of backup correctness clean to speed it up
2023-06-23 16:14:03 -07:00
Lukas Joswiak
8b8da372d5
Fix issue with inconsistent coordinator disk queue
An inconsistent coordinator disk queue could cause repeated reboots of
the coordinator process. In simulation, this could cause stuck runs if
another role was running on the coordinator process, as the other role
would also repeatedly reboot (specifically stateful roles like tlogs).
See the comment included in this commit for much more detail.
2023-06-23 15:37:31 -07:00
A.J. Beamon
09a0a60c07
Merge pull request #10546 from sfc-gh-ajbeamon/speedup-dr-upgrade
Speed up DR upgrade tests
2023-06-23 12:00:42 -07:00
A.J. Beamon
1af1346cde
Change a buggify and decrease the number of transactions per second so that the dr upgrade test runs faster
2023-06-23 09:03:34 -07:00
A.J. Beamon
daf20abeee
Decrease the number of records in MutationLogReaderCorrectness to decrease its duration
2023-06-22 13:37:35 -07:00
Xiaoxi Wang
7d3cc86860
Check serverList before updating storage metadata (#10540)
2023-06-22 12:02:09 -07:00
Josh Slocum
356d0030a4
fixed consistency scan corruption check when a worker is hard-killed in the middle of a scan, and improved debugging for stuck consistency scans (#10516)
2023-06-22 11:54:08 -05:00
A.J. Beamon
b5497bdc3e
Merge pull request #10529 from sfc-gh-ajbeamon/test-metacluster-use-after-restore
Test use of the metacluster after a restore
2023-06-21 18:50:18 -07:00
Ata E Husain Bohra
370e39fbdb
Update SimKmsVault unit test assert checking max encryption keys (#10535)
Description
When the SimKmsVault unit test runs as part of the simulation Random test,
SimKmsVaultKeyCtx may already have been initialized by some other test
(FlowSingleton), depending on the test order.
Update the test to handle this scenario.
Testing
devRunCorrectness - 100K
2023-06-21 17:17:21 -07:00
Zhe Wu
8c796e6ff9
Merge pull request #10534 from halfprice/zhewu/gray-failure-recovery-cc
Cluster controller remove recovered peer in gray failure
2023-06-21 14:54:26 -07:00
Zhe Wu
b89a50a37d
Cluster controller remove recovered peer in gray failure
2023-06-21 13:02:59 -07:00
A.J. Beamon
a2302e45ec
Merge pull request #10522 from sfc-gh-ajbeamon/fix-mapped-range-assert
Fix check in getExactRange that determines whether we can return early
2023-06-21 10:24:52 -07:00
Zhe Wu
d1bafa7514
Merge pull request #10528 from halfprice/zhewu/nit-fixing
Nit fixing
2023-06-21 10:21:47 -07:00
A.J. Beamon
728b847bd6
Merge pull request #10526 from sfc-gh-yajin/mark-cp-rare-1
SNOW-XXXXXX Mark a rare code probe as rare
2023-06-21 09:41:29 -07:00
Zhe Wu
36befca2ba
Nit fixing
2023-06-21 09:25:51 -07:00
A.J. Beamon
cc68320333
Add testing that a metacluster can be used after a restore. Fix some bugs found by this related to the restore ID and tenant tombstones.
2023-06-21 08:59:22 -07:00
Zhe Wu
aac6d95182
Merge pull request #10525 from halfprice/zhewu/turn-on-batch-empty-commit-by-default
Turn on batching empty peek by default
2023-06-21 07:57:17 -07:00
Zhe Wu
f3e95f6698
Merge pull request #10521 from halfprice/zhewu/gray-failure-recovery
Worker health monitor tracks recovered peers
2023-06-21 07:56:55 -07:00
Yanqin Jin
75ab773819
SNOW-XXXXXX Mark a rare code probe as rare (#415)
2023-06-20 21:56:54 -07:00
Zhe Wu
514561e167
Turn on batching empty peek by default
2023-06-20 20:24:20 -07:00
Zhe Wu
8eb526684a
Merge pull request #10437 from halfprice/zhewu/one-shard-per-physical-shard
Add an option to limit the number of shards per team
2023-06-20 19:43:31 -07:00
Xiaoge Su
c5f09b0921
fixup! Reformat source
2023-06-20 18:07:05 -07:00
Xiaoge Su
de7db7ba14
fixup! reformat source
2023-06-20 18:07:05 -07:00
Xiaoge Su
63dde188dd
Report missing attribute in StorageServer status
2023-06-20 18:07:05 -07:00
A.J. Beamon
75ec56bffb
When redoing a key location request, wait until after we've checked whether we've satisfied our min rows
2023-06-20 16:02:12 -07:00
Zhe Wu
31de83ccc1
cleanup gray failure recovery documentation
2023-06-20 13:30:03 -07:00
Zhe Wu
4741d7032f
Revert logic to test testing logic
2023-06-20 12:40:22 -07:00
Zhe Wu
ef638cd6a6
Worker health monitor keeps track of recovered peer
2023-06-20 12:32:29 -07:00
Evan Tschannen
902d87a72c
Merge pull request #10515 from sfc-gh-etschannen/fix-ikeyvaluestore
fix IKeyValueStore include
2023-06-20 06:28:23 -07:00
Hui Liu
af20493ad0
Move lastFlushTs to BlobGranuleBackupConfig (#10505)
2023-06-16 16:12:10 -07:00
Jefferson Zhong
13853c9f89
Move stepSize knob from ClientKnobs to ServerKnobs
2023-06-16 14:48:11 -07:00
Evan Tschannen
60cbc7ec8d
fix formatting
2023-06-16 13:43:25 -07:00
Evan Tschannen
ef682d304e
fix IKeyValueStore include
2023-06-16 13:28:40 -07:00
A.J. Beamon
ac30b3c84f
Merge pull request #10507 from sfc-gh-yajin/reset-tenant-state
Support restoring a cluster with a tenant in the error state
2023-06-16 09:35:03 -07:00
Jingyu Zhou
459172499a
Merge pull request #10508 from sfc-gh-jslocum/fix_empty_bg_flush
fix flushing empty range
2023-06-16 08:36:15 -07:00
Josh Slocum
a7ba6fde3c
fix flushing empty range
2023-06-16 09:26:08 -05:00
Yanqin Jin
626a8a1a5f
SNOW-804199 Support restoring a cluster with a tenant in the error state (#357)
If we restore a cluster and a previously created tenant was not included in the backup, then the tenant will be marked in an error state on the management cluster. It is then up to the operator to resolve the error, generally by deleting the tenant and recreating it if needed.
There is, however, the possibility that we restored a backup that was older than we wanted, and a newer backup would have the tenant. If we tried to restore the newer backup, it would not leave the previously missing tenant in a fully usable state.
We need to have a way to deal with this case. One option is to allow us to clear the error state of a tenant, and that can be performed before (or maybe even after) the second restore.
Test plan:
Joshua test
100K ensemble: 20230613-225414-yajin-439d13ef3c6b3afd fail=0
2023-06-15 22:23:46 -07:00
Jay Zhuang
5130fe99d3
Fix RandomStringSetGenerator not generating enough unique keys (#10503)
For a small key set, RandomStringSetGenerator may not try enough random
keys to generate a set of unique keys.
Increase the number of retries for smaller key sets.
2023-06-15 17:23:13 -07:00
Josh Slocum
6fb225c68b
Reset connection idle time when restarting connection monitor (#10495)
2023-06-15 13:37:00 -05:00
Josh Slocum
367088b831
made blob metadata load lazily from EKP (#10463)
2023-06-15 13:36:52 -05:00
Josh Slocum
01ce1ab24d
fix consistency scan ubsan issue and replication factor calculation (#10492)
2023-06-15 08:10:49 -05:00
Zhe Wu
5c8a163c72
Update main branch to 7.4 (#10459)
* Update main branch to 7.4
* Update API version to 740
* Make fdb_c_client_config_tests.py pass after API version update
* Remove from_7.3.0_until_7.4.0 and add from_7.3.0
* Update tests in fdb_c_client_config_tests.py
2023-06-15 10:19:39 +02:00