27121 Commits

Author SHA1 Message Date
Yao Xiao 67a588380e
shard size log (#11342) 2024-04-29 13:42:19 -07:00
Pierre Zemb d0eb68bd25 improve devcontainer 2024-04-29 13:04:27 -05:00
Pierre Zemb ab0cce0be0 Create devcontainer.json 2024-04-29 13:04:27 -05:00
Yao Xiao 99910100a5 version upgrade 2024-04-26 14:53:01 -05:00
Jingyu Zhou 2ba121cb12
Merge pull request #11334 from sbodagala/main
Disable replica consistency check related knob
2024-04-26 11:17:24 -07:00
Jingyu Zhou 773fb951b3
Merge pull request #11336 from kakaiu/dcc-assert-false
Fix assert false in consistency check urgent
2024-04-26 11:15:44 -07:00
Aaron Molitor 0f1dd6e4ea update go bindings build to play nice with golang 1.22 2024-04-25 15:04:36 -05:00
Zhe Wang 848b9c5b13 fix dcc assert false 2024-04-23 12:47:48 -07:00
Sreenath Bodagala bd68263558 - Disable replica consistency check related knob 2024-04-22 21:53:28 +00:00
Yao Xiao 9789c7f4ff
async io (#11325) 2024-04-22 14:20:11 -07:00
dependabot[bot] 73ad0d1361
Bump golang.org/x/net from 0.17.0 to 0.23.0 in /fdbkubernetesmonitor (#11322)
Bumps [golang.org/x/net](https://github.com/golang/net) from 0.17.0 to 0.23.0.
- [Commits](https://github.com/golang/net/compare/v0.17.0...v0.23.0)

---
updated-dependencies:
- dependency-name: golang.org/x/net
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-22 07:00:13 +01:00
Zhe Wang 314f4c41c7
Fix ACS mutation bug and improve accumulative checksum (#11319)
* enable acs by default

* code clean

* improve ACS code

* nits

* nits

* fix data corruption issue triggered by acs mutation
2024-04-20 01:31:55 -07:00
Jingyu Zhou 55bbf4687e
Merge pull request #11311 from jzhou77/release-notes
Throw errors in getConsistentReadVersion
2024-04-19 16:42:22 -07:00
Yao Xiao 81b342fccd
Don't remove team when total team count is within threshold (#11295) 2024-04-19 15:40:42 -07:00
neethuhaneesha adf0e8fa18
Rocksdb metrics in status json (#11321) 2024-04-18 22:00:58 -07:00
Johannes M. Scheuermann 728c7a7e92 Correct the docker image name in fdb-kubernetes-monitor docs 2024-04-18 15:06:13 -05:00
Jingyu Zhou 04128834a4
Merge pull request #11313 from jzhou77/fix
Increase CommitProxyTerminated severity for failed_to_progress errors.
2024-04-17 22:06:42 -07:00
Jingyu Zhou e0674ced7c Increase CommitProxyTerminated severity for failed_to_progress errors.
For better visibility.
2024-04-17 14:49:48 -07:00
Jingyu Zhou 17bb1f4278 Fix BlobRestoreWorkload errors 2024-04-17 10:18:51 -07:00
neethuhaneesha c89074ab04
Revert "Added perpetualStorageWiggleSpeed check to pick perpetualStoreType (#…" (#11305)
This reverts commit 3f5b60f711.
2024-04-17 10:09:38 -07:00
neethuhaneesha ed7a275231
Rocksdb caching knob options. (#11282) 2024-04-17 10:09:14 -07:00
Jingyu Zhou 9ac965886c Throw errors in getConsistentReadVersion
In the current code, errors are retried in getConsistentReadVersion, so it's
possible that the client has cancelled the GRV request, but readVersionBatcher
continues retrying, which can lead to many clients DDoSing GRV proxies, especially
when the database has become unavailable for a while and clients are issuing
many GRV requests.
2024-04-17 09:13:21 -07:00
Jingyu Zhou 84d6b86715
Merge pull request #11309 from apple/sevwarn
Raise visibility of gray failure actions
2024-04-17 09:08:40 -07:00
Dan Lambright 3dc6c49791
respond to review comments
Make DegradedServerDetectedAndTriggerRecovery SevWarnAlways
2024-04-17 09:26:08 -04:00
Yao Xiao be3dcbde62
Sharded RocksDB knob changes. (#11291) 2024-04-16 11:15:08 -07:00
Dan Lambright 9cd5090965 Raise visibility of gray failure actions 2024-04-16 12:23:16 -04:00
Jingyu Zhou 8b8b25a238
Merge pull request #11304 from hfu94/con
Add convert.py that converts between timestamp and version for clusters
2024-04-15 19:53:07 -07:00
hao fu b26cb205e0 Fix bug when there is no data in the first half of a binary search iteration 2024-04-15 14:08:52 -07:00
hao fu 4f2bdc72e9 Add convert.py that converts between timestamp and version for clusters 2024-04-15 14:06:14 -07:00
sulinehk b8081e2685 fix: golang docker compose example 2024-04-12 14:51:34 -05:00
Zhe Wang 832972e2da
Validate Mutation Version in Accumulative Checksum Framework (#11293)
* validate-mutation-version-in-acs-framework

* turn off knob

* randomly enable feature
2024-04-12 10:15:46 -07:00
Aaron Molitor 38bb833d41 updates to add fdb-kubernetes-monitor to the standard build flow 2024-04-12 08:31:48 -05:00
Sreenath Bodagala a4430b9169
Compare storage replicas on reads (#11235)
* - Compare storage replicas on reads (in "loadBalance()")

* - Do consistency check on reads in loadbalance

* - Do replica consistency check in the case where loadBalance issues
requests to multiple storage servers

* - Address a state variable related bug

* - Code formatting

* - API simplification

* - Simplify code

* - Code formatting

* - Address a review comment
2024-04-11 16:08:54 -04:00
Jingyu Zhou 2f50c4989f
Merge pull request #11287 from kakaiu:consistency-check-urgent-doc
Add consistency check urgent doc
2024-04-09 12:07:53 -07:00
Zhe Wang c700f9ad5c improve 2024-04-08 16:16:18 -07:00
Zhe Wang 59681e4031 improve 2024-04-08 16:15:00 -07:00
Zhe Wang 0ace28e4a4 improve 2024-04-08 16:12:38 -07:00
Zhe Wang a3ef5d84bd fix escape char 2024-04-08 16:07:29 -07:00
Zhe Wang 2676f11856 address comments 2024-04-08 15:50:55 -07:00
Zhe Wang 4b0ea9902d fix 2024-04-08 14:08:42 -07:00
Zhe Wang 36d958470f add consistency check urgent doc 2024-04-08 14:06:28 -07:00
Jingyu Zhou 34a3c778b7
Merge pull request #11285 from jzhou77/release-notes
Add 7.1.32 - 7.1.37 release notes
2024-04-08 13:58:07 -07:00
Jingyu Zhou c6988fff09 Add 7.1.32 - 7.1.37 release notes 2024-04-08 12:37:03 -07:00
Aaron Molitor 12fa060fb3
Update release-notes-730.rst 2024-04-08 10:16:21 -05:00
Jingyu Zhou 11cb531b31
Merge pull request #11280 from xis19/main
fixup! Update the document of Machine-Readable Status
2024-04-05 13:48:47 -07:00
Zhe Wang 33eecd0775
Real-time corruption detection with accumulative checksum (#11255)
* acs framework

* code refactor and fix bugs

* add ss crash loop protector

* use sharedptr instead of raw pointer

* fixed critical bugs and add private mutation acs to the framework

* enable ACS for all mutations except for clear serverTag mutation and fix bugs

* fix restarting tests

* refactor code and fix bugs

* fix AccumulativeChecksumState toString

* fix bugs

* allow all mutations in acs and fixed bugs

* fix bugs and code cleanup

* code clean up for adding recovery support

* simplify code and support recovery

* clear acs state at ss

* fix bug

* terminate validator if ss will be removed in the current batch

* simplify code

* add trace

* address comments

* optimize code

* deep copy when adding mutation to acs validator

* wrap encode and decode persist acs key

* make acstable private

* remove useless func

* remove useless func

* remove epoch in ACS validator

* add acs mutation counter in SS metrics

* code cleanup and make knob check better

* make mutation buffer global

* simplify code

* add comments

* make knob randomly set

* address comments

* ss reboot after acs mismatch found
2024-04-04 15:03:44 -07:00
Hao Fu 8635035cc5
Add destructor in LRU2 to release memory (#11276) 2024-04-03 17:26:41 -04:00
Hao Fu 2a774d39a5
Suppress ChosenMachine to fix simulation error (#11277) 2024-04-03 17:26:22 -04:00
Xiaoge Su 5b587d3c61 fixup! Update the document of Machine-Readable Status
The key has to be of byte type in Python, not str.
2024-04-03 13:22:33 -07:00
Hao Fu a6e543b5c7
fix signed int and overflow bugs (#11273)
now timestamp is uint64 and page number starts at 1
2024-04-01 18:33:50 -04:00