Commit Graph

283 Commits

Author SHA1 Message Date
Markus Pilman ea1325a552
Merge pull request #8319 from sfc-gh-tclinkenbeard/add-rare-code-probe-annotation
Add `rare` code probe decoration
2022-10-07 09:39:00 -06:00
Josh Slocum dc917453c1
Targeted blob granules fault injection (#8231) 2022-10-05 13:44:38 -05:00
Hui Liu 01d7668fd1 injected faults should be retryable errors in blob worker 2022-09-29 16:47:31 -07:00
sfc-gh-tclinkenbeard 985958c260 Add rare code probe decoration 2022-09-25 15:28:32 -07:00
Josh Slocum 9122e6e5fe
fixing off by one (#8279) 2022-09-22 16:31:55 -07:00
Josh Slocum 6270016bed
Seq insert perf fixes main (#8264)
* Force flushing granules post-split to guarantee parent feeds get cleaned up

* fixing bug and cleaning up split finalize code
2022-09-21 12:36:02 -07:00
Andrew Noyes 2bdfc52f97
Fix heap use after free (#8189)
Previously, we had Ref types outliving the arenas that owned them,
specifically encryptDomains in the getResolution actor. Refactor to use
Standalones, which both fixes the memory error and makes this easier to
reason about.

Also fix a potential ODR violation.
2022-09-16 13:46:05 -07:00
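
The lifetime problem described in the commit above can be illustrated with a minimal sketch. It is not FoundationDB's code: `ToyArena`, `ToyStandalone`, and `makeDomainName` are hypothetical stand-ins that only mimic the idea of a non-owning view (a "Ref") versus a view bundled with the arena that owns its bytes (a "Standalone").

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for an arena: it owns the bytes that views point into.
class ToyArena {
public:
    // Copies s into arena-owned storage and returns a non-owning view of the copy.
    std::string_view copy(std::string_view s) {
        blocks_.push_back(std::make_unique<std::string>(s));
        return *blocks_.back();
    }

private:
    // Each string lives in its own heap block, so views stay valid as the vector grows.
    std::vector<std::unique_ptr<std::string>> blocks_;
};

// Hypothetical stand-in for a Standalone: the view travels together with
// shared ownership of the arena, so the view cannot outlive its memory.
struct ToyStandalone {
    std::shared_ptr<ToyArena> arena;
    std::string_view value;
};

// Returning only the string_view here would dangle once the local arena is
// destroyed. Returning the bundle keeps the arena alive with the view.
ToyStandalone makeDomainName(std::string_view name) {
    auto arena = std::make_shared<ToyArena>();
    std::string_view view = arena->copy(name);
    return ToyStandalone{ std::move(arena), view };
}
```

A caller can hold the returned `ToyStandalone` for as long as it likes, for example `ToyStandalone d = makeDomainName("tenantA");`, and `d.value` stays valid because `d.arena` keeps the storage alive.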
Hui Liu b19f1b5e3b
Merge pull request #8109 from sfc-gh-huliu/bmr
Add blob manifest for full restore
2022-09-15 11:05:41 -07:00
Hui Liu 59be25848f bootstrap blob manager and blob worker from blob manifest 2022-09-15 09:50:12 -07:00
sfc-gh-tclinkenbeard 82adc1e856 Make g_simulator a pointer 2022-09-15 09:00:33 -07:00
Yi Wu d831c87d14
Add encryption metrics (#8070)
Adding the following metrics:
* BlobCipherKeyCache hit/miss
* EKP: KMS requests latencies
* Each component that uses encryption now needs to pass a UsageType enum to the encryption helper methods (GetEncryptCipherKeys/GetLatestEncryptCipherKey/encrypt/decrypt); those methods log cipher key fetch latency samples and encryption/decryption CPU times accordingly.
2022-09-09 18:43:09 -07:00
Josh Slocum 3887ed5409
fixing memory leak caused by unnecessary notified version promises (#8106) 2022-09-08 16:32:24 -07:00
Andrew Noyes 475ed4b1dc
Improve memory safety (#8069)
* Move arena members to the end of serializer calls

See
https://github.com/apple/foundationdb/tree/main/flow#flatbuffersobjectserializer
for why this is necessary.

* Fix a heap-use-after-free

Previously memory owned by
EncryptKeyProxyData::baseCipherDomainIdKeyIdCache was borrowed by a call
to EncryptKeyProxyData::insertIntoBaseDomainIdCache where it was
invalidated and then used. Now
EncryptKeyProxyData::insertIntoBaseDomainIdCache takes shared ownership
by taking a Standalone.

And also rename some types to end in Ref to follow the flow conventions
described here: https://github.com/apple/foundationdb/tree/main/flow#arenas
2022-09-01 12:47:03 -07:00
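
The second fix in the commit above (memory borrowed from a cache being invalidated inside the insert call) follows a general pattern that can be sketched as below. This is an illustration under assumptions, not the EncryptKeyProxyData code: `ToyKeyCache` and its members are hypothetical, and a by-value `std::string` parameter stands in for "taking a Standalone".

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical cache keyed by a domain id.
class ToyKeyCache {
public:
    // The key is taken by value, so the parameter is its own copy. If the
    // caller passes a value read from this same cache, erasing the old entry
    // below cannot invalidate it. With a `const std::string&` parameter, the
    // same call would leave a dangling reference after erase().
    void insert(int domainId, std::string key) {
        entries_.erase(domainId);                   // destroys any old entry
        entries_.emplace(domainId, std::move(key)); // safe: key is owned here
    }

    const std::string& get(int domainId) const { return entries_.at(domainId); }

private:
    std::unordered_map<int, std::string> entries_;
};
```

With this signature, re-inserting a value read from the cache, as in `cache.insert(1, cache.get(1));`, is safe because the argument is copied before the function body runs.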
Josh Slocum 9721de70b6 Adding knob and increasing delay for simulation ratekeeper throttling assert 2022-08-31 09:08:27 -05:00
Josh Slocum adc0fea18c
Fix rare force purge and granule assignment race (#8018)
* Fix rare force purge and granule assignment race

* Adding missed transaction options
2022-08-29 17:29:28 -05:00
Josh Slocum 5f9d8e94af
Adding future_version reply to change feed operations for behind ss (#8000) 2022-08-29 17:20:23 -05:00
Josh Slocum 46b02cab49
Blob granule summary implementation (in native client) (#7981)
* implemented blob granule summary call in native client

* clean up prints
2022-08-26 14:04:59 -05:00
Nim Wijetunga a857609478 refactor ekp interface 2022-08-23 23:04:12 -07:00
Josh Slocum 98a7ec1797
Blob Granules Cleanup (#7941)
* Cleaned up BlobGranule TODO + FIXMEs and addressed some

* popping feed at correct version

* blob worker taking over a granule will pop from where previous worker left off

* addressed fixme of blob worker not re-snapshotting from old change feed

* formatting

* more change feed popped fixes after pop updates

* Getting rid of change feed parallelism lock since it can cause deadlocks in fetching, and relying on full fetch lock

* New blob worker metric and fixing old one

* server-side popped checking still doesn't work because of pops at non-mutation versions

* format
2022-08-19 17:25:31 -07:00
Josh Slocum bbbaa80e52
assorted bug fixes for blob granules (#7866)
* assorted bug fixes for blob granules

* Fixing transaction used after commit in blob manager recovery

* fixing race with granule merging across hard boundaries because it hadn't loaded them yet
2022-08-12 17:26:43 -05:00
Evan Tschannen a9d3c9f9b3
Added throttling when a blob worker falls behind (#7751)
* throttle the cluster when blob workers fall behind

* do not throttle on blob workers if they are not enabled

* remove an unnecessary actor

* fixed a compile error

* fetch blob worker metrics at the same interval as the rate is updated, avoid fetching the complete blob worker list too frequently

* fixed another compilation bug

* added a 5 second delay before bw throttling to prevent false positives caused by the 100e6 version jump during recovery. Lower the throttling thresholds to react much quicker to bw lag.

* fixed a number of problems

* changed the minBlobVersionRequest to look at storage server versions since this will be a lot more efficient

* fix: do not let desired go backwards

* fix: track the version of notAtLatest changefeeds for throttling

* ratekeeper now throttles blob workers by estimating the transactions-per-second throughput of the blob workers

* added metrics for blob worker change feeds

* added a knob to disable bw throttling

* fixed the transaction options in blob manager
2022-08-12 13:15:56 -07:00
Josh Slocum 7c155f4521
Granule force purging (#7846)
* Granule purge cannot delete history entry for fully deleting granule until all children are completely done splitting

* Several purging fixes related to granule history

* Fixed typo in refactor

* fixing memory model for purgeRange

* formatting

* weakening granule purge test for now

* cleanup

* First version of force purging granules

* fixing issue in BW range assignment reporting

* Fixing incorrect assert with force purging

* Error handling when checking force purged state

* fixed force purging and recover/reassign range races and check

* Handling force purge + boundary change race

* more places to check for force purged status

* fixed manager restart in the middle of force purge bug

* fixing same-BM purge and assignment races in all cases

* weakening orphaned granule history check a bit because of difficult to solve races

* fixing txn options on retry

* loading force purged ranges at start to avoid resuming a merge that is being force purged

* cleanup

* Enabling purging in granule tests, and adding check for leaked change feeds in force purge

* formatting

* missed parameter in merge conflicts

* Fixing leaked change feed race with merge and force purge

* adding change feed cleanup when new blob manager recovers in-progress merge that raced with force purge

* added forcepurge fdbcli command
2022-08-11 15:22:32 -07:00
Josh Slocum 44f8bdd258
Blob Worker memory limit (#7858)
* Simulation version of blob_worker_full

* tracking blocked BM assignments

* actual memory estimation implementation
2022-08-11 15:07:08 -07:00
Josh Slocum 62494f048c
several changes to manage blob worker memory more and to test that management (#7834) 2022-08-09 17:53:52 -05:00
Nim Wijetunga 6d0b20b07a Merge branch 'main' of github.com:sfc-gh-nwijetunga/foundationdb into nim/refactor-encryption-flag
* 'main' of github.com:sfc-gh-nwijetunga/foundationdb: (32 commits)
  Store rocksdb::DBOptions and rocksdb::ColumnFamilyOptions to (#7766)
  Update CONTRIBUTING.md
  Update tests/rare/SpecificUnitTests.toml
  fix ASAN OOM problem
  Update CONTRIBUTING.md
  Write tracing and ALP special key errors as JSON
  Fix: the static tenant map in the Java tester was being accessed concurrently from multiple threads. Make it a concurrent map. (#7805)
  Run clang-format
  Print SIGNAL output to stdout
  Print to stderr only upon errors
  Testing upgrades to a future version of FDB (#7780)
  Flush gcov coverage upon SIGTERM
  Report the unit tests being run in test harness
  Fix a bug in a storage wiggler unit test where some servers were added with too recent a timestamp
  Fix undefined behavior in versioned btree test due to integer overflow
  When a transaction operation gets an unknown tenant error, it needs to reset the tenant ID so it can be updated in the next tenant lookup request.
  Don't buggify max tenants per cluster globally; instead buggify it in specific tests
  Remove non-existing unittest
  Add unit tests to the correctness package
  Add comment to INetwork
  ...
2022-08-08 22:07:05 -07:00
Nim Wijetunga c63b30f698 address pr comments 2022-08-08 11:32:49 -07:00
Vaidas Gasiunas 79571dd2b4
Testing upgrades to a future version of FDB (#7780)
* Enable configuring the next future protocol version as the current protocol version in FDB client, fdbserver, and fdbcli

* Auto format python files used in upgrade tests

* Add a test for upgrading to a future FDB version

* Emphasize that the options for using future protocol version are intended for test purposes only

* Make the global variable for current protocol version visible only locally

* Refactoring to avoid using currentProtocolVersion() in static initialization

* Update go bindings
2022-08-08 17:29:49 +02:00
Nim Wijetunga 369494ab19 merge 2022-08-05 18:52:39 -07:00
Josh Slocum f866ffc36b
Better granule conversion (#7787)
* better check for granule-ification

* Handling blob granule initial split too large

* Re-evaluating split size if too large, even if read doesn't get transaction_too_old

* reworked to have blob worker propose split key

* New GranuleStatusReply to avoid seqno issue stream side effects

* Handling retries on reevaluateInitialSplit properly

* Waiting for stream to be initialized

* Checking reevaluate split for additional split points beyond proposed

* Fixing more races in reevaluate initial split

* properly handling cleaning up old change feed after split re-evaluate

* fixing granule conversion bug with hard boundaries

* fixing clear and merge check race with cycle test

* refactor missed knob check for clearAndMerge

* Fixing formatting

* review comments and improving large range conversion

* fixing typo

* more formatting
2022-08-05 18:12:17 -05:00
Hui Liu cd0020ccaf
add counters for blob compression (#7799) 2022-08-05 18:07:13 -05:00
Josh Slocum b2835921ba
Using knownBlobRanges for blob granule ranges whether tenants are enabled or not (#7788)
* Using knownBlobRanges for blob granule ranges whether tenants are enabled or not

* Effectively disabled blob granule tests when tenants enabled to fix ctest
2022-08-05 11:46:09 -05:00
Josh Slocum d721d1b850
More cf bug fixes (#7789)
* Fixing change feed fetch and rollback race

* Fixing validation issue for change feed validation

* Fixing shutdown segfault in blob worker
2022-08-04 15:57:42 -05:00
Josh Slocum 7f45cccb56
More granule purging fixes (#7756)
* Granule purge cannot delete history entry for fully deleting granule until all children are completely done splitting

* Several purging fixes related to granule history

* Fixed typo in refactor

* fixing memory model for purgeRange

* formatting

* weakening granule purge test for now

* cleanup

* review comments
2022-08-03 16:43:27 -05:00
Nim Wijetunga b922470ceb fix pr issues 2022-08-03 13:26:14 -07:00
Nim Wijetunga 8d591fc5e7 address pr comments 2022-08-02 10:51:41 -07:00
Nim Wijetunga 9a4fb8ad4e merge 2022-08-02 09:49:16 -07:00
Ata E Husain Bohra ef6012c1d1
Encrypt BlobGranule delta files (#7735)
* Encrypt BlobGranule delta files

Description

 diff-1: Address review comments

Major changes proposed by the patch are:
1. Refactor code to allow caching of 'encryption key ctx' as part of
BlobFilePointerRef. The refactoring allows snapshot and/or delta files
to store their own file encryption context.
2. Enable BlobGranule delta file encryption/decryption semantics.

Testing

BlobGranuleCorrectness
BlobGranuleCorrectnessClean
BlobGranuleFileUnitTestToml

2022-08-01 16:34:44 -07:00
Nim Wijetunga af6db42b1b temp 2022-08-01 15:19:21 -07:00
Junhyun Shim c6342a6e5b
Merge branch 'main' into features/authz 2022-07-27 20:51:32 +02:00
A.J. Beamon d39c0b773a Add a limit to the number of tenants that can be created in a cluster 2022-07-27 08:21:03 -07:00
Junhyun Shim 5169616b16 Fix unresolved merge conflicts 2022-07-27 00:38:16 +02:00
Josh Slocum c32e1da908
Merge pull request #7673 from sfc-gh-jslocum/delta_files_v2
Sorted Delta Files
2022-07-26 16:04:55 -05:00
Josh Slocum 15e7a4b186 addressing review comments 2022-07-26 14:20:35 -05:00
Josh Slocum ea9018460a cleanup and polish 2022-07-22 15:13:32 -05:00
Josh Slocum 095a5a4868 First version of key-sorted delta files 2022-07-22 11:43:49 -05:00
A.J. Beamon 17146c484b Use key-backed types for tenants. Add a tenant state field that will be used in upcoming work. Some other tenant related refactoring. 2022-07-21 20:33:28 -07:00
Josh Slocum 6fc0d61146 delta file test and delta generation 2022-07-21 11:49:13 -05:00
Josh Slocum 316b7a5344 Merge branch 'main' into granule_merging_converge 2022-07-20 12:13:48 -05:00
Josh Slocum 78b6a96006 Merge branch 'main' into granule_merging_batch 2022-07-20 07:42:26 -05:00
Josh Slocum 12b6f386cb Refactoring granule flush to retry properly on granule rollback 2022-07-19 19:49:20 -05:00