671 Commits

Author SHA1 Message Date
Lukas Joswiak 72a97afcd6 Avoid recruiting workers with different cluster ID 2022-10-27 13:56:13 -07:00
Hui Liu f2289ced27 Add StorageServerInterface for BlobMigrator 2022-10-24 13:12:07 -07:00
Jingyu Zhou a8391caf23 Revert "Data loss protection v2" 2022-10-20 18:09:58 -05:00
Lukas Joswiak dd0644dd04 Format 2022-10-18 21:56:28 -07:00
Lukas Joswiak 17b6726958 Avoid blocking in choose when 2022-10-18 21:37:42 -07:00
Lukas Joswiak 72bc89cf39 Remove cluster ID logic from individual roles
The logic to determine the validity of a process joining a cluster now
belongs on the worker and the cluster controller. It is no longer
restricted to tlogs and storages, but instead applies to all processes
(even stateless ones).
2022-10-18 21:37:42 -07:00
Lukas Joswiak 7342672c11 Move cluster ID from txnStateStore to the database
The cluster ID is now stored in the database instead of in the
txnStateStore. The cluster controller will read it on boot and send it
to all processes to persist.
2022-10-18 21:37:42 -07:00
Lukas Joswiak 743609f217 Notify processes joining the wrong cluster
And have these processes enter a "zombie" state where they cancel all
their actors and then wait forever, refusing to do any additional work
until they are manually handled by the operator.
2022-10-18 21:37:40 -07:00
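Note on commits 72bc89cf39, 7342672c11 and 743609f217: taken together, these messages describe a check made when a process learns the cluster ID from the cluster controller. The sketch below is only an illustration of that decision, not the code in `worker.actor.cpp`. The names `JoinOutcome` and `check_cluster_id` are hypothetical, as is the assumption that a process with no persisted ID simply adopts the one it is sent.

```python
from enum import Enum
from typing import Optional


class JoinOutcome(Enum):
    ADOPT = "persist the cluster ID sent by the cluster controller"
    JOIN = "cluster IDs match; proceed normally"
    ZOMBIE = "wrong cluster; cancel all actors and wait for the operator"


def check_cluster_id(persisted_id: Optional[str], controller_id: str) -> JoinOutcome:
    """Decide what a worker does when it learns the controller's cluster ID."""
    if persisted_id is None:
        # Assumption: a process with no persisted ID has not joined a cluster yet.
        return JoinOutcome.ADOPT
    if persisted_id == controller_id:
        return JoinOutcome.JOIN
    # Per the commit messages, this applies to every process, stateless ones included.
    return JoinOutcome.ZOMBIE
```

In this sketch the `ZOMBIE` outcome stands for the state described in 743609f217, where the process refuses further work until an operator intervenes.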
Lukas Joswiak 2394a9f4b9 Avoid recruiting workers with different cluster ID 2022-10-18 21:37:16 -07:00
sfc-gh-tclinkenbeard 142d46400b Reduce severity of trace events for local_config_changed 2022-10-18 13:46:52 -07:00
Hui Liu 169c341f79
Merge pull request #8386 from sfc-gh-huliu/blobmigrator
Add blob migrator to assist data copy from blob to storage server
2022-10-13 14:46:04 -07:00
Hui Liu 049df622f1 add a blob migrator 2022-10-13 13:21:45 -07:00
Yi Wu ac6aaf3785
encryption: fix some data not being encrypted (#8403)
Changes:
1. Change `isEncryptionOpSupported` to check the ENABLE_ENCRYPTION server knob instead of `clientDBInfo.isEncryptionEnabled`. The problem with clientDBInfo is that its content is uninitialized until it has been broadcast to the workers, and during that window some data (e.g. item 2) is not encrypted when it should be.
2. Fix CommitProxy not encrypting metadata mutations which are recovered from txnStateStore
3. Fix KeyValueStoreMemory (and thus TxnStateStore) not encrypting partial transactions coming from recovery
4. new CODE_PROBE for the above fixes
5. Logging changes
2022-10-12 14:18:56 -07:00
Kevin Hoxha ff1b2df8f6 fdbcli: Add options for knob management
- setknob <knob_name> <knob_value> [config_class]
- getknob <knob_name> [config_class]
- Added a new option to `begin` to specify whether it is a configuration transaction. Syntax: begin [config-txn]
- Added utility function for converting tuples to string
- Added knobmanagment test in fdbcli_tests.py
2022-10-11 15:32:01 -07:00
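Note on commit ff1b2df8f6: the command shapes listed in that message can be illustrated by a small helper that formats them. This is a sketch only. The Python function names are hypothetical, the knob name, value and config class in the test data are placeholders, and nothing here shows what fdbcli prints or in which order the commands must be issued, since the message does not say.

```python
from typing import Optional


def setknob(name: str, value: str, config_class: Optional[str] = None) -> str:
    """Format: setknob <knob_name> <knob_value> [config_class]"""
    parts = ["setknob", name, value]
    if config_class is not None:
        parts.append(config_class)
    return " ".join(parts)


def getknob(name: str, config_class: Optional[str] = None) -> str:
    """Format: getknob <knob_name> [config_class]"""
    parts = ["getknob", name]
    if config_class is not None:
        parts.append(config_class)
    return " ".join(parts)


def begin(config_txn: bool = False) -> str:
    """Format: begin [config-txn]"""
    return "begin config-txn" if config_txn else "begin"
```

The optional `config_class` argument is appended only when given, mirroring the square brackets in the syntax lines.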
Jingyu Zhou 7c89cd705f
Merge pull request #8402 from halfprice:zhewu/add-satellite-info
Add info in HealthMonitorDetectDegradedPeer to indicate whether a peer is in satellite DC
2022-10-07 11:25:55 -07:00
Markus Pilman ea1325a552
Merge pull request #8319 from sfc-gh-tclinkenbeard/add-rare-code-probe-annotation
Add `rare` code probe decoration
2022-10-07 09:39:00 -06:00
Zhe Wu a09d6e8b08 Add info in HealthMonitorDetectDegradedPeer to indicate whether a peer is in satellite DC 2022-10-04 23:09:30 -07:00
Markus Pilman 550488b020 Merge remote-tracking branch 'origin/main' into bugfixes/open-for-ide
# Conflicts:
#	bindings/c/CMakeLists.txt
#	fdbclient/include/fdbclient/GetEncryptCipherKeys.actor.h
#	fdbserver/BackupWorker.actor.cpp
#	fdbserver/BlobWorker.actor.cpp
#	fdbserver/CommitProxyServer.actor.cpp
#	fdbserver/KeyValueStoreMemory.actor.cpp
#	fdbserver/StorageCache.actor.cpp
#	fdbserver/include/fdbserver/GetEncryptCipherKeys.actor.h
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/PhysicalShardMove.actor.cpp
#	flow/CMakeLists.txt
2022-10-04 18:27:48 -06:00
Markus Pilman 97dfc6823f fixed build with OPEN_FOR_IDE 2022-10-04 17:01:02 -06:00
Zhe Wu 219dde9f41 differentiate degraded peers and disconnected peers in gray failure 2022-09-29 13:55:16 -07:00
sfc-gh-tclinkenbeard 985958c260 Add rare code probe decoration 2022-09-25 15:28:32 -07:00
A.J. Beamon 4fd64630e8 Convert literal string ref instances to use _sr suffix 2022-09-19 11:35:58 -07:00
Yi Wu e66942ada4
Update Redwood encryption interface (#8172)
Update the Redwood encryption interface to make it better suited to per-tenant encryption, where we will need to do tenant page split.
2022-09-16 15:56:05 -07:00
sfc-gh-ngoyal 1bd97fe628
Recruit new singleton for consistency checker. (#5804)
* Recruit new singleton for consistency checker.

* Recruit the consistency checker only if enabled.

* Add a yield in monitorConsistencyChecker().

* Minor fixes.

* Consistency check workload enhancements.

* Minor fixes and clarifications.

* clang format

* Clang format.

* Minor fixes, cleanup, debug tracing.

* Misc.

* Move the consistency scan information from dbconfig to a key backed object.

* Move consistency scan config out of db config to a state object and feature rename.

* ConsistencyCheck workload refactor.

* devFormat

* Update fdbcli/ConsistencyScanCommand.actor.cpp

* Review Comments.

Co-authored-by: negoyal <neelam.goyal@gmail.com>
Co-authored-by: Ata E Husain Bohra <ata.husain@snowflake.com>
2022-09-16 09:03:06 -07:00
sfc-gh-tclinkenbeard 82adc1e856 Make g_simulator a pointer 2022-09-15 09:00:33 -07:00
Jingyu Zhou e70a18e638
Merge pull request #8122 from xumengpanda/mengxu/io-timeout-main
Add STORAGE_SERVER_REBOOT_ON_IO_TIMEOUT knob to reboot SS on io_timeout
2022-09-14 10:11:46 -07:00
Meng Xu 9e9efb69a0 Format code to repo style 2022-09-13 16:59:45 -07:00
Lukas Joswiak 5a1f9f3e9e Update fdbserver/worker.actor.cpp
Co-authored-by: Trevor Clinkenbeard <trevor.clinkenbeard@snowflake.com>
2022-09-13 16:53:54 -07:00
Lukas Joswiak 7ee6be9238 Simplify how ConfigBroadcastInterface is stored on worker 2022-09-13 16:53:54 -07:00
Lukas Joswiak 249ff2b2fd Fix configuration database unit tests 2022-09-13 16:53:54 -07:00
Lukas Joswiak cd2bbffa4c Add flag to disable the configuration database
The `--no-config-db` flag, passed to `fdbserver`, will disable the
configuration database. When this flag is specified, no `ConfigNode`s
will be started, the `ConfigBroadcaster` will not be started, and on a
coordinator change no attempt will be made to lock `ConfigNode`s.
2022-09-13 16:53:54 -07:00
Lukas Joswiak 74ac617a34 Add support for changing coordinators to the configuration database
Configuration database data lives on the coordinators. When a change
coordinators command is issued, the data must be sent to the new
coordinators to keep the database consistent.
2022-09-13 16:53:54 -07:00
Meng Xu 358a0dd88d Add STORAGE_SERVER_REBOOT_ON_IO_TIMEOUT knob to reboot SS on io_timeout 2022-09-07 15:30:04 -07:00
Dennis Zhou 80a0816157
flow: switch from hard coded to ApiVersion like ProtocolVersion (#8071)
* flow: add ApiVersion to replace hard coding api version

Instead of hard coding api value, let's rely on feature versions akin to
ProtocolVersion.

* ApiVersion: remove use of -1 for latest and use LATEST_VERSION
2022-09-02 09:28:13 +02:00
Yi Wu 49503987cc
Support Redwood encryption (#7376)
A new knob `ENABLE_STORAGE_SERVER_ENCRYPTION` is added; despite its name, currently only Redwood supports it. The knob is meant to be used only in tests, to test encryption in individual components; otherwise encryption should be enabled through the general `ENABLE_ENCRYPTION` knob.

Under the hood, a new `Encryption` encoding type is added to `IPager`, which uses AES-256 to encrypt a page. With this encoding, `BlobCipherEncryptHeader` is inserted into the page header for encryption metadata. Moreover, since we compute and store an SHA-256 auth token with the encryption header, we rely on it to checksum the data (and the encryption header), and skip the standard xxhash checksum.

`EncryptionKeyProvider` implements the `IEncryptionKeyProvider` interface to provide encryption keys, which utilizes the existing `getLatestEncryptCipherKey` and `getEncryptCipherKey` actors to fetch encryption keys from either the local cache or the EKP server. If multi-tenancy is used, when writing a new page `EncryptionKeyProvider` checks whether the page contains data for only a single tenant; if so, it fetches the tenant-specific encryption key; otherwise the system encryption key is used. The tenant check is done by extracting the tenant id from the page bound key prefixes. `EncryptionKeyProvider` also holds a reference to the `tenantPrefixIndex` map maintained by the storage server, which is used to check whether a tenant exists and to get the tenant name in order to get the encryption key.
2022-08-31 12:19:55 -07:00
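Note on commit 49503987cc: the key-selection rule in that message (tenant key when a page holds data for a single known tenant, system key otherwise) can be sketched as below. This is an illustration under stated assumptions, not the Redwood implementation. The 8-byte prefix length, the `SYSTEM_KEY_DOMAIN` sentinel and the function names are all hypothetical, and a plain dict stands in for the `tenantPrefixIndex` map.

```python
from typing import Dict, Optional

TENANT_PREFIX_LEN = 8          # assumption: fixed-width tenant id prefix
SYSTEM_KEY_DOMAIN = "system"   # hypothetical sentinel for the system encryption key


def tenant_prefix(key: bytes) -> Optional[bytes]:
    """Return the leading tenant-id bytes of a key, or None if the key is too short."""
    if len(key) < TENANT_PREFIX_LEN:
        return None
    return key[:TENANT_PREFIX_LEN]


def choose_key_domain(lower_bound: bytes, upper_bound: bytes,
                      tenant_prefix_index: Dict[bytes, str]) -> str:
    """Pick the tenant whose key encrypts a new page, or fall back to the system key."""
    lo = tenant_prefix(lower_bound)
    hi = tenant_prefix(upper_bound)
    # Treat the page as single-tenant only if both bounds carry the same prefix...
    if lo is None or lo != hi:
        return SYSTEM_KEY_DOMAIN
    # ...and that prefix belongs to a tenant present in the index.
    return tenant_prefix_index.get(lo, SYSTEM_KEY_DOMAIN)
```

The sketch ignores details such as how an exclusive upper bound is handled; it only shows the shape of the decision the message describes.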
Jingyu Zhou 05e463f79f
Merge pull request #7994 from jzhou77/main
Fix missing localities for fdbserver
2022-08-25 16:01:33 -07:00
Jingyu Zhou 808182e258 Add more traces for load balancing 2022-08-25 14:25:11 -07:00
Jingyu Zhou f7cb86701a Fix missing localities for fdbserver
The localities are stored in ServerDBInfo for calculating distances to other
processes. The localities are not set when creating ServerDBInfo, thus any
distances calculated before UpdateServerDBInfoRequest will be wrong.

This PR fixes this issue, thus preventing unnecessary cross DC calls,
especially for index prefetching on the storage servers.
2022-08-25 13:47:51 -07:00
Nim Wijetunga 827363fb06
Merge pull request #7980 from sfc-gh-nwijetunga/nim/move-ekp-client
Move EKP Interface and GetEncryptCipherKey to fdbclient
2022-08-24 08:04:09 -07:00
Chaoguang Lin 06aa6ee5ff
Add system monitor for flowprocess (#6925)
* Update network address in trace logs; Add system monitor for flowprocess

* Create a new trace file with the correct process address for flowprocess

* Remove unused debugging traces

* Add a new error lock_file_failure; Change please_reboot_remote_kv_store to please_reboot_kv_store; Add the code to only reboot the kv store but not the worker; Remove some unnecessary traces

* Add error handling for file_not_found in handleIOErrors

* Format worker.actor.cpp file
2022-08-24 00:40:38 -07:00
Nim Wijetunga a857609478 refactor ekp interface 2022-08-23 23:04:12 -07:00
Evan Tschannen 493771b6a8
Throttle the cluster if the blob manager cannot assign ranges (#7900)
* Throttle the cluster if the blob manager cannot assign ranges

* fixed a number of different bugs which caused ratekeeper to throttle to zero because of blob worker lag

* fix: do not mark an assignment as blocked if it is cancelled

* remove asserts to merge bug fixes

* fix formatting

* restored old control flow to storage updater

* storage updater did not throw errors

* disable buggify to see if it fixes CI
2022-08-23 13:33:46 -05:00
Josh Slocum a42e81a795
Bg bug fixes3 (#7904)
* Fixing merge boundary recovery

* fixing an edge case in blob manager repeat recruitment

* fixing a race between tenant loading and key alignment

* formatting
2022-08-18 12:42:53 -05:00
Chaoguang Lin a27d27c5ee
Add traces for snapshot related updates (#7862)
* Add logging; fix typos in comments;

* format files
2022-08-13 03:10:20 -04:00
Evan Tschannen a9d3c9f9b3
Added throttling when a blob worker falls behind (#7751)
* throttle the cluster when blob workers fall behind

* do not throttle on blob workers if they are not enabled

* remove an unnecessary actor

* fixed a compile error

* fetch blob worker metrics at the same interval as the rate is updated, avoid fetching the complete blob worker list too frequently

* fixed another compilation bug

* added a 5 second delay before bw throttling to prevent false positives caused by the 100e6 version jump during recovery. Lower the throttling thresholds to react much quicker to bw lag.

* fixed a number of problems

* changed the minBlobVersionRequest to look at storage server versions since this will be a lot more efficient

* fix: do not let desired go backwards

* fix: track the version of notAtLatest changefeeds for throttling

* ratekeeper now throttles blob workers by estimating the transaction per second throughput of the blob workers

* added metrics for blob worker change feeds

* added a knob to disable bw throttling

* fixed the transaction options in blob manager
2022-08-12 13:15:56 -07:00
Nim Wijetunga 6d0b20b07a Merge branch 'main' of github.com:sfc-gh-nwijetunga/foundationdb into nim/refactor-encryption-flag
* 'main' of github.com:sfc-gh-nwijetunga/foundationdb: (32 commits)
  Store rocksdb::DBOptions and rocksdb::ColumnFamilyOptions to (#7766)
  Update CONTRIBUTING.md
  Update tests/rare/SpecificUnitTests.toml
  fix ASAN OOM problem
  Update CONTRIBUTING.md
  Write tracing and ALP special key errors as JSON
  Fix: the static tenant map in the Java tester was being accessed concurrently from multiple threads. Make it a concurrent map. (#7805)
  Run clang-format
  Print SIGNAL output to stdout
  Print to stderr only upon errors
  Testing upgrades to a future version of FDB (#7780)
  Flush gcov coverage upon SIGTERM
  Report the unit tests being run in test harness
  Fix a bug in a storage wiggler unit test where some servers were added with too recent a timestamp
  Fix undefined behavior in versioned btree test due to integer overflow
  When a transaction operation gets an unknown tenant error, it needs to reset the tenant ID so it can be updated in the next tenant lookup request.
  Don't buggify max tenants per cluster globally; instead buggify it in specific tests
  Remove non-existing unittest
  Add unit tests to the correctness package
  Add comment to INetwork
  ...
2022-08-08 22:07:05 -07:00
Nim Wijetunga 30c985e6d2 format 2022-08-08 11:02:59 -07:00
Nim Wijetunga f112022e66 fix pr issues 2022-08-08 10:48:22 -07:00
Vaidas Gasiunas 79571dd2b4
Testing upgrades to a future version of FDB (#7780)
* Enable configuring the next future protocol version as the current protocol version in FDB client, fdbserver, and fdbcli

* Auto format python files used in upgrade tests

* Add a test for upgrading to a future FDB version

* Emphasize that the options for using future protocol version are intended for test purposes only

* Make the global variable for current protocol version visible only locally

* Refactoring to avoid using currentProtocolVersion() in static initialization

* Update go bindings
2022-08-08 17:29:49 +02:00
Nim Wijetunga f3e7fd142b fix formatting 2022-08-01 18:12:34 -07:00