Commit Graph

8845 Commits

Author SHA1 Message Date
Zhe Wu 4ff4e3b826 address comments 2022-04-07 17:34:13 -07:00
Zhe Wu e017faa6c4 grey failure detection accounts for the case where the connection between the primary and satellite DC becomes bad. 2022-04-07 17:34:13 -07:00
Xiaoxi Wang 4eb0adc51c
Merge pull request #6776 from sfc-gh-xwang/fix-snap-test-assertion
Check pseudo locality before pop
2022-04-07 10:45:22 -07:00
Xiaoxi Wang d25fc4db34 add ASSERT_WE_THINK 2022-04-07 09:21:50 -07:00
Zhe Wang 3d325940ad
Fix data race issue when multithreaded RocksDB store using histogram (#6766)
* using-ThreadReturnPromiseStream-to-make-histogram-thread-safe

* address-comment-and-verify-functionality-in-cluster

* remove-old-histogram-metrics

* fix-comment

Co-authored-by: Zhe Wang <zhewang@Zhes-MacBook-Pro.local>
2022-04-07 00:31:36 -04:00
Yi Wu 994b8c92f8
Add option to limit resident memory and remove default memory limit (#6719)
Changing `memory` option to limit resident memory instead of virtual memory, in the config file and the fdbserver/fdbbackup/fdbcli command-line arguments. Since `rlimit` doesn't support limiting resident memory, the current implementation has both fdbmonitor and the fdbserver/fdbbackup process check the process RSS periodically, and kill and restart the process if the limit is exceeded.

Adding a new `memory_vsize` option to limit virtual memory, if backward-compatible behavior is desired.

closes #6671, closes #6672
2022-04-06 20:06:24 -07:00
Chaoguang Lin f62904187e Disable remote kvs if RocksDB is used 2022-04-06 17:44:20 -07:00
Zhe Wu 5fd494a57b Allow worker health monitor to report recently destroyed peers that currently have roles in the transaction system 2022-04-06 13:33:50 -07:00
Renxuan Wang 267c4deaee
Add tryGetReplyFromHostname() and retryGetReplyFromHostname(). (#6761)
* Add hostname to coordination interfaces.

* Add tryGetReplyFromHostname() and retryGetReplyFromHostname().

* Change tryGetReplyFromHostname() to call hostname.resolve().

* Add throw for actor_cancelled.
2022-04-06 10:47:00 -07:00
Xiaoxi Wang 20fee3dd06 check pseudo locality before pop 2022-04-05 23:48:18 -07:00
Xiaoxi Wang ce33366396
only add mutations that can change configuration (#6760) 2022-04-05 17:05:51 -07:00
Josh Slocum aaaf42525a misc bg operational fixes and improvements 2022-04-05 12:26:00 -05:00
Zhe Wu 1c6dfae48e Making gray failure detection also monitor connection failures 2022-04-05 09:59:05 -07:00
Evan Tschannen c168840b54 blob workers properly destroy change feeds when they are no longer needed 2022-04-05 11:02:32 -05:00
Jingyu Zhou f68fd28d73 Refactor duplicated code into IKnobCollection::setupKnobs() 2022-04-05 02:06:38 -07:00
Renxuan Wang 465ff712b6
Move Hostname to its own files. (#6759)
* Change DNS cache to use std::map.

Revert commit 90c259d84e, because if we use unordered_map, toString() can be inconsistent.

* Move ClientKnob::COORDINATOR_HOSTNAME_RESOLVE_DELAY to FlowKnob::HOSTNAME_RESOLVE_DELAY.

* Move Hostname to its own files.

Also, add resolve-related variables and functions in Hostname.
2022-04-04 19:04:51 -07:00
Chaoguang Lin c8455237ea Fix the bug where a pointer is used after it's cleaned up 2022-04-04 11:49:41 -07:00
Xiaoge Su 6b69c439f0 Allowing global knob changes in TOML-file-based tests
Commit 99b030c2f6 made it possible to set
knob values in the TOML file for a single test, using the syntax

[[test]]
    [[test.knobs]]
    knob_key = knob_value

The knob key/value pairs are changed before the TEST_CASE starts, then
reverted after the TEST_CASE completes.

With this patch, it is possible to *globally* update the knob value,
i.e.

[[knobs]]
enable_encryption = true

[[test]]
testTitle = 'EncryptKeyProxy'

    [[test.workload]]
    testName = 'EncryptKeyProxyTest'

This was manually tested by printing out the knob key/value pairs. It was
also tested using Ata's EncryptKeyProxy test code with the
enable_encryption knob enabled.
2022-04-04 11:17:32 -07:00
Josh Slocum cb918b9cef Added basic blob granule consistency check 2022-04-04 11:38:42 -05:00
Steve Atherton 38190ad7e7
Merge pull request #6737 from sfc-gh-satherton/fix-storage-timestamps
Change storage metadata and perpetual wiggle timestamps to double epoch seconds
2022-04-02 09:47:23 -07:00
Steve Atherton 6eb1c2ae48
Merge pull request #6574 from sfc-gh-satherton/redwood-rare-bugs
Rare correctness bug fixes in Redwood
2022-04-01 16:40:22 -07:00
Josh Slocum 377e252fcf
Better split sizing in blob manager (#6725) 2022-04-01 16:09:46 -07:00
Bharadwaj V.R f749aac223
Merge branch 'apple:main' into ssupdateb4registration 2022-03-31 18:59:44 -07:00
Chaoguang Lin 7d365bd1bb
Remote ikvs debugging (#6465)
* initial structure for remote IKVS server

* moved struct to .h file, added new files to CMakeList

* happy path implementation, connection error when testing

* saved minor local change

* changed tracing to debug

* fixed onClosed and getError being called before init is finished

* fix spawn process bug, now use absolute path

* added server knob to set ikvs process port number

* added server knob for remote/local kv store

* implement simulator remote process spawning

* fixed bug for simulator timeout

* commit all changes

* removed print lines in trace

* added FlowProcess implementation by Markus

* initial debug of FlowProcess, stuck at parent sending OpenKVStoreRequest to child

* temporary fix for process factory throwing segfault on create

* specify public address in command

* change remote kv store knob to false for jenkins build

* made port 0 open random unused port

* change remote store knob to true for benchmark

* set listening port to randomly opened port

* added print lines for jenkins run open kv store timeout debug

* removed most tracing and print lines

* removed tutorial changes

* update handleIOErrors error handling to handle remote-ikvs cases

* Push all debugging changes

* A version where worker bug exists

* A version where restarting tests fail

* Use both the name and the port to determine the child process

* Remove unnecessary update on local address

* Disable remote-kvs for DiskFailureCycle test

* A version where restarting is stuck

* A version where most restarting tests green

* Reset connection with child process explicitly

* Remove change on unnecessary files

* Unify flags from _ to -

* fix merging unexpected changes

* fix trac.error to .errorUnsuppressed

* Add license header

* Remove unnecessary header in FlowProcess.actor.cpp

* Fix Windows build

* Fix Windows build, add missing ;

* Fix a stupid bug caused by code dropped by code merging

* Disable remote kvs by default

* Pass the conn_file path to the flow process; it is not needed, but buildNetwork is difficult to tune

* serialization change on readrange

* Update traces

* Refactor the RemoteIKVS interface

* Format files

* Update sim2 interface to not clog connections between parent and child processes in simulation

* Update comments; remove debugging symbols; Add error handling for remote_kvs_cancelled

* Add comments, format files

* Change method name from isBuggifyDisabled to isStableConnection; Decrease (0.1x) latency for stable connections

* Commit the IConnection interface change, forgot in previous commit

* Fix the issue that onClosed request is cancelled by ActorCollection

* Enable the remote kv store knob

* Remove FlowProcess.actor.cpp and move functions to RemoteIKeyValueStore.actor.cpp; Add remote kv store delay to avoid race; Bind the child process to die with parent process

* Fix the bug where one process starts storage server more than once

* Add a please_reboot_remote_kv_store error to restart the storage server worker if remote kvs died abnormally

* Remove unreachable code path and add comments

* Clang format the code

* Fix a simple wait error

* Clang format after merging the main branch

* Testing mixed mode in simulation if remote_kvs knob is enabled, setting the default to false

* Disable remote kvs for PhysicalShardMove which is for RocksDB

* Cleanup #include orders, remove debugging traces

* Revert the reorder in fdbserver.actor.cpp, which fails the gcc build

Co-authored-by: Lincoln <lincoln.xiao@snowflake.com>
2022-03-31 17:08:59 -07:00
Bharadwaj V.R 8ff3b7d8a2
Merge branch 'apple:main' into ssupdateb4registration 2022-03-31 16:12:06 -07:00
Xiaoxi Wang c7d2f5fee2
Merge pull request #6739 from sfc-gh-jslocum/ddq_assert
fix destination limiting and cancelling logic in move_to_removed_serv…
2022-03-31 14:22:09 -07:00
Tao Lin 001909be08
Fixes for when getMappedRange cannot parse as tuple (#6665) 2022-03-31 14:06:45 -07:00
He Liu 966caadb3e
Merge pull request #6706 from kakaiu/Fix-block-cache-recreation-issue
Fix RocksDB Block Cache Recreation Problem
2022-03-31 13:50:15 -07:00
Josh Slocum 9e06881673 fix destination limiting and cancelling logic in move_to_removed_server case 2022-03-31 14:05:15 -05:00
Steve Atherton 6744e9e4f9 Change timestamps used in storage server metadata and perpetual wiggle metrics to epoch seconds, stored as doubles, and stringified as either floating point epoch seconds or timestamp strings of the form "2013-04-28 20:57:01.000 +0000". 2022-03-30 18:57:06 -07:00
Steve Atherton 2c9b2dd005 Merge commit '1b919f52e928e8a72d5acba9175eae32ed4b0c90' into redwood-rare-bugs
# Conflicts:
#	flow/ThreadHelper.actor.h
2022-03-30 18:21:03 -07:00
Steve Atherton 84f9e00258 Remove duplicative generic actor repeatEvery() since recurring() exists. 2022-03-30 18:10:29 -07:00
Steve Atherton d6e2d2a1fe Fix nondeterminism in StorageWiggleMetrics caused by use of timer_int(). 2022-03-30 14:48:01 -07:00
Bharadwaj V.R d56066ee91
Merge branch 'apple:main' into ssupdateb4registration 2022-03-30 09:47:20 -07:00
He Liu ca4bfb55d6 Merge branch 'main' of https://github.com/apple/foundationdb into rename-rocks-engine 2022-03-29 16:24:50 -07:00
Bharadwaj V.R 7926917d5f
Merge branch 'apple:main' into ssupdateb4registration 2022-03-29 13:04:19 -07:00
Josh Slocum 7fc6dfa6c5 Adding useful debugging trace events 2022-03-29 14:48:28 -05:00
Bharadwaj V.R 2f7b68d06f Switch to signalling storageIntefaceReg actor with an Optional<Future<Void>> 2022-03-29 11:50:46 -07:00
Jingyu Zhou da0673ccce
Merge pull request #6705 from RenxuanW/another
Add proxy option to backup and restore params.
2022-03-29 11:36:13 -07:00
He Liu dd15489605 rename ssd-rocksdb-experimental as ssd-rocksdb-v1. 2022-03-29 10:53:38 -07:00
Zhe Wang 37c7b3ff18 fix-rocksdb-blockcache-recreation-problem 2022-03-29 12:07:41 -04:00
Josh Slocum 61474d5d54 Future-proof blob granules with full file size 2022-03-29 08:06:07 -05:00
Josh Slocum 2f8e9d9de0 misc bg fixes 2022-03-29 08:05:52 -05:00
Bharadwaj V.R 2348c46dac Resolve merge conflict 2022-03-28 22:54:00 -07:00
Bharadwaj V.R 726cb3a18f merge commits from main 2022-03-28 22:49:03 -07:00
Steve Atherton 16afeb43fa Avoid false positive for determinism check in DEBUG_DETERMINISM by avoiding use of shared memory. 2022-03-28 20:02:55 -07:00
Steve Atherton 1e9c8b3684 Shutdown bug fix: the extent cache should be cleared on shutdown, because if recovery never completed, it would not have been cleared yet. 2022-03-28 18:14:05 -07:00
Renxuan Wang 0a332ee1c1 Add proxy option to backup and restore params. 2022-03-28 17:10:49 -07:00
Trevor Clinkenbeard ad98d64799
Merge pull request #6473 from sfc-gh-tclinkenbeard/change-data-hall
Test renaming data hall on restart
2022-03-28 16:39:43 -07:00
Steve Atherton 8cf40f86e6 Merge commit '478ff1eb76bc88201b6803b8b8fb5ad9d0bcc040' into aggressive-storage-migration 2022-03-28 10:10:32 -07:00