45 Commits

Author SHA1 Message Date
A.J. Beamon 4fd64630e8 Convert literal string ref instances to use _sr suffix 2022-09-19 11:35:58 -07:00
sfc-gh-tclinkenbeard 82adc1e856 Make g_simulator a pointer 2022-09-15 09:00:33 -07:00
Chaoguang Lin 29f98f3654
Avoid duplicate snapshot on one process if it serves as multiple roles (#7294)
* Fix comments

* Add simulation value for SERVER_KNOBS->SNAP_CREATE_MAX_TIMEOUT

* A work version with correctness clean

* Remove unnecessary comments; debugging symbols

* Only check secondary address for coordinators, same as before

* Change the trace to SevError and remove the ASSERT(false)

* Remove TLogSnapRequest handling on TlogServer, which is changed to use WorkerSnapRequest

* Add retry for network failures

* Add retry limit for network failures; still allow duplicate snapshots on processes that are both tlog and storage to avoid race

* Add retry limit as a knob and make backoff exponential

* Add getDatabaseConfiguration(Transaction* tr)

* revert back to send request for each role once

* update some comments
2022-06-29 11:23:07 -07:00
Markus Pilman ffaf15c12a moved wellknownendpoints and fixed some includes 2022-06-23 17:03:53 -06:00
Josh Slocum a43c98519d
Switching char* to std::string for ProcessInfo to have it own memory (valgrind errors) (#7176) 2022-05-17 12:41:04 -07:00
Andrew Noyes 9f628df278 outputBuffer may not be null-terminated, so don't convert to std::string
This fixes a heap buffer overflow caught by ASAN, and importantly not
valgrind, since valgrind currently doesn't run on the first binary in a
restarting test. The next commit will change that so we always run
valgrind on the binary under test.
2022-05-10 14:45:00 -07:00
Binglin Chang 408c0cf1c9
Fix compile errors on ubuntu 20.04 (#4931) 2022-04-20 10:00:46 -07:00
Chaoguang Lin 7d365bd1bb
Remote ikvs debugging (#6465)
* initial structure for remote IKVS server

* moved struct to .h file, added new files to CMakeList

* happy path implementation, connection error when testing

* saved minor local change

* changed tracing to debug

* fixed onClosed and getError being called before init is finished

* fix spawn process bug, now use absolute path

* added server knob to set ikvs process port number

* added server knob for remote/local kv store

* implement simulator remote process spawning

* fixed bug for simulator timeout

* commit all changes

* removed print lines in trace

* added FlowProcess implementation by Markus

* initial debug of FlowProcess, stuck at parent sending OpenKVStoreRequest to child

* temporary fix for process factory throwing segfault on create

* specify public address in command

* change remote kv store knob to false for jenkins build

* made port 0 open random unused port

* change remote store knob to true for benchmark

* set listening port to randomly opened port

* added print lines for jenkins run open kv store timeout debug

* removed most tracing and print lines

* removed tutorial changes

* update handleIOErrors error handling to handle remote-ikvs cases

* Push all debugging changes

* A version where worker bug exists

* A version where restarting tests fail

* Use both the name and the port to determine the child process

* Remove unnecessary update on local address

* Disable remote-kvs for DiskFailureCycle test

* A version where restarting stuck

* A version where most restarting tests green

* Reset connection with child process explicitly

* Remove change on unnecessary files

* Unify flags from _ to -

* fix merging unexpected changes

* fix trac.error to .errorUnsuppressed

* Add license header

* Remove unnecessary header in FlowProcess.actor.cpp

* Fix Windows build

* Fix Windows build, add missing ;

* Fix a stupid bug caused by code dropped by code merging

* Disable remote kvs by default

* Pass the conn_file path to the flow process, though not needed, but the buildNetwork is difficult to tune

* serialization change on readrange

* Update traces

* Refactor the RemoteIKVS interface

* Format files

* Update sim2 interface to not clog connections between parent and child processes in simulation

* Update comments; remove debugging symbols; Add error handling for remote_kvs_cancelled

* Add comments, format files

* Change method name from isBuggifyDisabled to isStableConnection; Decrease(0.1x) latency for stable connections

* Commit the IConnection interface change, forgot in previous commit

* Fix the issue that onClosed request is cancelled by ActorCollection

* Enable the remote kv store knob

* Remove FlowProcess.actor.cpp and move functions to RemoteIKeyValueStore.actor.cpp; Add remote kv store delay to avoid race; Bind the child process to die with parent process

* Fix the bug where one process starts storage server more than once

* Add a please_reboot_remote_kv_store error to restart the storage server worker if remote kvs died abnormally

* Remove unreachable code path and add comments

* Clang format the code

* Fix a simple wait error

* Clang format after merging the main branch

* Testing mixed mode in simulation if remote_kvs knob is enabled, setting the default to false

* Disable remote kvs for PhysicalShardMove which is for RocksDB

* Cleanup #include orders, remove debugging traces

* Revert the reorder in fdbserver.actor.cpp, which fails the gcc build

Co-authored-by: “Lincoln <“lincoln.xiao@snowflake.com”>
2022-03-31 17:08:59 -07:00
sfc-gh-tclinkenbeard 66d71e107d Move actorcompiler.h include to the end of includes 2022-03-16 00:09:16 -07:00
“Lincoln cbcf0fa400 forgot to save before commit 2021-09-25 16:35:54 -06:00
“Lincoln 112d7fe632 make sure that bytes only get added to bytesRead when bytes > 0 2021-09-25 16:35:54 -06:00
“Lincoln cd25d42d66 minor fix 2021-09-25 16:35:54 -06:00
“Lincoln 8d30766129 modified file descriptor to non-blocking access before performing read, also throw any error that occurs while reading 2021-09-25 16:35:54 -06:00
Josh Slocum c31196ab01
Update fdbserver/FDBExecHelper.actor.cpp
Co-authored-by: A.J. Beamon <aj.beamon@snowflake.com>
2021-05-25 10:24:00 -07:00
Josh Slocum 07edc1db9a Removing spaces in SevWarn trace event names 2021-05-25 17:06:48 +00:00
FDB Formatster df90cc89de apply clang-format to *.c, *.cpp, *.h, *.hpp files 2021-03-10 10:18:07 -08:00
Andrew Noyes 79cec09255 Apply clang-tidy's performance-inefficient-vector-operation fix
I ran this command in my build directory after compiling with
OPEN_FOR_IDE. It took a few small tweaks to get it to compile, which is
outside the scope of this commit.

    $ python run-clang-tidy.py -j $(nproc) -checks='-*,performance-inefficient-vector-operation' -fix
2021-03-04 03:58:25 +00:00
sfc-gh-tclinkenbeard f9aba7064d Use consistent return method for fork_child 2021-01-29 16:27:48 -08:00
sfc-gh-tclinkenbeard acac02587d Trace output from forked processes 2021-01-29 01:31:26 -08:00
sfc-gh-tclinkenbeard 8dc39f4d8f Make ExecCmdValueString const-correct 2020-12-27 14:15:22 -04:00
A.J. Beamon d128252e90 Merge release-6.3 into master 2020-05-22 09:25:32 -07:00
sramamoorthy 096afe40be enhance spawnProcess 2020-05-07 14:24:35 -07:00
sramamoorthy 789975e191 fixes in spawnProcess 2020-05-07 14:24:34 -07:00
sramamoorthy 697a9422f5 replace boost::process with execv to spawn snapCreate process 2020-05-07 14:24:34 -07:00
Markus Pilman e4611e8ae4 fix versions.h stupidity 2020-04-06 10:28:55 -07:00
Markus Pilman 8b5780c36c don't include source and binary dir
This forces users to use include paths from the sources root.

So `#include "Arena.h"` won't work anymore, only
`#include "flow/Arena.h"` will.
2020-04-06 10:13:49 -07:00
mpilman d09e07f1f5 Merge remote-tracking branch 'upstream/master' into features/icc 2020-02-04 10:26:18 -08:00
Andrew Noyes 1827e77f2e Update fdbserver/FDBExecHelper.actor.cpp
Co-Authored-By: Jingyu Zhou <jingyuzhou@gmail.com>
2019-10-25 10:42:22 -07:00
Andrew Noyes d4de608bb6 Fix OPEN_FOR_IDE build 2019-10-25 10:42:22 -07:00
sramamoorthy c9097cca18 deprecate isTLogInSameNode used by snapshot V1 2019-10-09 15:33:11 -07:00
Evan Tschannen dc1d055b27
Merge pull request #2042 from senthil-ram/snap_cli_fix
fix fdbcli --exec 'snapshot create.sh' failure
2019-08-30 13:40:38 -07:00
sramamoorthy b3277f2982 Fix #2009 posix compliant args for snapshot binary 2019-08-30 12:54:09 -07:00
sramamoorthy 64000eafb2 Fixes #2020 - snap binpath not to be passed as arg 2019-08-27 11:49:12 -07:00
sramamoorthy 9afd162e2f remove snap v1 related code 2019-07-25 17:29:31 -07:00
sramamoorthy 869f77aef1 Few cosmetic edits and fixes 2019-07-24 15:36:28 -07:00
sramamoorthy 8f1f0c0435 snap v2: worker and other helper related changes 2019-07-24 15:36:28 -07:00
mpilman ab019fbe41 More minor fixes, removed snapshots 2019-06-20 14:28:31 -07:00
sramamoorthy 1d1d42c8af disable boost::process code for windows and mac 2019-06-13 15:43:03 -07:00
A.J. Beamon 3dd2479193 Try avoiding use of boost in FDBExecHelper 2019-06-13 13:09:29 -07:00
sramamoorthy 4bcb590f12 g_random -> deterministicRandom() 2019-05-28 22:07:46 -07:00
sramamoorthy c906da1f62 simulator: spawnProcess to wait for long duration
spawnProcess was waiting for 3 seconds and terminating
the child process for synchronous calls, but in the
simulator, this can lead to non-determinism, because
in some cases the command can run in <3 or >3 seconds.
The fix is to make the wait duration very long, so that
the call either waits synchronously and gets the results
or the test times out.
2019-05-28 22:07:46 -07:00
sramamoorthy b56d8e648f bp::child->wait_for does not give correct err code
boost::process::child->wait_for does not give the error code
from the process being run. Re-arrange the code to work-around
it.
2019-05-28 22:07:46 -07:00
sramamoorthy dcd2d96751 make spawnProcess predictable in the simulator 2019-05-28 22:07:46 -07:00
sramamoorthy 936ffc2dde rebase related changes 2019-05-28 22:07:46 -07:00
sramamoorthy ec7834e2f7 code reorganization and address comments 2019-05-28 22:07:46 -07:00