Zhe Wu
4e3e2b0392
Create health monitor in FDB workers to monitor network condition. This change is only inside the worker.
2021-06-16 14:50:44 -07:00
Lukas Joswiak
121ec1022c
Fix simulation bug
2021-06-13 22:31:04 -07:00
Josh Slocum
56dadaa428
TSS Mismatch Changes
2021-06-11 23:13:16 +00:00
sfc-gh-tclinkenbeard
41c790b299
Merge remote-tracking branch 'origin/master' into config-db
2021-06-10 22:31:23 -07:00
Steve Atherton
f7554b8fcb
Move FlowMutex unit test to FlowTests.
2021-06-08 16:58:35 -07:00
Evan Tschannen
08a5f17660
Merge branch 'master' of https://github.com/apple/foundationdb into feature-sim-time-batching
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
2021-06-08 10:04:06 -07:00
Evan Tschannen
52ef8b94fb
added comments
2021-06-08 09:57:37 -07:00
sfc-gh-tclinkenbeard
371a38e6e5
Merge remote-tracking branch 'origin/master' into remove-extra-copies
2021-06-07 10:26:06 -07:00
Andrew Noyes
402622ace9
Merge pull request #4909 from apple/anoyes/fix-ub
...
Fix several instances of undefined behavior
2021-06-07 08:58:45 -07:00
sfc-gh-tclinkenbeard
f10dd70c37
Remove configuration_database from status when disabled
2021-06-06 08:51:18 -07:00
Andrew Noyes
d6a6a8b3dd
Remove header that's no longer needed
2021-06-06 08:36:48 -07:00
Lukas Joswiak
486a04659f
Lazy initialization
2021-06-04 15:01:18 -07:00
Josh Slocum
9b36f69b8d
Merge pull request #4892 from sfc-gh-jslocum/tss_mappingv2
...
TSS Mapping Change
2021-06-04 14:57:51 -07:00
Lukas Joswiak
84b06c68bc
Bump well known endpoint index
2021-06-04 13:31:57 -07:00
Lukas Joswiak
153de33f57
Revert "Merge pull request #4802 from sfc-gh-ljoswiak/revert/actor-lineage"
...
This reverts commit 6499fa178e, reversing changes made to 1512631957.
2021-06-04 13:31:55 -07:00
Andrew Noyes
ce25a99000
Disallow conversion from float in specialCounter
2021-06-04 12:09:13 -07:00
Andrew Noyes
5fbadb66c2
Clamp to max int if large float is not representable as int
2021-06-04 09:42:39 -07:00
A.J. Beamon
24d17c013b
Add an assert to confirm that try_emplace is inserting a new entry
2021-06-03 13:51:47 -07:00
A.J. Beamon
7d83340993
Fix: when a file open completes synchronously, it wasn't being stored in the openFiles map.
2021-06-03 13:30:28 -07:00
Josh Slocum
ac209b32fd
Addressing review comments
2021-06-03 15:31:16 +00:00
Josh Slocum
b3e4f182ef
TSS Mapping Change
2021-06-02 17:30:09 +00:00
sfc-gh-tclinkenbeard
a775f92fca
Merge remote-tracking branch 'origin/master' into config-db
2021-06-01 15:39:34 -07:00
Josh Slocum
d67184163b
Merge pull request #4556 from sfc-gh-jslocum/tss
...
Testing Storage Server
2021-06-01 09:11:10 -07:00
sfc-gh-tclinkenbeard
ca0893571c
Move server knobs into fdbclient
2021-06-01 03:12:47 -07:00
sfc-gh-tclinkenbeard
6665f5cc4d
Support and test restricted range reads
2021-05-29 03:58:18 -07:00
A.J. Beamon
d35da1aeae
Merge pull request #4873 from sfc-gh-ajbeamon/close-files-in-simulation
...
Actually close files in simulation
2021-05-28 15:32:10 -07:00
A.J. Beamon
69dbe04d42
Rename WeakFutureReference to UnsafeWeakFutureReference and add warning comment
2021-05-28 14:34:20 -07:00
Josh Slocum
f6253db7dc
Addressing final PR comments
2021-05-28 18:19:42 +00:00
A.J. Beamon
d82eac4062
Fix a test issue where closing an AsyncFileNonDurable could permanently prevent you from reopening the file if the machine was in a failed state during cleanup
2021-05-27 20:41:49 -07:00
Dan Lambright
10289ef8f1
Respond to AJ's comments
2021-05-27 09:14:32 -04:00
Dan Lambright
64c10d3625
fix joshua failures, formatting
2021-05-27 08:08:07 -04:00
Dan Lambright
fcfb78162c
misc cleanup for publishing
2021-05-27 08:08:07 -04:00
Dan Lambright
742c22cef2
Don't allow changing descriptor if knob is set
2021-05-27 08:08:07 -04:00
A.J. Beamon
944a03d575
For files that use the atomic write and create mechanism, attempt to remove the file from the openFiles map at both its old and new name
2021-05-26 16:26:45 -07:00
A.J. Beamon
a756469670
Use a weak reference in the open files cache (abstracted from a similar cache in AsyncFileCached) to avoid a problem where removing an item from the cache could cause us to reentrantly remove it again.
2021-05-26 13:38:24 -07:00
Markus Pilman
cbce2f6f11
delete dead code
2021-05-26 11:12:07 -07:00
Markus Pilman
7b4de4e037
Revert change
2021-05-26 11:11:51 -07:00
Markus Pilman
7cb767fd3c
only remove files from the open map if they have no modifications in flight
2021-05-26 11:11:44 -07:00
Markus Pilman
04613c3b13
handle file renames properly
2021-05-26 11:11:37 -07:00
Markus Pilman
f32ce0c4b5
fix typo
2021-05-26 11:11:24 -07:00
Markus Pilman
6bd7fa4036
Actually close files in simulation
2021-05-26 11:11:12 -07:00
Josh Slocum
4257ac2b4d
More TSS Changes/Fixes
2021-05-25 20:37:48 +00:00
Josh Slocum
ce82c9653e
Testing Storage Server implementation
2021-05-25 20:28:50 +00:00
Evan Tschannen
f57f0d64f4
Merge branch 'master' into feature-sim-time-batching
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
2021-05-20 09:09:35 -07:00
sfc-gh-tclinkenbeard
748a3ebfbe
Add GetSnapshotAndChangesRequest type
2021-05-18 15:28:44 -07:00
Steve Atherton
390b026f08
Merge branch 'master' into arena-page
2021-05-16 05:06:50 -07:00
Steve Atherton
2298567c2b
Use of aligned_alloc() for 4k pages causes too much wasted virtual memory. Added new 4k-aligned fast allocator, and changed Arena::allocateAlignedBuffer() to be 4k-specific, now called Arena::allocate4kAlignedBuffer().
2021-05-14 23:12:00 -07:00
Lukas Joswiak
4ea760b2a9
Revert "Merge pull request #4136 from sfc-gh-mpilman/features/actor-lineage"
...
This reverts commit da41534618, reversing changes made to e6300905d6.
2021-05-10 20:26:12 -07:00
sfc-gh-tclinkenbeard
f28ac955c3
Remove unnecessary temporary objects while growing objects of type std::vector<std::pair<A, B>>
2021-05-10 16:32:50 -07:00
Steve Atherton
f8a8bf315b
Added Arena::allocateAlignedBuffer() to get an aligned memory block owned by an Arena, and ArenaPage uses this.
2021-05-05 15:00:12 -07:00
Andrew Noyes
6bffbdf7e3
Revert "Actually close files in simulation"
2021-05-04 15:38:24 -07:00
sfc-gh-tclinkenbeard
d56906cd54
Addressed review comments
2021-05-03 15:26:27 -07:00
sfc-gh-tclinkenbeard
e5d6c5ed17
Merge remote-tracking branch 'origin/master' into encrypt-backup-files
2021-05-03 14:46:19 -07:00
Jingyu Zhou
1c92588cca
Merge pull request #4562 from sfc-gh-mpilman/bugfixes/simulator-close-files
...
Actually close files in simulation
2021-05-03 13:47:44 -07:00
Jingyu Zhou
d49e0091ce
Merge pull request #4727 from sfc-gh-etschannen/fix-rewrite-bme
...
Simulation could still stall writes for 10 seconds even when speedUpSimulation was on
2021-05-03 13:37:04 -07:00
Lukas Joswiak
8dcd779fc4
Merge branch 'master' into features/actor-lineage
2021-05-02 14:11:42 -07:00
Lukas Joswiak
cf4218dfd1
Fixes simulation failures
...
Fixes the following issues:
1. Use the right index when initializing the WriteOnlySet's vector of
atomics. Also switch to std::atomic_init to initialize each atomic in
the vector (cannot default construct the atomics in the vector
because std::atomic does not have a copy constructor).
2. Add failure check for when items cannot be inserted into the
WriteOnlySet due to capacity constraints. This situation occurs when
`copy` is not called on the WriteOnlySet, such as when sampling is
disabled. The `copy` function is what clears the WriteOnlySet.
3. Remove a global config feature I added to update the ClientDBInfo
object used by the global config listener function. This needs more
investigation, but the effect of this change could be that global
config changes are not correctly recognized on fdbserver processes.
4. Add various ASSERTs to verify data in WriteOnlySet.
2021-05-01 15:26:28 -07:00
Andrew Noyes
904a39e473
Merge pull request #4667 from sfc-gh-ajbeamon/feature-mvc-monitor-protocol-version
...
Use fewer connections in the multi-version client
2021-04-28 14:13:17 -07:00
Evan Tschannen
65fcf4014e
Fix: simulation could still stall writes for 10 seconds even when speedUpSimulation was on
...
Fix: disable connection failures in simulation when there are too many generations outstanding
2021-04-28 12:41:48 -07:00
A.J. Beamon
7158bfa82e
Merge branch 'master' into load-balance-remove-make-request-actor
2021-04-28 10:31:41 -07:00
A.J. Beamon
135cc9c69a
Make parameter const&
2021-04-28 10:30:30 -07:00
Markus Pilman
4fab2ecd30
Merge remote-tracking branch 'origin/master' into features/actor-lineage
2021-04-28 09:20:54 -06:00
A.J. Beamon
9009780aa8
Fix bug that could cause the server to crash when an old client connected
2021-04-27 11:15:16 -07:00
Evan Tschannen
a02da36e85
fixed the problem with the GrvProxyClass the proper way by keeping the enum the same between versions
2021-04-26 18:45:44 -07:00
Evan Tschannen
451609e6be
code cleanup
2021-04-26 10:16:18 -07:00
A.J. Beamon
a794fca932
Support 5.0 (and earlier) client versions by adding GRV probing for old versions. Update the C bindings implementation of get_server_protocol to convert the ProtocolVersion object into a uint64_t. Rename a misleading protocol version alias.
2021-04-23 15:00:21 -07:00
sfc-gh-tclinkenbeard
d6fa06afdd
Add IConfigTransaction::getRange (not yet tested)
2021-04-23 11:39:26 -07:00
sfc-gh-tclinkenbeard
050eb079bd
Add ConfigFollowerCompactRequest
2021-04-21 12:18:52 -07:00
A.J. Beamon
28f8a2716e
For old incompatible connections, set the correct protocol version on the version async var
2021-04-21 11:54:05 -07:00
Evan Tschannen
e18c9961b4
rewrote tlog recruitment logic so that it is deterministic, to prevent better master exists from triggering spuriously
2021-04-21 00:22:33 -07:00
A.J. Beamon
eaaae2e16d
Merge branch master into 'feature-mvc-monitor-protocol-version'
2021-04-20 15:07:02 -07:00
Markus Pilman
7307750e5e
Merge remote-tracking branch 'origin/master' into features/actor-lineage
2021-04-19 11:29:52 -06:00
sfc-gh-tclinkenbeard
f54f082159
Build interfaces for full config update pipeline
2021-04-16 17:58:00 -07:00
Steve Atherton
db610355cf
Keep simulated disk write delay high until speedUp is set.
2021-04-16 14:19:37 -07:00
Lukas Joswiak
551268b0f2
Add well known endpoint for worker communication
2021-04-15 13:50:50 -07:00
A.J. Beamon
b2d6930103
The multi-version client monitors the cluster's protocol version and only activates the client library that can connect.
2021-04-15 11:45:14 -07:00
sfc-gh-tclinkenbeard
18f17a4ea2
First draft of config-db
2021-04-14 22:06:37 -07:00
Steve Atherton
de236894cb
Merge commit 'eeee15f524ff769248495d70efa0501170fb5ea2' into correctness-fix
2021-04-14 11:50:49 -07:00
Steve Atherton
1958fde5c6
Added parentheses for clarity.
2021-04-13 20:49:04 -07:00
Steve Atherton
f74748ebac
Applied clang-format.
2021-04-13 20:43:12 -07:00
Steve Atherton
9475b6a5dd
Correctness fix, prevent AsyncFileNonDurable from always making file writes take up to 5 seconds.
2021-04-13 20:15:19 -07:00
Markus Pilman
ed8d43cb87
Merge pull request #5 from sfc-gh-ljoswiak/features/network-actors
...
Sample actors waiting on network
2021-04-09 15:07:53 -06:00
Evan Tschannen
a90c26f1d0
The master, proxies, and resolver all need to have the same machine class fitness function besides best fit to ensure recruitment is deterministic
...
if the first GRV proxy or resolver is forced to share a process, it should prefer to share with the commit proxy so that the commit proxy has more potential options it can share with
2021-04-08 14:29:12 -07:00
A.J. Beamon
931499fb3f
Merge branch 'master' into load-balance-remove-make-request-actor
2021-04-08 09:11:35 -07:00
A.J. Beamon
040ba0c587
Rearrange things so that the backoff delay has no impact unless it's needed.
2021-04-07 15:23:50 -07:00
Lukas Joswiak
433872e17d
Sample actors waiting on network
2021-04-06 17:28:28 -07:00
Markus Pilman
9bcde529f8
Merge pull request #4 from sfc-gh-ljoswiak/features/current-actor
...
Sample running actor
2021-04-05 11:36:48 -06:00
Markus Pilman
41d1aee609
delete dead code
2021-04-01 14:06:13 -06:00
Markus Pilman
1987682e1e
Merge remote-tracking branch 'origin/master' into bugfixes/simulator-close-files
2021-04-01 11:14:28 -06:00
Markus Pilman
dc35af3760
Merge remote-tracking branch 'origin/master' into features/actor-lineage
2021-04-01 11:01:31 -06:00
Markus Pilman
ce8fce94c8
Merge pull request #4596 from sfc-gh-etschannen/fix-starting-config
...
Fixed simulations which timeout setting starting configuration
2021-03-31 10:31:28 -06:00
Evan Tschannen
e774262046
fix: g_simulator.disableRemote did not contain the rest of the configuration
2021-03-30 21:11:26 -07:00
sfc-gh-tclinkenbeard
d4191899d9
Add comments for AsyncFileEncrypted changes
2021-03-28 22:14:37 -07:00
sfc-gh-tclinkenbeard
82420e5572
Merge remote-tracking branch 'origin/master' into encrypt-backup-files
2021-03-27 21:02:19 -07:00
Markus Pilman
1033db9fba
Revert change
2021-03-25 14:00:07 -06:00
Markus Pilman
1385a776da
only remove files from the open map if they have no modifications in flight
2021-03-25 13:22:29 -06:00
Markus Pilman
b51e4aa590
handle file renames properly
2021-03-24 19:57:24 -06:00
Markus Pilman
6a344ddeab
fix typo
2021-03-24 16:56:11 -06:00
Markus Pilman
f7d3b31ef8
Actually close files in simulation
2021-03-24 16:27:35 -06:00
A.J. Beamon
36f4c17ef1
Reduce the number of actor calls in load balancing to improve performance.
2021-03-24 15:04:45 -07:00
Lukas Joswiak
2dfd420882
Add sampling profiler thread
2021-03-24 14:52:42 -07:00
A.J. Beamon
f1166f2bf6
Merge pull request #4545 from sfc-gh-anoyes/anoyes/fix-truncate-simulation
...
In simulation, fix treatment of extending a file with truncate as a "pending modification"
2021-03-24 12:35:32 -07:00
Andrew Noyes
eb80321ea3
Attempt to fix windows build
2021-03-24 18:48:10 +00:00
Andrew Noyes
c186d363c6
Add unit test
2021-03-24 17:32:07 +00:00
Andrew Noyes
170c197c4c
Truncate marks everything after size modified
2021-03-23 21:07:12 +00:00
Andrew Noyes
e83de2b799
Fix bug: minSizeAfterPendingModifications needs to be maxed
2021-03-23 21:00:21 +00:00
Evan Tschannen
a893309112
Opening a file with OPEN_ATOMIC_WRITE_AND_CREATE should create a new file handle, so that if a file with the same name is still in use, operations against it will not happen to the new file. This can happen when the disk queue replaces a file.
2021-03-23 13:47:46 -07:00
Andrew Noyes
0daf6cf632
Consider extending a file with truncate as a "pending modification"
...
Before this, truncating and reading concurrently could cause reads of
uninitialized memory. So could truncating then reading, since the effect
of the truncate in the actual file was allowed to be delayed. Now reads
will wait for a truncate that extends the file to complete if they
intersect the newly-zeroed region.
2021-03-23 19:44:36 +00:00
sfc-gh-tclinkenbeard
a0c49234b2
Merge remote-tracking branch 'origin/master' into encrypt-backup-files
2021-03-19 20:47:53 -07:00
Evan Tschannen
e1ebe2f487
clang-format
2021-03-19 13:17:39 -07:00
Evan Tschannen
78e81e514a
fix: OPEN_ATOMIC_WRITE_AND_CREATE did not create a new file handle for the replacement file, so when the disk queue calls replaceFile, truncates against the old file handle will happen on the new file resulting in corruption
2021-03-19 13:11:52 -07:00
Evan Tschannen
d7491a8f30
removed logging
2021-03-19 13:08:22 -07:00
Evan Tschannen
22f5033c6a
add filehandle
2021-03-18 23:29:22 -07:00
Evan Tschannen
335b59eafe
log all file size changes
2021-03-18 22:52:19 -07:00
Evan Tschannen
b1ac27cec1
attempt to avoid having batching consume extra simulated time
2021-03-18 13:05:29 -07:00
Evan Tschannen
00f114b976
update mutex usage
2021-03-18 11:18:07 -07:00
Evan Tschannen
c53dd4a46f
check isStopped between each task
2021-03-18 10:49:24 -07:00
Evan Tschannen
488fe6f008
give more time to cleanup tasks when rebooting
2021-03-17 21:50:19 -07:00
Evan Tschannen
c7ef8377d2
add back in the machine check
2021-03-17 18:48:56 -07:00
Evan Tschannen
67967b5272
slightly adjusted seconds check
2021-03-17 18:06:42 -07:00
Evan Tschannen
06fe6917ab
switch to a separate queue for ordered tasks
2021-03-17 17:45:04 -07:00
Evan Tschannen
c44035a27b
ordered tasks in a batch are executed first and in their creation order
2021-03-17 17:07:25 -07:00
Evan Tschannen
5af2962d04
ordered tasks are executed at the highest priority instead of disabling batching
2021-03-17 16:44:49 -07:00
Evan Tschannen
bf4fcbdb5e
fix compile error
2021-03-17 16:31:44 -07:00
Evan Tschannen
ec4c29361c
do not allow batching with tasks that must be ordered
2021-03-17 16:29:33 -07:00
Evan Tschannen
3233fa339e
temporarily disable stable reordering to make sure the result of the PR is correctness clean
2021-03-17 16:02:07 -07:00
Evan Tschannen
bf75ee2cc6
fixed formatting
2021-03-17 15:36:25 -07:00
Evan Tschannen
151018a36a
do not reorder on the machine process
2021-03-17 15:35:34 -07:00
Evan Tschannen
a7178b3e5f
changed the logic for when stable can be randomized
2021-03-17 15:01:33 -07:00
Evan Tschannen
e16c4d71f1
simulation framework delays still need to be ordered
2021-03-17 14:34:22 -07:00
Evan Tschannen
fb883b482d
avoid possible collisions between stable numbers
2021-03-17 14:09:46 -07:00
Evan Tschannen
7702f0151f
randomize execution order of tasks with the same priority
2021-03-17 13:59:00 -07:00
Evan Tschannen
514e80d8a5
sort by time after priority to better match Net2
2021-03-17 13:45:04 -07:00
Evan Tschannen
3275cd7b94
fix spacing
2021-03-17 13:19:45 -07:00
Evan Tschannen
524662e871
use the stable value from the task instead of stable_sort
2021-03-17 13:18:28 -07:00
Evan Tschannen
056764462d
enable randLog
2021-03-17 12:15:20 -07:00
Evan Tschannen
a096d1f403
switch to stable_sort
2021-03-17 11:38:23 -07:00
Evan Tschannen
c390cf7c6c
removed an assert
2021-03-17 11:05:02 -07:00
Evan Tschannen
2ea3c971d1
execute tasks within a batch in priority order
2021-03-17 11:01:06 -07:00
Markus Pilman
eb036b7b02
Merge remote-tracking branch 'origin/master' into features/actor-lineage
2021-03-17 11:59:54 -06:00
Evan Tschannen
a3c48772e1
Merge branch 'master' into feature-sim-time-batching
2021-03-17 09:55:07 -07:00
Evan Tschannen
b3301fe361
fix: do not allow actualTime to go backward
2021-03-16 19:00:37 -07:00
Evan Tschannen
394e43a18d
updated how simulation does the batch to better match the real runloop
2021-03-16 18:50:47 -07:00
Evan Tschannen
9cf59b44be
do not batch with tasks created as a result of other tasks in the same batch
2021-03-16 17:27:14 -07:00
Evan Tschannen
3a218e4b32
limit the number of tasks that can be executed with the same now()
2021-03-16 16:55:53 -07:00
Andrew Noyes
e7abffbe71
Merge pull request #4494 from sfc-gh-etschannen/feature-fix-sim-reliable
...
Fixed a bug in isReliable() of a simulated process
2021-03-16 10:19:28 -07:00
A.J. Beamon
25c4880ebe
Merge branch 'release-6.3' into merge-release-6.3-into-master (temporarily discard all changes to BackupContainer.actor.cpp)
...
# Conflicts:
# fdbclient/BackupContainer.actor.cpp
# fdbserver/Knobs.h
2021-03-15 16:41:04 -07:00
Evan Tschannen
13242d8b35
the sim2 runloop now updates time in batches so that multiple tasks can execute with the same now()
2021-03-15 12:33:43 -07:00
Evan Tschannen
6a372e3fc7
fixed a simulation bug where a process on an unreliable machine would be considered reliable by the simulator
2021-03-15 11:07:36 -07:00
Markus Pilman
d0cc649ca2
fixed comment
2021-03-12 09:45:02 -07:00