Commit Graph

1460 Commits

Author SHA1 Message Date
Zhe Wu 4e3e2b0392 Create health monitor in FDB workers to monitor network condition. This change is only inside the worker. 2021-06-16 14:50:44 -07:00
Lukas Joswiak 121ec1022c Fix simulation bug 2021-06-13 22:31:04 -07:00
Josh Slocum 56dadaa428 TSS Mismatch Changes 2021-06-11 23:13:16 +00:00
sfc-gh-tclinkenbeard 41c790b299 Merge remote-tracking branch 'origin/master' into config-db 2021-06-10 22:31:23 -07:00
Steve Atherton f7554b8fcb Move FlowMutex unit test to FlowTests. 2021-06-08 16:58:35 -07:00
Evan Tschannen 08a5f17660 Merge branch 'master' of https://github.com/apple/foundationdb into feature-sim-time-batching
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
2021-06-08 10:04:06 -07:00
Evan Tschannen 52ef8b94fb added comments 2021-06-08 09:57:37 -07:00
sfc-gh-tclinkenbeard 371a38e6e5 Merge remote-tracking branch 'origin/master' into remove-extra-copies 2021-06-07 10:26:06 -07:00
Andrew Noyes 402622ace9
Merge pull request #4909 from apple/anoyes/fix-ub
Fix several instances of undefined behavior
2021-06-07 08:58:45 -07:00
sfc-gh-tclinkenbeard f10dd70c37 Remove configuration_database from status when disabled 2021-06-06 08:51:18 -07:00
Andrew Noyes d6a6a8b3dd Remove header that's no longer needed 2021-06-06 08:36:48 -07:00
Lukas Joswiak 486a04659f Lazy initialization 2021-06-04 15:01:18 -07:00
Josh Slocum 9b36f69b8d
Merge pull request #4892 from sfc-gh-jslocum/tss_mappingv2
TSS Mapping Change
2021-06-04 14:57:51 -07:00
Lukas Joswiak 84b06c68bc Bump well known endpoint index 2021-06-04 13:31:57 -07:00
Lukas Joswiak 153de33f57 Revert "Merge pull request #4802 from sfc-gh-ljoswiak/revert/actor-lineage"
This reverts commit 6499fa178e, reversing
changes made to 1512631957.
2021-06-04 13:31:55 -07:00
Andrew Noyes ce25a99000 Disallow conversion from float in specialCounter 2021-06-04 12:09:13 -07:00
Andrew Noyes 5fbadb66c2 Clamp to max int if large float is not representable as int 2021-06-04 09:42:39 -07:00
A.J. Beamon 24d17c013b Add an assert to confirm that try_emplace is inserting a new entry 2021-06-03 13:51:47 -07:00
A.J. Beamon 7d83340993 Fix: when a file open completes synchronously, it wasn't being stored in the openFiles map. 2021-06-03 13:30:28 -07:00
Josh Slocum ac209b32fd Addressing review comments 2021-06-03 15:31:16 +00:00
Josh Slocum b3e4f182ef TSS Mapping Change 2021-06-02 17:30:09 +00:00
sfc-gh-tclinkenbeard a775f92fca Merge remote-tracking branch 'origin/master' into config-db 2021-06-01 15:39:34 -07:00
Josh Slocum d67184163b
Merge pull request #4556 from sfc-gh-jslocum/tss
Testing Storage Server
2021-06-01 09:11:10 -07:00
sfc-gh-tclinkenbeard ca0893571c Move server knobs into fdbclient 2021-06-01 03:12:47 -07:00
sfc-gh-tclinkenbeard 6665f5cc4d Support and test restricted range reads 2021-05-29 03:58:18 -07:00
A.J. Beamon d35da1aeae
Merge pull request #4873 from sfc-gh-ajbeamon/close-files-in-simulation
Actually close files in simulation
2021-05-28 15:32:10 -07:00
A.J. Beamon 69dbe04d42 Rename WeakFutureReference to UnsafeWeakFutureReference and add warning comment 2021-05-28 14:34:20 -07:00
Josh Slocum f6253db7dc Addressing final PR comments 2021-05-28 18:19:42 +00:00
A.J. Beamon d82eac4062 Fix a test issue where closing an AsyncFileNonDurable could permanently prevent you from reopening the file if the machine was in a failed state during cleanup 2021-05-27 20:41:49 -07:00
Dan Lambright 10289ef8f1 Respond to AJ's comments 2021-05-27 09:14:32 -04:00
Dan Lambright 64c10d3625 fix joshua failures, formatting 2021-05-27 08:08:07 -04:00
Dan Lambright fcfb78162c misc cleanup for publishing 2021-05-27 08:08:07 -04:00
Dan Lambright 742c22cef2 Don't allow changing descriptor if knob is set 2021-05-27 08:08:07 -04:00
A.J. Beamon 944a03d575 For files that use the atomic write and create mechanism, attempt to remove the file from the openFiles map at both its old and new name 2021-05-26 16:26:45 -07:00
A.J. Beamon a756469670 Use a weak reference in the open files cache (abstracted from a similar cache in AsyncFileCached) to avoid a problem where removing an item from the cache could cause us to reentrantly remove it again. 2021-05-26 13:38:24 -07:00
Markus Pilman cbce2f6f11 delete dead code 2021-05-26 11:12:07 -07:00
Markus Pilman 7b4de4e037 Revert change 2021-05-26 11:11:51 -07:00
Markus Pilman 7cb767fd3c only remove files from the open map if they have no modifications in flight 2021-05-26 11:11:44 -07:00
Markus Pilman 04613c3b13 handle file renames properly 2021-05-26 11:11:37 -07:00
Markus Pilman f32ce0c4b5 fix typo 2021-05-26 11:11:24 -07:00
Markus Pilman 6bd7fa4036 Actually close files in simulation 2021-05-26 11:11:12 -07:00
Josh Slocum 4257ac2b4d More TSS Changes/Fixes 2021-05-25 20:37:48 +00:00
Josh Slocum ce82c9653e Testing Storage Server implementation 2021-05-25 20:28:50 +00:00
Evan Tschannen f57f0d64f4 Merge branch 'master' into feature-sim-time-batching
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
2021-05-20 09:09:35 -07:00
sfc-gh-tclinkenbeard 748a3ebfbe Add GetSnapshotAndChangesRequest type 2021-05-18 15:28:44 -07:00
Steve Atherton 390b026f08 Merge branch 'master' into arena-page 2021-05-16 05:06:50 -07:00
Steve Atherton 2298567c2b Use of aligned_alloc() for 4k pages causes too much wasted virtual memory. Added new 4k-aligned fast allocator, and changed Arena::allocatedAlignedBuffer() to be 4k-specific, now called Arena::allocate4kAlignedBuffer(). 2021-05-14 23:12:00 -07:00
Lukas Joswiak 4ea760b2a9 Revert "Merge pull request #4136 from sfc-gh-mpilman/features/actor-lineage"
This reverts commit da41534618, reversing
changes made to e6300905d6.
2021-05-10 20:26:12 -07:00
sfc-gh-tclinkenbeard f28ac955c3 Remove unnecessary temporary objects while growing objects of type std::vector<std::pair<A, B>> 2021-05-10 16:32:50 -07:00
Steve Atherton f8a8bf315b Added Arena::allocateAlignedBuffer() to get an aligned memory block owned by an Arena, and ArenaPage uses this. 2021-05-05 15:00:12 -07:00
Andrew Noyes 6bffbdf7e3 Revert "Actually close files in simulation" 2021-05-04 15:38:24 -07:00
sfc-gh-tclinkenbeard d56906cd54 Addressed review comments 2021-05-03 15:26:27 -07:00
sfc-gh-tclinkenbeard e5d6c5ed17 Merge remote-tracking branch 'origin/master' into encrypt-backup-files 2021-05-03 14:46:19 -07:00
Jingyu Zhou 1c92588cca
Merge pull request #4562 from sfc-gh-mpilman/bugfixes/simulator-close-files
Actually close files in simulation
2021-05-03 13:47:44 -07:00
Jingyu Zhou d49e0091ce
Merge pull request #4727 from sfc-gh-etschannen/fix-rewrite-bme
Simulation could still stall writes for 10 seconds even when speedUpSimulation was on
2021-05-03 13:37:04 -07:00
Lukas Joswiak 8dcd779fc4 Merge branch 'master' into features/actor-lineage 2021-05-02 14:11:42 -07:00
Lukas Joswiak cf4218dfd1 Fixes simulation failures
Fixes the following issues:

1. Use the right index when initializing the WriteOnlySet's vector of
   atomics. Also switch to std::atomic_init to initialize each atomic in
   the vector (cannot default construct the atomics in the vector
   because std::atomic does not have a copy constructor).
2. Add failure check for when items cannot be inserted into the
   WriteOnlySet due to capacity constraints. This situation occurs when
   `copy` is not called on the WriteOnlySet, such as when sampling is
   disabled. The `copy` function is what clears the WriteOnlySet.
3. Remove a global config feature I added to update the ClientDBInfo
   object used by the global config listener function. This needs more
   investigation, but the effect of this change could be that global
   config changes are not correctly recognized on fdbserver processes.
4. Add various ASSERTs to verify data in WriteOnlySet.
2021-05-01 15:26:28 -07:00
Andrew Noyes 904a39e473
Merge pull request #4667 from sfc-gh-ajbeamon/feature-mvc-monitor-protocol-version
Use fewer connections in the multi-version client
2021-04-28 14:13:17 -07:00
Evan Tschannen 65fcf4014e Fix: simulation could still stall writes for 10 seconds even when speedUpSimulation was on
Fix: disable connection failures in simulation when there are too many generations outstanding
2021-04-28 12:41:48 -07:00
A.J. Beamon 7158bfa82e Merge branch 'master' into load-balance-remove-make-request-actor 2021-04-28 10:31:41 -07:00
A.J. Beamon 135cc9c69a Make parameter const& 2021-04-28 10:30:30 -07:00
Markus Pilman 4fab2ecd30 Merge remote-tracking branch 'origin/master' into features/actor-lineage 2021-04-28 09:20:54 -06:00
A.J. Beamon 9009780aa8 Fix bug that could cause the server to crash when an old client connected 2021-04-27 11:15:16 -07:00
Evan Tschannen a02da36e85 fixed the problem with the GrvProxyClass the proper way by keeping the enum the same between versions 2021-04-26 18:45:44 -07:00
Evan Tschannen 451609e6be code cleanup 2021-04-26 10:16:18 -07:00
A.J. Beamon a794fca932 Support 5.0 (and earlier) client versions by adding GRV probing for old versions. Update the C bindings implementation of get_server_protocol to convert the ProtocolVersion object into a uint64_t. Rename a misleading protocol version alias. 2021-04-23 15:00:21 -07:00
sfc-gh-tclinkenbeard d6fa06afdd Add IConfigTransaction::getRange (not yet tested) 2021-04-23 11:39:26 -07:00
sfc-gh-tclinkenbeard 050eb079bd Add ConfigFollowerCompactRequest 2021-04-21 12:18:52 -07:00
A.J. Beamon 28f8a2716e For old incompatible connections, set the correct protocol version on the version async var 2021-04-21 11:54:05 -07:00
Evan Tschannen e18c9961b4 rewrote tlog recruitment logic so that it is deterministic, to prevent better master exists from triggering spuriously 2021-04-21 00:22:33 -07:00
A.J. Beamon eaaae2e16d Merge branch master into 'feature-mvc-monitor-protocol-version' 2021-04-20 15:07:02 -07:00
Markus Pilman 7307750e5e Merge remote-tracking branch 'origin/master' into features/actor-lineage 2021-04-19 11:29:52 -06:00
sfc-gh-tclinkenbeard f54f082159 Build interfaces for full config update pipeline 2021-04-16 17:58:00 -07:00
Steve Atherton db610355cf Keep simulated disk write delay high until speedUp is set. 2021-04-16 14:19:37 -07:00
Lukas Joswiak 551268b0f2 Add well known endpoint for worker communication 2021-04-15 13:50:50 -07:00
A.J. Beamon b2d6930103 The multi-version client monitors the cluster's protocol version and only activates the client library that can connect. 2021-04-15 11:45:14 -07:00
sfc-gh-tclinkenbeard 18f17a4ea2 First draft of config-db 2021-04-14 22:06:37 -07:00
Steve Atherton de236894cb Merge commit 'eeee15f524ff769248495d70efa0501170fb5ea2' into correctness-fix 2021-04-14 11:50:49 -07:00
Steve Atherton 1958fde5c6 Added parentheses for clarity. 2021-04-13 20:49:04 -07:00
Steve Atherton f74748ebac Applied clang-format. 2021-04-13 20:43:12 -07:00
Steve Atherton 9475b6a5dd Correctness fix, prevent AsyncFileNonDurable from always making file writes take up to 5 seconds. 2021-04-13 20:15:19 -07:00
Markus Pilman ed8d43cb87
Merge pull request #5 from sfc-gh-ljoswiak/features/network-actors
Sample actors waiting on network
2021-04-09 15:07:53 -06:00
Evan Tschannen a90c26f1d0 The master, proxies, and resolver all need to have the same machine class fitness function besides best fit to ensure recruitment is deterministic
if the first GRV proxy or resolver is forced to share a process, it should prefer to share with the commit proxy so that the commit proxy has more potential options it can share with
2021-04-08 14:29:12 -07:00
A.J. Beamon 931499fb3f Merge branch 'master' into load-balance-remove-make-request-actor 2021-04-08 09:11:35 -07:00
A.J. Beamon 040ba0c587 Rearrange things so that the backoff delay has no impact unless it's needed. 2021-04-07 15:23:50 -07:00
Lukas Joswiak 433872e17d Sample actors waiting on network 2021-04-06 17:28:28 -07:00
Markus Pilman 9bcde529f8
Merge pull request #4 from sfc-gh-ljoswiak/features/current-actor
Sample running actor
2021-04-05 11:36:48 -06:00
Markus Pilman 41d1aee609 delete dead code 2021-04-01 14:06:13 -06:00
Markus Pilman 1987682e1e Merge remote-tracking branch 'origin/master' into bugfixes/simulator-close-files 2021-04-01 11:14:28 -06:00
Markus Pilman dc35af3760 Merge remote-tracking branch 'origin/master' into features/actor-lineage 2021-04-01 11:01:31 -06:00
Markus Pilman ce8fce94c8
Merge pull request #4596 from sfc-gh-etschannen/fix-starting-config
Fixed simulations which timeout setting starting configuration
2021-03-31 10:31:28 -06:00
Evan Tschannen e774262046 fix: g_simulator.disableRemote did not contain the rest of the configuration 2021-03-30 21:11:26 -07:00
sfc-gh-tclinkenbeard d4191899d9 Add comments for AsyncFileEncrypted changes 2021-03-28 22:14:37 -07:00
sfc-gh-tclinkenbeard 82420e5572 Merge remote-tracking branch 'origin/master' into encrypt-backup-files 2021-03-27 21:02:19 -07:00
Markus Pilman 1033db9fba Revert change 2021-03-25 14:00:07 -06:00
Markus Pilman 1385a776da only remove files from the open map if they have no modifications in flight 2021-03-25 13:22:29 -06:00
Markus Pilman b51e4aa590 handle file renames properly 2021-03-24 19:57:24 -06:00
Markus Pilman 6a344ddeab fix typo 2021-03-24 16:56:11 -06:00
Markus Pilman f7d3b31ef8 Actually close files in simulation 2021-03-24 16:27:35 -06:00
A.J. Beamon 36f4c17ef1 Reduce the number of actor calls in load balancing to improve performance. 2021-03-24 15:04:45 -07:00
Lukas Joswiak 2dfd420882 Add sampling profiler thread 2021-03-24 14:52:42 -07:00
A.J. Beamon f1166f2bf6
Merge pull request #4545 from sfc-gh-anoyes/anoyes/fix-truncate-simulation
In simulation, fix treatment of extending a file with truncate as a "pending modification"
2021-03-24 12:35:32 -07:00
Andrew Noyes eb80321ea3 Attempt to fix windows build 2021-03-24 18:48:10 +00:00
Andrew Noyes c186d363c6 Add unit test 2021-03-24 17:32:07 +00:00
Andrew Noyes 170c197c4c Truncate marks everything after size modified 2021-03-23 21:07:12 +00:00
Andrew Noyes e83de2b799 Fix bug: minSizeAfterPendingModifications needs to be maxed 2021-03-23 21:00:21 +00:00
Evan Tschannen a893309112 Opening a file with OPEN_ATOMIC_WRITE_AND_CREATE should create a new file handle, so that if a file with the same name is still in use, operations against it will not happen to the new file. This can happen when the disk queue replaces a file. 2021-03-23 13:47:46 -07:00
Andrew Noyes 0daf6cf632 Consider extending a file with truncate as a "pending modification"
Before this, truncating and reading concurrently could read
uninitialized memory. So could truncating then reading, since the effect
of the truncate in the actual file was allowed to be delayed. Now reads
will wait for a truncate that extends the file to complete if they
intersect the newly-zeroed region.
2021-03-23 19:44:36 +00:00
sfc-gh-tclinkenbeard a0c49234b2 Merge remote-tracking branch 'origin/master' into encrypt-backup-files 2021-03-19 20:47:53 -07:00
Evan Tschannen e1ebe2f487 clang-format 2021-03-19 13:17:39 -07:00
Evan Tschannen 78e81e514a fix: OPEN_ATOMIC_WRITE_AND_CREATE did not create a new file handle for the replacement file, so when the disk queue calls replaceFile, truncates against the old file handle will happen on the new file resulting in corruption 2021-03-19 13:11:52 -07:00
Evan Tschannen d7491a8f30 removed logging 2021-03-19 13:08:22 -07:00
Evan Tschannen 22f5033c6a add filehandle 2021-03-18 23:29:22 -07:00
Evan Tschannen 335b59eafe log all file size changes 2021-03-18 22:52:19 -07:00
Evan Tschannen b1ac27cec1 attempt to avoid having batching consume extra simulated time 2021-03-18 13:05:29 -07:00
Evan Tschannen 00f114b976 update mutex usage 2021-03-18 11:18:07 -07:00
Evan Tschannen c53dd4a46f check isStopped between each task 2021-03-18 10:49:24 -07:00
Evan Tschannen 488fe6f008 give more time to cleanup tasks when rebooting 2021-03-17 21:50:19 -07:00
Evan Tschannen c7ef8377d2 add back in the machine check 2021-03-17 18:48:56 -07:00
Evan Tschannen 67967b5272 slightly adjusted seconds check 2021-03-17 18:06:42 -07:00
Evan Tschannen 06fe6917ab switch to a separate queue for ordered tasks 2021-03-17 17:45:04 -07:00
Evan Tschannen c44035a27b ordered tasks in a batch are executed first and in their creation order 2021-03-17 17:07:25 -07:00
Evan Tschannen 5af2962d04 ordered tasks are executed at the highest priority instead of disabling batching 2021-03-17 16:44:49 -07:00
Evan Tschannen bf4fcbdb5e fix compile error 2021-03-17 16:31:44 -07:00
Evan Tschannen ec4c29361c do not allow batching with tasks that must be ordered 2021-03-17 16:29:33 -07:00
Evan Tschannen 3233fa339e temporarily disable stable reordering to make sure the result of the PR is correctness clean 2021-03-17 16:02:07 -07:00
Evan Tschannen bf75ee2cc6 fixed formatting 2021-03-17 15:36:25 -07:00
Evan Tschannen 151018a36a do not reorder on the machine process 2021-03-17 15:35:34 -07:00
Evan Tschannen a7178b3e5f changed the logic for when stable can be randomized 2021-03-17 15:01:33 -07:00
Evan Tschannen e16c4d71f1 simulation framework delays still need to be ordered 2021-03-17 14:34:22 -07:00
Evan Tschannen fb883b482d avoid possible collisions between stable numbers 2021-03-17 14:09:46 -07:00
Evan Tschannen 7702f0151f randomize execution order of tasks with the same priority 2021-03-17 13:59:00 -07:00
Evan Tschannen 514e80d8a5 sort by time after priority to better match Net2 2021-03-17 13:45:04 -07:00
Evan Tschannen 3275cd7b94 fix spacing 2021-03-17 13:19:45 -07:00
Evan Tschannen 524662e871 use the stable value from the task instead of stable_sort 2021-03-17 13:18:28 -07:00
Evan Tschannen 056764462d enable randLog 2021-03-17 12:15:20 -07:00
Evan Tschannen a096d1f403 switch to stable_sort 2021-03-17 11:38:23 -07:00
Evan Tschannen c390cf7c6c removed an assert 2021-03-17 11:05:02 -07:00
Evan Tschannen 2ea3c971d1 execute tasks within a batch in priority order 2021-03-17 11:01:06 -07:00
Markus Pilman eb036b7b02 Merge remote-tracking branch 'origin/master' into features/actor-lineage 2021-03-17 11:59:54 -06:00
Evan Tschannen a3c48772e1 Merge branch 'master' into feature-sim-time-batching 2021-03-17 09:55:07 -07:00
Evan Tschannen b3301fe361 fix: do not allow actualTime to go backward 2021-03-16 19:00:37 -07:00
Evan Tschannen 394e43a18d updated how simulation does the batch to better match the real runloop 2021-03-16 18:50:47 -07:00
Evan Tschannen 9cf59b44be do not batch with tasks created as a result of other tasks in the same batch 2021-03-16 17:27:14 -07:00
Evan Tschannen 3a218e4b32 limit the number of tasks that can be executed with the same now() 2021-03-16 16:55:53 -07:00
Andrew Noyes e7abffbe71
Merge pull request #4494 from sfc-gh-etschannen/feature-fix-sim-reliable
Fixed a bug in isReliable() of a simulated process
2021-03-16 10:19:28 -07:00
A.J. Beamon 25c4880ebe Merge branch 'release-6.3' into merge-release-6.3-into-master (temporarily discard all changes to BackupContainer.actor.cpp)
# Conflicts:
#	fdbclient/BackupContainer.actor.cpp
#	fdbserver/Knobs.h
2021-03-15 16:41:04 -07:00
Evan Tschannen 13242d8b35 the sim2 runloop now updates time in batches so that multiple tasks can execute with the same now() 2021-03-15 12:33:43 -07:00
Evan Tschannen 6a372e3fc7 fixed a simulation bug where a process on an unreliable machine would be considered reliable by the simulator 2021-03-15 11:07:36 -07:00
Markus Pilman d0cc649ca2 fixed comment 2021-03-12 09:45:02 -07:00
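
Commit 5fbadb66c2 ("Clamp to max int if large float is not representable as int") describes a common fix for undefined behavior: in C++, converting a floating-point value to an integer type that cannot represent it is undefined. The sketch below shows one way such a clamp could look. The helper name `clampDoubleToInt64`, the choice of `int64_t`, and the handling of NaN and of large negative values are assumptions made for illustration; they are not taken from the FoundationDB source.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (not FoundationDB's actual code): convert a double to
// int64_t without invoking undefined behavior on out-of-range input.
inline int64_t clampDoubleToInt64(double x) {
    // 2^63 is exactly representable as a double, whereas INT64_MAX (2^63 - 1)
    // is not, so the upper bound is compared against 2^63 itself.
    constexpr double kTwoPow63 = 9223372036854775808.0;
    if (std::isnan(x)) {
        return 0; // assumption: NaN maps to 0
    }
    if (x >= kTwoPow63) {
        return std::numeric_limits<int64_t>::max();
    }
    if (x < -kTwoPow63) {
        return std::numeric_limits<int64_t>::min(); // assumption: clamp the low side too
    }
    return static_cast<int64_t>(x); // in range: truncates toward zero
}
```

Comparing against 2^63 rather than `static_cast<double>(INT64_MAX)` makes the boundary explicit: the latter rounds up to 2^63, so a `<=` test against it would let 2^63 through to the cast.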