Commit Graph

228 Commits

Author SHA1 Message Date
Alex Miller c3a8ae4752
Merge pull request #1791 from fzhjon/fetch-keys-requests-priority
Introduce priority to fetchKeys requests from data distribution
2019-07-19 14:54:51 -07:00
Balachandar Namasivayam ecb3de3b49 Fixed space issue. 2019-07-17 18:10:05 -07:00
Balachandar Namasivayam 406bcebdc4 Ratekeeper to throttle tpsLimit to 1 if it is not able to fetch storage server list for some configurable amount of time. 2019-07-17 18:08:17 -07:00
Meng Xu 20f067e794 Merge with master:Resolve conflict with PR#1797 2019-07-16 10:52:28 -07:00
Meng Xu 415622f465 MachineTeamRemover:Change to remove MT with most teams
Change to remove the machine team with the most machine teams, using the same
logic as the serverTeamRemover.

The feature is guarded by the TR_FLAG_REMOVE_MT_WITH_MOST_TEAMS knob.
2019-07-15 14:29:49 -07:00
Evan Tschannen db5b4a6331 avoid going to unlimited immediately after going below the durabilityLagTargetVersion 2019-07-12 18:50:56 -07:00
Evan Tschannen 1a18c859c7 knobified the durability lag rate controls 2019-07-12 18:50:56 -07:00
Evan Tschannen 02de53160d only skip confirm epoch live if CAUSAL_READ_RISKY is enabled
time checked on the proxy should be less than the time waited by the master to account for clock speed differences
setting REQUIRED_MIN_RECOVERY_DURATION and ENFORCED_MIN_RECOVERY_DURATION to 0 will go back to the old behavior
2019-07-12 17:58:16 -07:00
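As a rough illustration of the check described in the commit above, here is a minimal self-contained sketch of how a proxy might decide whether it can skip confirming that its epoch is still live. The knob names come from the commit message; the function, the division of roles between the two knobs, and the values are assumptions for this sketch, not the actual proxy code.

    #include <iostream>

    // Knob names from the commit message; values and the proxy-side vs. master-side split
    // are illustrative assumptions only.
    constexpr double REQUIRED_MIN_RECOVERY_DURATION = 4.0;  // window the proxy checks against
    constexpr double ENFORCED_MIN_RECOVERY_DURATION = 6.0;  // minimum wait enforced by the master

    // Hypothetical helper: can the proxy skip the confirm-epoch-live round trip?
    // The proxy-side window is kept smaller than the master-side wait so that a slightly
    // fast proxy clock cannot conclude the epoch is live after a recovery already finished.
    bool canSkipConfirmEpochLive(bool causalReadRisky, double now, double lastCommitTime) {
        if (!causalReadRisky)
            return false;                          // only CAUSAL_READ_RISKY reads may skip it
        if (REQUIRED_MIN_RECOVERY_DURATION <= 0 || ENFORCED_MIN_RECOVERY_DURATION <= 0)
            return false;                          // knobs set to 0 -> old behavior, always confirm
        return (now - lastCommitTime) < REQUIRED_MIN_RECOVERY_DURATION;
    }

    int main() {
        std::cout << std::boolalpha
                  << canSkipConfirmEpochLive(true, 100.0, 98.0) << "\n"   // recent commit -> skip allowed
                  << canSkipConfirmEpochLive(true, 100.0, 90.0) << "\n"   // too long ago -> must confirm
                  << canSkipConfirmEpochLive(false, 100.0, 99.0) << "\n"; // not risky -> always confirm
    }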
Evan Tschannen a63969afb3 enforce a minimum recovery duration, which allows proxies to avoid checking if the epoch is alive as long as its last commit was less than MINIMUM_RECOVERY_DURATION ago 2019-07-12 13:10:21 -07:00
Jon Fu f12a3909f3 renamed workloads and made code style adjustments 2019-07-11 09:56:58 -07:00
Jon Fu 1e9d31597c removed extra parameter from getRange, added knob to guard new changes, and adjusted style/formatting in several places 2019-07-11 09:56:58 -07:00
Evan Tschannen 7e919e361c
Merge pull request #1817 from etschannen/feature-proxy-forward
Proxies will forward clients to the next generation
2019-07-10 13:53:12 -07:00
Evan Tschannen 49121172ea
Merge pull request #1795 from alexmiller-apple/peek-from-satellites
Log Routers will prefer to peek from satellite logs.
2019-07-09 17:38:57 -07:00
Evan Tschannen 001abec29d fixed a compiler error, buggified a new knob 2019-07-09 16:50:59 -07:00
Evan Tschannen 64aee73c4f we only need to hold the ReplyPromise for messages that we are going to forward to new proxies 2019-07-09 16:47:56 -07:00
Alex Miller 44f11702a8 Log Routers will prefer to peek from satellite logs.
Formerly, they would prefer to peek from the primary's logs.  Testing of
a failed region rejoining the cluster revealed that this becomes quite a
strain on the primary logs when extremely large volumes of peek requests
are coming from the Log Routers.  It happens that we have satellites
that contain the same mutations with Log Router tags and that have no other
peeking load, so we can prefer to peek from the satellites rather than
the primary, distributing load across TLogs better.

Unfortunately, this revealed a latent bug in how tagged mutations in the
KnownCommittedVersion->RecoveryVersion gap were copied across
generations when the number of log router tags were decreased.
Satellite TLogs would be assigned log router tags using the
team-building based logic in getPushLocations(), whereas TLogs would
internally re-index tags according to tag.id%logRouterTags.  This
mismatch would mean that we could have:

    Log0 -2:0 ----- -2:0  Log 0

    Log1 -2:1 \
               >--- -2:1,-2:0 (-2:2 mod 2 becomes -2:0)  Log 1
    Log2 -2:2 /

And now we have data that's tagged as -2:0 on a TLog that's not the
preferred location for -2:0, and therefore a BestLocationOnly cursor
would miss the mutations.

This was never noticed before, as we never
used a satellite as a preferred location to peek from.  Merge cursors
always peek from all locations, and thus a peek for -2:0 that needed
data from the satellites would have gone to both TLogs and merged the
results.

We now take this mod-based re-indexing into account when assigning which
TLogs need to recover which tags from the previous generation, to make
sure that tag.id%logRouterTags always results in the assigned TLog being
the preferred location.

Unfortunately, previously existing clusters will potentially have existing
satellites with log router tags indexed incorrectly, so this transition
needs to be gated on a `log_version` transition.  Old LogSets will have
an old LogVersion, and we won't prefer the satellite for peeking.  LogSets
post-6.2 (opt-in) or post-6.3 (default) will be indexed correctly,
and therefore we can safely offload peeking onto the satellites.
2019-07-08 22:25:01 -07:00
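To make the re-indexing mismatch described above concrete, here is a small self-contained sketch contrasting a naive team-building style assignment with the mod-based re-indexing (tag.id % logRouterTags) that the commit says the TLogs use internally. All names here are hypothetical stand-ins, not the actual getPushLocations() or TLog code.

    #include <cstdio>
    #include <vector>

    // Hypothetical stand-in for a log router tag: id ranges over [0, oldLogRouterTags).
    struct Tag { int id; };

    int main() {
        const int oldLogRouterTags = 3;  // tags -2:0, -2:1, -2:2 from the previous generation
        const int newTLogs = 2;          // the new generation has fewer log router tags

        // Preferred location: the TLogs internally re-index a tag onto TLog (id % newTLogs).
        auto preferredTLog = [&](Tag t) { return t.id % newTLogs; };

        // A team-building style assignment may hand tags out in some other order,
        // which is what produced the mismatch in the diagram above (tag -2:2 on Log 1).
        std::vector<int> teamBuildingAssignment = {0, 1, 1};

        for (int id = 0; id < oldLogRouterTags; ++id) {
            int assigned = teamBuildingAssignment[id];
            int preferred = preferredTLog(Tag{id});
            std::printf("tag -2:%d assigned to TLog %d, preferred location TLog %d%s\n",
                        id, assigned, preferred,
                        assigned == preferred ? "" : "  <-- a BestLocationOnly cursor would miss it");
        }
        // The fix: assign recovery of old tags so the assignment always equals id % newTLogs,
        // i.e. the preferred location.
    }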
Meng Xu 3b9618fe11 ServerTeamRemover:Speedup removing teams in simulation
Otherwise, simulation may time out when team remover needs to
remove hundreds of teams.
2019-07-08 18:17:21 -07:00
Meng Xu 08a721b320 Merge branch 'master' into mengxu/server-team-remover-PR 2019-07-08 16:30:32 -07:00
Evan Tschannen c348b3da51 After a proxy dies, it will remain alive for an additional 10 seconds to forward clients to the new proxies 2019-07-08 12:53:40 -07:00
Evan Tschannen 310a5fe9a3 fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy 2019-07-05 17:28:22 -07:00
Evan Tschannen e7c0ecf729 fix: we cannot reject 100% of requests, because a storage server which is stuck needs to get a future version error to trigger an all alternatives failed message from load balance so that clients will re-grab storage server interfaces from the proxy 2019-07-05 15:46:16 -07:00
Meng Xu 599fcb2e6d Add serverTeamRemover to remove redundant server teams 2019-07-02 17:40:37 -07:00
Evan Tschannen b9a6271375 local ratekeeper no longer globally limits 2019-06-28 16:54:22 -07:00
Evan Tschannen 18d5fbf1e0 Avoid jumping from rejecting 0% of requests directly to 20% of requests 2019-06-28 16:54:22 -07:00
Evan Tschannen db413c37f7 restored the STORAGE_DURABILITY_LAG_SOFT_MAX knob and made the rk target slightly smaller than the soft limit, to avoid inaccuracies in ratekeeper control causing behavior changes on the storage servers 2019-06-28 16:54:22 -07:00
Evan Tschannen 92b32855ca ratekeeper’s control algorithm would oscillate when limited by local ratekeeper 2019-06-28 16:54:22 -07:00
A.J. Beamon 35b6277a50 Fix knob copy paste error 2019-06-27 12:55:39 -07:00
Alex Miller 61901effed Increase how long FDB will wait before starting DD to repair data loss.
10s is a bit short for starting data distribution, which is rather
expensive.  60s is a bit more reasonable.
2019-06-19 13:40:21 -07:00
Evan Tschannen 20e3edeb0a Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/storageserver.actor.cpp
#	versions.target
2019-06-14 12:42:59 -07:00
Evan Tschannen 924f92e5aa Prevent the byte sample recovery from interfering with storage server recovery 2019-06-13 15:55:25 -07:00
Evan Tschannen 054d775343 increase the delay between idle commits to reduce the rate idle clusters fsync 2019-06-13 14:55:37 -07:00
Trevor Clinkenbeard 8144882d7b Merge branch 'apple-master' into features/local-rk 2019-06-10 19:40:25 -07:00
Meng Xu 022b555b69 FastRestore:Fix bug in finish restore
RestoreMaster may not receive all acks for the last command, i.e., finishRestore,
because RestoreLoaders and RestoreAppliers exit immediately after sending the ack.
If the ack is lost, it will not be resent.

This commit also removes some unneeded code.
This commit passes 50k random tests without errors.
2019-06-05 20:07:18 -07:00
Meng Xu 477fd152c0 FastRestore:Refactor code
1) Use the runRYWTransaction for simple DB access
2) Replace some printf with TraceEvent
3) Remove printf not used in debugging
4) Avoid wait inside the condition in loop-choose-when for
   the core routine of restore worker, loader and applier.
5) Rename Restore.actor.cpp to RestoreWorker.actor.cpp since
   the file only has functionalities related to restore worker.

Passed correctness test
2019-06-04 11:22:47 -07:00
Evan Tschannen 29b96414e2 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/NativeAPI.actor.cpp
#	fdbserver/Coordination.actor.cpp
#	flow/Arena.h
#	versions.target
2019-06-03 18:49:35 -07:00
Evan Tschannen 7c333dbc16 If a process receives a message on its clusterControllerInterface before becoming the cluster controller, and it does not become the cluster controller within the next minute, it should destroy the interface to prevent a memory leak. 2019-05-29 16:57:13 -07:00
sramamoorthy 31b6c86650 ignorePopDeadline to have high limit in simulator
- ignorePopDeadline to have a higher limit in the simulator
to accommodate the buggify delays and make snapshots succeed.

- introduce a new knob for auto resetting the disabling of tlog pop
2019-05-28 22:07:46 -07:00
A.J. Beamon 603721e125 Merge branch 'master' into thread-safe-random-number-generation
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/genericactors.actor.cpp
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/workloads/BulkSetup.actor.h
#	flow/ActorCollection.actor.cpp
#	flow/Net2.actor.cpp
#	flow/Trace.cpp
#	flow/flow.cpp
2019-05-23 08:35:47 -07:00
Evan Tschannen 8c3516951a Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-05-12 20:13:49 -07:00
Alex Miller ea12a54946 Rename DISK_QUEUE_MAX_TRUNCATE_EXTENTS -> ..._BYTES
So as not to make filesystem assumptions.  This knob did technically
appear in (only) the 6.1.5 release, but the feature was broken in 6.1.5,
and thus impossible to use anyway.
2019-05-10 18:26:22 -10:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
Evan Tschannen 22499666d0 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/LogRouter.actor.cpp
#	flow/Trace.cpp
#	versions.target
2019-05-08 18:19:35 -07:00
Alex Miller 0685e6c1c7 Avoid large truncates in the DiskQueue.
And instead create a new file while incrementally truncating the old one
down.  This avoids queueing up a massive number of filesystem metadata
operations in one call, thus flooding the disk with requests and
stalling out all other filesystem operations.

This sets the knobs so that a truncate of >10GB causes us to create a
new file rather than trying to truncate the old one.
2019-05-08 12:33:31 -10:00
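A minimal sketch of the decision described in the commit above: truncate in place for small shrinks, but switch to a new file when the truncate would exceed a byte threshold. The knob value, the enum, and the function are illustrative assumptions, not the actual DiskQueue implementation.

    #include <cstdint>
    #include <cstdio>

    // Hypothetical knob mirroring the behavior described above: truncates larger than
    // this many bytes cause the DiskQueue to switch to a fresh file instead.
    constexpr int64_t MAX_TRUNCATE_BYTES = 10LL * 1024 * 1024 * 1024;  // ~10GB, illustrative

    enum class TruncateAction { TruncateInPlace, ReplaceWithNewFile };

    TruncateAction chooseTruncateAction(int64_t currentFileBytes, int64_t desiredBytes) {
        int64_t toTruncate = currentFileBytes - desiredBytes;
        // Large truncates queue up a massive number of filesystem metadata operations;
        // avoid them by creating a new file and incrementally truncating the old one.
        return toTruncate > MAX_TRUNCATE_BYTES ? TruncateAction::ReplaceWithNewFile
                                               : TruncateAction::TruncateInPlace;
    }

    int main() {
        std::printf("%d\n", (int)chooseTruncateAction(12LL << 30, 1LL << 30));  // 11GB shrink -> new file
        std::printf("%d\n", (int)chooseTruncateAction(2LL << 30, 1LL << 30));   // 1GB shrink -> in place
    }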
Alex Miller 4052f3826a Add a knob to limit the number of commits indexed per key.
Theoretically, we could spill 20MB of 22B mutations for one key, which
would generate a very long value being stored in SQLite, and very
inefficiently read back.  This stops that from being a problem, at the
cost of some extra write calls.
2019-05-03 15:27:10 -07:00
Alex Miller f4e48c3851 Add a knob to limit amount of data read from sqlite for one PeekRequest.
This prevents peeking from degrading over time if there are a very large
number of SpilledData entries for one particular tag.
2019-05-02 17:26:45 -07:00
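For illustration only, a sketch of capping how much spilled data a single peek gathers, so that one tag with a very large number of SpilledData entries cannot make peeks arbitrarily expensive. The knob value and all names are hypothetical stand-ins for the real peek path.

    #include <cstdint>
    #include <vector>

    // Hypothetical knob: stop reading spilled entries for one peek once this many bytes
    // have been accumulated; the client issues another peek for the rest.
    constexpr int64_t PEEK_BYTE_LIMIT = 1 << 20;  // 1MB, illustrative value only

    struct SpilledEntry { int64_t bytes; };

    // Returns how many spilled entries a single peek should read before replying.
    size_t entriesForOnePeek(const std::vector<SpilledEntry>& spilled) {
        int64_t accumulated = 0;
        size_t count = 0;
        for (const auto& e : spilled) {
            if (count > 0 && accumulated + e.bytes > PEEK_BYTE_LIMIT)
                break;  // always return at least one entry so the peek makes progress
            accumulated += e.bytes;
            ++count;
        }
        return count;
    }

    int main() {
        std::vector<SpilledEntry> spilled(10000, SpilledEntry{4096});
        return entriesForOnePeek(spilled) > 0 ? 0 : 1;  // only a bounded prefix is read
    }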
Evan Tschannen 2d5043c665 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-04-30 18:27:04 -07:00
Evan Tschannen 1a4c1759a4
Merge pull request #1429 from jzhou77/pprof
Dump heap profiler when memory usage is high
2019-04-29 16:31:44 -07:00
Evan Tschannen cacd82758e Reduced data distribution speeds 2019-04-26 13:54:49 -07:00
Evan Tschannen 9ff8aca1da Increased the SQLITE_CHUNK_SIZE to 100MB (left at 4MB for simulation) 2019-04-26 13:53:56 -07:00
A.J. Beamon 253d2400ef Merge branch 'release-6.1' into speed-up-and-parameterize-spring-cleaning
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-04-23 14:38:52 -07:00
A.J. Beamon 4ad0496b39 Increase the frequency that lazy deletes are run. Add more parameters for better control over the spring cleaning process. 2019-04-23 14:01:51 -07:00
Stephen Atherton 83db547306 Implemented the chunk size and db size hint fileControl options in our SQLite VFS implementation. KeyValueStoreSQLite now sets file chunk size based on a new knob, SQLITE_CHUNK_SIZE_PAGES. 2019-04-23 04:50:58 -07:00
Jingyu Zhou 6870e132b2
Merge branch 'master' into pprof 2019-04-19 14:06:44 -07:00
Andrew Noyes d1e86779a6 Address review comments 2019-04-18 08:48:27 -07:00
mpilman 32393ec4c9 Prototype of local ratekeeper 2019-04-08 11:04:44 -07:00
Evan Tschannen 05869a8383 do not log a degraded reset message if the previous reset was more than a week ago 2019-04-07 23:00:58 -07:00
Jingyu Zhou 4b08042a88 Change memory profiling threshold to a flag 2019-04-05 16:33:51 -07:00
Jingyu Zhou 09b2c35d11 Dump heap profiler when memory usage is high
Set the threshold of dump to 2GB.
2019-04-05 16:12:23 -07:00
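A hedged sketch of the idea in the commit above: dump a heap profile the first time resident memory crosses the threshold. The 2GB value comes from the message (later made configurable); the profiler hook and the re-arming logic are hypothetical stand-ins for the real integration.

    #include <cstdint>
    #include <cstdio>

    // Threshold from the commit message; a later commit turns it into a flag.
    constexpr int64_t HEAP_PROFILE_THRESHOLD_BYTES = 2LL * 1024 * 1024 * 1024;  // 2GB

    // Hypothetical stand-in for the real heap-profiler dump call.
    void dumpHeapProfile(const char* reason) { std::printf("heap profile dumped: %s\n", reason); }

    // Called periodically with the process's resident memory; dumps a profile the first
    // time usage crosses the threshold, then re-arms once usage drops well below it.
    void maybeDumpHeapProfile(int64_t residentBytes) {
        static bool dumped = false;
        if (!dumped && residentBytes > HEAP_PROFILE_THRESHOLD_BYTES) {
            dumpHeapProfile("memory usage high");
            dumped = true;
        } else if (dumped && residentBytes < HEAP_PROFILE_THRESHOLD_BYTES / 2) {
            dumped = false;  // re-arm after memory pressure subsides
        }
    }

    int main() {
        maybeDumpHeapProfile(1LL << 30);  // below threshold: nothing happens
        maybeDumpHeapProfile(3LL << 30);  // above threshold: dump once
        maybeDumpHeapProfile(3LL << 30);  // still high: no duplicate dump
    }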
Evan Tschannen 390ab9cfed A process will mark itself as degraded if it continually disconnects from a different process which the failure monitor thinks is healthy 2019-04-04 14:11:12 -07:00
A.J. Beamon 71e2fdafb8 Changes to ratekeeper camel case 2019-03-27 08:24:25 -07:00
Evan Tschannen 6254a1a8e4 fix: restarting the provisional proxy causes all tlog peeks to restart, so if tlog peeks take longer than 1 second this could end in an infinite loop 2019-03-22 18:37:39 -07:00
A.J. Beamon 2d7b48dadc
Merge pull request #1311 from etschannen/feature-increase-grv-batch
Increased the GRV client batch size
2019-03-19 08:23:05 -07:00
Evan Tschannen 2554fed965 reduce max transaction to start 2019-03-18 16:16:03 -07:00
Evan Tschannen 87e2a1a029 The proxy budget is implemented to let one request over its limit through, and then pay back what was over the limit in the next update 2019-03-18 16:09:57 -07:00
Alex Miller 29ab7370cd Clear versionLocation when spilling, and pop DQ separately.
Popping the disk queue now requires potentially recovering the location
to which we can pop from the spilled data itself, and for each tag we
must maintain the first location with relevant data.

The previous queue we had to represent the ordering, queueOrder, was
used by spilling, and popped when a TLog had been spilled.  This means
that as soon as a TLog has been fully spilled, we have no idea how it
relates in order to other fully spilled TLogs.

Instead, use queueOrder to keep track of all the TLog UIDs until they're
removed, and use spillOrder to keep track of the order only for
spilling.
2019-03-18 15:09:22 -07:00
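A simplified sketch of the split described above: queueOrder retains every TLog UID until the TLog is removed, while spillOrder tracks only the spilling order. The types and methods are hypothetical simplifications, not the actual TLog data structures.

    #include <deque>
    #include <string>

    // Hypothetical stand-in for the real UID type.
    using UID = std::string;

    struct LogGenerationData {
        std::deque<UID> queueOrder;  // every TLog UID, kept until the TLog is removed, so fully
                                     // spilled TLogs still have a known order relative to others
        std::deque<UID> spillOrder;  // only the order in which TLogs are being spilled

        void addTLog(const UID& uid) {
            queueOrder.push_back(uid);
            spillOrder.push_back(uid);
        }

        // A fully spilled TLog leaves spillOrder, but stays in queueOrder so that popping
        // the disk queue can still reason about where its spilled data sits.
        void finishedSpilling(const UID& uid) {
            if (!spillOrder.empty() && spillOrder.front() == uid)
                spillOrder.pop_front();
        }

        // Only when the TLog is removed entirely does it leave queueOrder.
        void removeTLog(const UID& uid) {
            if (!queueOrder.empty() && queueOrder.front() == uid)
                queueOrder.pop_front();
        }
    };

    int main() {
        LogGenerationData logData;
        logData.addTLog("tlog-a");
        logData.addTLog("tlog-b");
        logData.finishedSpilling("tlog-a");    // tlog-a fully spilled: gone from spillOrder...
        return logData.queueOrder.size() == 2  // ...but its order relative to tlog-b is retained
               ? 0 : 1;
    }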
Evan Tschannen ec6c843124 increased the GRV client batch size, similarly increased the proxy limits related to the number of transactions started in a batch 2019-03-16 16:18:58 -07:00
Evan Tschannen e068c478b5 merge master 2019-03-12 18:31:25 -07:00
Evan Tschannen c6e94293bf reset a process to not be degraded after 2 days 2019-03-10 22:39:21 -07:00
Evan Tschannen 53f16b5347 when a tlog queue commit takes longer than 5 seconds, its process is marked as degraded 2019-03-08 11:46:34 -05:00
Jingyu Zhou 3c86643822 Separate Ratekeeper from data distribution.
Add a new role for ratekeeper.

Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
Alex Miller 94bf75cb00 Allow the disk queue to shrink if it has unneeded slack space. 2019-03-04 01:42:38 -08:00
Alex Miller 9ef283d4e7 Implement hard limiting of memory used to serve peek requests. 2019-03-04 01:42:38 -08:00
Alex Miller e7d8520c63 Batch more when spilling data. 2019-03-04 01:42:38 -08:00
Trevor Clinkenbeard 39f612d132 Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics 2019-03-02 17:07:00 -08:00
A.J. Beamon a051055caf Initial implementation of adding separate limits for batch priority in ratekeeper 2019-02-27 10:31:56 -08:00
Trevor Clinkenbeard abfe057805 Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics 2019-02-25 13:47:16 -08:00
Evan Tschannen b8910ba7cd Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.h
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Meng Xu 9445ac0b0c Status: Use new data distributor worker to publish status
After we add a new data distributor role, we publish the data
related to the data distributor and ratekeeper through the new
role (and new worker).

So status needs to contact the data distributor, instead of the master,
to get the status information.
2019-02-21 18:05:50 -08:00
Meng Xu 7cca439e00 TeamRemover: Add status to show redundant team removing
Distinguish the removal of unhealthy team and redundant team.
Change status report to include redundant team removal report.
2019-02-21 14:16:46 -08:00
Trevor Clinkenbeard fa96b8dd33 Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics 2019-02-20 16:56:16 -08:00
Meng Xu d86ba0e811 TeamRemover: Change it to run periodically
This simplifies the problem of when we should invoke the teamRemover
2019-02-20 16:08:34 -08:00
Evan Tschannen 27e3617548 fix: remove bad teams needed to use dd_stall_check delay, because in simulation the buggified delay time could make us remove bad teams before they submit their ranges to the queue 2019-02-20 14:18:36 -08:00
Evan Tschannen d4737fac0f knobify force recovery recovery check delay 2019-02-19 16:05:20 -08:00
Evan Tschannen 065a45e05f Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Evan Tschannen d492395f84 fix: simulation could buggify a delay such that data distribution incorrectly thinks the queue is not processing unhealthy relocations 2019-02-18 14:57:07 -08:00
Meng Xu 6d09ac483c Merge with master 2019-02-15 17:03:40 -08:00
Jingyu Zhou fc3a784963 Fix another build team bug
The buildTeam() can create teams with undesired storage servers, which are
considered unhealthy. As a result, the data movement can become stuck.

Fix this by adding an ACTOR, monitorHealthyTeams, that builds teams every
second whenever there are no healthy teams.

Clean up storageServerTracker() interface.
2019-02-14 16:37:16 -08:00
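A rough sketch of the monitorHealthyTeams idea from the commit above, written as a plain thread-based loop so the example is self-contained rather than a flow ACTOR; the TeamCollection struct and the buildTeams() body are hypothetical simplifications of the real DDTeamCollection code.

    #include <chrono>
    #include <thread>
    #include <vector>

    // Hypothetical simplification of the team collection: just enough state for the loop.
    struct TeamCollection {
        std::vector<bool> teamHealthy;  // health flag per existing team
        bool hasHealthyTeam() const {
            for (bool h : teamHealthy) if (h) return true;
            return false;
        }
        void buildTeams() { teamHealthy.push_back(true); }  // stand-in for the real team builder
    };

    // Wake up once a second and rebuild teams whenever no healthy team exists, so data
    // movement cannot get stuck on a collection containing only unhealthy teams.
    void monitorHealthyTeams(TeamCollection& self, int iterations) {
        for (int i = 0; i < iterations; ++i) {
            std::this_thread::sleep_for(std::chrono::seconds(1));
            if (!self.hasHealthyTeam())
                self.buildTeams();
        }
    }

    int main() {
        TeamCollection tc;
        tc.teamHealthy = {false, false};  // all existing teams unhealthy
        monitorHealthyTeams(tc, 1);       // one tick is enough to trigger a rebuild here
        return tc.hasHealthyTeam() ? 0 : 1;
    }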
Jingyu Zhou 816f8b1ae1 Per review comments
Add a knob for starting distributor delay.
Move distributor failed variable to a local loop.
2019-02-14 16:37:16 -08:00
Jingyu Zhou e0a7162cf8 Add a failure timeout knob for data distributor.
Set default time to 1.0s.
2019-02-14 16:37:16 -08:00
Meng Xu 5481851e82 TeamCollection: Add knobs for team remover
Added three knobs to control team remover

bool TR_FLAG_DISABLE_TEAM_REMOVER:
	Disable the teamRemover actor
double TR_REMOVE_MACHINE_TEAM_DELAY:
	Wait for the specified time before trying to remove the next machine team
double TR_WAIT_FOR_ALL_MACHINES_HEALTHY_DELAY:
	Wait before checking if all machines are healthy
2019-02-13 15:11:56 -08:00
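For reference, a minimal sketch of these knobs as plain constants. The knob names come from the commit message; the struct and the default values are illustrative assumptions, not the actual ServerKnobs definitions.

    #include <cstdio>

    // Illustrative defaults only; the real values live in the server knob definitions.
    struct TeamRemoverKnobs {
        bool   TR_FLAG_DISABLE_TEAM_REMOVER = false;           // turn the teamRemover actor off entirely
        double TR_REMOVE_MACHINE_TEAM_DELAY = 60.0;            // seconds between machine team removals
        double TR_WAIT_FOR_ALL_MACHINES_HEALTHY_DELAY = 60.0;  // seconds before re-checking machine health
    };

    int main() {
        TeamRemoverKnobs knobs;
        std::printf("teamRemover %s, removal delay %.0fs, health-check delay %.0fs\n",
                    knobs.TR_FLAG_DISABLE_TEAM_REMOVER ? "disabled" : "enabled",
                    knobs.TR_REMOVE_MACHINE_TEAM_DELAY,
                    knobs.TR_WAIT_FOR_ALL_MACHINES_HEALTHY_DELAY);
    }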
Meng Xu 214a72fba3 TeamCollection: Resolve review comments
1) Reduce the frequency of checking if we need to call teamRemover
2) Improve code efficiency in finding the machine team to remove
3) Remove unused code
4) Add sanity check
2019-02-12 10:59:57 -08:00
Meng Xu 3b8ae0fe95 TeamCollection: Add into 6.1 release note 2019-02-08 13:50:27 -08:00
Meng Xu 7cfe6de27e TeamCollection: Server team number must match machine team number
DESIRED_TEAMS_PER_MACHINE must equal DESIRED_TEAMS_PER_SERVER.
Otherwise, we may have too few machine teams to create enough server teams.

Note that the BUGGIFY macro's value is based on a random number generator.
When you have two BUGGIFYs, one may be true while the other is false.

Also fix a bug in getting the number of healthy machine teams.
2019-02-07 13:53:55 -08:00
Meng Xu 455024b3fe SimulationTest: Test the number of teams
Magnify the possibility that the number of created machine teams is
larger than the number of desired machine teams if we do NOT try to remove the surplus machine teams.
This helps test the upgrade to machine teams in FDB 6.1
2019-02-06 11:04:41 -08:00
Trevor Clinkenbeard 5822bd65bf Track health metrics in Ratekeeper and send these metrics to proxies in GetRateInfoReply messages 2019-01-31 12:56:58 -08:00
Trevor Clinkenbeard d7930af2cb Storage server periodically calculates cpuUsage and diskUsage metrics. These metrics (as well as all other metrics necessary for health metrics calculation) are sent in the StorageQueuingMetricsReply message. 2019-01-31 12:23:04 -08:00
Trevor Clinkenbeard 5b89db811a Throttle status requests with MAX_STATUS_REQUESTS_PER_SECOND knob, whenever status batching is used. 2019-01-28 15:37:30 -08:00
Evan Tschannen 684a22a52b Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/HTTP.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/BackupCorrectness.actor.cpp
#	versions.target
2019-01-09 16:14:46 -08:00
Evan Tschannen 57293a2db0 byte sample recovery did not use limits for its range reads, leading to slow tasks 2019-01-04 10:32:31 -08:00
Markus Pilman dbe9baff1f Several small compilation fixes for new versions of gcc
There are several missing includes for cmath in the code; I added those.

Next, Coro returns a reference to a stack variable and this causes a
warning. As this is probably ok for Coro, I disabled the warning in
that file for GCC. I want to have this warning in the build system as
it is generally a very useful warning to have.

Another change is that major and minor have been deprecated for a while now.
I replaced those with gnu_dev_major and gnu_dev_minor.

ErrorOr currently implements operators ==, !=, and <. These do not
compile because Error does not implement ==. This compiles on older
versions of gcc and clang because ErrorOr<T>::operator== is not used
anywhere. It is still wrong though and newer gcc versions complain.
I simply removed these methods.

The most interesting fix is that TraceEvent::~TraceEvent is currently
throwing exceptions. This is illegal behavior in C++11 and a bad idea in
older versions of C++. For now I simply removed the throw, but this
might need some more thought.
2019-01-03 12:44:19 -08:00