Commit Graph

6537 Commits

Author SHA1 Message Date
Meng Xu 4ab322f52c Merge branch 'master' into mengxu/storage-engine-switch-PR-v2 2019-08-19 13:06:32 -07:00
Meng Xu b448f92d61 StorageEngineSwitch:Remove unnecessary code and format code
Unnecessary code includes debug code and the unnecessary call to
the removeWrongStoreType actor.

Format the changes with clang-format as well.
2019-08-16 16:53:38 -07:00
Meng Xu 0648388b25 StorageEngineSwitch:Prefer remove SS without causing zero healthy teams 2019-08-16 16:34:23 -07:00
Meng Xu 85ba904e2c StorageEngineSwitch:Stop removeWrongStoreType actor if no SS has wrong storeType 2019-08-16 16:11:28 -07:00
Meng Xu 2859dc57a8 StorageEngineSwitch:Only allow one pending recruitment on a worker 2019-08-16 15:04:11 -07:00
A.J. Beamon 85d0dec585
Merge pull request #1978 from atn34/no-discard
Make cancellable actors [[nodiscard]] by default
2019-08-16 13:47:43 -07:00
Meng Xu 2a7b208df2 StorageEngineSwitch:Call removeWrongStoreType only when necessary
If a cluster does not change its storeType for a while, we do not need to
call the removeWrongStoreType actor periodically.

This is the same way the badTeamRemover actor is handled.
2019-08-16 10:48:53 -07:00
Meng Xu 86b51624a4 StorageEngineSwitch:Check next SS to remove once old one is removed 2019-08-16 10:43:20 -07:00
Andrew Noyes 638c9379c6 Add [[flow_allow_discard]] to dsltest + FlowTests 2019-08-16 09:24:57 -07:00
Andrew Noyes 9d045c51e4 Add suggested nodiscards, and mention UNCANCELLABLE actors 2019-08-16 09:24:57 -07:00
Andrew Noyes 0a1fecfc2e Update flow/actorcompiler/ActorParser.cs
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-08-16 09:24:57 -07:00
Andrew Noyes 356abc2f9a Update flow/actorcompiler/ParseTree.cs
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2019-08-16 09:24:57 -07:00
Andrew Noyes 34dee5f9e3 Pass -Wno-unknown-attributes only for IDE 2019-08-16 09:24:57 -07:00
Andrew Noyes ad9bc06dd7 Make unknown flow attributes an error 2019-08-16 09:24:57 -07:00
Andrew Noyes 5751e8a95b Remove flow_* attributes in actor compiler 2019-08-16 09:24:57 -07:00
Andrew Noyes e4de4783bf Add [[flow_allow_discard]] 2019-08-16 09:24:57 -07:00
Andrew Noyes 1a0ab7854e Omit ActorFuzz etc from OPEN_FOR_IDE build 2019-08-16 09:24:57 -07:00
Andrew Noyes ae7ef12102 Fix nodiscard errors 2019-08-16 09:24:57 -07:00
Andrew Noyes 971b857c51 Avoid using new c sharp features 2019-08-16 09:24:57 -07:00
Andrew Noyes fe6f2f59e0 Suppress what is apparently an intentional memory leak 2019-08-16 09:24:57 -07:00
Andrew Noyes a8cdcff0c2 Change --disable-actor-without-wait-warning to --disable-diagnostics
We probably just want to disable all actor diagnostics for the flow test
files.

Also add --generate-probes to the help text
2019-08-16 09:24:57 -07:00
Andrew Noyes 4b97a7506d Add some prudent [[nodiscard]]'s 2019-08-16 09:24:57 -07:00
Andrew Noyes b17ad0ad64 Enable -Wunused-value 2019-08-16 09:24:57 -07:00
Andrew Noyes 4ebb325ff9 Make cancellable actors [[nodiscard]] by default 2019-08-16 09:24:57 -07:00
Andrew Noyes be0e4e2438 Teach actorcompiler about C++ attributes 2019-08-16 09:24:57 -07:00
A.J. Beamon e152cdd9a9
Merge pull request #2005 from negoyal/ctest_valgrind_option
Add a parameter to enable/disable valgrind for ctest.
2019-08-16 08:00:40 -07:00
Alvin Moore 0e0116af51
Merge pull request #1856 from tclinken/bump-min-cmake-version
Bump CMake minimum required version to 3.13
2019-08-16 05:50:20 -07:00
Meng Xu 0ad68fb89f StorageEngineSwitch:Timeout if to-be-removed SS fails
If the wrong-storeType SS picked for removal fails before
it triggers the next round of checking whether an SS has the wrong storeType,
we should time out and invoke the check.

Otherwise, the removeWrongStoreType actor will never run again.
2019-08-15 17:03:53 -07:00
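The timeout fallback described in this commit can be sketched roughly as below. This is a minimal Python illustration of the general pattern (wait for a completion signal, but give up after a deadline so the next check still runs), not the actual Flow actor code. The function name `wait_for_removal_or_timeout` and its parameters are hypothetical.

```python
import asyncio


async def wait_for_removal_or_timeout(removed, timeout_seconds):
    """Hypothetical helper: wait until `removed` (an asyncio.Event) is set.

    Returns "removed" if the event fires in time, or "timeout" if the
    deadline passes first, so the caller can start the next check either way.
    """
    try:
        await asyncio.wait_for(removed.wait(), timeout=timeout_seconds)
        return "removed"
    except asyncio.TimeoutError:
        return "timeout"


async def demo():
    removed = asyncio.Event()
    # The server never signals removal here, so the wait times out.
    first = await wait_for_removal_or_timeout(removed, 0.01)
    # Once the signal is set, the wait returns immediately.
    removed.set()
    second = await wait_for_removal_or_timeout(removed, 0.01)
    return first, second
```

In either outcome the caller regains control, which is the property the commit message asks for: a failed server cannot block the next round of checking.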
Meng Xu 980fa39c23 StorageEngineSwitch:Speed up removing wrong storeType server in simulation 2019-08-15 15:55:11 -07:00
Meng Xu 794009b242 StorageEngineSwitch:Limit SS num per proc
Multiple storage server recruitment requests may be buffered in
the cluster controller, in the hope that the cluster controller will
soon find an available worker for each request.

It is possible that many outstanding storage recruitment requests are
fulfilled by the cluster controller in a very short time interval.

When DD handles those requests, it blindly initializes a storage server
on the recruited worker and lets the storage server tracker remove
storage servers on the same process (ip, port).

This is problematic because multiple SS on the same process can push
the process to OOM. Even in simulation, initializing too many SS causes
the simulator to OOM.

This commit limits the maximum number of SS on a process to 2.

We cannot limit the number of SS on a process to 1 right now,
because current simulation tests may change the configuration in ways
that fail unless more than 1 SS is allowed on a process.
2019-08-15 14:08:43 -07:00
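The per-process cap described in this commit can be sketched as follows. This is a minimal Python illustration, not the FoundationDB implementation. `RecruitmentLimiter`, `try_recruit`, and `release` are hypothetical names, and only the limit of 2 comes from the commit message.

```python
from collections import defaultdict

MAX_SS_PER_PROCESS = 2  # limit stated in the commit message


class RecruitmentLimiter:
    """Hypothetical helper: counts storage servers per (ip, port) process."""

    def __init__(self, max_per_process=MAX_SS_PER_PROCESS):
        self.max_per_process = max_per_process
        self.count = defaultdict(int)

    def try_recruit(self, ip, port):
        """Return True and record the SS if the process is under the cap."""
        key = (ip, port)
        if self.count[key] >= self.max_per_process:
            return False
        self.count[key] += 1
        return True

    def release(self, ip, port):
        """Record that one SS on the process was removed."""
        key = (ip, port)
        if self.count[key] > 0:
            self.count[key] -= 1


limiter = RecruitmentLimiter()
# Three recruitments arrive for the same process in quick succession.
results = [limiter.try_recruit("10.0.0.1", 4500) for _ in range(3)]
```

Checking the cap before initializing a storage server rejects the excess request up front, instead of initializing it and relying on a later cleanup step to remove it.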
negoyal fe53b5e7a7 Add a parameter to enable/disable valgrind for ctest. 2019-08-15 11:56:34 -07:00
A.J. Beamon 179ca2cc4a
Merge pull request #1995 from lingbin/fix_fdb_doc
Fix an error in configuration.rst
2019-08-15 08:42:12 -07:00
lingbin 745e569723 Fix an error in configuration.rst 2019-08-15 18:09:09 +08:00
A.J. Beamon 86104e9da0
Merge pull request #1998 from xumengpanda/mengxu/suppress-oustanding-recruitment-errors
Suppress spammy warnings
2019-08-14 15:01:42 -07:00
Meng Xu e76f7adeb7 PhysicalDiskMetrics:Remove the random sampling 2019-08-14 13:57:22 -07:00
Meng Xu ea9c8c09f5 Suppress PhysicalDiskMetrics events for 60s
Because the PhysicalDiskMetrics metric is quite stable over short periods of time,
we should suppress it with a 60-second interval to avoid spammy log messages.

Spammy warning logs can cause false positives in correctness tests.
2019-08-14 13:51:20 -07:00
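Interval-based suppression of a repeated event, as described in this commit, might look like the sketch below. It is a minimal Python illustration under stated assumptions, not the FoundationDB trace API. `EventSuppressor` and `should_log` are hypothetical names, and the clock is injectable so the behavior can be checked without waiting. Only the 60-second interval comes from the commit message.

```python
import time


class EventSuppressor:
    """Hypothetical helper: allow each named event at most once per interval."""

    def __init__(self, interval_seconds=60.0, clock=time.monotonic):
        self.interval_seconds = interval_seconds
        self.clock = clock
        self.last_logged = {}  # event name -> time it was last allowed

    def should_log(self, event_name):
        """Return True if the event was not logged within the interval."""
        now = self.clock()
        last = self.last_logged.get(event_name)
        if last is not None and now - last < self.interval_seconds:
            return False
        self.last_logged[event_name] = now
        return True


# Usage with a fake clock so the example does not need to sleep.
fake_now = [0.0]
suppressor = EventSuppressor(60.0, clock=lambda: fake_now[0])
first = suppressor.should_log("PhysicalDiskMetrics")   # logged
fake_now[0] = 30.0
second = suppressor.should_log("PhysicalDiskMetrics")  # suppressed
fake_now[0] = 60.0
third = suppressor.should_log("PhysicalDiskMetrics")   # logged again
```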
Meng Xu 3034a5e0c5 StorageRecruitment:Suppress outstanding req errors
When too many outstanding requests cannot find a worker for the storage server
role, many identical errors are written to the trace log. One error is enough
to flag the problem.

Too many identical errors cause false positives in nightly tests and thus should be suppressed.
2019-08-14 11:31:06 -07:00
Evan Tschannen 4b404503fd
Merge pull request #1994 from etschannen/master
Merge release 6.2 into master
2019-08-13 18:34:38 -07:00
Evan Tschannen 863a7ea84c Merge branch 'release-6.2'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-08-13 18:33:52 -07:00
Evan Tschannen b744c4b06e
Merge pull request #1993 from etschannen/post-release-cleanup-6.2.2
Post release cleanup 6.2.2
2019-08-13 18:30:48 -07:00
Evan Tschannen 1e05a681f8 update installer WIX GUID following release 2019-08-13 18:30:04 -07:00
Evan Tschannen 33a102ffba update versions target to 6.2.3 2019-08-13 18:30:04 -07:00
Meng Xu 4a321f983a StorageEngineSwitch:Periodic check if a server has wrong storeType
If DD checks the storeType of storage servers before some storage servers are
fully available, DD may miss storage servers that need to be removed.

To ensure no storage server with the wrong storeType is missed, SS sets
doRemoveWrongStoreType to true.

To avoid removing multiple servers at the same time, the actor waits for a
configurable delay before checking and removing a storage server.
2019-08-13 17:35:18 -07:00
Evan Tschannen a9bd5ac739
Merge pull request #1992 from etschannen/release-6.2
Updated upgrading notes
2019-08-13 16:41:17 -07:00
Evan Tschannen ccf21b4ed8 Updated upgrading notes 2019-08-13 16:40:32 -07:00
Evan Tschannen 3f8d0c39e7
Merge pull request #1991 from etschannen/prepare-release-6.2.2
update installer WIX GUID following release
2019-08-13 16:34:25 -07:00
Evan Tschannen a366f08e05 update installer WIX GUID following release 2019-08-13 16:33:32 -07:00
Evan Tschannen a067e6e812
Merge pull request #1990 from etschannen/release-6.2
Fixed status reporting bugs related to connected clients
2019-08-13 16:24:12 -07:00
Evan Tschannen c60210f39d Updated documentation for 6.2.2 2019-08-13 16:22:46 -07:00
Evan Tschannen 36658c995f
Merge pull request #1987 from ajbeamon/fix-client-read-version-accounting
Don't count read version requests if we've already started one.
2019-08-13 16:13:08 -07:00