Commit Graph

2034 Commits

Author SHA1 Message Date
Trevor Clinkenbeard 53f8ba499c
Merge branch 'master' into features/sqlite-crc32c 2019-05-24 16:46:32 -07:00
A.J. Beamon 20d83d61db Merge branch 'master' into thread-safe-random-number-generation 2019-05-23 11:07:08 -07:00
Evan Tschannen b451c2cd56
Merge pull request #1497 from alexmiller-apple/fastrecovery
Add an \xff keyrange that is backed by the txnStateStore.
2019-05-23 10:52:35 -07:00
A.J. Beamon f417e60264 Merge branch 'merge-release-6.1-into-master' into thread-safe-random-number-generation
# Conflicts:
#	fdbserver/QuietDatabase.actor.cpp
2019-05-23 09:52:00 -07:00
A.J. Beamon d29c7e4c9b Merge branch 'release-6.1' into merge-release-6.1-into-master
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/QuietDatabase.actor.cpp
#	versions.target
2019-05-23 09:28:45 -07:00
A.J. Beamon e5381e0612 Fix some new usages of g_random 2019-05-23 09:23:27 -07:00
A.J. Beamon 603721e125 Merge branch 'master' into thread-safe-random-number-generation
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/genericactors.actor.cpp
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/workloads/BulkSetup.actor.h
#	flow/ActorCollection.actor.cpp
#	flow/Net2.actor.cpp
#	flow/Trace.cpp
#	flow/flow.cpp
2019-05-23 08:35:47 -07:00
Evan Tschannen 003cc6be18 fix: nothingPersistent could be incorrect when popped is equal to persistentDataVersion 2019-05-22 20:23:35 -10:00
chaoguang c527b1a6b1 renaming function, add comments, fix bugs. 2019-05-22 17:39:36 -07:00
Evan Tschannen 4e12721227 fix: nothingPersistent could be incorrect when popped is equal to persistentDataVersion 2019-05-22 11:23:21 -07:00
Stephen Atherton 0fb8612ef5 debug_printf_noop() was incorrectly defined as a function, which still has a runtime cost of argument evaluation. 2019-05-22 03:40:18 -07:00
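The cost this commit message describes can be illustrated with a small sketch. The names below (`noop_function`, `NOOP_MACRO`, `expensive`) are hypothetical and are not the FoundationDB definitions. The sketch only shows the general C++ behavior: a call to an empty function still evaluates its arguments, while a macro that expands to nothing discards them before compilation.

```cpp
#include <cassert>

// Hypothetical stand-ins, not the FoundationDB definitions.

// A no-op *function*: the body is empty, but every argument expression
// is still evaluated at the call site before the call happens.
inline void noop_function(...) {}

// A no-op *macro*: the preprocessor drops the arguments entirely, so
// they are never evaluated.
#define NOOP_MACRO(...) ((void)0)

static int evaluations = 0;

// Stands in for an argument that is costly to compute.
inline int expensive() {
    ++evaluations;
    return 42;
}

// Returns how many times expensive() ran when passed to the function.
inline int evaluationsViaFunction() {
    evaluations = 0;
    noop_function("%d", expensive());
    return evaluations;
}

// Returns how many times expensive() ran when passed to the macro.
inline int evaluationsViaMacro() {
    evaluations = 0;
    NOOP_MACRO("%d", expensive());
    return evaluations;
}
```

In this sketch `evaluationsViaFunction()` reports one evaluation and `evaluationsViaMacro()` reports none.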
Stephen Atherton f99c36aad2 Fixed merge mistake. 2019-05-22 00:23:31 -07:00
Stephen Atherton ebc96a7e0e Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/VersionedBTree.actor.cpp
2019-05-21 23:49:27 -07:00
Stephen Atherton e9197a8f70 Added time limit. 2019-05-21 22:19:14 -07:00
Stephen Atherton 3f8fce0296 Checkpointing progress on single-version mode in VersionedBTree. Subtree clears now work, preserving internal page boundary keys when necessary. Multi-version mode is unfortunately now broken, in addition to being incomplete. Added serial and simple btree unit test options. 2019-05-21 19:16:32 -07:00
chaoguang 57968d9df7 Merge branch 'master' of https://github.com/apple/foundationdb into MakoWorkload 2019-05-21 16:24:11 -07:00
chaoguang 0bbcc75e4b fix bug 2019-05-21 16:22:02 -07:00
Evan Tschannen a686402671 Merge branch 'feature-pop-diskqueue' into feature-slow-storage-failure 2019-05-21 15:19:06 -07:00
Evan Tschannen 9604452e50 mistakenly changed a quiet database parameter 2019-05-21 15:17:46 -07:00
Evan Tschannen 90fe085696 fix: the healthyZone needs to be checked again once the timeout is expected to have elapsed 2019-05-21 13:49:16 -07:00
Evan Tschannen a8e8be5aac added a wait failure client which always waits the full failure reaction time, even if it knows the interface is never coming back
use this new wait failure client in data distribution, to give time for a storage server to rejoin the cluster after its interface fails
2019-05-21 11:54:17 -07:00
Evan Tschannen f4b18f2c4f fixed whitespace 2019-05-21 11:31:34 -07:00
Evan Tschannen 23091a7d96 fixed review comments 2019-05-21 10:53:36 -07:00
Evan Tschannen ee04c583fa fix: do not pop the disk queue past the persistentDataVersion 2019-05-21 10:40:30 -07:00
Evan Tschannen 4059d68348 fix: the tlog would not pop data from the disk queue after a storage server was removed, because the tag still exists in memory on the logs
fix: we could incorrectly make data durable if eraseMessagesFromMemory was in progress while running updatePersistentData
the quiet database check now ensures that tlogs have no more than 30 seconds of versions unpopped from the disk queue
2019-05-20 23:58:45 -07:00
chaoguang 12a51b2d39 fix bugs, update naming and comments, refine functions 2019-05-20 18:26:30 -07:00
Evan Tschannen f4fbaac6b0 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-05-19 10:27:59 -07:00
A.J. Beamon a8b9d8e34b
Merge pull request #1336 from tclinken/fast-allocate-ptree-nodes
Create 96-byte fast allocator for storage queue PTree nodes
2019-05-17 14:22:46 -07:00
Steve Atherton 5a8c97480a
Merge pull request #1506 from nikolas-ioannou/feature-pagecache-lru
AsyncFileCached: switch from a random to an LRU cache eviction policy
2019-05-17 13:42:21 -07:00
Jingyu Zhou b8e7fc1b84 Refactor: add std:: qualifier and use emplace_back 2019-05-17 09:38:50 -10:00
Trevor Clinkenbeard 12ff747e6a Avoid tracing in PageChecksumCodec::checksum if silent flag is set 2019-05-17 10:49:53 -07:00
Trevor Clinkenbeard 3fac380b90 Avoid tracing in PageChecksumCodec::checksum if silent flag is set 2019-05-17 10:43:28 -07:00
Alvin Moore 22fa0fa1d4
Merge pull request #1599 from AlvinMooreSr/winproject-update
Upgraded Windows Tools within projects to 2017
2019-05-17 03:07:39 -07:00
Trevor Clinkenbeard 20e93c67ea Allow sqlite pages to be checked for CRC32 checksum
Future versions of FDB will write sqlite pages with CRC32 checksums. In
order to roll back to this version from a version that writes CRC32
checksums, this version must be able to verify those checksums.
2019-05-17 01:05:06 -07:00
Alvin Moore 3acaa7343e Enabled C++17 for all Windows projects
Set Visual Studio version to 2017 (first version to support C++17)
2019-05-16 17:44:13 -07:00
Paul J. Davis 53b97fe506 Extend support for parentpid
This adds support for the `--parentpid` option to non-Windows platforms.
This option is intended for testing layer implementations. When running
higher level CI chains it's useful to ensure that any ephemeral instances
of fdbserver are automatically reaped.
2019-05-16 14:24:11 -10:00
Trevor Clinkenbeard d7bcbe1210 Refactored PageChecksumCodec::checksum 2019-05-16 16:07:35 -07:00
Trevor Clinkenbeard 90d886df95 Trace both hashlittle2 and crc32 checksums for SQLitePageChecksumFailure 2019-05-16 15:51:21 -07:00
Alvin Moore 94aed513c7 Switched Windows tools within projects to 2017 2019-05-16 15:05:11 -07:00
Trevor Clinkenbeard 04a72bdad6 Eliminate duplicate code in PageChecksumCodec::checksum 2019-05-16 11:09:37 -07:00
Trevor Clinkenbeard aca90cd4e2 Don't use memcpy in PageChecksumCodec::checksum 2019-05-16 07:25:58 -07:00
chaoguang 6788c8eb7d update cleanup process 2019-05-15 16:17:01 -07:00
chaoguang 106bb7677d update 2019-05-15 12:58:12 -07:00
Alex Miller 658e61b394 And now use spilledOnly as a hint to do parallel peeks.
If there's some spilled data, there's probably a lot of spilled data,
and now we can pull all of it faster.
2019-05-14 21:03:44 -10:00
Alex Miller 69fb852ee0 Add more CLOEXEC-like things.
From missed call sites found during/after code review.
2019-05-14 20:30:58 -10:00
Alex Miller 4eb4c03ce5 Save TLog resources by letting peek request only spilled data.
If a peek is entirely fulfilled from spilled data, then it's likely that
the next peek will be also.  It is thus wasteful for each of these peeks
to call peekMessagesFromMemory, which memcpy's excessively, and then
throw all that data away without using it.

Now, TLogs will give a hint back to peek cursors about if the provided
reply was served entirely from the spilled data, which peek cursors then
feed back as the hint into their next request.

At some point, a cursor will send a request for only spilled data, get
an incomplete response, and then be told to send its next request as one
that peeks from memory as well, and then it will fully catch up.
2019-05-14 15:38:48 -10:00
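The hint feedback loop described in this commit message can be sketched with a toy model. Everything here (`ToyTLog`, `ToyPeekRequest`, `ToyPeekReply`, the version ranges, the batch size) is hypothetical and much simpler than the real TLog; it only illustrates how a reply's hint becomes the next request's hint, and how an incomplete spilled-only reply switches the cursor back to peeking memory.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical toy model; names and structure are illustrative only.
struct ToyPeekRequest {
    int64_t begin;     // first version wanted
    bool onlySpilled;  // hint: skip the in-memory peek
};

struct ToyPeekReply {
    int64_t end;       // next version the cursor should ask for
    bool onlySpilled;  // hint to feed into the next request
};

struct ToyTLog {
    int64_t spilledUntil;  // versions [0, spilledUntil) are spilled
    int64_t memoryUntil;   // versions [spilledUntil, memoryUntil) are in memory
    int64_t batch;         // maximum versions returned per reply
    int memoryPeeks;       // counts the costly in-memory peeks

    ToyTLog(int64_t spilled, int64_t memory, int64_t batchSize)
      : spilledUntil(spilled), memoryUntil(memory), batch(batchSize), memoryPeeks(0) {}

    ToyPeekReply peek(const ToyPeekRequest& req) {
        // Without the hint, memory is peeked up front, even if the reply
        // ends up being filled from spilled data alone.
        if (!req.onlySpilled) ++memoryPeeks;

        const int64_t want = req.begin + batch;
        if (want <= spilledUntil) {
            // Served entirely from spilled data: keep asking for spilled only.
            return ToyPeekReply{ want, true };
        }
        if (req.onlySpilled) {
            // Incomplete reply: spilled data ran out, so the next request
            // must peek from memory as well.
            return ToyPeekReply{ std::max(req.begin, spilledUntil), false };
        }
        return ToyPeekReply{ std::min(want, memoryUntil), false };
    }
};

// Drives one cursor until it has caught up and returns the number of
// in-memory peeks the toy log performed along the way.
inline int memoryPeeksToCatchUp(int64_t spilled, int64_t memory, int64_t batch) {
    ToyTLog log(spilled, memory, batch);
    ToyPeekRequest req{ 0, false };
    while (req.begin < memory) {
        ToyPeekReply reply = log.peek(req);
        req = ToyPeekRequest{ reply.end, reply.onlySpilled };
    }
    return log.memoryPeeks;
}
```

With 10 spilled versions, 2 further versions in memory, and a batch of 4, the toy cursor issues four requests, of which only the first and the last touch memory.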
Trevor Clinkenbeard 601c38ad82 Use crc32 for sqlite page checksums 2019-05-14 13:43:55 -07:00
chaoguang 4c9cc44c73 add paras 2019-05-14 10:13:13 -07:00
mpilman 46e7a0ca56 address reviews and make compile with `-Wunused-variable` 2019-05-13 14:15:23 -07:00
mpilman 57912b33a5 fixed merge error 2019-05-13 14:15:23 -07:00
mpilman 96aaa31a6c Compiling on clang again 2019-05-13 14:15:23 -07:00
mpilman 20c3f7f264 remove mixed-mode support 2019-05-13 14:15:23 -07:00
mpilman 42385c2f81 Fixed issues introduced during rebase 2019-05-13 14:15:23 -07:00
mpilman f6fbad5061 Fix memory bug 2019-05-13 14:15:23 -07:00
mpilman 44db3450ec Several flatbuffers bug fixes 2019-05-13 14:15:23 -07:00
mpilman 9c02354255 pass NDEBUG to sqlite to enable debug mode 2019-05-13 14:15:23 -07:00
mpilman 69fa3d3903 fixed compilation issues after rebase 2019-05-13 14:15:23 -07:00
mpilman 642a96807b Fixed compilation issues after rebase 2019-05-13 14:15:22 -07:00
mpilman 6afce01744 Implementation complete (not yet working) 2019-05-13 14:15:22 -07:00
mpilman 92bad76479 Wrap ClusterClientInterface into its own type
When a process joins a cluster it fetches the cluster
interface. However, not the whole interface is exposed
to the client. This mechanism relies on the fact that
the serializer keeps the field ordering and doesn't
verify the message before parsing it.

To make this work, we provide a client type with one
member (the ClusterInterface which is exposed to the
client and the server). This client interface has the
same FileIdentifier as the ClusterControllerFullInterface
which has the same first member. This works because
FlatBuffers allows for members to be missing.
2019-05-13 14:15:22 -07:00
mpilman 9eeb48c43d Allow to turn on object serializer
This commit includes functionality to turn on
the object serializer for network communication.
This is done the following way:

- On incoming connections, a process will detect
  whether the client supports the object serializer
  and will only serialize responses with it, if it does
- On outgoing connections, the command line flag is used
  to determine whether the object serializer should be used
  to send data.

This way, a cluster can run in mixed mode. To upgrade one
can upgrade one process at a time and set the flag one process
at a time.

This is how this is tested on the simulator:
- The command line flag can take three options: on, off,
  and random.
- For off, the object serializer will never be used.
- For on, the object serializer will always be used.
- For random, the simulator will flip a coin for each
  process it starts up.
2019-05-13 14:15:22 -07:00
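The decision rules listed in this commit message can be written down as a short sketch. The enum and function names are hypothetical, and how the real code parses the flag or detects peer support is not shown in this log; the sketch only mirrors the three flag values and the incoming/outgoing split described above.

```cpp
#include <random>

// Hypothetical names; the actual flag parsing is not shown in this log.
enum class ToySerializerFlag { Off, On, Random };

// Outgoing connections: the flag decides. For Random, a coin is flipped,
// which the commit message describes the simulator doing once per process.
inline bool useObjectSerializerForOutgoing(ToySerializerFlag flag, std::mt19937& rng) {
    switch (flag) {
        case ToySerializerFlag::Off:
            return false;
        case ToySerializerFlag::On:
            return true;
        case ToySerializerFlag::Random:
            return std::bernoulli_distribution(0.5)(rng);
    }
    return false;  // not reached for valid enum values
}

// Incoming connections: reply with the object serializer only if the
// peer was detected to support it.
inline bool useObjectSerializerForReply(bool peerSupportsObjectSerializer) {
    return peerSupportsObjectSerializer;
}

// Convenience wrapper so the outgoing decision can be tried with a fixed seed.
inline bool outgoingWithSeed(ToySerializerFlag flag, unsigned seed) {
    std::mt19937 rng(seed);
    return useObjectSerializerForOutgoing(flag, rng);
}
```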
mpilman ba83c458a6 types implemented 2019-05-13 14:15:22 -07:00
Nikolas Ioannou 067cdf9cde Simplified cache eviction policy knob arg check. 2019-05-13 08:50:04 +02:00
Evan Tschannen 8c3516951a Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-05-12 20:13:49 -07:00
Alex Miller 4a7e0319c7 Refactor away pushlock.
Pushing was already a serialized, sequential operation.

Instead make it explicit that there are two waits as part of a push:
1. The setup work to reserve a spot in the file
2. The work of writing and sync'ing the data

And we return a Future<Future<Void>> to force these to be done sequentially.
2019-05-10 20:30:52 -10:00
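Why a nested return type forces the two waits to happen in order can be shown with a synchronous analogy. This sketch does not use Flow's `Future` at all: it substitutes nested `std::function` thunks, and `ToyQueue`, `Step`, and `TwoStep` are hypothetical names. The shared idea is that the handle for the second step only exists once the first step has completed.

```cpp
#include <functional>
#include <string>
#include <vector>

// Synchronous analogy only; these are not Flow futures.
using Step = std::function<void()>;
using TwoStep = std::function<Step()>;

struct ToyQueue {
    std::vector<std::string> log;

    // Returns a thunk for step 1 (reserve). Running it yields the thunk for
    // step 2 (write and sync), so step 2 cannot be started before step 1.
    TwoStep push(const std::string& data) {
        return [this, data]() -> Step {
            log.push_back("reserve " + data);
            return [this, data]() { log.push_back("write+sync " + data); };
        };
    }
};

// Runs both steps of one push and returns the order in which they executed.
inline std::vector<std::string> runPush() {
    ToyQueue q;
    TwoStep reserve = q.push("page");
    Step writeAndSync = reserve();  // step 1 completes here
    writeAndSync();                 // step 2 completes here
    return q.log;
}
```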
Alex Miller ea12a54946 Rename DISK_QUEUE_MAX_TRUNCATE_EXTENTS -> ..._BYTES
So as to not make filesystem assumptions.  This knob did technically
appear in (only the) 6.1.5 release, but this feature was broken in 6.1.5,
and thus impossible to use anyway.
2019-05-10 18:26:22 -10:00
Alex Miller c95d09f9fd Convert truncate(0) to truncate(4KB) on Windows.
Blindly, in case Windows doesn't like 0 length truncates too.
2019-05-10 14:55:11 -10:00
Alex Miller c502ed3d15 Fix a variety of problems stemming from a wait() being added to push().
And that this code was previously insufficiently tested.
2019-05-10 14:55:11 -10:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
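The pattern this commit message describes (accessor functions returning `thread_local` generators, with only the deterministic one being seedable and requiring a seed on each thread) might look roughly like the sketch below. All names prefixed with `toy` or `Toy` are hypothetical, and `std::mt19937_64` is an assumption standing in for whatever generator the real code uses.

```cpp
#include <cstdint>
#include <memory>
#include <random>
#include <stdexcept>

// Hypothetical generator type; std::mt19937_64 is an assumption.
struct ToyRandom {
    std::mt19937_64 engine;
    explicit ToyRandom(uint64_t seed) : engine(seed) {}
    uint64_t next() { return engine(); }
};

// One slot per thread; empty until that thread seeds it.
inline std::unique_ptr<ToyRandom>& toySeededSlot() {
    thread_local std::unique_ptr<ToyRandom> slot;
    return slot;
}

// Seeds (or re-seeds) the deterministic generator of the calling thread.
inline void toySeedThisThread(uint64_t seed) {
    toySeededSlot().reset(new ToyRandom(seed));
}

// Deterministic generator: refuses to run on a thread that never seeded it.
inline ToyRandom& toyDeterministicRandom() {
    std::unique_ptr<ToyRandom>& slot = toySeededSlot();
    if (!slot) throw std::logic_error("deterministic generator not seeded on this thread");
    return *slot;
}

// Nondeterministic generator: self-seeded per thread, with no seeding API.
inline ToyRandom& toyNondeterministicRandom() {
    thread_local ToyRandom generator(std::random_device{}());
    return generator;
}

// Seeds the calling thread and returns the first value drawn afterwards.
inline uint64_t firstValueAfterSeeding(uint64_t seed) {
    toySeedThisThread(seed);
    return toyDeterministicRandom().next();
}
```

Because each thread owns its generator, no locking is needed, and re-seeding a thread with the same value reproduces the same sequence on that thread.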
Chaoguang 5678a7417e Mako Workload 2019-05-09 15:55:05 -07:00
Alex Miller 510b0b2fcd Fix DiskQueue not replaceFile'ing frequently enough for the final time. 2019-05-08 23:08:25 -10:00
Alex Miller c6c33a4daa Make replaceFile more likely to be tested. 2019-05-08 21:23:42 -10:00
Alex Miller 0d0f54d1e6 Fix IAsyncFileSystem::open() flags to stop a crash.
OPEN_ATOMIC_WRITE_AND_CREATE was missing a required OPEN_CREATE.

I'm honestly baffled how this was missed in testing.
2019-05-08 21:22:40 -10:00
Alex Miller b50926c792 replaceFile is truncate(0) on windows 2019-05-08 21:22:14 -10:00
Evan Tschannen 22499666d0 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/LogRouter.actor.cpp
#	flow/Trace.cpp
#	versions.target
2019-05-08 18:19:35 -07:00
Alex Miller e4ba2f5788 Add an ending TraceEvent. 2019-05-08 12:35:12 -10:00
Alex Miller c093017c2f Add a TraceEvent and release note. 2019-05-08 12:34:25 -10:00
Alex Miller 0685e6c1c7 Avoid large truncates in the DiskQueue.
And instead create a new file while incrementally truncating the old one
down.  This avoids queueing up a massive number of filesystem metadata
operations in one call, thus flooding the disk with requests and
stalling out all other filesystem operations.

This sets the knobs so that a truncate of >10GB causes us to create a
new file rather than trying to truncate the old one.
2019-05-08 12:33:31 -10:00
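The policy in this commit message can be sketched as a size check plus a stepwise truncation schedule. The names are hypothetical, the step size is a free parameter invented for illustration, and reading ">10GB" as 10 GiB is an assumption; the real knob names and values are not shown in this message.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical names throughout.
enum class ToyShrinkPlan { TruncateInPlace, ReplaceFile };

// The commit message says ">10GB"; treating that as 10 GiB is an assumption.
constexpr int64_t kToyMaxTruncateBytes = 10LL * 1024 * 1024 * 1024;

// A shrink larger than the threshold switches to creating a new file.
inline ToyShrinkPlan planShrink(int64_t currentBytes, int64_t targetBytes) {
    const int64_t toDrop = currentBytes - targetBytes;
    return toDrop > kToyMaxTruncateBytes ? ToyShrinkPlan::ReplaceFile
                                         : ToyShrinkPlan::TruncateInPlace;
}

// Sizes the old file passes through when it is truncated down in steps
// instead of in one large call.
inline std::vector<int64_t> incrementalTruncateSizes(int64_t currentBytes,
                                                     int64_t targetBytes,
                                                     int64_t stepBytes) {
    std::vector<int64_t> sizes;
    if (stepBytes <= 0) return sizes;  // guard against a non-terminating loop
    while (currentBytes > targetBytes) {
        currentBytes = std::max(targetBytes, currentBytes - stepBytes);
        sizes.push_back(currentBytes);
    }
    return sizes;
}
```

For example, shrinking from 10 to 3 units in steps of 4 passes through sizes 6 and 3, so each individual truncate call stays bounded by the step.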
Alex Miller 36dfbf4fb3 Only truncate DiskQueues down to TLOG_HARD_LIMIT*2.
DiskQueue shrinking was implemented for spill-by-reference, as now
a DiskQueue could grow "unboundedly" large.

Without a minimum file size, write burst workloads would cause the
DiskQueue to shrink down to 100MB, and then grow back to its usual ~4GB
size in a cycle.  File growth means filesystem metadata mutations, which
we'd prefer to avoid if possible since they're more unpredictable in
terms of latency.

In a healthy cluster, the TLog never spills, so the disk of a single
DiskQueue file should stay less than 2*TLOG_SPILL_THRESHOLD.  In the
worst case of spill-by-value, the DiskQueue could grow to
2*TLOG_HARD_LIMIT.  Therefore, having this limit will cause DiskQueue
shrinking to never behave sub-optimally for spill-by-value, and will
cause the DiskQueue files to return to the optimal size with
spill-by-reference.
2019-05-08 12:33:31 -10:00
Alex Miller a269a784cc Convert push() into an actor. 2019-05-08 12:33:31 -10:00
Evan Tschannen 68c773987c
Merge pull request #1544 from etschannen/release-6.1
The team tracker does not provide data movement priority information for non-failure related data movement
2019-05-08 11:39:17 -07:00
Balachandar Namasivayam d45e7bf0b1 Addressed review comments 2019-05-07 17:19:59 -07:00
Evan Tschannen d9a4553270 fix: The team tracker does not provide data movement priority information for non-failure related data movement 2019-05-07 17:06:54 -07:00
Balachandar Namasivayam 5d824f5fbc Address review comments 2019-05-07 17:06:52 -07:00
Nikolas Ioannou 5793b1a55e Validate cache eviction policy value after knob args have been set. 2019-05-07 08:32:57 +02:00
Balachandar Namasivayam a0cc3d98a1 Add a workload to trigger repeated recoveries. 2019-05-06 18:16:44 -07:00
Austin Seipp bf378952cb fdbserver: fix some print/scan format warnings
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Evan Tschannen 93eb2a9395
Merge pull request #1527 from alexmiller-apple/tstlog-6.1
Spill-by-reference knob + TLog6.0 Spilled Peek deprioritization
2019-05-03 17:19:45 -07:00
Alex Miller c918b21137 Deprioritize spilled peeks in spill-by-value, and improve its logic.
This deprioritizes before calling peekMessagesFromMemory, which should
improve the memory usage of the TLog, and makes sure to keep txsTag
peeks at a high priority to help recoveries stay fast.
2019-05-03 15:27:11 -07:00
Alex Miller 4052f3826a Add a knob to limit the number of commits indexed per key.
Theoretically, we could spill 20MB of 22B mutations for one key, which
would generate a very long value being stored in SQLite, and very
inefficiently read back.  This stops that from being a problem, at the
cost of some extra write calls.
2019-05-03 15:27:10 -07:00
Evan Tschannen 12088119d2
Merge pull request #1517 from alexmiller-apple/tstlog-6.1
Add a knob to limit amount of data read from sqlite for one PeekRequest.
2019-05-03 11:01:11 -07:00
Alex Miller f4e48c3851 Add a knob to limit amount of data read from sqlite for one PeekRequest.
This prevents peeking from degrading over time if there are a very large
number of SpilledData entries for one particular tag.
2019-05-02 17:26:45 -07:00
Evan Tschannen c91ac03ec6 LogRouterStats did not need to be a separate struct 2019-05-02 17:24:39 -07:00
Evan Tschannen 8590b710bf added additional logging on the logs and log routers 2019-05-02 17:24:39 -07:00
Jingyu Zhou e193cac5ef Merge remote-tracking branch 'apple/master' into tlog
Resolve Conflicts: fdbserver/MasterProxyServer.actor.cpp
2019-05-01 17:18:00 -07:00
Evan Tschannen 2d5043c665 Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-04-30 18:27:04 -07:00
Stephen Atherton 2801298ae8 Checkpointing incomplete and correctness-breaking progress on adding single-version mode to VersionedBTree. 2019-04-29 17:00:29 -07:00
Jingyu Zhou 8b5449e608 Fix review comments for PR #1473 2019-04-29 16:45:42 -07:00
Evan Tschannen 1a4c1759a4
Merge pull request #1429 from jzhou77/pprof
Dump heap profiler when memory usage is high
2019-04-29 16:31:44 -07:00
Alex Miller f367385a80 Add clearing 2019-04-29 15:10:52 -07:00
Evan Tschannen cacd82758e Reduced data distribution speeds 2019-04-26 13:54:49 -07:00
Evan Tschannen 9ff8aca1da Increased the SQLITE_CHUNK_SIZE to 100MB (left at 4MB for simulation) 2019-04-26 13:53:56 -07:00
Evan Tschannen 1f37f82b87 invalid knob overrides do not prevent fdbserver from starting 2019-04-25 17:08:13 -07:00
Evan Tschannen 6c77864731 separate GetStorageServerRejoinInfoRequest from GetKeyServerLocationsRequest, to avoid yielding for the rejoin requests 2019-04-25 17:07:35 -07:00
Alex Miller 797d431934 Add an \xff keyrange that is backed by the txnStateStore. 2019-04-25 17:04:20 -07:00
Trevor Clinkenbeard d339becd7c Fix currentRate calculation for local ratekeeper 2019-04-25 15:35:34 -07:00
Jingyu Zhou 5462f560e7 Add pseudo locality for log routers and tlogs
This changes the logic of pop operations from log routers (LG):
- LG pops tagLocalityLogRouterMapped from TLogs;
- TLog converts tagLocalityLogRouterMapped back to tagLocalityLogRouter before
  popping.

Later when we add more pseudo localities, the same pattern can be used.
2019-04-23 21:35:56 -07:00
A.J. Beamon 253d2400ef Merge branch 'release-6.1' into speed-up-and-parameterize-spring-cleaning
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-04-23 14:38:52 -07:00
A.J. Beamon ea7abff9df Clean up from review 2019-04-23 14:16:52 -07:00
A.J. Beamon 4ad0496b39 Increase the frequency that lazy deletes are run. Add more parameters for better control over the spring cleaning process. 2019-04-23 14:01:51 -07:00
Stephen Atherton df0548503d Merge branch 'release-6.1' of https://github.com/apple/foundationdb into sqlite-grow-bigger 2019-04-23 13:43:58 -07:00
A.J. Beamon e0f76edf77
Merge pull request #1471 from AlvinMooreSr/release-6.1-merge
Merge Release 6.1 Into Master
2019-04-23 11:08:21 -07:00
Stephen Atherton 83db547306 Implemented the chunk size and db size hint fileControl options in our SQLite VFS implementation. KeyValueStoreSQLite now sets file chunk size based on a new knob, SQLITE_CHUNK_SIZE_PAGES. 2019-04-23 04:50:58 -07:00
Evan Tschannen e0f7ec96aa Data distribution needs to build new teams as old teams are removed to ensure data remains balanced across servers 2019-04-22 17:29:46 -07:00
Jingyu Zhou 439d5a3843 Use emplace_back instead of push_back in Proxy 2019-04-22 14:03:48 -07:00
Jingyu Zhou d2b215b926 Refactor tag population of ServerCacheInfo 2019-04-22 11:55:04 -07:00
Jingyu Zhou 7cb61c766b Fix tLogLocalities for current LogSet
In toCoreState(), the serialization of current LogSet is different from old
TLog sets. The locality data should be generated, not copied over.

Found by:
-r simulation --crash -f tests/fast/KillRegionCycle.txt -s 254666356 -b on
2019-04-21 10:41:07 -07:00
Jingyu Zhou 8b67da57bb Fix upgrade test failure
Serialize pseudoLocalities if protocol version is larger than 0x0FDB00B061060001LL.
Note this version may need to be changed to "currentProtocolVersion" when merging
into the master, and "currentProtocolVersion" should be incremented.
2019-04-21 10:41:07 -07:00
Jingyu Zhou 9e8ffd2ff7 Refactor OldLogData ctor 2019-04-21 10:41:07 -07:00
Jingyu Zhou 97986a28b7 Replace push_back with emplace_back for efficiency
And better code readability.
2019-04-21 10:41:07 -07:00
Jingyu Zhou 010f825aff Remove pseudoLocalities from LogSet, TLogSet, and CoreTLogSet 2019-04-21 10:41:07 -07:00
Jingyu Zhou 7befce6bf1 More pseudoLocalities and refactors. 2019-04-21 10:41:07 -07:00
Jingyu Zhou 66000a07a5 Use emplace_back instead of push_back 2019-04-21 10:41:07 -07:00
Jingyu Zhou 966ec30fcc Add pseudoLocalities for special tag consumers 2019-04-21 10:41:07 -07:00
Jingyu Zhou b4e7e7a85b Refactor StorageCache updates 2019-04-21 10:41:07 -07:00
Jingyu Zhou 82ec80c42f Refactor TLogSet ctor 2019-04-21 10:41:07 -07:00
Jingyu Zhou d19b0cf1c1 Refactor LogSet with two new constructors 2019-04-21 10:41:07 -07:00
Jingyu Zhou 0b1984978a Small code refactoring. 2019-04-21 10:41:07 -07:00
Jingyu Zhou ec1bc5cfca Add LogSystemType enum 2019-04-21 10:41:07 -07:00
Jingyu Zhou 6870e132b2
Merge branch 'master' into pprof 2019-04-19 14:06:44 -07:00
Andrew Noyes d1e86779a6 Address review comments 2019-04-18 08:48:27 -07:00
Andrew Noyes 5af8208c62 Fix JavaWorkload unused variable 2019-04-17 16:29:22 -07:00
Andrew Noyes ef04471a66 Fix more unused-variable warnings 2019-04-17 16:04:10 -07:00
Alvin Moore 2bea99591e Merge branch 'release-6.1' of copy of master
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2019-04-17 15:51:48 -07:00
Andrew Noyes 13ba915a19 Fix more unused variable warnings 2019-04-17 15:38:08 -07:00
A.J. Beamon 43533b3d72 Don't validate the shard size estimate unless enough keys are sampled with a less than 100% probability. 2019-04-17 11:01:23 -07:00
Trevor Clinkenbeard 3426205167 Fixed readGuard usage bug 2019-04-16 15:05:57 -07:00
Trevor Clinkenbeard 1d921da170 readGuard sends server_overloaded error if request is rejected 2019-04-16 11:29:01 -07:00
Trevor Clinkenbeard 8a7d9afbe9 Fixed name of LocalRatekeeperWorkloadFactory 2019-04-16 10:36:09 -07:00
Trevor Clinkenbeard 0594154644 Fixed getPenalty calculation 2019-04-16 10:17:41 -07:00
Andrew Noyes baa3e806ef Address review comments from #1446 2019-04-16 09:48:15 -07:00
Andrew Noyes 6207d724f8 Fix all -Wunused-variable warnings 2019-04-15 18:13:00 -07:00
Evan Tschannen cd5c9d91fa
Merge pull request #1443 from etschannen/master
Merge 6.1 into master
2019-04-10 17:43:07 -07:00
Balachandar Namasivayam 04e9aa6afd For small clusters that are growing quickly, it could happen that the rateLimit is set to a low value and it would take very long to read the entire database. Fix this by setting the rateLimit to the maximum allowed value if reading the entire database is taking a long time. 2019-04-10 17:13:37 -07:00
Jingyu Zhou ab834c4f7e Move profiling option help message to devhelp 2019-04-09 13:26:12 -07:00
Evan Tschannen 6220a5ce0f
Merge pull request #1370 from jzhou77/fix-unreferenced
Remove unused functions
2019-04-09 11:49:45 -07:00
A.J. Beamon 058d028099
Merge pull request #1301 from mpilman/features/cheaper-traces
Defer formatting in traces to make them cheaper
2019-04-09 10:11:04 -07:00
Evan Tschannen 21c0ba555c Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	versions.target
2019-04-08 18:38:42 -07:00
Evan Tschannen d126730b4d fixed a spurious test error where process_behind was treated as an error 2019-04-08 17:09:54 -07:00
A.J. Beamon 538b431656 Apply suggestions from code review 2019-04-08 14:55:58 -07:00
A.J. Beamon a7288e1325 Throw process_behind instead of future_version when all storage nodes on a team are behind. process_behind gets the same backoff behavior as not_committed. Add proxy_memory_limit_exceeded to the retryable predicate. 2019-04-08 14:21:24 -07:00
mpilman 789cd67bcd Don't compile JavaWorkload by default 2019-04-08 13:06:29 -07:00
mpilman c45fe8c697 Fixed typo 2019-04-08 11:33:45 -07:00
Trevor Clinkenbeard b286102d34 Update fdbserver/workloads/LocalRatekeeper.actor.cpp
Co-Authored-By: mpilman <markus@pilman.ch>
2019-04-08 11:06:17 -07:00
mpilman d2e74cb2c0 Fix stupid rounding error 2019-04-08 11:05:29 -07:00
mpilman aaa8f73bdc fixed missing refactoring code 2019-04-08 11:05:29 -07:00
mpilman bdba8e22eb Added test and bugfixes 2019-04-08 11:05:29 -07:00
mpilman b944e0b116 generalized read guards, allow for penalty+error 2019-04-08 11:04:44 -07:00
mpilman 207049e852 fixed serialization 2019-04-08 11:04:44 -07:00
mpilman 32393ec4c9 Prototype of local ratekeeper 2019-04-08 11:04:44 -07:00
Evan Tschannen 05869a8383 do not log a degraded reset message if the previous reset was more than a week ago 2019-04-07 23:00:58 -07:00
Jingyu Zhou 4b08042a88 Change memory profiling threshold to a flag 2019-04-05 16:33:51 -07:00
Andrew Noyes d7612a4426 Fix OPEN_FOR_IDE build errors 2019-04-05 16:30:42 -07:00
Jingyu Zhou 09b2c35d11 Dump heap profiler when memory usage is high
Set the threshold of dump to 2GB.
2019-04-05 16:12:23 -07:00
mpilman d01cbf3455 Addressed code review comments 2019-04-05 13:12:20 -07:00
mpilman 4287b1d2a1 resolved minor merge issues 2019-04-05 13:12:19 -07:00
A.J. Beamon 614a599a04 Update fdbserver/SimulatedCluster.actor.cpp
Co-Authored-By: mpilman <markus@pilman.ch>
2019-04-05 13:12:19 -07:00
mpilman 39ecbedd74 Fixed compilation errors on OS X & gcc8 2019-04-05 13:12:19 -07:00
mpilman 1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
mpilman ea67b742c7 Implemented Traceable for printable types 2019-04-05 13:12:19 -07:00
mpilman bb82f8560a process all volatile ints correctly in traces 2019-04-05 13:12:19 -07:00
mpilman 02e3b634fb Compile sqlite with NDEBUG so we can debug 2019-04-05 13:12:19 -07:00
mpilman c008e16c81 Defer formatting in traces to make them cheaper
This is the first part of making `TraceEvent` cheaper. The main idea is
to defer calls to any code that formats strings. These are the main
changes:

- TraceEvent::detail now takes a c-string instead of std::string for
  literals. This prevents unnecessary allocations if the trace is not
  going to be printed in the first place (for example for SevDebug).
  Before that `detail` expected a `std::string` as key, which meant that
  any string literal would be copied on each call.
- Templates Traceable and SpecialTraceMetricType. These templates can be
  specialized for any type that needs to be printed. The actual
  formatting will be deferred to after the `enabled` check. This
  provides two benefits: (1) if a TraceEvent is disabled, we don't pay
  for the formatting and (2) TraceEvent can trace types that it doesn't
  know about.
- TraceEvent::enabled will be set in the constructor if the Severity is
  passed. This will make sure that `TraceEvent::init` is not called.
- `TraceEvent::detail` will be inlined. So for disabled TraceEvent
  calls, a call to detail will only introduce an if-branch which is much
  cheaper than a function call.
2019-04-05 13:12:19 -07:00
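The deferral idea in this commit message can be sketched as follows. `ToyTraceEvent`, `ToyTraceable`, and `ToyPoint` are hypothetical and far smaller than the real `TraceEvent` and `Traceable`; the sketch only shows a C-string key, a per-type formatting trait, and formatting that runs after the `enabled` check.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical trait: specialize it for any type that should be traceable.
template <class T>
struct ToyTraceable;

template <>
struct ToyTraceable<int> {
    static std::string toString(int v) { return std::to_string(v); }
};

// A user type the toy event knows nothing about.
struct ToyPoint {
    int x, y;
};

static int pointFormatCalls = 0;  // counts how often ToyPoint is formatted

template <>
struct ToyTraceable<ToyPoint> {
    static std::string toString(const ToyPoint& p) {
        ++pointFormatCalls;
        return std::to_string(p.x) + "," + std::to_string(p.y);
    }
};

struct ToyTraceEvent {
    bool enabled;
    std::vector<std::pair<const char*, std::string>> fields;

    explicit ToyTraceEvent(bool isEnabled) : enabled(isEnabled) {}

    // The key is a C string, so a literal costs no allocation. The value is
    // formatted only after the enabled check, so disabled events pay a branch.
    template <class T>
    ToyTraceEvent& detail(const char* key, const T& value) {
        if (enabled) fields.emplace_back(key, ToyTraceable<T>::toString(value));
        return *this;
    }
};

// Returns how many times ToyPoint was formatted for one event.
inline int formatCallsFor(bool enabled) {
    pointFormatCalls = 0;
    ToyTraceEvent(enabled).detail("P", ToyPoint{ 1, 2 }).detail("N", 3);
    return pointFormatCalls;
}
```

In this sketch a disabled event never calls the `ToyPoint` formatter, while an enabled one calls it exactly once.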
Jingyu Zhou acf60c5e9a
Merge pull request #1414 from jzhou77/pprof
Add manually triggered heap profiling
2019-04-04 22:27:33 -07:00
Jingyu Zhou 5be592632b Change trace event message
If heap profiler is not running, we can't take a snapshot of the profile.
2019-04-04 15:29:50 -07:00
Jingyu Zhou f538df5e6c Add TraceEvent if unable to invoke heap profiler 2019-04-04 15:26:41 -07:00
Evan Tschannen 390ab9cfed A process will mark itself as degraded if it continually disconnects from a different process which the failure monitor thinks is healthy 2019-04-04 14:11:12 -07:00
Alex Miller 8f49be480b
Update fdbserver/worker.actor.cpp
Co-Authored-By: jzhou77 <jingyuzhou@gmail.com>
2019-04-04 13:32:10 -07:00
Jingyu Zhou eaaf58ee34 Refactor profiler into cpu and heap profilers 2019-04-03 20:54:30 -07:00
Jingyu Zhou 3371cf22d4 Add manually triggered heap profiling
At client side:
fdb> profile
ERROR: Usage: profile <client|list|flow|heap>
fdb> profile heap 127.0.0.1:4500

On the server side:
$ HEAPPROFILE=/tmp/fdbserver bin/fdbserver -C ../test.cluster -p 127.0.0.1:4500
Starting tracking the heap
FDBD joined cluster.
Dumping heap profile to /tmp/fdbserver.0001.heap (1024 MB allocated cumulatively, 13 MB currently in use)
Dumping heap profile to /tmp/fdbserver.0002.heap (User triggered heap dump)
2019-04-03 16:00:54 -07:00
Markus Pilman 101a05ae77
Merge branch 'master' into features/client-simulator 2019-04-03 10:03:56 -08:00
Jingyu Zhou fc59587b3c
Merge pull request #1393 from jzhou77/pprof
Gperftools Profiling fix.
2019-04-03 10:35:31 -07:00
Evan Tschannen 39c595223b Merge branch 'release-6.1' 2019-04-02 22:30:02 -07:00
Evan Tschannen 30133a30e0
Merge pull request #1403 from etschannen/release-6.1
Ported a bug fix to the 6.0 log system, and updated documentation
2019-04-02 17:56:18 -07:00
Jingyu Zhou 56a1128a9b Enhance cmake's gperftools support
Add compiler flags and link flags for gperftools.
2019-04-02 17:34:29 -07:00
Evan Tschannen 31ed73d9f5 Ported the bug fix https://github.com/apple/foundationdb/pull/1379 to OldTLogServer_6_0 2019-04-02 15:27:37 -07:00
Evan Tschannen 1d4a6ab551 cleaned up status to keep the healthyZone read separated from relicaFutures 2019-04-02 14:46:56 -07:00
Evan Tschannen a38c396283 made all maintenance transactions lock aware 2019-04-02 14:27:48 -07:00
Evan Tschannen 628fec8c8b updated status with information about ongoing maintenance
clear the maintenance zone if a different storage server is detected failed
2019-04-02 14:15:51 -07:00
mpilman 371a41dbba Allow classPath to be modified at runtime 2019-04-02 11:56:40 -07:00
mpilman e19901186f Fixed buggy register preparation for natives 2019-04-02 11:56:03 -07:00
Evan Tschannen 72203ba47a Merge commit '56f3f0b1bc60604f965152d856ae29a591227703' 2019-04-01 18:45:38 -07:00
Evan Tschannen 781cf9b5a0 added the ability to make a zoneId for maintenance in fdbcli 2019-04-01 17:55:13 -07:00
Evan Tschannen f5de52de91 fix: cancel the previous log system recruitment before calling newEpoch, to avoid multiple actors attempting to modify oldLogSystem at the same time 2019-04-01 16:38:25 -07:00
Jingyu Zhou 49fdc35e5e Gperftools Profiling fix.
Fix a bug and update gperftools compiling flags

The added flags are recommended by gperftools here:
https://github.com/gperftools/gperftools

Verified that heap profiles are saved with the following command:
HEAPPROFILE=/tmp/fdbserver fdbserver [args...]
2019-04-01 14:42:18 -07:00
mpilman b148981bba Fixed compilation issues with char* 2019-04-01 14:29:45 -07:00
Jingyu Zhou 47b4b82628
Merge branch 'master' into fix-unreferenced 2019-04-01 14:07:19 -07:00
Jingyu Zhou 3f76be8f45 Merge remote-tracking branch 'apple/master' into fix-unreferenced 2019-04-01 14:00:43 -07:00
Jingyu Zhou f7f8ddd894 Fix warnings on unused variables
Found by -Wunused-variable flag.
2019-04-01 14:00:20 -07:00
mpilman e23e63c6ac Implemented JavaWorkload
This change allows a user to write a workload in Java.

The way this is implemented is by creating a JVM within the
simulator and calling the corresponding workload class. A
workload can then run in the simulator or on a testing cluster.

If the workload is executed within the simulator, the resulting
test will not be deterministic anymore as it will execute in a
different thread (and even without that it is not clear, whether
we could get determinism as the JVM does a lot of things that are
not deterministic).

This is intended to get better testing of the Java client and
layer authors can use the simulator to test their layers on a single
machine but they can still simulate failing machines etc.
2019-03-31 17:57:43 -07:00