Trevor Clinkenbeard
53f8ba499c
Merge branch 'master' into features/sqlite-crc32c
2019-05-24 16:46:32 -07:00
A.J. Beamon
20d83d61db
Merge branch 'master' into thread-safe-random-number-generation
2019-05-23 11:07:08 -07:00
Evan Tschannen
b451c2cd56
Merge pull request #1497 from alexmiller-apple/fastrecovery
...
Add an \xff keyrange that is backed by the txnStateStore.
2019-05-23 10:52:35 -07:00
A.J. Beamon
f417e60264
Merge branch 'merge-release-6.1-into-master' into thread-safe-random-number-generation
...
# Conflicts:
# fdbserver/QuietDatabase.actor.cpp
2019-05-23 09:52:00 -07:00
A.J. Beamon
d29c7e4c9b
Merge branch 'release-6.1' into merge-release-6.1-into-master
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/QuietDatabase.actor.cpp
# versions.target
2019-05-23 09:28:45 -07:00
A.J. Beamon
e5381e0612
Fix some new usages of g_random
2019-05-23 09:23:27 -07:00
A.J. Beamon
603721e125
Merge branch 'master' into thread-safe-random-number-generation
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/AsyncFileCached.actor.h
# fdbrpc/genericactors.actor.cpp
# fdbrpc/sim2.actor.cpp
# fdbserver/DiskQueue.actor.cpp
# fdbserver/workloads/BulkSetup.actor.h
# flow/ActorCollection.actor.cpp
# flow/Net2.actor.cpp
# flow/Trace.cpp
# flow/flow.cpp
2019-05-23 08:35:47 -07:00
Evan Tschannen
003cc6be18
fix: nothingPersistent could be incorrect when popped is equal to persistentDataVersion
2019-05-22 20:23:35 -10:00
chaoguang
c527b1a6b1
renaming function, add comments, fix bugs.
2019-05-22 17:39:36 -07:00
Evan Tschannen
4e12721227
fix: nothingPersistent could be incorrect when popped is equal to persistentDataVersion
2019-05-22 11:23:21 -07:00
Stephen Atherton
0fb8612ef5
debug_printf_noop() was incorrectly defined as a function, which still has a runtime cost of argument evaluation.
2019-05-22 03:40:18 -07:00
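The cost described in the commit above can be illustrated with a minimal sketch. All names here (`noopFunction`, `NOOP_MACRO`, `expensiveArg`) are hypothetical stand-ins and not the FoundationDB definitions; the point is only that arguments to a no-op function are still evaluated, while arguments to a no-op macro are discarded by the preprocessor.

```cpp
// Hypothetical illustration; none of these names come from FoundationDB.
static int g_evaluations = 0;

// An "expensive" argument whose evaluation can be observed through a counter.
inline int expensiveArg() {
    ++g_evaluations;
    return 42;
}

// No-op written as a function: the call site still evaluates every argument.
inline void noopFunction(const char*, ...) {}

// No-op written as a macro: the argument tokens are dropped before compilation.
#define NOOP_MACRO(...) ((void)0)

// Returns how many times the argument was evaluated when calling the function.
inline int evaluationsWithFunction() {
    g_evaluations = 0;
    noopFunction("%d", expensiveArg());
    return g_evaluations;
}

// Returns how many times the argument was evaluated when using the macro.
inline int evaluationsWithMacro() {
    g_evaluations = 0;
    NOOP_MACRO("%d", expensiveArg());
    return g_evaluations;
}
```

In this sketch the function variant evaluates the argument once per call and the macro variant never does, which is presumably the runtime cost the commit removes.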
Stephen Atherton
f99c36aad2
Fixed merge mistake.
2019-05-22 00:23:31 -07:00
Stephen Atherton
ebc96a7e0e
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbserver/VersionedBTree.actor.cpp
2019-05-21 23:49:27 -07:00
Stephen Atherton
e9197a8f70
Added time limit.
2019-05-21 22:19:14 -07:00
Stephen Atherton
3f8fce0296
Checkpointing progress on single-version mode in VersionedBTree. Subtree clears now work, preserving internal page boundary keys when necessary. Multi-version mode is unfortunately now broken, in addition to being incomplete. Added serial and simple btree unit test options.
2019-05-21 19:16:32 -07:00
chaoguang
57968d9df7
Merge branch 'master' of https://github.com/apple/foundationdb into MakoWorkload
2019-05-21 16:24:11 -07:00
chaoguang
0bbcc75e4b
fix bug
2019-05-21 16:22:02 -07:00
Evan Tschannen
a686402671
Merge branch 'feature-pop-diskqueue' into feature-slow-storage-failure
2019-05-21 15:19:06 -07:00
Evan Tschannen
9604452e50
mistakenly changed a quiet database parameter
2019-05-21 15:17:46 -07:00
Evan Tschannen
90fe085696
fix: the healthyZone needs to be checked again once the timeout is expected to have elapsed
2019-05-21 13:49:16 -07:00
Evan Tschannen
a8e8be5aac
added a wait failure client which always waits the full failure reaction time, even if it knows the interface is never coming back
...
use this new wait failure client in data distribution, to give time for a storage server to rejoin the cluster after its interface fails
2019-05-21 11:54:17 -07:00
Evan Tschannen
f4b18f2c4f
fixed whitespace
2019-05-21 11:31:34 -07:00
Evan Tschannen
23091a7d96
fixed review comments
2019-05-21 10:53:36 -07:00
Evan Tschannen
ee04c583fa
fix: do not pop the disk queue past the persistentDataVersion
2019-05-21 10:40:30 -07:00
Evan Tschannen
4059d68348
fix: the tlog would not pop data from the disk queue after a storage server was removed, because the tag still exists in memory on the logs
...
fix: we could incorrectly make data durable if eraseMessagesFromMemory was in progress while running updatePersistentData
the quiet database check now ensures that tlogs have no more than 30 seconds of versions unpopped from the disk queue
2019-05-20 23:58:45 -07:00
chaoguang
12a51b2d39
fix bugs, update naming and comments, refine functions
2019-05-20 18:26:30 -07:00
Evan Tschannen
f4fbaac6b0
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-05-19 10:27:59 -07:00
A.J. Beamon
a8b9d8e34b
Merge pull request #1336 from tclinken/fast-allocate-ptree-nodes
...
Create 96-byte fast allocator for storage queue PTree nodes
2019-05-17 14:22:46 -07:00
Steve Atherton
5a8c97480a
Merge pull request #1506 from nikolas-ioannou/feature-pagecache-lru
...
AsyncFileCached: switch from a random to an LRU cache eviction policy
2019-05-17 13:42:21 -07:00
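For readers unfamiliar with the policy named in the pull request above, the following is a generic sketch of LRU eviction. It is not the AsyncFileCached implementation; `ToyLruPageCache` and its page-number keys are assumptions made for illustration, and the capacity is assumed to be at least 1.

```cpp
#include <cstddef>
#include <list>
#include <unordered_map>

// Hypothetical LRU cache keyed by page number (assumes capacity >= 1).
class ToyLruPageCache {
    std::size_t capacity;
    std::list<int> order;  // front = most recently used, back = least recently used
    std::unordered_map<int, std::list<int>::iterator> position;

public:
    explicit ToyLruPageCache(std::size_t cap) : capacity(cap) {}

    // Records an access to `page`. Returns the evicted page number, or -1 if nothing was evicted.
    int touch(int page) {
        auto it = position.find(page);
        if (it != position.end()) {
            // Already cached: move it to the front without invalidating iterators.
            order.splice(order.begin(), order, it->second);
            return -1;
        }
        int evicted = -1;
        if (order.size() == capacity) {
            // Cache is full: evict the least recently used page.
            evicted = order.back();
            position.erase(evicted);
            order.pop_back();
        }
        order.push_front(page);
        position[page] = order.begin();
        return evicted;
    }

    bool contains(int page) const { return position.count(page) != 0; }
};
```

Unlike random eviction, a page that was just accessed is never the next one to be evicted here.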
Jingyu Zhou
b8e7fc1b84
Refactor: add std:: qualifier and use emplace_back
2019-05-17 09:38:50 -10:00
Trevor Clinkenbeard
12ff747e6a
Avoid tracing in PageChecksumCodec::checksum if silent flag is set
2019-05-17 10:49:53 -07:00
Trevor Clinkenbeard
3fac380b90
Avoid tracing in PageChecksumCodec::checksum if silent flag is set
2019-05-17 10:43:28 -07:00
Alvin Moore
22fa0fa1d4
Merge pull request #1599 from AlvinMooreSr/winproject-update
...
Upgraded Windows Tools within projects to 2017
2019-05-17 03:07:39 -07:00
Trevor Clinkenbeard
20e93c67ea
Allow sqlite pages to be checked for CRC32 checksum
...
Future versions of FDB will write sqlite pages with CRC32 checksums. In
order to roll back to this version from a version that writes CRC32
checksums, this version must be able to verify those checksums.
2019-05-17 01:05:06 -07:00
Alvin Moore
3acaa7343e
Enabled C++17 for all Windows projects
...
Set Visual Studio version to 2017 (first version to support C++17)
2019-05-16 17:44:13 -07:00
Paul J. Davis
53b97fe506
Extend support for parentpid
...
This adds support for the `--parentpid` option to non-Windows platforms.
This option is intended for testing layer implementations. When running
higher level CI chains it's useful to ensure that any ephemeral instances
of fdbserver are automatically reaped.
2019-05-16 14:24:11 -10:00
Trevor Clinkenbeard
d7bcbe1210
Refactored PageChecksumCodec::checksum
2019-05-16 16:07:35 -07:00
Trevor Clinkenbeard
90d886df95
Trace both hashlittle2 and crc32 checksums for SQLitePageChecksumFailure
2019-05-16 15:51:21 -07:00
Alvin Moore
94aed513c7
Switched Windows tools within projects to 2017
2019-05-16 15:05:11 -07:00
Trevor Clinkenbeard
04a72bdad6
Eliminate duplicate code in PageChecksumCodec::checksum
2019-05-16 11:09:37 -07:00
Trevor Clinkenbeard
aca90cd4e2
Don't use memcpy in PageChecksumCodec::checksum
2019-05-16 07:25:58 -07:00
chaoguang
6788c8eb7d
update cleanup process
2019-05-15 16:17:01 -07:00
chaoguang
106bb7677d
update
2019-05-15 12:58:12 -07:00
Alex Miller
658e61b394
And now use spilledOnly as a hint to do parallel peeks.
...
If there's some spilled data, there's probably a lot of spilled data,
and now we can pull all of it faster.
2019-05-14 21:03:44 -10:00
Alex Miller
69fb852ee0
Add more CLOEXEC-like things.
...
From missed call sites found during/after code review.
2019-05-14 20:30:58 -10:00
Alex Miller
4eb4c03ce5
Save TLog resources by letting peek request only spilled data.
...
If a peek is entirely fulfilled from spilled data, then it's likely that
the next peek will be also. It is thus wasteful for each of these peeks
to call peekMessagesFromMemory, which memcpy's excessively, and then
throw all that data away without using it.
Now, TLogs will give a hint back to peek cursors about if the provided
reply was served entirely from the spilled data, which peek cursors then
feed back as the hint into their next request.
At some point, a cursor will send a request for only spilled data, get
an incomplete response, and then be told to send its next request as one
that peeks from memory as well, and then it will fully catch up.
2019-05-14 15:38:48 -10:00
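The hint feedback loop described in the commit above can be sketched with a toy model. Everything here is an assumption for illustration: `ToyTLog`, `PeekReply`, integer versions, and the rule that a reply is flagged `onlySpilled` when it was filled to the limit from spilled data. The real TLog and peek cursor code is considerably more involved.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical reply: the versions returned, where to peek next, and the hint for the next request.
struct PeekReply {
    std::vector<int> versions;
    int end;
    bool onlySpilled;
};

// Toy log: versions [0, spilledEnd) are "on disk", versions [spilledEnd, memoryEnd) are "in memory".
struct ToyTLog {
    int spilledEnd;
    int memoryEnd;
    int memoryPeeks;  // counts how often the (expensive) memory copy was made

    ToyTLog(int spilled, int memory) : spilledEnd(spilled), memoryEnd(memory), memoryPeeks(0) {}

    PeekReply peek(int begin, bool onlySpilled, int limit) {
        std::vector<int> fromMemory;
        if (!onlySpilled) {
            // Without the hint, the memory copy is made up front, even if it ends up unused.
            ++memoryPeeks;
            for (int v = std::max(begin, spilledEnd); v < memoryEnd && (int)fromMemory.size() < limit; ++v)
                fromMemory.push_back(v);
        }
        PeekReply reply{ {}, begin, false };
        for (int v = begin; v < spilledEnd && (int)reply.versions.size() < limit; ++v) {
            reply.versions.push_back(v);
            reply.end = v + 1;
        }
        if (!reply.versions.empty()) {
            // Served from spilled data; a full reply suggests more spilled data remains.
            reply.onlySpilled = ((int)reply.versions.size() == limit);
            return reply;
        }
        if (!onlySpilled) {
            // Nothing spilled at `begin`: serve from the memory copy.
            reply.versions = fromMemory;
            reply.end = begin + (int)fromMemory.size();
        }
        return reply;  // an empty spilled-only reply tells the cursor to include memory next time
    }
};

// Runs a cursor until it has caught up and reports how many memory copies the log made.
inline int memoryPeeksToCatchUp(int spilledEnd, int memoryEnd, int limit, bool useHint) {
    ToyTLog log(spilledEnd, memoryEnd);
    int begin = 0;
    bool hint = false;
    while (begin < memoryEnd) {
        PeekReply reply = log.peek(begin, useHint && hint, limit);
        begin = reply.end;
        hint = reply.onlySpilled;
    }
    return log.memoryPeeks;
}
```

With 10 spilled versions, 2 in-memory versions, and a limit of 4 per reply, this toy cursor needs four requests either way, but the hinted run copies from memory only on the first and last request.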
Trevor Clinkenbeard
601c38ad82
Use crc32 for sqlite page checksums
2019-05-14 13:43:55 -07:00
chaoguang
4c9cc44c73
add paras
2019-05-14 10:13:13 -07:00
mpilman
46e7a0ca56
address reviews and make compile with `-Wunused-variable`
2019-05-13 14:15:23 -07:00
mpilman
57912b33a5
fixed merge error
2019-05-13 14:15:23 -07:00
mpilman
96aaa31a6c
Compiling on clang again
2019-05-13 14:15:23 -07:00
mpilman
20c3f7f264
remove mixed-mode support
2019-05-13 14:15:23 -07:00
mpilman
42385c2f81
Fixed issues introduced during rebase
2019-05-13 14:15:23 -07:00
mpilman
f6fbad5061
Fix memory bug
2019-05-13 14:15:23 -07:00
mpilman
44db3450ec
Several flatbuffers bug fixes
2019-05-13 14:15:23 -07:00
mpilman
9c02354255
pass NDEBUG to sqlite to enable debug mode
2019-05-13 14:15:23 -07:00
mpilman
69fa3d3903
fixed compilation issues after rebase
2019-05-13 14:15:23 -07:00
mpilman
642a96807b
Fixed compilation issues after rebase
2019-05-13 14:15:22 -07:00
mpilman
6afce01744
Implementation complete (not yet working)
2019-05-13 14:15:22 -07:00
mpilman
92bad76479
Wrap ClusterClientInterface into its own type
...
When a process joins a cluster it fetches the cluster
interface. However, not the whole interface is exposed
to the client. This mechanism relies on the fact that
the serializer keeps the field ordering and doesn't
verify the message before parsing it.
To make this work, we provide a client type with one
member (the ClusterInterface which is exposed to the
client and the server). This client interface has the
same FileIdentifier as the ClusterControllerFullInterface
which has the same first member. This works because
FlatBuffers allows for members to be missing.
2019-05-13 14:15:22 -07:00
mpilman
9eeb48c43d
Allow to turn on object serializer
...
This commit includes functionality to turn on
the object serializer for network communication.
This is done the following way:
- On incoming connections, a process will detect
whether the client supports the object serializer
and will only serialize responses with it, if it does
- On outgoing connections, the command line flag is used
to determine whether the object serializer should be used
to send data.
This way, a cluster can run in mixed mode. To upgrade one
can upgrade one process at a time and set the flag one process
at a time.
This is how this is tested on the simulator:
- The command line flag can take three options: on, off,
and random.
- For off, the object serializer will never be used.
- For on, the object serializer will be always used.
- For random, the simulator will flip a coin for each
process it starts up.
2019-05-13 14:15:22 -07:00
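The decision rules listed in the commit above can be sketched as two small functions. The enum and function names are hypothetical and do not correspond to the actual flag parsing or connection code; the coin flip for `random` follows the commit's description of simulator behaviour.

```cpp
#include <random>

// Hypothetical representation of the three flag values: on, off, random.
enum class ToySerializerFlag { Off, On, Random };

// Outgoing connections: decided once per process from the command line flag.
// For Random, the simulator is described as flipping a coin per process.
inline bool useObjectSerializerForProcess(ToySerializerFlag flag, std::mt19937& rng) {
    switch (flag) {
    case ToySerializerFlag::Off:
        return false;
    case ToySerializerFlag::On:
        return true;
    case ToySerializerFlag::Random:
        return std::bernoulli_distribution(0.5)(rng);
    }
    return false;
}

// Incoming connections: responses use the object serializer only if the peer supports it.
inline bool useObjectSerializerForReply(bool peerSupportsObjectSerializer) {
    return peerSupportsObjectSerializer;
}
```

Keeping the incoming decision dependent only on the peer is what lets a cluster run in mixed mode while processes are upgraded one at a time.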
mpilman
ba83c458a6
types implemented
2019-05-13 14:15:22 -07:00
Nikolas Ioannou
067cdf9cde
Simplified cache eviction policy knob arg check.
2019-05-13 08:50:04 +02:00
Evan Tschannen
8c3516951a
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-05-12 20:13:49 -07:00
Alex Miller
4a7e0319c7
Refactor away pushlock.
...
Pushing was already a serialized, sequential operation.
Instead make it explicit that there are two waits as part of a push:
1. The setup work to reserve a spot in the file
2. The work of writing and sync'ing the data
And we return a Future<Future<Void>> to force these to be done sequentially.
2019-05-10 20:30:52 -10:00
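The nested-future shape from the commit above can be approximated in standard C++. This sketch uses `std::future` with deferred launch rather than Flow's `Future`, and `toyPush` with an in-memory vector standing in for the file is an assumption. It only shows why the outer future must be resolved before the inner one exists.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical two-phase push. The outer future completes when a slot has been reserved;
// the inner future completes when the data has been written into that slot.
inline std::future<std::future<void>> toyPush(std::vector<std::string>& file, std::string data) {
    return std::async(std::launch::deferred, [&file, data]() {
        // Phase 1: reserve a spot in the "file".
        std::size_t slot = file.size();
        file.emplace_back();
        // Phase 2: write (and, in a real system, sync) the data.
        return std::async(std::launch::deferred, [&file, slot, data]() { file[slot] = data; });
    });
}
```

A caller has to call `get()` on the outer future to obtain the inner one, so the reservation always happens before the write can be waited on.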
Alex Miller
ea12a54946
Rename DISK_QUEUE_MAX_TRUNCATE_EXTENTS -> ..._BYTES
...
So as to not make filesystem assumptions. This knob did technically
appear in (only the) 6.1.5 release, but this feature was broken in 6.1.5,
so thus impossible to use anyway.
2019-05-10 18:26:22 -10:00
Alex Miller
c95d09f9fd
Convert truncate(0) to truncate(4KB) on Windows.
...
Blindly, in case Windows doesn't like 0 length truncates too.
2019-05-10 14:55:11 -10:00
Alex Miller
c502ed3d15
Fix a variety of problems stemming from a wait() being added to push().
...
And that this code was previously insufficiently tested.
2019-05-10 14:55:11 -10:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
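A minimal sketch of the pattern described in the commit above, using the standard library. Only the accessor names `deterministicRandom()` and `nondeterministicRandom()` come from the commit message; `ToyRandom`, `seedDeterministicRandom`, and the choice to throw when unseeded are assumptions for illustration.

```cpp
#include <cstdint>
#include <memory>
#include <random>
#include <stdexcept>

// Hypothetical generator type standing in for the real random number interface.
class ToyRandom {
    std::mt19937_64 gen;

public:
    explicit ToyRandom(uint64_t seed) : gen(seed) {}
    uint64_t next() { return gen(); }
};

// One deterministic generator per thread; empty until that thread seeds it.
static thread_local std::unique_ptr<ToyRandom> t_deterministic;

// Hypothetical seeding function: must be called on each thread before use.
inline void seedDeterministicRandom(uint64_t seed) {
    t_deterministic.reset(new ToyRandom(seed));
}

inline ToyRandom& deterministicRandom() {
    if (!t_deterministic)
        throw std::logic_error("deterministicRandom() used on a thread that was not seeded");
    return *t_deterministic;
}

// The nondeterministic generator cannot be seeded by callers; each thread seeds itself.
inline ToyRandom& nondeterministicRandom() {
    static thread_local ToyRandom instance(std::random_device{}());
    return instance;
}
```

Because each thread owns its generators, no locking is needed, and reseeding with the same value on a thread reproduces the same sequence.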
Chaoguang
5678a7417e
Mako Workload
2019-05-09 15:55:05 -07:00
Alex Miller
510b0b2fcd
Fix DiskQueue not replaceFile'ing frequently enough for the final time.
2019-05-08 23:08:25 -10:00
Alex Miller
c6c33a4daa
Make replaceFile more likely to be tested.
2019-05-08 21:23:42 -10:00
Alex Miller
0d0f54d1e6
Fix IAsyncFileSystem::open() flags to stop a crash.
...
OPEN_ATOMIC_WRITE_AND_CREATE was missing a required OPEN_CREATE.
I'm honestly baffled how this was missed in testing.
2019-05-08 21:22:40 -10:00
Alex Miller
b50926c792
replaceFile is truncate(0) on windows
2019-05-08 21:22:14 -10:00
Evan Tschannen
22499666d0
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/LogRouter.actor.cpp
# flow/Trace.cpp
# versions.target
2019-05-08 18:19:35 -07:00
Alex Miller
e4ba2f5788
Add an ending TraceEvent.
2019-05-08 12:35:12 -10:00
Alex Miller
c093017c2f
Add a TraceEvent and release note.
2019-05-08 12:34:25 -10:00
Alex Miller
0685e6c1c7
Avoid large truncates in the DiskQueue.
...
And instead create a new file while incrementally truncating the old one
down. This avoids queueing up a massive number of filesystem metadata
operations in one call, thus flooding the disk with requests and
stalling out all other filesystem operations.
This sets the knobs so that a truncate of >10GB causes us to create a
new file rather than trying to truncate the old one.
2019-05-08 12:33:31 -10:00
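The two decisions described in the commit above can be sketched as pure functions. The function names, the byte-based parameters, and the fixed step size are assumptions; the actual knob names and DiskQueue logic are not reproduced here.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical check: a truncate larger than the threshold is replaced by creating a new file.
inline bool shouldReplaceFile(int64_t currentSize, int64_t targetSize, int64_t maxTruncateBytes) {
    return currentSize - targetSize > maxTruncateBytes;
}

// Hypothetical plan for shrinking the old file: the successive sizes to truncate to,
// each at most `stepBytes` smaller than the last, ending at 0.
inline std::vector<int64_t> incrementalTruncateSizes(int64_t currentSize, int64_t stepBytes) {
    std::vector<int64_t> sizes;
    if (stepBytes <= 0) {
        // Degenerate step: fall back to a single truncate to zero.
        sizes.push_back(0);
        return sizes;
    }
    while (currentSize > 0) {
        currentSize = std::max<int64_t>(0, currentSize - stepBytes);
        sizes.push_back(currentSize);
    }
    return sizes;
}
```

Spreading the shrink over several small truncates is what keeps any single call from queueing a large burst of filesystem metadata operations.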
Alex Miller
36dfbf4fb3
Only truncate DiskQueues down to TLOG_HARD_LIMIT*2.
...
DiskQueue shrinking was implemented for spill-by-reference, as now
a DiskQueue could grow "unboundedly" large.
Without a minimum file size, write burst workloads would cause the
DiskQueue to shrink down to 100MB, and then grow back to its usual ~4GB
size in a cycle. File growth means filesystem metadata mutations, which
we'd prefer to avoid if possible since they're more unpredictable in
terms of latency.
In a healthy cluster, the TLog never spills, so the disk of a single
DiskQueue file should stay less than 2*TLOG_SPILL_THRESHOLD. In the
worst case of spill-by-value, the DiskQueue could grow to
2*TLOG_HARD_LIMIT. Therefore, having this limit will cause DiskQueue
shrinking to never behave sub-optimally for spill-by-value, and will
cause the DiskQueue files to return to the optimal size with
spill-by-reference.
2019-05-08 12:33:31 -10:00
Alex Miller
a269a784cc
Convert push() into an actor.
2019-05-08 12:33:31 -10:00
Evan Tschannen
68c773987c
Merge pull request #1544 from etschannen/release-6.1
...
The team tracker does not provide data movement priority information for non-failure related data movement
2019-05-08 11:39:17 -07:00
Balachandar Namasivayam
d45e7bf0b1
Addressed review comments
2019-05-07 17:19:59 -07:00
Evan Tschannen
d9a4553270
fix: The team tracker does not provide data movement priority information for non-failure related data movement
2019-05-07 17:06:54 -07:00
Balachandar Namasivayam
5d824f5fbc
Address review comments
2019-05-07 17:06:52 -07:00
Nikolas Ioannou
5793b1a55e
Validate cache eviction policy value after knob args have been set.
2019-05-07 08:32:57 +02:00
Balachandar Namasivayam
a0cc3d98a1
Add a workload to trigger repeated recoveries.
2019-05-06 18:16:44 -07:00
Austin Seipp
bf378952cb
fdbserver: fix some print/scan format warnings
...
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Evan Tschannen
93eb2a9395
Merge pull request #1527 from alexmiller-apple/tstlog-6.1
...
Spill-by-reference knob + TLog6.0 Spilled Peek deprioritization
2019-05-03 17:19:45 -07:00
Alex Miller
c918b21137
Deprioritize spilled peeks in spill-by-value, and improve its logic.
...
This deprioritizes before calling peekMessagesFromMemory, which should
improve the memory usage of the TLog, and makes sure to keep txsTag
peeks at a high priority to help recoveries stay fast.
2019-05-03 15:27:11 -07:00
Alex Miller
4052f3826a
Add a knob to limit the number of commits indexed per key.
...
Theoretically, we could spill 20MB of 22B mutations for one key, which
would generate a very long value being stored in SQLite, and very
inefficiently read back. This stops that from being a problem, at the
cost of some extra write calls.
2019-05-03 15:27:10 -07:00
Evan Tschannen
12088119d2
Merge pull request #1517 from alexmiller-apple/tstlog-6.1
...
Add a knob to limit amount of data read from sqlite for one PeekRequest.
2019-05-03 11:01:11 -07:00
Alex Miller
f4e48c3851
Add a knob to limit amount of data read from sqlite for one PeekRequest.
...
This prevents peeking from degrading over time if there are a very large
number of SpilledData entries for one particular tag.
2019-05-02 17:26:45 -07:00
Evan Tschannen
c91ac03ec6
LogRouterStats did not need to be a separate struct
2019-05-02 17:24:39 -07:00
Evan Tschannen
8590b710bf
added additional logging on the logs and log routers
2019-05-02 17:24:39 -07:00
Jingyu Zhou
e193cac5ef
Merge remote-tracking branch 'apple/master' into tlog
...
Resolve Conflicts: fdbserver/MasterProxyServer.actor.cpp
2019-05-01 17:18:00 -07:00
Evan Tschannen
2d5043c665
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-04-30 18:27:04 -07:00
Stephen Atherton
2801298ae8
Checkpointing incomplete and correctness-breaking progress on adding single-version mode to VersionedBTree.
2019-04-29 17:00:29 -07:00
Jingyu Zhou
8b5449e608
Fix review comments for PR #1473
2019-04-29 16:45:42 -07:00
Evan Tschannen
1a4c1759a4
Merge pull request #1429 from jzhou77/pprof
...
Dump heap profiler when memory usage is high
2019-04-29 16:31:44 -07:00
Alex Miller
f367385a80
Add clearing
2019-04-29 15:10:52 -07:00
Evan Tschannen
cacd82758e
Reduced data distribution speeds
2019-04-26 13:54:49 -07:00
Evan Tschannen
9ff8aca1da
Increased the SQLITE_CHUNK_SIZE to 100MB (left at 4MB for simulation)
2019-04-26 13:53:56 -07:00
Evan Tschannen
1f37f82b87
invalid knob overrides do not prevent fdbserver from starting
2019-04-25 17:08:13 -07:00
Evan Tschannen
6c77864731
separate GetStorageServerRejoinInfoRequest from GetKeyServerLocationsRequest, to avoid yielding for the rejoin requests
2019-04-25 17:07:35 -07:00
Alex Miller
797d431934
Add an \xff keyrange that is backed by the txnStateStore.
2019-04-25 17:04:20 -07:00
Trevor Clinkenbeard
d339becd7c
Fix currentRate calculation for local ratekeeper
2019-04-25 15:35:34 -07:00
Jingyu Zhou
5462f560e7
Add pseudo locality for log routers and tlogs
...
This changes the logic of pop operations from log routers (LG):
- LG pops tagLocalityLogRouterMapped from TLogs;
- TLog converts tagLocalityLogRouterMapped back to tagLocalityLogRouter before
popping.
Later when we add more pseudo localities, the same pattern can be used.
2019-04-23 21:35:56 -07:00
A.J. Beamon
253d2400ef
Merge branch 'release-6.1' into speed-up-and-parameterize-spring-cleaning
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2019-04-23 14:38:52 -07:00
A.J. Beamon
ea7abff9df
Clean up from review
2019-04-23 14:16:52 -07:00
A.J. Beamon
4ad0496b39
Increase the frequency that lazy deletes are run. Add more parameters for better control over the spring cleaning process.
2019-04-23 14:01:51 -07:00
Stephen Atherton
df0548503d
Merge branch 'release-6.1' of https://github.com/apple/foundationdb into sqlite-grow-bigger
2019-04-23 13:43:58 -07:00
A.J. Beamon
e0f76edf77
Merge pull request #1471 from AlvinMooreSr/release-6.1-merge
...
Merge Release 6.1 Into Master
2019-04-23 11:08:21 -07:00
Stephen Atherton
83db547306
Implemented the chunk size and db size hint fileControl options in our SQLite VFS implementation. KeyValueStoreSQLite now sets file chunk size based on a new knob, SQLITE_CHUNK_SIZE_PAGES.
2019-04-23 04:50:58 -07:00
Evan Tschannen
e0f7ec96aa
Data distribution needs to build new teams as old teams are removed to ensure data remains balanced across servers
2019-04-22 17:29:46 -07:00
Jingyu Zhou
439d5a3843
Use emplace_back instead of push_back in Proxy
2019-04-22 14:03:48 -07:00
Jingyu Zhou
d2b215b926
Refactor tag population of ServerCacheInfo
2019-04-22 11:55:04 -07:00
Jingyu Zhou
7cb61c766b
Fix tLogLocalities for current LogSet
...
In toCoreState(), the serialization of current LogSet is different from old
TLog sets. The locality data should be generated, not copied over.
Found by:
-r simulation --crash -f tests/fast/KillRegionCycle.txt -s 254666356 -b on
2019-04-21 10:41:07 -07:00
Jingyu Zhou
8b67da57bb
Fix upgrade test failure
...
Serialize pseudoLocalities if protocol version is larger than 0x0FDB00B061060001LL.
Note this version may need to be changed to "currentProtocolVersion" when merging
into the master, and "currentProtocolVersion" should be incremented.
2019-04-21 10:41:07 -07:00
Jingyu Zhou
9e8ffd2ff7
Refactor OldLogData ctor
2019-04-21 10:41:07 -07:00
Jingyu Zhou
97986a28b7
Replace push_back with emplace_back for efficiency
...
And better code readability.
2019-04-21 10:41:07 -07:00
Jingyu Zhou
010f825aff
Remove pseudoLocalities from LogSet, TLogSet, and CoreTLogSet
2019-04-21 10:41:07 -07:00
Jingyu Zhou
7befce6bf1
More pseudoLocalities and refactors.
2019-04-21 10:41:07 -07:00
Jingyu Zhou
66000a07a5
Use emplace_back instead of push_back
2019-04-21 10:41:07 -07:00
Jingyu Zhou
966ec30fcc
Add pseudoLocalities for special tag consumers
2019-04-21 10:41:07 -07:00
Jingyu Zhou
b4e7e7a85b
Refactor StorageCache updates
2019-04-21 10:41:07 -07:00
Jingyu Zhou
82ec80c42f
Refactor TLogSet ctor
2019-04-21 10:41:07 -07:00
Jingyu Zhou
d19b0cf1c1
Refactor LogSet with two new constructors
2019-04-21 10:41:07 -07:00
Jingyu Zhou
0b1984978a
Small code refactoring.
2019-04-21 10:41:07 -07:00
Jingyu Zhou
ec1bc5cfca
Add LogSystemType enum
2019-04-21 10:41:07 -07:00
Jingyu Zhou
6870e132b2
Merge branch 'master' into pprof
2019-04-19 14:06:44 -07:00
Andrew Noyes
d1e86779a6
Address review comments
2019-04-18 08:48:27 -07:00
Andrew Noyes
5af8208c62
Fix JavaWorkload unused variable
2019-04-17 16:29:22 -07:00
Andrew Noyes
ef04471a66
Fix more unused-variable warnings
2019-04-17 16:04:10 -07:00
Alvin Moore
2bea99591e
Merge branch 'release-6.1' of copy of master
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2019-04-17 15:51:48 -07:00
Andrew Noyes
13ba915a19
Fix more unused variable warnings
2019-04-17 15:38:08 -07:00
A.J. Beamon
43533b3d72
Don't validate the shard size estimate unless enough keys are sampled with a less than 100% probability.
2019-04-17 11:01:23 -07:00
Trevor Clinkenbeard
3426205167
Fixed readGuard usage bug
2019-04-16 15:05:57 -07:00
Trevor Clinkenbeard
1d921da170
readGuard sends server_overloaded error if request is rejected
2019-04-16 11:29:01 -07:00
Trevor Clinkenbeard
8a7d9afbe9
Fixed name of LocalRatekeeperWorkloadFactory
2019-04-16 10:36:09 -07:00
Trevor Clinkenbeard
0594154644
Fixed getPenalty calculation
2019-04-16 10:17:41 -07:00
Andrew Noyes
baa3e806ef
Address review comments from #1446
2019-04-16 09:48:15 -07:00
Andrew Noyes
6207d724f8
Fix all -Wunused-variable warnings
2019-04-15 18:13:00 -07:00
Evan Tschannen
cd5c9d91fa
Merge pull request #1443 from etschannen/master
...
Merge 6.1 into master
2019-04-10 17:43:07 -07:00
Balachandar Namasivayam
04e9aa6afd
For small clusters that are growing quickly, it could happen that the rateLimit is set to a low value and it would take very long to read the entire database. Fix this by setting the rateLimit to the maximum allowed value if reading the entire database is taking a long time.
2019-04-10 17:13:37 -07:00
Jingyu Zhou
ab834c4f7e
Move profiling option help message to devhelp
2019-04-09 13:26:12 -07:00
Evan Tschannen
6220a5ce0f
Merge pull request #1370 from jzhou77/fix-unreferenced
...
Remove unused functions
2019-04-09 11:49:45 -07:00
A.J. Beamon
058d028099
Merge pull request #1301 from mpilman/features/cheaper-traces
...
Defer formatting in traces to make them cheaper
2019-04-09 10:11:04 -07:00
Evan Tschannen
21c0ba555c
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-04-08 18:38:42 -07:00
Evan Tschannen
d126730b4d
fixed a spurious test error where process_behind was treated as an error
2019-04-08 17:09:54 -07:00
A.J. Beamon
538b431656
Apply suggestions from code review
2019-04-08 14:55:58 -07:00
A.J. Beamon
a7288e1325
Throw process_behind instead of future_version when all storage nodes on a team are behind. process_behind gets the same backoff behavior as not_committed. Add proxy_memory_limit_exceeded to the retryable predicate.
2019-04-08 14:21:24 -07:00
mpilman
789cd67bcd
Don't compile JavaWorkload by default
2019-04-08 13:06:29 -07:00
mpilman
c45fe8c697
Fixed typo
2019-04-08 11:33:45 -07:00
Trevor Clinkenbeard
b286102d34
Update fdbserver/workloads/LocalRatekeeper.actor.cpp
...
Co-Authored-By: mpilman <markus@pilman.ch>
2019-04-08 11:06:17 -07:00
mpilman
d2e74cb2c0
Fix stupid rounding error
2019-04-08 11:05:29 -07:00
mpilman
aaa8f73bdc
fixed missing refactoring code
2019-04-08 11:05:29 -07:00
mpilman
bdba8e22eb
Added test and bugfixes
2019-04-08 11:05:29 -07:00
mpilman
b944e0b116
generalized read guards, allow for penalty+error
2019-04-08 11:04:44 -07:00
mpilman
207049e852
fixed serialization
2019-04-08 11:04:44 -07:00
mpilman
32393ec4c9
Prototype of local ratekeeper
2019-04-08 11:04:44 -07:00
Evan Tschannen
05869a8383
do not log a degraded reset message if the previous reset was more than a week ago
2019-04-07 23:00:58 -07:00
Jingyu Zhou
4b08042a88
Change memory profiling threshold to a flag
2019-04-05 16:33:51 -07:00
Andrew Noyes
d7612a4426
Fix OPEN_FOR_IDE build errors
2019-04-05 16:30:42 -07:00
Jingyu Zhou
09b2c35d11
Dump heap profiler when memory usage is high
...
Set the threshold of dump to 2GB.
2019-04-05 16:12:23 -07:00
mpilman
d01cbf3455
Addressed code review comments
2019-04-05 13:12:20 -07:00
mpilman
4287b1d2a1
resolved minor merge issues
2019-04-05 13:12:19 -07:00
A.J. Beamon
614a599a04
Update fdbserver/SimulatedCluster.actor.cpp
...
Co-Authored-By: mpilman <markus@pilman.ch>
2019-04-05 13:12:19 -07:00
mpilman
39ecbedd74
Fixed compilation errors on OS X & gcc8
2019-04-05 13:12:19 -07:00
mpilman
1c16f87a4e
Remove trace-calls to printable (in non-workloads)
2019-04-05 13:12:19 -07:00
mpilman
ea67b742c7
Implemented Traceable for printable types
2019-04-05 13:12:19 -07:00
mpilman
bb82f8560a
process all volatile ints correctly in traces
2019-04-05 13:12:19 -07:00
mpilman
02e3b634fb
Compile sqlite with NDEBUG so we can debug
2019-04-05 13:12:19 -07:00
mpilman
c008e16c81
Defer formatting in traces to make them cheaper
...
This is the first part of making `TraceEvent` cheaper. The main idea is
to defer calls to any code that formats string. These are the main
changes:
- TraceEvent::detail now takes a c-string instead of std::string for
literals. This prevents unnecessary allocations if the trace is not
going to be printed in the first place (for example for SevDebug).
Before that `detail` expected a `std::string` as key, which meant that
any string literal would be copied on each call.
- Templates Traceable and SpecialTraceMetricType. These templates can be
specialized for any type that needs to be printed. The actual
formatting will be deferred to after the `enabled` check. This
provides two benefits: (1) if a TraceEvent is disabled, we don't pay
for the formatting and (2) TraceEvent can trace types that it doesn't
know about.
- TraceEvent::enabled will be set in the constructor if the Severity is
passed. This will make sure that `TraceEvent::init` is not called.
- `TraceEvent::detail` will be inlined. So for disabled TraceEvent
calls, a call to detail will only introduce an if-branch which is much
cheaper than a function call.
2019-04-05 13:12:19 -07:00
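The deferral idea in the commit above can be sketched with a toy trace event. `ToyTraceEvent`, `ToyTraceable`, `Point`, and the integer severities are hypothetical; the real `TraceEvent` and `Traceable` have different interfaces. The sketch only shows formatting being skipped when the event is disabled.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical trait: specialise it for any type that should be traceable.
template <class T>
struct ToyTraceable;

template <>
struct ToyTraceable<int> {
    static std::string toString(int v) { return std::to_string(v); }
};

// A user-defined type the trace event knows nothing about.
struct Point {
    int x, y;
};

static int g_pointFormats = 0;  // counts how often a Point was actually formatted

template <>
struct ToyTraceable<Point> {
    static std::string toString(const Point& p) {
        ++g_pointFormats;
        return "(" + std::to_string(p.x) + "," + std::to_string(p.y) + ")";
    }
};

class ToyTraceEvent {
    bool enabled;  // decided once, in the constructor, from the severity
    std::vector<std::pair<const char*, std::string>> fields;

public:
    ToyTraceEvent(int severity, int minSeverity) : enabled(severity >= minSeverity) {}

    // The key is a C string, so no std::string is allocated for literals.
    // Formatting happens only after the enabled check.
    template <class T>
    ToyTraceEvent& detail(const char* key, const T& value) {
        if (enabled)
            fields.emplace_back(key, ToyTraceable<T>::toString(value));
        return *this;
    }

    std::size_t fieldCount() const { return fields.size(); }
};
```

For a disabled event, each `detail` call in this sketch reduces to one branch, and the `Point` formatter is never invoked.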
Jingyu Zhou
acf60c5e9a
Merge pull request #1414 from jzhou77/pprof
...
Add manually triggered heap profiling
2019-04-04 22:27:33 -07:00
Jingyu Zhou
5be592632b
Change trace event message
...
If heap profiler is not running, we can't take a snapshot of the profile.
2019-04-04 15:29:50 -07:00
Jingyu Zhou
f538df5e6c
Add TraceEvent if unable to invoke heap profiler
2019-04-04 15:26:41 -07:00
Evan Tschannen
390ab9cfed
A process will mark itself as degraded if it continually disconnects from a different process which the failure monitor thinks is healthy
2019-04-04 14:11:12 -07:00
Alex Miller
8f49be480b
Update fdbserver/worker.actor.cpp
...
Co-Authored-By: jzhou77 <jingyuzhou@gmail.com>
2019-04-04 13:32:10 -07:00
Jingyu Zhou
eaaf58ee34
Refactor profiler into cpu and heap profilers
2019-04-03 20:54:30 -07:00
Jingyu Zhou
3371cf22d4
Add manually triggered heap profiling
...
At client side:
fdb> profile
ERROR: Usage: profile <client|list|flow|heap>
fdb> profile heap 127.0.0.1:4500
On the server side:
$ HEAPPROFILE=/tmp/fdbserver bin/fdbserver -C ../test.cluster -p 127.0.0.1:4500
Starting tracking the heap
FDBD joined cluster.
Dumping heap profile to /tmp/fdbserver.0001.heap (1024 MB allocated cumulatively, 13 MB currently in use)
Dumping heap profile to /tmp/fdbserver.0002.heap (User triggered heap dump)
2019-04-03 16:00:54 -07:00
Markus Pilman
101a05ae77
Merge branch 'master' into features/client-simulator
2019-04-03 10:03:56 -08:00
Jingyu Zhou
fc59587b3c
Merge pull request #1393 from jzhou77/pprof
...
Gperftools Profiling fix.
2019-04-03 10:35:31 -07:00
Evan Tschannen
39c595223b
Merge branch 'release-6.1'
2019-04-02 22:30:02 -07:00
Evan Tschannen
30133a30e0
Merge pull request #1403 from etschannen/release-6.1
...
Ported a bug fix to the 6.0 log system, and updated documentation
2019-04-02 17:56:18 -07:00
Jingyu Zhou
56a1128a9b
Enhance cmake's gperftools support
...
Add compiler flags and link flags for gperftools.
2019-04-02 17:34:29 -07:00
Evan Tschannen
31ed73d9f5
Ported the bug fix https://github.com/apple/foundationdb/pull/1379 to OldTLogServer_6_0
2019-04-02 15:27:37 -07:00
Evan Tschannen
1d4a6ab551
cleaned up status to keep the healthyZone read separated from relicaFutures
2019-04-02 14:46:56 -07:00
Evan Tschannen
a38c396283
made all maintenance transactions lock aware
2019-04-02 14:27:48 -07:00
Evan Tschannen
628fec8c8b
updated status with information about ongoing maintenance
...
clear the maintenance zone if a different storage server is detected failed
2019-04-02 14:15:51 -07:00
mpilman
371a41dbba
Allow classPath to be modified at runtime
2019-04-02 11:56:40 -07:00
mpilman
e19901186f
Fixed buggy register preparation for natives
2019-04-02 11:56:03 -07:00
Evan Tschannen
72203ba47a
Merge commit '56f3f0b1bc60604f965152d856ae29a591227703'
2019-04-01 18:45:38 -07:00
Evan Tschannen
781cf9b5a0
added the ability to make a zoneId for maintenance in fdbcli
2019-04-01 17:55:13 -07:00
Evan Tschannen
f5de52de91
fix: cancel the previous log system recruitment before calling newEpoch, to avoid multiple actors attempting to modify oldLogSystem at the same time
2019-04-01 16:38:25 -07:00
Jingyu Zhou
49fdc35e5e
Gperftools Profiling fix.
...
Fix a bug and update gperftools compiling flags
The added flags are recommended by gperftools here:
https://github.com/gperftools/gperftools
Verified that heap profiles are saved with the following command:
HEAPPROFILE=/tmp/fdbserver fdbserver [args...]
2019-04-01 14:42:18 -07:00
mpilman
b148981bba
Fixed compilation issues with char*
2019-04-01 14:29:45 -07:00
Jingyu Zhou
47b4b82628
Merge branch 'master' into fix-unreferenced
2019-04-01 14:07:19 -07:00
Jingyu Zhou
3f76be8f45
Merge remote-tracking branch 'apple/master' into fix-unreferenced
2019-04-01 14:00:43 -07:00
Jingyu Zhou
f7f8ddd894
Fix warnings on unused variables
...
Found by -Wunused-variable flag.
2019-04-01 14:00:20 -07:00
mpilman
e23e63c6ac
Implemented JavaWorkload
...
This change allows a user to write a workload in Java.
The way this is implemented is by creating a JVM within the
simulator and calling the corresponding workload class. A
workload can then run in the simulator or on a testing cluster.
If the workload is executed within the simulator, the resulting
test will not be deterministic anymore as it will execute in a
different thread (and even without that it is not clear whether
we could get determinism, as the JVM does a lot of things that are
not deterministic).
This is intended to get better testing of the Java client and
layer authors can use the simulator to test their layers on a single
machine but they can still simulate failing machines etc.
2019-03-31 17:57:43 -07:00