Evan Tschannen
cacd82758e
Reduced data distribution speeds
2019-04-26 13:54:49 -07:00
Evan Tschannen
9ff8aca1da
Increased the SQLITE_CHUNK_SIZE to 100MB (left at 4MB for simulation)
2019-04-26 13:53:56 -07:00
Evan Tschannen
1f37f82b87
invalid knob overrides do not prevent fdbserver from starting
2019-04-25 17:08:13 -07:00
Evan Tschannen
6c77864731
separate GetStorageServerRejoinInfoRequest from GetKeyServerLocationsRequest, to avoid yielding for the rejoin requests
2019-04-25 17:07:35 -07:00
Alex Miller
797d431934
Add an \xff keyrange that is backed by the txnStateStore.
2019-04-25 17:04:20 -07:00
Trevor Clinkenbeard
d339becd7c
Fix currentRate calculation for local ratekeeper
2019-04-25 15:35:34 -07:00
Jingyu Zhou
5462f560e7
Add pseudo locality for log routers and tlogs
...
This changes the logic of pop operations from log routers (LG):
- LG pops tagLocalityLogRouterMapped from TLogs;
- TLog converts tagLocalityLogRouterMapped back to tagLocalityLogRouter before
popping.
Later, when we add more pseudo localities, the same pattern can be used.
2019-04-23 21:35:56 -07:00
A.J. Beamon
253d2400ef
Merge branch 'release-6.1' into speed-up-and-parameterize-spring-cleaning
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2019-04-23 14:38:52 -07:00
A.J. Beamon
ea7abff9df
Clean up from review
2019-04-23 14:16:52 -07:00
A.J. Beamon
4ad0496b39
Increase the frequency that lazy deletes are run. Add more parameters for better control over the spring cleaning process.
2019-04-23 14:01:51 -07:00
Stephen Atherton
df0548503d
Merge branch 'release-6.1' of https://github.com/apple/foundationdb into sqlite-grow-bigger
2019-04-23 13:43:58 -07:00
A.J. Beamon
e0f76edf77
Merge pull request #1471 from AlvinMooreSr/release-6.1-merge
...
Merge Release 6.1 Into Master
2019-04-23 11:08:21 -07:00
Stephen Atherton
83db547306
Implemented the chunk size and db size hint fileControl options in our SQLite VFS implementation. KeyValueStoreSQLite now sets file chunk size based on a new knob, SQLITE_CHUNK_SIZE_PAGES.
2019-04-23 04:50:58 -07:00
Evan Tschannen
e0f7ec96aa
Data distribution needs to build new teams as old teams are removed to ensure data remains balanced across servers
2019-04-22 17:29:46 -07:00
Jingyu Zhou
439d5a3843
Use emplace_back instead of push_back in Proxy
2019-04-22 14:03:48 -07:00
Jingyu Zhou
d2b215b926
Refactor tag population of ServerCacheInfo
2019-04-22 11:55:04 -07:00
Jingyu Zhou
7cb61c766b
Fix tLogLocalities for current LogSet
...
In toCoreState(), the serialization of current LogSet is different from old
TLog sets. The locality data should be generated, not copied over.
Found by:
-r simulation --crash -f tests/fast/KillRegionCycle.txt -s 254666356 -b on
2019-04-21 10:41:07 -07:00
Jingyu Zhou
8b67da57bb
Fix upgrade test failure
...
Serialize pseudoLocalities if protocol version is larger than 0x0FDB00B061060001LL.
Note this version may need to be changed to "currentProtocolVersion" when merging
into the master, and "currentProtocolVersion" should be incremented.
2019-04-21 10:41:07 -07:00
Jingyu Zhou
9e8ffd2ff7
Refactor OldLogData ctor
2019-04-21 10:41:07 -07:00
Jingyu Zhou
97986a28b7
Replace push_back with emplace_back for efficiency
...
And better code readability.
2019-04-21 10:41:07 -07:00
Jingyu Zhou
010f825aff
Remove pseudoLocalities from LogSet, TLogSet, and CoreTLogSet
2019-04-21 10:41:07 -07:00
Jingyu Zhou
7befce6bf1
More pseudoLocalities and refactors.
2019-04-21 10:41:07 -07:00
Jingyu Zhou
66000a07a5
Use emplace_back instead of push_back
2019-04-21 10:41:07 -07:00
Jingyu Zhou
966ec30fcc
Add pseudoLocalities for special tag consumers
2019-04-21 10:41:07 -07:00
Jingyu Zhou
b4e7e7a85b
Refactor StorageCache updates
2019-04-21 10:41:07 -07:00
Jingyu Zhou
82ec80c42f
Refactor TLogSet ctor
2019-04-21 10:41:07 -07:00
Jingyu Zhou
d19b0cf1c1
Refactor LogSet with two new constructors
2019-04-21 10:41:07 -07:00
Jingyu Zhou
0b1984978a
Small code refactoring.
2019-04-21 10:41:07 -07:00
Jingyu Zhou
ec1bc5cfca
Add LogSystemType enum
2019-04-21 10:41:07 -07:00
Jingyu Zhou
6870e132b2
Merge branch 'master' into pprof
2019-04-19 14:06:44 -07:00
Andrew Noyes
d1e86779a6
Address review comments
2019-04-18 08:48:27 -07:00
Andrew Noyes
5af8208c62
Fix JavaWorkload unused variable
2019-04-17 16:29:22 -07:00
Andrew Noyes
ef04471a66
Fix more unused-variable warnings
2019-04-17 16:04:10 -07:00
Alvin Moore
2bea99591e
Merge branch 'release-6.1' of copy of master
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
2019-04-17 15:51:48 -07:00
Andrew Noyes
13ba915a19
Fix more unused variable warnings
2019-04-17 15:38:08 -07:00
A.J. Beamon
43533b3d72
Don't validate the shard size estimate unless enough keys are sampled with a less than 100% probability.
2019-04-17 11:01:23 -07:00
Trevor Clinkenbeard
3426205167
Fixed readGuard usage bug
2019-04-16 15:05:57 -07:00
Trevor Clinkenbeard
1d921da170
readGuard sends server_overloaded error if request is rejected
2019-04-16 11:29:01 -07:00
Trevor Clinkenbeard
8a7d9afbe9
Fixed name of LocalRatekeeperWorkloadFactory
2019-04-16 10:36:09 -07:00
Trevor Clinkenbeard
0594154644
Fixed getPenalty calculation
2019-04-16 10:17:41 -07:00
Andrew Noyes
baa3e806ef
Address review comments from #1446
2019-04-16 09:48:15 -07:00
Andrew Noyes
6207d724f8
Fix all -Wunused-variable warnings
2019-04-15 18:13:00 -07:00
Evan Tschannen
cd5c9d91fa
Merge pull request #1443 from etschannen/master
...
Merge 6.1 into master
2019-04-10 17:43:07 -07:00
Balachandar Namasivayam
04e9aa6afd
For small clusters that are growing quickly, it could happen that the rateLimit is set to a low value and it would take very long to read the entire database. Fix this by setting the rateLimit to the maximum allowed value if reading the entire database is taking a long time.
2019-04-10 17:13:37 -07:00
Jingyu Zhou
ab834c4f7e
Move profiling option help message to devhelp
2019-04-09 13:26:12 -07:00
Evan Tschannen
6220a5ce0f
Merge pull request #1370 from jzhou77/fix-unreferenced
...
Remove unused functions
2019-04-09 11:49:45 -07:00
A.J. Beamon
058d028099
Merge pull request #1301 from mpilman/features/cheaper-traces
...
Defer formatting in traces to make them cheaper
2019-04-09 10:11:04 -07:00
Evan Tschannen
21c0ba555c
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# versions.target
2019-04-08 18:38:42 -07:00
Evan Tschannen
d126730b4d
fixed a spurious test error where process_behind was treated as an error
2019-04-08 17:09:54 -07:00
A.J. Beamon
538b431656
Apply suggestions from code review
2019-04-08 14:55:58 -07:00
A.J. Beamon
a7288e1325
Throw process_behind instead of future_version when all storage nodes on a team are behind. process_behind gets the same backoff behavior as not_committed. Add proxy_memory_limit_exceeded to the retryable predicate.
2019-04-08 14:21:24 -07:00
mpilman
789cd67bcd
Don't compile JavaWorkload by default
2019-04-08 13:06:29 -07:00
mpilman
c45fe8c697
Fixed typo
2019-04-08 11:33:45 -07:00
Trevor Clinkenbeard
b286102d34
Update fdbserver/workloads/LocalRatekeeper.actor.cpp
...
Co-Authored-By: mpilman <markus@pilman.ch>
2019-04-08 11:06:17 -07:00
mpilman
d2e74cb2c0
Fix stupid rounding error
2019-04-08 11:05:29 -07:00
mpilman
aaa8f73bdc
fixed missing refactoring code
2019-04-08 11:05:29 -07:00
mpilman
bdba8e22eb
Added test and bugfixes
2019-04-08 11:05:29 -07:00
mpilman
b944e0b116
generalized read guards, allow for penalty+error
2019-04-08 11:04:44 -07:00
mpilman
207049e852
fixed serialization
2019-04-08 11:04:44 -07:00
mpilman
32393ec4c9
Prototype of local ratekeeper
2019-04-08 11:04:44 -07:00
Evan Tschannen
05869a8383
do not log a degraded reset message if the previous reset was more than a week ago
2019-04-07 23:00:58 -07:00
Jingyu Zhou
4b08042a88
Change memory profiling threshold to a flag
2019-04-05 16:33:51 -07:00
Andrew Noyes
d7612a4426
Fix OPEN_FOR_IDE build errors
2019-04-05 16:30:42 -07:00
Jingyu Zhou
09b2c35d11
Dump heap profiler when memory usage is high
...
Set the threshold of dump to 2GB.
2019-04-05 16:12:23 -07:00
mpilman
d01cbf3455
Addressed code review comments
2019-04-05 13:12:20 -07:00
mpilman
4287b1d2a1
resolved minor merge issues
2019-04-05 13:12:19 -07:00
A.J. Beamon
614a599a04
Update fdbserver/SimulatedCluster.actor.cpp
...
Co-Authored-By: mpilman <markus@pilman.ch>
2019-04-05 13:12:19 -07:00
mpilman
39ecbedd74
Fixed compilation errors on OS X & gcc8
2019-04-05 13:12:19 -07:00
mpilman
1c16f87a4e
Remove trace-calls to printable (in non-workloads)
2019-04-05 13:12:19 -07:00
mpilman
ea67b742c7
Implemented Traceable for printable types
2019-04-05 13:12:19 -07:00
mpilman
bb82f8560a
process all volatile ints correctly in traces
2019-04-05 13:12:19 -07:00
mpilman
02e3b634fb
Compile sqlite with NDEBUG so we can debug
2019-04-05 13:12:19 -07:00
mpilman
c008e16c81
Defer formatting in traces to make them cheaper
...
This is the first part of making `TraceEvent` cheaper. The main idea is
to defer calls to any code that formats strings. These are the main
changes:
- TraceEvent::detail now takes a c-string instead of std::string for
literals. This prevents unnecessary allocations if the trace is not
going to be printed in the first place (for example for SevDebug).
Before that, `detail` expected a `std::string` as key, which meant that
any string literal would be copied on each call.
- Templates Traceable and SpecialTraceMetricType. These templates can be
specialized for any type that needs to be printed. The actual
formatting will be deferred to after the `enabled` check. This
provides two benefits: (1) if a TraceEvent is disabled, we don't pay
for the formatting and (2) TraceEvent can trace types that it doesn't
know about.
- TraceEvent::enabled will be set in the constructor if the Severity is
passed. This will make sure that `TraceEvent::init` is not called.
- `TraceEvent::detail` will be inlined. So for disabled TraceEvent
calls, a call to detail will only introduce an if-branch, which is much
cheaper than a function call.
2019-04-05 13:12:19 -07:00
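The deferred-formatting idea in c008e16c81 can be sketched roughly as follows. This is a minimal Python illustration of the technique, not the actual C++ `TraceEvent`; the `formatter` parameter, the `render` method, and the severity numbers are assumptions made for the example.

```python
class TraceEvent:
    """Hypothetical stand-in for a trace event with deferred formatting."""

    def __init__(self, severity, name, min_severity=10):
        # The enabled check happens once, in the constructor.
        self.enabled = severity >= min_severity
        self.fields = [("Type", name)] if self.enabled else []

    def detail(self, key, value, formatter=str):
        # For a disabled event this is a single branch: the formatter,
        # which may be expensive, is never called.
        if self.enabled:
            self.fields.append((key, formatter(value)))
        return self  # allow chained detail() calls

    def render(self):
        # Join the already formatted fields into one line.
        return " ".join(f'{k}="{v}"' for k, v in self.fields)
```

A disabled event (for example `TraceEvent(5, "Debug")` with the default threshold) never invokes the formatter passed to `detail`, which is the saving the commit message describes.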
Jingyu Zhou
acf60c5e9a
Merge pull request #1414 from jzhou77/pprof
...
Add manually triggered heap profiling
2019-04-04 22:27:33 -07:00
Jingyu Zhou
5be592632b
Change trace event message
...
If heap profiler is not running, we can't take a snapshot of the profile.
2019-04-04 15:29:50 -07:00
Jingyu Zhou
f538df5e6c
Add TraceEvent if unable to invoke heap profiler
2019-04-04 15:26:41 -07:00
Evan Tschannen
390ab9cfed
A process will mark itself as degraded if it continually disconnects from a different process which the failure monitor thinks is healthy
2019-04-04 14:11:12 -07:00
Alex Miller
8f49be480b
Update fdbserver/worker.actor.cpp
...
Co-Authored-By: jzhou77 <jingyuzhou@gmail.com>
2019-04-04 13:32:10 -07:00
Jingyu Zhou
eaaf58ee34
Refactor profiler into cpu and heap profilers
2019-04-03 20:54:30 -07:00
Jingyu Zhou
3371cf22d4
Add manually triggered heap profiling
...
At client side:
fdb> profile
ERROR: Usage: profile <client|list|flow|heap>
fdb> profile heap 127.0.0.1:4500
On the server side:
$ HEAPPROFILE=/tmp/fdbserver bin/fdbserver -C ../test.cluster -p 127.0.0.1:4500
Starting tracking the heap
FDBD joined cluster.
Dumping heap profile to /tmp/fdbserver.0001.heap (1024 MB allocated cumulatively, 13 MB currently in use)
Dumping heap profile to /tmp/fdbserver.0002.heap (User triggered heap dump)
2019-04-03 16:00:54 -07:00
Markus Pilman
101a05ae77
Merge branch 'master' into features/client-simulator
2019-04-03 10:03:56 -08:00
Jingyu Zhou
fc59587b3c
Merge pull request #1393 from jzhou77/pprof
...
Gperftools Profiling fix.
2019-04-03 10:35:31 -07:00
Evan Tschannen
39c595223b
Merge branch 'release-6.1'
2019-04-02 22:30:02 -07:00
Evan Tschannen
30133a30e0
Merge pull request #1403 from etschannen/release-6.1
...
Ported a bug fix to the 6.0 log system, and updated documentation
2019-04-02 17:56:18 -07:00
Jingyu Zhou
56a1128a9b
Enhance cmake's gperftools support
...
Add compiler flags and link flags for gperftools.
2019-04-02 17:34:29 -07:00
Evan Tschannen
31ed73d9f5
Ported the bug fix https://github.com/apple/foundationdb/pull/1379 to OldTLogServer_6_0
2019-04-02 15:27:37 -07:00
Evan Tschannen
1d4a6ab551
cleaned up status to keep the healthyZone read separated from relicaFutures
2019-04-02 14:46:56 -07:00
Evan Tschannen
a38c396283
made all maintenance transactions lock aware
2019-04-02 14:27:48 -07:00
Evan Tschannen
628fec8c8b
updated status with information about ongoing maintenance
...
clear the maintenance zone if a different storage server is detected failed
2019-04-02 14:15:51 -07:00
mpilman
371a41dbba
Allow classPath to be modified at runtime
2019-04-02 11:56:40 -07:00
mpilman
e19901186f
Fixed buggy register preparation for natives
2019-04-02 11:56:03 -07:00
Evan Tschannen
72203ba47a
Merge commit '56f3f0b1bc60604f965152d856ae29a591227703'
2019-04-01 18:45:38 -07:00
Evan Tschannen
781cf9b5a0
added the ability to make a zoneId for maintenance in fdbcli
2019-04-01 17:55:13 -07:00
Evan Tschannen
f5de52de91
fix: cancel the previous log system recruitment before calling newEpoch, to avoid multiple actors attempting to modify oldLogSystem at the same time
2019-04-01 16:38:25 -07:00
Jingyu Zhou
49fdc35e5e
Gperftools Profiling fix.
...
Fix a bug and update gperftools compiling flags
The added flags are recommended by gperftools here:
https://github.com/gperftools/gperftools
Verified that heap profiles are saved with the following command:
HEAPPROFILE=/tmp/fdbserver fdbserver [args...]
2019-04-01 14:42:18 -07:00
mpilman
b148981bba
Fixed compilation issues with char*
2019-04-01 14:29:45 -07:00
Jingyu Zhou
47b4b82628
Merge branch 'master' into fix-unreferenced
2019-04-01 14:07:19 -07:00
Jingyu Zhou
3f76be8f45
Merge remote-tracking branch 'apple/master' into fix-unreferenced
2019-04-01 14:00:43 -07:00
Jingyu Zhou
f7f8ddd894
Fix warnings on unused variables
...
Found by -Wunused-variable flag.
2019-04-01 14:00:20 -07:00
mpilman
e23e63c6ac
Implemented JavaWorkload
...
This change allows a user to write a workload in Java.
The way this is implemented is by creating a JVM within the
simulator and calling the corresponding workload class. A
workload can then run in the simulator or on a testing cluster.
If the workload is executed within the simulator, the resulting
test will not be deterministic anymore as it will execute in a
different thread (and even without that it is not clear, whether
we could get determinism as the JVM does a lot of things that are
not deterministic).
This is intended to get better testing of the Java client, and
layer authors can use the simulator to test their layers on a single
machine while still simulating failing machines etc.
2019-03-31 17:57:43 -07:00
Evan Tschannen
a46620fbee
Merge branch 'release-6.1'
2019-03-30 17:59:28 -07:00
Evan Tschannen
8ebf771392
cleanup cluster controller trace events
2019-03-30 14:17:18 -07:00
Alex Miller
e7ad39246c
Fix typo
2019-03-29 20:16:26 -07:00
Evan Tschannen
a44ffd851e
fix: the shared tlog could fail to update a stopped tlog’s queueCommitVersion to version if a second tlog registered before it could issue the first commit for the tlog
2019-03-29 20:11:30 -07:00
Evan Tschannen
d882c060bf
Merge commit '5dd6396eed0de0dfea6cf9eecc307995eff5cedc'
2019-03-28 18:00:55 -07:00
Balachandar Namasivayam
0bbdc15f71
Multi-test processes wait until a timeout if any of the tester processes restarts. Use getReplyUnlessFailedFor instead of getReply to detect the restarts and fail quickly instead of waiting for a timeout, which is usually large.
2019-03-28 17:05:30 -07:00
Evan Tschannen
b6008558d3
renamed BinaryWriter.toStringRef() to .toValue(), because the function now returns a Standalone<StringRef>()
...
eliminated an unnecessary copy from the proxy commit path
eliminated an unnecessary copy from buffered peek cursor
2019-03-28 11:52:50 -07:00
Evan Tschannen
836bb95a7a
Merge pull request #1372 from etschannen/master
...
Merge 6.1 into master
2019-03-27 21:00:49 -07:00
Evan Tschannen
34b9d5e722
Merge pull request #1364 from etschannen/feature-fast-serialize
...
A few performance optimizations
2019-03-27 20:57:25 -07:00
Evan Tschannen
e5a80f2c94
optimized IPaddress
2019-03-27 18:21:13 -07:00
Jingyu Zhou
a55f06e082
Remove unused functions
...
Found with -Wunused-function flag.
2019-03-27 15:45:28 -07:00
Stephen Atherton
64554e90d4
Change this to THIS in actors for IDE compatibility.
2019-03-27 13:42:49 -07:00
Stephen Atherton
d5c8b6b083
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbserver/VersionedBTree.actor.cpp
# flow/flow.h
2019-03-27 13:37:15 -07:00
A.J. Beamon
91014d4529
Add file changes that I accidentally failed to commit; fix naming issue in worker.
2019-03-27 08:41:19 -07:00
A.J. Beamon
71e2fdafb8
Changes to ratekeeper camel case
2019-03-27 08:24:25 -07:00
A.J. Beamon
d508658569
Make ratekeeper one word to match our existing convention
2019-03-27 08:15:19 -07:00
Jingyu Zhou
38c6681349
Fix some signed and unsigned mismatch warnings.
2019-03-26 14:54:11 -07:00
Jingyu Zhou
c0b58080ee
Fix type name warning for DDTeamCollection
...
Seen using 'class' now seen using 'struct' in DataDistribution.actor.cpp
2019-03-26 14:18:25 -07:00
Jingyu Zhou
7c02ee6fdd
Fix compiler warning about unreferenced exception variable
2019-03-26 13:43:47 -07:00
Jingyu Zhou
466a59a99d
Merge remote-tracking branch 'apple/release-6.1' into ratekeeper
2019-03-25 15:27:38 -07:00
Jingyu Zhou
f57a22e2ed
Add data distributor and ratekeeper to status output
2019-03-25 15:11:29 -07:00
Trevor Clinkenbeard
007abbc45b
Added 96-byte FastAllocator
...
Since storage queue nodes account for a large portion of memory usage,
we can save space by only allocating 96 bytes instead of 128 bytes for
each node.
2019-03-25 13:44:39 -07:00
Evan Tschannen
5e03e178de
Merge pull request #1345 from ajbeamon/support-multiple-client-or-worker-issues
...
Add support for a client or worker having multiple issues.
2019-03-24 17:27:50 -07:00
Evan Tschannen
d45159ebf7
Merge pull request #1307 from jzhou77/ratekeeper
...
Monitor placement of Ratekeeper and DataDistributor
2019-03-24 17:26:07 -07:00
Evan Tschannen
d6ad027d37
ratekeeper needs to be recruited for proxies to make progress, so if one has not registered with the cluster controller by the time we are accepting commits, recruit a new one
2019-03-24 16:48:24 -07:00
Evan Tschannen
f426d732ea
fix: forgot to remove one location where id_used was incremented for distributor and ratekeeper
2019-03-24 16:04:59 -07:00
Evan Tschannen
e8948726e8
once we recruit a ratekeeper, do not allow any other ratekeepers to register
2019-03-24 11:04:39 -07:00
Evan Tschannen
24c92a1870
Merge pull request #1352 from etschannen/feature-network-address-list
...
Changed NetworkAddressList to at most two addresses for performance
2019-03-24 10:22:38 -07:00
Evan Tschannen
50a4403661
fix: missing parenthesis
2019-03-23 21:52:15 -07:00
Jingyu Zhou
40eec20252
Restore master PID in worker registration
...
This fix was lost during a merge.
2019-03-23 21:02:11 -07:00
Jingyu Zhou
3ef26e6be3
Fix fitness assignment statements
...
Found by MacOS build.
2019-03-23 19:16:04 -07:00
Evan Tschannen
1fc6937802
changed NetworkAddressList to at most two addresses for performance
2019-03-23 17:54:46 -07:00
Evan Tschannen
b51a24453e
the data distributor and ratekeeper are not included in id_used, but when comparing equally good options we prefer to avoid sharing with those roles
...
excluded data distributor and ratekeeper were improperly killed when the best option was also excluded
2019-03-23 13:25:36 -07:00
Jingyu Zhou
10988f89d9
Code refactoring for ConsistencyCheck.actor.cpp
2019-03-23 11:06:43 -07:00
Jingyu Zhou
fdc5b5ddbf
Fix: spurious ratekeeper registration
...
A rare race condition:
-r simulation -f ./foundationdb/tests/slow/WriteDuringReadAtomicRestore.txt -s 114256311 -b on
- A is the ratekeeper.
- CC recruits B and B starts
- CC halts ratekeeper A and A is halted
- A registers back with CC, which then halts B. CC sets A to be the ratekeeper.
CC starts recruiting and finds A is the best machine, but skips recruiting
because CC thinks A is already used. Now the cluster is left with no ratekeeper.
Fix by disallowing ratekeeper registration with previous ID.
2019-03-23 11:03:51 -07:00
Jingyu Zhou
6523cd4931
Fix: recruit ratekeeper is not triggered
2019-03-23 09:20:54 -07:00
Steve Atherton
09f37cf3d2
Merge pull request #533 from ajbeamon/fix-parent-directory
...
Fixes to parentDirectory() and abspath()
2019-03-22 23:53:46 -07:00
Evan Tschannen
2da46e3172
fix: halt if datacenters are different
2019-03-22 23:53:21 -07:00
Evan Tschannen
b68bc46042
Merge pull request #1348 from ajbeamon/fix-missing-metrics-when-ss-down
...
Fix missing read workload metrics
2019-03-22 19:08:04 -07:00
Evan Tschannen
d34c56c9a5
ensure that the processId exists in id_worker before accessing it
2019-03-22 18:54:39 -07:00
Balachandar Namasivayam
ac8ad07b45
Address review comments.
2019-03-22 18:48:49 -07:00
Balachandar Namasivayam
4ed323ac52
Fixed bug and addressed review comments.
2019-03-22 18:48:49 -07:00
Balachandar Namasivayam
d75020b44a
Fix bug where accessing shared memory created by boost 1.52 leads to error when accessed by boost 1.67.
2019-03-22 18:48:49 -07:00
Evan Tschannen
36ab852bb1
Merge branch 'master' into ratekeeper
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
2019-03-22 18:41:00 -07:00
Evan Tschannen
6254a1a8e4
fix: restarting the provisional proxy causes all tlog peeks to restart, so if tlog peeks take longer than 1 second this could end in an infinite loop
2019-03-22 18:37:39 -07:00
Evan Tschannen
7dd1c1b60c
fix: processClassFitness could be wrong if the client changed their class while rebooting
2019-03-22 18:37:39 -07:00
Evan Tschannen
ddb6058770
simplified ratekeeper monitoring loop
2019-03-22 18:22:45 -07:00
Jingyu Zhou
12917d8c7d
Add actors to store halt request futures
...
Address best fitness in checking better DD or RK.
2019-03-22 18:06:38 -07:00
Jingyu Zhou
e8977aeb98
Remove clusterControllerDcId check
...
This is no longer needed since it'll be set in the ctor.
2019-03-22 18:01:54 -07:00
Evan Tschannen
82bc447e29
startRatekeeper is responsible for updating serverDBInfo
2019-03-22 17:56:16 -07:00
Evan Tschannen
82c80c225d
make sure id_worker is updated before setting ratekeeper or data distribution
2019-03-22 17:08:54 -07:00
Evan Tschannen
6a9c9d79cc
Update fdbserver/ClusterController.actor.cpp
2019-03-22 17:00:58 -07:00
Evan Tschannen
70b1c88cdd
Update fdbserver/ClusterController.actor.cpp
2019-03-22 17:00:52 -07:00
Jingyu Zhou
16f54577ee
Restore master PID in cluster controller worker registration
...
CC may think master failed and clear the master PID, which can block both data
distributor and ratekeeper recruitment. Fix by restoring it during worker
registration.
2019-03-22 14:53:05 -07:00
A.J. Beamon
fc48b6050e
When tabulating read workload metrics, ignore the absence of any particular storage server.
2019-03-22 14:22:22 -07:00
Evan Tschannen
78f7a2e40b
fix: restarting the provisional proxy causes all tlog peeks to restart, so if tlog peeks take longer than 1 second this could end in an infinite loop
2019-03-22 14:13:58 -07:00
A.J. Beamon
4eb5715689
Add support for a client or worker having multiple issues.
2019-03-22 08:29:41 -07:00
Jingyu Zhou
da338c3ad6
Avoid unnecessary recruiting of DD or RK
...
While waiting to recruit a data distributor or ratekeeper, a previous one
could have already joined, so we can skip this unnecessary recruiting.
Revert the change of worker.actor.cpp for ratekeeper. Instead, recruiting
ratekeeper should avoid the process with an existing one. This fixes a bug
where the ratekeeper interface became a zombie, killing other healthy ratekeepers
but doing no useful work. Found by:
-r simulation --crash -f tests/fast/WriteDuringRead.txt -s 31858110 -b on
2019-03-21 22:40:07 -07:00
Evan Tschannen
fe4464e786
fix: processClassFitness could be wrong if the client changed their class while rebooting
2019-03-21 17:56:04 -07:00
Jingyu Zhou
299961aecb
Move ratekeeper or data distributor from excluded servers
2019-03-21 17:17:33 -07:00
Evan Tschannen
3ced178348
maxVersionDifference is a copy of a knob which is a double
2019-03-21 12:58:48 -07:00
Jingyu Zhou
48324ad4be
Fix a race during ratekeeper registration
...
When a ratekeeper registers, the monitorRatekeeper wakes up and recruits a new
ratekeeper. Adding a 0s delay to avoid this.
If a ratekeeper is recruited on an existing machine, update the interface so
that the cluster controller can clear the ratekeeperID.
2019-03-21 12:56:56 -07:00
Evan Tschannen
e692f0f70f
fix: degraded is only used for tlog recruitment, so we should not use it in the fitness calculation for other roles
2019-03-21 11:23:49 -07:00
Jingyu Zhou
8edefda193
Fix test stuck due to invalid worker in cluster controller
...
Test case:
-r simulation --crash -f ./tests/rare/CloggedCycleWithKills.txt -s 688927581 -b off
2019-03-20 22:24:01 -07:00
Evan Tschannen
59abd8f3d8
fix: make sure recoveryLocation is always a valid page
2019-03-20 18:12:56 -07:00
Evan Tschannen
3730142fcc
fix: after a rollback, uncommitted changes to the byte sample could be missed
2019-03-20 18:10:26 -07:00
Jingyu Zhou
937b6dde31
Fix a race of DD, RK, Master failure
...
If DD, RK, and Master all run on the same process and it fails, recruiting of a new
DD or RK could try to use the old master worker interface, which is an invalid
one and causes recruitment to be stuck.
Fix by adding a delay and checking master is valid before recruitment.
2019-03-20 16:19:20 -07:00
Evan Tschannen
2ed1d58d16
fix: change the location where stopped is checked, because a yield could cause stopped to be set after the existing check
2019-03-20 14:28:32 -07:00
Jingyu Zhou
ce5c6d18d2
Fix ratekeeper recruitment bug
2019-03-20 14:22:22 -07:00
Jingyu Zhou
86b687981b
Fix ratekeeper and data distributor recruiting bug
...
Avoid multiple concurrent recruiting of ratekeepers with a recruiting flag.
Fix endless recruiting when the chosen worker is a proxy or a resolver --
prefer master in this case.
2019-03-20 10:00:31 -07:00
Evan Tschannen
5a00f567be
fix CheckSatelliteTagLocation
2019-03-20 09:30:11 -07:00
Jingyu Zhou
474abd81bd
Move placement monitoring inside doCheckOutstandingRequests
2019-03-19 22:48:21 -07:00
Evan Tschannen
2605257737
Merge branch 'master' of github.com:apple/foundationdb
2019-03-19 18:47:29 -07:00
Evan Tschannen
f9aad46573
made use_provisional_proxies a transaction option
2019-03-19 18:44:37 -07:00
Evan Tschannen
20764efa24
Merge pull request #1320 from bnamasivayam/dc-as-satellite-config
...
Support config where the primary and remote DC's can be used as satel…
2019-03-19 15:49:24 -07:00
Balachandar Namasivayam
f9560e1abd
Addressed Review Comments
2019-03-19 15:23:14 -07:00
Jingyu Zhou
bc6fdaea3e
Recruit a new ratekeeper before halting the old
2019-03-19 15:21:46 -07:00
Evan Tschannen
5b9c45ea0b
clients do not attempt to connect to provisional proxies
2019-03-19 13:37:50 -07:00
Jingyu Zhou
0fb6a03c07
First round of review comment fixes for PR#1307
2019-03-19 11:29:19 -07:00
A.J. Beamon
2d7b48dadc
Merge pull request #1311 from etschannen/feature-increase-grv-batch
...
Increased the GRV client batch size
2019-03-19 08:23:05 -07:00
A.J. Beamon
7f4adcc338
Merge pull request #1314 from etschannen/feature-ssd-memory-spill
...
configure memory now selects the ssd engine for transaction log spilling
2019-03-19 08:22:22 -07:00
Vishesh Yadav
fea18e7be0
fix: fdbserver segfault when started with wrong arguments
...
Public address is required for roles FDBD, NetworkTestServer and
Restore only. Therefore, check those cases, and for others follow the
earlier behaviour of using default ip address 0.
FIXES #1305
2019-03-19 02:05:11 -07:00
Evan Tschannen
2554fed965
reduce max transaction to start
2019-03-18 16:16:03 -07:00
Evan Tschannen
87e2a1a029
The proxy budget is implemented to let one request over its limit through, and then pay back what was over the limit in the next update
2019-03-18 16:09:57 -07:00
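The budget behaviour described in 87e2a1a029 can be sketched as follows. This is a minimal Python illustration, not the proxy's actual code; the class name `OverdraftBudget`, its methods, and the units are hypothetical.

```python
class OverdraftBudget:
    """Hypothetical budget that lets one request overshoot, then repays the debt."""

    def __init__(self):
        self.budget = 0.0  # may go negative after an oversized request

    def update(self, allowance):
        # The budget carries over between updates, so any debt left by an
        # oversized request is paid back out of the next allowance.
        self.budget += allowance

    def try_start(self, cost):
        # Admit a request whenever some budget remains, even if the request
        # costs more than what is left. The overshoot becomes debt.
        if self.budget <= 0:
            return False
        self.budget -= cost
        return True
```

With an allowance of 10 per update, a single request of cost 25 is admitted and leaves the budget at -15; nothing else is admitted until two further updates have brought the budget back above zero.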
Alex Miller
b11ecb3210
Remove random bits of code that were either unneeded or leftover from debugging.
2019-03-18 15:47:20 -07:00
Evan Tschannen
eb54a700ba
changed the old memory configuration to memory-1
2019-03-18 15:10:04 -07:00
Alex Miller
37ea71b117
Implement limiting how many bytes recovery will read.
...
This time, track what location in the DiskQueue has been spilled in
persistent state, and then feed it back into the disk queue before
recovery.
This also introduces an ASSERT that recovery only reads exactly the
bytes that it needs to have in memory.
2019-03-18 15:09:43 -07:00
Alex Miller
29ab7370cd
Clear versionLocation when spilling, and pop DQ separately.
...
Popping the disk queue now requires potentially recovering the location
to which we can pop from the spilled data itself, and for each tag we
must maintain the first location with relevant data.
The previous queue we had to represent the ordering, queueOrder, was
used by spilling, and popped when a TLog had been spilled. This means
that as soon as a TLog has been fully spilled, we have no idea how it
relates in order to other fully spilled TLogs.
Instead, use queueOrder to keep track of all the TLog UIDs until they're
removed, and use spillOrder to keep track of the order only for
spilling.
2019-03-18 15:09:22 -07:00
Jingyu Zhou
8d609eb51d
Protect ratekeeper registration race during recruitment
...
This is similar to the one for DataDistributor.
2019-03-18 13:53:50 -07:00
Balachandar Namasivayam
5471725db5
Support config where the primary and remote DC's can be used as satellites.
2019-03-18 12:17:59 -07:00
Jingyu Zhou
2b41a97a6e
Fix the issue of slow dying Data Distributor
...
Test with:
-r simulation -f ./foundationdb/tests/slow/CommitBug.txt -s 67828576 -b on
The test has the following event sequence:
- Time 113.3s, CC noticed DD failure, cleared DD interface.
- 1s later, DD rejoined and registered with CC.
- Time 131.7s, DD actor cancelled. This old DD raced to register with CC and
the failure monitor is not installed because monitorDataDistributor is stalled
waiting for new DD.
- Time 161.4s, new DD running. New DD recruiting was delayed due to no servers
in the period.
Fix by disabling DD registration during the recruiting process.
2019-03-17 22:19:23 -07:00
Evan Tschannen
44e25e219c
do not suppress KeyValueStoreMemory_OutOfSpace in simulation
2019-03-17 00:35:48 -07:00
Evan Tschannen
ec6c843124
increased the GRV client batch size, similarly increased the proxy limits related to the number of transactions started in a batch
2019-03-16 16:18:58 -07:00
Stephen Atherton
f88e53e640
Merge branch 'master' of https://github.com/apple/foundationdb into fix-parent-directory
2019-03-16 00:13:09 -07:00
Stephen Atherton
2efb6f4c0d
Added cleanPath() which puts a path in a canonical form without .., ., or duplicate separators without using the filesystem or resolving symbolic links. absPath() redefined to use cleanPath() so it will return the same result for a path without symbolic links regardless of whether or not the path actually exists. Redefined parentDirectory() to use absPath() and error on certain inputs. Added comments describing behavior of these functions, and added a unit test which verbosely tests many inputs to them.
2019-03-15 23:54:33 -07:00
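The purely lexical normalization that 2efb6f4c0d describes for cleanPath() can be sketched as below. This is a Python illustration under stated assumptions, not the C++ implementation: the name `clean_path`, the handling of a leading ".." in relative paths, the dropping of ".." at the root, and returning "." for an empty result are all choices made for the example.

```python
def clean_path(path, sep="/"):
    """Lexically normalize a path: no filesystem access, no symlink resolution."""
    absolute = path.startswith(sep)
    parts = []
    for part in path.split(sep):
        if part == "" or part == ".":
            continue  # drops duplicate separators and "." components
        if part == "..":
            if parts and parts[-1] != "..":
                parts.pop()  # ".." cancels the previous component
            elif not absolute:
                parts.append("..")  # assumption: keep leading ".." in relative paths
            # assumption: ".." at the root of an absolute path is dropped
            continue
        parts.append(part)
    cleaned = sep.join(parts)
    if absolute:
        return sep + cleaned
    return cleaned or "."  # assumption: an empty relative result becomes "."
```

Because the function only manipulates the string, it gives the same answer whether or not the path exists, which is the property the commit message relies on for absPath().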
Jingyu Zhou
254c78053c
Fix a segfault error
...
After wait, ServerDBInfo may have changed. Using the old copy is wrong.
2019-03-15 22:11:13 -07:00
Alex Miller
7f5bc2981f
Checksum DiskQueue pages on read, but at a lower priority.
...
If a server has its data spilled, then it's behind the 5s window.
Feeding it data is less important than committing, so we can hide the
extra CPU usage from checksumming the read amplified disk queue pages.
2019-03-15 21:01:19 -07:00
Alex Miller
ee4721a63f
Make checking or ignoring checksums part of the IDiskQueue::read API.
2019-03-15 21:01:18 -07:00
Alex Miller
81c59e88a8
Persist the protocol version of a TLog instance when it is created.
...
This allows us to do easy upgrades of SpilledData in the future, if the
need arises, because we then have a protocol version to compare against.
2019-03-15 21:01:17 -07:00
Alex Miller
bf247eeed0
If TLogVersion >= 3, use crc32c for the DiskQueue hash for TLogs.
...
We don't have a forward compatibility story for the memory storage
engine, so its DiskQueue will still be hashlittle2 until one exists.
2019-03-15 21:01:16 -07:00
Alex Miller
686b097397
Remove verification code from DiskQueue and TLogServer.
2019-03-15 21:01:15 -07:00
Alex Miller
bdd7d5d3df
Initialize firstPages with 0xFF.
...
There are various ASSERT()s that assume firstPages is empty and enforce
things about `seq`. Some of these asserts have spuriously passed, since
uninitialized pages look like they have a `seq` of 0, which would be the
beginning of the disk queue.
Now they'll look like the end of the disk queue, which is far easier to
fail on.
2019-03-15 21:01:14 -07:00
Alex Miller
77f596743f
Bump persistFormat in TLogServer to differ from OldTLogServer*
...
Though this format is being deprecated in favor of an eventual plumbing
through of TLogVersion, we should probably bump it anyway.
And also remove the fallback to OldTLogServer code. It should never be
executed, as OldTLogServer_6_0 is entirely relied upon to execute
OldTLogServer_4_6.
2019-03-15 21:01:13 -07:00
Alex Miller
4f98634f59
Add LogId to all TLog TraceEvents that have it.
2019-03-15 21:01:12 -07:00
Jingyu Zhou
12ddd56698
Fix Ratekeeper and DataDistributor placement
...
Make sure both RateKeeper and DataDistributor are placed in the same data
center as the Master. Make sure only one RateKeeper is live in the cluster as
well.
2019-03-15 17:09:28 -07:00
Jingyu Zhou
bb5686eb75
Fix monitoring of DD and RK
2019-03-15 16:02:17 -07:00
Jingyu Zhou
9f6fe5f649
Merge remote-tracking branch 'apple/master' into ratekeeper
2019-03-15 11:30:04 -07:00
Jingyu Zhou
40860e0093
Attempt to fix.
2019-03-15 11:29:04 -07:00
A.J. Beamon
85b3f11e71
Fix various compiler warnings
2019-03-15 10:34:57 -07:00
Stephen Atherton
126252a274
Changed checksum to crc32. Disabled pager housekeeping for now. Added more btree read/write/commit metrics. Changed readPage to use disk read priority. Bug fix in CommitSubtree causing it to recurse to children unnecessarily. Added point read speed test at the end of set performance unit test.
2019-03-15 00:46:09 -07:00
Jingyu Zhou
9e59c9c253
Check DataDistributor and RateKeeper fitness
...
Fail the test if they are not put in the best fitness.
2019-03-14 16:14:57 -07:00
Jingyu Zhou
99d521ef4f
Monitor Ratekeeper and DataDistributor to use stateless processes
...
Since Ratekeeper and DataDistributor are no longer running with Master, they
might be running with stateful processes before a new Master becomes alive,
which is undesirable.
This PR adds monitoring of both Ratekeeper and DataDistributor at Cluster
Controller -- if Master runs on a stateless class and RK/DD runs at a worse
class, then RK/DD will be killed. I.e., RK/DD should be running at their own
classes or on the same stateless process as Master. After restart, RK/DD should
be running at a better process class.
2019-03-14 15:00:57 -07:00
Balachandar Namasivayam
2ac07fe7e0
Merge pull request #1248 from satherton/feature-backup-json
...
JSON output options for fdbbackup status and describe
2019-03-14 13:41:28 -07:00
Meng Xu
5a10bf5dfc
Merge branch 'master' into mengxu/tls-switch-status-PR
2019-03-14 10:35:12 -07:00
Meng Xu
e30e2af1f3
ClientKnobs: Add CHECK_CONNECTED_COORDINATOR_NUM_DELAY
2019-03-13 16:54:56 -07:00
Evan Tschannen
e7d1f9e5f1
fixed review comments
2019-03-13 15:59:03 -07:00
Evan Tschannen
7f48025348
optimize confirm epoch alive
2019-03-13 14:47:17 -07:00
Steve Atherton
dbacfcbc82
Merge branch 'master' into feature-backup-json
2019-03-13 13:30:45 -07:00
Evan Tschannen
a2108047aa
removed LocalitySetRef and IRepPolicyRef typedefs, because for clarity the Ref suffix is reserved for arena allocated objects instead of reference counted objects.
2019-03-13 13:14:39 -07:00
Evan Tschannen
e068c478b5
merge master
2019-03-12 18:31:25 -07:00
Steve Atherton
8aab719c22
Merge branch 'master' into feature-backup-json
2019-03-12 18:23:16 -07:00
Evan Tschannen
a7e45cff91
Merge pull request #1176 from jzhou77/ratekeeper
...
Make Ratekeeper a separate role
2019-03-12 15:58:59 -07:00
Meng Xu
85c24b0067
Merge branch 'master' into mengxu/tls-switch-status-PR
2019-03-12 15:20:54 -07:00
Evan Tschannen
5392742902
fixed review comments
2019-03-12 14:38:54 -07:00
Evan Tschannen
c5a18945b6
Merge pull request #1260 from vishesh/task/tls-upgrade
...
Allows cluster string to contain coordinators with different TLS states
2019-03-12 13:45:08 -07:00
A.J. Beamon
a25e224cda
Merge pull request #1213 from etschannen/feature-metadata-version
...
Added a metadata version key
2019-03-12 13:36:33 -07:00
Jingyu Zhou
2b0139670e
Fix review comment for PR 1176
2019-03-12 12:02:30 -07:00
Stephen Atherton
f0eae0295f
Merge branch 'master' of https://github.com/apple/foundationdb into feature-backup-json
2019-03-12 03:35:03 -07:00
Stephen Atherton
e9b8bf601e
Added backup status JSON output to backup workload to get sim coverage.
2019-03-12 03:34:38 -07:00
Balachandar Namasivayam
880e8643d1
Fix Windows link errors
2019-03-11 17:49:03 -07:00
Meng Xu
46f4b02807
TLS Status: Resolve review comments
...
Use connectedCoordinatorsNumDelayed to reduce the load on cluster controller;
Set connectedCoordinatorsNum to null by default for monitorLeader()
2019-03-11 17:10:08 -07:00
Evan Tschannen
5873705228
tlog commits very rarely take an additional 6 seconds
2019-03-11 12:11:17 -07:00
Meng Xu
435e515985
Merge branch 'master' into mengxu/tls-switch-status-PR
2019-03-11 11:17:40 -07:00
Evan Tschannen
80c3f2f8e2
added status fields detailing which processes are degraded, and also the total number of degraded processes
2019-03-10 22:58:15 -07:00
Evan Tschannen
c6e94293bf
reset a process to not be degraded after 2 days
2019-03-10 22:39:21 -07:00
Evan Tschannen
2627bcd35e
Merge branch 'master' into feature-metadata-version
2019-03-10 21:13:28 -07:00
Evan Tschannen
1be9ae5ce3
fixed merge conflict
2019-03-08 22:51:06 -05:00
Evan Tschannen
044b6b4f8a
Merge branch 'master' into feature-degraded-tlog
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
2019-03-08 22:50:41 -05:00
mpilman
ebffe8c633
print correct pahes in alloc instrumentation
2019-03-08 15:03:17 -08:00
Evan Tschannen
45fe6b369b
tlog recruitment will prefer non-degraded processes, however it will not choose fewer than the desired number of tlogs to avoid degraded processes
...
better master exists will switch the master to avoid degraded processes
2019-03-08 14:40:00 -05:00
Evan Tschannen
53f16b5347
when a tlog queue commit takes longer than 5 seconds, its process is marked as degraded
2019-03-08 11:46:34 -05:00
Evan Tschannen
710a64dc4e
replaced std::pair<WorkerInterface,ProcessClass> with a struct named WorkerDetails
2019-03-08 11:25:07 -05:00
mpilman
2537f26de6
First implementation of more user-friendly cpack
...
Up to this point, this code has only been tested in a very rudimentary way.
This is a first attempt at making cpack more user-friendly.
The basic idea is to generate a component for each package type so
that we can have different paths depending on whether we build
an RPM, a DEB, a TGZ, or a MacOS installer. The cpack package
config file will then choose the correct components to use.
At a later point this should make it possible to build these
with `make packages`, and the ugly iteration of calling cmake
between each package would become obsolete. While this solution is
a bit more bloated, it is also much more flexible and it will be
much easier to use.
Another benefit is that this will get rid of all warnings during
a cpack run.
2019-03-07 16:49:29 -08:00
Jingyu Zhou
cdfe906c30
Data distributor pulls batch limited info from proxy
...
Add a flag in HealthMetrics to indicate that batch priority is rate limited.
Data distributor pulls this flag from proxy to know roughly when rate limiting
happens.
DD uses this information to determine when to do the rebalance in the background,
i.e., moving data from heavily loaded servers to lighter ones. If the cluster is
currently rate limited for batch commits, then the rebalance will use longer
time intervals, otherwise use shorter intervals. See BgDDMountainChopper() and
BgDDValleyFiller() in DataDistributionQueue.actor.cpp.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
f43277e819
Format Ratekeeper.actor.cpp code
2019-03-07 13:16:20 -08:00
Jingyu Zhou
dc129207a9
Minor fix after rebase.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
835cc278c3
Fix rebase conflicts.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
7340998261
Fix status message for ratekeeper
2019-03-07 13:16:20 -08:00
Jingyu Zhou
517966fce2
Remove lastLimited from rate keeper
...
Refactor code to make IDE happy.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
d52ff738c0
Fix merge conflicts during rebase.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
b2ee41ba33
Remove lastLimited from data distribution
...
Fix a serialization bug in ServerDBInfo, which causes test failures.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
36a51a7b57
Fix a segfault bug due to uncopied ratekeeper interface
2019-03-07 13:16:20 -08:00
Jingyu Zhou
e6ac3f7fe8
Minor fix on ratekeeper work registration.
2019-03-07 13:16:20 -08:00
Jingyu Zhou
3c86643822
Separate Ratekeeper from data distribution.
...
Add a new role for ratekeeper.
Remove StorageServerChanges from data distribution.
Ratekeeper monitors storage servers, which borrows the idea from
DataDistribution.
2019-03-07 13:16:20 -08:00
Andrew Noyes
27d199409e
Add KillRegion.actor.cpp workload to cmake
2019-03-07 12:14:42 -08:00
Balachandar Namasivayam
9e4c780baa
Merge pull request #1249 from xumengpanda/mengxu/status/teamcollection-info
...
Status:healthy: Add optimizing_team_collections
2019-03-07 11:44:24 -08:00
Balachandar Namasivayam
f3391ea413
Merge pull request #1240 from satherton/feature-restore-by-timestamp
...
Restore by timestamp
2019-03-06 16:21:06 -08:00
Vishesh Yadav
ed49d603a0
Allows cluster string to contain coordinators with different TLS states
...
During live TLS upgrades, we can hence switch one coordinator at a time
to TLS rather than all of them together.
2019-03-06 16:05:10 -08:00
Meng Xu
845f8fdcbc
Status:healthy: Add optimizing_team_collections
...
Change removing_redundant_teams status name to
optimizing_team_collections.
The new name is more general and can be applied in the future
when we switch storage engines.
2019-03-06 15:05:23 -08:00
Meng Xu
04880e3d4d
Merge branch 'master' into mengxu/tls-switch-status-PR
2019-03-06 13:41:16 -08:00
Stephen Atherton
7778112f6a
Bug fix, restore was using the destination cluster to look up timestamps when printing the backup description instead of (optionally) the original cluster which generated the backup. Made missing cluster file errors more clear.
2019-03-06 02:45:55 -08:00
Alex Miller
c6a65389ae
Remove noexcept macro and replace with BOOST_NOEXCEPT.
...
BOOST_NOEXCEPT does what the noexcept macro was supposed to do, but in a
way that is correctly maintained over time.
2019-03-05 22:06:12 -08:00
Alex Miller
af617d68e6
boost 1.52.0 -> 1.67.0 in all vcxproj files
2019-03-05 22:06:12 -08:00
Meng Xu
820548223a
Status: connected_coordinators misc minor changes
...
Change the rst document file;
Change the coding style to be consistent with the nearby code;
Ensure we always initialize the connectedCoordinatesNum to 0
even when the variable is not used.
2019-03-05 21:45:18 -08:00
Meng Xu
b7a52e81e2
Status: Count connected coordinators per client
...
A client will always try to connect to all coordinators.
This commit lets Status track the number of connected coordinators
for each client.
This allows us to canary changes on coordinators. For example,
when we switch from non-TLS to TLS, we can switch 1 coordinator
from non-TLS to TLS. This can help check if a client has the ability
to connect through TLS.
We can make the non-TLS to TLS switch for each coordinator
one by one. This avoids the risk of losing connection in the switch.
2019-03-05 21:21:23 -08:00
Alex Miller
ad0aca21b5
Update fdbserver/fdbserver.vcxproj
...
Co-Authored-By: atn34 <anoyes34@gmail.com>
2019-03-05 18:03:57 -08:00
anoyes
981426bac9
More ide fixes
2019-03-05 18:03:57 -08:00
Evan Tschannen
82d957e0bb
Merge pull request #1178 from vishesh/task/issue-963-IPv6
...
IPv6 Support
2019-03-05 17:14:16 -08:00
Vishesh Yadav
a9562f61be
fix: missing argument to printf in fdbserver
2019-03-05 14:03:09 -08:00
Steve Atherton
21f55e1878
Merge pull request #1190 from bnamasivayam/restore-multiple-ranges
...
Add support for restoring multiple ranges.
2019-03-05 10:15:55 -08:00
Evan Tschannen
f1897f3eb6
Merge branch 'master' into feature-metadata-version
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
2019-03-04 21:06:16 -08:00
Evan Tschannen
69d7633d5b
Merge pull request #1217 from alexmiller-apple/tstlog-goodref
...
Spill-By-Reference TLog Part 4: Actually Usable Reference Spilling
2019-03-04 20:58:24 -08:00
Evan Tschannen
3d196c9e97
fix: metadataVersionKey log range removal needs to be checked for each logDestination
2019-03-04 20:56:31 -08:00
Evan Tschannen
988add9fb5
cache the metadataVersion for commits, so that doing setVersion with the commit version of a different transaction will
2019-03-04 16:48:34 -08:00
Meng Xu
c0535c49bb
Status: TLS client status
...
Use ClientStatusInfo structure for each network address (client),
instead of passing each status info as a parameter.
2019-03-04 16:35:10 -08:00
Trevor Clinkenbeard
89cbb77b4e
Merge branch 'master' of https://github.com/apple/foundationdb into lazily-fetch-health-metrics
2019-03-04 14:17:58 -08:00
Trevor Clinkenbeard
56ae46f89e
Client lazily fetches health metrics from proxies
2019-03-04 14:16:39 -08:00
Vishesh Yadav
e93cd0ff21
Add some checks and comments to IPv6 changes #963
2019-03-04 14:12:45 -08:00
Vishesh Yadav
592e224155
net: add/use formatIpPort to format IP:PORT pairs #963
2019-03-04 14:12:45 -08:00
Vishesh Yadav
cc9ad0e202
net: Use IPv6 in simulation testing #963
...
25% of the time we will use IPv6 addresses
2019-03-04 14:12:45 -08:00
Vishesh Yadav
57832e625d
net: Support IPv6 #963
...
- NetworkAddress now contains an IPAddress object which can be either
an IPv4 or IPv6 address. 128 bits are used even for IPv4 addresses,
however only 32 bits are used when using/serializing an IPv4 address.
- ConnectPacket is updated to store IPv6 addresses. Backward compatible
with the old format since the first 32 bits of the IP address field are used
for serialization of IPv4.
- Mainly updates the rest of the code to use the IPAddress structure instead
of plain uint32_t.
- IPv6 address/port pairs should be represented as `[ip]:port` as per
convention. This applies to both cluster files and command line
arguments.
2019-03-04 14:12:41 -08:00
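The `[ip]:port` convention from the commit above can be illustrated with a small sketch. The helper name `formatIpPortSketch` and its string-based signature are hypothetical and do not reflect the actual `formatIpPort` or `IPAddress` API. Detecting IPv6 by the presence of a colon is also an assumption made to keep the example short.

```cpp
#include <cstdint>
#include <string>

// Hypothetical sketch, not FoundationDB's formatIpPort(): wraps IPv6 literals
// in brackets so the port separator stays unambiguous.
std::string formatIpPortSketch(const std::string& ip, std::uint16_t port) {
    // Assumption: any textual address containing ':' is an IPv6 literal.
    const bool isV6 = ip.find(':') != std::string::npos;
    const std::string host = isV6 ? "[" + ip + "]" : ip;
    return host + ":" + std::to_string(port);
}
```

Without the brackets, an address such as `::1` followed by `:4500` could not be split back into address and port, which is why the bracketed form applies to both cluster files and command line arguments.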
Alex Miller
baa3e1af2c
Replace `/sizeof(Page)*sizeof(Page)` with `pageFloor()`.
2019-03-04 01:42:39 -08:00
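The `/sizeof(Page)*sizeof(Page)` idiom named in the commit above rounds a byte offset down to a page boundary using integer division. A minimal sketch of such a helper is shown below. The `Page` layout, its 4096-byte size, and the name `pageFloorSketch` are assumptions and may differ from the real DiskQueue code.

```cpp
#include <cstdint>

// Assumed stand-in for the disk queue page type; the real size may differ.
struct Page {
    unsigned char bytes[4096];
};

// Hypothetical sketch of a pageFloor()-style helper: integer division discards
// the remainder, so multiplying back yields the largest page-aligned offset
// that is less than or equal to the input (for non-negative offsets).
constexpr std::int64_t pageFloorSketch(std::int64_t offset) {
    return offset / static_cast<std::int64_t>(sizeof(Page)) *
           static_cast<std::int64_t>(sizeof(Page));
}
```

Naming the operation makes the intent visible at each call site instead of relying on readers to recognize the divide-then-multiply pattern.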
Alex Miller
ee64b43366
Change DQ shrink logic to consider "active" bytes rather than file size.
...
We know what the current ideal size of the DQ file should be, so we
should use it.
2019-03-04 01:42:39 -08:00
Alex Miller
244903a9de
Spill txsTag by value under TagMsg/ and not TagMsgRef/
...
There's not a tremendous reason as to why this matters now, but I feel
like I might regret sometime later not keeping the same schema under the
same key.
2019-03-04 01:42:39 -08:00
Alex Miller
72c2cf11ab
Replace ResourceLimiter with FlowLock.
2019-03-04 01:42:38 -08:00
Alex Miller
94bf75cb00
Allow the disk queue to shrink if it has unneeded slack space.
2019-03-04 01:42:38 -08:00
Alex Miller
52d5a721a6
Don't allocate 2x the memory for a read to save 1% of allocated memory.
2019-03-04 01:42:38 -08:00
Alex Miller
aff9ebe21a
Spill (start,length) instead of (begin,end) to save a few bytes.
2019-03-04 01:42:38 -08:00
Alex Miller
2aa527c0ef
Fix a bug resulting from concurrent TLog changes.
...
TLogServer was forked into OldTLogServer_6_0 at the same time that
3247d594
modified TLogServer, so the modification never made it into
OldTLogServer_6_0, resulting in a rare failure.
Manual code inspection revealed that there was also
78976161
that concurrently modified TLogServer, so that change was
copied to OldTLogServer_6_0 as well.
2019-03-04 01:42:38 -08:00
Alex Miller
fb4cb8c3a8
Print out configuration changes in ConfigureTest.
2019-03-04 01:42:38 -08:00
Alex Miller
9ef283d4e7
Implement hard limiting of memory used to serve peek requests.
2019-03-04 01:42:38 -08:00
Alex Miller
e3506ad9af
Add a yield to parseMessagesForTag
2019-03-04 01:42:38 -08:00
Alex Miller
742f6e1847
Solve overreading via pre-calculating tag bytes per commit
2019-03-04 01:42:38 -08:00
Alex Miller
e7d8520c63
Batch more when spilling data.
2019-03-04 01:42:38 -08:00
Alex Miller
71a794ccc3
Re-enable spill-by-reference testing.
2019-03-04 01:42:38 -08:00
Alex Miller
04e1170c88
Spill txsTag by value
2019-03-04 01:42:38 -08:00
Alex Miller
4d4e0a1d54
Fix the build on -O0.
...
C++ < 17 requires definitions of declared static constexpr variables.
2019-03-04 01:42:38 -08:00
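The language rule behind the fix above can be sketched as follows. Before C++17, a `static constexpr` data member that is odr-used needs an out-of-class definition. Optimized builds often fold the constant away and hide the missing definition, while `-O0` builds surface it as a link error. The names `Limits`, `maxBatch`, and `clampBatch` are invented for illustration and do not come from the FoundationDB source.

```cpp
#include <algorithm>

struct Limits {
    // In-class declaration with an initializer.
    static constexpr int maxBatch = 64;
};

// Out-of-class definition. Required before C++17 whenever maxBatch is
// odr-used; from C++17 on the member is implicitly inline and this line is
// redundant but still accepted.
constexpr int Limits::maxBatch;

int clampBatch(int requested) {
    // std::min takes its arguments by const reference, which odr-uses maxBatch.
    return std::min(requested, Limits::maxBatch);
}
```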
Alex Miller
db546af4a3
Fix the build on -O0.
...
C++ < 17 requires definitions of declared static constexpr variables.
2019-03-04 01:38:58 -08:00
Evan Tschannen
075fdef31a
Merge branch 'master' into feature-metadata-version
...
# Conflicts:
# fdbclient/DatabaseContext.h
2019-03-03 22:58:45 -08:00
Evan Tschannen
057ebe56e4
fix: unknownCommit handling relied on soleOwnership of the version stamp keys, so we need to use a second key to track the commit version for the metadataVersionKey
...
renamed a confusing option
2019-03-03 21:31:40 -08:00