Trevor Clinkenbeard
39f612d132
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-03-02 17:07:00 -08:00
Trevor Clinkenbeard
3f59f82670
Calculate durabilityLag instead of NDV for health metrics
2019-02-27 16:30:01 -08:00
A.J. Beamon
a051055caf
Initial implementation of adding separate limits for batch priority in ratekeeper
2019-02-27 10:31:56 -08:00
Trevor Clinkenbeard
abfe057805
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-02-25 13:47:16 -08:00
Trevor Clinkenbeard
78aad255b0
updateProcessStats gets process stats directly
...
Storage servers no longer parse ProcessMetrics trace lines to get their
own cpuUsage and diskUsage statistics.
2019-02-25 13:45:53 -08:00
Evan Tschannen
b8910ba7cd
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.h
# fdbserver/DataDistribution.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Evan Tschannen
d008de576e
Merge pull request #1139 from xumengpanda/mengxu/machine-team-upgrade-PR
...
Add background actor to remove redundant teams
2019-02-22 14:22:07 -08:00
Trevor Clinkenbeard
fa96b8dd33
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-02-20 16:56:16 -08:00
Meng Xu
da36c13914
StorageServer reboot: Fix the bug
...
To avoid brokenPromise error, which is caused by the sender of the durableInProgress (i.e., this process)
never sets durableInProgress, we should set durableInProgress before send the please_reboot() error.
Otherwise, in the race situation when storage server receives both reboot and
brokenPromise of durableInProgress, the worker of the storage server will die.
We will eventually end up with no worker for storage server role.
The data distributor's buildTeam() will get stuck in building a team
2019-02-19 16:56:16 -08:00
mpilman
3f0fd2a20c
Use fwd decls in WorkerInterface
...
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
27a3153719
Use ACTOR forward declarations in MoveKeys
...
Also MoveKeys.h -> MoveKeys.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3a0f9839b9
Fix minor IDE build errors
2019-02-19 15:16:59 -08:00
mpilman
0bb60e5a3b
Use proper fwd decl in NativeAPI
...
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
Evan Tschannen
065a45e05f
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Evan Tschannen
3247d59498
partially restored an optimization on remote storage servers where a behind storage server will keep less data in memory. This optimization was fully maintained on the primary storage servers, but remote storage servers can only use a version which is known to be durable on all remote transaction logs
2019-02-18 16:47:38 -08:00
Evan Tschannen
6cf4b416f7
move the KillRegionCycle workload into the fast directory
2019-02-18 15:43:01 -08:00
Evan Tschannen
4bdada1494
fixed a broken_promise when please_reboot() is thrown and someone is waiting on durableInProgress
2019-02-18 15:26:13 -08:00
Evan Tschannen
4c35ebdcc6
fix: because of forced recoveries, storage servers in remote regions cannot update their durable version to (lastLogVersion - 5e6), because the lastLogVersion might have jumped due to an epoch end and the recovery version after the forced recovery could be before the epoch end, causing the storage server to want to rollback to a version it does not have on disk
2019-02-18 14:40:30 -08:00
Evan Tschannen
05ca0a10d8
fix: kill all storage servers which are not in the safe locality after a forced recovery
2019-02-18 14:30:51 -08:00
A.J. Beamon
b435d51061
Merge branch 'master' into track-server-request-latencies
2019-02-14 08:07:32 -08:00
A.J. Beamon
d4349293b9
Reworked the way latency counters are tracked. Report the latency bands in separate events from StorageMetrics and ProxyMetrics. Fix a problem when the latency band configuration was changed. Add correctness testing.
2019-02-07 13:39:22 -08:00
Vishesh Yadav
5985566a8e
Don't issue a second read in storageserver if possible for CompareAndClear
...
If the previous eager read request is equal to the CompareAndClear op
key, do not issue a read again.
2019-02-04 16:10:59 -08:00
Vishesh Yadav
c532d5c277
Implements CompareAndClear AtomicOp
...
Adds CompareAndClear mutation. If the given parameter is equal to the
current value of the key, the key is cleared. At client, the mutation
is added to the operation stack. Hence if the mutation evaluates to
clear, we only get to know so when `read()` evaluates the stack in
`RYWIterator::kv()`, which is unlike what we currently do for typical
ClearRange.
2019-02-04 14:59:56 -08:00
Trevor Clinkenbeard
93d4ed6339
Fixed typo that was messing up storage server diskUsage calculation
2019-02-02 21:04:29 -08:00
Trevor Clinkenbeard
d7930af2cb
Storage server periodically calculates cpuUsage and diskUsage metrics. These metrics (as well as all other metrics necessary for health metrics calculation) are sent in the StorageQueuingMetricsReply message.
2019-01-31 12:23:04 -08:00
A.J. Beamon
2198d24ce1
Merge commit '3b2700d25334c53d13496ca16682642aac951beb' into track-server-request-latencies
...
# Conflicts:
# fdbclient/MasterProxyInterface.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/ServerDBInfo.h
# fdbserver/Status.actor.cpp
# fdbserver/fdbserver.vcxproj
# fdbserver/storageserver.actor.cpp
2019-01-24 11:43:26 -08:00
A.J. Beamon
8e05e95045
Added the ability to configure the latency band settings by setting a special key in \xff keyspace.
2019-01-18 16:18:34 -08:00
A.J. Beamon
7498c2308c
Merge branch 'release-6.0' into track-server-request-latencies
2019-01-16 13:39:01 -08:00
Evan Tschannen
684a22a52b
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/BackupContainer.actor.cpp
# fdbclient/HTTP.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/BackupCorrectness.actor.cpp
# versions.target
2019-01-09 16:14:46 -08:00
A.J. Beamon
d265517156
Fix: fast allocator would not cleanup memory for a thread if that thread never called getMagazine. This could happen if the first thing the thread did was to release memory.
...
Added a new metric for the number of threads that hold memory for each size and improve some existing metrics.
Fix: a failed ASSERT would crash if done early in the program lifetime.
2019-01-08 14:36:01 -08:00
Evan Tschannen
57293a2db0
byte sample recovery did not use limits for its range reads, leading to slow tasks
2019-01-04 10:32:31 -08:00
A.J. Beamon
eb2f27b8e5
Work in progress implementation of server-side latency tracking. The intent of this is to be able to measure the number of requests that achieve certain latency targets across the system relative to the total number of requests.
2018-11-30 10:46:04 -08:00
Evan Tschannen
4b5d0b4e2c
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/AsyncFileBlobStore.actor.cpp
# fdbclient/AsyncFileBlobStore.actor.h
# fdbclient/BlobStore.actor.cpp
# fdbclient/BlobStore.h
# fdbclient/HTTP.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/LoadBalance.actor.h
# fdbrpc/batcher.actor.h
# fdbrpc/fdbrpc.vcxproj
# fdbrpc/sim2.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen
19ae063b66
fix: storage servers need to be rebooted when increasing replication so that clients become aware that new options are available
2018-11-08 15:44:03 -08:00
Stephen Atherton
ade75ac692
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbserver/worker.actor.cpp
2018-11-07 11:43:54 -08:00
Evan Tschannen
bf6545a9cf
clients cache storage server interfaces individually, instead of as a team. This is needed because in fearless every shard has storage servers from two separate teams, leading to a lot of possible combinations
...
allAlternatives failed logic was simplified, because we are already doing a global rate limiting, so a per shard limit is unnecessary
reduced unnecessary state variables in waitMetrics requests
2018-11-02 13:15:09 -07:00
Evan Tschannen
3b97f5a899
fix: the storage server still has to pop old tags, even if it does not need any data from them
2018-11-02 13:10:14 -07:00
Stephen Atherton
93501947f8
Process_behind should also be sent to clients.
2018-10-24 20:38:15 -07:00
Stephen Atherton
f17cc1e20f
StorageServer will no longer send io_error or other inappropriate errors to a client (this would never happen on SQLite). Many bug fixes around error handling, initialization, and shutdown in Redwood. StorageServer now calls init() on its underlying storage engine.
2018-10-24 15:57:06 -07:00
Evan Tschannen
6bb2ba5d92
Merge commit '2a34115e65639b7aad368a148de3c4189bc34bfc'
...
# Conflicts:
# fdbserver/storageserver.actor.cpp
# fdbserver/worker.actor.cpp
2018-10-23 17:05:42 -07:00
Clement Pang
3a30621071
Merge branch 'release-6.0' into memory-fast-rollback
2018-10-21 12:48:46 +09:00
Clement Pang
3ceec01392
Address review comments.
2018-10-19 18:55:35 -07:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen
db71b60d72
Merge pull request #819 from satherton/feature-redwood
...
Redwood storage engine, initial/experimental version
2018-10-18 18:38:11 -07:00
Evan Tschannen
ed7036139a
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/DataDistribution.actor.cpp
# fdbserver/storageserver.actor.cpp
2018-10-18 17:00:52 -07:00
Evan Tschannen
952b26d746
removed upgrade code from 2.0.3
2018-10-18 16:53:40 -07:00
Evan Tschannen
0613a34845
The storage server would block the main thread when processing a single version with a large amount of data
2018-10-18 13:37:31 -07:00
Evan Tschannen
0217aed74c
Merge branch 'release-6.0'
...
# Conflicts:
# bindings/go/README.md
# documentation/sphinx/source/release-notes.rst
# fdbserver/MasterProxyServer.actor.cpp
# versions.target
2018-10-15 18:38:51 -07:00
Evan Tschannen
d8dc8e83b9
Do not rollback while uncommitted sets exist
2018-10-15 15:09:12 -07:00
Clement Pang
88e8422511
Per etschannen, wait on durable for reboots
2018-10-10 17:42:40 -07:00