Alex Miller
455bf3b3ec
Revert "Make protocol version a type"
2019-06-18 10:59:17 -07:00
mpilman
da53a92bec
Make protocol version a type
...
This fixes #1214
The basic idea is that ProtocolVersion is now its own type. This
alone is an improvement as it makes many things more typesafe. For
each version, we can now add breaking features (for example Fearless).
After that, there's no need to test against actual (confusing) version
numbers. Instead a developer can simply test
`protocolVersion->hasFearless()` and this will return true iff the
protocolVersion is newer than the newest version that didn't support
fearless.
2019-06-16 09:59:15 -07:00
Evan Tschannen
20e3edeb0a
Merge branch 'release-6.1'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/storageserver.actor.cpp
# versions.target
2019-06-14 12:42:59 -07:00
Evan Tschannen
924f92e5aa
Prevent the byte sample recovery from interfering with storage server recovery
2019-06-13 15:55:25 -07:00
Evan Tschannen
55f7e7d372
fix: The delay inside the disabledMap was causing the storage server updateStorage actor to run on the client process
2019-06-13 14:28:30 -07:00
Evan Tschannen
dccb9bc26d
fixed a number of correctness problems
2019-06-12 19:40:50 -07:00
Trevor Clinkenbeard
1e8f7e5b82
Refactor NextFastAllocatedSize to be constexpr function
2019-06-11 15:55:23 -07:00
Andrew Noyes
bc03421d05
Open Database as switchable only for client
2019-06-11 13:58:22 -07:00
Andrew Noyes
02e173b601
Add changeConnectionFile method to Transaction
...
Also add tests
2019-06-11 13:58:22 -07:00
Trevor Clinkenbeard
cb420ea4bd
Only construct waitDescription in simulator
2019-06-11 12:43:39 -07:00
Trevor Clinkenbeard
8144882d7b
Merge branch 'apple-master' into features/local-rk
2019-06-10 19:40:25 -07:00
Trevor Clinkenbeard
8dbb231f33
Don't reject read requests until the storage server durability lag gets large enough
2019-06-05 15:42:58 -07:00
Trevor Clinkenbeard
d1d98f298a
Changed storage server getPenalty calculation.
...
Penalty should always be >= 1.0
2019-06-05 14:14:40 -07:00
sramamoorthy
4083af0b01
Avoid using trackLatest for TLog pop test cases
2019-05-28 22:07:46 -07:00
sramamoorthy
ec7834e2f7
code re-orgnaization and address comments
2019-05-28 22:07:46 -07:00
sramamoorthy
b6e037ffbc
Replace fork with boost::process::child
2019-05-28 22:07:46 -07:00
sramamoorthy
61e93a9304
Address review comments and minor fixes
2019-05-28 22:07:46 -07:00
sramamoorthy
9e3104c2d4
Fix: races in async exec leading to bad backup
2019-05-28 22:07:46 -07:00
sramamoorthy
090bb53034
ShardInfo::addMutation to handle exec mutation
2019-05-28 22:07:46 -07:00
sramamoorthy
cfdad0c5e6
tlog to snapshot exactly at exec version
2019-05-28 22:07:46 -07:00
sramamoorthy
17ecba8313
trace cleanup and other indentation changes
2019-05-28 22:07:46 -07:00
sramamoorthy
aa79480d69
changes to make fdbfork asynchronous
2019-05-28 22:07:46 -07:00
sramamoorthy
72dd067173
Trace message changes and fix few FIXMEs
2019-05-28 22:07:46 -07:00
sramamoorthy
69edefe68b
Snapshot based backup and resotre implementation
2019-05-28 22:07:46 -07:00
A.J. Beamon
603721e125
Merge branch 'master' into thread-safe-random-number-generation
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbrpc/AsyncFileCached.actor.h
# fdbrpc/genericactors.actor.cpp
# fdbrpc/sim2.actor.cpp
# fdbserver/DiskQueue.actor.cpp
# fdbserver/workloads/BulkSetup.actor.h
# flow/ActorCollection.actor.cpp
# flow/Net2.actor.cpp
# flow/Trace.cpp
# flow/flow.cpp
2019-05-23 08:35:47 -07:00
Stephen Atherton
ebc96a7e0e
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbserver/VersionedBTree.actor.cpp
2019-05-21 23:49:27 -07:00
A.J. Beamon
a8b9d8e34b
Merge pull request #1336 from tclinken/fast-allocate-ptree-nodes
...
Create 96-byte fast allocator for storage queue PTree nodes
2019-05-17 14:22:46 -07:00
Jingyu Zhou
b8e7fc1b84
Refactor: add std:: qualifier and use emplace_back
2019-05-17 09:38:50 -10:00
A.J. Beamon
5f55f3f613
Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used.
2019-05-10 14:01:52 -07:00
Austin Seipp
bf378952cb
fdbserver: fix some print/scan format warnings
...
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Trevor Clinkenbeard
d339becd7c
Fix currentRate calculation for local ratekeeper
2019-04-25 15:35:34 -07:00
Meng Xu
529ce66b6c
Merge branch 'apple/master' into mengxu/performant-restore-PR
2019-04-18 18:02:45 -07:00
Andrew Noyes
ef04471a66
Fix more unused-variable warnings
2019-04-17 16:04:10 -07:00
Trevor Clinkenbeard
3426205167
Fixed readGuard usage bug
2019-04-16 15:05:57 -07:00
Trevor Clinkenbeard
1d921da170
readGuard sends server_overloaded error if request is rejected
2019-04-16 11:29:01 -07:00
Trevor Clinkenbeard
0594154644
Fixed getPenalty calculation
2019-04-16 10:17:41 -07:00
Evan Tschannen
cd5c9d91fa
Merge pull request #1443 from etschannen/master
...
Merge 6.1 into master
2019-04-10 17:43:07 -07:00
A.J. Beamon
538b431656
Apply suggestions from code review
2019-04-08 14:55:58 -07:00
A.J. Beamon
a7288e1325
Throw process_behind instead of future_version when all storage nodes on a team are behind. process_behind gets the same backoff behavior as not_committed. Add proxy_memory_limit_exceeded to the retryable predicate.
2019-04-08 14:21:24 -07:00
mpilman
d2e74cb2c0
Fix stupid rounding error
2019-04-08 11:05:29 -07:00
mpilman
aaa8f73bdc
fixed missing refactoring code
2019-04-08 11:05:29 -07:00
mpilman
bdba8e22eb
Added test and bugfixes
2019-04-08 11:05:29 -07:00
mpilman
b944e0b116
generalized read guards, allow for penalty+error
2019-04-08 11:04:44 -07:00
mpilman
32393ec4c9
Prototype of local ratekeeper
2019-04-08 11:04:44 -07:00
mpilman
d01cbf3455
Addressed code review comments
2019-04-05 13:12:20 -07:00
mpilman
1c16f87a4e
Remove trace-calls to printable (in non-workloads)
2019-04-05 13:12:19 -07:00
Meng Xu
70d7c289f4
Merge branch 'master' into mengxu/restore/parallel-v7
2019-03-30 22:13:10 -07:00
Stephen Atherton
d5c8b6b083
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbserver/VersionedBTree.actor.cpp
# flow/flow.h
2019-03-27 13:37:15 -07:00
Trevor Clinkenbeard
007abbc45b
Added 96-byte FastAllocator
...
Since storage queue nodes account for a large portion of memory usage,
we can save space by only allocating 96 bytes instead of 128 bytes for
each node.
2019-03-25 13:44:39 -07:00
Evan Tschannen
e837e7572c
fix: after a rollback, uncommitted changes to the byte sample could be missed
2019-03-20 19:33:09 -07:00
Trevor Clinkenbeard
39f612d132
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-03-02 17:07:00 -08:00
Trevor Clinkenbeard
3f59f82670
Calculate durabilityLag instead of NDV for health metrics
2019-02-27 16:30:01 -08:00
A.J. Beamon
a051055caf
Initial implementation of adding separate limits for batch priority in ratekeeper
2019-02-27 10:31:56 -08:00
Trevor Clinkenbeard
abfe057805
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-02-25 13:47:16 -08:00
Trevor Clinkenbeard
78aad255b0
updateProcessStats gets process stats directly
...
Storage servers no longer parse ProcessMetrics trace lines to get their
own cpuUsage and diskUsage statistics.
2019-02-25 13:45:53 -08:00
Evan Tschannen
b8910ba7cd
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.h
# fdbserver/DataDistribution.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Evan Tschannen
d008de576e
Merge pull request #1139 from xumengpanda/mengxu/machine-team-upgrade-PR
...
Add background actor to remove redundant teams
2019-02-22 14:22:07 -08:00
Trevor Clinkenbeard
fa96b8dd33
Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics
2019-02-20 16:56:16 -08:00
Meng Xu
da36c13914
StorageServer reboot: Fix the bug
...
To avoid brokenPromise error, which is caused by the sender of the durableInProgress (i.e., this process)
never sets durableInProgress, we should set durableInProgress before send the please_reboot() error.
Otherwise, in the race situation when storage server receives both reboot and
brokenPromise of durableInProgress, the worker of the storage server will die.
We will eventually end up with no worker for storage server role.
The data distributor's buildTeam() will get stuck in building a team
2019-02-19 16:56:16 -08:00
mpilman
3f0fd2a20c
Use fwd decls in WorkerInterface
...
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman
27a3153719
Use ACTOR forward declarations in MoveKeys
...
Also MoveKeys.h -> MoveKeys.actor.h
2019-02-19 15:16:59 -08:00
mpilman
3a0f9839b9
Fix minor IDE build errors
2019-02-19 15:16:59 -08:00
mpilman
0bb60e5a3b
Use proper fwd decl in NativeAPI
...
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
Evan Tschannen
065a45e05f
Merge branch 'master' into feature-fix-force-recovery
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/ClusterController.actor.cpp
# fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Evan Tschannen
3247d59498
partially restored an optimization on remote storage servers where a behind storage server will keep less data in memory. This optimization was fully maintained on the primary storage servers, but remote storage servers can only use a version which is known to be durable on all remote transaction logs
2019-02-18 16:47:38 -08:00
Evan Tschannen
6cf4b416f7
move the KillRegionCycle workload into the fast directory
2019-02-18 15:43:01 -08:00
Evan Tschannen
4bdada1494
fixed a broken_promise when please_reboot() is thrown and someone is waiting on durableInProgress
2019-02-18 15:26:13 -08:00
Evan Tschannen
4c35ebdcc6
fix: because of forced recoveries, storage servers in remote regions cannot update their durable version to (lastLogVersion - 5e6), because the lastLogVersion might have jumped due to an epoch end and the recovery version after the forced recovery could be before the epoch end, causing the storage server to want to rollback to a version it does not have on disk
2019-02-18 14:40:30 -08:00
Evan Tschannen
05ca0a10d8
fix: kill all storage servers which are not in the safe locality after a forced recovery
2019-02-18 14:30:51 -08:00
A.J. Beamon
b435d51061
Merge branch 'master' into track-server-request-latencies
2019-02-14 08:07:32 -08:00
A.J. Beamon
d4349293b9
Reworked the way latency counters are tracked. Report the latency bands in separate events from StorageMetrics and ProxyMetrics. Fix a problem when the latency band configuration was changed. Add correctness testing.
2019-02-07 13:39:22 -08:00
Vishesh Yadav
5985566a8e
Don't issue a second read in storageserver if possible for CompareAndClear
...
If the previous eager read request is equal to the CompareAndClear op
key, do not issue a read again.
2019-02-04 16:10:59 -08:00
Vishesh Yadav
c532d5c277
Implements CompareAndClear AtomicOp
...
Adds CompareAndClear mutation. If the given parameter is equal to the
current value of the key, the key is cleared. At client, the mutation
is added to the operation stack. Hence if the mutation evaluates to
clear, we only get to know so when `read()` evaluates the stack in
`RYWIterator::kv()`, which is unlike what we currently do for typical
ClearRange.
2019-02-04 14:59:56 -08:00
Trevor Clinkenbeard
93d4ed6339
Fixed typo that was messing up storage server diskUsage calculation
2019-02-02 21:04:29 -08:00
Trevor Clinkenbeard
d7930af2cb
Storage server periodically calculates cpuUsage and diskUsage metrics. These metrics (as well as all other metrics necessary for health metrics calculation) are sent in the StorageQueuingMetricsReply message.
2019-01-31 12:23:04 -08:00
Meng Xu
550f2e2682
Merge with master to use the latest backup container
...
In fdb 6.0.15, backup container is changed on how to organize the backup data.
The backup made by fdb >6.0.15 has to be restored with fdb > 6.0.15.
Merge with master so that the fast restore uses fdb > 6.0.15
2019-01-30 12:05:15 -08:00
A.J. Beamon
2198d24ce1
Merge commit '3b2700d25334c53d13496ca16682642aac951beb' into track-server-request-latencies
...
# Conflicts:
# fdbclient/MasterProxyInterface.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/ServerDBInfo.h
# fdbserver/Status.actor.cpp
# fdbserver/fdbserver.vcxproj
# fdbserver/storageserver.actor.cpp
2019-01-24 11:43:26 -08:00
A.J. Beamon
8e05e95045
Added the ability to configure the latency band settings by setting a special key in \xff keyspace.
2019-01-18 16:18:34 -08:00
A.J. Beamon
7498c2308c
Merge branch 'release-6.0' into track-server-request-latencies
2019-01-16 13:39:01 -08:00
Evan Tschannen
684a22a52b
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbbackup/backup.actor.cpp
# fdbclient/BackupContainer.actor.cpp
# fdbclient/HTTP.actor.cpp
# fdbserver/storageserver.actor.cpp
# fdbserver/workloads/BackupCorrectness.actor.cpp
# versions.target
2019-01-09 16:14:46 -08:00
A.J. Beamon
d265517156
Fix: fast allocator would not cleanup memory for a thread if that thread never called getMagazine. This could happen if the first thing the thread did was to release memory.
...
Added a new metric for the number of threads that hold memory for each size and improve some existing metrics.
Fix: a failed ASSERT would crash if done early in the program lifetime.
2019-01-08 14:36:01 -08:00
Evan Tschannen
57293a2db0
byte sample recovery did not use limits for its range reads, leading to slow tasks
2019-01-04 10:32:31 -08:00
Stephen Atherton
534e50de71
Added counters for page writes, sets, and commits.
2018-12-05 22:41:04 -08:00
Meng Xu
bc18110693
Wip: why correctness fail but local runs
2018-12-05 10:07:43 -08:00
A.J. Beamon
eb2f27b8e5
Work in progress implementation of server-side latency tracking. The intent of this is to be able to measure the number of requests that achieve certain latency targets across the system relative to the total number of requests.
2018-11-30 10:46:04 -08:00
Evan Tschannen
4b5d0b4e2c
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbclient/AsyncFileBlobStore.actor.cpp
# fdbclient/AsyncFileBlobStore.actor.h
# fdbclient/BlobStore.actor.cpp
# fdbclient/BlobStore.h
# fdbclient/HTTP.actor.cpp
# fdbclient/ManagementAPI.actor.cpp
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/LoadBalance.actor.h
# fdbrpc/batcher.actor.h
# fdbrpc/fdbrpc.vcxproj
# fdbrpc/sim2.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/DataDistributionTracker.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen
19ae063b66
fix: storage servers need to be rebooted when increasing replication so that clients become aware that new options are available
2018-11-08 15:44:03 -08:00
Stephen Atherton
ade75ac692
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbserver/worker.actor.cpp
2018-11-07 11:43:54 -08:00
Evan Tschannen
bf6545a9cf
clients cache storage server interfaces individually, instead of as a team. This is needed because in fearless every shard has storage servers from two separate teams, leading to a lot of possible combinations
...
allAlternatives failed logic was simplified, because we are already doing a global rate limiting, so a per shard limit is unnecessary
reduced unnecessary state variables in waitMetrics requests
2018-11-02 13:15:09 -07:00
Evan Tschannen
3b97f5a899
fix: the storage server still has to pop old tags, even if it does not need any data from them
2018-11-02 13:10:14 -07:00
Stephen Atherton
93501947f8
Process_behind should also be sent to clients.
2018-10-24 20:38:15 -07:00
Stephen Atherton
f17cc1e20f
StorageServer will no longer send io_error or other inappropriate errors to a client (this would never happen on SQLite). Many bug fixes around error handling, initialization, and shutdown in Redwood. StorageServer now calls init() on its underlying storage engine.
2018-10-24 15:57:06 -07:00
Evan Tschannen
6bb2ba5d92
Merge commit '2a34115e65639b7aad368a148de3c4189bc34bfc'
...
# Conflicts:
# fdbserver/storageserver.actor.cpp
# fdbserver/worker.actor.cpp
2018-10-23 17:05:42 -07:00
Clement Pang
3a30621071
Merge branch 'release-6.0' into memory-fast-rollback
2018-10-21 12:48:46 +09:00
Clement Pang
3ceec01392
Address review comments.
2018-10-19 18:55:35 -07:00
Robert Escriva
268093a96d
Adjust all includes to be relative to the root.
...
Remove the use of relative paths. A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h". Adjust so that every include references such a header with the
latter form.
Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen
db71b60d72
Merge pull request #819 from satherton/feature-redwood
...
Redwood storage engine, initial/experimental version
2018-10-18 18:38:11 -07:00
Evan Tschannen
ed7036139a
Merge branch 'release-6.0'
...
# Conflicts:
# documentation/sphinx/source/release-notes.rst
# fdbserver/DataDistribution.actor.cpp
# fdbserver/storageserver.actor.cpp
2018-10-18 17:00:52 -07:00
Evan Tschannen
952b26d746
removed upgrade code from 2.0.3
2018-10-18 16:53:40 -07:00
Evan Tschannen
0613a34845
The storage server would block the main thread when processing a single version with a large amount of data
2018-10-18 13:37:31 -07:00