Commit Graph

237 Commits

Author SHA1 Message Date
Alex Miller 455bf3b3ec Revert "Make protocol version a type" 2019-06-18 10:59:17 -07:00
mpilman da53a92bec Make protocol version a type
This fixes #1214

The basic idea is that ProtocolVersion is now its own type. This
alone is an improvement as it makes many things more typesafe. For
each version, we can now add breaking features (for example Fearless).
After that, there's no need to test against actual (confusing) version
numbers. Instead a developer can simply test
`protocolVersion->hasFearless()` and this will return true iff the
protocolVersion is newer than the newest version that didn't support
fearless.
2019-06-16 09:59:15 -07:00
Evan Tschannen 20e3edeb0a Merge branch 'release-6.1'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/storageserver.actor.cpp
#	versions.target
2019-06-14 12:42:59 -07:00
Evan Tschannen 924f92e5aa Prevent the byte sample recovery from interfering with storage server recovery 2019-06-13 15:55:25 -07:00
Evan Tschannen 55f7e7d372 fix: The delay inside the disabledMap was causing the storage server updateStorage actor to run on the client process 2019-06-13 14:28:30 -07:00
Evan Tschannen dccb9bc26d fixed a number of correctness problems 2019-06-12 19:40:50 -07:00
Trevor Clinkenbeard 1e8f7e5b82 Refactor NextFastAllocatedSize to be constexpr function 2019-06-11 15:55:23 -07:00
Andrew Noyes bc03421d05 Open Database as switchable only for client 2019-06-11 13:58:22 -07:00
Andrew Noyes 02e173b601 Add changeConnectionFile method to Transaction
Also add tests
2019-06-11 13:58:22 -07:00
Trevor Clinkenbeard cb420ea4bd Only construct waitDescription in simulator 2019-06-11 12:43:39 -07:00
Trevor Clinkenbeard 8144882d7b Merge branch 'apple-master' into features/local-rk 2019-06-10 19:40:25 -07:00
Trevor Clinkenbeard 8dbb231f33 Don't reject read requests until the storage server durability lag gets large enough 2019-06-05 15:42:58 -07:00
Trevor Clinkenbeard d1d98f298a Changed storage server getPenalty calculation.
Penalty should always be >= 1.0
2019-06-05 14:14:40 -07:00
sramamoorthy 4083af0b01 Avoid using trackLatest for TLog pop test cases 2019-05-28 22:07:46 -07:00
sramamoorthy ec7834e2f7 code re-orgnaization and address comments 2019-05-28 22:07:46 -07:00
sramamoorthy b6e037ffbc Replace fork with boost::process::child 2019-05-28 22:07:46 -07:00
sramamoorthy 61e93a9304 Address review comments and minor fixes 2019-05-28 22:07:46 -07:00
sramamoorthy 9e3104c2d4 Fix: races in async exec leading to bad backup 2019-05-28 22:07:46 -07:00
sramamoorthy 090bb53034 ShardInfo::addMutation to handle exec mutation 2019-05-28 22:07:46 -07:00
sramamoorthy cfdad0c5e6 tlog to snapshot exactly at exec version 2019-05-28 22:07:46 -07:00
sramamoorthy 17ecba8313 trace cleanup and other indentation changes 2019-05-28 22:07:46 -07:00
sramamoorthy aa79480d69 changes to make fdbfork asynchronous 2019-05-28 22:07:46 -07:00
sramamoorthy 72dd067173 Trace message changes and fix few FIXMEs 2019-05-28 22:07:46 -07:00
sramamoorthy 69edefe68b Snapshot based backup and resotre implementation 2019-05-28 22:07:46 -07:00
A.J. Beamon 603721e125 Merge branch 'master' into thread-safe-random-number-generation
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbrpc/AsyncFileCached.actor.h
#	fdbrpc/genericactors.actor.cpp
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/workloads/BulkSetup.actor.h
#	flow/ActorCollection.actor.cpp
#	flow/Net2.actor.cpp
#	flow/Trace.cpp
#	flow/flow.cpp
2019-05-23 08:35:47 -07:00
Stephen Atherton ebc96a7e0e Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/VersionedBTree.actor.cpp
2019-05-21 23:49:27 -07:00
A.J. Beamon a8b9d8e34b
Merge pull request #1336 from tclinken/fast-allocate-ptree-nodes
Create 96-byte fast allocator for storage queue PTree nodes
2019-05-17 14:22:46 -07:00
Jingyu Zhou b8e7fc1b84 Refactor: add std:: qualifier and use emplace_back 2019-05-17 09:38:50 -10:00
A.J. Beamon 5f55f3f613 Replace g_random and g_nondeterministic_random with functions deterministicRandom() and nondeterministicRandom() that return thread_local random number generators. Delete g_debug_random and trace_random. Allow only deterministicRandom() to be seeded, and require it to be seeded from each thread on which it is used. 2019-05-10 14:01:52 -07:00
Austin Seipp bf378952cb fdbserver: fix some print/scan format warnings
Signed-off-by: Austin Seipp <aseipp@pobox.com>
2019-05-06 13:35:29 -07:00
Trevor Clinkenbeard d339becd7c Fix currentRate calculation for local ratekeeper 2019-04-25 15:35:34 -07:00
Meng Xu 529ce66b6c Merge branch 'apple/master' into mengxu/performant-restore-PR 2019-04-18 18:02:45 -07:00
Andrew Noyes ef04471a66 Fix more unused-variable warnings 2019-04-17 16:04:10 -07:00
Trevor Clinkenbeard 3426205167 Fixed readGuard usage bug 2019-04-16 15:05:57 -07:00
Trevor Clinkenbeard 1d921da170 readGuard sends server_overloaded error if request is rejected 2019-04-16 11:29:01 -07:00
Trevor Clinkenbeard 0594154644 Fixed getPenalty calculation 2019-04-16 10:17:41 -07:00
Evan Tschannen cd5c9d91fa
Merge pull request #1443 from etschannen/master
Merge 6.1 into master
2019-04-10 17:43:07 -07:00
A.J. Beamon 538b431656 Apply suggestions from code review 2019-04-08 14:55:58 -07:00
A.J. Beamon a7288e1325 Throw process_behind instead of future_version when all storage nodes on a team are behind. process_behind gets the same backoff behavior as not_committed. Add proxy_memory_limit_exceeded to the retryable predicate. 2019-04-08 14:21:24 -07:00
mpilman d2e74cb2c0 Fix stupid rounding error 2019-04-08 11:05:29 -07:00
mpilman aaa8f73bdc fixed missing refactoring code 2019-04-08 11:05:29 -07:00
mpilman bdba8e22eb Added test and bugfixes 2019-04-08 11:05:29 -07:00
mpilman b944e0b116 generalized read guards, allow for penalty+error 2019-04-08 11:04:44 -07:00
mpilman 32393ec4c9 Prototype of local ratekeeper 2019-04-08 11:04:44 -07:00
mpilman d01cbf3455 Addressed code review comments 2019-04-05 13:12:20 -07:00
mpilman 1c16f87a4e Remove trace-calls to printable (in non-workloads) 2019-04-05 13:12:19 -07:00
Meng Xu 70d7c289f4 Merge branch 'master' into mengxu/restore/parallel-v7 2019-03-30 22:13:10 -07:00
Stephen Atherton d5c8b6b083 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/VersionedBTree.actor.cpp
#	flow/flow.h
2019-03-27 13:37:15 -07:00
Trevor Clinkenbeard 007abbc45b Added 96-byte FastAllocator
Since storage queue nodes account for a large portion of memory usage,
we can save space by only allocating 96 bytes instead of 128 bytes for
each node.
2019-03-25 13:44:39 -07:00
Evan Tschannen e837e7572c fix: after a rollback, uncommitted changes to the byte sample could be missed 2019-03-20 19:33:09 -07:00
Trevor Clinkenbeard 39f612d132 Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics 2019-03-02 17:07:00 -08:00
Trevor Clinkenbeard 3f59f82670 Calculate durabilityLag instead of NDV for health metrics 2019-02-27 16:30:01 -08:00
A.J. Beamon a051055caf Initial implementation of adding separate limits for batch priority in ratekeeper 2019-02-27 10:31:56 -08:00
Trevor Clinkenbeard abfe057805 Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics 2019-02-25 13:47:16 -08:00
Trevor Clinkenbeard 78aad255b0 updateProcessStats gets process stats directly
Storage servers no longer parse ProcessMetrics trace lines to get their
own cpuUsage and diskUsage statistics.
2019-02-25 13:45:53 -08:00
Evan Tschannen b8910ba7cd Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.h
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-22 14:38:13 -08:00
Evan Tschannen d008de576e
Merge pull request #1139 from xumengpanda/mengxu/machine-team-upgrade-PR
Add background actor to remove redundant teams
2019-02-22 14:22:07 -08:00
Trevor Clinkenbeard fa96b8dd33 Merge branch 'master' of https://github.com/apple/foundationdb into add-health-metrics 2019-02-20 16:56:16 -08:00
Meng Xu da36c13914 StorageServer reboot: Fix the bug
To avoid brokenPromise error, which is caused by the sender of the durableInProgress (i.e., this process)
never sets durableInProgress, we should set durableInProgress before send the please_reboot() error.
Otherwise, in the race situation when storage server receives both reboot and
brokenPromise of durableInProgress, the worker of the storage server will die.
We will eventually end up with no worker for storage server role.
The data distributor's buildTeam() will get stuck in building a team
2019-02-19 16:56:16 -08:00
mpilman 3f0fd2a20c Use fwd decls in WorkerInterface
Also WorkerInterface.h -> WorkerInterface.actor.h
2019-02-19 15:16:59 -08:00
mpilman 27a3153719 Use ACTOR forward declarations in MoveKeys
Also MoveKeys.h -> MoveKeys.actor.h
2019-02-19 15:16:59 -08:00
mpilman 3a0f9839b9 Fix minor IDE build errors 2019-02-19 15:16:59 -08:00
mpilman 0bb60e5a3b Use proper fwd decl in NativeAPI
Also NativeAPI.h -> NativeAPI.actor.h
2019-02-19 15:16:59 -08:00
Evan Tschannen 065a45e05f Merge branch 'master' into feature-fix-force-recovery
# Conflicts:
#	fdbclient/ManagementAPI.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/workloads/KillRegion.actor.cpp
2019-02-18 17:09:06 -08:00
Evan Tschannen 3247d59498 partially restored an optimization on remote storage servers where a behind storage server will keep less data in memory. This optimization was fully maintained on the primary storage servers, but remote storage servers can only use a version which is known to be durable on all remote transaction logs 2019-02-18 16:47:38 -08:00
Evan Tschannen 6cf4b416f7 move the KillRegionCycle workload into the fast directory 2019-02-18 15:43:01 -08:00
Evan Tschannen 4bdada1494 fixed a broken_promise when please_reboot() is thrown and someone is waiting on durableInProgress 2019-02-18 15:26:13 -08:00
Evan Tschannen 4c35ebdcc6 fix: because of forced recoveries, storage servers in remote regions cannot update their durable version to (lastLogVersion - 5e6), because the lastLogVersion might have jumped due to an epoch end and the recovery version after the forced recovery could be before the epoch end, causing the storage server to want to rollback to a version it does not have on disk 2019-02-18 14:40:30 -08:00
Evan Tschannen 05ca0a10d8 fix: kill all storage servers which are not in the safe locality after a forced recovery 2019-02-18 14:30:51 -08:00
A.J. Beamon b435d51061 Merge branch 'master' into track-server-request-latencies 2019-02-14 08:07:32 -08:00
A.J. Beamon d4349293b9 Reworked the way latency counters are tracked. Report the latency bands in separate events from StorageMetrics and ProxyMetrics. Fix a problem when the latency band configuration was changed. Add correctness testing. 2019-02-07 13:39:22 -08:00
Vishesh Yadav 5985566a8e Don't issue a second read in storageserver if possible for CompareAndClear
If the previous eager read request is equal to the CompareAndClear op
key, do not issue a read again.
2019-02-04 16:10:59 -08:00
Vishesh Yadav c532d5c277 Implements CompareAndClear AtomicOp
Adds CompareAndClear mutation. If the given parameter is equal to the
current value of the key, the key is cleared. At client, the mutation
is added to the operation stack. Hence if the mutation evaluates to
clear, we only get to know so when `read()` evaluates the stack in
`RYWIterator::kv()`, which is unlike what we currently do for typical
ClearRange.
2019-02-04 14:59:56 -08:00
Trevor Clinkenbeard 93d4ed6339 Fixed typo that was messing up storage server diskUsage calculation 2019-02-02 21:04:29 -08:00
Trevor Clinkenbeard d7930af2cb Storage server periodically calculates cpuUsage and diskUsage metrics. These metrics (as well as all other metrics necessary for health metrics calculation) are sent in the StorageQueuingMetricsReply message. 2019-01-31 12:23:04 -08:00
Meng Xu 550f2e2682 Merge with master to use the latest backup container
In fdb 6.0.15, backup container is changed on how to organize the backup data.
The backup made by fdb >6.0.15 has to be restored with fdb > 6.0.15.
Merge with master so that the fast restore uses fdb > 6.0.15
2019-01-30 12:05:15 -08:00
A.J. Beamon 2198d24ce1 Merge commit '3b2700d25334c53d13496ca16682642aac951beb' into track-server-request-latencies
# Conflicts:
#	fdbclient/MasterProxyInterface.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/ServerDBInfo.h
#	fdbserver/Status.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/storageserver.actor.cpp
2019-01-24 11:43:26 -08:00
A.J. Beamon 8e05e95045 Added the ability to configure the latency band settings by setting a special key in \xff keyspace. 2019-01-18 16:18:34 -08:00
A.J. Beamon 7498c2308c Merge branch 'release-6.0' into track-server-request-latencies 2019-01-16 13:39:01 -08:00
Evan Tschannen 684a22a52b Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/HTTP.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/BackupCorrectness.actor.cpp
#	versions.target
2019-01-09 16:14:46 -08:00
A.J. Beamon d265517156 Fix: fast allocator would not cleanup memory for a thread if that thread never called getMagazine. This could happen if the first thing the thread did was to release memory.
Added a new metric for the number of threads that hold memory for each size and improve some existing metrics.
Fix: a failed ASSERT would crash if done early in the program lifetime.
2019-01-08 14:36:01 -08:00
Evan Tschannen 57293a2db0 byte sample recovery did not use limits for its range reads, leading to slow tasks 2019-01-04 10:32:31 -08:00
Stephen Atherton 534e50de71 Added counters for page writes, sets, and commits. 2018-12-05 22:41:04 -08:00
Meng Xu bc18110693 Wip: why correctness fail but local runs 2018-12-05 10:07:43 -08:00
A.J. Beamon eb2f27b8e5 Work in progress implementation of server-side latency tracking. The intent of this is to be able to measure the number of requests that achieve certain latency targets across the system relative to the total number of requests. 2018-11-30 10:46:04 -08:00
Evan Tschannen 4b5d0b4e2c Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/AsyncFileBlobStore.actor.cpp
#	fdbclient/AsyncFileBlobStore.actor.h
#	fdbclient/BlobStore.actor.cpp
#	fdbclient/BlobStore.h
#	fdbclient/HTTP.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/LoadBalance.actor.h
#	fdbrpc/batcher.actor.h
#	fdbrpc/fdbrpc.vcxproj
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen 19ae063b66 fix: storage servers need to be rebooted when increasing replication so that clients become aware that new options are available 2018-11-08 15:44:03 -08:00
Stephen Atherton ade75ac692 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/worker.actor.cpp
2018-11-07 11:43:54 -08:00
Evan Tschannen bf6545a9cf clients cache storage server interfaces individually, instead of as a team. This is needed because in fearless every shard has storage servers from two separate teams, leading to a lot of possible combinations
allAlternatives failed logic was simplified, because we are already doing a global rate limiting, so a per shard limit is unnecessary
reduced unnecessary state variables in waitMetrics requests
2018-11-02 13:15:09 -07:00
Evan Tschannen 3b97f5a899 fix: the storage server still has to pop old tags, even if it does not need any data from them 2018-11-02 13:10:14 -07:00
Stephen Atherton 93501947f8 Process_behind should also be sent to clients. 2018-10-24 20:38:15 -07:00
Stephen Atherton f17cc1e20f StorageServer will no longer send io_error or other inappropriate errors to a client (this would never happen on SQLite). Many bug fixes around error handling, initialization, and shutdown in Redwood. StorageServer now calls init() on its underlying storage engine. 2018-10-24 15:57:06 -07:00
Evan Tschannen 6bb2ba5d92 Merge commit '2a34115e65639b7aad368a148de3c4189bc34bfc'
# Conflicts:
#	fdbserver/storageserver.actor.cpp
#	fdbserver/worker.actor.cpp
2018-10-23 17:05:42 -07:00
Clement Pang 3a30621071
Merge branch 'release-6.0' into memory-fast-rollback 2018-10-21 12:48:46 +09:00
Clement Pang 3ceec01392 Address review comments. 2018-10-19 18:55:35 -07:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen db71b60d72
Merge pull request #819 from satherton/feature-redwood
Redwood storage engine, initial/experimental version
2018-10-18 18:38:11 -07:00
Evan Tschannen ed7036139a Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/storageserver.actor.cpp
2018-10-18 17:00:52 -07:00
Evan Tschannen 952b26d746 removed upgrade code from 2.0.3 2018-10-18 16:53:40 -07:00
Evan Tschannen 0613a34845 The storage server would block the main thread when processing a single version with a large amount of data 2018-10-18 13:37:31 -07:00