A.J. Beamon
3b952efb4e
Remove events from cluster controller that get logged for roughly every worker upon recovery, master registration, etc.
2017-11-14 10:15:45 -08:00
A.J. Beamon
0fea5e9c2f
Convert client_invalid_operation errors to ASSERTs.
2017-11-13 11:38:34 -08:00
A.J. Beamon
cd085764f1
Do not automatically change a cluster file that does not match what you expect.
2017-11-10 14:12:45 -08:00
Alex Miller
311d1ca87d
A variety of fixes that collectively fix using flow profiling in circus.
...
To run, use --co=flow_profiling=-1, because reasons.
2017-11-07 13:55:16 -08:00
Evan Tschannen
706bf1e018
fix: we cannot trigger better master exists before a master is fully recovered because exclusions changed by the provisional master will not be committed until the master is fully recovered
2017-11-04 12:48:04 -07:00
Evan Tschannen
57aba0b3bc
fix: excluded servers were the same fitness as storage servers for the master role
...
fix: better master exists did not considers exclusion for master fitness
2017-11-03 17:09:14 -07:00
Yichi Chiang
42fad5efe5
Introduce cluster controller process class in circus
2017-11-03 14:22:55 -07:00
Yichi Chiang
dcc9aafab7
Merge branch 'master' of github.com:apple/foundationdb
2017-11-02 10:47:59 -07:00
Yichi Chiang
c033d8efd8
Fix typo message and remove extra TraceEvent which overwrites the expected one
2017-11-02 10:47:51 -07:00
Balachandar Namasivayam
3efaaec479
onMasterProxiesChanged was being triggered when any member of ClientDBInfo changed. Change the behavior to be triggered only when proxies field in ClientDBInfo is changed.
2017-11-01 18:29:56 -07:00
A.J. Beamon
7cf17df821
Merge branch 'master' into log-group-for-unsupported-clients
...
# Conflicts:
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2017-11-01 11:31:02 -07:00
A.J. Beamon
31caac67dc
Rename supported_versions[x].clients to supported_versions[x].connected_clients
2017-11-01 10:41:30 -07:00
Balachandar Namasivayam
988bc0207f
Reset Client Transaction profiling parameters when the config keys are cleared.
2017-10-31 15:40:57 -07:00
Alec Grieser
5a4a5985fd
Merge branch 'release-5.0'
2017-10-30 08:31:23 -07:00
Alec Grieser
87321f5017
Merge branch 'release-4.6' into release-5.0
2017-10-30 08:31:01 -07:00
Evan Tschannen
54d82c0d92
Merge pull request #194 from cie/alexmiller/valgrind
...
Fix valgrind errors
2017-10-27 17:25:12 -07:00
Alex Miller
e0d33ef8d7
Preemptively fix profiler-related valgrind errors/straight out bugs.
...
I forgot to initialize some fields in requests.
2017-10-27 17:20:19 -07:00
Evan Tschannen
aa0c2ae317
only increase the max shard size if the shard begins in the keyServer keyspace, do not increase the minimum shard size
2017-10-27 14:22:26 -07:00
Evan Tschannen
3a4078bdda
the keyservers shards are always a fixed large size
2017-10-27 11:52:11 -07:00
Balachandar Namasivayam
cfefab18fb
Merge branch 'master' into add-new-atomic-ops
2017-10-25 18:03:34 -07:00
Balachandar Namasivayam
3d5658940a
Addressed Review Comments
2017-10-25 16:42:05 -07:00
Balachandar Namasivayam
9dd588dcce
Addressed review comments.
...
Changed naming for NewMin and NewAnd to MinV2 and AndV2
2017-10-25 14:48:05 -07:00
Evan Tschannen
d852a53ae4
Merge pull request #181 from cie/throttle-spammy-logs
...
Throttle spammy logs
2017-10-25 13:45:55 -07:00
Balachandar Namasivayam
2f6d55a52f
Add correctness tests for all atomic ops
2017-10-25 13:36:49 -07:00
Yichi Chiang
4d54a73f5b
Merge pull request #191 from cie/count-cluster-controller-role
...
Take cluster controller role into consideration when recruiting workers
2017-10-25 12:09:15 -07:00
Yichi Chiang
f39cce9b8d
Use processId instead of address for comparison
2017-10-25 11:35:29 -07:00
Yichi Chiang
5fcef911f0
Take cluster controller role into consideration when recruiting workers
2017-10-25 10:35:46 -07:00
Evan Tschannen
48901a9223
added a list of tlog IDs that are missing to status
2017-10-24 16:28:50 -07:00
Yichi Chiang
c2a117fe07
Merge pull request #189 from cie/enable-check-desired-class
...
Enable checkUsingDesiredClasses() in consistency check
2017-10-24 15:18:21 -07:00
Yichi Chiang
defdc6550d
Exclude excluded processses when getting testers
2017-10-24 15:16:34 -07:00
Evan Tschannen
df74e2a373
re-added support for non-copying tlog recovery
2017-10-24 15:09:31 -07:00
Yichi Chiang
3865c5ae0e
Enable checkUsingDesiredClasses() in consistency check
2017-10-24 12:58:54 -07:00
Balachandar Namasivayam
8c3bdc5b3b
Make atomic ops differentiate between unset and empty values.
2017-10-23 16:48:13 -07:00
Evan Tschannen
7a36fd2134
disabled a variety of simulation tests to get correctness clean
2017-10-19 15:49:54 -07:00
Evan Tschannen
e2c1e87df6
made a large number of fixes to make fearless DR correctness clean.
2017-10-19 15:36:32 -07:00
Bhaskar Muppana
360b777b78
Fail with correct error code in case of abort or discontinue of
...
non-existing backups.
2017-10-18 23:17:48 -07:00
Alec Grieser
dd6d8f3b0e
Merge branch 'master' into add-new-atomic-ops
2017-10-18 16:36:44 -07:00
Bhaskar Muppana
2007f3799f
Don't ignore TimeKeeper failures.
2017-10-18 14:31:31 -07:00
Bhaskar Muppana
314511f4d7
Fixing spaces in BackupCorrectness TraceEvents.
2017-10-18 14:27:52 -07:00
Alex Miller
7b9bc1d715
Merge pull request #170 from cie/alexmiller/flowprofile
...
Add support for profiling a running fdb cluster to fdbcli, fix security issues, and add an improved backtrace.
2017-10-16 16:51:53 -07:00
Alex Miller
f997cb9038
Add a string knob to hold the Log directory, and write profiles to it.
...
This is the combination of two small changes.
1. Add support for a string knob type.
2. Change profiles to be written to the log directory instead of the working
directory.
We have three options of where to write files: the working directory, the data
directory, and the log directory.
The working directory may be set to a non-writable location, and likely
contains the fdb binaries. Allowing these files to be overwritten would likely
not be a wise idea.
The data directory hosts our sqlite b-trees. It would also be very unfortunate
if these were ever overwritten by an unfortunate profile name.
The log directory contains logs. Out of the three, these matter the least if
they disappear or become corrupted.
Thus, we write to the log directory.
2017-10-16 16:05:02 -07:00
Alex Miller
c5fbe33df6
Disallow arbitrary paths for storing profiles.
...
Previously, one could request profiles to be stored at
"../../../../../../etc/passwd". Now we expand the paths, including symlinks,
and ensure that the target is a child of the targetted subdirectory. This was
the least convoluted way I could figure out to handle paths.
2017-10-16 16:05:02 -07:00
Alex Miller
91a26a170c
Add toggleable profiling support to fdbserver+fdbcli.
...
This adds the fdbcli commands:
* profile list -- Lists all workers in a way that doesn't fill `kill`'s list.
* profile flow run -- Allows starting flow profiling on a set of hosts for a specified interval.
And threads through all the support for enabling and disabling profiling as an RPC.
2017-10-16 16:05:02 -07:00
Balachandar Namasivayam
312f614133
Add the new ops and AND to NON_ASSOCIATIVE_MASK.
...
In the storage server, read the entire value if the op is ByteMin or ByteMax.
2017-10-16 11:06:31 -07:00
Alec Grieser
e0be1ef1e0
Merge branch 'release-5.0'
2017-10-16 10:08:11 -07:00
Alec Grieser
432726ba2d
Merge branch 'release-4.6' into release-5.0
2017-10-16 09:54:21 -07:00
Stephen Atherton
68eccb681e
Merge pull request #173 from bmuppana/master
...
Backup log messages.
2017-10-13 18:31:53 -07:00
Evan Tschannen
215bcb8d3e
Merge pull request #157 from cie/choose-leader-on-stateless-processes
...
Catch and update processClass change from DBSource
2017-10-13 14:03:29 -07:00
Yichi Chiang
5bcdd37c0d
Move UID generation and add initialClass
2017-10-13 13:46:37 -07:00
Yichi Chiang
12edd27281
Introduce prevChangeID to CandidacyRequest and LeaderHeartbeatRequest
2017-10-12 17:11:58 -07:00
Bhaskar Muppana
d1e9d28239
Backup log messages.
2017-10-12 16:12:42 -07:00
Stephen Atherton
11517f7bfc
Merge branch 'master' into continuous-backup
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
2017-10-12 11:03:23 -07:00
Alex Miller
c24b941485
Fix erroneous std::move in indexed set, and clean up addMetric users.
...
This is a follow-on to c4eb73d0. Thanks to Bala for pointing out the unchanged
std::move usage, and there appeared to not be many existing users of addMetric
anyway.
2017-10-11 17:36:51 -07:00
Balachandar Namasivayam
8e0bea2795
Update API_VERSION from 500 to 510
2017-10-11 13:49:38 -07:00
Stephen Atherton
c3d8412abb
Merge pull request #166 from cie/alexmiller/deathservice
...
Fix potential division by zero issues via RPC.
2017-10-10 16:47:38 -07:00
Evan Tschannen
ff1b49be2e
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DatabaseConfiguration.cpp
2017-10-10 16:07:59 -07:00
Evan Tschannen
8feb3b8fbc
fixed conflict range workload by just disabling timeKeeper instead of the check, because it should be a more robust fix
2017-10-10 16:01:02 -07:00
Balachandar Namasivayam
eeebf10030
Modified existing behavior of MIN and AND atomic ops. The new behavior results in a 'SET' if the atomic op is performed on a non -existing key.
...
Added new atomic ops ByteMin and ByteMax that does lexicographic comparison of byte strings.
2017-10-10 13:02:22 -07:00
Evan Tschannen
c8525dc3e7
timekeeper is constantly changing keys in the system keyspace, so do not report errors on key mismatches on keys in the system keyspace
2017-10-10 12:04:56 -07:00
Evan Tschannen
3d2103075d
data distribution tracks teams for each data center separately
2017-10-10 10:36:33 -07:00
Stephen Atherton
4a58ed3b11
Debug output improvements, removed unused function.
2017-10-09 13:31:54 -07:00
Stephen Atherton
b15d29dd6b
VersionedBTrees now have names for logging purposes (trees do not deal with files directly but rather just an IPager). Bug fix in pager vacuumer which was not updating checksums. KeyValueStoreRedwood now relies on VersionedBTree to handle commit ordering, as there’s no reason to do this in both classes.
2017-10-09 13:24:16 -07:00
Evan Tschannen
5e6eba365b
fix: always set confChange, because popVersion is not deterministic across proxies, and confChange needs to be set deterministically
2017-10-06 18:37:08 -07:00
Evan Tschannen
93b3d0e4e7
fix: toMap didn’t report logs proxies and resolvers
2017-10-06 15:55:50 -07:00
Evan Tschannen
15962cf079
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbrpc/Locality.cpp
# fdbrpc/Locality.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/ClusterRecruitmentInterface.h
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/fdbserver.vcxproj.filters
# fdbserver/masterserver.actor.cpp
# fdbserver/worker.actor.cpp
# flow/error_definitions.h
2017-10-05 17:09:44 -07:00
Alex Miller
a21c8a820b
Move cpuProfilerRequest from WorkerInterface to ClientWorkerInterface.
...
A way to access this stream is required if we wish to be able to toggle
profiling from fdbcli. There's two ways to do this:
1. Use `monitorLeader()` to get a `ClusterControllerFullInterface`, and use
`getWorkers` from there to get a list of `WorkerInterface`s, from which we can
access cpuProfilerRequest.
2. Move cpuProfilerRequest to ClientWorkerInterface and use the existing code
in the client that can fetch a list of all `ClientWorkerInterface`s.
The split between WorkerInterface and ClientWorkerInterface appears to be
what a client might have a need to call versus what is fdbserver-internal (and
thus no client should even want to call). Thus, it seems to make more sense to
acknowledge that profiling is useful to be able to toggle from a client, and go
with option (2).
2017-10-05 14:08:28 -07:00
Yichi Chiang
3edc2824a9
Add initialClass to RegisterWorkerRequest 2
2017-10-05 11:03:25 -07:00
Yichi Chiang
05f7626e39
Add initialClass to RegisterWorkerRequest
2017-10-04 17:11:12 -07:00
Yichi Chiang
3c70df57b5
Fix cluster controller review comments
2017-10-04 15:48:55 -07:00
Alex Miller
e55cc447d2
Address code review comments.
...
* Fixed memory corruption with SystemData key constants
* Removed duplication in ClusterController
* Reworked fdbcli actions to better represent explicit vs default assignments
2017-10-04 13:36:18 -07:00
A.J. Beamon
5063793f36
Revert line ending change
2017-10-04 11:19:19 -07:00
Alex Miller
706427ee62
Fix potential division by zero issues via RPC.
...
A carefully crafted SplitMetricRequest could have caused division by zero.
It's not really great to offer Division By Zero As A Service, so let's just
return an error instead.
2017-10-03 22:11:08 -07:00
Evan Tschannen
3a2ddcc84a
Add destinations that are read-write to the source list, so that cancelled data movement can contribute to copying the data for the next movement.
2017-10-03 17:39:08 -07:00
Stephen Atherton
c0d1097bb4
Lots of debug output improvements on pager, fixed bug in file naming scheme for page file and log files.
2017-10-03 11:55:14 -07:00
Balachandar Namasivayam
0e153cdd35
Throttle Spammy logs. Three knobs are added.
...
Trace Events are sampled and cached with an expiration set. Every TraceEvent above SevDebug is checked against this cache to see if it exceeded a set threshold. If yes, then throttle the TraceEvent.
If a TraceEvent is throttled, a warning msg is logged.
2017-10-02 18:43:11 -07:00
Stephen Atherton
0635a31604
Better error and dispose/close handling, closer to correct behavior for an IKeyValueStore.
2017-10-02 03:32:22 -07:00
Evan Tschannen
6ea9903c82
Merge branch 'release-5.0'
...
# Conflicts:
# fdbbackup/backup.actor.cpp
# fdbserver/ClusterController.actor.cpp
# versions.target
2017-10-01 18:46:44 -07:00
Evan Tschannen
0949c4be65
Revert "Fixed problem with master being recruited on excluded servers"
...
This reverts commit 1f7b624734a8ad6e896dd3f01f9cdf334ca62486.
2017-10-01 16:30:19 -07:00
Evan Tschannen
696d432462
Revert "fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded)"
...
This reverts commit 83b2ce68c8e1a29fc1559598cc38d3ef7eb46101.
2017-10-01 16:29:32 -07:00
Evan Tschannen
0dde15f1d2
fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded)
...
fix: better master exists did not use exclusions because the configuration was reset
2017-10-01 16:26:58 -07:00
Yichi Chiang
636ce4a131
Replace leader when find a better one
2017-09-29 16:34:55 -07:00
Alex Miller
11668bb359
Fixing code review comments.
2017-09-29 15:58:36 -07:00
Alex Miller
b7ce9d996c
Comment out verbose TraceEvents in preparation for pushing.
2017-09-29 15:58:36 -07:00
Alex Miller
c40c1bb5fe
Add a new workload: BackupToDBAbort, which does an ACI switchover.
...
This is to allower easier testing of non-durable switchovers without having to
wiggle into BackupToDBCorrectness's view of the world.
2017-09-29 15:58:36 -07:00
Alex Miller
9e9a96ae76
Make VersionStamp workload able to run with DR-style workloads.
...
* It is now tolerant of locked database errors, and handles them correctly.
* There is an option to specify which database to verify against.
2017-09-29 15:58:36 -07:00
Alex Miller
34630b6130
Make VersionStamp workload can handle commit_unknown_result.
...
Previously, if a transaction failed with commit_unknown_result, and was
actually committed, it would look like data that magically appeared in the
database and verification would fail.
Now, we explicitly re-read and check to see if the commit happened, so that we
may maintain an accurate understanding of what the database state should be.
2017-09-29 15:58:36 -07:00
Alex Miller
23945b9fea
VersionStamp can co-exist with other workloads that write data to the database.
...
VersionStamp previously would range-read the entire database during validation.
This has the unfortunate effect of making it fail during validation if run with
any other workload that writes keys to the database.
Now, all keys written and read are done with a configurable prefix, so that it
may co-exist with a variety of other workloads.
2017-09-29 15:58:36 -07:00
Alex Miller
370a6afb80
Make VersionStamp have an option to be tolerant of data being lost.
2017-09-29 15:58:36 -07:00
Alex Miller
8f4c45418b
Make atomicSwitchover preserve an ever-increasing commit version.
2017-09-29 15:58:36 -07:00
Alex Miller
69523ce151
Hackish version of a test, but it does fail.
2017-09-29 15:58:36 -07:00
Alex Miller
65713b226f
Fix whitespace and line endings.
2017-09-29 15:58:36 -07:00
Evan Tschannen
e2b65e86ed
added configurable memory limits for backup and dr executables
...
added a default memory limit of 8GB for fdbcli
2017-09-29 10:35:40 -07:00
Bhaskar Muppana
91975244fe
Fixing OSX build.
2017-09-28 19:35:44 -07:00
Bhaskar Muppana
942c04e992
Merge pull request #162 from bmuppana/master
...
Fixing TimeKeeperCorrectness to deal with network delays.
2017-09-28 17:04:39 -07:00
Bhaskar Muppana
3d2bafc3a6
Fixing TimeKeeperCorrectness to deal with network delays.
2017-09-28 16:52:28 -07:00
Evan Tschannen
ef41b07bb3
renamed past_version to transaction_too_old
...
implemented read_lock_aware option
2017-09-28 16:35:08 -07:00
Yichi Chiang
d4f75630de
Support log group field in status json
2017-09-28 16:31:29 -07:00
Evan Tschannen
7b60e26660
Merge pull request #160 from cie/use-error-descriptions
...
Add the ability to access name and description in Error. Update error…
2017-09-28 16:00:39 -07:00
Evan Tschannen
5f4b997400
emergency teams are bad for performance, because we will route client read requests to servers that do not have the data, therefore getting many wrong shard server errors. emergency teams only protect us from data loss in very rare scenarios, we may want to add them in again in the future, but make sure load balance knows which storage servers used to be destinations so they can only route to them as a last resort.
2017-09-28 13:20:01 -07:00
Evan Tschannen
73fca75239
added the ability to disable timeKeeper; disabled timeKeeper before consistency check in simulation
2017-09-28 13:13:24 -07:00
A.J. Beamon
d30c730f75
Add the ability to access name and description in Error. Update error descriptions.
2017-09-28 12:35:03 -07:00
Bhaskar Muppana
0f8ff26029
Merge pull request #158 from bmuppana/master
...
<rdar://problem/34557380> Need a way to map real time to version
2017-09-27 17:56:42 -07:00
Bhaskar Muppana
6a0b1d6808
Fixing PR comments
...
<rdar://problem/34557380> Need a way to map real time to version
2017-09-27 17:56:01 -07:00
Evan Tschannen
4b21da1cd6
fix: lastVersionWithData was not updated when fetchKeys injects mutations
2017-09-27 10:44:34 -07:00
Stephen Atherton
c3e4364f08
Added checksums to pager.
2017-09-26 23:13:22 -07:00
Evan Tschannen
acb7e66d01
fix: failed logs do not count even if they have returned a result
2017-09-25 18:14:40 -07:00
Evan Tschannen
2bf042a559
fix: file_corrupt was not checking for fault injection
...
latency threshold was too long
2017-09-25 17:22:41 -07:00
Bhaskar Muppana
0bf5bdb23a
<rdar://problem/34557380> Need a way to map real time to version
2017-09-25 12:51:37 -07:00
Yichi Chiang
6758c649fc
Catch and update processClass change from DBSource
2017-09-25 10:36:03 -07:00
Stephen Atherton
ed226e2e1c
VersionedBtree now allows writes while a commit is in progress. Also, lots of bug fixes, like the one where writes during commits weren’t prevented but also didn’t work.
2017-09-22 17:18:28 -07:00
Evan Tschannen
cce4eeb52d
fix: the master was sending the cluster controller uninitialized configurations
2017-09-22 16:59:24 -07:00
Evan Tschannen
180438d41e
fix: use the number of present logServers rather than the total size of the vector
2017-09-22 16:19:16 -07:00
Evan Tschannen
738ae21c3a
fix: an optimization in buggified locking can cause recovery to break because it would not restart if a locked process was killed when the remaining logs cannot obtain a quorum
2017-09-22 15:07:57 -07:00
Stephen Atherton
248dab79b6
Created “redwood” storage engine option and many changes to support that including IKeyValueStore::init() and custom DiskQueue file extensions.
2017-09-21 23:51:55 -07:00
Alex Miller
585c9bf68f
Quick fix to reduce CPU usage of ensureEpochLive.
...
It is suspected that policy recomputations are driving proxy CPU usage up, and
thus latency and throughput down. To quickly confirm this theory, we're
forcing ensureEpochLive to wait until it has RF responses, which means we'll
probably only validate the policy once per call.
2017-09-21 18:22:24 -07:00
Stephen Atherton
d880569d52
Checkpointing progress on KeyValueStoreMVBTree. All methods are implemented to a usable point, and everything compiles, but Worker does not yet try to use it.
2017-09-21 04:43:49 -07:00
Stephen Atherton
e16ce63db0
Bug fix, find was being done with a key ref that wasn’t living as long as the actor.
2017-09-21 00:58:56 -07:00
Stephen Atherton
bcc8a2cc82
Bug fix, page reference was not living as long as the file write call so the memory could be freed before the write happens if there is any delay.
2017-09-20 17:51:26 -07:00
Stephen Atherton
eee8116fa4
Cleaned up how the root page is written and page write debug output.
2017-09-20 17:50:02 -07:00
Evan Tschannen
fbd67ea547
fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded)
...
fix: better master exists did not use exclusions because the configuration was reset
2017-09-20 11:48:26 -07:00
Evan Tschannen
cb43563b2d
fix: toMap properly lists the redundancy mode of the cluster
2017-09-19 16:35:42 -07:00
Stephen Atherton
9b4cbe94aa
Bug fix, values exactly the size of the maximum part size were still split into a part set but of size 1, which is invalid. Also some debug output changes.
2017-09-19 13:03:30 -07:00
Evan Tschannen
f75dfc3153
do not register with the master until recovery of the queue is complete, to avoid having the master wait a long time for a peek response
2017-09-18 17:39:12 -07:00
Alex Miller
567d663afd
Fix SimulationConfig never generating a custom config.
...
A 0 was changed to a 1 when rewriting code, and `case 0:` was never being hit. :(
Thankfully, it looks like nothing was broken by this in the meantime.
2017-09-18 17:29:36 -07:00
Evan Tschannen
e8b895c878
added the ability to disable connection failures for a period of time after one happens
2017-09-18 12:46:29 -07:00
Evan Tschannen
489332533c
all timeouts longer than two minutes have been can be lowered to 60.0 with buggification
...
added a workload that tries for a 50 second maximum latency in the presence of one failure with both buggification and connection failures
2017-09-18 11:04:51 -07:00
Stephen Atherton
25f958ac3b
Bug fixes. Finished backward iteration on Cursor. Correctness range verification now also reads the range in reverse and verifies the result.
2017-09-17 04:38:01 -07:00
Stephen Atherton
1aae32dc16
Some cleanup. InternalCursor does not need to be reference counted.
2017-09-16 02:09:09 -07:00
Stephen Atherton
c89b45158a
Removed InternalCursor::seekEqualOrGreaterThan because it isn’t needed, it’s always better to do a <= search so if the target key is found the initial cursor position will be at the last version prior to the target version.
2017-09-16 01:59:16 -07:00
Stephen Atherton
cea7903b46
Bug fixes involving edges cases with find >= and range read endpoints.
2017-09-16 01:45:39 -07:00
Evan Tschannen
34f987f56d
added a test in simulation which ensures that a recovery after a single failure takes less than 15 seconds
2017-09-15 17:55:01 -07:00
Stephen Atherton
15622f026e
Forward range reads are done.
2017-09-15 17:27:13 -07:00
Evan Tschannen
d9b64899c5
fix: we need to wait for log server failures if we have not locked all of the logs
2017-09-15 13:11:21 -07:00
Evan Tschannen
36c98f18e9
do not register a worker with the cluster controller until it has finished recovering all files from disk
2017-09-15 10:57:58 -07:00
Stephen Atherton
125d8168b4
Checkpointing progress on range reads (ordered iteration of user visible kv pairs via cursor). Added InternalCursor class which is used for all seeks and reads and sees the multi version, fragmented kv pairs and clears. Fixed a bug in commit which was discovered by the range read test where kv pairs (full or partial) could be missing from the tree but only for versions where they did not change. The write verification test did not find this because it only verifies exactly the changed versions of each key. The range test is not finished yet.
2017-09-15 05:19:39 -07:00
Evan Tschannen
f3b7aa615d
fix: seed storage servers are recruited based on the storage policy
2017-09-14 17:06:00 -07:00
Alvin Moore
9404d226d0
Merge branch 'release-5.0'
2017-09-13 16:49:00 -07:00
Alvin Moore
cb92194772
Fixed problem with master being recruited on excluded servers
2017-09-13 16:48:27 -07:00
Alex Miller
5e14f19875
Merge pull request #147 from cie/alexmiller/grvtlogs
...
Only verify a quorum of TLogs are unlocked for a GRV request
2017-09-13 16:07:25 -07:00
Alex Miller
d6b3be98fe
Fix whitespace.
2017-09-13 15:49:39 -07:00
Alex Miller
06a9c7a772
Remove unnecessary policy recomputations in confirmEpochLive.
...
Watching for interface changes on readied servers was done as a workaround for
a case where all futures could be ready, but the policy verification would
never succeed. This turns out to be because stopping a tlog causes an error to
be returned. However, if a TLog is stopped, then we know that we can't do any
more commits, so we can just immediately stop trying and never mark our future
as ready.
2017-09-13 15:45:09 -07:00
Evan Tschannen
8cb53fd608
Merge pull request #149 from cie/choose-leader-on-stateless-processes
...
choose leader on the perferred process class
2017-09-13 13:58:49 -07:00
Evan Tschannen
aea7a78cff
cluster controller changes were not maintained during merge
2017-09-11 17:40:46 -07:00
Evan Tschannen
d343d37274
fixed merge problems
2017-09-11 16:37:10 -07:00
Evan Tschannen
76e7988663
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
# fdbserver/DataDistribution.actor.cpp
# fdbserver/OldTLogServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/WorkerInterface.h
# flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00
Stephen Atherton
14e7756fe3
Fixed memory leak.
2017-09-09 02:18:37 -07:00
Stephen Atherton
919a3d1740
Iterating over and coalescing the “internal” records (not user visible, tuple of (key, version, valueIndex, value)) now works forward and backward, and findEqual() now works using this to read split values fully. This is also most of the work needed for range reads.
2017-09-09 01:29:25 -07:00
A.J. Beamon
4fa2415553
Merge branch 'release-5.0'
2017-09-08 17:28:12 -07:00
A.J. Beamon
bb8a245bdb
circus: throughput test scales latency error by the target latency
2017-09-08 17:27:54 -07:00
Evan Tschannen
ea26bc1c43
passed first tests which kill entire datacenters
...
added configuration options for the remote data center and satellite data centers
updated cluster controller recruitment logic
refactors how master writes core state
updated log recovery, and log system peeking
2017-09-07 15:32:08 -07:00
Yichi Chiang
bd1c7e7295
Use addTeamsBestOf() instead of addAllTeams() when team size is greater than 3
2017-09-07 12:31:01 -07:00
Bhaskar Muppana
c7df951f7c
Using BackupConfig from backup.actor.cpp to reduce intermediate
...
functions.
2017-09-07 08:36:36 -07:00
Bhaskar Muppana
fe208d6adf
Merge branch 'master' of github.com:apple/foundationdb into backup
2017-09-06 10:01:55 -07:00
Stephen Atherton
b5f79d27f7
Large values are now split and stored, but not yet read correctly.
2017-09-05 16:59:31 -07:00
Bhaskar Muppana
83810edabc
Backup/Restore tag can be std::string instad of Key.
2017-09-05 11:38:40 -07:00
Stephen Atherton
fe2d104000
Debug macro needs to be 1 instead of just being defined when it is in use.
2017-09-01 21:24:26 -07:00
Evan Tschannen
dc1f7ca6b7
testers now use client locality load balancing
2017-09-01 12:53:01 -07:00
Stephen Atherton
9745334704
Changed random parameter range so that at least two internal keys can always fit in a single page.
2017-09-01 00:17:07 -07:00
A.J. Beamon
cc24072a5d
Add the multi version API to the list of APIs to choose in the APICorrectness tester. Support for the multi-version client already existed.
2017-08-31 16:23:55 -07:00
Stephen Atherton
88e48e9a3e
Bug fix in detection of lack of mutations in a subtree.
2017-08-31 01:23:12 -07:00
Stephen Atherton
1e11353603
Merge branch 'master' into feature-redwood
2017-08-31 00:24:58 -07:00
Evan Tschannen
d61be4c760
Merge branch 'release-5.0'
2017-08-30 12:59:24 -07:00
Evan Tschannen
963e1c3f31
fix: we need to reboot the process even if it will result in too many files, because the check will not succeed without it
2017-08-30 12:58:46 -07:00
Stephen Atherton
7f46dfd829
Merge conflict fix.
2017-08-30 10:29:06 -07:00
Alex Miller
8d97a15c3f
BUGGIFY recovery to lock only the minimum number of TLogs required to prevent a quorum.
...
This is to test the quorum logic introduced in the previous patch, and should
flush out any other bugs that rely on TLog locking during recovery.
2017-08-29 14:43:40 -07:00
Alex Miller
f8486d1368
Only ensure a quorum of TLogs are unlocked to confirm the epoch hasn't ended.
...
Currently, GRV will wait to hear back from (almost) all TLogs to confirm that
they're unlocked and that the current epoch hasn't ended. This confirms that
there isn't a new set of proxies and using the commit version from the old set
of proxies would violate causal consistency.
However, during recovery, we ensure that no quorum of TLogs exists before
starting a new epoch and allowing new commits on the new TLogs. Thus, we only
need to wait until we have a quorum of TLogs that are unlocked.
This should be a significant improvement in latency particularly for the cases
when we start running >10 TLogs.
2017-08-29 14:43:40 -07:00
Alex Miller
4c1d61cd08
Assorted minor changes.
...
In which we:
* Clarify some math in a comment
* Remove misleading debugging information
* Add a useful trace event
2017-08-29 14:43:40 -07:00
Alex Miller
dbfa94f735
LF -> CRLF
...
It appears a previous patch left parts of this file ending with LF, and the
majority of the file ends in CRLF. I see no reason to keep this inconsistency,
but these line ending wars are going to drive me insane.
2017-08-29 14:43:40 -07:00
Alvin Moore
6020d70863
Added trace event to track reboots initiated by ConsistencyCheck workload in simulation
2017-08-29 11:41:27 -07:00
Alvin Moore
c95a1be5ec
Add trace event for rebooting process during simulation for consistency check
2017-08-29 11:00:44 -07:00
Stephen Atherton
b716ad90f8
Missed in last commit.
2017-08-28 17:46:18 -07:00
Stephen Atherton
5d49f2c710
Merge branch 'master' into feature-redwood
...
# Conflicts:
# fdbserver/fdbserver.vcxproj
2017-08-28 17:45:50 -07:00
Stephen Atherton
5049421a0e
More comments, debug output improvements.
2017-08-28 17:26:53 -07:00
A.J. Beamon
86774f6e42
Merge branch 'release-5.0'
2017-08-28 17:17:00 -07:00
A.J. Beamon
03478561b9
fix: Set lock aware at the transaction level for latency probe to avoid having to fill the shard cache every time.
2017-08-28 17:16:46 -07:00
A.J. Beamon
9a0a3b6329
Merge commit '66528becb82d826e81fa644bb378212584ab580e'
2017-08-28 16:47:59 -07:00
Yichi Chiang
9fe927127f
choose leader on the perferred process class
2017-08-28 14:41:04 -07:00
Alvin Moore
44e0df78c5
Added support for tracking roles for simulation workers
...
Fixed the exclusion and inclusion address simulation API and integration within workloads
Added more information within trace events for simulation
2017-08-28 11:25:37 -07:00
Alvin Moore
581bd6c8ed
Added option to delay the displaying of the simulation workers
2017-08-28 10:53:56 -07:00
Stephen Atherton
14ef5bed42
Major bug fix. In certain situations, some mutations were being applied to two adjacent pages, which had a lot of crazy side effects.
2017-08-28 06:28:49 -07:00
Stephen Atherton
d9136743f7
Bug fix, reformatted and cleaned up a lot of debug output.
2017-08-28 03:53:29 -07:00
Stephen Atherton
0fd57e9183
Many bug fixes.
2017-08-28 01:57:01 -07:00
Alec Grieser
300b5a17ed
Merge branch 'release-5.0'
2017-08-25 18:55:33 -07:00
Stephen Atherton
1070a5ed66
Several bug fixes.
2017-08-25 15:48:32 -07:00
Evan Tschannen
272b4b984c
fix: fixed a rare bug where we do not wait for a file in the process of being deleted to shutdown before rebooting a machine
2017-08-25 10:12:58 -07:00
Stephen Atherton
888093463b
Checkpointing progress on large rewrite of how mutations are stored and applied. Not working yet.
2017-08-24 17:25:53 -07:00
Evan Tschannen
26a5b5e422
rollback workload now clogs the communication between one of the proxies and the tlogs, since that is what will cause a rollback
2017-08-23 16:08:13 -07:00
A.J. Beamon
4c706d33e9
Merge branch 'release-5.0'
2017-08-23 14:59:43 -07:00
Evan Tschannen
be941b4bd1
sending void to committed could cause self to be deleted, so call cleanup before sending
2017-08-23 13:56:18 -07:00
Alvin Moore
7729f663e9
Ensured that the circus id is always lowercase
2017-08-23 13:45:00 -07:00
Evan Tschannen
f9308b8fa6
Merge pull request #145 from cie/alexmiller/simrefactor
...
Refactor simulation to pull all configuration parameters into one struct.
2017-08-23 12:54:21 -07:00
Evan Tschannen
4b40f817f1
fix: is recovery is cancelled before the copy is complete, remove the tlog
2017-08-23 12:26:03 -07:00
Alvin Moore
8056b78414
Merge branch 'release-5.0'
2017-08-22 13:51:19 -07:00
Alvin Moore
814e471689
Added support for displaying initial workers via printf within simulation using a workload
2017-08-22 13:38:24 -07:00
Stephen Atherton
0b817f95f2
Bug fixes, but there’s still an issue with crossing page boundaries at the same key.
2017-08-22 11:30:44 -07:00
Alex Miller
7b78035365
Have SimulationConfig wrap DatabaseConfiguration to reduce code duplication.
...
This effectively turns initializing SimulationConfig into the equivalent of
building a config string and calling buildConfiguration on it.
2017-08-22 10:13:57 -07:00
Alex Miller
9b25c72971
Pull database config and cluster config into one struct.
...
This will allow us to specify custom situations to be chosen more frequently,
and in particular control machines and processes.
2017-08-21 22:35:44 -07:00
Stephen Atherton
aff31b7f36
Checkpointing. Clears almost work, but two edge cases aren’t handled correctly yet involving range clears that must cross leaf page boundaries.
2017-08-21 22:29:57 -07:00
Alec Grieser
5ee07b1a9e
Merge branch 'release-5.0'
2017-08-14 16:56:58 -07:00
Evan Tschannen
de1b590a8a
The TLog did not delete data from removed logs
...
The TLog continued to make data from removed logs persistent
2017-08-11 18:08:09 -07:00
Stephen Atherton
50fb44be92
Merge branch 'release-5.0'
...
# Conflicts:
# versions.target
2017-08-09 23:36:12 -07:00
Evan Tschannen
2335fc73f2
fix: peek cursors were being timed out every 10 minutes, instead of 10 minutes after the last use
...
fix: if an interface is changed while we are not waiting in getMore, we will not reset the sequence to 0.
2017-08-09 15:58:06 -07:00
Evan Tschannen
47a37f3f1e
Merge pull request #135 from cie/switch-for-data-distribution
...
Add a switch to turn off data distribution in CLI
2017-08-07 12:54:08 -07:00
Stephen Atherton
de03491475
Switched to new commit buffer type which can support more than set, but so far only set is implemented.
2017-08-04 00:01:25 -07:00
Evan Tschannen
c22708b6d6
added tag localities
...
fix: remote logs need to stop the master when they are stopped
2017-08-03 16:16:36 -07:00
Stephen Atherton
dbcd3526d7
Memory lifetime bug fix and previously missed changes for enforcing only reading committed versions.
2017-08-03 15:07:29 -07:00
Alec Grieser
ca7437ecf6
Merge branch 'release-5.0'
2017-08-02 22:07:01 -07:00
John King
d0fbc41338
set LOCK_AWARE on several transactions used for getting cluster info for the consistency check
2017-07-28 18:50:32 -07:00
Yichi Chiang
6a8a5c41b0
Add a switch to turn off data distribution in CLI
2017-07-28 18:14:55 -07:00
A.J. Beamon
4243486f54
fix: TLogMetrics was being track latested with the wrong ID
2017-07-28 14:37:23 -07:00
Yichi Chiang
37e5e2acbb
Fix parentheses issue in StorageMetrics.actor.h
2017-07-27 12:03:36 -07:00
Yichi Chiang
cdc62e265c
Merge pull request #133 from cie/shard-system-keyspace
...
Shard system keyspace
2017-07-26 17:09:13 -07:00
A.J. Beamon
41c90bcdea
Merge commit '89ac94853c70d08289e7fb58055bc5d0cd4e494d'
2017-07-26 15:35:36 -07:00
A.J. Beamon
d8e308c18f
Enable use of incremental delete when deleting disk queue and sqlite KVS sqlite files.
2017-07-26 14:11:11 -07:00
Yichi Chiang
53e1ae9f60
shard system keyspace
2017-07-26 13:47:31 -07:00
Stephen Atherton
31314d06d4
Added test stats. Added enforcement of only reading committed versions, which is a currently an implementation limitation.
2017-07-25 16:10:19 -07:00
Stephen Atherton
4aaee86c2a
Moved MetricLogger actor to fdbclient so applications other than fdbserver can use it.
2017-07-24 13:13:06 -07:00
Evan Tschannen
2ae445782e
fix: cannot rely on the bestServer’s version because other logs may have higher versions
2017-07-21 19:21:49 -07:00
Evan Tschannen
f6826f1e15
fix: log routers were popped at too high of a version
...
fix: make sure tlogs make everything durable
fix: make cluster controller’s temporary remote log recruitment not interfere with better master exists
2017-07-20 16:26:05 -07:00
Evan Tschannen
7fec378830
do not continue copying data from prior generations after being locked
2017-07-19 15:11:18 -07:00
Evan Tschannen
5852a6301b
fixed even more bugs
2017-07-15 15:15:03 -07:00
Alec Grieser
c860f09d8a
Merge branch 'release-5.0'
2017-07-14 16:01:15 -07:00
Alec Grieser
660729839c
moved Notified.h from flow -> fdbclient ; flow bindings package does better job when excluding testers
2017-07-14 15:49:30 -07:00
Stephen Atherton
d61f74e52b
Debug output changes.
2017-07-14 13:39:58 -07:00
Stephen Atherton
8b3569e27e
Bug fix.
2017-07-14 11:37:08 -07:00
Stephen Atherton
fc0557252b
Bug fixes. Btree now uses page 0 as root and will write initial page only if necessary. Added debug printf macro.
2017-07-14 11:36:49 -07:00
Stephen Atherton
d8b82ecbe4
Added use of IndirectShadowPager. Moved MemoryPager code into .cpp because of type conflicts and its implementation doesn’t need to be externally visible anyway. Added start of a performance test. Renamed tests. Correctness/set now has random reopen of disk based pager before verification. Added asserts for when keys or values that are too large to fit in a single page.
2017-07-13 22:11:48 -07:00
Evan Tschannen
5ac4de8775
fix: the same tag could be in the server tags list twice
2017-07-13 16:31:55 -07:00
Stephen Atherton
b56f5643d5
Bug fixes.
2017-07-13 14:51:39 -07:00
Evan Tschannen
57ba9d36af
fixed a large number of bugs
2017-07-13 12:29:21 -07:00
Stephen Atherton
bea061fd75
Bug fix, building tree root (1 or more top levels) must happen inside commitSubtree() otherwise the root page logical ID (1) can be written out of version order.
2017-07-13 11:32:14 -07:00
Alec Grieser
f75b6f333b
Merge branch 'release-5.0'
2017-07-13 11:21:18 -07:00
Evan Tschannen
415458deef
made LogSet reference counted,
...
fixed a few bugs
2017-07-11 15:48:10 -07:00
Evan Tschannen
81ae263ad9
implemented setPeekCursor
...
removed oldTLogServer
first compiling version
2017-07-10 17:41:32 -07:00
Evan Tschannen
979ebcef6c
changed to using a vector of logSets instead of a duplicate set of logs for remote servers
...
finished porting changes to the tlog
everything but peeking is finished in the TagPartitionedLogSystem
2017-07-09 14:46:16 -07:00
Stephen Atherton
c508e8bdf9
Adjacent internal btree pages can now contain the same keys at different versions.
2017-07-04 23:49:18 -07:00
Stephen Atherton
1a71df1871
Lots of bug fixes and debug output added. Unitttest for set works…pretty often.
2017-07-04 23:41:48 -07:00
Evan Tschannen
aa1c903b52
fix: do not log that data distribution is initialized until readyToStart is ready
2017-06-30 16:21:59 -07:00
Evan Tschannen
0906250e78
merged everything from feature-remote-logs besides the tlog and tagpartitionedlogsystem
...
re-included tags in messages to the tlog
previously never committed the LogRouter
2017-06-29 15:50:19 -07:00
A.J. Beamon
a2d229ff00
Merge branch 'release-5.0'
2017-06-29 08:16:22 -07:00
Evan Tschannen
69c862ed6e
updated release notes for 5.0.1
2017-06-28 16:52:45 -07:00
Evan Tschannen
663535e00a
add the same number for shared metrics and local metrics
2017-06-28 15:21:54 -07:00
Evan Tschannen
fe37d0b056
added additional logging about bytesInput and bytesDurable
2017-06-28 13:29:40 -07:00
Alvin Moore
31d562ff7b
Merge branch 'release-5.0'
...
# Conflicts:
# fdbclient/ManagementAPI.actor.cpp
# fdbserver/DatabaseConfiguration.cpp
# versions.target
2017-06-27 11:16:08 -07:00
Evan Tschannen
9fd5955e92
Merge branch 'master' into removing-old-dc-code
2017-06-26 16:27:10 -07:00
Evan Tschannen
15cb498aa7
removed fast_recovery_double and fast_recovery_triple from the fdbcli
2017-06-23 16:18:23 -07:00
Evan Tschannen
533dca95d8
fix: bytesDurable was not correctly increased when a log was removed
...
re-added many TLogMetrics
added a new role for the shared tlog
2017-06-22 17:21:42 -07:00
Alvin Moore
6d19580789
Merge branch 'release-5.0' of github.com:apple/foundationdb into release-5.0
...
# Conflicts:
# fdbrpc/simulator.h
2017-06-19 17:39:37 -07:00
Alvin Moore
96f40d8eb0
Added support for optionally re-including excluded servers
...
Removed unneeded code to protect coordinators
Ensured that simulation exclusion list is updated for excluded processes
2017-06-19 16:51:07 -07:00
Alvin Moore
9553458b78
Updated simulation to support managing exclusion and inclusion address
...
Added method for identifying acceptable availability process classes
Extended cluster availability function to ensure coordinators can be auto configured
Fixed availability function to allow protected processes to be considered as dead if not available
Added debug trace events for providing machine state when considering availability
Added trace event for protected coordinators
2017-06-19 16:48:15 -07:00
Stephen Atherton
f405c8d88e
Merge branch 'release-4.6' into release-5.0
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/sim2.actor.cpp
# fdbserver/optimisttest.actor.cpp
# versions.target
2017-06-15 17:40:19 -07:00
Stephen Atherton
5de6c703cf
Fixed line endings which were changed to UNIX style during a recent merge.
2017-06-15 16:57:35 -07:00
Evan Tschannen
4bdcd8fc12
Merge branch 'release-4.6' into release-5.0
...
# Conflicts:
# bindings/bindingtester/run_binding_tester.sh
# fdbrpc/AsyncFileKAIO.actor.h
2017-06-14 16:43:53 -07:00
Stephen Atherton
b65ad3563c
Merge branch 'master' into feature-redwood
...
# Conflicts:
# fdbserver/fdbserver.vcxproj
# fdbserver/fdbserver.vcxproj.filters
2017-06-09 14:56:41 -07:00
Evan Tschannen
cefaa2391d
fix: tlog actor needs to be cancelled when worker shutdown
2017-06-02 17:56:47 -07:00
Evan Tschannen
766dc23e26
fix: do not use TLS in protectedAddresses
2017-06-02 13:52:21 -07:00
Evan Tschannen
b108c1ea0d
fix: do not delay before setting logData->version
2017-06-02 11:27:37 -07:00
Evan Tschannen
2d0dbd57e8
randomized the delays in atomic switchover workload
2017-06-01 12:08:21 -07:00
Evan Tschannen
bfcbb5623f
fixed build from merge error
2017-06-01 12:07:30 -07:00
Evan Tschannen
276073d91b
Merge branch 'release-4.6' into release-5.0
2017-06-01 11:54:54 -07:00
Stephen Atherton
fa4fdb1f1d
Merge branch 'fix-io-timeout-handling' into release-5.0
...
# Conflicts:
# fdbserver/optimisttest.actor.cpp
2017-05-31 17:03:15 -07:00
Evan Tschannen
1626e16377
Merge branch 'release-4.6' into release-5.0
2017-05-31 16:23:37 -07:00
Alvin Moore
333f2e4865
Added connection fault disabler within setup of backup submission. It should be reviewed to determine the amount of time to wait before disabling
2017-05-31 14:21:50 -07:00
Stephen Atherton
98604d33a0
Merge branch 'fix-io-timeout-handling'
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/sim2.actor.cpp
# fdbserver/KeyValueStoreSQLite.actor.cpp
# fdbserver/optimisttest.actor.cpp
# fdbserver/worker.actor.cpp
# fdbserver/workloads/MachineAttrition.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2017-05-26 18:43:08 -07:00
Alvin Moore
ba606c2135
Removed NOT_IN_CLEAN macro from file
2017-05-26 14:52:06 -07:00
Alvin Moore
b28ed397a2
Fixed printf field width specifier to reduce compilation warnings within OS X
2017-05-26 14:51:34 -07:00
Alvin Moore
0b9ed67e12
Fixed support for RemoveServers Workload
...
Added availability functions to simulation
2017-05-26 14:20:11 -07:00
Alvin Moore
16cc0821b1
Removed dead machine option from simulation
2017-05-25 16:29:02 -07:00
FDB Dev Team
a674cb4ef4
Initial repository commit
2017-05-25 13:48:44 -07:00