Commit Graph

244 Commits

Author SHA1 Message Date
Yichi Chiang dcc9aafab7 Merge branch 'master' of github.com:apple/foundationdb 2017-11-02 10:47:59 -07:00
Yichi Chiang c033d8efd8 Fix typo message and remove extra TraceEvent which overwrites the expected one 2017-11-02 10:47:51 -07:00
Balachandar Namasivayam 3efaaec479 onMasterProxiesChanged was being triggered when any member of ClientDBInfo changed. Change the behavior to be triggered only when proxies field in ClientDBInfo is changed. 2017-11-01 18:29:56 -07:00
A.J. Beamon 7cf17df821 Merge branch 'master' into log-group-for-unsupported-clients
# Conflicts:
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2017-11-01 11:31:02 -07:00
A.J. Beamon 31caac67dc Rename supported_versions[x].clients to supported_versions[x].connected_clients 2017-11-01 10:41:30 -07:00
Balachandar Namasivayam 988bc0207f Reset Client Transaction profiling parameters when the config keys are cleared. 2017-10-31 15:40:57 -07:00
Alec Grieser 5a4a5985fd Merge branch 'release-5.0' 2017-10-30 08:31:23 -07:00
Alec Grieser 87321f5017 Merge branch 'release-4.6' into release-5.0 2017-10-30 08:31:01 -07:00
Evan Tschannen 54d82c0d92 Merge pull request #194 from cie/alexmiller/valgrind
Fix valgrind errors
2017-10-27 17:25:12 -07:00
Alex Miller e0d33ef8d7 Preemptively fix profiler-related valgrind errors/straight out bugs.
I forgot to initialize some fields in requests.
2017-10-27 17:20:19 -07:00
Evan Tschannen aa0c2ae317 only increase the max shard size if the shard begins in the keyServer keyspace, do not increase the minimum shard size 2017-10-27 14:22:26 -07:00
Evan Tschannen 3a4078bdda the keyservers shards are always a fixed large size 2017-10-27 11:52:11 -07:00
Balachandar Namasivayam cfefab18fb Merge branch 'master' into add-new-atomic-ops 2017-10-25 18:03:34 -07:00
Balachandar Namasivayam 3d5658940a Addressed Review Comments 2017-10-25 16:42:05 -07:00
Balachandar Namasivayam 9dd588dcce Addressed review comments.
Changed naming for NewMin and NewAnd to MinV2 and AndV2
2017-10-25 14:48:05 -07:00
Evan Tschannen d852a53ae4 Merge pull request #181 from cie/throttle-spammy-logs
Throttle spammy logs
2017-10-25 13:45:55 -07:00
Balachandar Namasivayam 2f6d55a52f Add correctness tests for all atomic ops 2017-10-25 13:36:49 -07:00
Yichi Chiang 4d54a73f5b Merge pull request #191 from cie/count-cluster-controller-role
Take cluster controller role into consideration when recruiting workers
2017-10-25 12:09:15 -07:00
Yichi Chiang f39cce9b8d Use processId instead of address for comparison 2017-10-25 11:35:29 -07:00
Yichi Chiang 5fcef911f0 Take cluster controller role into consideration when recruiting workers 2017-10-25 10:35:46 -07:00
Evan Tschannen 48901a9223 added a list of tlog IDs that are missing to status 2017-10-24 16:28:50 -07:00
Yichi Chiang c2a117fe07 Merge pull request #189 from cie/enable-check-desired-class
Enable checkUsingDesiredClasses() in consistency check
2017-10-24 15:18:21 -07:00
Yichi Chiang defdc6550d Exclude excluded processes when getting testers 2017-10-24 15:16:34 -07:00
Yichi Chiang 3865c5ae0e Enable checkUsingDesiredClasses() in consistency check 2017-10-24 12:58:54 -07:00
Balachandar Namasivayam 8c3bdc5b3b Make atomic ops differentiate between unset and empty values. 2017-10-23 16:48:13 -07:00
Bhaskar Muppana 360b777b78 Fail with correct error code in case of abort or discontinue of
non-existing backups.
2017-10-18 23:17:48 -07:00
Alec Grieser dd6d8f3b0e Merge branch 'master' into add-new-atomic-ops 2017-10-18 16:36:44 -07:00
Bhaskar Muppana 2007f3799f Don't ignore TimeKeeper failures. 2017-10-18 14:31:31 -07:00
Bhaskar Muppana 314511f4d7 Fixing spaces in BackupCorrectness TraceEvents. 2017-10-18 14:27:52 -07:00
Alex Miller 7b9bc1d715 Merge pull request #170 from cie/alexmiller/flowprofile
Add support for profiling a running fdb cluster to fdbcli, fix security issues, and add an improved backtrace.
2017-10-16 16:51:53 -07:00
Alex Miller f997cb9038 Add a string knob to hold the Log directory, and write profiles to it.
This is the combination of two small changes.

1. Add support for a string knob type.
2. Change profiles to be written to the log directory instead of the working
   directory.

We have three options of where to write files: the working directory, the data
directory, and the log directory.

The working directory may be set to a non-writable location, and likely
contains the fdb binaries.  Allowing these files to be overwritten would likely
not be a wise idea.

The data directory hosts our sqlite b-trees.  It would also be very unfortunate
if these were ever overwritten by an unfortunate profile name.

The log directory contains logs.  Out of the three, these matter the least if
they disappear or become corrupted.

Thus, we write to the log directory.
2017-10-16 16:05:02 -07:00
Alex Miller c5fbe33df6 Disallow arbitrary paths for storing profiles.
Previously, one could request profiles to be stored at
"../../../../../../etc/passwd".  Now we expand the paths, including symlinks,
and ensure that the target is a child of the targeted subdirectory.  This was
the least convoluted way I could figure out to handle paths.
2017-10-16 16:05:02 -07:00
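The containment check described in c5fbe33df6 can be sketched roughly as follows. This is a minimal illustration in Python, not the fdbserver code (which is C++); the function name `resolve_inside` and its parameters are hypothetical. It expands both paths, including symlinks, and then requires the target to sit under the base directory.

```python
import os


def resolve_inside(base_dir: str, requested: str) -> str:
    """Return the expanded path of `requested` under `base_dir`.

    Hypothetical helper: raises ValueError if the expanded path
    would land outside `base_dir`.
    """
    # realpath collapses ".." segments and follows symlinks.
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, requested))
    # commonpath compares whole path components, so a sibling such as
    # "/logs-evil" is not mistaken for a child of "/logs".
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"{requested!r} escapes {base_dir!r}")
    return target
```

Comparing whole components after expansion, rather than checking a string prefix on the raw input, is what rejects a request like "../../../../../../etc/passwd".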
Alex Miller 91a26a170c Add toggleable profiling support to fdbserver+fdbcli.
This adds the fdbcli commands:
* profile list -- Lists all workers in a way that doesn't fill `kill`'s list.
* profile flow run -- Allows starting flow profiling on a set of hosts for a specified interval.

And threads through all the support for enabling and disabling profiling as an RPC.
2017-10-16 16:05:02 -07:00
Balachandar Namasivayam 312f614133 Add the new ops and AND to NON_ASSOCIATIVE_MASK.
In the storage server, read the entire value if the op is ByteMin or ByteMax.
2017-10-16 11:06:31 -07:00
Alec Grieser e0be1ef1e0 Merge branch 'release-5.0' 2017-10-16 10:08:11 -07:00
Alec Grieser 432726ba2d Merge branch 'release-4.6' into release-5.0 2017-10-16 09:54:21 -07:00
Stephen Atherton 68eccb681e Merge pull request #173 from bmuppana/master
Backup log messages.
2017-10-13 18:31:53 -07:00
Evan Tschannen 215bcb8d3e Merge pull request #157 from cie/choose-leader-on-stateless-processes
Catch and update processClass change from DBSource
2017-10-13 14:03:29 -07:00
Yichi Chiang 5bcdd37c0d Move UID generation and add initialClass 2017-10-13 13:46:37 -07:00
Yichi Chiang 12edd27281 Introduce prevChangeID to CandidacyRequest and LeaderHeartbeatRequest 2017-10-12 17:11:58 -07:00
Bhaskar Muppana d1e9d28239 Backup log messages. 2017-10-12 16:12:42 -07:00
Stephen Atherton 11517f7bfc Merge branch 'master' into continuous-backup
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
2017-10-12 11:03:23 -07:00
Alex Miller c24b941485 Fix erroneous std::move in indexed set, and clean up addMetric users.
This is a follow-on to c4eb73d0.  Thanks to Bala for pointing out the unchanged
std::move usage, and there appeared to not be many existing users of addMetric
anyway.
2017-10-11 17:36:51 -07:00
Balachandar Namasivayam 8e0bea2795 Update API_VERSION from 500 to 510 2017-10-11 13:49:38 -07:00
Stephen Atherton c3d8412abb Merge pull request #166 from cie/alexmiller/deathservice
Fix potential division by zero issues via RPC.
2017-10-10 16:47:38 -07:00
Evan Tschannen 8feb3b8fbc fixed conflict range workload by just disabling timeKeeper instead of the check, because it should be a more robust fix 2017-10-10 16:01:02 -07:00
Balachandar Namasivayam eeebf10030 Modified existing behavior of MIN and AND atomic ops. The new behavior results in a 'SET' if the atomic op is performed on a non-existing key.
Added new atomic ops ByteMin and ByteMax that do lexicographic comparison of byte strings.
2017-10-10 13:02:22 -07:00
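The semantics described in eeebf10030 can be illustrated with a small sketch. This is Python for readability, not the FoundationDB implementation; the function names are hypothetical. A missing key is modelled as `None`, which is distinct from an empty value `b""`. Two points are assumptions beyond the commit message: the AND sketch only handles operands of equal length, and ByteMin/ByteMax are shown storing the parameter when the key is missing.

```python
from typing import Optional


def and_op(existing: Optional[bytes], param: bytes) -> bytes:
    """Bitwise AND that behaves like SET when the key does not exist."""
    if existing is None:
        return param
    # Simplification (assumption): both operands have the same length.
    return bytes(a & b for a, b in zip(existing, param))


def byte_min(existing: Optional[bytes], param: bytes) -> bytes:
    """Keep the lexicographically smaller byte string."""
    if existing is None:
        return param  # assumption: missing key stores the parameter
    return min(existing, param)  # Python compares bytes lexicographically


def byte_max(existing: Optional[bytes], param: bytes) -> bytes:
    """Keep the lexicographically larger byte string."""
    if existing is None:
        return param  # assumption: missing key stores the parameter
    return max(existing, param)
```

For example, `byte_min(b"apple", b"banana")` keeps `b"apple"`, and `and_op(None, b"\x0f")` stores `b"\x0f"` instead of producing zeros.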
Evan Tschannen c8525dc3e7 timekeeper is constantly changing keys in the system keyspace, so do not report errors on key mismatches on keys in the system keyspace 2017-10-10 12:04:56 -07:00
Evan Tschannen 5e6eba365b fix: always set confChange, because popVersion is not deterministic across proxies, and confChange needs to be set deterministically 2017-10-06 18:37:08 -07:00
Evan Tschannen 93b3d0e4e7 fix: toMap didn’t report logs, proxies, and resolvers 2017-10-06 15:55:50 -07:00
Alex Miller a21c8a820b Move cpuProfilerRequest from WorkerInterface to ClientWorkerInterface.
A way to access this stream is required if we wish to be able to toggle
profiling from fdbcli.  There are two ways to do this:

1. Use `monitorLeader()` to get a `ClusterControllerFullInterface`, and use
`getWorkers` from there to get a list of `WorkerInterface`s, from which we can
access cpuProfilerRequest.
2. Move cpuProfilerRequest to ClientWorkerInterface and use the existing code
in the client that can fetch a list of all `ClientWorkerInterface`s.

The split between WorkerInterface and ClientWorkerInterface appears to be
what a client might have a need to call versus what is fdbserver-internal (and
thus no client should even want to call). Thus, it seems to make more sense to
acknowledge that profiling is useful to be able to toggle from a client, and go
with option (2).
2017-10-05 14:08:28 -07:00
Yichi Chiang 3edc2824a9 Add initialClass to RegisterWorkerRequest 2 2017-10-05 11:03:25 -07:00
Yichi Chiang 05f7626e39 Add initialClass to RegisterWorkerRequest 2017-10-04 17:11:12 -07:00
Yichi Chiang 3c70df57b5 Fix cluster controller review comments 2017-10-04 15:48:55 -07:00
Alex Miller e55cc447d2 Address code review comments.
* Fixed memory corruption with SystemData key constants
* Removed duplication in ClusterController
* Reworked fdbcli actions to better represent explicit vs default assignments
2017-10-04 13:36:18 -07:00
A.J. Beamon 5063793f36 Revert line ending change 2017-10-04 11:19:19 -07:00
Alex Miller 706427ee62 Fix potential division by zero issues via RPC.
A carefully crafted SplitMetricRequest could have caused division by zero.
It's not really great to offer Division By Zero As A Service, so let's just
return an error instead.
2017-10-03 22:11:08 -07:00
Evan Tschannen 3a2ddcc84a Add destinations that are read-write to the source list, so that cancelled data movement can contribute to copying the data for the next movement. 2017-10-03 17:39:08 -07:00
Balachandar Namasivayam 0e153cdd35 Throttle Spammy logs. Three knobs are added.
Trace Events are sampled and cached with an expiration set. Every TraceEvent above SevDebug is checked against this cache to see if it exceeded a set threshold. If yes, then throttle the TraceEvent.
If a TraceEvent is throttled, a warning msg is logged.
2017-10-02 18:43:11 -07:00
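A rough sketch of the throttling idea in 0e153cdd35, written in Python rather than Flow/C++. The class name, parameters, and the use of exact per-type counts are assumptions for illustration; the commit describes sampling, and the three knobs it adds are not named in the message.

```python
class EventThrottle:
    """Count events per type within an expiring window (hypothetical)."""

    def __init__(self, threshold: int, expiry_seconds: float):
        self.threshold = threshold
        self.expiry_seconds = expiry_seconds
        self._counts = {}  # event type -> (window_start, count)

    def allow(self, event_type: str, now: float) -> bool:
        """Return False once `event_type` exceeds the threshold."""
        start, count = self._counts.get(event_type, (now, 0))
        # Expired entries start a fresh window.
        if now - start >= self.expiry_seconds:
            start, count = now, 0
        count += 1
        self._counts[event_type] = (start, count)
        return count <= self.threshold
```

A caller would log a single warning the first time `allow` returns False for a given type, then drop further events of that type until the window expires.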
Evan Tschannen 6ea9903c82 Merge branch 'release-5.0'
# Conflicts:
#	fdbbackup/backup.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	versions.target
2017-10-01 18:46:44 -07:00
Evan Tschannen 0949c4be65 Revert "Fixed problem with master being recruited on excluded servers"
This reverts commit 1f7b624734a8ad6e896dd3f01f9cdf334ca62486.
2017-10-01 16:30:19 -07:00
Evan Tschannen 696d432462 Revert "fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded)"
This reverts commit 83b2ce68c8e1a29fc1559598cc38d3ef7eb46101.
2017-10-01 16:29:32 -07:00
Evan Tschannen 0dde15f1d2 fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded)
fix: better master exists did not use exclusions because the configuration was reset
2017-10-01 16:26:58 -07:00
Yichi Chiang 636ce4a131 Replace leader when a better one is found 2017-09-29 16:34:55 -07:00
Alex Miller 11668bb359 Fixing code review comments. 2017-09-29 15:58:36 -07:00
Alex Miller b7ce9d996c Comment out verbose TraceEvents in preparation for pushing. 2017-09-29 15:58:36 -07:00
Alex Miller c40c1bb5fe Add a new workload: BackupToDBAbort, which does an ACI switchover.
This is to allow easier testing of non-durable switchovers without having to
wiggle into BackupToDBCorrectness's view of the world.
2017-09-29 15:58:36 -07:00
Alex Miller 9e9a96ae76 Make VersionStamp workload able to run with DR-style workloads.
* It is now tolerant of locked database errors, and handles them correctly.
* There is an option to specify which database to verify against.
2017-09-29 15:58:36 -07:00
Alex Miller 34630b6130 Make VersionStamp workload handle commit_unknown_result.
Previously, if a transaction failed with commit_unknown_result, and was
actually committed, it would look like data that magically appeared in the
database and verification would fail.

Now, we explicitly re-read and check to see if the commit happened, so that we
may maintain an accurate understanding of what the database state should be.
2017-09-29 15:58:36 -07:00
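The re-read approach in 34630b6130 can be sketched as below. This is an illustrative Python model, not the workload's code: `CommitUnknownResult`, `FlakyStore`, and `commit_and_confirm` are hypothetical names, and the sketch assumes each attempt writes a value unique enough that finding it on re-read proves the commit happened.

```python
class CommitUnknownResult(Exception):
    """Stand-in for the commit_unknown_result error."""


class FlakyStore:
    """Toy store whose commit always reports an unknown result."""

    def __init__(self, applies: bool):
        self.data = {}
        self.applies = applies  # whether the write really lands

    def commit(self, key: bytes, value: bytes) -> None:
        if self.applies:
            self.data[key] = value
        raise CommitUnknownResult()

    def read(self, key: bytes):
        return self.data.get(key)


def commit_and_confirm(store: FlakyStore, key: bytes, value: bytes) -> bool:
    """Return True if the write is known to be in the store."""
    try:
        store.commit(key, value)
        return True
    except CommitUnknownResult:
        # Outcome unknown: re-read to learn whether it was applied.
        return store.read(key) == value
```

The workload can then update its expected database state only when the function returns True, so a commit that silently succeeded no longer looks like data that appeared from nowhere.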
Alex Miller 23945b9fea VersionStamp can co-exist with other workloads that write data to the database.
VersionStamp previously would range-read the entire database during validation.
This has the unfortunate effect of making it fail during validation if run with
any other workload that writes keys to the database.

Now, all keys written and read are done with a configurable prefix, so that it
may co-exist with a variety of other workloads.
2017-09-29 15:58:36 -07:00
Alex Miller 370a6afb80 Make VersionStamp have an option to be tolerant of data being lost. 2017-09-29 15:58:36 -07:00
Alex Miller 8f4c45418b Make atomicSwitchover preserve an ever-increasing commit version. 2017-09-29 15:58:36 -07:00
Alex Miller 69523ce151 Hackish version of a test, but it does fail. 2017-09-29 15:58:36 -07:00
Alex Miller 65713b226f Fix whitespace and line endings. 2017-09-29 15:58:36 -07:00
Evan Tschannen e2b65e86ed added configurable memory limits for backup and dr executables
added a default memory limit of 8GB for fdbcli
2017-09-29 10:35:40 -07:00
Bhaskar Muppana 91975244fe Fixing OSX build. 2017-09-28 19:35:44 -07:00
Bhaskar Muppana 942c04e992 Merge pull request #162 from bmuppana/master
Fixing TimeKeeperCorrectness to deal with network delays.
2017-09-28 17:04:39 -07:00
Bhaskar Muppana 3d2bafc3a6 Fixing TimeKeeperCorrectness to deal with network delays. 2017-09-28 16:52:28 -07:00
Evan Tschannen ef41b07bb3 renamed past_version to transaction_too_old
implemented read_lock_aware option
2017-09-28 16:35:08 -07:00
Yichi Chiang d4f75630de Support log group field in status json 2017-09-28 16:31:29 -07:00
Evan Tschannen 7b60e26660 Merge pull request #160 from cie/use-error-descriptions
Add the ability to access name and description in Error. Update error…
2017-09-28 16:00:39 -07:00
Evan Tschannen 5f4b997400 emergency teams are bad for performance, because we will route client read requests to servers that do not have the data, therefore getting many wrong shard server errors. emergency teams only protect us from data loss in very rare scenarios, we may want to add them in again in the future, but make sure load balance knows which storage servers used to be destinations so they can only route to them as a last resort. 2017-09-28 13:20:01 -07:00
Evan Tschannen 73fca75239 added the ability to disable timeKeeper; disabled timeKeeper before consistency check in simulation 2017-09-28 13:13:24 -07:00
A.J. Beamon d30c730f75 Add the ability to access name and description in Error. Update error descriptions. 2017-09-28 12:35:03 -07:00
Bhaskar Muppana 0f8ff26029 Merge pull request #158 from bmuppana/master
<rdar://problem/34557380> Need a way to map real time to version
2017-09-27 17:56:42 -07:00
Bhaskar Muppana 6a0b1d6808 Fixing PR comments
<rdar://problem/34557380> Need a way to map real time to version
2017-09-27 17:56:01 -07:00
Evan Tschannen 4b21da1cd6 fix: lastVersionWithData was not updated when fetchKeys injects mutations 2017-09-27 10:44:34 -07:00
Evan Tschannen acb7e66d01 fix: failed logs do not count even if they have returned a result 2017-09-25 18:14:40 -07:00
Evan Tschannen 2bf042a559 fix: file_corrupt was not checking for fault injection
latency threshold was too long
2017-09-25 17:22:41 -07:00
Bhaskar Muppana 0bf5bdb23a <rdar://problem/34557380> Need a way to map real time to version 2017-09-25 12:51:37 -07:00
Yichi Chiang 6758c649fc Catch and update processClass change from DBSource 2017-09-25 10:36:03 -07:00
Evan Tschannen cce4eeb52d fix: the master was sending the cluster controller uninitialized configurations 2017-09-22 16:59:24 -07:00
Evan Tschannen 180438d41e fix: use the number of present logServers rather than the total size of the vector 2017-09-22 16:19:16 -07:00
Evan Tschannen 738ae21c3a fix: an optimization in buggified locking can cause recovery to break because it would not restart if a locked process was killed when the remaining logs cannot obtain a quorum 2017-09-22 15:07:57 -07:00
Alex Miller 585c9bf68f Quick fix to reduce CPU usage of ensureEpochLive.
It is suspected that policy recomputations are driving proxy CPU usage up, and
thus latency and throughput down.  To quickly confirm this theory, we're
forcing ensureEpochLive to wait until it has RF responses, which means we'll
probably only validate the policy once per call.
2017-09-21 18:22:24 -07:00
Evan Tschannen fbd67ea547 fix: excluded servers are worst fit for master rather than never assign (so that we can recover if every process has been excluded)
fix: better master exists did not use exclusions because the configuration was reset
2017-09-20 11:48:26 -07:00
Evan Tschannen cb43563b2d fix: toMap properly lists the redundancy mode of the cluster 2017-09-19 16:35:42 -07:00
Evan Tschannen f75dfc3153 do not register with the master until recovery of the queue is complete, to avoid having the master wait a long time for a peek response 2017-09-18 17:39:12 -07:00
Alex Miller 567d663afd Fix SimulationConfig never generating a custom config.
A 0 was changed to a 1 when rewriting code, and `case 0:` was never being hit. :(
Thankfully, it looks like nothing was broken by this in the meantime.
2017-09-18 17:29:36 -07:00
Evan Tschannen e8b895c878 added the ability to disable connection failures for a period of time after one happens 2017-09-18 12:46:29 -07:00