Commit Graph

1504 Commits

Author SHA1 Message Date
Jingyu Zhou 7a205b1732 Move remoteRecovered to dataDistributionTeamCollection()
Let the remote DC wait until it is fully recovered before team collection starts.
2019-02-14 16:37:16 -08:00
Jingyu Zhou 3f7bbc68aa Remove getDistributorInterface from cluster controller 2019-02-14 16:37:16 -08:00
Jingyu Zhou ef868f599c Add DataDistributorInterface to ServerDBInfo
Also change the Proxy and QuietDatabase to use the DataDistributorInterface.
2019-02-14 16:37:16 -08:00
Jingyu Zhou 0490160714 Fix according to Evan's comments
Use getRateInfo's endpoint as the ID for the DataDistributorInterface.
For now, added a "rejoined" flag for ClusterControllerData and Proxy.

TODO: move DataDistributorInterface into ServerDBInfo.
2019-02-14 16:30:13 -08:00
Jingyu Zhou c35d1bf2ef Fix according to Alex's comment 2019-02-14 16:30:13 -08:00
Evan Tschannen 1818aab205 Apply suggestions from code review
Co-Authored-By: jzhou77 <jingyuzhou@gmail.com>
2019-02-14 16:30:13 -08:00
Jingyu Zhou 886e7ab2ba Add a new DataDistributor role.
Let the cluster controller start a new data distributor role by sending a
message to a chosen worker.
Change MasterInterface usage in DataDistribution to masterId

Add DataDistributor rejoin handling.

This allows the data distributor to tell the new cluster controller of its
existence so that the controller doesn't spawn a new one. I.e., there should
be only ONE data distributor in the cluster.

If DataDistributor (DD) doesn't join in a while, then ClusterController (CC) tries
to recruit one as DD. CC also monitors DD and restarts one if it failed.

The Proxy also monitors the DD. If the DD fails, the Proxy asks the CC for
the new DD.

Add a GetRecoveryInfo RPC to the master server, which the data distributor calls
to obtain the recovery transaction version from the master server.
2019-02-14 16:30:13 -08:00
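The recruit-and-monitor behavior described in this commit can be pictured as a small state machine. The following C++ sketch is a single-threaded model under stated assumptions: `DistributorMonitorSketch`, its fields, and the join timeout are hypothetical names, not FoundationDB's cluster controller code, which runs as asynchronous actors.

```cpp
#include <optional>

// Hypothetical, single-threaded model of the recruit/monitor decision.
// Every name here, including the join timeout, is an illustrative assumption.
struct DistributorMonitorSketch {
    double joinTimeout;                     // assumed grace period for an existing DD to rejoin
    double startTime;                       // when this cluster controller started
    std::optional<int> currentDistributor;  // id of the DD this controller knows about

    // An existing DD announced itself, so the controller must not spawn another one.
    void onRejoin(int ddId) {
        if (!currentDistributor) currentDistributor = ddId;
    }

    // The monitored DD failed; forget it so that a replacement gets recruited.
    void onFailure(int ddId) {
        if (currentDistributor == ddId) currentDistributor.reset();
    }

    // Recruit only when no DD is known and the rejoin grace period has elapsed,
    // which keeps at most one DD in the cluster.
    bool shouldRecruit(double now) const {
        return !currentDistributor && now - startTime >= joinTimeout;
    }

    void onRecruited(int ddId) { currentDistributor = ddId; }
};
```

Waiting for a rejoin before recruiting is what lets a newly elected controller adopt the existing DD instead of creating a second one.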
Meng Xu 8ee8b98122 TeamCollection: Cosmetic change 2019-02-14 15:59:20 -08:00
Vishesh Yadav 907446d0ce Merge remote-tracking branch 'apple/master' into task/tls-upgrade 2019-02-14 11:37:38 -08:00
A.J. Beamon 8a17905621 Add a couple new files to CMakeLists 2019-02-14 08:08:44 -08:00
A.J. Beamon b435d51061 Merge branch 'master' into track-server-request-latencies 2019-02-14 08:07:32 -08:00
Meng Xu 628f7ac8c0 TeamCollection: Remove an unused knob 2019-02-13 16:22:55 -08:00
Meng Xu 5481851e82 TeamCollection: Add knobs for team remover
Added three knobs to control team remover

bool TR_FLAG_DISABLE_TEAM_REMOVER:
	Disable the teamRemover actor
double TR_REMOVE_MACHINE_TEAM_DELAY:
	Wait for the specified time before trying to remove the next machine team
double TR_WAIT_FOR_ALL_MACHINES_HEALTHY_DELAY:
	Wait before checking if all machines are healthy
2019-02-13 15:11:56 -08:00
Alex Miller 12123f41d6 Plumb a read function up the stack to IDiskQueue 2019-02-12 23:44:13 -08:00
Alex Miller 6c7229ec07 read fix while recovery 2019-02-12 23:44:13 -08:00
Alex Miller 8b21d1ac8f Add a standalone recovery initialization function. 2019-02-12 23:44:13 -08:00
Alex Miller 2f49acc8a0 Add a read function. 2019-02-12 23:44:13 -08:00
Alex Miller 63eb62cd36 Fix a bug when a read was delayed until after the entire disk queue has been rewritten. 2019-02-12 23:44:13 -08:00
Alex Miller 9886386a83 temporarily verify committed data as a test for read 2019-02-12 23:44:13 -08:00
Alex Miller efa8aa7e2e Adjust findPhysicalLocation to not spam.
Context is now optional, so that our high-volume calls don't get logged,
but low-volume calls still get logged the same way that they did before.
2019-02-12 23:44:13 -08:00
Alex Miller f1c31e2305 Add a read function to disk queue 2019-02-12 23:44:13 -08:00
Alex Miller 2d2b03a9ff prepare DiskQueue for actors 2019-02-12 23:44:13 -08:00
Alex Miller 40fe29c29b Abstract TrackMe into a reusable CRTP class. 2019-02-12 23:44:13 -08:00
Alex Miller 018d12fe90 use firstpages instead of recoveryfirstpages 2019-02-12 23:43:10 -08:00
Alex Miller dbf7cefcd8 Add firstPages to DiskQueue 2019-02-12 23:43:10 -08:00
Alex Miller 2570b37e6e Add function to read pages from RawDiskQueue_TwoFiles 2019-02-12 23:43:10 -08:00
Meng Xu 01e55e43bd TeamCollection: Minor improvements to code efficiency and style
Reword the feature item in the release document as well.
2019-02-12 19:10:53 -08:00
Andrew Noyes 65136a2ecd Forward declare actors with ACTOR keyword. #1148
There are several more occurrences of this, but they're in .h files that
now need to be .actor.h files. This gets the easy ones out of the way.
2019-02-12 17:56:20 -08:00
Andrew Noyes 067a445e06 Replace unused _ variables with wait(success(...)) 2019-02-12 17:30:30 -08:00
Meng Xu c8db205fd9 TeamCollection: Fix bug in removing a server
When we remove a server due to server failure, we need to
remove the related server teams AND remove the server team from
the machine team.

In the previous commit, we forgot to remove the server team from
the machine team.
2019-02-12 16:18:19 -08:00
Meng Xu fe4f43203d TeamCollection: getTeam may add a new team
The getTeam function may add a new team for the GetTeamRequest.
We need to check if the number of teams is larger than the desired team number.
2019-02-12 14:57:35 -08:00
Meng Xu 3ae8767ee8 TeamCollection: Apply clang-format 2019-02-12 13:41:18 -08:00
Meng Xu 214a72fba3 TeamCollection: Resolve review comments
1) Reduce the frequency of checking if we need to call teamRemover
2) Improve code efficiency in finding the machine team to remove
3) Remove unused code
4) Add sanity check
2019-02-12 10:59:57 -08:00
mpilman 6da5971e79 Guard all versions.h to not break old WIN32 build 2019-02-08 16:06:00 -08:00
Meng Xu 3b8ae0fe95 TeamCollection: Add into 6.1 release note 2019-02-08 13:50:27 -08:00
Balachandar Namasivayam f44f26c232 Dynamically rate limit consistency check. 2019-02-07 16:08:39 -08:00
mpilman 7e26b4ef0d Address comments from PR 2019-02-07 15:37:04 -08:00
mpilman 5737349676 Fix weird bug with boost interprocess
Strangely, boost interprocess didn't compile with VS 2017.
However, it does compile if it is included first.
I don't quite know what is happening here, but for now this fix
unblocks me.
2019-02-07 15:37:04 -08:00
mpilman 8a94d80deb fdbservice and fdbrpc now compiling 2019-02-07 15:37:04 -08:00
A.J. Beamon eb7c678e59 Return Void() in an actor return statement 2019-02-07 14:03:36 -08:00
Meng Xu 7cfe6de27e TeamCollection: Server team number must match machine team number
DESIRED_TEAMS_PER_MACHINE must equal DESIRED_TEAMS_PER_SERVER.
Otherwise, we may have too few machine teams to create enough server teams.

Note that the BUGGIFY macro value is based on a random number generator.
When you have two BUGGIFYs, one may be true and the other false.

Also fix a bug in getting the number of healthy machine teams.
2019-02-07 13:53:55 -08:00
A.J. Beamon d4349293b9 Reworked the way latency counters are tracked. Report the latency bands in separate events from StorageMetrics and ProxyMetrics. Fix a problem when the latency band configuration was changed. Add correctness testing. 2019-02-07 13:39:22 -08:00
Meng Xu 76d022f71c TeamCollection: Remove redundant teams
When the total number of teams is larger than the desired number,
we should gracefully remove the redundant teams so that
the number of teams stays low and the probability of
losing data remains extremely low even when multiple
racks fail at the same time.
2019-02-07 11:24:51 -08:00
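A minimal sketch of the removal loop described above, assuming teams are plain vectors of server ids. `removeRedundantTeams` is hypothetical, and the rule for which team to drop (the one whose servers belong to the most teams) is an assumption chosen for illustration; the commit does not state the actual criterion.

```cpp
#include <cstddef>
#include <map>
#include <vector>

using Team = std::vector<int>;  // server ids (hypothetical representation)

// Remove surplus teams one at a time until at most `desiredTeams` remain.
// Returns how many teams were removed. The "most-shared team first" rule is
// an illustrative assumption, not the real team remover's criterion.
std::size_t removeRedundantTeams(std::vector<Team>& teams, std::size_t desiredTeams) {
    std::size_t removed = 0;
    while (teams.size() > desiredTeams) {
        // Count how many teams each server currently belongs to.
        std::map<int, int> teamsPerServer;
        for (const Team& t : teams)
            for (int s : t) ++teamsPerServer[s];

        // Score each team by the total membership of its servers; pick the highest.
        std::size_t worst = 0;
        int worstScore = -1;
        for (std::size_t i = 0; i < teams.size(); ++i) {
            int score = 0;
            for (int s : teams[i]) score += teamsPerServer[s];
            if (score > worstScore) {
                worstScore = score;
                worst = i;
            }
        }
        teams.erase(teams.begin() + static_cast<std::ptrdiff_t>(worst));
        ++removed;
    }
    return removed;
}
```

Removing one team per iteration and recomputing the scores keeps each step based on the current state, which matches the "gracefully remove" intent.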
Meng Xu 455024b3fe SimulationTest: Test the number of teams
Increase the probability that the number of created machine teams is
larger than the number of desired machine teams if we do NOT try to remove the surplus machine teams.
This helps test the upgrade to machine teams in FDB 6.1.
2019-02-06 11:04:41 -08:00
Evan Tschannen 7e0e0a7673
Merge pull request #1105 from vishesh/task/issue-218-compare-and-clear
Implements CompareAndClear AtomicOp
2019-02-05 18:11:28 -08:00
Evan Tschannen 486e0e13c3
Merge pull request #1116 from alexmiller-apple/tstlog
Random cleanups that prepare for Spill-By-Reference TLog
2019-02-05 18:09:06 -08:00
Meng Xu 2b73c89e98 TeamCollection: Test the number of teams
Call the traceTeamCollectionInfo function to record the team numbers
when we add a team directly from the shard information, instead of
using addTeamsBestOf logic.
2019-02-05 15:58:16 -08:00
A.J. Beamon 882f8d70b7
Merge pull request #1066 from etschannen/master
fix: coordinators auto could put two coordinators in the same zone
2019-02-05 11:52:04 -08:00
Meng Xu f5171d1b57 TeamCollection: Test the number of teams
The current simulator does not validate if the number of teams in
the system is larger than the maximum desired number of teams.
This validation should be added because we do NOT want too many teams
in the system, which may impede the system's availability when
multiple fault zones (e.g., machines) crash at the same time.

This commit adds the test at the consistency check in simulation.
Since the current code does not handle the upgrade situation
when we enforce machine teams, the test is expected to fail.

A later commit will handle the upgrade situation by gracefully
removing the surplus teams.
2019-02-04 18:14:36 -08:00
Alex Miller 22a08b2b4e Change mutable ref to pointer so outparams are obvious. 2019-02-04 18:04:22 -08:00
Alex Miller 0efcccc06f Fix a long standing minor bug in disk queue that could lead to unnecessary commits.
If the disk queue is called with the following series of operations:

Push(a) -> 1
Commit()
Pop(1)
Push(b)
Commit()
Commit()

Then the last Commit() should be a no-op and should not actually do any work.

However, anyPopped was only set to `false` if no pages were pushed, and thus
we'd falsely think that an extra empty-page commit to the log was needed to
record the new popped position, when there actually was no new popped page
position to record.

Aside from the extra commit, it may make getCommitOverhead slightly
inaccurate, but that's only used for some accounting inside of the memory
storage engine and at a quick glance doesn't look like it should have caused
any bad effects.

I dug through history, and this code has been this way since the initial commit
by Dave, and then no one has touched the anyPopped logic since.
2019-02-04 18:04:22 -08:00
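The following C++ model replays the operation sequence above. It is a hypothetical simplification (the class and its members are not from the real DiskQueue); the point is only that resetting the popped-position bookkeeping on every commit makes the final Commit() a no-op.

```cpp
#include <cstdint>

// Hypothetical, heavily simplified model of the commit bookkeeping; it is not
// FoundationDB's DiskQueue. Locations are abstract counters.
class DiskQueueCommitModel {
public:
    // Push one item; returns the location just past it (1 for the first push).
    std::uint64_t push() {
        pushedSinceCommit_ = true;
        return ++nextLocation_;
    }

    // Pop everything before `upTo`; the popped location never moves backwards.
    void pop(std::uint64_t upTo) {
        if (upTo > poppedLocation_) poppedLocation_ = upTo;
    }

    // Returns true if this commit had anything to write. The popped-position
    // bookkeeping is reset on every commit, whether or not pages were pushed,
    // so a commit with nothing new is a no-op.
    bool commit() {
        bool anyPopped = poppedLocation_ != committedPoppedLocation_;
        bool needsWrite = pushedSinceCommit_ || anyPopped;
        pushedSinceCommit_ = false;
        committedPoppedLocation_ = poppedLocation_;
        return needsWrite;
    }

private:
    std::uint64_t nextLocation_ = 0;
    std::uint64_t poppedLocation_ = 0;
    std::uint64_t committedPoppedLocation_ = 0;
    bool pushedSinceCommit_ = false;
};
```

With this model, Push, Commit, Pop(1), Push, Commit yields two real commits, and the trailing Commit() returns false.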
Vishesh Yadav 5985566a8e Don't issue a second read in storageserver if possible for CompareAndClear
If the previous eager read request was for the same key as the CompareAndClear op,
do not issue the read again.
2019-02-04 16:10:59 -08:00
Vishesh Yadav c532d5c277 Implements CompareAndClear AtomicOp
Adds the CompareAndClear mutation. If the given parameter is equal to the
current value of the key, the key is cleared. On the client, the mutation
is added to the operation stack. Hence, whether the mutation evaluates to a
clear is only known when `read()` evaluates the stack in
`RYWIterator::kv()`, which is unlike what we currently do for a typical
ClearRange.
2019-02-04 14:59:56 -08:00
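The compare-then-clear semantics can be sketched on an in-memory map. `applyCompareAndClear` and `KeyValueStore` are hypothetical names for illustration; the real mutation is applied by the storage server and, on the client, resolved lazily through the RYW operation stack as described above.

```cpp
#include <map>
#include <string>

using KeyValueStore = std::map<std::string, std::string>;  // hypothetical stand-in

// Clears `key` only if it exists and its current value equals `param`.
// Returns true if the key was cleared.
bool applyCompareAndClear(KeyValueStore& store, const std::string& key, const std::string& param) {
    auto it = store.find(key);
    if (it != store.end() && it->second == param) {
        store.erase(it);
        return true;
    }
    return false;
}
```

A missing key or a mismatched value leaves the store untouched, which is why the client cannot know in advance whether the mutation acts as a clear.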
Trevor Clinkenbeard a09afe5906 Added Throttling workload to test native health metrics API 2019-02-04 13:04:25 -08:00
Evan Tschannen 8bfde8c571 fix: increased the rate of ssl tests too much 2019-02-04 11:39:49 -08:00
Evan Tschannen 5b471699df fix: restarting simulation tests looked for directories in the wrong location 2019-02-04 11:39:06 -08:00
Trevor Clinkenbeard 93d4ed6339 Fixed typo that was messing up storage server diskUsage calculation 2019-02-02 21:04:29 -08:00
Trevor Clinkenbeard 80cf5e057f Compute worstStorageNDV for Ratekeeper health metrics 2019-02-02 21:03:02 -08:00
Trevor Clinkenbeard b7eaaaf1e5 Proxy must update health metrics after receiving GetRateInfoReply 2019-02-02 17:08:54 -08:00
Trevor Clinkenbeard 4daf49ff4d Proxy runs healthMetricsRequestServer to handle incoming health metrics requests 2019-02-01 10:58:42 -08:00
Evan Tschannen e9ddd94e27 The failure monitor is given a list of all IP addresses associated with a process
The connect packet includes the correct remote address
Did a lot of code cleanup
Simulation test mixed TLS and non-TLS listeners on the same process
2019-01-31 18:20:14 -08:00
Trevor Clinkenbeard 03e5e3ccbc Proxies periodically request health metrics from Ratekeeper in the getRate function. Occasionally (determined by DETAILED_METRIC_UPDATE_RATE), requests are for detailed per-process metrics. 2019-01-31 13:25:57 -08:00
Trevor Clinkenbeard 5822bd65bf Track health metrics in Ratekeeper and send these metrics to proxies in GetRateInfoReply messages 2019-01-31 12:56:58 -08:00
Trevor Clinkenbeard d7930af2cb Storage server periodically calculates cpuUsage and diskUsage metrics. These metrics (as well as all other metrics necessary for health metrics calculation) are sent in the StorageQueuingMetricsReply message. 2019-01-31 12:23:04 -08:00
Evan Tschannen a678f778fa
Increase severity to SevWarnAlways for TooManyStatusRequests trace
Co-Authored-By: tclinken <trevorclinkenbeard@gmail.com>
2019-01-28 17:50:50 -08:00
Trevor Clinkenbeard 5b89db811a Throttle status requests with MAX_STATUS_REQUESTS_PER_SECOND knob, whenever status batching is used. 2019-01-28 15:37:30 -08:00
Evan Tschannen 1d7fec3074 Merge commit '048bfc5c368063d9e009513078dab88be0cbd5b0' into task/tls-upgrade-2
# Conflicts:
#	.gitignore
2019-01-24 17:43:06 -08:00
Alex Miller ec32d3308b
Merge pull request #1086 from mpilman/features/c++-compiler-errors
Fixed several minor code issues
2019-01-24 15:24:33 -08:00
mpilman 79637f07ac Fixed several minor code issues
These will become a problem as soon as we
switch to C++17
2019-01-24 14:43:12 -08:00
A.J. Beamon 2198d24ce1 Merge commit '3b2700d25334c53d13496ca16682642aac951beb' into track-server-request-latencies
# Conflicts:
#	fdbclient/MasterProxyInterface.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/ServerDBInfo.h
#	fdbserver/Status.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/storageserver.actor.cpp
2019-01-24 11:43:26 -08:00
mpilman 58964af7e1 ctest improvements - #1058
- A set of CMake variables controls whether to keep
  the simfdb directory and the traces and whether we
  want to aggregate the traces into a single file
- Test labels now contain the directory they are in
  so that one can now run `ctest -R fast/`
- A different binary can be used for restart tests. CMake
  will automatically look for an installed fdb and use that
  by default. If none is found, it will use the built one
  but it will also print a warning
- CMake will throw an error if there are any text files in
  the tests directory that are not associated with a test.
- Moved testing from fdbserver/CMakeLists.txt to
  tests/CMakeLists.txt
- Moved fdb testing functions to their own cmake module
2019-01-22 14:34:51 -08:00
A.J. Beamon 8e05e95045 Added the ability to configure the latency band settings by setting a special key in \xff keyspace. 2019-01-18 16:18:34 -08:00
Evan Tschannen 699f8dd617 fix: coordinators auto could put two coordinators in the same zone
simulation now tests two machines in the same zone
2019-01-18 15:42:48 -08:00
A.J. Beamon 7498c2308c Merge branch 'release-6.0' into track-server-request-latencies 2019-01-16 13:39:01 -08:00
Alex Miller b4a446756a Remove more top-level tests that were out of order. 2019-01-14 20:28:40 -08:00
Alex Miller 0d579b4730 Top level tests aren't run by default, and some fail. 2019-01-14 19:14:25 -08:00
Alex Miller b3e977d7c1 Apply test directory as a label.
This isn't ideal, as it makes `restarting/from_5.2.0/potato.txt` have
the label "from_5.2.0" instead of "restarting", but it does make the
fast label work right.
2019-01-14 19:14:25 -08:00
mpilman 414fb0b6c8 made TestRunner work with XML traces 2019-01-14 19:14:25 -08:00
Markus Pilman 14f0a6958b added all tests to ctest 2019-01-14 19:14:25 -08:00
Markus Pilman b096b8e3f8 First tests working with ctest 2019-01-14 19:14:25 -08:00
Evan Tschannen 7dbf06162e
Update fdbserver/ClusterController.actor.cpp
Co-Authored-By: bnamasivayam <36455962+bnamasivayam@users.noreply.github.com>
2019-01-14 16:57:00 -08:00
Balachandar Namasivayam ff661bca22 Fix a minor bug in the RoleFitness Class. 2019-01-14 14:54:54 -08:00
Evan Tschannen 9912d17c35
Merge pull request #1030 from bnamasivayam/master
Add some sanity checks to tester.actor.cpp
2019-01-11 17:15:40 -08:00
Evan Tschannen 4eb11d74af
Merge pull request #1029 from bnamasivayam/reenable-check_desired_classes
Re-enable CheckDesiredClasses after making necessary changes for mult…
2019-01-11 17:15:05 -08:00
A.J. Beamon d4d5740282 * Add Optional.map and ErrorOr.map.
* Rename Optional/ErrorOr cast_to to castTo.
* Make printable(Optional<T>) templated rather than restricted to StringRef types.
* Fixes bug in (unused) ErrorOr.castTo where an ErrorOr that was not set would lose its error.
2019-01-11 09:03:38 -08:00
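The idea behind `Optional.map` can be sketched against `std::optional` as a free function. `mapOptional` is a hypothetical name and is not Flow's `Optional::map` signature; C++23 later added `std::optional::transform` with the same behavior.

```cpp
#include <optional>
#include <type_traits>
#include <utility>

// Apply `f` to the contained value if there is one; otherwise propagate the
// empty state. Hypothetical helper written for C++17.
template <class T, class F>
auto mapOptional(const std::optional<T>& opt, F&& f)
    -> std::optional<std::decay_t<std::invoke_result_t<F, const T&>>> {
    if (!opt) return std::nullopt;
    return std::forward<F>(f)(*opt);
}
```

Mapping keeps the "absent" case out of the caller's code: the function is simply never invoked on an empty optional.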
Balachandar Namasivayam baeaa490e4 Add some sanity checks to tester.actor.cpp 2019-01-10 11:05:50 -08:00
Balachandar Namasivayam a8e2e75cd5 Re-enable CheckDesiredClasses after making necessary changes for multi-region setup.
Fixed a couple of bugs
1) A rare race condition where a worker was being assigned roles even after it died.
2) Fix how RoleFitness is calculated for TLog and LogRouter. Only the worst fitness is compared to see if a better fit is available.
2019-01-10 10:28:32 -08:00
Evan Tschannen 684a22a52b Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbbackup/backup.actor.cpp
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/HTTP.actor.cpp
#	fdbserver/storageserver.actor.cpp
#	fdbserver/workloads/BackupCorrectness.actor.cpp
#	versions.target
2019-01-09 16:14:46 -08:00
A.J. Beamon 7c5b2ab330
Merge pull request #976 from alexmiller-apple/jsonlogs
Allow trace logs to be output as JSON instead of XML
2019-01-09 17:04:50 -05:00
Evan Tschannen 5d2b11cba9
Merge pull request #1019 from satherton/http-request-id
Backup usability and sanity check improvements
2019-01-09 13:29:37 -08:00
Vishesh Yadav 51b89ae083 WIP 2019-01-09 07:41:02 -08:00
Stephen Atherton 604ad062d5 Updated backup correctness test to new behavior. WaitBackup() can now return the UID and BackupContainer atomically with the status code for a backup tag. 2019-01-08 18:12:15 -08:00
A.J. Beamon d265517156 Fix: fast allocator would not clean up memory for a thread if that thread never called getMagazine. This could happen if the first thing the thread did was to release memory.
Added a new metric for the number of threads that hold memory for each size and improved some existing metrics.
Fix: a failed ASSERT would crash if done early in the program lifetime.
2019-01-08 14:36:01 -08:00
Evan Tschannen 57293a2db0 byte sample recovery did not use limits for its range reads, leading to slow tasks 2019-01-04 10:32:31 -08:00
Markus Pilman dbe9baff1f Several small compilation fixes for new versions of gcc
There are several missing includes for cmath in the code, I added those.

Next, Coro returns a reference to a stack variable and this causes a
warning. As this is probably ok for Coro, I disabled the warning in
that file for GCC. I want to have this warning in the build system as
it is generally a very useful warning to have.

Another change is that major and minor have been deprecated for a while now.
I replaced those with gnu_dev_major and gnu_dev_minor.

ErrorOr currently implements operators ==, !=, and <. These do not
compile because Error does not implement ==. This compiles on older
versions of gcc and clang because ErrorOr<T>::operator== is not used
anywhere. It is still wrong though and newer gcc versions complain.
I simply removed these methods.

The most interesting fix is that TraceEvent::~TraceEvent is currently
throwing exceptions. Since C++11, destructors are implicitly noexcept, so an
exception escaping one calls std::terminate; in older versions of C++ it was
a bad idea. For now I simply removed the throw, but this might need some more thought.
2019-01-03 12:44:19 -08:00
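A short C++11 illustration of why a throwing destructor is a problem. The struct names are hypothetical; the `static_assert`s check the standard rule that a destructor without an explicit exception specification is implicitly `noexcept`.

```cpp
#include <type_traits>

// Since C++11, a destructor with no explicit exception specification is
// implicitly noexcept (unless a base or member destructor may throw), so an
// exception escaping it calls std::terminate rather than propagating.
struct TraceEventLikeSketch {
    ~TraceEventLikeSketch() {
        // A `throw` here would compile but terminate the program at runtime.
    }
};
static_assert(std::is_nothrow_destructible<TraceEventLikeSketch>::value,
              "destructors are implicitly noexcept since C++11");

// A destructor that really must be allowed to throw has to opt out explicitly.
struct ThrowingDestructorSketch {
    ~ThrowingDestructorSketch() noexcept(false) {}
};
static_assert(!std::is_nothrow_destructible<ThrowingDestructorSketch>::value,
              "noexcept(false) makes the destructor potentially throwing");
```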
Evan Tschannen 4901e37b8f
Merge pull request #983 from alexmiller-apple/compilationfixes
Various minor fixes
2019-01-03 10:01:05 -08:00
Andrew Noyes 7eb6765698 Mention that xml is the default 2019-01-03 08:48:31 -08:00
Bhaskar Muppana aa2a76ef4c
Merge pull request #981 from alexmiller-apple/cmake
Add a CMake build system
2019-01-02 18:50:15 -08:00
Andrew Noyes bce5b03340 Fix whitespace 2019-01-02 15:24:11 -08:00
Alex Miller 73f53f9861
Merge pull request #991 from atn34/replace-ampersand-operator
Replace & operator with variadic function
2018-12-30 22:51:59 -06:00
Simon Zhou 7edf221986 Avoid null check 2018-12-28 13:09:04 -08:00
anoyes 6a4d87802b Replace & operator with variadic function 2018-12-28 11:33:42 -08:00
anoyes 1bca665b29 Document --trace_format flag 2018-12-20 16:22:41 -08:00
anoyes b8df5acc15 Add --trace_format flag to fdbserver 2018-12-20 15:02:01 -08:00
Alex Miller bfab7c150a Require PageHeader to be 36 bytes, and don't use magic numbers. 2018-12-17 13:37:44 -08:00
Alex Miller b4b7f382a7 Fix issues that a newer compiler warned about. 2018-12-14 14:43:50 -08:00
Meng Xu 486a7b04fa TeamCollection: Fix build on OS X
On OS X, we cannot add an unsigned long to a string in order to append it.
2018-12-14 13:44:11 -08:00
Alex Miller 550daa05f8 Default configuration now builds. 2018-12-13 15:52:27 -08:00
Markus Pilman df0f491c29 Some more improvements to the build and preparations for packaging 2018-12-13 15:04:13 -08:00
Alex Miller e70e59a895 Change some file locations. 2018-12-13 14:53:19 -08:00
Markus Pilman dce290909d fdbserver now compiling 2018-12-13 14:13:47 -08:00
Vishesh Yadav e04abf25f7 simulator: Support multiple listeners on single process
Sim2Listener can now take the network address to listen on. This is
used to listen on multiple ports in the simulator and to test the patch
which added multiple network addresses to a single endpoint.
2018-12-13 13:36:52 -08:00
Vishesh Yadav 3eb9b23024 Listen to multiple addresses and start using vector<NetworkAddress> in Endpoint
- This patch will make FDB listen to multiple addresses given via the
  command line. Although we'll still use the first address in most places,
  this patch starts using vector<NetworkAddress> in Endpoint in some basic
  places.
- When sending packets to an endpoint, pick a random network address from
  the endpoint's addresses
- Renames Endpoint::address to Endpoint::addresses since it
  now holds a vector of addresses.
2018-12-13 13:36:52 -08:00
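A minimal sketch of picking a random address from an endpoint that holds several. `NetworkAddress` and `EndpointSketch` here are hypothetical stand-ins, not the real flow types; returning `std::nullopt` for an empty list avoids reading the first element of an empty vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-ins; the real NetworkAddress and Endpoint types differ.
struct NetworkAddress {
    std::string ip;
    std::uint16_t port;
};

struct EndpointSketch {
    std::vector<NetworkAddress> addresses;  // one endpoint, several IP:PORT pairs
};

// Choose one of the endpoint's addresses uniformly at random.
std::optional<NetworkAddress> pickRandomAddress(const EndpointSketch& ep, std::mt19937& rng) {
    if (ep.addresses.empty()) return std::nullopt;
    std::uniform_int_distribution<std::size_t> dist(0, ep.addresses.size() - 1);
    return ep.addresses[dist(rng)];
}
```

Spreading sends across addresses is a simple choice when every listed address reaches the same process; the empty check makes the "no addresses" case explicit.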
Vishesh Yadav 43e5a46f9b Change Endpoint::address(NetworkAddress) to vector<NetworkAddress>
Extend the `Endpoint` class to take multiple NetworkAddresses instead of
just one. Hence, to talk to an endpoint, instead of one IP:PORT we'll
have multiple IP:PORT pairs.

This patch simply adds the field and makes the changes needed to compile the
codebase. The first element of the `address` field is used everywhere.
Hence the way we talk to endpoints remains the same with this patch.

NOTE:

Directly accessing the first member of Endpoint::address is unsafe
as Endpoint() doesn't enforce a non-empty address list. However, since
the correctness tests pass for now and we are replacing all those
unsafe accesses with ones that consider the whole vector anyway, this patch
does not make those accesses safe.
2018-12-13 13:36:52 -08:00
Vishesh Yadav 42dffd4dff Take a vector of network addresses from CLI to start FDB server
Extends the CLI interface to take multiple public and listen addresses.
We however do not do anything with those extra addresses and just
consider the first one for now.
2018-12-13 13:36:52 -08:00
Vishesh Yadav e8e01b2406 Remove unused localAddress parameter from newNet2 and Net2 classes 2018-12-13 13:36:52 -08:00
Evan Tschannen d9626895b1
Merge pull request #964 from xumengpanda/mengxu/teamcollection-release
TeamCollection: Use machine teams to create server teams to increase availability at scale when a machine has multiple servers
2018-12-13 13:18:54 -08:00
Meng Xu 79d94f78f1 TeamCollection: Improve code efficiency
Further improve code efficiency by

1) Avoid rebuild machine locality map when machine locality is changed.
This may leave the global machine locality map stale.
This is ok as long as we do not use the global map to validate
the machine team follows the locality policy.

2) Use ASSERT_WE_THINK instead of ASSERT to avoid runtime overhead.
ASSERT_WE_THINK will only validate the condition in simulation mode.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-12 22:38:38 -08:00
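An assertion that is only checked in simulation can be sketched with two macros. `SKETCH_ASSERT`, `SKETCH_ASSERT_WE_THINK`, and the `g_simulatedSketch` flag are hypothetical; this is not FoundationDB's definition of `ASSERT_WE_THINK`.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for "is this process running under simulation?".
inline bool g_simulatedSketch = false;

// Always-on check: aborts when the condition is false.
#define SKETCH_ASSERT(cond)                                        \
    do {                                                           \
        if (!(cond)) {                                             \
            std::fprintf(stderr, "Assertion failed: %s\n", #cond); \
            std::abort();                                          \
        }                                                          \
    } while (0)

// Simulation-only check: outside simulation the condition is not even
// evaluated, so production pays only for the flag test.
#define SKETCH_ASSERT_WE_THINK(cond)                \
    do {                                            \
        if (g_simulatedSketch) SKETCH_ASSERT(cond); \
    } while (0)
```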
Meng Xu e197926c80 TeamCollection: Remove a duplicate function
Remove a duplicate function that has a different signature.
No functionality change.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-12 15:21:37 -08:00
Meng Xu ad7040efcd TeamCollection: Bug fix in handle server locality change
Make sure the link between server and machine is updated
in both server and machine.
Rename the function to better reflect its functionality.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-12 14:03:29 -08:00
Meng Xu e069b5c31c TeamCollection: Use clang format
No functional change.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-06 11:39:35 -08:00
Meng Xu 5d47b9c884 TeamCollection: Handle server locality change
A server locality may change from one machine to another.
This affects the old machine and machine team the server is on, and
the new machine the server moves to.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-05 22:23:14 -08:00
Meng Xu c5047bc8c3 TeamCollection: All machine teams are correct size
We only create correct size machine teams.
When configuration (e.g., team size) is changed,
the DDTeamCollection will be destroyed and rebuilt
so that the invariant will not be violated.

Based on the invariant, we can count the number of
machine teams more quickly.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-05 15:09:38 -08:00
Meng Xu 57eab1f283 DataDistribution: Remove addAllTeams function
The addAllTeams function can be replaced with the new addTeamsBestOf
function by passing a large enough number of teams to build.
Remove addAllTeams function and update the related unit tests.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-05 15:03:16 -08:00
Meng Xu 38c5c2562b DataDistribution: Update NotEnoughServers unit test
The buggify option may set the knob parameters
(DESIRED_TEAMS_PER_SERVER and MAX_TEAMS_PER_SERVER) to 1.
When this happens, the number of machine teams to build will be
less than what we want, which prevents us from building enough
server teams.

To avoid this problem, we build machine teams before
we call addTeamsBestOf to build server teams.

We also add the ASSERT to ensure we build enough machine teams and
server teams in the test case.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-05 14:36:48 -08:00
Meng Xu f32c04c834 DataDistribution: Update NotEnoughServers unit test
Change the test condition for the NotEnoughServers unit test.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-03 23:14:01 -08:00
Meng Xu 54a4d6b308 TeamCollection: Improve code efficiency
Improve code efficiency with the following changes:
1) Change always-true if-statement to ASSERT;
2) Return when we are confident we will not find more machine teams.

No functionality change.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-01 17:10:50 -08:00
Meng Xu 8d6c6e000b DataDistribution: Mute the NotEnoughServers test
Due to the randomness in choosing a server, we cannot guarantee that we
find all teams. The NotEnoughServers test case may create false positive
bug report in the correctness test.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-01 13:29:45 -08:00
Meng Xu 68dcec2240 DataDistribution: Change a unit test
Call addTeamsBestOf() multiple times when we cannot find an available team
due to the pure randomness in choosing the server teams.

The changes to the unit test reduce false positives in the simulation test results.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-01 13:12:55 -08:00
Meng Xu a43f579f66 TeamCollection: Change 1 unit test
Relax the assert condition on the random unit test.
Due to the randomness in choosing the machine team and
the server team from the machine team, it is possible that
we may not find the remaining several (e.g., 1 or 2) available teams.
For example, there are at most 10 teams available, and we have found
9 teams, the chance of finding the last one is low
when we do pure random selection.

It is ok to not find every available team because
1) In reality, we only create a small fraction of available teams, and
2) In a practical system, this situation only happens when most servers
   are *temporarily* unhealthy. When this situation happens, we will
   abandon all existing teams and restart building teams from scratch.

In simulation test, the situation happens 100 times out of 128613 test cases
when we run RandomUnitTests.txt only.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-12-01 13:11:19 -08:00
Meng Xu f311455c45 TeamCollection: Cleanup code and add checks
Remove unnecessary sanity checks and remove the dead code.
Add some necessary sanity checks.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-30 17:40:21 -08:00
A.J. Beamon eb2f27b8e5 Work in progress implementation of server-side latency tracking. The intent of this is to be able to measure the number of requests that achieve certain latency targets across the system relative to the total number of requests. 2018-11-30 10:46:04 -08:00
Meng Xu ea3bd1502d TeamCollection: Calculate machine team number
Calculate the number of machine teams in the same way
as we calculate the number of server teams.

Only count the machine teams that have the correct size and are healthy.

Simplify code by removing unnecessary check.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-29 15:38:23 -08:00
Meng Xu 2b41ad5e57 TeamCollection: Pick server team randomly
Pick server team purely randomly instead of picking the least used one.
This is to avoid creating correlation in the server teams we pick when
new machines are added.

The logic is:
First, pick one of the least used servers at random as the chosen server;
Then pick a machine team that has the server;
Then pick a server on each machine in the machine team.
We make sure the chosen server is picked.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-28 15:57:53 -08:00
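The three-step selection above can be sketched as follows. The `ServerInfo`, `MachineTeam`, and `ServerTeam` types and `pickServerTeam` are hypothetical simplifications of the team collection's data structures, and "least used" is assumed to mean "member of the fewest server teams".

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

// Hypothetical, simplified data model; not the real DDTeamCollection types.
struct ServerInfo {
    int id;
    int machine;    // index of the machine hosting this server
    int teamCount;  // number of server teams this server already belongs to
};
using MachineTeam = std::vector<int>;  // machine indices
using ServerTeam = std::vector<int>;   // server ids

// Uniformly pick an index in [0, n); callers guarantee n > 0.
static std::size_t pickIndex(std::size_t n, std::mt19937& rng) {
    return std::uniform_int_distribution<std::size_t>(0, n - 1)(rng);
}

std::optional<ServerTeam> pickServerTeam(const std::vector<ServerInfo>& servers,
                                         const std::vector<MachineTeam>& machineTeams,
                                         std::mt19937& rng) {
    if (servers.empty()) return std::nullopt;

    // 1) Choose one of the least-used servers at random.
    int minCount = std::min_element(servers.begin(), servers.end(),
                                    [](const ServerInfo& a, const ServerInfo& b) {
                                        return a.teamCount < b.teamCount;
                                    })->teamCount;
    std::vector<const ServerInfo*> leastUsed;
    for (const ServerInfo& s : servers)
        if (s.teamCount == minCount) leastUsed.push_back(&s);
    const ServerInfo* chosen = leastUsed[pickIndex(leastUsed.size(), rng)];

    // 2) Choose, at random, a machine team containing the chosen server's machine.
    std::vector<const MachineTeam*> candidates;
    for (const MachineTeam& mt : machineTeams)
        if (std::find(mt.begin(), mt.end(), chosen->machine) != mt.end()) candidates.push_back(&mt);
    if (candidates.empty()) return std::nullopt;
    const MachineTeam& machineTeam = *candidates[pickIndex(candidates.size(), rng)];

    // 3) Take one random server per machine, forcing the chosen server onto its own machine.
    ServerTeam team;
    for (int machine : machineTeam) {
        if (machine == chosen->machine) {
            team.push_back(chosen->id);
            continue;
        }
        std::vector<int> onMachine;
        for (const ServerInfo& s : servers)
            if (s.machine == machine) onMachine.push_back(s.id);
        if (onMachine.empty()) return std::nullopt;
        team.push_back(onMachine[pickIndex(onMachine.size(), rng)]);
    }
    return team;
}
```

Forcing the chosen server into the team guarantees progress for under-used servers, while the random choices in steps 2 and 3 avoid correlating new teams with the order in which machines were added.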
Meng Xu e4c9d4cbae TeamCollection: Build all machine teams first
Before we build server teams, we build the desired number of machine teams.
Then we pick the least used server, from which we pick the least used machine team.
Then we pick the least used server on each machine in the least used machine team to get the server team.

Note: The logic of building machine teams should be independent from server teams.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-27 18:06:36 -08:00
A.J. Beamon 975711c389 Merge branch 'release-6.0' of github.com:apple/foundationdb
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
2018-11-27 09:50:39 -08:00
Meng Xu 4c2c65c1b3 TeamCollection: Replace TraceEvent with ASSERT
Replace one TraceEvent that never happens in correctness test with an ASSERT.
Change format in one comment.

Signed-off-by: Meng xu <meng_xu@apple.com>
2018-11-27 09:48:24 -08:00
Evan Tschannen 530b5e3763 fix: do not track txsPopVersions unless there are remote logs to pop from 2018-11-26 15:17:17 -08:00
Evan Tschannen 512c00d304 added dump token trace events for storage server interfaces after rollbacks 2018-11-26 11:01:10 -08:00
Meng Xu 5cbff740ca TeamCollection: Add ASSERT
Remove sanity check code for performance benefit.
Replace TraceEvent(SevError) with ASSERT.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-21 13:16:52 -08:00
Meng Xu 8de031f9a6 TeamCollection: clang-format
Format the changes with git clang-format.
No functional changes.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-21 11:18:26 -08:00
Meng Xu 12c3bec968 TeamCollection: Misc changes to resolve review comments
No functional change.
Report error in TraceEvent when invariant is violated.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-19 20:44:52 -08:00
Meng Xu 52c6a66601 TeamCollection: Fix a bug introduced in code review
When we GetTeam, the data distribution actor may have zero teams in
rare situations in the ConfigureTest.txt test.
We should return an empty team in this situation instead of triggering error.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 16:34:38 -08:00
Meng Xu f7a7e069f0 TeamCollection: Remove unnecessary comments
Passed 41806 tests with no failures

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:56:35 -08:00
Meng Xu 73c58852f0 TeamCollection: Resolve code review comments
Resolve code review comments:
1) Improve the code efficiency by avoiding unnecessary map search
   and avoiding unnecessary checking
2) Remove or comment out trace events when they can be spammy
3) Improve coding style

Tested for 1 hour and no error was found.
The KillRegionCycle.txt test was excluded because
the existing code cannot pass that test either.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:55:33 -08:00
Meng Xu 5051b35c61 TeamCollection: Use machine team to create server team
Current server team collection logic does not consider
the fact that multiple storage servers can run on the same machine.
When multiple machines fail, all servers on those machines fail, and
the probability of having one process team fail and lose data is very high.

To reduce the probability of losing data when multiple machines fail,
we first create machine teams which span different fault zones;
we then create server teams based on machine teams by
first picking 1 machine team, and then
picking 1 server from each machine in the machine team.

Signed-off-by: Meng Xu <meng_xu@apple.com>
2018-11-16 15:53:22 -08:00
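The first step above, building machine teams that span distinct fault zones, can be illustrated as follows. This is a minimal sketch that assumes each machine maps to exactly one zone. `ZoneMap`, `spansDistinctZones` and `buildMachineTeams` are hypothetical names. The exhaustive combination search is only for illustration and is not how a production team builder would scale.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

using Machine = std::string;
using MachineTeam = std::vector<Machine>;
using ZoneMap = std::map<Machine, std::string>;  // hypothetical: machine -> fault zone

// True if no two machines in the team share a fault zone.
bool spansDistinctZones(const MachineTeam& team, const ZoneMap& zoneOf) {
    std::set<std::string> zones;
    for (const auto& m : team)
        if (!zones.insert(zoneOf.at(m)).second) return false;  // zone already used
    return true;
}

// Recursively enumerate combinations of `teamSize` machines, keeping only those
// that span distinct zones, until `desired` teams have been collected.
void addTeams(const std::vector<Machine>& machines, const ZoneMap& zoneOf, std::size_t teamSize,
              std::size_t desired, std::size_t start, MachineTeam& current, std::vector<MachineTeam>& out) {
    if (out.size() >= desired) return;
    if (current.size() == teamSize) {
        if (spansDistinctZones(current, zoneOf)) out.push_back(current);
        return;
    }
    for (std::size_t i = start; i < machines.size() && out.size() < desired; ++i) {
        current.push_back(machines[i]);
        addTeams(machines, zoneOf, teamSize, desired, i + 1, current, out);
        current.pop_back();
    }
}

std::vector<MachineTeam> buildMachineTeams(const std::vector<Machine>& machines, const ZoneMap& zoneOf,
                                           std::size_t teamSize, std::size_t desired) {
    assert(teamSize > 0);
    std::vector<MachineTeam> out;
    MachineTeam current;
    addTeams(machines, zoneOf, teamSize, desired, 0, current, out);
    return out;
}
```

A server team is then formed by choosing one server from each machine in a machine team. Any two servers in that team therefore sit on different machines and in different fault zones.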
Evan Tschannen e45952bc53 Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/BackupContainer.actor.cpp
#	fdbclient/BlobStore.actor.cpp
#	fdbclient/HTTP.actor.cpp
#	tests/BlobStore.txt
#	versions.target
2018-11-13 16:06:39 -08:00
Evan Tschannen 1bd615f954 fix: remoteDcIds will not actually have transaction logs unless usable regions is > 1 2018-11-13 12:36:04 -08:00
Evan Tschannen 4e54690005 Merge branch 'release-6.0'
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/MoveKeys.actor.cpp
2018-11-12 20:26:58 -08:00
Evan Tschannen 3f3a562f75 updated resolution balancing knobs to be a little more aggressive 2018-11-12 19:11:28 -08:00
Evan Tschannen 239bf882d8 Merge branch 'release-6.0' into feature-resolution-balancing-fix 2018-11-12 18:43:20 -08:00
Evan Tschannen 3f461f3706 updated comments 2018-11-12 18:42:29 -08:00
Evan Tschannen 6353a6724b strengthened the protections related to changing regions 2018-11-12 17:40:40 -08:00
Evan Tschannen 26c49f21be fix: we do not know a region is fully replicated until all the initial storage servers have either been heard from or have been removed 2018-11-12 17:39:40 -08:00
Evan Tschannen 3f39024640 buggify resolution balancing so that it still happens in simulation 2018-11-12 00:03:07 -08:00
Evan Tschannen 536ee826da tuned resolver balancing to keep the resolvers within 5MB per second of each other 2018-11-11 23:42:45 -08:00
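The balancing goal in the commit above amounts to keeping the spread between the busiest and the idlest resolver under a bound. The sketch below shows only that check. The function name is hypothetical, and reading "5MB" as 5,000,000 bytes per second is an assumption. How the real code measures load and which key ranges it moves is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumption: "5MB per second" read as 5,000,000 bytes per second.
constexpr double kMaxSpreadBytesPerSec = 5e6;

// Hypothetical check: rebalance when the busiest and idlest resolvers
// differ by more than the allowed spread.
bool resolversNeedRebalancing(const std::vector<double>& bytesPerSec) {
    assert(!bytesPerSec.empty());
    auto range = std::minmax_element(bytesPerSec.begin(), bytesPerSec.end());
    return *range.second - *range.first > kMaxSpreadBytesPerSec;
}
```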
Evan Tschannen 50f481b149 fix: peek local should not call peek all, because it is possible to still peek from remote log sets after a special tag 2018-11-11 19:16:25 -08:00
Evan Tschannen 7892da032f fix: Do not remove the locality entry for the current transaction logs when removing storage servers
fix: dcId_locality map could be incorrect after restarting recruitEverything
2018-11-11 12:37:53 -08:00
Evan Tschannen cd188a351e fix: if a destination team became unhealthy and then healthy again, it would lower the priority of a move even though the source servers we are moving from are still unhealthy
fix: badTeams were not accounted for when checking priorities
2018-11-11 12:33:31 -08:00
Evan Tschannen 4b5d0b4e2c Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbclient/AsyncFileBlobStore.actor.cpp
#	fdbclient/AsyncFileBlobStore.actor.h
#	fdbclient/BlobStore.actor.cpp
#	fdbclient/BlobStore.h
#	fdbclient/HTTP.actor.cpp
#	fdbclient/ManagementAPI.actor.cpp
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/LoadBalance.actor.h
#	fdbrpc/batcher.actor.h
#	fdbrpc/fdbrpc.vcxproj
#	fdbrpc/sim2.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/DataDistributionTracker.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/masterserver.actor.cpp
2018-11-10 13:04:24 -08:00
Evan Tschannen a654183f63
Merge pull request #791 from ajbeamon/remove-cluster-from-iclientapi
Remove cluster from IClientApi (phase 2 of removing DB names)
2018-11-10 10:16:18 -08:00
Evan Tschannen 6a406bae72
Merge pull request #896 from ajbeamon/downgrade-incorrect-cluster-file-event
Downgrade the severity of IncorrectClusterFileContents the first time…
2018-11-10 10:06:36 -08:00
Evan Tschannen 6f4ad84777
Merge pull request #903 from ajbeamon/move-batcher-into-proxy
Move the sort of generic batcher from fdbrpc and make it specific to …
2018-11-10 09:56:03 -08:00
Evan Tschannen 7c23b68501 fix: we need to build teams if a server becomes healthy and it is not already on any teams 2018-11-09 18:06:00 -08:00
A.J. Beamon c3a06aa6f1 Fix indentation 2018-11-09 14:25:40 -08:00
A.J. Beamon 67a152ae9f Move the sort of generic batcher from fdbrpc and make it specific to batching commits in master proxy. Also a couple minor formatting changes. 2018-11-09 14:19:18 -08:00
Evan Tschannen 3e2484baf7 fix: a team tracker could downgrade the priority of a relocation issued by the team tracker for the other region 2018-11-09 10:07:55 -08:00
Evan Tschannen 6874e379fc fix: set the simulator’s view of usable regions to one during configure tests which can disable usable regions 2018-11-09 10:06:03 -08:00
Evan Tschannen 19ae063b66 fix: storage servers need to be rebooted when increasing replication so that clients become aware that new options are available 2018-11-08 15:44:03 -08:00
Evan Tschannen 1cf5689d62 fix: workers could only create a shared transaction log for one store type. This resulted in the old store type being used for new transaction logs after configuration changes which changed the store type 2018-11-07 21:09:51 -08:00
Evan Tschannen 599cc6260e fix: data distribution would not always add all subsets of emergency teams
fix: data distribution would not stop tracking bad teams after all their data was moved to other teams
fix: data distribution did not properly handle a server changing locality such that the teams it used to be on no longer satisfy the policy
2018-11-07 21:05:31 -08:00
Stephen Atherton ade75ac692 Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
# Conflicts:
#	fdbserver/worker.actor.cpp
2018-11-07 11:43:54 -08:00
Stephen Atherton 9d73166b3b Many bug fixes related to concurrent page operations and pager shutdown. 2018-11-06 19:31:16 -08:00
Evan Tschannen 6bb283aebc fix: dcId to Locality changes could be lost if an emergency transaction happened that did not change the configuration
fix: master proxy was starting dcIds one number too large
2018-11-05 11:12:43 -08:00
Evan Tschannen 04fa2a7202 fix: we could recover in a region with priority < 0 2018-11-05 10:14:26 -08:00
A.J. Beamon 187e507e53 Downgrade the severity of IncorrectClusterFileContents the first time it is logged to avoid transient issues that appear like the cluster file hasn't been updated (e.g. the cluster file is shared between multiple processes). 2018-11-05 09:28:08 -08:00
Evan Tschannen 87295cc263 suppressed spammy trace events, and avoid reporting a long master recovery duration when the cluster is first created 2018-11-04 23:07:56 -08:00
Evan Tschannen 87d0b4c294 fix: the remote region does not have a full replica if usable_regions==1 2018-11-04 22:05:37 -08:00
Evan Tschannen c1bd279a4e addressed review comments 2018-11-04 20:26:23 -08:00
Evan Tschannen bd60027544 test region priority changes 2018-11-04 20:11:23 -08:00
Evan Tschannen c02690471d added protection against configuration changes which cannot be immediately reverted
the configure database workload tests region configurations
2018-11-04 19:53:55 -08:00
Evan Tschannen 3304c83229 added additional checks in peek which determine when a tag will never get additional versions 2018-11-04 19:28:15 -08:00
Evan Tschannen accba4fa1d keep track of the last time a process became available to set a better starting value for remoteStartTime 2018-11-04 14:33:03 -08:00
Evan Tschannen 45c8f2dfcb restarting tests will sometimes configure to a fearless configuration on startup if possible 2018-11-02 14:16:47 -07:00
Evan Tschannen 2a8c628d82 fix: even if a peek cursor cannot find a local set for the most recent data, it still may be able to find data from older log sets 2018-11-02 14:13:57 -07:00
Evan Tschannen f045c041eb fix: if a storage server already exists in a remote region after converting to fearless, it did not receive mutations between the known committed version and the recovery version 2018-11-02 14:11:39 -07:00
Evan Tschannen bf6545a9cf clients cache storage server interfaces individually, instead of as a team. This is needed because in fearless every shard has storage servers from two separate teams, leading to a lot of possible combinations
allAlternatives failed logic was simplified: because we already do global rate limiting, a per-shard limit is unnecessary
reduced unnecessary state variables in waitMetrics requests
2018-11-02 13:15:09 -07:00
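The caching change above can be sketched like this. In a two-region ("fearless") configuration, each shard's location combines servers from two teams. A cache keyed by team combination would therefore grow with the number of combinations, while a cache keyed by individual server stores each interface once. `StorageServerInterface` here is a hypothetical two-field stand-in, and `InterfaceCache` and `shardLocation` are illustrative names, not the client's actual location cache.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a storage server interface.
struct StorageServerInterface {
    std::string id;
    std::string address;
};

using InterfaceRef = std::shared_ptr<const StorageServerInterface>;

// Caches interfaces per server, so a server shared by many shards is stored once.
class InterfaceCache {
public:
    InterfaceRef get(const std::string& id, const std::string& address) {
        assert(!id.empty());
        auto it = cache_.find(id);
        if (it != cache_.end()) return it->second;
        InterfaceRef ssi = std::make_shared<StorageServerInterface>(StorageServerInterface{id, address});
        cache_.emplace(id, ssi);
        return ssi;
    }
    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::string, InterfaceRef> cache_;
};

// Resolve a shard's (id, address) server list through the per-server cache.
std::vector<InterfaceRef> shardLocation(InterfaceCache& cache,
                                        const std::vector<std::pair<std::string, std::string>>& servers) {
    std::vector<InterfaceRef> result;
    for (const auto& s : servers) result.push_back(cache.get(s.first, s.second));
    return result;
}
```

Two shards that share one region's team but differ in the other region's team reuse the cached entries for the shared servers instead of creating a new team-level entry.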
Evan Tschannen 3b97f5a899 fix: the storage server still has to pop old tags, even if it does not need any data from them 2018-11-02 13:10:14 -07:00
Evan Tschannen 979597a2ca fix: upgraded tags must be popped from all log sets 2018-11-02 13:09:18 -07:00
Evan Tschannen 1b5d28386a fix: the Tlog would not update the durable version properly when version_sizes was empty 2018-11-02 13:05:54 -07:00
Evan Tschannen 2d9a670774 fix: nested multCursors would improperly hang on getMore, because an inner pop of cursors would not be detected by the outer instance 2018-11-02 13:04:09 -07:00
Evan Tschannen e68c07ae35 fix: trackShardBytes was called with the incorrect range, resulting in incorrect shard sizes
reduced the size of shard tracker actors by removing unnecessary state variables. Because we have a large number of these actors, these extra state variables add up to a lot of memory
2018-11-02 13:03:01 -07:00
Evan Tschannen ad98acf795 fix: if the team started unhealthy and initialFailureReactionDelay was ready, we would not send relocations to the queue
print wrong shard size team messages in simulation
2018-11-02 13:00:15 -07:00
Evan Tschannen 1d591acd0a removed the countHealthyTeams check, because it was incorrect if it triggered during the wait(yield()) at the top of team tracker 2018-11-02 12:58:16 -07:00
Evan Tschannen 30fbc29af1 Renamed TimeKeeperStarted to TimeKeeperCommit 2018-11-02 12:57:03 -07:00
Evan Tschannen 278dbd5096 call debug transaction on timekeeper 2018-11-02 12:56:29 -07:00
Stephen Atherton df3bdde50b Many bug fixes. AsyncFileCached write() on a page with a zero-copy read in progress would orphan the old page before the read was finished. Pager file operations were not converting page id to int64 for byte offset calculation. Pager was not calling releaseZeroCopy() after readZeroCopy() if there was an error or cancellation. Pager reads were using some variables that could go out of scope. BusyPage's mechanism for notifying when a physical page is no longer in use is itself no longer in use and therefore removed. Pager shutdown now cancels all outstanding reads. Improved some debug output. 2018-10-31 02:14:55 -07:00
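One bug in the commit above is that page IDs were not converted to int64 before the byte offset was calculated. The example below shows that class of bug. The 4096-byte page size and the function names are assumptions for illustration and are not taken from the pager code. Widening the page ID before the multiply keeps offsets exact. Multiplying in 32 bits wraps once the offset passes 4 GiB for an unsigned ID, and overflows (undefined behavior) past 2 GiB for a signed one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical page size for illustration.
constexpr int kPageSize = 4096;

// Correct: widen to 64 bits before multiplying.
int64_t pageOffset(uint32_t pageID) {
    return static_cast<int64_t>(pageID) * kPageSize;
}

// Buggy variant: the product is computed in 32-bit unsigned arithmetic
// and wraps modulo 2^32 for large page IDs.
uint32_t pageOffsetNarrow(uint32_t pageID) {
    return pageID * static_cast<uint32_t>(kPageSize);
}
```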
Stephen Atherton b08497b7ea Bug fix: at least some users of IKeyValueStore expect the read actors to make their own copies of key arguments. 2018-10-25 19:48:31 -07:00
Stephen Atherton 0277dab747 Removed accidental config change. 2018-10-25 04:01:42 -07:00
Stephen Atherton 342466817a Added pagefile name to debug output. Shutdown will no longer throw if actor collection or delete futures have errors. 2018-10-25 04:00:02 -07:00
Stephen Atherton 0e84c1f438 Pager and btree debug output macro now prints local network address and time. 2018-10-25 03:57:09 -07:00
Stephen Atherton 32b43cc02b Added 'simple' flag in simulation config generation. Defaults to false, must be changed in code. 2018-10-25 01:25:41 -07:00
Stephen Atherton 93501947f8 Process_behind should also be sent to clients. 2018-10-24 20:38:15 -07:00
Stephen Atherton f17cc1e20f StorageServer will no longer send io_error or other inappropriate errors to a client (this would never happen on SQLite). Many bug fixes around error handling, initialization, and shutdown in Redwood. StorageServer now calls init() on its underlying storage engine. 2018-10-24 15:57:06 -07:00
Alex Miller a074dc2a60 Revert one line from #856 that accidentally changed the include path for Windows.h 2018-10-23 18:58:00 -07:00
Evan Tschannen 6bb2ba5d92 Merge commit '2a34115e65639b7aad368a148de3c4189bc34bfc'
# Conflicts:
#	fdbserver/storageserver.actor.cpp
#	fdbserver/worker.actor.cpp
2018-10-23 17:05:42 -07:00
Evan Tschannen a123c9c533 forgot to commit as part of the merge 2018-10-23 16:53:41 -07:00
Evan Tschannen 7e215b7e0c Merge branch 'release-6.0' of github.com:apple/foundationdb into release-6.0
# Conflicts:
#	fdbserver/worker.actor.cpp
2018-10-23 16:53:07 -07:00
Alex Miller 6bb1f4093d
Merge pull request #856 from dropbox/pr/include-fix
Adjust all includes to be relative to the root.
2018-10-22 09:51:55 -07:00
Clement Pang 3a30621071
Merge branch 'release-6.0' into memory-fast-rollback 2018-10-21 12:48:46 +09:00
Alex Miller e2fc1c9b95 Remove specifying non-root directory as a path to search for includes. 2018-10-19 18:56:45 -07:00
Clement Pang 3ceec01392 Address review comments. 2018-10-19 18:55:35 -07:00
Evan Tschannen 22b68d6cf3 fix: we need to get the result of the future before getting the error from the ErrorOr 2018-10-19 17:34:28 -07:00
Evan Tschannen 1ef29cbf0d more windows build fixes 2018-10-19 17:00:24 -07:00
Robert Escriva 268093a96d Adjust all includes to be relative to the root.
Remove the use of relative paths.  A header at foo/bar.h could be included by
files under foo/ with "bar.h", but would be included everywhere else as
"foo/bar.h".  Adjust so that every include references such a header with the
latter form.

Signed-off-by: Robert Escriva <rescriva@dropbox.com>
2018-10-19 17:35:33 +00:00
Evan Tschannen 8dd900a337 fixed the windows build 2018-10-18 20:26:45 -07:00
Evan Tschannen 5c52711f01
Merge pull request #854 from satherton/feature-redwood
Fixed line endings.
2018-10-18 19:56:05 -07:00
Stephen Atherton 3b641643cb Fixed line endings. 2018-10-18 19:46:58 -07:00
Evan Tschannen db71b60d72
Merge pull request #819 from satherton/feature-redwood
Redwood storage engine, initial/experimental version
2018-10-18 18:38:11 -07:00
Evan Tschannen ed7036139a Merge branch 'release-6.0'
# Conflicts:
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/storageserver.actor.cpp
2018-10-18 17:00:52 -07:00
Evan Tschannen 952b26d746 removed upgrade code from 2.0.3 2018-10-18 16:53:40 -07:00
Evan Tschannen 9b6c7f253c changed a knob 2018-10-18 15:26:19 -07:00
Evan Tschannen 0b304495ad added a yield to the proxy when committing a large batch of mutations 2018-10-18 15:26:00 -07:00
Evan Tschannen 0613a34845 The storage server would block the main thread when processing a single version with a large amount of data 2018-10-18 13:37:31 -07:00
Evan Tschannen e36b7cd417 Only log teamTracker trace events if sizes are not wrong, to avoid spammy messages when dropping a fearless configuration
wrongSize previous was unneeded
2018-10-17 11:45:47 -07:00
Evan Tschannen 2db17af815 separate code coverage into 3 different files 2018-10-16 16:12:25 -07:00
Evan Tschannen cae1efee4e divide workloads into their own item group 2018-10-16 15:29:44 -07:00
Evan Tschannen c89b56355a move unreadable in vcxproj to test how it changes the windows build 2018-10-16 14:57:15 -07:00
Evan Tschannen ce4826217a RestoreInterface.h was listed with ClCompile instead of ClInclude 2018-10-16 11:47:36 -07:00
Evan Tschannen 0217aed74c Merge branch 'release-6.0'
# Conflicts:
#	bindings/go/README.md
#	documentation/sphinx/source/release-notes.rst
#	fdbserver/MasterProxyServer.actor.cpp
#	versions.target
2018-10-15 18:38:51 -07:00
Evan Tschannen 0acfae1e76 fixed the windows linker error 2018-10-15 18:19:51 -07:00
Evan Tschannen d8dc8e83b9
Do not roll back while uncommitted sets exist 2018-10-15 15:09:12 -07:00
Stephen Atherton 5bc45958d8 Finished VersionedBTree's IClosable implementation. Added deletion of existing unit test pager state. 2018-10-15 03:43:43 -07:00
Evan Tschannen a8feecbfad added a comment to explain code ordering 2018-10-12 16:27:13 -07:00
Evan Tschannen 8ed4ce183c Merge branch 'release-6.0' of github.com:apple/foundationdb into release-6.0 2018-10-12 14:56:19 -07:00
Evan Tschannen 17a1e3ce35 fix: the master proxy would log an OpCommit for empty commits to the txnStateStore 2018-10-12 12:58:17 -07:00
Clement Pang 88e8422511 Per etschannen, wait on durable for reboots 2018-10-10 17:42:40 -07:00
A.J. Beamon 419231d798 Fix: status was trying to read a metric under the wrong name, leading to an error that caused the cluster to report itself unhealthy and some metrics to be missing. 2018-10-10 13:33:28 -07:00
Evan Tschannen 4c95a5ee0f added the basic structure for parallel restore 2018-10-09 18:47:28 -07:00
Clement Pang 4f1cb97222 add missing semiCommit() on reset. 2018-10-08 17:30:39 -07:00
Clement Pang 403b4c5d94 fix tabs in worker.actor.cpp 2018-10-08 17:28:58 -07:00
Clement Pang eb72427923 Revert "Fix formatting with clang-format"
This reverts commit 448751c
2018-10-08 17:26:10 -07:00
Clement Pang 40ad06b0ac Revert "clang-format still looks weird, trying something else."
This reverts commit 24c64bd
2018-10-08 17:26:04 -07:00
Clement Pang 24c64bd4bc clang-format still looks weird, trying something else. 2018-10-08 17:24:47 -07:00
Clement Pang 448751ce83 Fix formatting with clang-format 2018-10-08 17:21:57 -07:00
Evan Tschannen ecddeab2ae fixed review comments; demote killRegionCycle test for now 2018-10-08 10:39:39 -07:00
Clement Pang 2fc60299d4 Fix comment. 2018-10-07 03:09:15 -07:00
Clement Pang 5e258677bd Fix build. 2018-10-07 03:07:58 -07:00
Clement Pang cf8f0686f8 Fix build. 2018-10-07 03:06:18 -07:00
Clement Pang ebc42ba609 Instead of ignoring onClosed on all IOErrors, only ignore reboots (and only for memory and if the flag is on). 2018-10-07 02:53:38 -07:00