Commit Graph

2292 Commits

Author SHA1 Message Date
Evan Tschannen d7d38c3544
Merge pull request #430 from ajbeamon/rename-logGroup-attribute
Rename trace file logGroup attribute to LogGroup
2018-06-08 10:30:45 -07:00
Stephen Atherton 69b713918b VersionedBTree now uses PrefixTree based pages (with bugs). This required significant changes to both classes because the interface and semantics for building, seeking in, and iterating through pages is very different from the previous trivial approach which was based on serialized vectors. PrefixTree node format rewritten to support optional values without increasing overhead for common node scenarios. PrefixTree::Cursor rewritten to reuse path prefix memory instead of allocating new memory on each movement which is then 'leaked' until destruction. PrefixTree::Cursor movement modified to work better with VersionedBTree::InternalCursor, which was also heavily modified. Added knobs related to key arrangement in PrefixTree nodes. Added StringRef::toHexString() as an alternative to printable() to make reading raw PrefixTree data easier. PrefixTree performance is temporarily worse with this update and VersionedBtree fails its unit test. 2018-06-08 03:32:34 -07:00
Evan Tschannen e4d5817679 fix: we must server getTeam requests before readyToStart is set because we cannot complete relocateShard requests without getTeam responses from both team collections 2018-06-07 16:14:40 -07:00
Balachandar Namasivayam 11b79c6c94 Save fitness info of a process to become a cluster controller. This info is currently lost after a reboot. Save this info and reload it to avoid unnecessary re-recruitments. 2018-06-07 13:07:19 -07:00
Balachandar Namasivayam 529d0497f1 Proxy going OOM when applying high volumes of writes to a proxy, particular in a sudden fashion before ratekeeper can control the workload.
Address this issue by proactively monitoring the memory used by commit batches and dropping requests if a certain memory limit is exceeded.
2018-06-01 15:21:40 -07:00
A.J. Beamon 3ea5fc72f0 Rename trace file logGroup attribute to LogGroup 2018-05-31 15:34:17 -07:00
A.J. Beamon 5848055620 More line ending fixes 2018-05-31 13:44:38 -07:00
A.J. Beamon 090117a674 Fix line endings. 2018-05-31 13:41:32 -07:00
A.J. Beamon f1dea3bed9 Rework the formatter a bit and extract out the writing logic so that it could be pluggable. 2018-05-31 13:27:35 -07:00
A.J. Beamon f7735d000f Fix linking issue in flow bindings by using a different variant of printable(). 2018-05-31 11:16:01 -07:00
A.J. Beamon 78839b20fd Merge branch 'master' into trace-log-refactor
# Conflicts:
#	flow/Trace.cpp
2018-05-31 10:46:20 -07:00
A.J. Beamon d9c702a9e3 Merge release-5.1 into release-5.2 2018-05-30 09:09:55 -07:00
A.J. Beamon e2e69b5632 Rework how rolled events are logged. Not very efficient, but it doesn't modify the cached version returned in queries. 2018-05-11 11:24:20 -07:00
A.J. Beamon 02df30149f Merge branch 'release-5.2' into trace-log-refactor 2018-05-11 11:22:34 -07:00
Alvin Moore 3fba932f80 Updated Protocol Version
Updated version within download documentation
Added old release notes
Updated Wix GUID Product ID
2018-05-10 17:03:46 -07:00
Evan Tschannen 67fa95264c Merge branch 'master' into feature-remote-logs 2018-05-10 17:01:25 -07:00
Evan Tschannen c45599f2a9 fix: versionstamped_key uses 4 bytes for size 2018-05-10 17:00:35 -07:00
Evan Tschannen 91338fc984 Merge branch 'master' into feature-remote-logs 2018-05-10 15:33:45 -07:00
Evan Tschannen 8f984cb2c9 Merge branch 'release-5.2'
# Conflicts:
#	fdbrpc/TLSConnection.h
2018-05-10 09:13:22 -07:00
Evan Tschannen d3450ce5b0
Merge pull request #343 from bnamasivayam/tls-plugin
Tls plugin
2018-05-09 16:35:53 -07:00
A.J. Beamon fe1c9d5be4 Make the trace log formatter reference counted. 2018-05-09 14:35:15 -07:00
Alec Grieser 8de914a81f
use contents() instead of address of in withPrefix and withSuffix ; whitespace fixes 2018-05-09 09:01:22 -07:00
Balachandar Namasivayam d3b5cfb93c Support latest TLS plugin.
Add support for https in backup.
2018-05-08 16:28:13 -07:00
Alec Grieser 464e2cdbf0
change SetVersionstampedKey and SetVersionstampedValue behavior based on API version to make them consistent 2018-05-08 08:57:09 -07:00
A.J. Beamon 047563aa62 Support trace event overflow. 2018-05-04 15:19:56 -07:00
A.J. Beamon 2a46fdb31d Change formatv to vsformat, which now returns the length of the string (or <0 for error) and takes the output string as an argument. 2018-05-04 13:35:25 -07:00
A.J. Beamon ce0c991e78 Refactor trace events to store a vector of fields that aren't encoded until write time. Better support for pre-network trace events. Rework how trace events are queried. Some initial work towards pluggable formatting of logs. 2018-05-02 10:44:38 -07:00
Evan Tschannen 10d25927cd Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DataDistribution.actor.cpp
2018-04-30 22:15:39 -07:00
A.J. Beamon 738f3036cf Merge branch 'release-5.2' into enable-trace-logging-before-open 2018-04-30 15:43:17 -07:00
Alec Grieser 69e831d522
Merge remote-tracking branch 'upstream/release-5.2' into merge-release-5.2 2018-04-28 17:44:52 -07:00
Alec Grieser a1faaafca3
Merge remote-tracking branch 'upstream/release-5.1' into merge-release-5.1 2018-04-27 16:38:18 -07:00
Alex Miller 11bd7d7daf Add a disgusting and terrible hack to avoid "undefined std::istream::ignore(long)" linking errors.
This makes building our bindings in the provided docker image possible again.
2018-04-26 16:07:50 -07:00
Stephen Atherton af61d3596d Merge branch 'public-master' into feature-redwood
# Conflicts:
#	fdbserver/DatabaseConfiguration.cpp
#	fdbserver/OldTLogServer.actor.cpp
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
2018-04-24 17:22:21 -07:00
Dennis Schafroth bd26d6916c Support boost 1.52 and newer 2018-04-23 14:51:15 +02:00
Dennis Schafroth d660c96158 Compatible with newer boost versions 2018-04-23 14:07:17 +02:00
Dennis Schafroth 2dd0b25259 Use ASSERT_ABORT in destructor 2018-04-23 14:05:26 +02:00
Bruce Mitchener 2f8a0240f1
Fix some typos. 2018-04-19 11:44:01 -07:00
Bruce Mitchener 9cdf25eda3 Fix some typos. 2018-04-20 00:49:22 +07:00
Alex Miller 20082e3228 Clang fixes. 2018-04-12 11:10:53 -07:00
Evan Tschannen 19762b847d Merge branch 'release-5.2'
# Conflicts:
#	fdbserver/DatabaseConfiguration.cpp
#	fdbserver/SimulatedCluster.actor.cpp
2018-04-10 17:02:43 -07:00
Evan Tschannen c1ba16b3c8 Merge branch 'release-5.1' into release-5.2
# Conflicts:
#	bindings/java/src/test/com/apple/foundationdb/test/AbstractTester.java
#	bindings/java/src/test/com/apple/foundationdb/test/VersionstampSmokeTest.java
#	bindings/nodejs/lib/fdb.js
#	bindings/nodejs/src/Version.h
#	bindings/nodejs/tests/tuple_test.js
2018-04-10 16:50:47 -07:00
Evan Tschannen b46c32535c surpassed spammy trace events 2018-04-10 15:52:32 -07:00
Evan Tschannen 7af892f50b first working version of non-copying recovery working with fearless configurations 2018-04-08 21:24:05 -07:00
Stephen Atherton 2752a28611 Merge branch 'release-5.2' of github.com:apple/foundationdb into feature-redwood 2018-04-06 16:29:37 -07:00
Evan Tschannen b36e08f08f first version of non-copying recovery. Upgrades are broken, and it has not been tested using fearless configurations yet 2018-03-29 15:12:38 -07:00
A.J. Beamon 62726efa82 Add support for logging trace events before the network is created and the trace log opened. 2018-03-28 08:21:22 -07:00
Evan Tschannen d3fb17d30a
Merge pull request #74 from bnamasivayam/client-profiling-tests
Client profiling tests - Part 1
2018-03-23 16:52:49 -07:00
Evan Tschannen 5db52ab081
Merge pull request #87 from etschannen/feature-remote-logs
Feature remote logs
2018-03-23 12:55:17 -07:00
A.J. Beamon ddc0c613ed
Merge pull request #109 from apple/release-5.2
Merge Release 5.2 into master
2018-03-21 09:37:56 -07:00
Evan Tschannen d4007c27ea
Merge pull request #37 from bnamasivayam/release-5.2
Split ProcessMetrics into ProcessMetrics, NetworkMetrics and MemoryMe…
2018-03-20 17:31:46 -07:00
Alec Grieser 551ea9c7f8
Merge remote-tracking branch 'upstream/release-5.2' into master-release-5.2-merge 2018-03-19 12:34:50 -07:00
yichic 0b4564e484
Merge pull request #73 from ajbeamon/release-5.2
Reduce severity of GetDiskStatisticsDeviceNotFound to SevWarn.
2018-03-16 15:44:54 -07:00
Evan Tschannen 820382ea68 optimized the log router commit path to avoid re-serializing the data 2018-03-16 11:40:21 -07:00
Evan Tschannen 59723f51f8 fix: continue to attempt to lock logs until remote logs are recovered, this is so that remote logs get locked and readers know they will not have any more data
do not throttle trace events in simulation
2018-03-14 12:39:55 -07:00
Balachandar Namasivayam 856d2a0a9d Add correctness tests for Client transaction profiling data format. It also includes format check across upgrades. 2018-03-14 12:39:50 -07:00
A.J. Beamon 06dd052770 Reduce severity of GetDiskStatisticsDeviceNotFound to SevWarn. 2018-03-14 12:17:54 -07:00
Alec Grieser 70a05c1a9b
fix some compiler whinges 2018-03-13 15:00:16 -07:00
Evan Tschannen 5390af8be4 suppress spammy logs 2018-03-09 09:40:36 -08:00
Balachandar Namasivayam 336a54faef Split ProcessMetrics into ProcessMetrics, NetworkMetrics and MemoryMetrics 2018-03-06 15:52:03 -08:00
A.J. Beamon f2c804e14f Reverting changes from merge of master into release-5.2 (b25810711c). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit. 2018-03-06 10:15:04 -08:00
Evan Tschannen 37a6a81634 Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
#	fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Alvin Moore de1551c20d Merge branch 'release-5.1' 2018-02-23 08:24:06 -08:00
Alvin Moore a1382895a6 Fixed headers and some whitespace 2018-02-23 04:50:23 -08:00
Alec Grieser e1162e9238 Merge remote-tracking branch 'upstream/release-5.1' 2018-02-22 11:16:12 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Evan Tschannen dc93759e15 suppressed trace events that are spammy 2018-02-16 16:01:19 -08:00
Balachandar Namasivayam 2eeb714c4b Trace memory usage of individual FastAllocator of different sizes as part of ProcessMetrics. 2018-02-16 13:30:00 -08:00
A.J. Beamon 03501fc26c TraceEvent::type is set in init so that it works for TraceIntervals too. ASSERT that the type is non-empty. 2018-02-13 13:29:08 -08:00
Stephen Atherton 0a35f167e4 Merge branch 'master' into feature-redwood
# Conflicts:
#	fdbserver/DiskQueue.actor.cpp
#	fdbserver/IDiskQueue.h
#	fdbserver/Knobs.cpp
#	fdbserver/Knobs.h
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
#	fdbserver/worker.actor.cpp
2018-02-12 01:30:02 -08:00
Evan Tschannen 42405c78a5 Merge commit '4038bd2fd968d88861f2cebd442ce511724816cb' into feature-remote-logs
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/Knobs.cpp
2018-02-10 12:08:52 -08:00
Alex Miller 32762648fa Merge branch 'release-5.1' 2018-02-08 18:38:48 -08:00
Stephen Atherton 1f78b98ac9 Bug fix, result as a state variable causes 'this' to be captured instead of copying result. 2018-02-08 09:54:47 -08:00
Stephen Atherton 69425a303b Improved error handling for cases where blob account credentials are either not found in the provided credentials sources and/or some of the credentials sources provided are not readable or parseable. 2018-02-07 21:50:43 -08:00
A.J. Beamon 9c7a39d5b2 Rename SevWarnAlways trace event DeviceNotFound to GetDiskStatisticsDeviceNotFound 2018-02-07 12:54:34 -08:00
Evan Tschannen ebd94bb654 removed a separately configurable storage team size for the remote data center, because it did not make sense
fix: the master did not monitor for the failure of remote logs
stop merge attempts when a data center is failed
fixed a variety of other problems with data distribution when a data center is failed
2018-02-02 11:46:04 -08:00
A.J. Beamon 080a454051 fix: getVersionstamp would return broken promise if a transaction was disposed before being set. getAddressesForKey would not return when resetPromise was set. 2018-01-31 13:47:36 -08:00
Evan Tschannen 29c5d4ad3d upgrades from 5.X mostly supported, still some remaining correctness problems 2018-01-28 11:52:54 -08:00
Evan Tschannen 79d94214a4 Merge commit 'f4ffc9752b5ec66ac47f5f684a5d8be06a7eae6e' into feature-remote-logs 2018-01-25 10:12:06 -08:00
Alec Grieser 7808099318 bump protocol version for release ; new product guid 2018-01-25 10:02:48 -08:00
Stephen Atherton 66de9d392b New error code, http_auth_failed, which is used when blob authentication fails instead of the previous generic http_request_failed. 2018-01-22 14:58:56 -08:00
Evan Tschannen 698ef4117e Merge branch 'master' into feature-remote-logs 2018-01-20 10:34:30 -08:00
A.J. Beamon 4bfbdbf454 Extract getLocalTime to platform.cpp 2018-01-17 11:35:34 -08:00
Stephen Atherton 93b34a945f Major usability and performance improvements to backup management. Backup descriptions now calculate and display timestamps using TimeKeeper data (if given a cluster) and restorability of snapshots. Expire now requires a --force option to leave a backup unrestorable or unrestorable after a given point in time, specified by version or timestamp. BackupContainerFilesystem now maintains metadata on key version boundaries in order to avoid large list operations for describe and expire operations. Blob parallel recursive list operations can now take a path (aka prefix) filter function. New describe and expire options are available in fdbbackup. 2018-01-17 04:09:43 -08:00
Evan Tschannen 21482a45e1 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/DBCoreState.h
#	fdbserver/LogSystem.h
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/TLogServer.actor.cpp
2018-01-14 13:40:24 -08:00
Evan Tschannen 660cee0254 increased the priority of getKeyServersLocations, because once a client gets a read version, answering their reads should be higher priority than starting new transactions 2018-01-12 13:46:20 -08:00
Evan Tschannen de119f192d fixed a priority inversion where the tlog would prefer to copy data from the previous generation rather than make data durable (leading to being ratekeeper controlled) 2018-01-11 16:09:49 -08:00
A.J. Beamon 2f5073d00f Some visual studio project cleanup. 2018-01-10 10:07:18 -08:00
Evan Tschannen 3ec45d38a0 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Evan Tschannen 5ac4f73978 Merge branch 'release-5.1' into feature-remote-logs
# Conflicts:
#	fdbclient/NativeAPI.actor.cpp
#	fdbrpc/Locality.h
#	fdbrpc/simulator.h
#	fdbserver/ApplyMetadataMutation.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/LogSystemPeekCursor.actor.cpp
#	fdbserver/MasterProxyServer.actor.cpp
#	fdbserver/SimulatedCluster.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/masterserver.actor.cpp
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Stephen Atherton 0f20068e82 Renamed all TaskBucket backup tasks to more appropriate names. Created the ability to make task aliases and used this to direct old task names to a task definition which will abort backups created before version 5.1. 2018-01-04 22:53:31 -08:00
A.J. Beamon 653a46f12f Update error string fro cluster_version_changed error 2018-01-04 15:06:09 -08:00
Stephen Atherton cec9f4d7a4 Bug fix in DNS resolution. When the result is an error the result promise was being set twice. 2018-01-03 13:05:38 -08:00
Stephen Atherton ec28c77353 Merge branch 'master' of github.com:apple/foundationdb 2017-12-21 01:58:47 -08:00
Stephen Atherton e3aee45a74 Backup tools and agent now accept blob account credentials via files containing JSON which are specified using command line arguments and/or an environment variable. Improved fdbbackup help, clarifying which options are for which operations. Fdbbackup operations which do not need to use a database no longer require a cluster file parameter. Added eat() commands to StringRef for incrementally tokenizing strings using separator strings. 2017-12-21 01:58:15 -08:00
Evan Tschannen 982f0dcb1e Merge pull request #222 from cie/alexmiller/drtimefix2
Fix yet another VersionStamp DR issue.
2017-12-20 15:09:23 -08:00
Alex Miller b5a6bc0ab7 Fix VersionStamp problems by instead adding a COMMIT_ON_FIRST_PROXY transaction option.
Simulation identified the fact that we can violate the
VersionStamps-are-always-increasing promise via the following series of events:

1. On proxy 0, dumpData adds commit requests to proxy 0's commit promise stream
2. To any proxy, a client submits the first transaction of abortBackup, which stops further dumpData calls on proxy 0.
3. To any proxy that is not proxy 0, submit a transaction that checks if it needs to upgrade the destination version.
4. The transaction from (3) is committed
5. Transactions from (1) are committed

This is possible because the dumpData transactions have no read conflict
ranges, and thus it's impossible to make them abort due to "conflicting"
transactions.  There's also no promise that if client C sends a commit to proxy
A, and later a client D sends a commit to proxy B, that B must log its commit
after A.  (We only promise that if C is told it was committed before D is told
it was committed, then A committed before B.)

There was a failed attempt to fix this problem.  We tried to add read conflict
ranges to dumpData transactions so that they could be aborted by "conflicting"
transactions.  However, this failed because this now means that dumpData
transactions require conflict resolution, and the stale read version that they
use can cause them to be aborted with a transaction_too_old error.
(Transactions that don't have read conflict ranges will never return
transaction_too_old, because with no reads, the read snapshot version is
effectively meaningless.)  This was never previously possible, so the existing
code doesn't retry commits, and to make things more complicated, the dumpData
commits must be applied in order.  This would require either adding
dependencies to transactions (if A is going to commit then B must also be/have
committed), which would be complicated, or submitting transactions with a fixed
read version, and replaying the failed commits with a higher read version once
we get a transaction_too_old error, which would unacceptably slow down the
maximum throughput of dumpData.

Thus, we've instead elected to add a special transaction option that bypasses
proxy load balancing for commits, and always commits against proxy 0.  We can
know for certain that after the transaction from (2) is committed, all of the
dumpData transactions that will be committed have been added to the commit
promise stream on proxy 0.  Thus, if we enqueue another transaction against
proxy 0, we can know that it will be placed into the promise stream after all
of the dumpData transactions, thus providing the semantics that we require:  no
dumpData transaction can commit after the destination version upgrade
transaction.
2017-12-20 15:04:04 -08:00
Stephen Atherton 7caa012fbf Added snapshot interval option to "fdbbackup start" which defaults to a new knob's value. Added snapshot info to backup status text. Improvements to fdbbackup help. 2017-12-20 00:49:08 -08:00
Stephen Atherton e0d9cea008 Merge branch 'master' into continuous-backup
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
#	fdbrpc/BlobStore.actor.cpp
2017-12-19 23:02:14 -08:00
Alex Miller 1488c12c18 Simulation will return and error and print if any non-suppressed SevError events were logged.
This means that loops like `seed=1; while ./fdbserver -r simulation -s $seed;
do seed=$(($seed+1)); done` to find an example of an often failing test.  This
also means joshua will report ExitCode errors on anything that has a SevError
in the log.

As a part of this, we also implicitly downgrade any injected errors to SevWarnAlways.
2017-12-19 17:17:50 -08:00
Stephen Atherton abb2dd1ebc Merge pull request #214 from cie/alexmiller/fallocate
Use fallocate to zero ranges instead of writing zeroes
2017-12-06 13:47:40 -08:00
Stephen Atherton f8e89a40ac Bug fixes, take(1) is incorrect usage of FlowLock. 2017-12-04 10:25:47 -08:00
Evan Tschannen 482ac38ca6 added knobs so that the client failure monitoring update rate and the server failure monitoring update rate are separate knobs 2017-12-01 13:04:32 -08:00
Alex Miller 196258080b Refactor zeroing a chunk of a file from DiskQueue into IAsyncFile.
If we're going to do the work to provide more optimized ways to zero files,
then I'd feel better with this being in a more common place, so that any other
zero-ers are likely to reuse it.  It also makes testing easier/more obvious.

Also, because it's needed for correctness, fix the aligned_alloc for OSX, which
wasn't aligned, and use an actually aligned allocation function.
2017-11-30 17:57:55 -08:00
Stephen Atherton aeebe711ce TaskBucket’s saveAndExtend() is now accomplished through extendTimeout() with an option to save parameters. SaveAndExtendIncrementally() has been removed as it is no longer needed because TaskBucket’s normal execution loop calls extendTimeout() periodically as long as the TaskFunc’s execute() actor has not finished or thrown. If a TaskFunc wants to save changes to task parameters to checkpoint progress for task restarts to benefit from it can call extendTimeout() explicitly with the updateParams flag set to true. 2017-11-30 17:18:57 -08:00
Stephen Atherton a77162b53d Merge branch 'master' into backup-container-refactor
# Conflicts:
#	fdbclient/BackupAgent.h
#	fdbclient/FileBackupAgent.actor.cpp
#	fdbclient/KeyBackedTypes.h
2017-11-15 08:14:47 -08:00
Stephen Atherton 3dfaf13b67 IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored is now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius as the previous backup container was just a thin wrapper that presented a single level list of files and offered no methods for managing or interpreting the file structure so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse.
Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup.  The addition of IBackupFile and its finish() method simplified the log and range writer tasks.  Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes.  Added KeyBackedSet<T> type.  Moved JSONDoc to its own header.  Added platform::findFilesRecursively().

Still to do:  update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.
2017-11-14 23:33:17 -08:00
A.J. Beamon 313e823629 Delete TDMetric data (tmpEventMetric) when a trace event is throttled. 2017-11-13 15:06:21 -08:00
A.J. Beamon bf07fa3023 Untested changes to MemAvailable computation on kernels without MemAvailable 2017-11-06 09:35:05 -08:00
A.J. Beamon 7cf17df821 Merge branch 'master' into log-group-for-unsupported-clients
# Conflicts:
#	flow/Net2.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2017-11-01 11:31:02 -07:00
Evan Tschannen b1e3864c0e fix: stop after does not print errors if actor cancelled 2017-11-01 10:58:39 -07:00
Balachandar Namasivayam e377d1985c Merge pull request #195 from cie/throttle-trace-events
Increase message limit for throttling to 100,000 for circus runs.
2017-10-31 11:10:11 -07:00
Balachandar Namasivayam dd6c24ce09 Addressed Review Comments. 2017-10-31 11:08:54 -07:00
Balachandar Namasivayam 80e5fecfe2 Increase message limit for throttling to 100,000 for circus runs.
Added an optimization to use a separate set for throttled events. Since this set is expected to be small, comparison of every event against this set is going to be cheaper.
2017-10-31 10:35:26 -07:00
Evan Tschannen 54d82c0d92 Merge pull request #194 from cie/alexmiller/valgrind
Fix valgrind errors
2017-10-27 17:25:12 -07:00
Alex Miller 3b61b76876 Fix a massive amount of valgrind errors and make them easier to debug in the future.
std::is_pod<> being less restrictive than is_binary_serializable<> meant that
structs that both were POD and had a serialize method defined would be binary
serialized instead of using the defined serialize().  This means that it would
also serialize any padding that the struct contained, which would cause mass
waves of valgrind failures from uninitialized memory.

Included in this change is additional uses of valgrind client requests so that
attempts to send uninitialized memory are reported at the sending site, versus
as part of checksum calculation in sending the packet.
2017-10-27 16:54:44 -07:00
Balachandar Namasivayam 1d3c88c147 Increase the spammy trace event threshold to 20000 in 20 minutes. 2017-10-27 12:22:56 -07:00
Balachandar Namasivayam cfefab18fb Merge branch 'master' into add-new-atomic-ops 2017-10-25 18:03:34 -07:00
Balachandar Namasivayam 9dd588dcce Addressed review comments.
Changed naming for NewMin and NewAnd to MinV2 and AndV2
2017-10-25 14:48:05 -07:00
Evan Tschannen d852a53ae4 Merge pull request #181 from cie/throttle-spammy-logs
Throttle spammy logs
2017-10-25 13:45:55 -07:00
Stephen Atherton 3afc85881e Merge branch 'master' into backup-container-refactor
# Conflicts:
#	fdbrpc/BlobStore.actor.cpp
2017-10-20 21:38:28 -07:00
Stephen Atherton 42955012e9 Merge branch 'release-5.0'
# Conflicts:
#	fdbrpc/BlobStore.actor.cpp
#	flow/error_definitions.h
2017-10-20 21:16:55 -07:00
Stephen Atherton efe857fef6 Fixed inconsistent styles of recently changed error messages. 2017-10-20 12:56:00 -07:00
Evan Tschannen e2c1e87df6 made a large number of fixes to make fearless DR correctness clean. 2017-10-19 15:36:32 -07:00
Alec Grieser dd6d8f3b0e Merge branch 'master' into add-new-atomic-ops 2017-10-18 16:36:44 -07:00
Alex Miller d3df8469dd Upgrade protocol version as ClientWorkerInterface was changed. 2017-10-18 14:56:31 -07:00
A.J. Beamon a5c2373dbb Spaces->tabs 2017-10-18 09:04:35 -07:00
A.J. Beamon 050c1bcba6 Update transaction_too_old error text. 2017-10-18 08:49:17 -07:00
Stephen Atherton ebd0234514 Rewrote most error handling in BlobStoreEndpoint to fix several shortcomings in error handling and logging. The request loop now logs but rate limits all errors, and the exceptions thrown are more appropriate. HTTP 503 is now treated as retryable. Callers of BlobStoreEndpoint::doRequest() now specify which codes they consider to be successful so that more error handling can take place in the main request loop. 2017-10-18 02:52:09 -07:00
Alec Grieser fc2ff719d0 fixed some capitalization issues that slid in through the case-insenstive filesystems 2017-10-17 09:07:05 -07:00
Alex Miller 7b9bc1d715 Merge pull request #170 from cie/alexmiller/flowprofile
Add support for profiling a running fdb cluster to fdbcli, fix security issues, and add an improved backtrace.
2017-10-16 16:51:53 -07:00
Alex Miller cf646d4a99 Address review comments.
* Fixed fdbcli to be more idiomatic.
* Removed is_binary_serializable in favor of std::is_pod<>
* Removed custom enable_if<> in favor of std::enable_if<>
* Removed HEY REVIEWER comments
* Removed print from prof.py
* Added FLOW_PROFILER_ENABLED=yes to circus components that wished to enable the flow profiler.
2017-10-16 16:46:52 -07:00
Alex Miller 16e5b50685 Replace backtrace with absl::GetStackTrace on non-MacOS platforms.
backtrace() gives a list of return addresses, which means that addr2line will
print out the line after the caller. GetStackTrace returns the list of caller
addresses, so the addr2line results should be accurate.  The flow profiler was
also changed to use the new backtracing code, so flow profiles will now be
accurate as well.  Unfortunately, the abseil code doesn't work on MacOS, so we
still fall back to backtrace() in this case.

For the stack unwinder to work, we must disable -fomit-frame-pointer.  This can
result in a small performance penalty, as it effectively reduces the number of
general purpose registers available by one.  (I'm also curious if this has
anything to do with the overly frequent "<value optimized out>" messages from
gdb.)  If this shows up as a problem, we can make release builds still have
-fomit-frame-pointer, and fall back to backtrace when it's enabled then as
well.
2017-10-16 16:05:02 -07:00
Alex Miller 1a91aab1d7 Import //base/debugging:stacktrace from abseil.
This code is all Apache 2 licensed, and all headers were maintained when
concatinated, so we should be completely fine from a legal standpoint.

I've scriptified the steps that I took so that if we need to update this code
in the future, it hopefully shouldn't be too much of a hassle.
2017-10-16 16:05:02 -07:00
Alex Miller f997cb9038 Add a string knob to hold the Log directory, and write profiles to it.
This is the combination of two small changes.

1. Add support for a string knob type.
2. Change profiles to be written to the log directory instead of the working
   directory.

We have three options of where to write files: the working directory, the data
directory, and the log directory.

The working directory may be set to a non-writable location, and likely
contains the fdb binaries.  Allowing these files to be overwritten would likely
not be a wise idea.

The data directory hosts our sqlite b-trees.  It would also be very unfortunate
if these were ever overwritten by an unfortunate profile name.

The log directory contains logs.  Out of the three, these matter the least if
they disappear or become corrupted.

Thus, we write to the log directory.
2017-10-16 16:05:02 -07:00
Alex Miller 91a26a170c Add toggleable profiling support to fdbserver+fdbcli.
This adds the fdbcli commands:
* profile list -- Lists all workers in a way that doesn't fill `kill`'s list.
* profile flow run -- Allows starting flow profiling on a set of hosts for a specified interval.

And threads through all the support for enabling and disabling profiling as an RPC.
2017-10-16 16:05:02 -07:00
Stephen Atherton e934604f67 Added DNS resolution. Interface is INetworkConnections::resolveTCPEndpoint() to resolve, or for convenience INetworkConnections::connect(host, service) will resolve host and service (port number or service name like http) and connect to one of the addresses at random.
BlobStoreEndpoint now only accepts hostnames and an optional service, so this update is not compatible with the previous URL formats having many IP addresses.
2017-10-15 21:51:11 -07:00
Balachandar Namasivayam 3aaa11977e Addressed Review Comments 2017-10-12 14:56:00 -07:00
Stephen Atherton ad0ed79d36 Merge pull request #172 from bmuppana/backup-refactor
Backup refactoring
2017-10-12 11:38:49 -07:00
Stephen Atherton 11517f7bfc Merge branch 'master' into continuous-backup
# Conflicts:
#	fdbclient/FileBackupAgent.actor.cpp
2017-10-12 11:03:23 -07:00
A.J. Beamon b20ae356b1 Alloc instrumentation backtraces use format_backtrace; Magnesium detects backtraces from binaries besides fdbserver. 2017-10-12 08:39:13 -07:00
Alex Miller 9648f96200 Also fix unforwarded Metric in IndexedSet.
This is simply an exceedingly minor performance fix rather than a correctness issue.
2017-10-11 17:40:48 -07:00
Alex Miller c24b941485 Fix erroneous std::move in indexed set, and clean up addMetric users.
This is a follow-on to c4eb73d0.  Thanks to Bala for pointing out the unchanged
std::move usage, and there appeared to not be many existing users of addMetric
anyway.
2017-10-11 17:36:51 -07:00
Balachandar Namasivayam eeebf10030 Modified existing behavior of MIN and AND atomic ops. The new behavior results in a 'SET' if the atomic op is performed on a non -existing key.
Added new atomic ops ByteMin and ByteMax that does lexicographic comparison of byte strings.
2017-10-10 13:02:22 -07:00
Evan Tschannen 15962cf079 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbrpc/Locality.cpp
#	fdbrpc/Locality.h
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/ClusterRecruitmentInterface.h
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/TagPartitionedLogSystem.actor.cpp
#	fdbserver/WorkerInterface.h
#	fdbserver/fdbserver.vcxproj.filters
#	fdbserver/masterserver.actor.cpp
#	fdbserver/worker.actor.cpp
#	flow/error_definitions.h
2017-10-05 17:09:44 -07:00
Alex Miller 0ac868ad5d "Simplify" IndexedSet's insert and addMetric API.
The existing code tried to work around the complexities of optionally using
rvalue references' move capabilities if they exist.  As seen in the previous
MapPair, there's a combinatorial explosion of prototypes to declare as the
parameter length increases.  Because of this, addMetric ended up with a strange
API, and there was a wrapper to make a copy for insert.

Instead, we can apply the idiom of using universal/forwarding references and
std::forward to allow the compiler to instantiate the combinations that are
needed.  There's a TagData struct with no copy constructor that validates that
move constructors can be properly called still.

I measured a 12-byte difference between before and after this change, so no
template bloat was introduced.
2017-10-03 20:15:12 -07:00
Balachandar Namasivayam 0e153cdd35 Throttle Spammy logs. Three knobs are added.
Trace Events are sampled and cached with an expiration set. Every TraceEvent above SevDebug is checked against this cache to see if it exceeded a set threshold. If yes, then throttle the TraceEvent.
If a TraceEvent is throttled, a warning msg is logged.
2017-10-02 18:43:11 -07:00
Evan Tschannen 6ea9903c82 Merge branch 'release-5.0'
# Conflicts:
#	fdbbackup/backup.actor.cpp
#	fdbserver/ClusterController.actor.cpp
#	versions.target
2017-10-01 18:46:44 -07:00
Evan Tschannen e2b65e86ed added configurable memory limits for backup and dr executables
added a default memory limit of 8GB for fdbcli
2017-09-29 10:35:40 -07:00
Evan Tschannen ef41b07bb3 renamed past_version to transaction_too_old
implemented read_lock_aware option
2017-09-28 16:35:08 -07:00
Yichi Chiang d4f75630de Support log group field in status json 2017-09-28 16:31:29 -07:00
Evan Tschannen 7b60e26660 Merge pull request #160 from cie/use-error-descriptions
Add the ability to access name and description in Error. Update error…
2017-09-28 16:00:39 -07:00
A.J. Beamon 4f97bd44a5 If we fail to get the interface name due to a platform error, don't kill the process. Instead, just leave the network counters alone. Change the GetInterfaceAddrs trace event to SevWarnAlways. 2017-09-28 13:32:39 -07:00
A.J. Beamon 67d0eb5d66 Change a few more error descriptions; update sphinx error code documentation 2017-09-28 13:03:17 -07:00
A.J. Beamon d30c730f75 Add the ability to access name and description in Error. Update error descriptions. 2017-09-28 12:35:03 -07:00
Evan Tschannen 7081136f74 added a fix 2017-09-22 15:08:14 -07:00
Evan Tschannen 4809bd8f62 fix: We cannot inject faults after renaming the file, because we could end up with two asyncFileNonDurable open for the same file 2017-09-21 18:11:18 -07:00
Evan Tschannen 489332533c all timeouts longer than two minutes have been can be lowered to 60.0 with buggification
added a workload that tries for a 50 second maximum latency in the presence of one failure with both buggification and connection failures
2017-09-18 11:04:51 -07:00
Evan Tschannen 76e7988663 Merge branch 'master' into feature-remote-logs
# Conflicts:
#	fdbserver/ClusterController.actor.cpp
#	fdbserver/DataDistribution.actor.cpp
#	fdbserver/OldTLogServer.actor.cpp
#	fdbserver/TLogServer.actor.cpp
#	fdbserver/WorkerInterface.h
#	flow/Net2.actor.cpp
2017-09-11 15:15:56 -07:00
Bhaskar Muppana fe208d6adf Merge branch 'master' of github.com:apple/foundationdb into backup 2017-09-06 10:01:55 -07:00
Alec Grieser fe9abbfac9 revert 'Remove unused code' for function referenced in fdbrpc 2017-09-01 09:54:03 -07:00
Ben Collins 3aaa131b4f Merge branch 'master' of github.com:apple/foundationdb 2017-09-01 09:03:45 -07:00
Bhaskar Muppana 439193d17b Moving keyBackupContainer to BackupConfig.backupContainer() 2017-08-30 12:48:28 -07:00
Stephen Atherton 5d49f2c710 Merge branch 'master' into feature-redwood
# Conflicts:
#	fdbserver/fdbserver.vcxproj
2017-08-28 17:45:50 -07:00
A.J. Beamon 9a0a3b6329 Merge commit '66528becb82d826e81fa644bb378212584ab580e' 2017-08-28 16:47:59 -07:00
Stephen Atherton 86d025f943 Bug fix: Metric base enabled state was not being initialized. Metrics are configured to be disabled upon construction, however if during construction it appears that a metric was initially enabled then a crash would result if the MetricsCollection global was not created. 2017-08-27 22:22:32 -07:00
A.J. Beamon 9ce8d3ae4f Merge branch 'release-5.0' 2017-08-09 10:37:43 -07:00
A.J. Beamon 5a8cd34224 Disable profiling if we aren't using the cached dl_iterate_phdr values. 2017-08-08 09:03:04 -07:00
Evan Tschannen c22708b6d6 added tag localities
fix: remote logs need to stop the master when they are stopped
2017-08-03 16:16:36 -07:00
A.J. Beamon 03915ce4ea Merge branch 'release-5.0' 2017-08-03 15:49:54 -07:00
A.J. Beamon f66e22c89d fix: machine metrics could sometimes default to 0, which cause underflows when compared with prior results. 2017-08-03 15:49:30 -07:00
Evan Tschannen 9fb709cf73 Merge branch 'release-5.0'
# Conflicts:
#	versions.target
2017-07-27 16:51:57 -07:00
Evan Tschannen 19fa31ffff fix: uncancellable was not forwarding errors to the result promise 2017-07-27 13:29:08 -07:00
Ben Collins 6f0062330b Merge branch 'master' of github.com:apple/foundationdb 2017-07-26 11:02:13 -07:00
Evan Tschannen 64e9560599 Merge pull request #128 from cie/maintain-incompatible-connections
Maintain incompatible connections
2017-07-17 16:28:22 -07:00
A.J. Beamon 2113d47db6 Update protocol version for incompatible connection change 2017-07-17 16:16:05 -07:00
Alec Grieser 3700624fd7 Merge branch 'release-5.0' 2017-07-17 08:54:10 -07:00
Alec Grieser eee492a05b fix build issue from Notified.h not being shuffled in vcxproj files 2017-07-14 16:46:08 -07:00
Alec Grieser c860f09d8a Merge branch 'release-5.0' 2017-07-14 16:01:15 -07:00
Alec Grieser 660729839c moved Notified.h from flow -> fdbclient ; flow bindings package does better job when excluding testers 2017-07-14 15:49:30 -07:00
Alec Grieser b133862db6 added FLOW and FDB_FLOW targets to make packages of flow headers and libs 2017-07-13 10:21:36 -07:00
Ben Collins 72de765083 Remove unused code 2017-07-13 08:18:00 -07:00
Evan Tschannen 979ebcef6c changed to using a vector of logSets instead of a duplicate set of logs for remote servers
finished porting changes to the tlog
everything but peeking is finished in the TagPartitionedLogSystem
2017-07-09 14:46:16 -07:00
Evan Tschannen 0906250e78 merged everything from feature-remote-logs besides the tlog and tagpartitionedlogsystem
re-included tags in messages to the tlog
previously never committed the LogRouter
2017-06-29 15:50:19 -07:00
Alec Grieser 343d115b37 update Platform to handle newer version of clang that does have __rdtsc 2017-06-28 14:32:01 -07:00
Alvin Moore 8f7c76ddd3 Merge branch 'release-4.6' into release-5.0
# Conflicts:
#	fdbserver/Knobs.h
Updated Windows GUID
Updated and corrected format of Protocol Version
2017-06-26 15:54:57 -07:00
Stephen Atherton 430bb6224e Merge branch 'release-4.6' into release-5.0
# Conflicts:
#	fdbrpc/AsyncFileKAIO.actor.h
#	fdbrpc/Net2FileSystem.cpp
#	fdbrpc/sim2.actor.cpp
2017-06-16 02:14:19 -07:00
Evan Tschannen 4bdcd8fc12 Merge branch 'release-4.6' into release-5.0
# Conflicts:
#	bindings/bindingtester/run_binding_tester.sh
#	fdbrpc/AsyncFileKAIO.actor.h
2017-06-14 16:43:53 -07:00
Stephen Atherton b65ad3563c Merge branch 'master' into feature-redwood
# Conflicts:
#	fdbserver/fdbserver.vcxproj
#	fdbserver/fdbserver.vcxproj.filters
2017-06-09 14:56:41 -07:00
Stephen Atherton fa4fdb1f1d Merge branch 'fix-io-timeout-handling' into release-5.0
# Conflicts:
#	fdbserver/optimisttest.actor.cpp
2017-05-31 17:03:15 -07:00
Stephen Atherton 98604d33a0 Merge branch 'fix-io-timeout-handling'
# Conflicts:
#	fdbrpc/AsyncFileKAIO.actor.h
#	fdbrpc/sim2.actor.cpp
#	fdbserver/KeyValueStoreSQLite.actor.cpp
#	fdbserver/optimisttest.actor.cpp
#	fdbserver/worker.actor.cpp
#	fdbserver/workloads/MachineAttrition.actor.cpp
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2017-05-26 18:43:08 -07:00
A.J. Beamon fc468f682b Merge branch 'release-5.0' into bindings-tuple-improvements
# Conflicts:
#	bindings/java/src-completable/main/com/apple/apple/foundationdbdb/tuple/Tuple.java
2017-05-26 12:33:33 -07:00
FDB Dev Team a674cb4ef4 Initial repository commit 2017-05-25 13:48:44 -07:00