Commit Graph

9979 Commits

Author SHA1 Message Date
Jingyu Zhou a2b867c6f9 Fix a unit test failure 2020-04-20 22:26:42 -07:00
Jingyu Zhou 3063611355 Write range files' begin & end keys to manifest file
This information can be very useful in knowing the content in these files,
especially for restores.
2020-04-20 22:26:42 -07:00
Alex Miller a51746b307 Match 6.2.15's behavior in how invalid/unreadable/non-existent certs are handled.
Which is to proceed past Net2 creation, and allow certificate refresh to
try and eventually load valid certs.  Additionally, fix certificate
refreshing dying if the certificate is not readable when first called.

In testing, I also found and fixed an issue where if a cert went from
unreadable to readable, we wouldn't reload the TLS context, due to not
considering it as a file change.
2020-04-20 21:38:04 -07:00
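
The unreadable-to-readable case described above can be sketched as a small change detector. This is an illustrative sketch only, not the code from the commit: `FileStamp` and `shouldReload` are hypothetical names, and using a modification time as the file's identity is an assumption.

```cpp
#include <ctime>
#include <optional>

// Hypothetical representation: std::nullopt means the cert file could not be
// read or stat'ed; otherwise the value is its last modification time.
using FileStamp = std::optional<std::time_t>;

// Decides whether the TLS context should be reloaded.
bool shouldReload(const FileStamp& previous, const FileStamp& current) {
    if (!current) return false;   // still unreadable: nothing to load, retry later
    if (!previous) return true;   // unreadable -> readable counts as a file change
    return *previous != *current; // ordinary modification
}
```

The second branch is the point of the sketch: a detector that compares only two readable states would never fire when a cert first becomes readable.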
Alex Miller aec51a1d9c
Merge pull request #2968 from atn34/atn34/cmake-rpm
Prevent server and clients rpm's from conflicting
2020-04-20 21:22:05 -07:00
Alex Miller 8f08405df3
Merge pull request #2969 from atn34/atn34/write-to-clangd
Actually redirect to $HOME/bin/clangd
2020-04-20 21:21:25 -07:00
Alex Miller 20fe068863 Merge branch 'tls-background-eio-thread' into tls-permission-errors 2020-04-20 20:51:05 -07:00
Jingyu Zhou 0ae0a81edf Ensure mutation logs save complete version's data
I.e., do not allow the same version's mutations to be saved in different files.
Otherwise, a file may contain only part of a version's data, causing
continuity analysis of mutation logs to fail. This could also cause restore
failures, if the target version's mutations are stored in two files.

In the above description, all mutation logs refer to the same tag's logs.
2020-04-20 20:41:30 -07:00
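
A minimal sketch of the invariant described above, assuming mutations arrive in version order and files are closed once they pass a soft size limit. `Mutation`, `LogFile`, and `packByVersion` are hypothetical names for illustration, not the backup worker's actual types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical types; not FoundationDB's actual structures.
struct Mutation {
    int64_t version;
    std::size_t bytes;
};

struct LogFile {
    int64_t beginVersion;
    int64_t endVersion; // inclusive
    std::size_t bytes;
};

// Packs version-ordered mutations into files. A file may grow past
// softLimit, but it is only closed at a version boundary, so no version's
// mutations are ever split across two files.
std::vector<LogFile> packByVersion(const std::vector<Mutation>& muts, std::size_t softLimit) {
    std::vector<LogFile> files;
    for (const Mutation& m : muts) {
        // Input must be ordered by version.
        assert(files.empty() || files.back().endVersion <= m.version);
        bool sameVersion = !files.empty() && files.back().endVersion == m.version;
        bool hasRoom = !files.empty() && files.back().bytes < softLimit;
        if (files.empty() || (!sameVersion && !hasRoom)) {
            files.push_back({m.version, m.version, 0});
        }
        files.back().endVersion = m.version;
        files.back().bytes += m.bytes;
    }
    return files;
}
```

With a soft limit of 100 bytes and mutations of 60 bytes at version 1, 60 + 60 bytes at version 2, and 10 bytes at version 3, the first file holds versions 1 through 2 in full and the second starts at version 3.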
Jingyu Zhou 61f0f44ab3 Fix comments on startVersion in BackupWorker 2020-04-20 17:07:50 -07:00
Xin Dong 49c6bb90ef
Merge pull request #2982 from alexmiller-apple/tls-log-settings
Log Net2TLSConfig with paths and settings when using TLS.
2020-04-20 15:46:26 -07:00
Meng Xu fb6aa09128
Merge pull request #2964 from jzhou77/backup-fix
Fix backup progress calculation
2020-04-20 13:41:03 -07:00
Alex Miller 75a4f3b7c9 Remove comment about ignoring runOnMainThread errors.
If we got an exception, it wouldn't be of type `Error` anyway, so
it seems like things would crash regardless.
2020-04-20 13:19:42 -07:00
Alex Miller e51d0365cf Cleanup: Use the shutdown callback for destroying TLS state. 2020-04-20 13:16:16 -07:00
Alex Miller da8e47ea25 Merge remote-tracking branch 'upstream/release-6.2' into tls-background-eio-thread 2020-04-20 13:15:05 -07:00
Alex Miller 5c399bf725 Move the callbacks into ::run() right before it exits.
stopped=true doesn't cause the run loop to immediately exit.
2020-04-20 13:14:19 -07:00
Jingyu Zhou 7507f2da81
Merge pull request #2984 from satherton/future-move-t-constructor
Added Future<T>(T &&value) constructor to avoid a copy...
2020-04-20 11:47:32 -07:00
Jingyu Zhou 5f43e18906 Backup worker pops max of savedVersion or NOOP's popVersion 2020-04-20 11:43:09 -07:00
Jingyu Zhou 0823091423 Fix backup worker removal races with setting
The master waits for all backup worker recruitment to finish and then sets them in a
batch. However, a backup worker could remove itself before the master sets it.
As a result, the worker is not removed, the oldest backup epoch can't advance,
and the TLog can't be popped.
2020-04-20 11:06:46 -07:00
Jingyu Zhou 70221a25d7 True-up a backup's begin version
For the first mutation log of a backup, we need to true-up its begin version to
the exact version of the first mutation. This is needed to ensure the strict
less than relationship between two mutation logs, if one's version range is
within the other.

A problematic scenario is as follows:
Epoch 1: a mutation log A [200, 900] is saved, but its progress is NOT saved.
Epoch 2: master recruits a worker for [1, 1000], 1000 is epoch 1's end version.
         New worker saves a mutation log B [100, 1000]
A's range is strictly within B's range, but A's size is larger than B's.

This happens because B's start version is trued up to the backup's begin version,
which is not the actual version of the first mutation. After B's begin version
is trued up to 300, we won't have this issue.
2020-04-20 11:06:46 -07:00
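
The true-up step can be sketched as follows, assuming the worker knows the sorted versions of the mutations it actually holds. `VersionRange` and `trueUpBegin` are hypothetical names, and the mutation versions in the usage note are made up to match the "300" in the message above.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical type for illustration.
struct VersionRange {
    int64_t begin;
    int64_t end; // inclusive
};

// Raises the assigned begin version to the version of the first mutation
// that falls inside the assigned range. If no mutation falls inside the
// range, the range is returned unchanged.
VersionRange trueUpBegin(VersionRange assigned, const std::vector<int64_t>& sortedVersions) {
    auto first = std::lower_bound(sortedVersions.begin(), sortedVersions.end(), assigned.begin);
    if (first != sortedVersions.end() && *first <= assigned.end) {
        assigned.begin = *first;
    }
    return assigned;
}
```

For an assigned range of [100, 1000] and mutations at versions 300, 500, and 900, the trued-up range is [300, 1000].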
Jingyu Zhou 8245f12091 Backup worker doesn't save progress in NOOP mode
This fixes the consistency check failure, where saving progress commits new
transactions. Pop is performed by the NOOP loop in monitorBackupKeyOrPullData.
2020-04-20 11:06:46 -07:00
Jingyu Zhou cdc911a6ae Fix inadvertent savedVersion update 2020-04-20 11:06:46 -07:00
Jingyu Zhou 76d90ac6d7 Limit the version range for old epochs
When the Master recruits a backup worker for previous epochs, the Master may
set the begin version to a very low number, because the backup progress for
that epoch is not saved. This can cause problems for the log file, since these
low versions have been popped.

The fix here is to advance savedVersion to the minimum of backup's starting
version if it is higher than the begin version set by the Master. This is safe
because these versions are not popped. If they are popped, their progress should
already be recorded and Master would use a higher version than the backup's
starting version.
2020-04-20 11:06:46 -07:00
Jingyu Zhou 5528857934 Remove epoch's begin version check
Turns out the begin version can be a valid previous epoch's begin version, not
specifically 1.
2020-04-20 11:06:46 -07:00
Jingyu Zhou 4c66c8c377 Fix backup progress calculation
The oldest epoch the master gets can assume its begin version is 1, which can
be wrong. In this case, we use the saved backup progress to "true-up" the real
begin version.
2020-04-20 11:06:46 -07:00
Steve Atherton ba1b0a1d96 Use std::move() instead of forward. 2020-04-20 11:01:01 -07:00
Jingyu Zhou 5d6758646f
Merge pull request #2986 from satherton/actor-statevar-rvo
Actor compiler will std::move() return expressions that exactly match a state variable
2020-04-20 10:42:51 -07:00
A.J. Beamon c28a843251
Merge pull request #2977 from alexmiller-apple/tls-no-atexit
Fix clients crashing in TLS code on exit.
2020-04-20 08:40:16 -07:00
Steve Atherton 022b77e288 Actor compiler will std::move() return expressions that exactly match a state variable. 2020-04-20 04:19:33 -07:00
Alex Miller 2ce539ef6d Respect flow<->fdbrpc module boundaries.
Which fixes a compilation error due to a circular dependency between
flow.a and fdbrpc.a.  However, this is now done at the cost of newNet2
users having to remember to add Net2FileSystem::stop() as a callback.
2020-04-20 02:53:07 -07:00
Steve Atherton 7b23c6f640 Future constructor to avoid a copy when Future<T> is initialized from an rvalue reference to T. 2020-04-20 01:52:28 -07:00
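
A toy illustration of why an rvalue-reference constructor avoids a copy. `ToyFuture` is a hypothetical stand-in and not flow's `Future<T>`; `Counted` exists only to make copies and moves observable.

```cpp
#include <utility>

// Toy stand-in for a future that already holds its value.
template <class T>
class ToyFuture {
public:
    ToyFuture(const T& value) : value_(value) {}       // lvalue: copies
    ToyFuture(T&& value) : value_(std::move(value)) {} // rvalue: moves
    const T& get() const { return value_; }

private:
    T value_;
};

// Records how many copies and moves produced this instance.
struct Counted {
    int copies = 0;
    int moves = 0;
    Counted() = default;
    Counted(const Counted& o) : copies(o.copies + 1), moves(o.moves) {}
    Counted(Counted&& o) noexcept : copies(o.copies), moves(o.moves + 1) {}
};
```

Constructing `ToyFuture<Counted>` from a named `Counted` goes through the copying constructor, while constructing it from a temporary such as `Counted{}` selects the `T&&` overload and performs a move instead.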
Steve Atherton 21277fdb4f Merge branch 'master' of github.com:apple/foundationdb into feature-redwood 2020-04-20 01:47:21 -07:00
Xin Dong c09f659fd5
Merge pull request #2983 from alexmiller-apple/tls-whichmeans-on-tls-error
Only log OpenSSL error strings for OpenSSL errors.
2020-04-19 12:57:11 -07:00
Jingyu Zhou 3f3a728bb3
Merge pull request #2970 from xumengpanda/mengxu/fr-range-versions-PR
Fix:New backup and restore: Mutations before min range version may not be complete
2020-04-19 09:49:07 -07:00
Meng Xu 719eda9421 FastRestore:Add an assertion in handleRestoreSysInfoRequest
as suggested in code review.
2020-04-18 22:42:46 -07:00
Alex Miller cbb6ffb431 Only log OpenSSL error strings for OpenSSL errors.
Normal "connection refused" messages would show up with a long verbose
string that doesn't really provide any useful information otherwise.
2020-04-18 20:39:02 -07:00
Alex Miller 11eebc4a48 Log Net2TLSConfig with paths and settings when using TLS.
There were similar TraceEvents in the FDBLibTLS/LibreSSL TLS
implementation that were accidentally dropped in the TLS rewrite.

This makes it so that one does not have to use magic to figure out if a
process was configured with TLS correctly when some of the settings come
from environment variables.
2020-04-18 20:21:10 -07:00
Alex Miller 1398e9a82e Stop background eio threads on Net2::stop().
This will stop eio threads for both the client (`fdb_stop_network()`)
and the server.  This change is being done more for the former, but I
don't see any harm in doing the latter as well.
2020-04-18 19:40:55 -07:00
Meng Xu 3dd43aa7dd
Merge pull request #2662 from zjuLcg/private-keyspace-framework
Design for the special key space framework
2020-04-18 17:00:14 -07:00
Alex Miller 94b4f78ea9 Fix clients crashing in TLS code on exit.
If client code initiates an FDB operation to a TLS cluster, and then
immediately exits the main thread, then OpenSSL's atexit handler would
potentially run while the network thread is attempting to do TLS
operations, and thus crash.

This commit removes the OpenSSL atexit handler, and instead relies on a
client intentionally ending the network thread to do TLS cleanup.  If
the client code exits without stopping the network thread, then we'll
never free OpenSSL data structures, which is the safer thing to do.
2020-04-18 15:48:02 -07:00
Meng Xu 10a6461d13 FastRestore:Change __inline__ to inline
__inline__ is compiler specific while inline is the standard keyword
2020-04-17 22:31:44 -07:00
Meng Xu 82ae82c98f Move MAX_VERSION to FDBTypes.h 2020-04-17 18:46:04 -07:00
Meng Xu 916d361587 BackupAndParallelRestoreCorrectness:Remove unnecessary checking optional variable 2020-04-17 18:32:14 -07:00
Evan Tschannen b04478704e fixed improper use of std::set erase 2020-04-17 16:45:22 -07:00
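
The commit message does not say which misuse was fixed. One common pitfall, shown here purely as an assumption, is continuing to use an iterator after passing it to `erase`. A safe pattern takes the iterator that `erase` returns; `eraseEvens` is a hypothetical example function.

```cpp
#include <set>

// Removes every even value from the set. std::set::erase(iterator) returns
// the iterator following the removed element, so the loop never advances an
// iterator that has been invalidated.
void eraseEvens(std::set<int>& s) {
    for (auto it = s.begin(); it != s.end();) {
        if (*it % 2 == 0) {
            it = s.erase(it);
        } else {
            ++it;
        }
    }
}
```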
Andrew Noyes cb6389d42d Prevent main thread from destroying flatbuffers globals
We recently witnessed (using tsan) the main thread exiting without first
joining the network thread, and this caused data races and
heap-use-after-frees.

Now the lifetime of these globals will be tied to the network thread
itself (and I guess every thread, but the one that actually uses memory
will be owned by the network thread.)
2020-04-17 23:34:28 +00:00
Meng Xu c8d049d0bb FastRestore:Loader:Add counter oldLogMutations 2020-04-17 15:21:59 -07:00
Evan Tschannen ba3e2af473 Merge commit '5288033bcfe40c3ade97c8bf2d04cf31b3f16cb1' into feature-tree-broadcast 2020-04-17 15:17:37 -07:00
Evan Tschannen 33efb9ec97 code cleanup based on review comments 2020-04-17 15:05:01 -07:00
Meng Xu 7e890cb6be FastRestore:Minor simplify code 2020-04-17 15:00:07 -07:00
Evan Tschannen 4c51e0a05b
Update fdbserver/worker.actor.cpp
Co-Authored-By: A.J. Beamon <ajbeamon@users.noreply.github.com>
2020-04-17 14:44:58 -07:00
Meng Xu 14406ad940 BackupAndParallelRestoreCorrectness:Assert on backup validity 2020-04-17 13:51:39 -07:00
Evan Tschannen b667d5442f fix: not all removed endpoints were actually removed 2020-04-17 13:47:54 -07:00