Alex Miller
f19cb3bbbd
Merge pull request #208 from cie/alexmiller/grvtfix
...
Fix the GRV performance regression
2017-11-17 15:00:44 -08:00
A.J. Beamon
3ded271153
Dispose of Cluster objects in fdb.open()
2017-11-17 12:21:14 -08:00
Alec Grieser
f657be8136
add a space to match the bracing style used in this file
2017-11-17 09:55:11 -08:00
A.J. Beamon
0981e0dcdd
Dispose of newly created transactions if transfer() fails.
2017-11-17 09:47:17 -08:00
Bhaskar Muppana
1bf84cd51a
Merge pull request #210 from bmuppana/backup-logs
...
Adding TraceEvents for BackupRangeTask.
2017-11-16 19:12:04 -08:00
Bhaskar Muppana
5e596ea670
Adding TraceEvents for BackupRangeTask.
2017-11-16 19:11:31 -08:00
Yichi Chiang
d9a98aa968
Remove commented code
2017-11-16 17:25:37 -08:00
Evan Tschannen
7211a397b0
Merge pull request #209 from cie/fix-double-recoveries
...
Fix double recoveries
2017-11-16 17:16:39 -08:00
Yichi Chiang
0d5dc15ac8
Fix double recoveries
2017-11-16 16:58:55 -08:00
Stephen Atherton
07c19098fe
Improved the backup container unit test: added file reading and verification, more data, and a series of expirations, each validated against the expected result. Then fixed the bugs that this new testing discovered.
2017-11-16 16:19:56 -08:00
Alex Miller
e9412bbb11
Fix the GRV performance regression introduced by adding the policy engine to GRV calculations.
...
Construction of LocalityGroup from LocalityData is expensive, and the previous
code greatly ran afoul of that. The policy engine does a large amount of
interning of strings and building compressed maps to make the expected many
future selectReplica calls cheap. Unfortunately we don't call selectReplicas,
so much of this work is undesirable for us, and a large amount of CPU time is
spent doing this initialization work.
The new changes aggressively do the minimal LocalityGroup::add() calls
necessary, and make them as cheap as possible by removing all elements from
LocalityData that don't need to be considered by the policy.
This optimization was also applied to the PeekCursor used during recovery,
which should speed recoveries up by a small amount.
2017-11-16 16:15:52 -08:00
A.J. Beamon
5b5850e097
The dispose in Database.createTransaction was supposed to happen on error, not in the finally block
2017-11-16 10:50:13 -08:00
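The distinction behind this fix can be sketched as follows. This is a minimal Python illustration of the pattern, not the Java bindings' code: `Transaction`, `create_transaction`, and the `configure` callback are hypothetical names introduced here. A handle that is returned to the caller should be disposed only when setup fails; disposing it in a `finally` block would also destroy it on the success path.

```python
class Transaction:
    """Hypothetical stand-in for a native-backed transaction handle."""

    def __init__(self):
        self.disposed = False

    def dispose(self):
        # Release the underlying native resource (simulated with a flag).
        self.disposed = True


def create_transaction(configure):
    """Create a transaction, disposing it only if configuration fails."""
    tr = Transaction()
    try:
        configure(tr)
    except Exception:
        # Error path: the caller never receives tr, so release it here.
        tr.dispose()
        raise
    # Success path: ownership passes to the caller, so tr must stay live.
    return tr
```

With a `finally: tr.dispose()` in place of the `except` branch, every caller would receive an already disposed handle.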
Stephen Atherton
f105204aca
Shifted version distribution over folders.
2017-11-15 23:13:04 -08:00
Stephen Atherton
cc47d0e161
Bug fix in restore dispatch: begin file was not being incremented. Removed try/catch because the inherited handleError() is better.
2017-11-15 22:38:31 -08:00
Alex Miller
e900333dbf
Fix a subtle valgrind error.
...
If there was an error in waiting for the read version, we would attempt to
serialize and eventually commit a CommitTransactionRef that had an
uninitialized read_snapshot.
2017-11-15 19:21:20 -08:00
Evan Tschannen
ad456a939a
Merge pull request #206 from cie/change-excluded-cluster-controller
...
Change excluded cluster controller
2017-11-15 17:28:33 -08:00
Yichi Chiang
f96faf72d9
Add fullyRecoveredConfig for checking exclusions
2017-11-15 17:15:24 -08:00
A.J. Beamon
db017317ac
Update the Java bindings to add missing dispose calls.
2017-11-15 15:56:50 -08:00
Stephen Atherton
ab0017f023
TaskBucket’s TaskFunc interface now has an optional handleError() which is called on any task that throws an error from execute() or finish(). Restore and Backup tasks use this to ensure that any errors that occur are placed in the backup or restore config’s lastError property. Bug fixes in log and range file encodings.
2017-11-15 13:33:09 -08:00
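The optional error hook described in this commit can be sketched roughly as below. This is a Python illustration under stated assumptions, not the TaskBucket code: `TaskFunc`, `RestoreTask`, `run_task`, and the dictionary used as a config are hypothetical stand-ins, and what the runner does after calling the hook (here, returning `False`) is an assumption.

```python
class TaskFunc:
    """Hypothetical base class: execute() and finish() plus an optional hook."""

    def execute(self, task):
        raise NotImplementedError

    def finish(self, task):
        raise NotImplementedError

    def handle_error(self, task, error):
        # Optional: the default implementation ignores the error.
        pass


class RestoreTask(TaskFunc):
    """Hypothetical task that records failures in its config."""

    def __init__(self, config):
        self.config = config

    def execute(self, task):
        raise RuntimeError("range file unreadable")

    def finish(self, task):
        pass

    def handle_error(self, task, error):
        # Surface the failure where status tooling can find it.
        self.config["lastError"] = str(error)


def run_task(func, task):
    """Run a task; route any error from execute() or finish() to the hook."""
    try:
        func.execute(task)
        func.finish(task)
    except Exception as error:
        func.handle_error(task, error)
        return False
    return True
```

Centralizing the hook in the runner means individual tasks no longer need their own try/catch wrappers to record errors.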
A.J. Beamon
02a9978612
Merge pull request #204 from cie/fdbcli-warn-incompatible-peers
...
Added a counter to keep track of active outgoing incompatible connect…
2017-11-15 12:52:01 -08:00
Evan Tschannen
30464e943c
Merge pull request #205 from cie/cleanup-spammy-traceevents
...
Cleanup spammy traceevents
2017-11-15 12:41:37 -08:00
Evan Tschannen
e113dba0e3
added a new trace event tracking master recovery durations
2017-11-15 12:38:26 -08:00
Stephen Atherton
a77162b53d
Merge branch 'master' into backup-container-refactor
...
# Conflicts:
# fdbclient/BackupAgent.h
# fdbclient/FileBackupAgent.actor.cpp
# fdbclient/KeyBackedTypes.h
2017-11-15 08:14:47 -08:00
Stephen Atherton
e07dcb9ada
Fixed header paths.
2017-11-15 00:05:20 -08:00
Stephen Atherton
3dfaf13b67
IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored are now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi-level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius because the previous backup container was just a thin wrapper that presented a single-level list of files and offered no methods for managing or interpreting the file structure, so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse.
...
Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup. The addition of IBackupFile and its finish() method simplified the log and range writer tasks. Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes. Added KeyBackedSet<T> type. Moved JSONDoc to its own header. Added platform::findFilesRecursively().
Still to do: update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.
2017-11-14 23:33:17 -08:00
Balachandar Namasivayam
987379d790
Changed naming of num_incompatible_connections to numIncompatibleConnections
2017-11-14 18:37:29 -08:00
Yichi Chiang
df922bc973
Change excluded cluster controller
2017-11-14 13:57:37 -08:00
Balachandar Namasivayam
986b73f458
Fixed an issue where an ACTOR outlives an object passed to it and then crashes while accessing it.
2017-11-14 13:51:23 -08:00
A.J. Beamon
bb1297c686
Remove RkServerQueueInfo and RkTLogQueueInfo trace events, since this information is more or less already logged on the storage servers and tlogs. Update the quiet database check and magnesium to use the information from the logs and storage servers.
2017-11-14 12:59:42 -08:00
A.J. Beamon
3b952efb4e
Remove events from cluster controller that get logged for roughly every worker upon recovery, master registration, etc.
2017-11-14 10:15:45 -08:00
Balachandar Namasivayam
27b67cffbe
The earlier implementation of tracking the number of incompatible connections had a bug where the counter would be incorrectly decremented for incoming connections under certain conditions.
...
Now the counter increment and decrement happen in the same ACTOR (ConnectionReader), which makes it easy to verify correctness.
2017-11-13 15:07:39 -08:00
A.J. Beamon
313e823629
Delete TDMetric data (tmpEventMetric) when a trace event is throttled.
2017-11-13 15:06:21 -08:00
A.J. Beamon
0fea5e9c2f
Convert client_invalid_operation errors to ASSERTs.
2017-11-13 11:38:34 -08:00
A.J. Beamon
cd085764f1
Do not automatically change a cluster file that does not match what you expect.
2017-11-10 14:12:45 -08:00
Balachandar Namasivayam
486d089e98
Better message is displayed to the user.
2017-11-09 13:55:19 -08:00
Balachandar Namasivayam
9809e84806
Added a counter to keep track of active outgoing incompatible connections.
...
This counter is used to print a warning in fdbcli if there are incompatible peers.
Example Output:
./fdbcli
Using cluster file `fdb.cluster'.
WARNING: Incompatible peers exist.
The database is unavailable; type `status' for more information.
Welcome to the fdbcli. For help, type `help'.
fdb> status
WARNING: Incompatible peers exist.
Using cluster file `fdb.cluster'.
Could not communicate with a quorum of coordination servers:
127.0.0.1:4000 (unreachable)
2017-11-09 11:20:35 -08:00
Alex Miller
311d1ca87d
A variety of changes that collectively fix flow profiling in circus.
...
To run, use --co=flow_profiling=-1, because reasons.
2017-11-07 13:55:16 -08:00
Alec Grieser
f53afd615e
Merge pull request #203 from cie/update-old-kernel-memavailable-computation
...
Changes to MemAvailable computation on kernels without MemAvailable
2017-11-06 18:30:31 -08:00
A.J. Beamon
d174e05bac
Merge pull request #180 from cie/bindings-versionstamps-in-tuples
...
<rdar://problem/25560444> [Feature] Versionstamped keys and tuple/directory incompatibility
2017-11-06 16:39:17 -08:00
A.J. Beamon
fee6734e71
Add braces around multiline if block
2017-11-06 16:38:32 -08:00
Alec Grieser
396434794d
some python versionstamp api tweaks
2017-11-06 14:56:41 -08:00
Stephen Atherton
7a0aa7b0aa
Merge pull request #202 from cie/fdbcli-status-log-backup-dr-info
...
Fdbcli status log backup dr info
2017-11-06 14:55:20 -08:00
A.J. Beamon
fefbf22b6f
Remove redundant tagCount
2017-11-06 14:00:08 -08:00
A.J. Beamon
b1fe3d1b55
Delete spurious spaces
2017-11-06 12:59:00 -08:00
A.J. Beamon
bf07fa3023
Untested changes to MemAvailable computation on kernels without MemAvailable
2017-11-06 09:35:05 -08:00
A.J. Beamon
7d63c2cfb7
Formatting fixes
2017-11-06 09:20:31 -08:00
A.J. Beamon
839ddd4d2e
Merge pull request #201 from cie/bindings-java-get-range-first-chunk
...
<rdar://problem/35325444> Java Bindings: getRange shouldn't fetch first chunk twice when using asList
2017-11-06 09:09:52 -08:00
Evan Tschannen
150aca4734
Merge pull request #199 from cie/getrange-perf-improvements
...
Getrange perf improvements
2017-11-04 14:25:55 -07:00
Evan Tschannen
33eef4c76b
Merge branch 'master' into getrange-perf-improvements
2017-11-04 14:25:45 -07:00
Evan Tschannen
706bf1e018
fix: we cannot trigger better master exists before a master is fully recovered because exclusions changed by the provisional master will not be committed until the master is fully recovered
2017-11-04 12:48:04 -07:00