Stephen Atherton
2925b9b984
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-07-03 23:03:56 -07:00
Stephen Atherton
09e68a4335
Lots of bug fixes around page reads and concurrency.
2018-07-03 15:39:32 -07:00
Evan Tschannen
604b3bca17
increased the api correctness timeout
2018-07-02 12:51:50 -04:00
Evan Tschannen
89a4b2cd68
fix: consistency check could loop too long
2018-07-02 12:08:02 -04:00
Evan Tschannen
73e61312c6
fix: shareLogRange was never initialized
2018-07-01 22:49:24 -04:00
Stephen Atherton
b95a2bd6c1
Merge commit 'b17c8359ec22892ed4daeaa569f2f5e105477251' into feature-redwood
...
# Conflicts:
# flow/Trace.cpp
2018-06-30 23:18:29 -07:00
Evan Tschannen
4a3247da69
fixed a few problems with the consistency check
2018-06-30 10:39:28 -07:00
Evan Tschannen
02f616eb68
fix: consistency check was broken when the key server key space is sharded
2018-06-28 23:16:32 -07:00
Balachandar Namasivayam
8caa6eaecf
Merge pull request #541 from etschannen/feature-remote-logs
...
More multiple DC improvements
2018-06-28 11:22:08 -07:00
Evan Tschannen
45cf0067e4
fix: consistency check was not checking for data inconsistencies
2018-06-28 11:08:16 -07:00
A.J. Beamon
9f545ce002
Merge commit '892727e358c0b3f075564c60c2b7cedb64306f83' into trace-log-refactor
2018-06-26 11:37:23 -07:00
Stephen Atherton
e5c48d453a
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-18 22:45:27 -07:00
Evan Tschannen
1ccfb3a0f4
fix: log_anti_quorum was always 0 in simulation
...
removed durableStorageQuorum, because it is no longer a useful configuration parameter
2018-06-18 10:24:57 -07:00
Evan Tschannen
0913368651
added usable_regions to specify if we will replicate into a remote region
...
remote replication defaults to the primary replication
removed remote_logs, because they should be specified as an override in the regions object
2018-06-17 19:31:15 -07:00
Stephen Atherton
90c8288c68
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-17 14:55:05 -07:00
Evan Tschannen
99e21c869c
fixed a number of status calculations, and re-enabled the status workload
2018-06-14 17:58:57 -07:00
Stephen Atherton
1eae9d621b
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
2018-06-13 15:58:21 -07:00
Stephen Atherton
2878f30f29
Merge branch 'master' of github.com:apple/foundationdb into feature-redwood
...
# Conflicts:
# fdbserver/IKeyValueStore.h
# fdbserver/KeyValueStoreMemory.actor.cpp
# fdbserver/storageserver.actor.cpp
2018-06-13 15:56:06 -07:00
Richard Low
39894ea798
Merge remote-tracking branch 'apple/release-5.2'
2018-06-12 18:31:20 -07:00
Evan Tschannen
372ed67497
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
2018-06-11 11:34:10 -07:00
A.J. Beamon
f965954122
Merge commit '82be52205b95464e355c449fdf3e7d483fa06677' into trace-log-refactor
...
# Conflicts:
# fdbserver/Status.actor.cpp
# fdbserver/workloads/DDMetrics.actor.cpp
# flow/Trace.cpp
2018-06-08 16:22:22 -07:00
Evan Tschannen
b9826dc1cb
fix: do not automatically reduce redundancy when we move keys if the database does not have remote replicas. This is to prevent problems when dropping remote replicas from a configuration.
2018-06-08 16:17:27 -07:00
A.J. Beamon
99c9958db7
Some more trace event normalization
2018-06-08 13:57:00 -07:00
A.J. Beamon
0ca51989bb
Merge branch 'master' into trace-log-refactor
...
# Conflicts:
# fdbserver/QuietDatabase.actor.cpp
# fdbserver/Status.actor.cpp
# flow/Trace.cpp
2018-06-08 13:24:30 -07:00
Balachandar Namasivayam
20febf5ef9
Address review comments.
2018-06-08 11:24:51 -07:00
A.J. Beamon
e5488419cc
Attempt to normalize trace events:
...
* Detail names now all start with an uppercase character and contain no underscores. Ideally these should be head-first camel case, though that was harder to check.
* Type names have the same rules, except they allow one underscore (to support a usage pattern Context_Type). The first character after the underscore is also uppercase.
* Use seconds instead of milliseconds in details.
Added a check when events are logged in simulation that logs a message to stderr if the first two rules above aren't followed.
This probably doesn't address every instance of the above problems, but all of the events I was able to hit in simulation pass the check.
2018-06-08 11:11:08 -07:00
Balachandar Namasivayam
514b0e3c20
Having fixed limits for getRange results in continuously getting transaction_too_old error in some scenarios.
...
Cutting the limits by half in such cases allows the test to progress.
2018-06-07 15:27:05 -07:00
Evan Tschannen
b423d73b42
fix: do not finish a shard relocation until all of the storage servers have made the current recovery version durable. This is to prevent dropping a needed storage server as a source for a shard after dropping a remote configuration
2018-06-07 12:29:25 -07:00
A.J. Beamon
78839b20fd
Merge branch 'master' into trace-log-refactor
...
# Conflicts:
# flow/Trace.cpp
2018-05-31 10:46:20 -07:00
A.J. Beamon
026458baf3
Merge release-5.2 into master
2018-05-23 15:32:56 -07:00
Alec Grieser
40babc40e1
remove one unnecessary line ; fix else formatting
2018-05-15 17:20:44 -07:00
Alec Grieser
6d132717f2
add versionstamp compatibility test to VersionStampWorkload
...
surfaces error found in #387
2018-05-15 17:09:24 -07:00
A.J. Beamon
02df30149f
Merge branch 'release-5.2' into trace-log-refactor
2018-05-11 11:22:34 -07:00
Evan Tschannen
91338fc984
Merge branch 'master' into feature-remote-logs
2018-05-10 15:33:45 -07:00
Evan Tschannen
8f984cb2c9
Merge branch 'release-5.2'
...
# Conflicts:
# fdbrpc/TLSConnection.h
2018-05-10 09:13:22 -07:00
Evan Tschannen
f6e55d0b74
Merge pull request #348 from etschannen/release-5.2
...
DR upgrade tests now test the durability of the data.
2018-05-09 15:40:03 -07:00
Evan Tschannen
8930c2e3db
DR upgrade tests now test the durability of the data.
2018-05-09 15:11:05 -07:00
Alec Grieser
464e2cdbf0
change SetVersionstampedKey and SetVersionstampedValue behavior based on API version to make them consistent
2018-05-08 08:57:09 -07:00
Alec Grieser
14cca75429
server components of version of alternative versionstamp op that writes to an arbitrary place in the value
2018-05-08 08:57:08 -07:00
A.J. Beamon
ce0c991e78
Refactor trace events to store a vector of fields that aren't encoded until write time. Better support for pre-network trace events. Rework how trace events are queried. Some initial work towards pluggable formatting of logs.
2018-05-02 10:44:38 -07:00
Evan Tschannen
656a817e74
fix: only reconfigure during the quiet database check, because excluding at the same time as reconfiguring causes the master to indefinitely restart recovery
2018-05-01 15:31:49 -07:00
Evan Tschannen
10d25927cd
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# fdbserver/DataDistribution.actor.cpp
2018-04-30 22:15:39 -07:00
Evan Tschannen
9cdabfed0e
added useful trace events
2018-04-29 18:54:47 -07:00
Alec Grieser
69e831d522
Merge remote-tracking branch 'upstream/release-5.2' into merge-release-5.2
2018-04-28 17:44:52 -07:00
Yichi Chiang
c721ab6854
Fix review comments
2018-04-27 13:54:34 -07:00
Yichi Chiang
6bddf8aefa
Upgrade DR from 5.1 to 5.2
2018-04-26 17:24:40 -07:00
Stephen Atherton
af61d3596d
Merge branch 'public-master' into feature-redwood
...
# Conflicts:
# fdbserver/DatabaseConfiguration.cpp
# fdbserver/OldTLogServer.actor.cpp
# fdbserver/fdbserver.vcxproj
# fdbserver/fdbserver.vcxproj.filters
2018-04-24 17:22:21 -07:00
Evan Tschannen
c1ccc8522c
Merge branch 'release-5.2'
2018-04-17 18:38:12 -07:00
Evan Tschannen
db98c1b9b6
Merge branch 'release-5.1' into release-5.2
...
# Conflicts:
# versions.target
2018-04-17 18:36:19 -07:00
Yichi Chiang
a4e8b6492c
Fix DR Upgrade workload backup range
2018-04-13 09:59:32 -07:00
Evan Tschannen
19762b847d
Merge branch 'release-5.2'
...
# Conflicts:
# fdbserver/DatabaseConfiguration.cpp
# fdbserver/SimulatedCluster.actor.cpp
2018-04-10 17:02:43 -07:00
Evan Tschannen
c1ba16b3c8
Merge branch 'release-5.1' into release-5.2
...
# Conflicts:
# bindings/java/src/test/com/apple/foundationdb/test/AbstractTester.java
# bindings/java/src/test/com/apple/foundationdb/test/VersionstampSmokeTest.java
# bindings/nodejs/lib/fdb.js
# bindings/nodejs/src/Version.h
# bindings/nodejs/tests/tuple_test.js
2018-04-10 16:50:47 -07:00
Evan Tschannen
b0a88001cc
Merge pull request #132 from yichic/support-dr-upgrade-test
...
Support DR upgrade test
2018-04-10 16:30:19 -07:00
Yichi Chiang
d0230d4d13
Support DR upgrade test in 5.1
2018-04-10 15:19:53 -07:00
Evan Tschannen
7af892f50b
first working version of non-copying recovery working with fearless configurations
2018-04-08 21:24:05 -07:00
Stephen Atherton
2752a28611
Merge branch 'release-5.2' of github.com:apple/foundationdb into feature-redwood
2018-04-06 16:29:37 -07:00
Evan Tschannen
b95e68eb5a
fix: getDatabaseSize is really inefficient and causes slow tasks in the real world. Outside of simulation just assume the database is really large, because we only need the InvalidShardSize check in simulation
2018-03-26 17:35:11 -07:00
Evan Tschannen
d3fb17d30a
Merge pull request #74 from bnamasivayam/client-profiling-tests
...
Client profiling tests - Part 1
2018-03-23 16:52:49 -07:00
Balachandar Namasivayam
1e719d79e9
Remove incorrect ASSERTs
...
Account for corner cases in missing chunks.
2018-03-23 15:51:56 -07:00
Evan Tschannen
5db52ab081
Merge pull request #87 from etschannen/feature-remote-logs
...
Feature remote logs
2018-03-23 12:55:17 -07:00
Alec Grieser
551ea9c7f8
Merge remote-tracking branch 'upstream/release-5.2' into master-release-5.2-merge
2018-03-19 12:34:50 -07:00
yichic
ede5cab192
Merge pull request #89 from yichic/share-log-mutations-5.2
...
Share log mutations 5.2
2018-03-19 12:01:26 -07:00
Yichi Chiang
1f2602d2b3
Fix all review comments
2018-03-19 11:33:33 -07:00
Yichi Chiang
d6559b144f
Share log mutations between backups and DRs which have the same backup range
2018-03-19 11:32:50 -07:00
Balachandar Namasivayam
9e3e3c8561
Add some sanity checks to deserialized data.
2018-03-16 18:45:25 -07:00
Yichi Chiang
f12c1d811c
Fix all review comments
2018-03-16 18:09:23 -07:00
Yichi Chiang
26b93ff920
Share log mutations between backups and DRs which have the same backup range
2018-03-16 18:09:23 -07:00
Balachandar Namasivayam
89d7cc1093
Minor Bug fixes...
2018-03-15 11:00:47 -07:00
Evan Tschannen
65b532658f
added support for single region configurations
2018-03-15 10:59:30 -07:00
Alec Grieser
0853fcb052
switch to using zu for some size_t variables in printf
2018-03-14 18:07:05 -07:00
Balachandar Namasivayam
856d2a0a9d
Add correctness tests for Client transaction profiling data format. It also includes format check across upgrades.
2018-03-14 12:39:50 -07:00
Alec Grieser
70a05c1a9b
fix some compiler whinges
2018-03-13 15:00:16 -07:00
Evan Tschannen
3abf4d7fdf
Merge branch 'master' into feature-remote-logs
2018-03-09 14:50:04 -08:00
Evan Tschannen
91bb8faa45
Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa'
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-03-09 14:47:03 -08:00
Evan Tschannen
28ea983487
Merge branch 'release-5.1' into release-5.2
...
# Conflicts:
# flow/Trace.cpp
# versions.target
2018-03-09 14:40:31 -08:00
Evan Tschannen
cf6dd1437b
suppress spammy trace events
2018-03-09 10:16:34 -08:00
Balachandar Namasivayam
e7309a3535
Add trace events to print the ranges in ConsistencyCheck.
2018-03-08 13:53:59 -08:00
Evan Tschannen
cf9d02cdbd
Merge pull request #48 from apple/release-5.2
...
Merge release-5.2 into master
2018-03-08 13:21:26 -08:00
Balachandar Namasivayam
4f58bca66a
Simple refactor of code...
2018-03-08 11:34:25 -08:00
Balachandar Namasivayam
1c1a497ea2
Refactor getKeyServers to be more readable.
...
Fix possible memory corruption by returning KeyRange instead of KeyRangeRef in getKeyServers.
Simplify getMasterProxies on DatabaseContext class.
2018-03-08 11:34:18 -08:00
Balachandar Namasivayam
03a40354e3
Having 1000 as the limit for GetKeyServerLocationsRequest sometimes generates large packet warnings. Reduce it to 100.
...
Fix the bug where some of the key server shards may not be fetched.
2018-03-08 11:34:11 -08:00
Evan Tschannen
1194e3a361
added region-based configuration to support a large variety of fearless setups. Currently only 1 primary 1 remote setups are allowed.
2018-03-05 19:27:46 -08:00
Balachandar Namasivayam
aea1f7ba21
Add tests for Client Transaction Profiling correctness
2018-03-05 18:55:23 -08:00
Alec Grieser
218b7a41e2
add APPEND_IF_FITS to workload and remove guard ; add command to vexillographer
2018-03-02 17:43:39 -08:00
Evan Tschannen
470f5c01f3
changed remoteDcId to a vector of ids, to support future configurations where there are multiple remote databases
2018-02-26 17:09:09 -08:00
Evan Tschannen
37a6a81634
Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
...
# Conflicts:
# fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Evan Tschannen
719bb5bd0c
Merge pull request #4 from bnamasivayam/getKeyServers-refactor
...
Having 1000 as the limit for Limit for GetKeyServerLocationsRequest s…
2018-02-22 12:39:48 -08:00
Balachandar Namasivayam
2fe2b522d5
Simple refactor of code...
2018-02-22 12:38:14 -08:00
Alec Grieser
e1162e9238
Merge remote-tracking branch 'upstream/release-5.1'
2018-02-22 11:16:12 -08:00
Balachandar Namasivayam
e2030db5a8
Refactor getKeyServers to be more readable.
...
Fix possible memory corruption by returning KeyRange instead of KeyRangeRef in getKeyServers.
Simplify getMasterProxies on DatabaseContext class.
2018-02-21 17:11:50 -08:00
Alec Grieser
0bae9880f1
remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py
2018-02-21 10:25:11 -08:00
Balachandar Namasivayam
6218934c7b
Having 1000 as the limit for GetKeyServerLocationsRequest sometimes generates large packet warnings. Reduce it to 100.
...
Fix the bug where some of the key server shards may not be fetched.
2018-02-20 17:41:34 -08:00
Alec Grieser
aadc06de99
Merge remote-tracking branch 'upstream/release-5.1'
2018-02-20 14:28:29 -08:00
Evan Tschannen
9ea963ddd6
fix: the master did not detect core state changes if it changed while writing
...
fix: do not attempt to use three_data_hall when in a fearless deployment
fix: log router tags are ephemeral and can be cleared after every recovery
2018-02-19 16:49:57 -08:00
Evan Tschannen
1b5628d2c5
testing a single configured fearless setup in simulated cluster
...
consolidated simulation connection disablers into one call in the tester
automatically reconfigure from a fearless setup in simulation
2018-02-18 12:59:43 -08:00
Stephen Atherton
54fc81b260
Improved backup error reporting in backup status. The most recent error for each error type is reported along with how long ago the error occurred, and errors are divided into two categories based on whether or not they occurred since the most recent backup progress.
2018-02-16 19:38:31 -08:00
Evan Tschannen
cb25564d38
simulated cluster supports fearless configurations
...
removed unused simulation variables
run the simulation with only 1 coordinator most of the time, since we protect the coordinator from being killed, and protecting too many things is bad for simulation
2018-02-15 18:32:39 -08:00
Evan Tschannen
5303962af6
re-enabled configure database and remove servers safely, even though they do not work with fearless
2018-02-14 16:07:23 -08:00
Evan Tschannen
1fedcba890
fix: do not use log router tags when configured without remote logs
...
fix: data distribution tracks undesired storage servers
re-enabled consistency check
2018-02-13 17:01:34 -08:00
Stephen Atherton
0a35f167e4
Merge branch 'master' into feature-redwood
...
# Conflicts:
# fdbserver/DiskQueue.actor.cpp
# fdbserver/IDiskQueue.h
# fdbserver/Knobs.cpp
# fdbserver/Knobs.h
# fdbserver/fdbserver.vcxproj
# fdbserver/fdbserver.vcxproj.filters
# fdbserver/worker.actor.cpp
2018-02-12 01:30:02 -08:00
Evan Tschannen
42405c78a5
Merge commit '4038bd2fd968d88861f2cebd442ce511724816cb' into feature-remote-logs
...
# Conflicts:
# fdbserver/ClusterController.actor.cpp
# fdbserver/Knobs.cpp
2018-02-10 12:08:52 -08:00
Evan Tschannen
ebd94bb654
removed a separately configurable storage team size for the remote data center, because it did not make sense
...
fix: the master did not monitor for the failure of remote logs
stop merge attempts when a data center is failed
fixed a variety of other problems with data distribution when a data center is failed
2018-02-02 11:46:04 -08:00
A.J. Beamon
0c601d6f85
Purge past version references
2018-01-31 12:05:41 -08:00
Evan Tschannen
698ef4117e
Merge branch 'master' into feature-remote-logs
2018-01-20 10:34:30 -08:00
Evan Tschannen
b78e0a362a
fix: do not pause when running multiple backup tests simultaneously
2018-01-18 12:24:33 -08:00
Stephen Atherton
93b34a945f
Major usability and performance improvements to backup management. Backup descriptions now calculate and display timestamps using TimeKeeper data (if given a cluster) and restorability of snapshots. Expire now requires a --force option to leave a backup unrestorable or unrestorable after a given point in time, specified by version or timestamp. BackupContainerFilesystem now maintains metadata on key version boundaries in order to avoid large list operations for describe and expire operations. Blob parallel recursive list operations can now take a path (aka prefix) filter function. New describe and expire options are available in fdbbackup.
2018-01-17 04:09:43 -08:00
Evan Tschannen
3ec45d38a0
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Stephen Atherton
b86f68ceb8
Added new test that combines atomic backup/restore. Added randomization to delays in AtomicRestore workload.
2018-01-05 14:43:21 -08:00
Evan Tschannen
5ac4f73978
Merge branch 'release-5.1' into feature-remote-logs
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/Locality.h
# fdbrpc/simulator.h
# fdbserver/ApplyMetadataMutation.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/masterserver.actor.cpp
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Evan Tschannen
f8f1c48d83
sometimes test pausing backups
2018-01-04 11:40:08 -08:00
Evan Tschannen
86958cb08d
Merge pull request #226 from cie/fix-taskBucket-unblockFuture
...
Modify TaskBucketCorrectness to support chain and multiple tasks
2017-12-20 18:00:54 -08:00
Yichi Chiang
91e5abeaa6
Modify TaskBucketCorrectness to support chain and multiple tasks
2017-12-20 17:02:49 -08:00
Evan Tschannen
982f0dcb1e
Merge pull request #222 from cie/alexmiller/drtimefix2
...
Fix yet another VersionStamp DR issue.
2017-12-20 15:09:23 -08:00
Stephen Atherton
e0d9cea008
Merge branch 'master' into continuous-backup
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
# fdbrpc/BlobStore.actor.cpp
2017-12-19 23:02:14 -08:00
Alex Miller
c7dbd31a1e
Refactoring: Create a common prefixRange and do UID->Key once in backup.
2017-12-19 17:17:50 -08:00
Stephen Atherton
e28641886d
TraceEvent improvements. Minor bug fix, restore log writing tasks didn't have the log file endVersion but it's only for logging purposes.
2017-12-19 15:27:04 -08:00
Evan Tschannen
1dc9eceb6d
optimize GetKeyLocationRequests on the proxy so they only require a single map lookup, instead of doing 3 + (3* [number of ranges]) lookups
2017-12-15 20:13:44 -08:00
Stephen Atherton
33f9f1a95c
Added SnapshotDispatch task for writing snapshots in random order over a specified period of time and adapting speed to a growing or shrinking database. TaskBucket now supports scheduling tasks. TaskFuture now correctly recognizes multiple tasks in its callback space. TaskBucket extendTimeout() now supports specifying the new timeout version. Submitting a backup now requires a snapshot duration.
2017-12-14 01:44:38 -08:00
Evan Tschannen
7ce93426ed
fix: connection disabler in removeServerSafely needs to run for the whole test to avoid getting stuck on include all
2017-12-12 18:38:57 -08:00
Alec Grieser
4495a19299
Merge pull request #220 from cie/alexmiller/flowprofcircus
...
Add class restrictions to CpuProfiler, and fix metric crash.
2017-12-11 14:13:22 -08:00
Evan Tschannen
73a0a07eac
clients ask for key location information directly from the proxy, instead of reading it from the database
2017-12-09 16:10:22 -08:00
Alex Miller
48660e9ce5
Add class restrictions to CpuProfiler, and fix metric crash.
...
This change largely refactors away the old meaning of the value given to
flow_profiler, which was the number of machines that we'd be profiling, and
instead replaces it with the classes of processes to profile for the duration
of the test. Most importantly, this means that one can profile in circus with
a configuration that has "ssd" in it, and the circus run will still complete
(as long as the argument isn't "storage").
And also finally add some other fixes I had to the same file to conditionally
change the name of the metric we're looking for to comply with what's actually
written.
2017-12-07 19:28:29 -08:00
Stephen Atherton
f8e89a40ac
Bug fixes, take(1) is incorrect usage of FlowLock.
2017-12-04 10:25:47 -08:00
Yichi Chiang
8ba0eaebff
Check cluster controller using desired process class in consistency check
2017-11-29 15:09:23 -08:00
Stephen Atherton
6695c9e6a2
Bug fixes and improvements to error handling and trace events. The most serious bug was that restore would start at the wrong version, possibly skipping early log and range files.
2017-11-25 00:46:16 -08:00
Stephen Atherton
a77162b53d
Merge branch 'master' into backup-container-refactor
...
# Conflicts:
# fdbclient/BackupAgent.h
# fdbclient/FileBackupAgent.actor.cpp
# fdbclient/KeyBackedTypes.h
2017-11-15 08:14:47 -08:00
Stephen Atherton
3dfaf13b67
IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored are now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius as the previous backup container was just a thin wrapper that presented a single level list of files and offered no methods for managing or interpreting the file structure so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse.
...
Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup. The addition of IBackupFile and its finish() method simplified the log and range writer tasks. Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes. Added KeyBackedSet<T> type. Moved JSONDoc to its own header. Added platform::findFilesRecursively().
Still to do: update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.
2017-11-14 23:33:17 -08:00
Alex Miller
311d1ca87d
A variety of fixes that collectively fix using flow profiling in circus.
...
To run, use --co=flow_profiling=-1, because reasons.
2017-11-07 13:55:16 -08:00
Evan Tschannen
57aba0b3bc
fix: excluded servers were the same fitness as storage servers for the master role
...
fix: better master exists did not consider exclusion for master fitness
2017-11-03 17:09:14 -07:00
Evan Tschannen
54d82c0d92
Merge pull request #194 from cie/alexmiller/valgrind
...
Fix valgrind errors
2017-10-27 17:25:12 -07:00
Alex Miller
e0d33ef8d7
Preemptively fix profiler-related valgrind errors/straight out bugs.
...
I forgot to initialize some fields in requests.
2017-10-27 17:20:19 -07:00
Balachandar Namasivayam
cfefab18fb
Merge branch 'master' into add-new-atomic-ops
2017-10-25 18:03:34 -07:00
Balachandar Namasivayam
3d5658940a
Addressed Review Comments
2017-10-25 16:42:05 -07:00
Balachandar Namasivayam
9dd588dcce
Addressed review comments.
...
Changed naming for NewMin and NewAnd to MinV2 and AndV2
2017-10-25 14:48:05 -07:00
Balachandar Namasivayam
2f6d55a52f
Add correctness tests for all atomic ops
2017-10-25 13:36:49 -07:00
Yichi Chiang
c2a117fe07
Merge pull request #189 from cie/enable-check-desired-class
...
Enable checkUsingDesiredClasses() in consistency check
2017-10-24 15:18:21 -07:00
Yichi Chiang
defdc6550d
Exclude excluded processes when getting testers
2017-10-24 15:16:34 -07:00
Yichi Chiang
3865c5ae0e
Enable checkUsingDesiredClasses() in consistency check
2017-10-24 12:58:54 -07:00
Balachandar Namasivayam
8c3bdc5b3b
Make atomic ops differentiate between unset and empty values.
2017-10-23 16:48:13 -07:00
Evan Tschannen
7a36fd2134
disabled a variety of simulation tests to get correctness clean
2017-10-19 15:49:54 -07:00
Evan Tschannen
e2c1e87df6
made a large number of fixes to make fearless DR correctness clean.
2017-10-19 15:36:32 -07:00
Bhaskar Muppana
360b777b78
Fail with correct error code in case of abort or discontinue of
...
non-existing backups.
2017-10-18 23:17:48 -07:00
Alec Grieser
dd6d8f3b0e
Merge branch 'master' into add-new-atomic-ops
2017-10-18 16:36:44 -07:00
Bhaskar Muppana
314511f4d7
Fixing spaces in BackupCorrectness TraceEvents.
2017-10-18 14:27:52 -07:00
Alex Miller
7b9bc1d715
Merge pull request #170 from cie/alexmiller/flowprofile
...
Add support for profiling a running fdb cluster to fdbcli, fix security issues, and add an improved backtrace.
2017-10-16 16:51:53 -07:00
Alex Miller
91a26a170c
Add toggleable profiling support to fdbserver+fdbcli.
...
This adds the fdbcli commands:
* profile list -- Lists all workers in a way that doesn't fill `kill`'s list.
* profile flow run -- Allows starting flow profiling on a set of hosts for a specified interval.
And threads through all the support for enabling and disabling profiling as an RPC.
2017-10-16 16:05:02 -07:00
Bhaskar Muppana
d1e9d28239
Backup log messages.
2017-10-12 16:12:42 -07:00
Stephen Atherton
11517f7bfc
Merge branch 'master' into continuous-backup
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
2017-10-12 11:03:23 -07:00
Balachandar Namasivayam
8e0bea2795
Update API_VERSION from 500 to 510
2017-10-11 13:49:38 -07:00
Evan Tschannen
8feb3b8fbc
fixed conflict range workload by just disabling timeKeeper instead of the check, because it should be a more robust fix
2017-10-10 16:01:02 -07:00
Balachandar Namasivayam
eeebf10030
Modified existing behavior of MIN and AND atomic ops. The new behavior results in a 'SET' if the atomic op is performed on a non-existing key.
...
Added new atomic ops ByteMin and ByteMax that do lexicographic comparison of byte strings.
2017-10-10 13:02:22 -07:00
Evan Tschannen
c8525dc3e7
timekeeper is constantly changing keys in the system keyspace, so do not report errors on key mismatches on keys in the system keyspace
2017-10-10 12:04:56 -07:00
Alex Miller
a21c8a820b
Move cpuProfilerRequest from WorkerInterface to ClientWorkerInterface.
...
A way to access this stream is required if we wish to be able to toggle
profiling from fdbcli. There are two ways to do this:
1. Use `monitorLeader()` to get a `ClusterControllerFullInterface`, and use
`getWorkers` from there to get a list of `WorkerInterface`s, from which we can
access cpuProfilerRequest.
2. Move cpuProfilerRequest to ClientWorkerInterface and use the existing code
in the client that can fetch a list of all `ClientWorkerInterface`s.
The split between WorkerInterface and ClientWorkerInterface appears to be
what a client might have a need to call versus what is fdbserver-internal (and
thus no client should even want to call). Thus, it seems to make more sense to
acknowledge that profiling is useful to be able to toggle from a client, and go
with option (2).
2017-10-05 14:08:28 -07:00
Alex Miller
11668bb359
Fixing code review comments.
2017-09-29 15:58:36 -07:00
Alex Miller
b7ce9d996c
Comment out verbose TraceEvents in preparation for pushing.
2017-09-29 15:58:36 -07:00
Alex Miller
c40c1bb5fe
Add a new workload: BackupToDBAbort, which does an ACI switchover.
...
This is to allow easier testing of non-durable switchovers without having to
wiggle into BackupToDBCorrectness's view of the world.
2017-09-29 15:58:36 -07:00
Alex Miller
9e9a96ae76
Make VersionStamp workload able to run with DR-style workloads.
...
* It is now tolerant of locked database errors, and handles them correctly.
* There is an option to specify which database to verify against.
2017-09-29 15:58:36 -07:00
Alex Miller
34630b6130
Make VersionStamp workload able to handle commit_unknown_result.
...
Previously, if a transaction failed with commit_unknown_result, and was
actually committed, it would look like data that magically appeared in the
database and verification would fail.
Now, we explicitly re-read and check to see if the commit happened, so that we
may maintain an accurate understanding of what the database state should be.
2017-09-29 15:58:36 -07:00
Alex Miller
23945b9fea
VersionStamp can co-exist with other workloads that write data to the database.
...
VersionStamp previously would range-read the entire database during validation.
This has the unfortunate effect of making it fail during validation if run with
any other workload that writes keys to the database.
Now, all keys written and read are done with a configurable prefix, so that it
may co-exist with a variety of other workloads.
2017-09-29 15:58:36 -07:00
Alex Miller
370a6afb80
Make VersionStamp have an option to be tolerant of data being lost.
2017-09-29 15:58:36 -07:00
Alex Miller
69523ce151
Hackish version of a test, but it does fail.
2017-09-29 15:58:36 -07:00
Alex Miller
65713b226f
Fix whitespace and line endings.
2017-09-29 15:58:36 -07:00
Bhaskar Muppana
91975244fe
Fixing OSX build.
2017-09-28 19:35:44 -07:00
Bhaskar Muppana
942c04e992
Merge pull request #162 from bmuppana/master
...
Fixing TimeKeeperCorrectness to deal with network delays.
2017-09-28 17:04:39 -07:00
Bhaskar Muppana
3d2bafc3a6
Fixing TimeKeeperCorrectness to deal with network delays.
2017-09-28 16:52:28 -07:00
Evan Tschannen
ef41b07bb3
renamed past_version to transaction_too_old
...
implemented read_lock_aware option
2017-09-28 16:35:08 -07:00
Evan Tschannen
7b60e26660
Merge pull request #160 from cie/use-error-descriptions
...
Add the ability to access name and description in Error. Update error…
2017-09-28 16:00:39 -07:00
Evan Tschannen
73fca75239
added the ability to disable timeKeeper; disabled timeKeeper before consistency check in simulation
2017-09-28 13:13:24 -07:00
A.J. Beamon
d30c730f75
Add the ability to access name and description in Error. Update error descriptions.
2017-09-28 12:35:03 -07:00
Bhaskar Muppana
0f8ff26029
Merge pull request #158 from bmuppana/master
...
<rdar://problem/34557380> Need a way to map real time to version
2017-09-27 17:56:42 -07:00
Bhaskar Muppana
6a0b1d6808
Fixing PR comments
...
<rdar://problem/34557380> Need a way to map real time to version
2017-09-27 17:56:01 -07:00
Evan Tschannen
2bf042a559
fix: file_corrupt was not checking for fault injection
...
latency threshold was too long
2017-09-25 17:22:41 -07:00
Bhaskar Muppana
0bf5bdb23a
<rdar://problem/34557380> Need a way to map real time to version
2017-09-25 12:51:37 -07:00
Stephen Atherton
248dab79b6
Created “redwood” storage engine option and many changes to support that including IKeyValueStore::init() and custom DiskQueue file extensions.
2017-09-21 23:51:55 -07:00
Evan Tschannen
e8b895c878
added the ability to disable connection failures for a period of time after one happens
2017-09-18 12:46:29 -07:00
Evan Tschannen
489332533c
all timeouts longer than two minutes can be lowered to 60.0 with buggification
...
added a workload that tries for a 50 second maximum latency in the presence of one failure with both buggification and connection failures
2017-09-18 11:04:51 -07:00
Evan Tschannen
34f987f56d
added a test in simulation which ensures that a recovery after a single failure takes less than 15 seconds
2017-09-15 17:55:01 -07:00
A.J. Beamon
4fa2415553
Merge branch 'release-5.0'
2017-09-08 17:28:12 -07:00
A.J. Beamon
bb8a245bdb
circus: throughput test scales latency error by the target latency
2017-09-08 17:27:54 -07:00
Bhaskar Muppana
c7df951f7c
Using BackupConfig from backup.actor.cpp to reduce intermediate
...
functions.
2017-09-07 08:36:36 -07:00
Bhaskar Muppana
fe208d6adf
Merge branch 'master' of github.com:apple/foundationdb into backup
2017-09-06 10:01:55 -07:00
Bhaskar Muppana
83810edabc
Backup/Restore tag can be std::string instead of Key.
2017-09-05 11:38:40 -07:00
A.J. Beamon
cc24072a5d
Add the multi version API to the list of APIs to choose in the APICorrectness tester. Support for the multi-version client already existed.
2017-08-31 16:23:55 -07:00
Evan Tschannen
d61be4c760
Merge branch 'release-5.0'
2017-08-30 12:59:24 -07:00
Evan Tschannen
963e1c3f31
fix: we need to reboot the process even if it will result in too many files, because the check will not succeed without it
2017-08-30 12:58:46 -07:00
Alvin Moore
6020d70863
Added trace event to track reboots initiated by ConsistencyCheck workload in simulation
2017-08-29 11:41:27 -07:00
Alvin Moore
c95a1be5ec
Add trace event for rebooting process during simulation for consistency check
2017-08-29 11:00:44 -07:00
A.J. Beamon
9a0a3b6329
Merge commit '66528becb82d826e81fa644bb378212584ab580e'
2017-08-28 16:47:59 -07:00
Alvin Moore
44e0df78c5
Added support for tracking roles for simulation workers
...
Fixed the exclusion and inclusion address simulation API and integration within workloads
Added more information within trace events for simulation
2017-08-28 11:25:37 -07:00
Alvin Moore
581bd6c8ed
Added option to delay the displaying of the simulation workers
2017-08-28 10:53:56 -07:00
Alec Grieser
300b5a17ed
Merge branch 'release-5.0'
2017-08-25 18:55:33 -07:00
Evan Tschannen
272b4b984c
fix: fixed a rare bug where we do not wait for a file in the process of being deleted to shut down before rebooting a machine
2017-08-25 10:12:58 -07:00
Evan Tschannen
26a5b5e422
rollback workload now clogs the communication between one of the proxies and the tlogs, since that is what will cause a rollback
2017-08-23 16:08:13 -07:00
Alvin Moore
8056b78414
Merge branch 'release-5.0'
2017-08-22 13:51:19 -07:00
Alvin Moore
814e471689
Added support for displaying initial workers via printf within simulation using a workload
2017-08-22 13:38:24 -07:00
Evan Tschannen
47a37f3f1e
Merge pull request #135 from cie/switch-for-data-distribution
...
Add a switch to turn off data distribution in CLI
2017-08-07 12:54:08 -07:00
Alec Grieser
ca7437ecf6
Merge branch 'release-5.0'
2017-08-02 22:07:01 -07:00
John King
d0fbc41338
set LOCK_AWARE on several transactions used for getting cluster info for the consistency check
2017-07-28 18:50:32 -07:00
Yichi Chiang
6a8a5c41b0
Add a switch to turn off data distribution in CLI
2017-07-28 18:14:55 -07:00
Yichi Chiang
53e1ae9f60
shard system keyspace
2017-07-26 13:47:31 -07:00
Alvin Moore
96f40d8eb0
Added support for optionally re-including excluded servers
...
Removed unneeded code to protect coordinators
Ensured that simulation exclusion list is updated for excluded processes
2017-06-19 16:51:07 -07:00
Evan Tschannen
766dc23e26
fix: do not use TLS in protectedAddresses
2017-06-02 13:52:21 -07:00
Evan Tschannen
2d0dbd57e8
randomized the delays in atomic switchover workload
2017-06-01 12:08:21 -07:00
Evan Tschannen
1626e16377
Merge branch 'release-4.6' into release-5.0
2017-05-31 16:23:37 -07:00
Alvin Moore
333f2e4865
Added connection fault disabler within setup of backup submission. It should be reviewed to determine the amount of time to wait before disabling
2017-05-31 14:21:50 -07:00
Stephen Atherton
98604d33a0
Merge branch 'fix-io-timeout-handling'
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/sim2.actor.cpp
# fdbserver/KeyValueStoreSQLite.actor.cpp
# fdbserver/optimisttest.actor.cpp
# fdbserver/worker.actor.cpp
# fdbserver/workloads/MachineAttrition.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2017-05-26 18:43:08 -07:00
Alvin Moore
ba606c2135
Removed NOT_IN_CLEAN macro from file
2017-05-26 14:52:06 -07:00
Alvin Moore
b28ed397a2
Fixed printf field width specifier to reduce compilation warnings within OS X
2017-05-26 14:51:34 -07:00
Alvin Moore
0b9ed67e12
Fixed support for RemoveServers Workload
...
Added availability functions to simulation
2017-05-26 14:20:11 -07:00
Alvin Moore
16cc0821b1
Removed dead machine option from simulation
2017-05-25 16:29:02 -07:00
FDB Dev Team
a674cb4ef4
Initial repository commit
2017-05-25 13:48:44 -07:00