A.J. Beamon
0c601d6f85
Purge past version references
2018-01-31 12:05:41 -08:00
Evan Tschannen
698ef4117e
Merge branch 'master' into feature-remote-logs
2018-01-20 10:34:30 -08:00
Evan Tschannen
b78e0a362a
fix: do not pause when running multiple backup tests simultaneously
2018-01-18 12:24:33 -08:00
Stephen Atherton
93b34a945f
Major usability and performance improvements to backup management. Backup descriptions now calculate and display timestamps using TimeKeeper data (if given a cluster) and restorability of snapshots. Expire now requires a --force option to leave a backup unrestorable or unrestorable after a given point in time, specified by version or timestamp. BackupContainerFilesystem now maintains metadata on key version boundaries in order to avoid large list operations for describe and expire operations. Blob parallel recursive list operations can now take a path (aka prefix) filter function. New describe and expire options are available in fdbbackup.
2018-01-17 04:09:43 -08:00
Evan Tschannen
3ec45d38a0
Merge branch 'master' into feature-remote-logs
...
# Conflicts:
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-06 13:54:45 -08:00
Stephen Atherton
b86f68ceb8
Added new test that combines atomic backup/restore. Added randomization to delays in AtomicRestore workload.
2018-01-05 14:43:21 -08:00
Evan Tschannen
5ac4f73978
Merge branch 'release-5.1' into feature-remote-logs
...
# Conflicts:
# fdbclient/NativeAPI.actor.cpp
# fdbrpc/Locality.h
# fdbrpc/simulator.h
# fdbserver/ApplyMetadataMutation.h
# fdbserver/ClusterController.actor.cpp
# fdbserver/LogSystemPeekCursor.actor.cpp
# fdbserver/MasterProxyServer.actor.cpp
# fdbserver/SimulatedCluster.actor.cpp
# fdbserver/TLogServer.actor.cpp
# fdbserver/TagPartitionedLogSystem.actor.cpp
# fdbserver/WorkerInterface.h
# fdbserver/masterserver.actor.cpp
# flow/Net2.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2018-01-05 11:33:42 -08:00
Evan Tschannen
f8f1c48d83
sometimes test pausing backups
2018-01-04 11:40:08 -08:00
Evan Tschannen
86958cb08d
Merge pull request #226 from cie/fix-taskBucket-unblockFuture
...
Modify TaskBucketCorrectness to support chain and multiple tasks
2017-12-20 18:00:54 -08:00
Yichi Chiang
91e5abeaa6
Modify TaskBucketCorrectness to support chain and multiple tasks
2017-12-20 17:02:49 -08:00
Evan Tschannen
982f0dcb1e
Merge pull request #222 from cie/alexmiller/drtimefix2
...
Fix yet another VersionStamp DR issue.
2017-12-20 15:09:23 -08:00
Stephen Atherton
e0d9cea008
Merge branch 'master' into continuous-backup
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
# fdbrpc/BlobStore.actor.cpp
2017-12-19 23:02:14 -08:00
Alex Miller
c7dbd31a1e
Refactoring: Create a common prefixRange and do UID->Key once in backup.
2017-12-19 17:17:50 -08:00
Stephen Atherton
e28641886d
TraceEvent improvements. Minor bug fix, restore log writing tasks didn't have the log file endVersion but it's only for logging purposes.
2017-12-19 15:27:04 -08:00
Evan Tschannen
1dc9eceb6d
optimize GetKeyLocationRequests on the proxy so they only require a single map lookup, instead of doing 3 + (3* [number of ranges]) lookups
2017-12-15 20:13:44 -08:00
Stephen Atherton
33f9f1a95c
Added SnapshotDispatch task for writing snapshots in random order over a specified period of time and adapting speed to a growing or shrinking database. TaskBucket now supports scheduling tasks. TaskFuture now correctly recognizes multiple tasks in its callback space. TaskBucket extendTimeout() now supports specifying the new timeout version. Submitting a backup now requires a snapshot duration.
2017-12-14 01:44:38 -08:00
Evan Tschannen
7ce93426ed
fix: connection disabler in removeServerSafely needs to run for the whole test to avoid getting stuck on include all
2017-12-12 18:38:57 -08:00
Alec Grieser
4495a19299
Merge pull request #220 from cie/alexmiller/flowprofcircus
...
Add class restrictions to CpuProfiler, and fix metric crash.
2017-12-11 14:13:22 -08:00
Evan Tschannen
73a0a07eac
clients ask for key location information directly from the proxy, instead of reading it from the database
2017-12-09 16:10:22 -08:00
Alex Miller
48660e9ce5
Add class restrictions to CpuProfiler, and fix metric crash.
...
This change largely refactors away the old meaning of the value given to
flow_profiler, which was the number of machines that we'd be profiling, and
instead replaces it with the classes of processes to profile for the duration
of the test. Most importantly, this means that one can profile in circus with
a configuration that has "ssd" in it, and the circus run will still complete
(as long as the argument isn't "storage").
And also finally add some other fixes I had to the same file to conditionally
change the name of the metric we're looking for to comply with what's actually
written.
2017-12-07 19:28:29 -08:00
Stephen Atherton
f8e89a40ac
Bug fixes, take(1) is incorrect usage of FlowLock.
2017-12-04 10:25:47 -08:00
Yichi Chiang
8ba0eaebff
Check cluster controller using desired process class in consistency check
2017-11-29 15:09:23 -08:00
Stephen Atherton
6695c9e6a2
Bug fixes and improvements to error handling and trace events. The most serious bug was that restore would start at the wrong version, possibly skipping early log and range files.
2017-11-25 00:46:16 -08:00
Stephen Atherton
a77162b53d
Merge branch 'master' into backup-container-refactor
...
# Conflicts:
# fdbclient/BackupAgent.h
# fdbclient/FileBackupAgent.actor.cpp
# fdbclient/KeyBackedTypes.h
2017-11-15 08:14:47 -08:00
Stephen Atherton
3dfaf13b67
IBackupContainer has been rewritten to be a logical interface for storing, reading, deleting, expiring, and querying backup data. The details of how the data is organized or stored is now hidden from users of the interface. Both the local and blobstore containers have been rewritten, the key changes being a multi level directory structure and no more use of temporary files or pseudo-symlinks in the blob store implementation. This refactor has a large impact radius as the previous backup container was just a thin wrapper that presented a single level list of files and offered no methods for managing or interpreting the file structure so all of that logic was spread around other places in the code base. This made moving to the new blob store schema very messy, and without this refactor further changes in the future would only be worse.
...
Several backup tasks have been cleaned up / simplified because they no longer need to manage the ‘raw’ structure of the backup. The addition of IBackupFile and its finish() method simplified the log and range writer tasks. Updated BlobStoreEndpoint to support now-required bucket creation and bucket listing prefix/delimiter options for finding common prefixes. Added KeyBackedSet<T> type. Moved JSONDoc to its own header. Added platform::findFilesRecursively().
Still to do: update command line tool to use new IBackupContainer interface, fix bugs in Restore startup.
2017-11-14 23:33:17 -08:00
Alex Miller
311d1ca87d
A variety of fixes that collectively fix using flow profiling in circus.
...
To run, use --co=flow_profiling=-1, because reasons.
2017-11-07 13:55:16 -08:00
Evan Tschannen
57aba0b3bc
fix: excluded servers were the same fitness as storage servers for the master role
...
fix: better master exists did not considers exclusion for master fitness
2017-11-03 17:09:14 -07:00
Evan Tschannen
54d82c0d92
Merge pull request #194 from cie/alexmiller/valgrind
...
Fix valgrind errors
2017-10-27 17:25:12 -07:00
Alex Miller
e0d33ef8d7
Preemptively fix profiler-related valgrind errors/straight out bugs.
...
I forgot to initialize some fields in requests.
2017-10-27 17:20:19 -07:00
Balachandar Namasivayam
cfefab18fb
Merge branch 'master' into add-new-atomic-ops
2017-10-25 18:03:34 -07:00
Balachandar Namasivayam
3d5658940a
Addressed Review Comments
2017-10-25 16:42:05 -07:00
Balachandar Namasivayam
9dd588dcce
Addressed review comments.
...
Changed naming for NewMin and NewAnd to MinV2 and AndV2
2017-10-25 14:48:05 -07:00
Balachandar Namasivayam
2f6d55a52f
Add correctness tests for all atomic ops
2017-10-25 13:36:49 -07:00
Yichi Chiang
c2a117fe07
Merge pull request #189 from cie/enable-check-desired-class
...
Enable checkUsingDesiredClasses() in consistency check
2017-10-24 15:18:21 -07:00
Yichi Chiang
defdc6550d
Exclude excluded processses when getting testers
2017-10-24 15:16:34 -07:00
Yichi Chiang
3865c5ae0e
Enable checkUsingDesiredClasses() in consistency check
2017-10-24 12:58:54 -07:00
Balachandar Namasivayam
8c3bdc5b3b
Make atomic ops differentiate between unset and empty values.
2017-10-23 16:48:13 -07:00
Evan Tschannen
7a36fd2134
disabled a variety of simulation tests to get correctness clean
2017-10-19 15:49:54 -07:00
Evan Tschannen
e2c1e87df6
made a large number of fixes to make fearless DR correctness clean.
2017-10-19 15:36:32 -07:00
Bhaskar Muppana
360b777b78
Fail with correct error code in case of abort or discontinue of
...
non-existing backups.
2017-10-18 23:17:48 -07:00
Alec Grieser
dd6d8f3b0e
Merge branch 'master' into add-new-atomic-ops
2017-10-18 16:36:44 -07:00
Bhaskar Muppana
314511f4d7
Fixing spaces in BackupCorrectness TraceEvents.
2017-10-18 14:27:52 -07:00
Alex Miller
7b9bc1d715
Merge pull request #170 from cie/alexmiller/flowprofile
...
Add support for profiling a running fdb cluster to fdbcli, fix security issues, and add an improved backtrace.
2017-10-16 16:51:53 -07:00
Alex Miller
91a26a170c
Add toggleable profiling support to fdbserver+fdbcli.
...
This adds the fdbcli commands:
* profile list -- Lists all workers in a way that doesn't fill `kill`'s list.
* profile flow run -- Allows starting flow profiling on a set of hosts for a specified interval.
And threads through all the support for enabling and disabling profiling as an RPC.
2017-10-16 16:05:02 -07:00
Bhaskar Muppana
d1e9d28239
Backup log messages.
2017-10-12 16:12:42 -07:00
Stephen Atherton
11517f7bfc
Merge branch 'master' into continuous-backup
...
# Conflicts:
# fdbclient/FileBackupAgent.actor.cpp
2017-10-12 11:03:23 -07:00
Balachandar Namasivayam
8e0bea2795
Update API_VERSION from 500 to 510
2017-10-11 13:49:38 -07:00
Evan Tschannen
8feb3b8fbc
fixed conflict range workload by just disabling timeKeeper instead of the check, because it should be a more robust fix
2017-10-10 16:01:02 -07:00
Balachandar Namasivayam
eeebf10030
Modified existing behavior of MIN and AND atomic ops. The new behavior results in a 'SET' if the atomic op is performed on a non -existing key.
...
Added new atomic ops ByteMin and ByteMax that does lexicographic comparison of byte strings.
2017-10-10 13:02:22 -07:00
Evan Tschannen
c8525dc3e7
timekeeper is constantly changing keys in the system keyspace, so do not report errors on key mismatches on keys in the system keyspace
2017-10-10 12:04:56 -07:00
Alex Miller
a21c8a820b
Move cpuProfilerRequest from WorkerInterface to ClientWorkerInterface.
...
A way to access this stream is required if we wish to be able to toggle
profiling from fdbcli. There's two ways to do this:
1. Use `monitorLeader()` to get a `ClusterControllerFullInterface`, and use
`getWorkers` from there to get a list of `WorkerInterface`s, from which we can
access cpuProfilerRequest.
2. Move cpuProfilerRequest to ClientWorkerInterface and use the existing code
in the client that can fetch a list of all `ClientWorkerInterface`s.
The split between WorkerInterface and ClientWorkerInterface appears to be
what a client might have a need to call versus what is fdbserver-internal (and
thus no client should even want to call). Thus, it seems to make more sense to
acknowledge that profiling is useful to be able to toggle from a client, and go
with option (2).
2017-10-05 14:08:28 -07:00
Alex Miller
11668bb359
Fixing code review comments.
2017-09-29 15:58:36 -07:00
Alex Miller
b7ce9d996c
Comment out verbose TraceEvents in preparation for pushing.
2017-09-29 15:58:36 -07:00
Alex Miller
c40c1bb5fe
Add a new workload: BackupToDBAbort, which does an ACI switchover.
...
This is to allower easier testing of non-durable switchovers without having to
wiggle into BackupToDBCorrectness's view of the world.
2017-09-29 15:58:36 -07:00
Alex Miller
9e9a96ae76
Make VersionStamp workload able to run with DR-style workloads.
...
* It is now tolerant of locked database errors, and handles them correctly.
* There is an option to specify which database to verify against.
2017-09-29 15:58:36 -07:00
Alex Miller
34630b6130
Make VersionStamp workload can handle commit_unknown_result.
...
Previously, if a transaction failed with commit_unknown_result, and was
actually committed, it would look like data that magically appeared in the
database and verification would fail.
Now, we explicitly re-read and check to see if the commit happened, so that we
may maintain an accurate understanding of what the database state should be.
2017-09-29 15:58:36 -07:00
Alex Miller
23945b9fea
VersionStamp can co-exist with other workloads that write data to the database.
...
VersionStamp previously would range-read the entire database during validation.
This has the unfortunate effect of making it fail during validation if run with
any other workload that writes keys to the database.
Now, all keys written and read are done with a configurable prefix, so that it
may co-exist with a variety of other workloads.
2017-09-29 15:58:36 -07:00
Alex Miller
370a6afb80
Make VersionStamp have an option to be tolerant of data being lost.
2017-09-29 15:58:36 -07:00
Alex Miller
69523ce151
Hackish version of a test, but it does fail.
2017-09-29 15:58:36 -07:00
Alex Miller
65713b226f
Fix whitespace and line endings.
2017-09-29 15:58:36 -07:00
Bhaskar Muppana
91975244fe
Fixing OSX build.
2017-09-28 19:35:44 -07:00
Bhaskar Muppana
942c04e992
Merge pull request #162 from bmuppana/master
...
Fixing TimeKeeperCorrectness to deal with network delays.
2017-09-28 17:04:39 -07:00
Bhaskar Muppana
3d2bafc3a6
Fixing TimeKeeperCorrectness to deal with network delays.
2017-09-28 16:52:28 -07:00
Evan Tschannen
ef41b07bb3
renamed past_version to transaction_too_old
...
implemented read_lock_aware option
2017-09-28 16:35:08 -07:00
Evan Tschannen
7b60e26660
Merge pull request #160 from cie/use-error-descriptions
...
Add the ability to access name and description in Error. Update error…
2017-09-28 16:00:39 -07:00
Evan Tschannen
73fca75239
added the ability to disable timeKeeper; disabled timeKeeper before consistency check in simulation
2017-09-28 13:13:24 -07:00
A.J. Beamon
d30c730f75
Add the ability to access name and description in Error. Update error descriptions.
2017-09-28 12:35:03 -07:00
Bhaskar Muppana
0f8ff26029
Merge pull request #158 from bmuppana/master
...
<rdar://problem/34557380> Need a way to map real time to version
2017-09-27 17:56:42 -07:00
Bhaskar Muppana
6a0b1d6808
Fixing PR comments
...
<rdar://problem/34557380> Need a way to map real time to version
2017-09-27 17:56:01 -07:00
Evan Tschannen
2bf042a559
fix: file_corrupt was not checking for fault injection
...
latency threshold was too long
2017-09-25 17:22:41 -07:00
Bhaskar Muppana
0bf5bdb23a
<rdar://problem/34557380> Need a way to map real time to version
2017-09-25 12:51:37 -07:00
Stephen Atherton
248dab79b6
Created “redwood” storage engine option and many changes to support that including IKeyValueStore::init() and custom DiskQueue file extensions.
2017-09-21 23:51:55 -07:00
Evan Tschannen
e8b895c878
added the ability to disable connection failures for a period of time after one happens
2017-09-18 12:46:29 -07:00
Evan Tschannen
489332533c
all timeouts longer than two minutes have been can be lowered to 60.0 with buggification
...
added a workload that tries for a 50 second maximum latency in the presence of one failure with both buggification and connection failures
2017-09-18 11:04:51 -07:00
Evan Tschannen
34f987f56d
added a test in simulation which ensures that a recovery after a single failure takes less than 15 seconds
2017-09-15 17:55:01 -07:00
A.J. Beamon
4fa2415553
Merge branch 'release-5.0'
2017-09-08 17:28:12 -07:00
A.J. Beamon
bb8a245bdb
circus: throughput test scales latency error by the target latency
2017-09-08 17:27:54 -07:00
Bhaskar Muppana
c7df951f7c
Using BackupConfig from backup.actor.cpp to reduce intermediate
...
functions.
2017-09-07 08:36:36 -07:00
Bhaskar Muppana
fe208d6adf
Merge branch 'master' of github.com:apple/foundationdb into backup
2017-09-06 10:01:55 -07:00
Bhaskar Muppana
83810edabc
Backup/Restore tag can be std::string instad of Key.
2017-09-05 11:38:40 -07:00
A.J. Beamon
cc24072a5d
Add the multi version API to the list of APIs to choose in the APICorrectness tester. Support for the multi-version client already existed.
2017-08-31 16:23:55 -07:00
Evan Tschannen
d61be4c760
Merge branch 'release-5.0'
2017-08-30 12:59:24 -07:00
Evan Tschannen
963e1c3f31
fix: we need to reboot the process even if it will result in too many files, because the check will not succeed without it
2017-08-30 12:58:46 -07:00
Alvin Moore
6020d70863
Added trace event to track reboots initiated by ConsistencyCheck workload in simulation
2017-08-29 11:41:27 -07:00
Alvin Moore
c95a1be5ec
Add trace event for rebooting process during simulation for consistency check
2017-08-29 11:00:44 -07:00
A.J. Beamon
9a0a3b6329
Merge commit '66528becb82d826e81fa644bb378212584ab580e'
2017-08-28 16:47:59 -07:00
Alvin Moore
44e0df78c5
Added support for tracking roles for simulation workers
...
Fixed the exclusion and inclusion address simulation API and integration within workloads
Added more information within trace events for simulation
2017-08-28 11:25:37 -07:00
Alvin Moore
581bd6c8ed
Added option to delay the displaying of the simulation workers
2017-08-28 10:53:56 -07:00
Alec Grieser
300b5a17ed
Merge branch 'release-5.0'
2017-08-25 18:55:33 -07:00
Evan Tschannen
272b4b984c
fix: fixed a rare bug where we do not wait for a file in the process of being deleted to shutdown before rebooting a machine
2017-08-25 10:12:58 -07:00
Evan Tschannen
26a5b5e422
rollback workload now clogs the communication between one of the proxies and the tlogs, since that is what will cause a rollback
2017-08-23 16:08:13 -07:00
Alvin Moore
8056b78414
Merge branch 'release-5.0'
2017-08-22 13:51:19 -07:00
Alvin Moore
814e471689
Added support for displaying initial workers via printf within simulation using a workload
2017-08-22 13:38:24 -07:00
Evan Tschannen
47a37f3f1e
Merge pull request #135 from cie/switch-for-data-distribution
...
Add a switch to turn off data distribution in CLI
2017-08-07 12:54:08 -07:00
Alec Grieser
ca7437ecf6
Merge branch 'release-5.0'
2017-08-02 22:07:01 -07:00
John King
d0fbc41338
set LOCK_AWARE on several transactions used for getting cluster info for the consistency check
2017-07-28 18:50:32 -07:00
Yichi Chiang
6a8a5c41b0
Add a switch to turn off data distribution in CLI
2017-07-28 18:14:55 -07:00
Yichi Chiang
53e1ae9f60
shard system keyspace
2017-07-26 13:47:31 -07:00
Alvin Moore
96f40d8eb0
Added support for optionally re-including excluded servers
...
Removed unneeded code to protect coordinators
Ensured that simulation exclusion list is updated for excluded processes
2017-06-19 16:51:07 -07:00
Evan Tschannen
766dc23e26
fix: do not use TLS in protectedAddresses
2017-06-02 13:52:21 -07:00
Evan Tschannen
2d0dbd57e8
randomized the delays in atomic switchover workload
2017-06-01 12:08:21 -07:00
Evan Tschannen
1626e16377
Merge branch 'release-4.6' into release-5.0
2017-05-31 16:23:37 -07:00
Alvin Moore
333f2e4865
Added connection fault disabler within setup of backup submission. It should be reviewed to determine the amount of time to wait before disabling
2017-05-31 14:21:50 -07:00
Stephen Atherton
98604d33a0
Merge branch 'fix-io-timeout-handling'
...
# Conflicts:
# fdbrpc/AsyncFileKAIO.actor.h
# fdbrpc/sim2.actor.cpp
# fdbserver/KeyValueStoreSQLite.actor.cpp
# fdbserver/optimisttest.actor.cpp
# fdbserver/worker.actor.cpp
# fdbserver/workloads/MachineAttrition.actor.cpp
# tests/fast/SidebandWithStatus.txt
# tests/rare/LargeApiCorrectnessStatus.txt
# tests/slow/DDBalanceAndRemoveStatus.txt
2017-05-26 18:43:08 -07:00
Alvin Moore
ba606c2135
Removed NOT_IN_CLEAN macro from file
2017-05-26 14:52:06 -07:00
Alvin Moore
b28ed397a2
Fixed printf field width specifier to reduce compilation warnings within OS X
2017-05-26 14:51:34 -07:00
Alvin Moore
0b9ed67e12
Fixed support for RemoveServers Workload
...
Added availability functions to simulation
2017-05-26 14:20:11 -07:00
Alvin Moore
16cc0821b1
Removed dead machine option from simulation
2017-05-25 16:29:02 -07:00
FDB Dev Team
a674cb4ef4
Initial repository commit
2017-05-25 13:48:44 -07:00