Commit Graph

587 Commits

Author SHA1 Message Date
Evan Tschannen 28a1fa9dc2 fix: we need to notify the old log system that its recruitmentID has changed 2018-04-21 12:57:00 -07:00
Evan Tschannen 1d1e2cd367 fix: initialize the known committed version on the tlog 2018-04-21 00:41:15 -07:00
Evan Tschannen 8d350ceb5f fix: persist the known committed version on the tlogs 2018-04-20 17:55:46 -07:00
Evan Tschannen a6d9e889f0 a cleaner solution to preventing tlogs from peeking log routers 2018-04-20 13:25:22 -07:00
Evan Tschannen f5c3417905 fix: prevent tlogs from peeking the wrong log routers 2018-04-20 00:30:37 -07:00
Evan Tschannen 5da452db8e fix: pop the log routers again after the log system updates 2018-04-19 14:33:31 -07:00
Bruce Mitchener 2f8a0240f1 Fix some typos. 2018-04-19 11:44:01 -07:00
Bruce Mitchener 9cdf25eda3 Fix some typos. 2018-04-20 00:49:22 +07:00
Evan Tschannen d46d5487bd Merge branch 'release-5.2' 2018-04-18 20:46:03 -07:00
Evan Tschannen 57d650062a merge 5.1 into 5.2 2018-04-18 20:44:31 -07:00
Evan Tschannen 224621be04 fix: extraDB==0 must leave g_simulator.extraDB as null, so that non-DR tests do not attempt to use a DR database 2018-04-18 19:34:35 -07:00
Evan Tschannen 22526ef996 fix: do not tell storage servers about large sections of empty versions, because it can lead them to make mutations durable which have not been committed 2018-04-18 16:06:44 -07:00
Evan Tschannen 447c7bd15b fix: log routers use durable known committed version at the time of the pop to determine what is safe to pop from their logs
fix: storage server does not advance its version across large version increase until it has data associated with the version
2018-04-18 12:07:29 -07:00
Evan Tschannen e43fb6d8bc fix: the log routers were popping too many versions because the known committed version is less than minPopped version 2018-04-17 19:41:36 -07:00
Evan Tschannen c1ccc8522c Merge branch 'release-5.2' 2018-04-17 18:38:12 -07:00
Evan Tschannen db98c1b9b6 Merge branch 'release-5.1' into release-5.2
# Conflicts:
#	versions.target
2018-04-17 18:36:19 -07:00
Evan Tschannen 8569a85771 fix: only let a log router pop if the tlog it is serving is fully recovered 2018-04-17 15:03:22 -07:00
Evan Tschannen 760bc8bc99 fix: log router version needs to be fetched before it is available
fix: tlog did not fetch known committed version if start version was exactly equal to it
2018-04-17 11:16:48 -07:00
Evan Tschannen 093908b83f fix: log routers were starting one version too late 2018-04-17 00:29:16 -07:00
Evan Tschannen 3e40505f4a Revert "fix: remote logs should reply until they have recovered through recoverAt"
This reverts commit 3c0c03c004.
2018-04-16 23:17:16 -07:00
Evan Tschannen 3c0c03c004 fix: remote logs should reply until they have recovered through recoverAt 2018-04-16 17:25:49 -07:00
Evan Tschannen cef6c9b418 fix: the startVersion cannot be larger than the known committed version 2018-04-16 16:21:27 -07:00
Evan Tschannen dcfa1847ff fix: log router’s starting popped version must be less than its starting version 2018-04-16 11:43:03 -07:00
Evan Tschannen 3018a7b1b3 fix: the known committed version of a newly initialized log is 1, since by definition the first commit must have succeeded 2018-04-16 10:42:48 -07:00
Evan Tschannen a8662f8737 fix: remote recovered does not need to wait for old logs to be removed 2018-04-16 10:14:39 -07:00
Evan Tschannen e53f17a83a fix: the newest log router needs to start where the last old one ends 2018-04-15 14:54:22 -07:00
Evan Tschannen 5533016f1e fix: tlogs are now initialized immediately, instead of when starting the core, this must be done to pop the log routers during recovery
fix: log router start version must be the same as remote log start version
2018-04-15 14:33:07 -07:00
Evan Tschannen 0496bee1ef fix: suppress expected errors in data distribution 2018-04-15 11:30:22 -07:00
Evan Tschannen 041f5787fb fix: peekLocal does not stop when a locality does not exist
fix: lock logs only stops on special or upgraded locality
fix: recruiting old log routers respects the passed in startVersion
2018-04-14 19:06:24 -07:00
Evan Tschannen f5141acae9 fix: log routers need all logs present in their log system since they call addRemoteTags 2018-04-13 17:33:36 -07:00
Evan Tschannen 65e69620a7 fix: unrecoveredBefore on a new log is at minimum 1 2018-04-13 10:41:30 -07:00
Yichi Chiang a4e8b6492c Fix DR Upgrade workload backup range 2018-04-13 09:59:32 -07:00
Evan Tschannen c589630e53 fix: log router start version is based on the start version of the local logs 2018-04-12 18:14:23 -07:00
Evan Tschannen 3b7e4410cf fix: protect from peeking too early of a version from a log router 2018-04-12 16:15:17 -07:00
Evan Tschannen 1af5ac0d9d fix: a number of different problems prevented tlogs from using log routers during recovery 2018-04-12 15:20:54 -07:00
Evan Tschannen c6229e443c fix: do not use resolution class when using regions 2018-04-11 21:22:53 -07:00
Evan Tschannen 4248fbec61 fix: must set startVersion when upgrading 2018-04-11 17:33:17 -07:00
Evan Tschannen 19762b847d Merge branch 'release-5.2'
# Conflicts:
#	fdbserver/DatabaseConfiguration.cpp
#	fdbserver/SimulatedCluster.actor.cpp
2018-04-10 17:02:43 -07:00
Evan Tschannen c1ba16b3c8 Merge branch 'release-5.1' into release-5.2
# Conflicts:
#	bindings/java/src/test/com/apple/foundationdb/test/AbstractTester.java
#	bindings/java/src/test/com/apple/foundationdb/test/VersionstampSmokeTest.java
#	bindings/nodejs/lib/fdb.js
#	bindings/nodejs/src/Version.h
#	bindings/nodejs/tests/tuple_test.js
2018-04-10 16:50:47 -07:00
Evan Tschannen b0a88001cc Merge pull request #132 from yichic/support-dr-upgrade-test
Support DR upgrade test
2018-04-10 16:30:19 -07:00
Evan Tschannen b46c32535c suppressed spammy trace events 2018-04-10 15:52:32 -07:00
Yichi Chiang d0230d4d13 Support DR upgrade test in 5.1 2018-04-10 15:19:53 -07:00
Alex Miller b289312a37 Merge pull request #120 from alecgrieser/storage-class-help-text
Add router to help text for storage class of fdbserver
2018-04-10 15:01:27 -07:00
Evan Tschannen 3453a51d0f remoteRecovery was still swallowing errors 2018-04-10 13:31:24 -07:00
Evan Tschannen 5fcedd2e98 fix: coordinated state errors were being eaten 2018-04-10 11:14:57 -07:00
Evan Tschannen 2ab2c788b3 fix: the start version is allowed to be larger than the recovery version 2018-04-09 21:58:14 -07:00
Evan Tschannen a738c4bec1 fix: if the known committed version is equal to the recovery version we do not need to copy any data 2018-04-09 20:48:55 -07:00
Evan Tschannen 419951f601 fix: need to initialize tlog versions to less than the startVersion 2018-04-09 17:17:11 -07:00
Evan Tschannen 27e14790b1 fix: do not start at a version larger than the recovery version 2018-04-09 15:08:01 -07:00
Evan Tschannen 7566a0d109 fix: endEpoch gets its logs from the core state, so by definition they are written 2018-04-09 11:44:54 -07:00
Evan Tschannen 4c89f721cd fix: do not include logRouter tags in lock results 2018-04-09 10:48:57 -07:00
Evan Tschannen 7af892f50b first working version of non-copying recovery working with fearless configurations 2018-04-08 21:24:05 -07:00
Alex Miller 0136a01c18 Fix "Not enough physical servers available" error due to incorrect server calculation. 2018-04-05 15:13:21 -07:00
Evan Tschannen bc938d9273 fix: storage recruitment could get stuck in a spin loop 2018-04-03 18:06:31 -07:00
Evan Tschannen 331e707684 fix: pop all tags that did not have data at the recovery version because fully popped tags may come back when pullAsyncData re-indexes the mutations 2018-03-31 16:47:56 -07:00
Evan Tschannen 96fffe2cea fix: do not update version if the log has been stopped 2018-03-30 22:11:42 -07:00
Evan Tschannen 4fb2b99341 fix: using only one region still means we need 3 machines per datacenter, the other machines in the other datacenters just won’t be used 2018-03-30 19:26:22 -07:00
Evan Tschannen 579ba58930 pop old tags only looks at recovered tags, and checks if they are still being used 2018-03-30 19:08:01 -07:00
Evan Tschannen 8352b93f48 fix: do not reuse tags that are still in historyTags, pop historyTags past epochEnd to allow tlogs to finish recovery
fix: peekLocal did not properly respect end
fix: the storage server added to the end of the history vector instead of the beginning
2018-03-30 17:39:45 -07:00
Evan Tschannen 43cb63df25 fix: the collectTags bool was set incorrectly 2018-03-29 18:19:29 -07:00
Evan Tschannen 1a4ded1c99 support upgrades by merging tags associated with the different peek requests 2018-03-29 17:54:08 -07:00
Evan Tschannen b36e08f08f first version of non-copying recovery. Upgrades are broken, and it has not been tested using fearless configurations yet 2018-03-29 15:12:38 -07:00
Evan Tschannen da737e1ea3 suppress the BestTeamStuck trace event 2018-03-26 18:32:32 -07:00
Evan Tschannen 82ed956c65 renamed the multi_dc configuration to three_datacenter. The old three_datacenter configuration was not a useful configuration. 2018-03-26 18:31:26 -07:00
Evan Tschannen b95e68eb5a fix: getDatabaseSize is really inefficient and causes slow tasks in the real world. Outside of simulation just assume the database is really large, because we only need the InvalidShardSize check in simulation 2018-03-26 17:35:11 -07:00
Alec Grieser bb5f3ebb6d add router to help text for storage class of fdbserver 2018-03-26 13:26:56 -07:00
Evan Tschannen d3fb17d30a Merge pull request #74 from bnamasivayam/client-profiling-tests
Client profiling tests - Part 1
2018-03-23 16:52:49 -07:00
Balachandar Namasivayam 1e719d79e9 Remove incorrect ASSERT's
Account for corner cases in missing chunks.
2018-03-23 15:51:56 -07:00
Evan Tschannen 5db52ab081 Merge pull request #87 from etschannen/feature-remote-logs
Feature remote logs
2018-03-23 12:55:17 -07:00
Evan Tschannen 7c48e1d31c Update SimulatedCluster.actor.cpp 2018-03-23 12:54:44 -07:00
A.J. Beamon ddc0c613ed Merge pull request #109 from apple/release-5.2
Merge Release 5.2 into master
2018-03-21 09:37:56 -07:00
Clement Pang 64deb0e0a1 Address review comments. 2018-03-20 14:38:04 -07:00
Clement Pang b46ffb4cbc Available space should take into account both memory and disk 2018-03-20 14:38:04 -07:00
Evan Tschannen 0746fe4d56 optimized tag lookups on the tlog by removing one level of vectors 2018-03-20 10:41:42 -07:00
Evan Tschannen d8e064d8bb fix: when a new log is recruited on a shared log, all outstanding commits need to be notified that they are stopped, because there is no longer a guarantee that their queueCommittedVersion will advance 2018-03-19 17:48:28 -07:00
Alec Grieser 551ea9c7f8 Merge remote-tracking branch 'upstream/release-5.2' into master-release-5.2-merge 2018-03-19 12:34:50 -07:00
yichic ede5cab192 Merge pull request #89 from yichic/share-log-mutations-5.2
Share log mutations 5.2
2018-03-19 12:01:26 -07:00
Yichi Chiang 1f2602d2b3 Fix all review comments 2018-03-19 11:33:33 -07:00
Yichi Chiang d6559b144f Share log mutations between backups and DRs which have the same backup range 2018-03-19 11:32:50 -07:00
Evan Tschannen 54be14000d do not deserialize tags 2018-03-17 11:24:18 -07:00
Evan Tschannen 4dcef08260 optimized the log router to use a vector instead of a map for tag data 2018-03-17 11:08:37 -07:00
Evan Tschannen 9c8cb445d6 optimized the tlog to use a vector for tags instead of a map 2018-03-17 10:36:19 -07:00
Evan Tschannen fecfea0f7d fix: messages vector was not cleared 2018-03-17 10:24:44 -07:00
Balachandar Namasivayam 9e3e3c8561 Add some sanity checks to deserialized data. 2018-03-16 18:45:25 -07:00
Yichi Chiang f12c1d811c Fix all review comments 2018-03-16 18:09:23 -07:00
Yichi Chiang 26b93ff920 Share log mutations between backups and DRs which have the same backup range 2018-03-16 18:09:23 -07:00
Evan Tschannen ccd70fd005 The tlog uses the tags embedded in the message instead of a separate vector of locations
optimized remote tlog committing to avoid re-serializing the message
2018-03-16 16:47:05 -07:00
Evan Tschannen 820382ea68 optimized the log router commit path to avoid re-serializing the data 2018-03-16 11:40:21 -07:00
Evan Tschannen a42205eb8e test running with only one region 2018-03-15 15:40:58 -07:00
Balachandar Namasivayam 89d7cc1093 Minor Bug fixes... 2018-03-15 11:00:47 -07:00
Evan Tschannen 82fb6424ec fix: storage recruitment could get stuck in a spin loop 2018-03-15 11:00:44 -07:00
Evan Tschannen 65b532658f added support for single region configurations 2018-03-15 10:59:30 -07:00
Alec Grieser 0853fcb052 switch to using zu for some size_t variables in printf 2018-03-14 18:07:05 -07:00
Evan Tschannen 59723f51f8 fix: continue to attempt to lock logs until remote logs are recovered, this is so that remote logs get locked and readers know they will not have any more data
do not throttle trace events in simulation
2018-03-14 12:39:55 -07:00
Balachandar Namasivayam 856d2a0a9d Add correctness tests for Client transaction profiling data format. It also includes format check across upgrades. 2018-03-14 12:39:50 -07:00
Alec Grieser 70a05c1a9b fix some compiler whinges 2018-03-13 15:00:16 -07:00
Evan Tschannen 2e741057d4 use references instead of copying regionInfo 2018-03-13 12:59:07 -07:00
Evan Tschannen f6a22c1035 fix: the recovery actor was holding a copy of the tlogInterface after the tlog was removed 2018-03-12 16:56:34 -07:00
Evan Tschannen 72d56a700c fix: do not serialize a tlog interface without a unique id 2018-03-10 09:52:09 -08:00
Evan Tschannen c74211bd92 fix: merge problem 2018-03-09 16:52:37 -08:00
Evan Tschannen 3abf4d7fdf Merge branch 'master' into feature-remote-logs 2018-03-09 14:50:04 -08:00
Evan Tschannen 91bb8faa45 Merge commit 'f773b9460d31d31b7d421860fc647936f31aa1fa'
# Conflicts:
#	tests/fast/SidebandWithStatus.txt
#	tests/rare/LargeApiCorrectnessStatus.txt
#	tests/slow/DDBalanceAndRemoveStatus.txt
2018-03-09 14:47:03 -08:00
Evan Tschannen 28ea983487 Merge branch 'release-5.1' into release-5.2
# Conflicts:
#	flow/Trace.cpp
#	versions.target
2018-03-09 14:40:31 -08:00
A.J. Beamon bb9f51bb5c Don't try to extract attributes from the program start trace events if they couldn't be collected. 2018-03-09 11:55:57 -08:00
Evan Tschannen cf6dd1437b suppress spammy trace events 2018-03-09 10:16:34 -08:00
Evan Tschannen ae7d8e90b2 Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1 2018-03-09 09:56:09 -08:00
Evan Tschannen 5390af8be4 suppress spammy logs 2018-03-09 09:40:36 -08:00
A.J. Beamon 1bf9f0ec6b Merge pull request #54 from etschannen/release-5.1
fix: new cluster controllers should not consider anything failed unti…
2018-03-09 09:28:21 -08:00
Evan Tschannen f9625f5b2f fix: new cluster controllers should not consider anything failed until they have time to get failure monitoring updates
fix: storage and log class machines wait 100MS before attempting to become the cluster controller
2018-03-08 18:08:41 -08:00
Balachandar Namasivayam e7309a3535 Add trace events to print the ranges in ConsistencyCheck. 2018-03-08 13:53:59 -08:00
Evan Tschannen cf9d02cdbd Merge pull request #48 from apple/release-5.2
Merge release-5.2 into master
2018-03-08 13:21:26 -08:00
A.J. Beamon 2c92ef8ff8 Merge pull request #47 from apple/release-5.1
Merge Release 5.1 into Release 5.2
2018-03-08 13:18:45 -08:00
A.J. Beamon 73cec8abad Merge branch 'release-5.1' of github.com:apple/foundationdb into release-5.1 2018-03-08 11:47:44 -08:00
Balachandar Namasivayam 4f58bca66a Simple refactor of code... 2018-03-08 11:34:25 -08:00
Balachandar Namasivayam 1c1a497ea2 Refactor getKeyServers to be more readable.
Fix possible memory corruption by returning KeyRange instead of KeyRangeRef in getKeyServers.
Simplify getMasterProxies on DatabaseContext class.
2018-03-08 11:34:18 -08:00
Balachandar Namasivayam 03a40354e3 Having 1000 as the limit for Limit for GetKeyServerLocationsRequest sometimes generates large packet warnings. Reduce it to 100.
Fix the bug where some of the key server shards may not be fetched.
2018-03-08 11:34:11 -08:00
A.J. Beamon fdcaf473ae Don't pass a copy of the StorageServerInterface to storageServerRollbackRebooter. This prevents a situation where the storage server has terminated but the request streams are left open until the underlying KV-store gets closed. 2018-03-08 11:14:24 -08:00
Evan Tschannen fa7eaea7cf fix: shards affected by team failure did not properly handle separate teams for the remote and primary data centers 2018-03-08 10:50:05 -08:00
bnamasivayam f838bc077e Merge pull request #36 from ajbeamon/release-5.2
Set the address in consistency check processes…
2018-03-07 15:00:14 -08:00
Evan Tschannen 9d4cdc828b fix: inactive cursors are still useful if their version is larger than the current version 2018-03-07 12:54:53 -08:00
Evan Tschannen 68606c7984 fix: sim2 logic for when a kill is safe was incorrect 2018-03-06 18:38:05 -08:00
Alec Grieser 2a2ac56529 Merge pull request #22 from alecgrieser/37844532-expose-append-if-fits
Expose APPEND_IF_FITS to clients
2018-03-06 16:31:36 -08:00
Evan Tschannen 8c88041608 fix: we must commit to the number of log routers we are going to use when recruiting the primary, because it determines the number of log router tags that will be attached to mutations 2018-03-06 16:31:21 -08:00
A.J. Beamon 232bd496bf Set the address in consistency check processes in the same way we set it for clients so that it shows up in trace logs. Disallow specifying a public address for consistency check processes. 2018-03-06 15:40:04 -08:00
A.J. Beamon 7f8f655b9c Revert "Fix build errors"
This reverts commit 51804f0504.
2018-03-06 10:28:39 -08:00
A.J. Beamon f2c804e14f Reverting changes from merge of master into release-5.2 (b25810711c). Note that we never intend to release master into release-5.2, but if we did we would need to revert this commit. 2018-03-06 10:15:04 -08:00
Evan Tschannen 1194e3a361 added region-based configuration to support a large variety of fearless setups. Currently only 1 primary 1 remote setups are allowed. 2018-03-05 19:27:46 -08:00
Balachandar Namasivayam aea1f7ba21 Add tests for Client Transaction Profiling correctness 2018-03-05 18:55:23 -08:00
Balachandar Namasivayam 51804f0504 Fix build errors 2018-03-05 15:18:14 -08:00
A.J. Beamon b25810711c Merge branch 'master' into release-5.2 2018-03-05 10:32:57 -08:00
Balachandar Namasivayam 8ae640c062 Addressed review comments. 2018-03-02 17:56:49 -08:00
Alec Grieser 218b7a41e2 add APPEND_IF_FITS to workload and remove guard ; add command to vexillographer 2018-03-02 17:43:39 -08:00
Balachandar Namasivayam 11df1aeabf Add new api to get shared tlogs id and address 2018-03-02 16:50:30 -08:00
Evan Tschannen 470f5c01f3 changed remoteDcId to a vector of ids, to support future configurations where there are multiple remote databases 2018-02-26 17:09:09 -08:00
Evan Tschannen a67296b373 do not test fearless configurations to merge with master 2018-02-26 13:31:06 -08:00
Evan Tschannen 8e966fdf9c simulated cluster tests all configurations. Still needs to randomize the remote and satellite replication, along with the number of remote tlogs, log routers, and satellite tlogs 2018-02-26 13:15:44 -08:00
Evan Tschannen e3c6b66240 fix: do not commit more data after being stopped
fix: prioritize dc locality above exclusion to prevent being stuck after excluding all machines in a data center
2018-02-26 13:13:37 -08:00
Evan Tschannen 37a6a81634 Merge commit '7f6fc3e039c911cd84b8540f7f799fc38a1c1822' into feature-remote-logs
# Conflicts:
#	fdbserver/workloads/RestartRecovery.actor.cpp
2018-02-23 12:33:28 -08:00
Evan Tschannen cfcf98cffc fix: log router tags were not stored at a best location 2018-02-23 12:26:19 -08:00
Evan Tschannen a49e43000e fix: did not peek from log routers correctly 2018-02-22 16:13:56 -08:00
Evan Tschannen 719bb5bd0c Merge pull request #4 from bnamasivayam/getKeyServers-refactor
Having 1000 as the limit for Limit for GetKeyServerLocationsRequest s…
2018-02-22 12:39:48 -08:00
Balachandar Namasivayam 2fe2b522d5 Simple refactor of code... 2018-02-22 12:38:14 -08:00
Alec Grieser e1162e9238 Merge remote-tracking branch 'upstream/release-5.1' 2018-02-22 11:16:12 -08:00
Balachandar Namasivayam e2030db5a8 Refactor getKeyServers to be more readable.
Fix possible memory corruption by returning KeyRange instead of KeyRangeRef in getKeyServers.
Simplify getMasterProxies on DatabaseContext class.
2018-02-21 17:11:50 -08:00
Evan Tschannen 2aa273df96 addStorageServer was advancing tags too much because of read errors 2018-02-21 17:05:39 -08:00
Evan Tschannen 310f56d98a fix: tlogs was resized incorrectly 2018-02-21 15:28:02 -08:00
Evan Tschannen ddb484143c fix: do not peek from remote logs if they are not fully recovered 2018-02-21 14:06:44 -08:00
Alec Grieser 0bae9880f1 remove trailing whitespace from our copyright headers ; fixed formatting of python setup.py 2018-02-21 10:25:11 -08:00
Balachandar Namasivayam 6218934c7b Having 1000 as the limit for Limit for GetKeyServerLocationsRequest sometimes generates large packet warnings. Reduce it to 100.
Fix the bug where some of the key server shards may not be fetched.
2018-02-20 17:41:34 -08:00
Evan Tschannen 1dc6a8d4bd fix: the tlog can peek from log systems that have been recovered even if it does not match its recoverFrom set 2018-02-20 14:50:13 -08:00