2513 Commits

Author SHA1 Message Date
Evan Tschannen 9015b8038f io_error should cause the process to die and restart, to prevent repeated recruitment of a bad disk 2018-07-06 14:42:36 -07:00
Evan Tschannen 7d54ca4dc2 fix: errors from disk should trump errors from workers 2018-07-06 14:41:36 -07:00
Evan Tschannen 6d7172ef7e fix: canKillProcesses did not take into account the remoteTLogPolicy when checking notEnoughLeft 2018-07-05 21:36:09 -07:00
Evan Tschannen 6f4ca2eba2 fix: get all processes did not include rebooting processes 2018-07-05 21:13:56 -07:00
Alex Miller bb2eb2fe53
Merge pull request #565 from etschannen/feature-remote-logs
Simulation did not permanently kill machines in most tests
2018-07-05 15:07:28 -07:00
Alex Miller 5b12414b74
Merge pull request #564 from alexmiller-apple/tlsplugin
Fix dependencies for TLS library stuff
2018-07-05 14:32:45 -07:00
Evan Tschannen cd4fb9285a waitForExlusion requires both regions to be healthy, which is only possible if we do not kill all logs in a region 2018-07-05 14:04:42 -07:00
Alex Miller 6c98aa8aac Fix not depending on FDBLibTLS.a and default TLS_LIBDIR to a sensible place. 2018-07-05 13:23:20 -07:00
Evan Tschannen 6cf5354425 checkSatelliteTagLocations is not an error if the same zoneId is used multiple times 2018-07-05 13:00:13 -07:00
Evan Tschannen 21347df254 fix: getting metrics did not handle broken_promise errors 2018-07-05 12:30:11 -07:00
Evan Tschannen da5a232d7e fix: If we have not recruited the remote logs yet and detect a configuration change, we must fail the master to update the remote recruitment request 2018-07-05 12:17:41 -07:00
Evan Tschannen 7315e5da55 fix: isExcluded and isCleared were exactly wrong
fix: isCleared should mean the process is dead
2018-07-05 02:22:22 -07:00
Evan Tschannen 507b3bacb0 fix: kill all tlogs in one region prevents the remote logs from recovering in that region, do not allow that to prevent us from configuring usable_regions=1.
added more recovery states.
2018-07-05 00:08:51 -07:00
Evan Tschannen 99e2b06c2d
Merge pull request #562 from etschannen/feature-remote-logs
Added repopulate region anti-quorum to the configuration
2018-07-04 16:44:35 -04:00
Evan Tschannen e17dfea3b6 fix: desiredTLogCount was used instead of getDesiredLogs(), which caused problems with recruitment when desiredTLogCount was -1.
canKillProcess logic was wrong.
We still need to configure usable_regions because if datacenterVersionDifference is too large we cannot complete data movement.
2018-07-04 16:22:32 -04:00
Alex Miller 01659e34cc Move TLS libs into STATIC_LIBS to avoid having a make dependency on them.
And fix STATIC_LIBS to be cross platform.
2018-07-04 00:29:53 -07:00
Evan Tschannen ea3365dc38 fix: quiet database only needs to use repopulate_anti_quorum instead of reducing usable_regions 2018-07-04 02:52:00 -04:00
Evan Tschannen 66a6fbb219 Merge branch 'master' into feature-remote-logs 2018-07-04 01:59:30 -04:00
Evan Tschannen 866ccfe344 added the ability to allow the master to finish recovery before all storage servers in both regions have their mutations. This allows you to recover from scenarios where you lose all your tlogs in one dc. 2018-07-04 01:59:04 -04:00
Alex Miller 77aecd3900
Merge pull request #559 from bnamasivayam/force-recovery-hidden-command
Add force_recovery_with_data_loss to hidden command list.
2018-07-03 15:51:13 -07:00
Alex Miller 78c23e4e1e
Merge pull request #558 from AlvinMooreSr/tls-init
Fixed problem with stack initialization of TLS Options class
2018-07-03 15:50:33 -07:00
Balachandar Namasivayam cbdf598fa2 Add force_recovery_with_data_loss to hidden command list. 2018-07-03 15:04:11 -07:00
Alvin Moore 9ea0f0a5ae Fixed problem with stack initialization of TLS Options class 2018-07-03 15:02:53 -07:00
Alex Miller 37f0e4be09
Merge pull request #557 from AlvinMooreSr/tls-build
Added support for specifying location of LibreSSL libraries via defin…
2018-07-03 13:08:02 -07:00
Balachandar Namasivayam af07a3782f
Merge pull request #556 from etschannen/feature-remote-logs
Attempted to fix force recovery
2018-07-03 10:51:45 -07:00
Evan Tschannen c69d6166e3 another attempt at forced recovery 2018-07-03 13:42:58 -04:00
Alvin Moore ab255b444f Added support for specifying location of LibreSSL libraries via define TLS_LIBDIR 2018-07-03 09:01:01 -07:00
Evan Tschannen 57a8c6862e fix: force recovery did not work if the latest log set did not recover th 2018-07-02 23:48:22 -04:00
Evan Tschannen 88ddc1c228 Merge branch 'master' into feature-remote-logs 2018-07-02 22:36:23 -04:00
Evan Tschannen 9eb8dc3a59 fix: previous attempt at force recovery did not work because we need to treat the remote logs as local for peeking 2018-07-02 22:35:18 -04:00
Alex Miller 29f560bafe Fix a warning-turned-error about not returning from an unreachable point. 2018-07-02 14:31:06 -07:00
Evan Tschannen b635d40a8f
Merge pull request #553 from etschannen/feature-remote-logs
Minor bug fixes and improvements
2018-07-02 10:10:52 -07:00
Evan Tschannen f2ec80f10d added trace events for cluster controller changing datacenters 2018-07-02 13:06:54 -04:00
Evan Tschannen 604b3bca17 increased the api correctness timeout 2018-07-02 12:51:50 -04:00
Evan Tschannen 334a433238 spend less time before using satellite fallback, because the database will be unavailable during this waiting time 2018-07-02 12:50:52 -04:00
Evan Tschannen 89a4b2cd68 fix: consistency check could loop too long 2018-07-02 12:08:02 -04:00
Steve Atherton abb5100388
Merge pull request #550 from etschannen/feature-remote-logs
Fixed a variety of problems found by Valgrind, and added the untested ability to do an ACI recovery
2018-07-02 00:00:46 -07:00
Evan Tschannen e67f951c06 Merge branch 'master' into feature-remote-logs 2018-07-02 02:18:20 -04:00
Evan Tschannen d3e1067d31
Merge pull request #508 from AlvinMooreSr/tls-static
Added support for compiling TLS into binaries
2018-07-01 23:17:03 -07:00
Alvin Moore c3f88dbfe1 Merge branch 'master' of github.com:apple/foundationdb into tls-static 2018-07-01 23:13:57 -07:00
Alvin Moore 132e2d9267 Defined TLS build flags for projects
Updated TLS documentation
2018-07-01 22:49:39 -07:00
Alec Grieser be873001cc
Merge pull request #532 from drew-richardson/master
Avoid calls that can panic when handling errors
2018-07-01 21:54:18 -07:00
Evan Tschannen b24e272394 Merge branch 'master' into feature-remote-logs 2018-07-02 00:07:25 -04:00
Evan Tschannen 3c9f3da980 fix: usable regions cannot be changed during an emergency transaction, because it could lead to all storage servers dying if the previous primary is dead 2018-07-01 23:59:06 -04:00
Evan Tschannen 73e61312c6 fix: shareLogRange was never initialized 2018-07-01 22:49:24 -04:00
Evan Tschannen 21d03cd1eb fix: we must store the result of range reads before iterating through the results 2018-07-01 21:07:25 -04:00
Evan Tschannen 5054c194e2 Some trace events are logged before FLOW_KNOBS are initialized 2018-07-01 14:30:37 -04:00
Evan Tschannen 7a12d3e130 added the (untested) ability to force a recovery to the remote datacenter, even if that results in data loss. If the DR lag is more than 1 week there could be potential data corruption if any primary storage servers are still alive. 2018-07-01 09:39:04 -04:00
Steve Atherton 7f6bced835
Merge pull request #538 from alexmiller-apple/tlsplugin_san
TLS certificate handling enhancements
2018-07-01 01:50:58 -07:00
Steve Atherton b17c8359ec
Merge pull request #549 from apple/release-5.2
Merge release-5.2 into master
2018-06-30 22:50:07 -07:00