<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" /><meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>CAP Theorem — FoundationDB ON documentation</title>
<link rel="stylesheet" type="text/css" href="_static/pygments.css" />
<link rel="stylesheet" type="text/css" href="_static/bootstrap-sphinx.css" />
<script data-url_root="./" id="documentation_options" src="_static/documentation_options.js"></script>
<script src="_static/jquery.js"></script>
<script src="_static/underscore.js"></script>
<script src="_static/_sphinx_javascript_frameworks_compat.js"></script>
<script src="_static/doctools.js"></script>
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Consistency" href="consistency.html" />
<link rel="prev" title="Transaction Manifesto" href="transaction-manifesto.html" />
<meta charset='utf-8'>
<meta http-equiv='X-UA-Compatible' content='IE=edge,chrome=1'>
<meta name='viewport' content='width=device-width, initial-scale=1.0, maximum-scale=1'>
<meta name="apple-mobile-web-app-capable" content="yes">
<script type="text/javascript" src="_static/js/jquery-1.12.4.min.js"></script>
<script type="text/javascript" src="_static/js/jquery-fix.js"></script>
<script type="text/javascript" src="_static/bootstrap-3.4.1/js/bootstrap.min.js"></script>
<script type="text/javascript" src="_static/bootstrap-sphinx.js"></script>
</head><body>
<div id="navbar" class="navbar navbar-default navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<!-- .btn-navbar is used as the toggle for collapsed navbar content -->
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".nav-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="index.html">
FoundationDB</a>
<span class="navbar-text navbar-version pull-left"><b>7.3.57</b></span>
</div>
<div class="collapse navbar-collapse nav-collapse">
<ul class="nav navbar-nav">
<li><a href="contents.html">Site Map</a></li>
<li class="dropdown globaltoc-container">
<a role="button"
id="dLabelGlobalToc"
data-toggle="dropdown"
data-target="#"
href="index.html">Site <b class="caret"></b></a>
<ul class="dropdown-menu globaltoc"
role="menu"
aria-labelledby="dLabelGlobalToc"><ul class="current">
<li class="toctree-l1"><a class="reference internal" href="local-dev.html">Local Development</a></li>
<li class="toctree-l1"><a class="reference internal" href="internal-dev-tools.html">Internal Dev Tools</a></li>
<li class="toctree-l1 current"><a class="reference internal" href="why-foundationdb.html">Why FoundationDB</a><ul class="current">
<li class="toctree-l2"><a class="reference internal" href="transaction-manifesto.html">Transaction Manifesto</a></li>
<li class="toctree-l2 current"><a class="current reference internal" href="#">CAP Theorem</a></li>
<li class="toctree-l2"><a class="reference internal" href="consistency.html">Consistency</a></li>
<li class="toctree-l2"><a class="reference internal" href="scalability.html">Scalability</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="technical-overview.html">Technical Overview</a><ul>
<li class="toctree-l2"><a class="reference internal" href="architecture.html">Architecture</a></li>
<li class="toctree-l2"><a class="reference internal" href="performance.html">Performance</a></li>
<li class="toctree-l2"><a class="reference internal" href="benchmarking.html">Benchmarking</a></li>
<li class="toctree-l2"><a class="reference internal" href="engineering.html">Engineering</a></li>
<li class="toctree-l2"><a class="reference internal" href="features.html">Features</a></li>
<li class="toctree-l2"><a class="reference internal" href="layer-concept.html">Layer Concept</a></li>
<li class="toctree-l2"><a class="reference internal" href="anti-features.html">Anti-Features</a></li>
<li class="toctree-l2"><a class="reference internal" href="experimental-features.html">Experimental-Features</a></li>
<li class="toctree-l2"><a class="reference internal" href="transaction-processing.html">Transaction Processing</a></li>
<li class="toctree-l2"><a class="reference internal" href="fault-tolerance.html">Fault Tolerance</a></li>
<li class="toctree-l2"><a class="reference internal" href="flow.html">Flow</a></li>
<li class="toctree-l2"><a class="reference internal" href="testing.html">Simulation and Testing</a></li>
<li class="toctree-l2"><a class="reference internal" href="kv-architecture.html">FoundationDB Architecture</a></li>
<li class="toctree-l2"><a class="reference internal" href="read-write-path.html">FDB Read and Write Path</a></li>
<li class="toctree-l2"><a class="reference internal" href="ha-write-path.html">FDB HA Write Path: How a mutation travels in FDB HA</a></li>
<li class="toctree-l2"><a class="reference internal" href="consistency-check-urgent.html">Consistency Checker Urgent</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="client-design.html">Client Design</a><ul>
<li class="toctree-l2"><a class="reference internal" href="getting-started-mac.html">Getting Started on macOS</a></li>
<li class="toctree-l2"><a class="reference internal" href="getting-started-linux.html">Getting Started on Linux</a></li>
<li class="toctree-l2"><a class="reference internal" href="downloads.html">Downloads</a></li>
<li class="toctree-l2"><a class="reference internal" href="developer-guide.html">Developer Guide</a></li>
<li class="toctree-l2"><a class="reference internal" href="data-modeling.html">Data Modeling</a></li>
<li class="toctree-l2"><a class="reference internal" href="client-testing.html">Client Testing</a></li>
<li class="toctree-l2"><a class="reference internal" href="client-testing.html#testing-error-handling-with-buggify">Testing Error Handling with Buggify</a></li>
<li class="toctree-l2"><a class="reference internal" href="client-testing.html#simulation-and-cluster-workloads">Simulation and Cluster Workloads</a></li>
<li class="toctree-l2"><a class="reference internal" href="client-testing.html#api-tester">API Tester</a></li>
<li class="toctree-l2"><a class="reference internal" href="api-general.html">Using FoundationDB Clients</a></li>
<li class="toctree-l2"><a class="reference internal" href="transaction-tagging.html">Transaction Tagging</a></li>
<li class="toctree-l2"><a class="reference internal" href="known-limitations.html">Known Limitations</a></li>
<li class="toctree-l2"><a class="reference internal" href="transaction-profiler-analyzer.html">Transaction profiling and analyzing</a></li>
<li class="toctree-l2"><a class="reference internal" href="api-version-upgrade-guide.html">API Version Upgrade Guide</a></li>
<li class="toctree-l2"><a class="reference internal" href="tenants.html">Tenants</a></li>
<li class="toctree-l2"><a class="reference internal" href="automatic-idempotency.html">Automatic Idempotency</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="design-recipes.html">Design Recipes</a><ul>
<li class="toctree-l2"><a class="reference internal" href="blob.html">Blob</a></li>
<li class="toctree-l2"><a class="reference internal" href="blob-java.html">Blob</a></li>
<li class="toctree-l2"><a class="reference internal" href="hierarchical-documents.html">Hierarchical Documents</a></li>
<li class="toctree-l2"><a class="reference internal" href="hierarchical-documents-java.html">Hierarchical Documents</a></li>
<li class="toctree-l2"><a class="reference internal" href="multimaps.html">Multimaps</a></li>
<li class="toctree-l2"><a class="reference internal" href="multimaps-java.html">Multimaps</a></li>
<li class="toctree-l2"><a class="reference internal" href="priority-queues.html">Priority Queues</a></li>
<li class="toctree-l2"><a class="reference internal" href="priority-queues-java.html">Priority Queues</a></li>
<li class="toctree-l2"><a class="reference internal" href="queues.html">Queues</a></li>
<li class="toctree-l2"><a class="reference internal" href="queues-java.html">Queues</a></li>
<li class="toctree-l2"><a class="reference internal" href="segmented-range-reads.html">Segmented Range Reads</a></li>
<li class="toctree-l2"><a class="reference internal" href="segmented-range-reads-java.html">Segmented Range Reads</a></li>
<li class="toctree-l2"><a class="reference internal" href="simple-indexes.html">Simple Indexes</a></li>
<li class="toctree-l2"><a class="reference internal" href="simple-indexes-java.html">Simple Indexes</a></li>
<li class="toctree-l2"><a class="reference internal" href="spatial-indexing.html">Spatial Indexing</a></li>
<li class="toctree-l2"><a class="reference internal" href="spatial-indexing-java.html">Spatial Indexing</a></li>
<li class="toctree-l2"><a class="reference internal" href="subspace-indirection.html">Subspace Indirection</a></li>
<li class="toctree-l2"><a class="reference internal" href="subspace-indirection-java.html">Subspace Indirection</a></li>
<li class="toctree-l2"><a class="reference internal" href="tables.html">Tables</a></li>
<li class="toctree-l2"><a class="reference internal" href="tables-java.html">Tables</a></li>
<li class="toctree-l2"><a class="reference internal" href="vector.html">Vector</a></li>
<li class="toctree-l2"><a class="reference internal" href="vector-java.html">Vector</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="api-reference.html">API Reference</a><ul>
<li class="toctree-l2"><a class="reference internal" href="api-python.html">Python API</a></li>
<li class="toctree-l2"><a class="reference internal" href="api-ruby.html">Ruby API</a></li>
<li class="toctree-l2"><a class="reference external" href="relative://javadoc/index.html">Java API</a></li>
<li class="toctree-l2"><a class="reference external" href="https://godoc.org/github.com/apple/foundationdb/bindings/go/src/fdb">Go API</a></li>
<li class="toctree-l2"><a class="reference internal" href="api-c.html">C API</a></li>
<li class="toctree-l2"><a class="reference internal" href="api-error-codes.html">Error Codes</a></li>
<li class="toctree-l2"><a class="reference internal" href="special-keys.html">Special Keys</a></li>
<li class="toctree-l2"><a class="reference internal" href="global-configuration.html">Global Configuration</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="tutorials.html">Tutorials</a><ul>
<li class="toctree-l2"><a class="reference internal" href="class-scheduling.html">Class Scheduling</a></li>
<li class="toctree-l2"><a class="reference internal" href="largeval.html">Managing Large Values and Blobs</a></li>
<li class="toctree-l2"><a class="reference internal" href="time-series.html">Time-Series Data</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="administration.html">Administration</a><ul>
<li class="toctree-l2"><a class="reference internal" href="configuration.html">Configuration</a></li>
<li class="toctree-l2"><a class="reference internal" href="moving-a-cluster.html">Moving a Cluster to New Machines</a></li>
<li class="toctree-l2"><a class="reference internal" href="tls.html">Transport Layer Security</a></li>
<li class="toctree-l2"><a class="reference internal" href="authorization.html">Authorization</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="monitored-metrics.html"><strong>Monitored Metrics</strong></a></li>
<li class="toctree-l1"><a class="reference internal" href="redwood.html">Redwood Storage Engine</a></li>
<li class="toctree-l1"><a class="reference internal" href="visibility.html">Visibility Documents</a><ul>
<li class="toctree-l2"><a class="reference internal" href="request-tracing.html">Request Tracing</a></li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="earlier-release-notes.html">Earlier Release Notes</a><ul>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-014.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-016.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-021.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-022.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-023.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-100.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-200.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-300.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-400.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-410.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-420.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-430.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-440.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-450.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-460.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-500.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-510.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-520.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-600.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-610.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-620.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-630.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-700.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-710.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-720.html">Release Notes</a></li>
<li class="toctree-l2"><a class="reference internal" href="release-notes/release-notes-730.html">Release Notes</a></li>
</ul>
</li>
</ul>
</ul>
</li>
<li class="dropdown">
<a role="button"
id="dLabelLocalToc"
data-toggle="dropdown"
data-target="#"
href="#">Page <b class="caret"></b></a>
<ul class="dropdown-menu localtoc"
role="menu"
aria-labelledby="dLabelLocalToc"><ul>
<li><a class="reference internal" href="#">CAP Theorem</a><ul>
<li><a class="reference internal" href="#what-is-the-cap-theorem">What is the CAP Theorem?</a></li>
<li><a class="reference internal" href="#what-does-choosing-availability-mean">What does choosing Availability mean?</a></li>
<li><a class="reference internal" href="#where-s-the-confusion">Where’s the confusion?</a></li>
<li><a class="reference internal" href="#what-does-foundationdb-choose">What does FoundationDB choose?</a></li>
<li><a class="reference internal" href="#foundationdb-fault-tolerance">FoundationDB fault tolerance</a></li>
<li><a class="reference internal" href="#example-a-minimal-configuration">Example: a minimal configuration</a></li>
</ul>
</li>
</ul>
</ul>
</li>
<li>
<a href="transaction-manifesto.html" title="Previous Chapter: Transaction Manifesto"><span class="glyphicon glyphicon-chevron-left visible-sm"></span><span class="hidden-sm hidden-tablet">« Transaction Manifesto</span>
</a>
</li>
<li>
<a href="consistency.html" title="Next Chapter: Consistency"><span class="glyphicon glyphicon-chevron-right visible-sm"></span><span class="hidden-sm hidden-tablet">Consistency »</span>
</a>
</li>
</ul>
<form class="navbar-form navbar-right" action="search.html" method="get">
<div class="form-group">
<input type="text" name="q" class="form-control" placeholder="Search" />
</div>
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
</div>
<div class="container">
<div class="row">
<div class="col-md-3">
<div id="sidebar" class="bs-sidenav" role="complementary"><ul>
<li><a class="reference internal" href="#">CAP Theorem</a><ul>
<li><a class="reference internal" href="#what-is-the-cap-theorem">What is the CAP Theorem?</a></li>
<li><a class="reference internal" href="#what-does-choosing-availability-mean">What does choosing Availability mean?</a></li>
<li><a class="reference internal" href="#where-s-the-confusion">Where’s the confusion?</a></li>
<li><a class="reference internal" href="#what-does-foundationdb-choose">What does FoundationDB choose?</a></li>
<li><a class="reference internal" href="#foundationdb-fault-tolerance">FoundationDB fault tolerance</a></li>
<li><a class="reference internal" href="#example-a-minimal-configuration">Example: a minimal configuration</a></li>
</ul>
</li>
</ul>
</div>
</div>
<div class="body col-md-9 content" role="main">
<section id="cap-theorem">
<h1>CAP Theorem</h1>
<p>A database <em>can</em> provide strong consistency <em>and</em> system availability during network partitions. The common belief that this combination is impossible is based on a misunderstanding of the CAP theorem.</p>
<section id="what-is-the-cap-theorem">
<h2>What is the CAP Theorem?</h2>
<p>In 2000, Eric Brewer conjectured that a distributed system cannot simultaneously provide all three of the following desirable properties:</p>
<ul class="simple">
<li><p>Consistency: A read sees all previously completed writes.</p></li>
<li><p>Availability: Reads and writes always succeed.</p></li>
<li><p>Partition tolerance: Guaranteed properties are maintained even when network failures prevent some machines from communicating with others.</p></li>
</ul>
<p>In 2002, Gilbert and Lynch proved this in the asynchronous and partially synchronous network models, so it is now commonly called the <a class="reference external" href="http://en.wikipedia.org/wiki/CAP_theorem">CAP Theorem</a>.</p>
<p>Brewer originally described this impossibility result as forcing a choice of “two out of the three” <strong>CAP</strong> properties, leaving three viable design options: <strong>CP</strong>, <strong>AP</strong>, and <strong>CA</strong>. However, further consideration shows that <strong>CA</strong> is not really a coherent option because a system that is not Partition-tolerant will, by definition, be forced to give up Consistency or Availability during a partition. Therefore, a more <a class="reference external" href="http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html">modern interpretation</a> of the theorem is: <em>during a network partition, a distributed system must choose either Consistency or Availability.</em></p>
</section>
<section id="what-does-choosing-availability-mean">
<h2>What does choosing Availability mean?</h2>
<p>Let’s consider an <strong>AP</strong> database. In such a database, reads and writes would always succeed, even when network connectivity is unavailable between nodes. If they could be had for free, these would certainly be desirable properties!</p>
<p>However, the downside is stark. Imagine a simple distributed database consisting of two nodes and a network partition making them unable to communicate. To be Available, each of the two nodes must continue to accept writes from clients.</p>
<figure class="align-default" id="id2">
<img alt="_images/AP_Partition.png" src="_images/AP_Partition.png" />
<figcaption>
<p><span class="caption-text">Data divergence in an AP system during partition</span></p>
</figcaption>
</figure>
<p>Of course, because the partition makes communication impossible, a write on one node cannot be seen by the other. Such a system is now “a database” in name only. As long as the partition lasts, the system is fully equivalent to two independent databases whose contents need not even be related, much less consistent.</p>
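<p>As a rough illustration of this divergence, the following Python sketch models two nodes as plain dictionaries. The <code class="docutils literal notranslate"><span class="pre">Node</span></code> class and its <code class="docutils literal notranslate"><span class="pre">partitioned</span></code> flag are hypothetical names invented for this example; they are not part of FoundationDB or of any real database.</p>

```python
class Node:
    """A toy AP node: it always accepts writes, partition or not."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []
        self.partitioned = False  # True when this node cannot reach its peers

    def write(self, key, value):
        # An AP node never refuses a write.
        self.data[key] = value
        if not self.partitioned:
            # Replicate only while the network is healthy.
            for peer in self.peers:
                peer.data[key] = value


node1, node2 = Node("node1"), Node("node2")
node1.peers, node2.peers = [node2], [node1]

node1.write("balance", 100)  # replicated to both nodes
node1.partitioned = node2.partitioned = True
node1.write("balance", 70)   # seen only by node1
node2.write("balance", 130)  # seen only by node2

print(node1.data, node2.data)
```

<p>Both writes made during the partition succeed, yet the two nodes now disagree about the same key, and nothing in the system can say which value is correct.</p>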
</section>
<section id="where-s-the-confusion">
<h2>Where’s the confusion?</h2>
<p>Confusion about the CAP theorem usually involves the interpretation of the Availability property. Availability in the <strong>CAP</strong> sense means that <em>all</em> nodes remain able to read and write even when partitioned. A system that keeps some, but not all, of its nodes able to read and write is not Available in the <strong>CAP</strong> sense, <em>even if it remains available to clients</em> and satisfies its SLAs for <a class="reference external" href="http://en.wikipedia.org/wiki/High_availability">high availability</a>.</p>
</section>
<section id="what-does-foundationdb-choose">
<h2>What does FoundationDB choose?</h2>
<p>As any ACID database must, during a network partition FoundationDB chooses Consistency over Availability. This does <em>not</em> mean that the database becomes unavailable for clients. When multiple machines or datacenters hosting a FoundationDB database are unable to communicate, <em>some</em> of them will be unable to execute writes. In a wide variety of real-world cases, the database and the application using it will remain up. A network partition affecting some machines is no worse than a failure of those same machines, which FoundationDB handles gracefully thanks to its fault-tolerant design.</p>
</section>
<section id="foundationdb-fault-tolerance">
<h2>FoundationDB fault tolerance</h2>
<p>FoundationDB’s design goal is to make sure that, even if some machines are down or unable to communicate reliably with the network, the database and the application connected to it remain up. This is high availability as usually understood, but it is <em>not</em> Availability in the <strong>CAP</strong> sense because the database will be unavailable <em>on the affected machines</em>.</p>
<p>FoundationDB seeks to present user applications with a single (logical) database. The challenge of handling a partition is to determine which machines should continue to accept reads and writes. To make this determination, FoundationDB is configured with a set of <em>coordination servers</em>. FoundationDB selects the partition in which a majority of these servers are available as the one that will remain responsive. If failures are so pervasive that there is <em>no</em> such partition, then the database really will be unavailable.</p>
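<p>The majority rule can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name <code class="docutils literal notranslate"><span class="pre">responsive_partition</span></code> and the representation of partitions as sets of machine names are invented here, and the sketch says nothing about how FoundationDB actually detects failures.</p>

```python
def responsive_partition(coordinators, partitions):
    """Return the partition holding a strict majority of coordinators, or None.

    coordinators: set of machine names running a coordination server.
    partitions:   list of disjoint sets of machine names that can still
                  communicate with each other.
    """
    needed = len(coordinators) // 2 + 1  # strict majority
    for group in partitions:
        if len(coordinators.intersection(group)) >= needed:
            return group
    return None  # no partition has a majority: the database is unavailable


coordinators = {"A", "B", "C"}

# A is cut off; B and C still form a majority.
print(responsive_partition(coordinators, [{"A"}, {"B", "C"}]))

# Every machine is isolated; no majority exists anywhere.
print(responsive_partition(coordinators, [{"A"}, {"B"}, {"C"}]))
```

<p>Because at most one partition can hold a strict majority, two sides of a partition can never both remain responsive, which is what keeps the database Consistent.</p>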
<p>The coordination servers use the <a class="reference external" href="http://en.wikipedia.org/wiki/Paxos_(computer_science)">Paxos</a> algorithm to maintain a small amount of shared state that itself is Consistent and Partition-tolerant. Like the database as a whole, the shared state is not Available but <em>is</em> available for reads and writes in the partition containing a majority of the coordination servers.</p>
<p>FoundationDB uses this shared state to maintain and update a replication topology. When a failure occurs, the coordination servers are used to change the replication topology. It’s worth noting that the coordination servers aren’t involved at all in committing transactions.</p>
</section>
<section id="example-a-minimal-configuration">
<h2>Example: a minimal configuration</h2>
<p>To illustrate how the coordination servers support fault tolerance, let’s look at a FoundationDB cluster of the minimal size that allows for data replication. Of course, the fault tolerance and availability provided by coordination are higher when the cluster is larger.</p>
<p>Imagine a small web startup that wants its application, served by FoundationDB within a datacenter, to stay available even if a machine fails. It sets up a cluster of three machines (A, B, and C), each running a database server and a coordination server. Applying the majority rule to this cluster, any pair of machines that can communicate will remain available. The startup configures FoundationDB in its <code class="docutils literal notranslate"><span class="pre">double</span></code> redundancy mode, in which the system will make two copies of each piece of data, each on a different machine.</p>
<p>Imagine that a rack-top switch fails, and A is partitioned from the network. A will be unable to commit new transactions because FoundationDB requires an acknowledgment from B or C. The database server on A can only communicate with the coordination server on A, so it will not be able to achieve a majority to set up a new replication topology. For any client communicating only with A, the database is down.</p>
<p>However, for all other clients, the database servers can reach a majority of coordination servers, B and C. The replication configuration has ensured there is a full copy of the data available even without A. For these clients, the database will remain available for reads and writes and the web servers will continue to serve traffic.</p>
<figure class="align-default" id="id3">
<img alt="_images/FDB_Partition.png" src="_images/FDB_Partition.png" />
<figcaption>
<p><span class="caption-text">Maintenance of availability during partition</span></p>
</figcaption>
</figure>
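<p>To see why a full copy of the data survives the loss of A, consider this small Python sketch of <code class="docutils literal notranslate"><span class="pre">double</span></code> redundancy. The round-robin placement and the names <code class="docutils literal notranslate"><span class="pre">place_replicas</span></code> and <code class="docutils literal notranslate"><span class="pre">surviving_keys</span></code> are assumptions made for illustration; FoundationDB's real data distribution is more sophisticated.</p>

```python
def place_replicas(keys, machines, copies=2):
    """Assign each key to `copies` distinct machines, round-robin."""
    placement = {}
    for i, key in enumerate(keys):
        placement[key] = {machines[(i + j) % len(machines)] for j in range(copies)}
    return placement


def surviving_keys(placement, reachable):
    """Return the keys with at least one replica on a reachable machine."""
    return {key for key, replicas in placement.items()
            if replicas.intersection(reachable)}


machines = ["A", "B", "C"]
keys = ["k1", "k2", "k3", "k4", "k5", "k6"]
placement = place_replicas(keys, machines)

# Whichever single machine is lost, the other two still hold every key.
for lost in machines:
    reachable = set(machines) - {lost}
    assert surviving_keys(placement, reachable) == set(keys)

# A alone, by contrast, holds only part of the data.
print(sorted(surviving_keys(placement, {"A"})))
```

<p>In this toy placement, B and C together hold every key, while A on its own holds only some of them. That asymmetry matches the example: the majority side can keep serving all reads and writes, and the isolated machine cannot.</p>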
<p>When the partition ends, A will again be able to communicate with the majority of coordination servers and will rejoin the database. Depending on how long the communications failure lasted, A will rejoin by either receiving transactions that occurred in its absence or, in the worst case, transferring the contents of the database. After A has rejoined the database, all machines will again be able to handle transactions in a fault-tolerant manner.</p>
<p>In contrast to the minimal cluster above, an actual production system would typically be configured in <code class="docutils literal notranslate"><span class="pre">triple</span></code> redundancy mode on five or more machines, giving it correspondingly higher availability. For further details, read our discussion of <a class="reference internal" href="fault-tolerance.html"><span class="doc">fault tolerance</span></a>.</p>
</section>
</section>
</div>
</div>
</div>
<footer class="footer">
<div class="container">
<p class="pull-right">
<a href="#">Back to top</a>
<br/>
<div id="sourcelink">
<a href="_sources/cap-theorem.rst.txt"
rel="nofollow">Source</a>
</div>
</p>
<p>
© Copyright 2013-2022 Apple, Inc and the FoundationDB project authors.<br/>
Last updated on Nov 20, 2024.<br/>
</p>
</div>
</footer>
</body>
</html>