<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Segmented Range Reads &#8212; FoundationDB 7.1</title>
<link rel="stylesheet" href="_static/basic.css" type="text/css" />
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
<link rel="stylesheet" href="_static/bootstrap-3.3.4/css/bootstrap.min.css" type="text/css" />
<link rel="stylesheet" href="_static/bootstrap-3.3.4/css/bootstrap-theme.min.css" type="text/css" />
<link rel="stylesheet" href="_static/bootstrap-sphinx.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: './',
VERSION: '7.1.23',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true,
SOURCELINK_SUFFIX: '.txt'
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="_static/js/jquery-1.11.0.min.js"></script>
<script type="text/javascript" src="_static/js/jquery-fix.js"></script>
<script type="text/javascript" src="_static/bootstrap-3.3.4/js/bootstrap.min.js"></script>
<script type="text/javascript" src="_static/bootstrap-sphinx.js"></script>
<link rel="search" title="Search" href="search.html" />
<link rel="next" title="Simple Indexes" href="simple-indexes.html" />
<link rel="prev" title="Segmented Range Reads" href="segmented-range-reads.html" />
<meta charset='utf-8'>
<meta http-equiv='X-UA-Compatible' content='IE=edge,chrome=1'>
<meta name='viewport' content='width=device-width, initial-scale=1.0, maximum-scale=1'>
<meta name="apple-mobile-web-app-capable" content="yes">
</head>
<body role="document">
<div id="navbar" class="navbar navbar-default navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<!-- .btn-navbar is used as the toggle for collapsed navbar content -->
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".nav-collapse">
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a class="navbar-brand" href="index.html">
FoundationDB</a>
<span class="navbar-text navbar-version pull-left"><b>7.1</b></span>
</div>
<div class="collapse navbar-collapse nav-collapse">
<ul class="nav navbar-nav">
<li><a href="contents.html">Site Map</a></li>
<li class="dropdown globaltoc-container">
<a role="button"
id="dLabelGlobalToc"
data-toggle="dropdown"
data-target="#"
href="index.html">Site <b class="caret"></b></a>
<ul class="dropdown-menu globaltoc"
role="menu"
aria-labelledby="dLabelGlobalToc"></ul>
</li>
<li class="dropdown">
<a role="button"
id="dLabelLocalToc"
data-toggle="dropdown"
data-target="#"
href="#">Page <b class="caret"></b></a>
<ul class="dropdown-menu localtoc"
role="menu"
aria-labelledby="dLabelLocalToc"><ul>
<li><a class="reference internal" href="#">Segmented Range Reads</a><ul>
<li><a class="reference internal" href="#goal">Goal</a></li>
<li><a class="reference internal" href="#challenge">Challenge</a></li>
<li><a class="reference internal" href="#explanation">Explanation</a></li>
<li><a class="reference internal" href="#ordering">Ordering</a></li>
<li><a class="reference internal" href="#pattern">Pattern</a></li>
<li><a class="reference internal" href="#extensions">Extensions</a></li>
<li><a class="reference internal" href="#code">Code</a></li>
</ul>
</li>
</ul>
</ul>
</li>
<li>
<a href="segmented-range-reads.html" title="Previous Chapter: Segmented Range Reads"><span class="glyphicon glyphicon-chevron-left visible-sm"></span><span class="hidden-sm hidden-tablet">&laquo; Segmented Range Reads</span>
</a>
</li>
<li>
<a href="simple-indexes.html" title="Next Chapter: Simple Indexes"><span class="glyphicon glyphicon-chevron-right visible-sm"></span><span class="hidden-sm hidden-tablet">Simple Indexes &raquo;</span>
</a>
</li>
</ul>
<form class="navbar-form navbar-right" action="search.html" method="get">
<div class="form-group">
<input type="text" name="q" class="form-control" placeholder="Search" />
</div>
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
</div>
<div class="container">
<div class="row">
<div class="col-md-3">
<div id="sidebar" class="bs-sidenav" role="complementary"><ul>
<li><a class="reference internal" href="#">Segmented Range Reads</a><ul>
<li><a class="reference internal" href="#goal">Goal</a></li>
<li><a class="reference internal" href="#challenge">Challenge</a></li>
<li><a class="reference internal" href="#explanation">Explanation</a></li>
<li><a class="reference internal" href="#ordering">Ordering</a></li>
<li><a class="reference internal" href="#pattern">Pattern</a></li>
<li><a class="reference internal" href="#extensions">Extensions</a></li>
<li><a class="reference internal" href="#code">Code</a></li>
</ul>
</li>
</ul>
</div>
</div>
<div class="col-md-9 content">
<div class="section" id="segmented-range-reads">
<h1>Segmented Range Reads</h1>
<p><a class="reference internal" href="segmented-range-reads.html"><span class="doc">Python</span></a> <strong>Java</strong></p>
<div class="section" id="goal">
<h2>Goal</h2>
<p>Perform range reads in calibrated batches.</p>
</div>
<div class="section" id="challenge">
<h2>Challenge</h2>
<p>Retrieve data in batches whose size you select based on your data model or application.</p>
</div>
<div class="section" id="explanation">
<h2>Explanation</h2>
<p>FoundationDB supports streaming modes that make range reads efficient even for large amounts of data. You can usually get good performance by selecting the proper streaming mode. However, there are particular cases in which you may want to exercise finer-grained control of data retrieval. You can exercise this control using the limit parameter.</p>
</div>
<div class="section" id="ordering">
<h2>Ordering</h2>
<p>This approach works with arbitrary ranges, which are, by definition, ordered. The goal here is to be able to walk through sub-ranges in order.</p>
</div>
<div class="section" id="pattern">
<h2>Pattern</h2>
<p>A range read returns a container that issues asynchronous reads to the database. The client usually processes the data by iterating over the values returned by the container. The API balances latency and bandwidth by fetching data in batches as determined by the <code class="docutils literal"><span class="pre">streaming_mode</span></code> parameter. Streaming modes allow you to customize this balance based on how you intend to consume the data. The default streaming mode is quite efficient. However, if you anticipate that your range read will retrieve a large amount of data, you should select a streaming mode to match your use case. For example, if you&#8217;re iterating through a large range and testing against a condition that may result in early termination, you can use the <code class="docutils literal"><span class="pre">small</span></code> streaming mode:</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="k">for</span><span class="p">(</span><span class="n">KeyValue</span><span class="w"> </span><span class="n">kv</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">tr</span><span class="p">.</span><span class="na">getRange</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="w"> </span><span class="n">ReadTransaction</span><span class="p">.</span><span class="na">ROW_LIMIT_UNLIMITED</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span><span class="w"> </span><span class="n">StreamingMode</span><span class="p">.</span><span class="na">SMALL</span><span class="p">)){</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">haltingCondition</span><span class="p">(</span><span class="n">kv</span><span class="p">.</span><span class="na">getKey</span><span class="p">(),</span><span class="w"> </span><span class="n">kv</span><span class="p">.</span><span class="na">getValue</span><span class="p">())){</span><span class="w"></span>
<span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">System</span><span class="p">.</span><span class="na">out</span><span class="p">.</span><span class="na">println</span><span class="p">(</span><span class="n">Tuple</span><span class="p">.</span><span class="na">fromBytes</span><span class="p">(</span><span class="n">kv</span><span class="p">.</span><span class="na">getKey</span><span class="p">()).</span><span class="na">toString</span><span class="p">()</span><span class="w"></span>
<span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">Tuple</span><span class="p">.</span><span class="na">fromBytes</span><span class="p">(</span><span class="n">kv</span><span class="p">.</span><span class="na">getValue</span><span class="p">()).</span><span class="na">toString</span><span class="p">());</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
</div>
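<p>To make the early-termination argument concrete, here is a toy model rather than anything taken from the FoundationDB client. It assumes a fixed number of rows per batch, whereas the real streaming modes size their batches internally, so the class name <code class="docutils literal"><span class="pre">BatchFetchModel</span></code>, the method, and the batch sizes are all hypothetical. It counts how many rows are transferred when the consumer stops after a given number of rows.</p>

```java
// Toy model (hypothetical): rows are fetched in fixed-size batches,
// and the consumer stops after reading `consumed` rows.
class BatchFetchModel {

    // Returns how many rows are transferred from a range of `total` rows
    // when the reader stops after `consumed` rows and each fetch brings
    // back up to `batchSize` rows.
    static int rowsFetched(int total, int batchSize, int consumed) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int needed = Math.min(Math.max(consumed, 0), total);
        // Ceiling division: number of batches needed to cover `needed` rows.
        int batches = (needed + batchSize - 1) / batchSize;
        // The last batch may be cut short by the end of the range.
        return Math.min(batches * batchSize, total);
    }
}
```

<p>Under this model, stopping after 5 rows of a 10,000-row range transfers 100 rows with batches of 100, but 5,000 rows with batches of 5,000. Smaller batches waste less bandwidth when the loop is likely to exit early, at the cost of more round trips if it does not.</p>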
<p>However, in some situations, you may want to explicitly control the number of key-value pairs returned. This may be the case if your data model creates blocks of N key-value pairs, and you want to read M blocks at a time, that is, a sub-range of N &times; M key-value pairs. You can pass N &times; M as the limit parameter for this purpose.</p>
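<p>The N &times; M idea can be sketched without a database. In the following illustration, a <code class="docutils literal"><span class="pre">TreeMap</span></code> stands in for the ordered key space, and the class <code class="docutils literal"><span class="pre">BlockReader</span></code> and its method are hypothetical names chosen for this example. Each pass reads at most N &times; M keys, then resumes just after the last key it saw, which is the same shape a limited range read against the database would take.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical stand-in: a sorted in-memory map plays the role of the
// ordered key space; no FoundationDB calls are made here.
class BlockReader {

    // Walks [begin, end) in key order and returns the keys grouped into
    // successive sub-ranges of at most blockSize * blocksPerRead keys.
    static List<List<String>> readInBlocks(NavigableMap<String, String> store,
                                           String begin, String end,
                                           int blockSize, int blocksPerRead) {
        int limit = blockSize * blocksPerRead; // N x M pairs per read
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        String cursor = begin;
        boolean inclusive = true; // the first read includes `begin` itself
        while (cursor.compareTo(end) < 0) {
            List<String> batch = new ArrayList<>();
            for (String key : store.subMap(cursor, inclusive, end, false).keySet()) {
                if (batch.size() == limit) {
                    break; // this sub-range is full
                }
                batch.add(key);
            }
            if (batch.isEmpty()) {
                break; // nothing left in the range
            }
            batches.add(batch);
            // Resume strictly after the last key returned.
            cursor = batch.get(batch.size() - 1);
            inclusive = false;
        }
        return batches;
    }
}
```

<p>With ten keys, N = 2, and M = 2, this sketch returns sub-ranges of 4, 4, and 2 keys.</p>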
</div>
<div class="section" id="extensions">
<h2>Extensions</h2>
<p><em>Parallel retrieval</em></p>
<p>For very large range reads, you can use multiple clients to perform reads in parallel. In this case, you&#8217;ll want to estimate sub-ranges of roughly equal size based on the distribution of your keys. The <a class="reference internal" href="api-python.html#api-python-locality"><span class="std std-ref">locality</span></a> functions can be used to find the partition boundaries used by the database, which will be roughly uniformly distributed in bytes of data. The partition boundaries can then be used to derive boundaries between sub-ranges for parallel reading.</p>
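<p>As a rough sketch of the last step, the following code groups partition boundaries into contiguous sub-ranges, one per reader. It assumes the boundary keys have already been obtained (in the Java binding, this would typically go through the locality API, for example <code class="docutils literal"><span class="pre">LocalityUtil.getBoundaryKeys</span></code>) and are sorted. Keys are shown as strings for readability, and the class <code class="docutils literal"><span class="pre">SubRangePlanner</span></code> is a hypothetical name for this illustration.</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns sorted partition boundaries into sub-ranges.
class SubRangePlanner {

    // Splits [begin, end) into at most `workers` contiguous sub-ranges,
    // cutting only at the supplied boundary keys. Each result element is
    // a {start, stop} pair describing the half-open range [start, stop).
    static List<String[]> plan(String begin, String end,
                               List<String> boundaries, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be at least 1");
        }
        // Keep only boundaries strictly inside (begin, end); input is assumed sorted.
        List<String> cuts = new ArrayList<>();
        for (String b : boundaries) {
            if (b.compareTo(begin) > 0 && b.compareTo(end) < 0) {
                cuts.add(b);
            }
        }
        int shards = cuts.size() + 1;            // partitions overlapping the range
        int ranges = Math.min(workers, shards);  // never more ranges than partitions
        List<String[]> result = new ArrayList<>();
        String start = begin;
        for (int i = 1; i < ranges; i++) {
            // Number of partitions assigned to the first i sub-ranges.
            int shardEnd = (int) ((long) i * shards / ranges);
            String cut = cuts.get(shardEnd - 1);
            result.add(new String[] {start, cut});
            start = cut;
        }
        result.add(new String[] {start, end});
        return result;
    }
}
```

<p>Because partitions hold roughly equal amounts of data, giving each reader about the same number of partitions approximates equal work. For the range from <code class="docutils literal"><span class="pre">a</span></code> to <code class="docutils literal"><span class="pre">z</span></code> with boundaries <code class="docutils literal"><span class="pre">c</span></code>, <code class="docutils literal"><span class="pre">f</span></code>, <code class="docutils literal"><span class="pre">k</span></code>, <code class="docutils literal"><span class="pre">p</span></code>, <code class="docutils literal"><span class="pre">t</span></code> and three readers, the sketch yields the sub-ranges a to f, f to p, and p to z.</p>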
</div>
<div class="section" id="code">
<h2>Code</h2>
<p>Here&#8217;s a basic function that successively reads sub-ranges of a size determined by the value of <code class="docutils literal"><span class="pre">LIMIT</span></code>.</p>
<div class="highlight-java"><div class="highlight"><pre><span></span><span class="kd">public</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">getRangeLimited</span><span class="p">(</span><span class="n">TransactionContext</span><span class="w"> </span><span class="n">tcx</span><span class="p">,</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">KeySelector</span><span class="w"> </span><span class="n">begin</span><span class="p">,</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">KeySelector</span><span class="w"> </span><span class="n">end</span><span class="p">){</span><span class="w"></span>
<span class="w"> </span><span class="n">tcx</span><span class="p">.</span><span class="na">run</span><span class="p">(</span><span class="n">tr</span><span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="p">{</span><span class="w"></span>
<span class="w"> </span><span class="kt">boolean</span><span class="w"> </span><span class="n">keysToCheck</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Tuple</span><span class="o">&gt;</span><span class="w"> </span><span class="n">keysFound</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Tuple</span><span class="o">&gt;</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="n">KeySelector</span><span class="w"> </span><span class="n">n_begin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">begin</span><span class="p">;</span><span class="w"> </span><span class="c1">// KeySelector is immutable, so reuse begin without altering its orEqual flag</span><span class="w"></span>
<span class="w"> </span><span class="k">while</span><span class="p">(</span><span class="n">keysToCheck</span><span class="p">){</span><span class="w"></span>
<span class="w"> </span><span class="n">keysToCheck</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">KeyValue</span><span class="w"> </span><span class="n">kv</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">tr</span><span class="p">.</span><span class="na">getRange</span><span class="p">(</span><span class="n">n_begin</span><span class="p">,</span><span class="w"> </span><span class="n">end</span><span class="p">,</span><span class="w"> </span><span class="n">LIMIT</span><span class="p">)){</span><span class="w"></span>
<span class="w"> </span><span class="n">keysToCheck</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="n">Tuple</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Tuple</span><span class="p">.</span><span class="na">fromBytes</span><span class="p">(</span><span class="n">kv</span><span class="p">.</span><span class="na">getKey</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">keysFound</span><span class="p">.</span><span class="na">size</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"></span>
<span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="o">!</span><span class="n">t</span><span class="p">.</span><span class="na">equals</span><span class="p">(</span><span class="n">keysFound</span><span class="p">.</span><span class="na">get</span><span class="p">(</span><span class="n">keysFound</span><span class="p">.</span><span class="na">size</span><span class="p">()</span><span class="o">-</span><span class="mi">1</span><span class="p">))){</span><span class="w"></span>
<span class="w"> </span><span class="n">keysFound</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="n">t</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">if</span><span class="p">(</span><span class="n">keysToCheck</span><span class="p">){</span><span class="w"></span>
<span class="w"> </span><span class="n">n_begin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">KeySelector</span><span class="p">.</span><span class="na">firstGreaterThan</span><span class="p">(</span><span class="n">keysFound</span><span class="p">.</span><span class="na">get</span><span class="p">(</span><span class="n">keysFound</span><span class="p">.</span><span class="na">size</span><span class="p">()</span><span class="o">-</span><span class="mi">1</span><span class="p">).</span><span class="na">pack</span><span class="p">());</span><span class="w"></span>
<span class="w"> </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span><span class="w"> </span><span class="n">readableFound</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="k">for</span><span class="p">(</span><span class="n">Tuple</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="n">keysFound</span><span class="p">){</span><span class="w"></span>
<span class="w"> </span><span class="n">readableFound</span><span class="p">.</span><span class="na">add</span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="na">get</span><span class="p">(</span><span class="mi">1</span><span class="p">));</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="n">System</span><span class="p">.</span><span class="na">out</span><span class="p">.</span><span class="na">println</span><span class="p">(</span><span class="n">readableFound</span><span class="p">);</span><span class="w"></span>
<span class="w"> </span><span class="n">keysFound</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ArrayList</span><span class="o">&lt;</span><span class="n">Tuple</span><span class="o">&gt;</span><span class="p">();</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="p">}</span><span class="w"></span>
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span><span class="w"></span>
<span class="w"> </span><span class="p">});</span><span class="w"></span>
<span class="p">}</span><span class="w"></span>
</pre></div>
</div>
</div>
</div>
</div>
</div>
</div>
<footer class="footer">
<div class="container">
<p class="pull-right">
<a href="#">Back to top</a>
<br/>
<div id="sourcelink">
<a href="_sources/segmented-range-reads-java.rst.txt"
rel="nofollow">Source</a>
</div>
</p>
<p>
&copy; Copyright 2013-2021 Apple, Inc and the FoundationDB project authors.<br/>
Last updated on Sep 19, 2022.<br/>
</p>
</div>
</footer>
</body>
</html>