Merge branch 'cassandra-5.0' into trunk

* cassandra-5.0:
  Correct out-of-date metrics and configuration documentation for SAI
Caleb Rackliffe 2024-09-19 14:45:22 -05:00
commit 79998d0b96
3 changed files with 6 additions and 87 deletions

@@ -76,6 +76,7 @@
* Add the ability to disable bulk loading of SSTables (CASSANDRA-18781)
* Clean up obsolete functions and simplify cql_version handling in cqlsh (CASSANDRA-18787)
Merged from 5.0:
* Correct out-of-date metrics and configuration documentation for SAI (CASSANDRA-19898)
* Make configuration entries in memtable section order-independent (CASSANDRA-19906)
* Add guardrail for enabling usage of VectorType (CASSANDRA-19903)
* Set executable flag for shell scripts in .build directory for source artifact (CASSANDRA-19896)

@@ -3,25 +3,7 @@
// LLP: *NOT DONE*
Configuring your {product} environment for Storage-Attached Indexing (SAI) requires some important customization of the `cassandra.yaml` file.
== Increase file cache above the default value
By default, the file cache's xref:cassandra:managing/configuration/cass_yaml_file.adoc#file_cache_size[file_cache_size] value is calculated as 50% of the `MaxDirectMemorySize` setting.
This default for `file_cache_size` may result in suboptimal performance because Cassandra is not able to take full advantage of available memory.
[TIP]
====
File cache is also known as chunk cache.
====
The `file_cache_size` value can be defined explicitly in `cassandra.yaml`.
The recommendation is to:
. Increase `-XX:MaxDirectMemorySize`, leaving approximately 15-20% of memory for the OS and other in-memory structures.
. In `cassandra.yaml`, explicitly set `file_cache_size` to 75% of that value.
In testing, this configuration improves indexing performance across read, write, and mixed read/write scenarios.
Configuring your {product} environment for Storage-Attached Indexing (SAI) may require some customization of the `cassandra.yaml` file.
== Compaction strategies
@@ -45,7 +27,7 @@ In general, do not use `LeveledCompactionStrategy` (LCS) unless your index queri
However, if you decide to use LCS, use the following guidelines:
* The `160` MB default for the `CREATE TABLE` command's `sstable_size_in_mb` option, described in this xref:reference:cql-commands/create-table.adoc#compactSubprop__LCS[topic], may result in suboptimal performance for index queries that do not restrict on token range or partition key.
* While even higher values may be appropriate, depending on your hardware, DataStax recommends at least doubling the default value of `sstable_size_in_mb`.
* While even higher values may be appropriate, depending on your hardware, we recommend at least doubling the default value of `sstable_size_in_mb`.
Example:
@@ -63,6 +45,9 @@ Each SAI index should ultimately consume less space on disk because of better lo
If query performance degrades on large (`sstable_max_size` ~2GB) SAI-indexed SSTables, and the workload is not dominated by reads but is experiencing increased write amplification, consider using the Unified Compaction Strategy (UCS).
The `cassandra.yaml` options `sai_sstable_indexes_per_query_warn_threshold` (default: 32) and `sai_sstable_indexes_per_query_fail_threshold` (default: disabled) set how many SSTable indexes an SAI query can touch before the database warns the client or fails the query, respectively.
When enabled, they can give clients feedback and protect the database against suboptimal read queries.
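To make the interplay of the two thresholds concrete, here is a minimal Java sketch. It is illustrative only and does not reproduce Cassandra's guardrail code. The class `SaiIndexesPerQueryGuardrail`, the `-1` sentinel for a disabled threshold, and the rule that a query trips a threshold only when its count strictly exceeds it are all assumptions made for this example.

```java
/**
 * Illustrative sketch (not Cassandra's implementation) of how a warn/fail
 * threshold pair can classify a query by the number of SSTable indexes it
 * touches. Assumptions: -1 means "disabled", and a threshold is tripped only
 * when the count strictly exceeds it.
 */
final class SaiIndexesPerQueryGuardrail {

    enum Outcome { OK, WARN, FAIL }

    /** Hypothetical sentinel for a disabled threshold. */
    static final int DISABLED = -1;

    private final int warnThreshold;
    private final int failThreshold;

    SaiIndexesPerQueryGuardrail(int warnThreshold, int failThreshold) {
        this.warnThreshold = warnThreshold;
        this.failThreshold = failThreshold;
    }

    /** Classifies a query by how many SSTable indexes it would touch. */
    Outcome evaluate(int sstableIndexesTouched) {
        // Check the fail threshold first so a query over both limits fails instead of only warning.
        if (isEnabled(failThreshold) && sstableIndexesTouched > failThreshold) {
            return Outcome.FAIL;
        }
        if (isEnabled(warnThreshold) && sstableIndexesTouched > warnThreshold) {
            return Outcome.WARN;
        }
        return Outcome.OK;
    }

    private static boolean isEnabled(int threshold) {
        // Any negative value is treated as "disabled" in this sketch.
        return threshold >= 0;
    }

    public static void main(String[] args) {
        // Documented defaults: warn threshold 32, fail threshold disabled.
        SaiIndexesPerQueryGuardrail defaults = new SaiIndexesPerQueryGuardrail(32, DISABLED);
        for (int touched : new int[] {8, 32, 33, 500}) {
            System.out.println(touched + " SSTable indexes -> " + defaults.evaluate(touched));
        }
    }
}
```

With the documented defaults (warn at 32, fail disabled), queries touching 8 or 32 indexes evaluate to `OK`, while 33 and 500 evaluate to `WARN`. Nothing can reach `FAIL` until the fail threshold is enabled. Because the fail threshold is checked first, a query over both limits is rejected rather than merely warned about.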
== About SAI encryption
With SAI, an index's on-disk components are simply additional SSTable data.

@@ -47,11 +47,7 @@ The categorized data:
* Global indexing metrics
* Table query metrics
* Per query metrics
* Key fetch metrics
* Offset fetch metrics
* Token fetch metrics
* Column query metrics per index
* Terms metrics per index
* Range slice metrics
For example, you can use metrics to get the current count of total partition reads since the node started for `cycling.cyclist_semi_pro`.
@@ -101,32 +97,6 @@ The index group metrics for the given keyspace and table:
* `IndexFileCacheBytes` -- Size in bytes of memory used by the on-disk data structure of the per-column indices.
* `OpenIndexFiles` -- Number of open index files for the given table's SAI indices.
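The SAI metrics on this page are exposed over JMX under the `org.apache.cassandra.metrics` domain with `type=StorageAttachedIndex`, `keyspace`, `table`, `scope`, and `name` properties. The following Java sketch reads one of them with the standard `javax.management` client API. It makes several assumptions: JMX is reachable on the default port 7199 without authentication, and the metric is a gauge read through its `Value` attribute. `IndexGroup` is only a placeholder `scope`, so substitute the exact `scope` from the ObjectName block of the metric group you want. The class and method names are hypothetical.

```java
import java.util.Hashtable;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

/** Hypothetical helper for reading an SAI table-level metric over JMX. */
final class SaiMetricReader {

    /**
     * Builds an ObjectName following the pattern
     * org.apache.cassandra.metrics:type=StorageAttachedIndex,keyspace=...,table=...,scope=...,name=...
     */
    static ObjectName saiTableMetric(String keyspace, String table, String scope, String metric) {
        Hashtable<String, String> props = new Hashtable<>();
        props.put("type", "StorageAttachedIndex");
        props.put("keyspace", keyspace);
        props.put("table", table);
        props.put("scope", scope);
        props.put("name", metric);
        try {
            return new ObjectName("org.apache.cassandra.metrics", props);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid metric name component", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumption: JMX listens on the default port 7199 with no authentication.
        String host = args.length > 0 ? args[0] : "localhost";
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");

        // "IndexGroup" is a placeholder scope; copy the real one from the metric group's ObjectName block.
        ObjectName name = saiTableMetric("cycling", "cyclist_semi_pro", "IndexGroup", "OpenIndexFiles");

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            // Assumption: the metric is a gauge, whose reading is exposed as the "Value" attribute.
            Object value = mbeans.getAttribute(name, "Value");
            System.out.println(name + " = " + value);
        }
    }
}
```

Building the `ObjectName` from a key/value table instead of concatenating a string lets the JMX API validate each property. For example, a table name containing a comma is rejected up front. The keyspace and table match the `cycling.cyclist_semi_pro` example used on this page.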
== Key fetch metrics
----
ObjectName: org.apache.cassandra.metrics:type=StorageAttachedIndex,keyspace=<keyspace>,table=<table>,scope=KeyFetch,name=<metric>
----
The key fetch metrics for the given keyspace and table:
* `ChunkCacheHitRate` -- All-time chunk cache hit rate for keys during queries against the given table.
* `TotalChunkCacheLookups` -- All-time chunk cache lookups for keys during queries against the given table.
* `TotalChunkCacheMisses` -- All-time chunk cache misses for keys during queries against the given table.
* `ChunkCache(One|Five|Fifteen)HitRate` -- <N>-minute chunk cache hit rate for keys during queries against the given table.
== Offset fetch metrics
----
ObjectName: org.apache.cassandra.metrics:type=StorageAttachedIndex,keyspace=<keyspace>,table=<table>,scope=OffsetFetch,name=<metric>
----
The offset fetch metrics for the given keyspace and table:
* `ChunkCacheHitRate` -- All-time chunk cache hit rate for partition key SSTable offset fetches during queries against the given table.
* `TotalChunkCacheLookups` -- All-time chunk cache lookups for partition key SSTable offset fetches during queries against the given table.
* `TotalChunkCacheMisses` -- All-time chunk cache misses for partition key SSTable offset fetches during queries against the given table.
* `ChunkCache(One|Five|Fifteen)HitRate` -- <N>-minute chunk cache hit rate for partition key SSTable offset fetches during queries against the given table.
== Per query metrics
----
@@ -170,30 +140,6 @@ The table state metrics for the given keyspace and table:
* `TotalIndexCount` -- Total number of SAI indices per table.
* `TotalQueryableIndexCount` -- Number of SAI indices per table currently in the `is_queryable` state.
== Token fetch metrics
----
ObjectName: org.apache.cassandra.metrics:type=StorageAttachedIndex,keyspace=<keyspace>,table=<table>,scope=TokenFetch,name=<metric>
----
The token fetch metrics for the given keyspace and table:
* `ChunkCacheHitRate` -- All-time chunk cache hit rate for partition key token fetches during queries against the given table.
* `TotalChunkCacheLookups` -- All-time chunk cache lookups for partition key token fetches during queries against the given table.
* `TotalChunkCacheMisses` -- All-time chunk cache misses for partition key token fetches during queries against the given table.
* `ChunkCache(One|Five|Fifteen)HitRate` -- <N>-minute chunk cache hit rate for partition key token fetches during queries against the given table.
== Token skipping metrics
----
ObjectName: org.apache.cassandra.metrics:type=StorageAttachedIndex,keyspace=<keyspace>,table=<table>,scope=TokenSkipping,name=<metric>
----
The token skipping metrics for the given keyspace and table:
* `CacheHits` -- Number of cache hits from token skipping in a multi-index `AND` query.
* `Lookups` -- Number of lookups from token skipping in a multi-index `AND` query.
== Column query metrics for each numeric index
----
@@ -219,19 +165,6 @@ The column query metrics for the given keyspace, table, and index include:
* `TermsLookupLatency` -- For string indexes, such as `country_sai_idx` in the xref:cassandra:getting-started/sai-quickstart.adoc[quickstart] examples, this metric shows terms lookup latency percentiles (in microseconds) per one/five/fifteen minute query throughput.
== Terms metrics for each string index
----
ObjectName: org.apache.cassandra.metrics:type=StorageAttachedIndex,keyspace=<keyspace>,table=<table>,index=<index>,scope=Terms,name=<metric>
----
For string indexes, the terms metrics for the given keyspace, table, and index:
* `ChunkCacheHitRate` -- All-time chunk cache hit rate for terms during string index queries that used the given index.
* `TotalChunkCacheLookups` -- All-time chunk cache lookups for terms during string index queries that used the given index.
* `TotalChunkCacheMisses` -- All-time chunk cache misses for terms during string index queries that used the given index.
* `ChunkCache(One|Five|Fifteen)HitRate` -- <N>-minute chunk cache hit rate for terms during string index queries that used the given index.
== Range slice metrics
----