#### Context
Apache Cassandra is a popular open-source database created back in 2008.
This year with
[CEP30](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes)
support for vector and similarity searches have been introduced.
Cassandra is very fast in read and write and is used as a cache by many
companies, it as an opportunity to implement the ChatMemoryStore. This
feature is expected for Cassandra 5 at the end of the year but some
docker images are already available.
DataStax AstraDb is a distribution of Apache Cassandra available as Saas
providing a free tier (free forever) of 80 millions queries/month.
[Registration](https://astra.datastax.com). The vector capability is
there production ready.
#### Data Modelling
With the proper data model in Cassandra we can perform both similarity
search, keyword search, metadata search.
```sql
CREATE TABLE sample_vector_table (
row_id text PRIMARY KEY,
attributes_blob text,
body_blob text,
metadata_s map<text, text>,
vector vector<float, 1536>
);
```
#### Implementation Throughts
- The **configuration** to connect to Astra and Cassandra are not
exactly the same so 2 different classes with associated builder are
provided:
[Astra](https://github.com/clun/langchain4j/blob/main/langchain4j/src/main/java/dev/langchain4j/store/embedding/cassandra/AstraDbEmbeddingConfiguration.java)
and [OSS
Cassandra](https://github.com/clun/langchain4j/blob/main/langchain4j/src/main/java/dev/langchain4j/store/embedding/cassandra/CassandraEmbeddingConfiguration.java).
A couple of fields are mutualized but creating a superclass to inherit
from lead to the use of Lombok `@SuperBuilder` and the Javadoc was not
able to found out what to do.
- Instead of passing a large number of arguments like other stores I
prefer to wrap them as a bean. With this trick you can add or remove
attributes, make then optional or mandatory at will. If you need to add
a new attribute in the configuration you do not have to change the
implementation of `XXXStore` and `XXXStoreImpl`
- I create an
[AstractEmbeddedStore<T>](https://github.com/clun/langchain4j/blob/main/langchain4j/src/main/java/dev/langchain4j/store/embedding/AbstractEmbeddingStore.java)
that could very well become the super class for any store. It handles
the different call of the real concrete implementation. (_delegate
pattern_). Some default implementation can be implemented
```java
/**
* Add a list of embeddings to the store.
*
* @param embeddings
* list of embeddings (hold vector)
* @return
* list of ids
*/
@Override
public List<String> addAll(List<Embedding> embeddings) {
Objects.requireNonNull(embeddings, "embeddings must not be null");
return embeddings.stream().map(this::add).collect(Collectors.toList());
}
```
The only method to implement at the Store level is:
```java
/**
* Initialize the concrete implementation.
* @return create implementation class for the store
*/
protected abstract EmbeddingStore<T> loadImplementation()
throws ClassNotFoundException, NoSuchMethodException, InstantiationException,
IllegalAccessException, InvocationTargetException;
```
-
[CassandraEmbeddedStore](https://github.com/clun/langchain4j/blob/main/langchain4j/src/main/java/dev/langchain4j/store/embedding/cassandra/CassandraEmbeddingStore.java#L30)
proposes 2 constructors, one could override the implementation class if
they want (extension point)
#### Tests
- Test classes are provided including some long form examples based on
classed found in `langchain4j-examples` but test are disabled.
- To start a local cassandra use docker and the
[docker-compose](https://github.com/clun/langchain4j/blob/main/langchain4j-cassandra/src/test/resources/docker-compose.yml)
```
docker compose up -d
```
- To run Test with Astra signin with your github account, create a token
(api Key) with role `Organization Administrator` following this
[procedure](https://awesome-astra.github.io/docs/pages/astra/create-token/#c-procedure)
<img width="926" alt="Screenshot 2023-09-06 at 18 14 12"
src="https://github.com/langchain4j/langchain4j/assets/726536/dfd2d9e5-09c9-4504-bfaa-31cfd87704a1">
- Pick the full value of the `token` from the json
<img width="713" alt="Screenshot 2023-09-06 at 18 15 53"
src="https://github.com/langchain4j/langchain4j/assets/726536/1be56234-dd98-4f59-af71-03df42ed6997">
- Create the environment variable `ASTRA_DB_APPLICATION_TOKEN`
```console
export ASTRA_DB_APPLICATION_TOKEN=AstraCS:....<your_token>
```