langchain4j/langchain4j-cassandra/pom.xml

Cassandra and Astra (dbaas) as VectorStore and ChatMemoryStore (#162)

#### Context

Apache Cassandra is a popular open-source database created back in 2008. This year, with [CEP30](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes), support for vector and similarity searches has been introduced. Cassandra is very fast at reads and writes and is used as a cache by many companies, which makes it a good opportunity to implement the ChatMemoryStore. This feature is expected for Cassandra 5 at the end of the year, but some Docker images are already available.

DataStax AstraDb is a distribution of Apache Cassandra available as SaaS, providing a free tier (free forever) of 80 million queries/month. [Registration](https://astra.datastax.com). The vector capability is production-ready there.

#### Data Modelling

With the proper data model in Cassandra we can perform similarity search, keyword search, and metadata search.

```sql
CREATE TABLE sample_vector_table (
    row_id text PRIMARY KEY,
    attributes_blob text,
    body_blob text,
    metadata_s map<text, text>,
    vector vector<float, 1536>
);
```

#### Implementation Thoughts

- The **configurations** needed to connect to Astra and to Cassandra are not exactly the same, so 2 different classes with associated builders are provided: [Astra](https://github.com/clun/langchain4j/blob/main/langchain4j/src/main/java/dev/langchain4j/store/embedding/cassandra/AstraDbEmbeddingConfiguration.java) and [OSS Cassandra](https://github.com/clun/langchain4j/blob/main/langchain4j/src/main/java/dev/langchain4j/store/embedding/cassandra/CassandraEmbeddingConfiguration.java). A couple of fields are shared, but creating a common superclass led to the use of Lombok `@SuperBuilder`, which the Javadoc tool was not able to handle.
- Instead of passing a large number of arguments like other stores, I prefer to wrap them in a bean. With this trick you can add or remove attributes and make them optional or mandatory at will. If you need to add a new attribute to the configuration, you do not have to change the implementation of `XXXStore` and `XXXStoreImpl`.
- I created an [AbstractEmbeddingStore<T>](https://github.com/clun/langchain4j/blob/main/langchain4j/src/main/java/dev/langchain4j/store/embedding/AbstractEmbeddingStore.java) that could very well become the superclass for any store. It forwards each call to the real concrete implementation (_delegate pattern_). Some default implementations can be provided:

```java
/**
 * Add a list of embeddings to the store.
 *
 * @param embeddings
 *      list of embeddings (hold vector)
 * @return
 *      list of ids
 */
@Override
public List<String> addAll(List<Embedding> embeddings) {
    Objects.requireNonNull(embeddings, "embeddings must not be null");
    return embeddings.stream().map(this::add).collect(Collectors.toList());
}
```

The only method to implement at the Store level is:

```java
/**
 * Initialize the concrete implementation.
 * @return create implementation class for the store
 */
protected abstract EmbeddingStore<T> loadImplementation()
        throws ClassNotFoundException, NoSuchMethodException, InstantiationException,
        IllegalAccessException, InvocationTargetException;
```

- [CassandraEmbeddingStore](https://github.com/clun/langchain4j/blob/main/langchain4j/src/main/java/dev/langchain4j/store/embedding/cassandra/CassandraEmbeddingStore.java#L30) offers 2 constructors; one of them lets you override the implementation class if you want (extension point).

#### Tests

- Test classes are provided, including some long-form examples based on classes found in `langchain4j-examples`, but the tests are disabled.
- To start a local Cassandra, use Docker and the [docker-compose](https://github.com/clun/langchain4j/blob/main/langchain4j-cassandra/src/test/resources/docker-compose.yml) file:

```
docker compose up -d
```

- To run the tests with Astra, sign in with your GitHub account and create a token (API key) with the role `Organization Administrator`, following this [procedure](https://awesome-astra.github.io/docs/pages/astra/create-token/#c-procedure).

<img width="926" alt="Screenshot 2023-09-06 at 18 14 12" src="https://github.com/langchain4j/langchain4j/assets/726536/dfd2d9e5-09c9-4504-bfaa-31cfd87704a1">

- Pick the full value of the `token` from the JSON.

<img width="713" alt="Screenshot 2023-09-06 at 18 15 53" src="https://github.com/langchain4j/langchain4j/assets/726536/1be56234-dd98-4f59-af71-03df42ed6997">

- Create the environment variable `ASTRA_DB_APPLICATION_TOKEN`:

```console
export ASTRA_DB_APPLICATION_TOKEN=AstraCS:....<your_token>
```
2023-09-27 21:50:04 +08:00
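The configuration-bean and delegate ideas described in the commit message above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the code from the pull request: `VectorStore`, `StoreConfig`, `AbstractVectorStore` and `InMemoryVectorStore` are hypothetical names invented here, and an in-memory list stands in for Cassandra so the sketch compiles without any driver or Lombok.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical, simplified stand-in for the real EmbeddingStore interface.
interface VectorStore {
    String add(float[] vector);
    List<String> addAll(List<float[]> vectors);
}

// Hypothetical configuration bean: new settings are added here,
// so the store constructors never have to change.
final class StoreConfig {
    final String table;   // mandatory
    final int dimension;  // optional, defaults to 1536

    private StoreConfig(Builder builder) {
        this.table = Objects.requireNonNull(builder.table, "table is mandatory");
        this.dimension = builder.dimension;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private String table;
        private int dimension = 1536;

        Builder table(String table) { this.table = table; return this; }
        Builder dimension(int dimension) { this.dimension = dimension; return this; }
        StoreConfig build() { return new StoreConfig(this); }
    }
}

// Delegate pattern: the abstract store forwards calls to a lazily loaded implementation.
abstract class AbstractVectorStore implements VectorStore {
    private VectorStore delegate;

    // The only method a concrete store has to provide.
    protected abstract VectorStore loadImplementation();

    private VectorStore delegate() {
        if (delegate == null) {
            delegate = loadImplementation();
        }
        return delegate;
    }

    @Override
    public String add(float[] vector) {
        return delegate().add(vector);
    }

    // Default implementation shared by every store: add the vectors one by one.
    @Override
    public List<String> addAll(List<float[]> vectors) {
        Objects.requireNonNull(vectors, "vectors must not be null");
        return vectors.stream().map(this::add).collect(Collectors.toList());
    }
}

// Hypothetical in-memory store, used only to keep the sketch self-contained.
final class InMemoryVectorStore extends AbstractVectorStore {
    private final StoreConfig config;
    private final List<float[]> rows = new ArrayList<>();

    InMemoryVectorStore(StoreConfig config) {
        this.config = Objects.requireNonNull(config, "config must not be null");
    }

    @Override
    protected VectorStore loadImplementation() {
        return new VectorStore() {
            @Override
            public String add(float[] vector) {
                if (vector.length != config.dimension) {
                    throw new IllegalArgumentException("expected dimension " + config.dimension);
                }
                rows.add(vector);
                // Row id derived from the table name and the insertion index.
                return config.table + "-" + (rows.size() - 1);
            }

            @Override
            public List<String> addAll(List<float[]> vectors) {
                return vectors.stream().map(this::add).collect(Collectors.toList());
            }
        };
    }
}
```

A caller would build the bean with `StoreConfig.builder().table("sample").dimension(2).build()` and pass it to `new InMemoryVectorStore(config)`. Adding a new optional setting then only touches `StoreConfig` and the implementation that reads it, which is the benefit the commit message attributes to wrapping arguments in a bean.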
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <artifactId>langchain4j-cassandra</artifactId>
    <name>LangChain4j integration with Cassandra and AstraDb</name>
    <description>Some dependencies have a "Public Domain" license</description>
    <parent>
        <groupId>dev.langchain4j</groupId>
        <artifactId>langchain4j-parent</artifactId>
        <version>0.27.0-SNAPSHOT</version>
        <relativePath>../langchain4j-parent/pom.xml</relativePath>
    </parent>
    <properties>
        <astra-sdk.version>0.6.11</astra-sdk.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j-core</artifactId>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
        </dependency>
        <dependency>
            <groupId>com.datastax.astra</groupId>
            <artifactId>astra-sdk-vector</artifactId>
            <version>${astra-sdk.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>ch.qos.logback</groupId>
                    <artifactId>logback-classic</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <!-- Pin org.json:json explicitly to avoid a version affected by a known CVE -->
        <dependency>
            <groupId>org.json</groupId>
            <artifactId>json</artifactId>
Bump org.json:json from 20230618 to 20231013 in /langchain4j-cassandra (#341) Bumps [org.json:json](https://github.com/douglascrockford/JSON-java) from 20230618 to 20231013. Changelog: 20231013 is the first release with minimum Java version 1.8 and contains recent commits, including fixes for CVE-2023-5072.
2023-12-13 02:22:31 +08:00
            <version>20231013</version>
        </dependency>
        <dependency>
            <groupId>commons-beanutils</groupId>
            <artifactId>commons-beanutils</artifactId>
            <version>1.9.4</version>
        </dependency>
        <dependency>
            <groupId>org.junit.jupiter</groupId>
            <artifactId>junit-jupiter-engine</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.assertj</groupId>
            <artifactId>assertj-core</artifactId>
            <version>${assertj.version}</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>dev.langchain4j</groupId>
            <artifactId>langchain4j</artifactId>
            <version>${project.parent.version}</version>
            <scope>test</scope>
        </dependency>
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-open-ai</artifactId>
<version>${project.parent.version}</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.honton.chas</groupId>
<artifactId>license-maven-plugin</artifactId>
<configuration>
<!-- org.json:json has a "Public Domain" license -->
<skipCompliance>true</skipCompliance>
</configuration>
</plugin>
</plugins>
</build>
</project>