langchain4j/pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-aggregator</artifactId>
    <version>0.23.0</version>
    <packaging>pom</packaging>
    <modules>
        <module>langchain4j-parent</module>
        <module>langchain4j-bom</module>
        <module>langchain4j-core</module>
        <module>langchain4j</module>
        <module>langchain4j-spring-boot-starter</module>
        <!-- model providers -->
        <module>langchain4j-azure-open-ai</module>
        <module>langchain4j-bedrock</module>
        <module>langchain4j-dashscope</module>
        <module>langchain4j-hugging-face</module>
        <module>langchain4j-local-ai</module>
        <module>langchain4j-open-ai</module>
        <module>langchain4j-vertex-ai</module>
        <module>langchain4j-ollama</module>
        <!-- embedding stores -->
        <module>langchain4j-cassandra</module>
        <module>langchain4j-chroma</module>
        <module>langchain4j-elasticsearch</module>
        <module>langchain4j-milvus</module>
        <module>langchain4j-opensearch</module>
        <module>langchain4j-pinecone</module>
        <module>langchain4j-pgvector</module>
        <module>langchain4j-redis</module>
        <module>langchain4j-vespa</module>
        <module>langchain4j-weaviate</module>
    </modules>
</project>