langchain4j/pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-aggregator</artifactId>
<version>0.35.0-SNAPSHOT</version>
<packaging>pom</packaging>
<name>LangChain4j :: Aggregator</name>
<properties>
<gib.disable>true</gib.disable>
</properties>
<modules>
<module>langchain4j-parent</module>
<module>langchain4j-bom</module>
<module>langchain4j-core</module>
<module>langchain4j</module>
<module>langchain4j-easy-rag</module>
<!-- model providers -->
<module>langchain4j-anthropic</module>
<module>langchain4j-azure-open-ai</module>
<module>langchain4j-bedrock</module>
<module>langchain4j-chatglm</module>
<module>langchain4j-cohere</module>
<module>langchain4j-dashscope</module>
<module>langchain4j-hugging-face</module>
<module>langchain4j-jlama</module>
<module>langchain4j-jina</module>
<module>langchain4j-local-ai</module>
<module>langchain4j-mistral-ai</module>
<module>langchain4j-nomic</module>
<module>langchain4j-ollama</module>
<module>langchain4j-ovh-ai</module>
<module>langchain4j-open-ai</module>
<module>langchain4j-qianfan</module>
<module>langchain4j-github-models</module>
<module>langchain4j-google-ai-gemini</module>
<module>langchain4j-vertex-ai</module>
<module>langchain4j-vertex-ai-gemini</module>
<module>langchain4j-workers-ai</module>
<module>langchain4j-zhipu-ai</module>
<module>langchain4j-voyage-ai</module>
<!-- embedding stores -->
<module>langchain4j-azure-ai-search</module>
<module>langchain4j-azure-cosmos-mongo-vcore</module>
<module>langchain4j-azure-cosmos-nosql</module>
<module>langchain4j-cassandra</module>
<module>langchain4j-chroma</module>
<module>langchain4j-couchbase</module>
<module>langchain4j-elasticsearch</module>
<module>langchain4j-infinispan</module>
<module>langchain4j-milvus</module>
<module>langchain4j-mongodb-atlas</module>
<module>langchain4j-neo4j</module>
<module>langchain4j-oracle</module>
<module>langchain4j-opensearch</module>
<module>langchain4j-pgvector</module>
<module>langchain4j-pinecone</module>
<module>langchain4j-qdrant</module>
<module>langchain4j-redis</module>
<module>langchain4j-tablestore</module>
<module>langchain4j-vearch</module>
<module>langchain4j-vespa</module>
<module>langchain4j-weaviate</module>
<!-- document loaders -->
<module>document-loaders/langchain4j-document-loader-amazon-s3</module>
<module>document-loaders/langchain4j-document-loader-azure-storage-blob</module>
<module>document-loaders/langchain4j-document-loader-github</module>
<module>document-loaders/langchain4j-document-loader-selenium</module>
<module>document-loaders/langchain4j-document-loader-tencent-cos</module>
<module>document-loaders/langchain4j-document-loader-google-cloud-storage</module>
<!-- document parsers -->
<module>document-parsers/langchain4j-document-parser-apache-pdfbox</module>
<module>document-parsers/langchain4j-document-parser-apache-poi</module>
<module>document-parsers/langchain4j-document-parser-apache-tika</module>
<!-- document transformers -->
<module>document-transformers/langchain4j-document-transformer-jsoup</module>
<!-- code execution engines -->
<module>code-execution-engines/langchain4j-code-execution-engine-graalvm-polyglot</module>
<module>code-execution-engines/langchain4j-code-execution-engine-judge0</module>
<!-- web search engines -->
<module>web-search-engines/langchain4j-web-search-engine-google-custom</module>
<module>web-search-engines/langchain4j-web-search-engine-tavily</module>
<module>web-search-engines/langchain4j-web-search-engine-searchapi</module>
<!-- embedding store filter parsers -->
<module>embedding-store-filter-parsers/langchain4j-embedding-store-filter-parser-sql</module>
<!-- experimental -->
<module>experimental/langchain4j-experimental-sql</module>
<module>langchain4j-onnx-scoring</module>
</modules>
<build>
<extensions>
<extension>
<groupId>com.vackosar.gitflowincrementalbuilder</groupId>
<artifactId>gitflow-incremental-builder</artifactId>
<version>3.15.0</version>
</extension>
</extensions>
<plugins>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<configuration>
<!-- do not deploy langchain4j-aggregator's pom.xml (this file) -->
<skip>true</skip>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>3.5.0</version>
<executions>
<execution>
<id>attach-javadocs</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
<execution>
<id>aggregate</id>
<goals>
<goal>aggregate</goal>
</goals>
<phase>site</phase>
</execution>
</executions>
</plugin>
</plugins>
</build>
<reporting>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>3.5.0</version>
<reportSets>
<reportSet>
<id>aggregate</id>
<inherited>false</inherited>
<reports>
<report>aggregate</report>
</reports>
</reportSet>
<reportSet>
<id>default</id>
<reports>
<report>javadoc</report>
</reports>
</reportSet>
</reportSets>
</plugin>
</plugins>
</reporting>
</project>