## Issue
Closes #1049
## Change
Extracted `HtmlTextExtractor` into the
`langchain4j-document-transformer-jsoup` module.
Renamed `HtmlTextExtractor` to `HtmlToTextDocumentTransformer`.
To use it, add the dependency:
```xml
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-document-transformer-jsoup</artifactId>
<version>0.35.0</version>
</dependency>
```
## General checklist
- [ ] There are no breaking changes
- [ ] I have added unit and integration tests for my change
- [X] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [X] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
- [X] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
- [ ] I have added/updated [Spring Boot
starter(s)](https://github.com/langchain4j/langchain4j-spring) (if
applicable)
## Issue
Closes #1481
## Change
Adds Couchbase vector storage.
## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] There are no breaking changes
- [x] I have added unit and integration tests for my change
- [x] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [x] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [x] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [x] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
- [ ] I have added/updated [Spring Boot
starter(s)](https://github.com/langchain4j/langchain4j-spring) (if
applicable)
## Checklist for adding new model integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
## Checklist for adding new embedding store integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] I have added a `{NameOfIntegration}EmbeddingStoreIT` that extends
from either `EmbeddingStoreIT` or `EmbeddingStoreWithFilteringIT`
- [x] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
## Issue
Closes #1091
## Change
Added an `EmbeddingStore` integration for Oracle Database.
## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [X] There are no breaking changes
- [X] I have added unit and integration tests for my change
- [X] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [X] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
- [ ] I have added/updated [Spring Boot
starter(s)](https://github.com/langchain4j/langchain4j-spring) (if
applicable)
## Checklist for adding new model integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [ ] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
## Checklist for adding new embedding store integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [X] I have added a `{NameOfIntegration}EmbeddingStoreIT` that extends
from either `EmbeddingStoreIT` or `EmbeddingStoreWithFilteringIT`
- [X] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
---------
Signed-off-by: Michael McMahon <michael.a.mcmahon@oracle.com>
Co-authored-by: psilberk <pablo.silberkasten@oracle.com>
Co-authored-by: Pablo Silberkasten <47338417+psilberk@users.noreply.github.com>
Co-authored-by: LangChain4j <langchain4j@gmail.com>
Co-authored-by: Fernanda Meheust <fernanda.meheust@oracle.com>
Co-authored-by: Eddú Meléndez Gonzales <eddu.melendez@gmail.com>
## Issue
Closes #1132
## Change
Added SearchApi as a `WebSearchEngine` that can also be used as a tool.
It currently uses Google Search as the default engine. It also allows for new
engines to be implemented using the SearchApiRequestResponseHandler
interface, and adding it to the SearchApiEngine enum so the user can
choose which one to use.
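The extension mechanism described above (one handler per engine, selected through an enum entry) could look roughly like the following sketch. It is only an illustration: `EngineHandler`, `EngineKind` and the parameter names are hypothetical stand-ins, not the actual signatures of `SearchApiRequestResponseHandler` or `SearchApiEngine`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a per-engine request/response handler.
interface EngineHandler {
    // Builds the query parameters this engine expects for a search term.
    Map<String, String> buildParameters(String query);
}

// Hypothetical stand-in for the enum that lets the user pick an engine.
// Supporting a new engine means adding one constant with its own handler.
enum EngineKind {
    GOOGLE_SEARCH(query -> {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("engine", "google"); // illustrative parameter name
        params.put("q", query);         // illustrative parameter name
        return params;
    });

    private final EngineHandler handler;

    EngineKind(EngineHandler handler) {
        this.handler = handler;
    }

    EngineHandler handler() {
        return handler;
    }
}
```

With this shape, the calling code only needs the enum value chosen by the user and never depends on a concrete handler class.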
## General checklist
- [X] There are no breaking changes
- [X] I have added unit and integration tests for my change
- [x] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [ ] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
- [X] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [X] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
* The example is in the docs, I will open a new PR to the examples repo
if it is ok
## Change
<!-- Please describe the changes you made. -->
This change adds a document loader that uses Selenium
web automation. Unlike the existing `UrlDocumentLoader`, this method
captures the complete version of a webpage, including all post-redirect
content and dynamically loaded elements via JavaScript. This ensures
that the full content is retrieved, providing a more accurate
representation of the page as rendered in a browser.
## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] There are no breaking changes
- [x] I have added unit and integration tests for my change
- [x] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [ ] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
## Checklist for adding new model integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
## Checklist for adding new embedding store integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [ ] I have added a `{NameOfIntegration}EmbeddingStoreIT` that extends
from either `EmbeddingStoreIT` or `EmbeddingStoreWithFilteringIT`
- [ ] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
## Checklist for changing existing embedding store integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [ ] I have manually verified that the
`{NameOfIntegration}EmbeddingStore` works correctly with the data
persisted using the latest released version of LangChain4j
## Issue
This PR adds support for an Azure Cosmos DB for NoSQL embedding store.
## Change
- This PR adds an embedding store for Azure Cosmos DB for NoSQL. Test
cases, including an integration test, are also included.
## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [ ] There are no breaking changes
- [x] I have added unit and integration tests for my change
- [x] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [ ] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
## Checklist for adding new model integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
## Checklist for adding new embedding store integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] I have added a `{NameOfIntegration}EmbeddingStoreIT` that extends
from either `EmbeddingStoreIT` or `EmbeddingStoreWithFilteringIT`
- [x] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
## Checklist for changing existing embedding store integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [ ] I have manually verified that the
`{NameOfIntegration}EmbeddingStore` works correctly with the data
persisted using the latest released version of LangChain4j
## Issue
https://github.com/langchain4j/langchain4j/issues/232
## Change
An experimental `SqlDatabaseContentRetriever` has been added.
Simplest usage example:
```java
ContentRetriever contentRetriever = SqlDatabaseContentRetriever.builder()
        .dataSource(dataSource)
        .chatLanguageModel(openAiChatModel)
        .build();
```
In this case, the SQL dialect and table structure are determined from the
`DataSource`.
This can be customized:
```java
ContentRetriever contentRetriever = SqlDatabaseContentRetriever.builder()
        .dataSource(dataSource)
        .sqlDialect("PostgreSQL")
        .databaseStructure(...)
        .promptTemplate(...)
        .chatLanguageModel(openAiChatModel)
        .maxRetries(2)
        .build();
```
See `SqlDatabaseContentRetrieverIT` for a full example.
## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [X] There are no breaking changes
- [X] I have added unit and integration tests for my change
- [X] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [X] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
## Checklist for adding new model integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [X] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
I've added two missing artifacts to the BOM (azure-ai-search and
azure-cosmos-mongo-vcore). To make them easier to spot, I've sorted
the `pom.xml` files by artifact ID.
## Issue
https://github.com/langchain4j/langchain4j/issues/1048
## Change
I extracted these classes into a new module,
`langchain4j-code-execution-engine-judge0`:
- `Judge0JavaScriptEngine`
- `JavaScriptCodeFixer`
- `Judge0JavaScriptExecutionTool`
- `JavaScriptCodeFixerTest`
and I moved the `com.squareup.okhttp3:okhttp` dependency from the main
module to that new one.
## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] There are no breaking changes
- [x] I have added unit and integration tests for my change
- [x] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [x] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [x] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [x] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
## Checklist for adding new model integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
## Checklist for adding new embedding store integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] I have added a `{NameOfIntegration}EmbeddingStoreIT` that extends
from either `EmbeddingStoreIT` or `EmbeddingStoreWithFilteringIT`
- [x] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
## Checklist for changing existing embedding store integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] I have manually verified that the
`{NameOfIntegration}EmbeddingStore` works correctly with the data
persisted using the latest released version of LangChain4j
Implementing RAG applications is hard, especially for those who are just
getting started exploring LLMs and RAG.
This PR introduces an "Easy RAG" feature that should help developers
get started with RAG as easily as possible.
With it, there is no need to learn about
chunking/splitting/segmentation, embeddings, embedding models, vector
databases, retrieval techniques and other RAG-related concepts.
This is similar to how one can simply upload one or multiple files into
[OpenAI Assistants
API](https://platform.openai.com/docs/assistants/overview) and the LLM
will automagically know about their contents when answering questions.
Easy RAG uses a local embedding model running on your CPU (GPU support
can be added later).
Your files are ingested into an in-memory embedding store.
Please note that "Easy RAG" will not replace manual RAG setups and
especially [advanced RAG
techniques](https://github.com/langchain4j/langchain4j/pull/538), but
will provide an easier way to get started with RAG.
The quality of "Easy RAG" should be sufficient for demos, proofs of
concept and for getting started.
To use "Easy RAG", simply import the `langchain4j-easy-rag` dependency, which
includes everything needed to do RAG:
- Apache Tika document loader (to parse all document types
automatically)
- Quantized [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) in-process embedding model, which has an impressive (for its size) 51.68 [score](https://huggingface.co/spaces/mteb/leaderboard) for retrieval
Here is the proposed API:
```java
// One can also load documents recursively and filter them with glob/regex
List<Document> documents = FileSystemDocumentLoader.loadDocuments(directoryPath);

// We will use an in-memory embedding store for simplicity
EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>();
EmbeddingStoreIngestor.ingest(documents, embeddingStore);

Assistant assistant = AiServices.builder(Assistant.class)
        .chatLanguageModel(model)
        .contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
        .build();

String answer = assistant.chat("Who is Charlie?"); // Charlie is a carrot...
```
`FileSystemDocumentLoader` in the above code loads documents using
`DocumentParser` available in classpath via SPI, in this case an
`ApacheTikaDocumentParser` imported with the `langchain4j-easy-rag`
dependency.
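For readers unfamiliar with SPI: the lookup described above is presumably based on `java.util.ServiceLoader`, which discovers implementations registered under `META-INF/services` on the classpath. A minimal sketch of that pattern follows; `SketchParser` and `SketchParserLoader` are hypothetical names and do not mirror the real `DocumentParser` API.

```java
import java.util.Iterator;
import java.util.ServiceLoader;

// Hypothetical parser abstraction; not the real DocumentParser interface.
interface SketchParser {
    String parse(byte[] rawBytes);
}

final class SketchParserLoader {

    private SketchParserLoader() {
    }

    // Returns the first implementation registered via META-INF/services,
    // or the supplied fallback when the classpath provides none.
    static SketchParser loadOrDefault(SketchParser fallback) {
        Iterator<SketchParser> providers = ServiceLoader.load(SketchParser.class).iterator();
        return providers.hasNext() ? providers.next() : fallback;
    }
}
```

Adding a dependency that ships a provider file is then enough to change which parser is used, without touching the calling code.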
The `EmbeddingStoreIngestor` in the above code:
- splits documents into smaller text segments using a `DocumentSplitter`
loaded via SPI from the `langchain4j-easy-rag` dependency. Currently it
uses `DocumentSplitters.recursive(300, 30, new HuggingFaceTokenizer())`
- embeds text segments using an `AllMiniLmL6V2QuantizedEmbeddingModel`
loaded via SPI from the `langchain4j-easy-rag` dependency
- stores text segments and their embeddings into the specified embedding
store
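The three ingestion steps listed above (split, embed, store) can be sketched in plain Java. This is a deliberately simplified illustration, not the `EmbeddingStoreIngestor` implementation: the splitter below is character-based with a fixed overlap (the real one is recursive and token-based), the embedding function is supplied by the caller, and `IngestSketch` and `Entry` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class IngestSketch {

    private IngestSketch() {
    }

    // A stored entry: the segment text plus its embedding vector.
    static final class Entry {
        final String text;
        final float[] vector;

        Entry(String text, float[] vector) {
            this.text = text;
            this.vector = vector;
        }
    }

    // Step 1: cut the text into windows of at most maxSize characters,
    // where consecutive windows share `overlap` characters.
    static List<String> split(String text, int maxSize, int overlap) {
        if (maxSize <= 0 || overlap < 0 || overlap >= maxSize) {
            throw new IllegalArgumentException("require 0 <= overlap < maxSize");
        }
        List<String> segments = new ArrayList<>();
        int step = maxSize - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxSize, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return segments;
    }

    // Steps 2 and 3: embed every segment and append it to the store.
    static void ingest(List<String> documents, int maxSize, int overlap,
                       Function<String, float[]> embedder, List<Entry> store) {
        for (String document : documents) {
            for (String segment : split(document, maxSize, overlap)) {
                store.add(new Entry(segment, embedder.apply(segment)));
            }
        }
    }
}
```

The overlap keeps some shared context between neighbouring segments, so a sentence cut at a boundary is still partly present in both.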
When using `InMemoryEmbeddingStore`, one can serialize/persist it into a
JSON string or into a file.
This way one can skip loading documents and embedding them on each
application run.
It is easy to customize the ingestion in the above code, just change
```java
EmbeddingStoreIngestor.ingest(documents, embeddingStore);
```
into
```java
EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
        //.documentTransformer(...) // you can optionally transform (clean, enrich, etc.) documents before splitting
        //.documentSplitter(...) // you can optionally specify another splitter
        //.textSegmentTransformer(...) // you can optionally transform (clean, enrich, etc.) segments before embedding
        //.embeddingModel(...) // you can optionally specify another embedding model to use for embedding
        .embeddingStore(embeddingStore)
        .build();

ingestor.ingest(documents);
```
Over time, we can add an auto-eval feature that finds the most
suitable hyperparameters for the given documents (e.g. which embedding
model to use, which splitting method, possibly advanced RAG techniques,
etc.) so that "easy RAG" can be comparable to "advanced RAG".
Related:
https://github.com/langchain4j/langchain4j-embeddings/pull/16
---------
Co-authored-by: dliubars <dliubars@redhat.com>
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced classes and interfaces to facilitate chat interactions with
the Anthropic API, enabling chat completion functionalities.
- Developed a client class for seamless interaction with the AnthropicAI
API, including authentication and request handling.
- Implemented utility methods for message conversion and managing token
usage in chat interactions.
- Defined an enum to distinguish between user and assistant roles in
chat scenarios.
- Added logging interceptors for HTTP requests and responses to enhance
debugging capabilities.
- Created a model class for generating AI responses from chat messages
using the Anthropic API.
- Added a request model class for creating messages in the Anthropic
system.
- Introduced a class for representing image content with type and source
details.
- Included integration tests for the `AnthropicChatModel` class covering
various functionalities.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
ZhipuAI is a large model focusing on Chinese cognition
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced serialization and deserialization for assistant messages.
- Enhanced utility methods for data conversion in Zhipu AI processing.
- Implemented JSON serialization/deserialization support.
- Defined an interface for Zhipu AI service interactions.
- Introduced classes for handling chat completions and embedding
requests with Zhipu AI.
- Provided structure for chat messages, choices, models, requests, and
responses.
- Added classes for function calls, parameters, tool interactions, and
web searches within chats.
- Established data structures for embedding information and requests.
- Implemented builders for chat and embedding model instances.
- **Tests**
- Added integration tests for chat model, embedding model, and streaming
chat model functionalities.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
## New EmbeddingStore (metadata) `Filter` API
Many embedding stores, such as
[Pinecone](https://docs.pinecone.io/docs/metadata-filtering) and
[Milvus](https://milvus.io/docs/boolean.md) support strict filtering
(think of an SQL "WHERE" clause) during similarity search.
So, if one has an embedding store with movies, for example, one could
search not only for the most semantically similar movies to the given
user query but also apply strict filtering by metadata fields like year,
genre, rating, etc. In this case, the similarity search will be
performed only on those movies that match the filter expression.
Since LangChain4j supports (and abstracts away) many embedding stores,
there needs to be an embedding-store-agnostic way for users to define
the filter expression.
This PR introduces a `Filter` interface, which can represent both simple
(e.g., `type = "documentation"`) and composite (e.g., `type in
("documentation", "tutorial") AND year > 2020`) filter expressions in an
embedding-store-agnostic manner.
`Filter` currently supports the following operations:
- Comparison:
- `IsEqualTo`
- `IsNotEqualTo`
- `IsGreaterThan`
- `IsGreaterThanOrEqualTo`
- `IsLessThan`
- `IsLessThanOrEqualTo`
- `IsIn`
- `IsNotIn`
- Logical:
- `And`
- `Not`
- `Or`
These operations are supported by most embedding stores and serve as a
good starting point. However, the list of operations will expand over
time to include other operations (e.g., `Contains`) supported by
embedding stores.
Currently, the DSL looks like this:
```java
Filter onlyDocs = metadataKey("type").isEqualTo("documentation");
Filter docsAndTutorialsAfter2020 = metadataKey("type").isIn("documentation", "tutorial").and(metadataKey("year").isGreaterThan(2020));
// or
Filter docsAndTutorialsAfter2020 = and(
        metadataKey("type").isIn("documentation", "tutorial"),
        metadataKey("year").isGreaterThan(2020)
);
```
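To make the idea of a store-agnostic filter more concrete, here is a minimal sketch that models a filter as a composable predicate over a metadata map. `MiniFilter` is a hypothetical miniature written for illustration and is not the `Filter` interface from this PR; in particular, treating a missing key as "no match" is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Map;

// Hypothetical miniature of a store-agnostic filter: a predicate over metadata.
interface MiniFilter {

    boolean test(Map<String, Object> metadata);

    // Logical operations combine existing filters into composite ones.
    default MiniFilter and(MiniFilter other) {
        return metadata -> test(metadata) && other.test(metadata);
    }

    default MiniFilter or(MiniFilter other) {
        return metadata -> test(metadata) || other.test(metadata);
    }

    default MiniFilter not() {
        return metadata -> !test(metadata);
    }

    // Comparison operations inspect a single metadata key.
    static MiniFilter isEqualTo(String key, Object value) {
        return metadata -> value.equals(metadata.get(key));
    }

    static MiniFilter isIn(String key, Object... values) {
        Collection<Object> allowed = Arrays.asList(values);
        return metadata -> allowed.contains(metadata.get(key));
    }

    static MiniFilter isGreaterThan(String key, Number value) {
        return metadata -> metadata.get(key) instanceof Number
                && ((Number) metadata.get(key)).doubleValue() > value.doubleValue();
    }
}
```

An in-memory store could evaluate such a tree directly, while a store with native filtering would instead translate the same tree into its own query syntax.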
## Filter expression as a `String`
Filter expression can also be specified as a `String`. This might be
necessary, for example, if the filter expression is generated
dynamically by the application or by the LLM (as in [self
querying](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/)).
This PR introduces a `FilterParser` interface with a simple
`Filter parse(String)` API, allowing for future support of multiple syntaxes (if
this is required).
For the out-of-the-box filter syntax, ANSI SQL's `WHERE` clause is
proposed as a suitable candidate for several reasons:
- SQL is well-known among Java developers
- There is extensive tooling available for SQL (e.g., parsers)
- LLMs are pretty good at generating valid SQL, as there are tons of SQL
queries on the internet, which are included in the LLM training
datasets. There are also specialized LLMs that are trained for the
text-to-SQL task, such as [SQLCoder](https://huggingface.co/defog).
The downside is that SQL's `WHERE` clause might not support all
operations and data types that could be supported in the future by
various embedding stores. In such a case, we could extend it to a superset
of ANSI SQL `WHERE` syntax and/or provide an option to express filters
in the native syntax of the store.
An out-of-the-box implementation of the SQL `FilterParser` is provided
as a `SqlFilterParser` in a separate module
`langchain4j-embedding-store-filter-parser-sql`, using
[JSqlParser](https://github.com/JSQLParser/JSqlParser) under the hood.
`SqlFilterParser` can parse an SQL `SELECT` statement (or just a `WHERE`
clause) into a `Filter` object:
- `SELECT * FROM fake_table WHERE userId = '123-456'` ->
`metadataKey("userId").isEqualTo("123-456")`
- `userId = '123-456'` -> `metadataKey("userId").isEqualTo("123-456")`
It can also resolve `CURDATE()` and
`CURRENT_DATE`/`CURRENT_TIME`/`CURRENT_TIMESTAMP`:
`SELECT * FROM fake_table WHERE year = EXTRACT(YEAR FROM CURRENT_DATE)`
-> `metadataKey("year").isEqualTo(LocalDate.now().getYear())`
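The point that a full `SELECT` statement and a bare `WHERE` condition yield the same filter can be illustrated with a toy parser. The sketch below is not how `SqlFilterParser` works (that relies on JSqlParser): it uses a regular expression, supports only a single equality on a string literal, and `TinyWhereParser` is a hypothetical name.

```java
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TinyWhereParser {

    // Optional "SELECT ... WHERE" prefix, then a single <key> = '<string literal>'.
    private static final Pattern EQUALITY = Pattern.compile(
            "(?is)\\s*(?:select\\b.*?\\bwhere\\b)?\\s*(\\w+)\\s*=\\s*'([^']*)'\\s*");

    private TinyWhereParser() {
    }

    // Turns the expression into a predicate over a metadata map.
    // Anything other than one string equality is rejected.
    static Predicate<Map<String, Object>> parse(String sql) {
        Matcher matcher = EQUALITY.matcher(sql);
        if (!matcher.matches()) {
            throw new IllegalArgumentException("Unsupported filter expression: " + sql);
        }
        String key = matcher.group(1);
        String expected = matcher.group(2);
        return metadata -> expected.equals(metadata.get(key));
    }
}
```

A real parser additionally has to handle operator precedence, nested conditions, numeric and date literals, and functions such as `CURRENT_DATE`, which is why a dedicated SQL parsing library is the safer choice.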
## Changes in `Metadata` API
Until now, `Metadata` supported only `String` values. This PR expands
the list of supported value types to `Integer`, `Long`, `Float` and
`Double`. In the future, more types may be added (if needed).
The method `String get(String key)` will be deprecated later in favor
of:
- `String getString(String key)`
- `Integer getInteger(String key)`
- `Long getLong(String key)`
- etc
New overloaded `put(key, value)` methods are introduced to support more
value types:
- `put(String key, int value)`
- `put(String key, long value)`
- etc
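The typed accessors and `put` overloads listed above could be modelled along these lines. This is a hedged sketch rather than the actual `Metadata` class: `TypedMetadataSketch` is a hypothetical name, and rejecting a value of the wrong type (instead of converting it) is an assumption made here for simplicity.

```java
import java.util.HashMap;
import java.util.Map;

final class TypedMetadataSketch {

    private final Map<String, Object> values = new HashMap<>();

    // Overloaded put methods keep the original value type instead of a String.
    TypedMetadataSketch put(String key, String value) {
        values.put(key, value);
        return this;
    }

    TypedMetadataSketch put(String key, int value) {
        values.put(key, value);
        return this;
    }

    TypedMetadataSketch put(String key, long value) {
        values.put(key, value);
        return this;
    }

    String getString(String key) {
        return typed(key, String.class);
    }

    Integer getInteger(String key) {
        return typed(key, Integer.class);
    }

    Long getLong(String key) {
        return typed(key, Long.class);
    }

    // Returns null for a missing key and fails fast on a type mismatch.
    private <T> T typed(String key, Class<T> type) {
        Object value = values.get(key);
        if (value == null) {
            return null;
        }
        if (!type.isInstance(value)) {
            throw new IllegalStateException("Metadata key '" + key + "' holds a "
                    + value.getClass().getSimpleName() + ", not a " + type.getSimpleName());
        }
        return type.cast(value);
    }
}
```

Keeping numeric values as numbers is what makes comparisons such as `isGreaterThan(2020)` meaningful on the filter side.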
## Changes in `EmbeddingStore` API
A new `search` method is added that will become the main entry point for
search in the future. All `findRelevant` methods will be deprecated
later.
The new `search` method accepts an `EmbeddingSearchRequest` and returns
an `EmbeddingSearchResult`.
`EmbeddingSearchRequest` contains all search criteria (e.g.
`maxResults`, `minScore`), including new `Filter`.
`EmbeddingSearchResult` contains a list of `EmbeddingMatch`.
```java
EmbeddingSearchResult search(EmbeddingSearchRequest request);
```
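How the search criteria interact can be sketched with a small brute-force search: the filter is applied first, then candidates are scored, thresholded by `minScore`, sorted, and cut off at `maxResults`. This is an illustrative sketch only. `SearchSketch`, `Item` and `Match` are hypothetical names, and the score here is raw cosine similarity, which may differ from the relevance score a real store reports.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class SearchSketch {

    private SearchSketch() {
    }

    static final class Item {
        final String text;
        final double[] vector;
        final Map<String, Object> metadata;

        Item(String text, double[] vector, Map<String, Object> metadata) {
            this.text = text;
            this.vector = vector;
            this.metadata = metadata;
        }
    }

    static final class Match {
        final Item item;
        final double score;

        Match(Item item, double score) {
            this.item = item;
            this.score = score;
        }
    }

    // Cosine similarity; defined as 0 when either vector has zero length.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // A null filter means "no filtering".
    static List<Match> search(List<Item> items, double[] query, int maxResults,
                              double minScore, Predicate<Map<String, Object>> filter) {
        List<Match> matches = new ArrayList<>();
        for (Item item : items) {
            if (filter != null && !filter.test(item.metadata)) {
                continue; // excluded before any similarity is computed
            }
            double score = cosine(query, item.vector);
            if (score >= minScore) {
                matches.add(new Match(item, score));
            }
        }
        matches.sort(Comparator.comparingDouble((Match m) -> m.score).reversed());
        return new ArrayList<>(matches.subList(0, Math.min(maxResults, matches.size())));
    }
}
```

Bundling all criteria into a single request object, as the new API does, lets further criteria be added later without changing the method signature.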
## Changes in `EmbeddingStoreContentRetriever` API
`EmbeddingStoreContentRetriever` can now be configured with a static
`filter` as well as dynamic `dynamicMaxResults`, `dynamicMinScore` and
`dynamicFilter` in the builder:
```java
ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        ...
        .maxResults(3)
        // or define maxResults dynamically; the value could, for example,
        // depend on the query or the user associated with the query
        .dynamicMaxResults(query -> 3)
        ...
        .minScore(0.3)
        // or
        .dynamicMinScore(query -> 0.3)
        ...
        // assuming your TextSegments contain Metadata with key "userId"
        .filter(metadataKey("userId").isEqualTo("123-456"))
        // or
        .dynamicFilter(query -> metadataKey("userId").isEqualTo(query.metadata().chatMemoryId().toString()))
        ...
        .build();
```
So now you can define `maxResults`, `minScore` and `filter` both
statically and dynamically (they can depend on the query, user, etc.).
These values will be propagated to the underlying `EmbeddingStore`.
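One way to think about static versus dynamic settings is that a static value is simply a function that ignores the query. The sketch below illustrates that idea. `DynamicSettingSketch` and `UserQuery` are hypothetical names, and letting the dynamic provider take precedence when both are configured is an assumption of this sketch, not documented behaviour.

```java
import java.util.function.Function;

final class DynamicSettingSketch {

    private DynamicSettingSketch() {
    }

    // Hypothetical query: the text plus the id of the user who asked.
    static final class UserQuery {
        final String text;
        final String userId;

        UserQuery(String text, String userId) {
            this.text = text;
            this.userId = userId;
        }
    }

    // A static setting is just a dynamic one that ignores the query.
    static <T> Function<UserQuery, T> constant(T value) {
        return query -> value;
    }

    // Uses the dynamic provider when one is configured, else the static value.
    static <T> T resolve(T staticValue, Function<UserQuery, T> dynamicProvider, UserQuery query) {
        Function<UserQuery, T> provider =
                dynamicProvider != null ? dynamicProvider : constant(staticValue);
        return provider.apply(query);
    }
}
```

Resolving the value per query is what makes it possible, for example, to restrict results to the segments that belong to the user who is asking.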
## ["Self-querying"](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/)
This PR also introduces `LanguageModelSqlFilterBuilder` in
`langchain4j-embedding-store-filter-parser-sql` module which can be used
with `EmbeddingStoreContentRetriever`'s `dynamicFilter` to automatically
build a `Filter` object from the `Query` using language model and
`SqlFilterParser`.
For example:
```java
TextSegment groundhogDay = TextSegment.from("Groundhog Day", new Metadata().put("genre", "comedy").put("year", 1993));
TextSegment forrestGump = TextSegment.from("Forrest Gump", new Metadata().put("genre", "drama").put("year", 1994));
TextSegment dieHard = TextSegment.from("Die Hard", new Metadata().put("genre", "action").put("year", 1998));
// describe metadata keys as if they were columns in the SQL table
TableDefinition tableDefinition = TableDefinition.builder()
        .name("movies")
        .addColumn("genre", "VARCHAR", "one of [comedy, drama, action]")
        .addColumn("year", "INT")
        .build();
LanguageModelSqlFilterBuilder sqlFilterBuilder = new LanguageModelSqlFilterBuilder(model, tableDefinition);
ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .dynamicFilter(sqlFilterBuilder::build)
        .build();
String answer = assistant.answer("Recommend me a good drama from 90s"); // Forrest Gump
```
## Which embedding store integrations will support `Filter`?
In the long run, all (provided the embedding store itself supports it).
In the first iteration, I aim to add support to just a few:
- `InMemoryEmbeddingStore`
- Elasticsearch
- Milvus
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
- **New Features**
- Introduced filters for checking key's value existence in a collection
for improved data handling.
- **Enhancements**
- Updated `InMemoryEmbeddingStoreTest` to extend a different class for
improved testing coverage and added a new test method.
- **Refactor**
- Made minor formatting adjustments in the assertion block for better
readability.
- **Documentation**
- Updated class hierarchy information for clarity.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->