## Issue
Closes#1049
## Change
Extracted `HtmlTextExtractor` into
`langchain4j-document-transformer-jsoup` module.
Renamed `HtmlToTextDocumentTransformer` into `HtmlTextExtractor`.
Please import:
```xml
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-document-transformer-jsoup</artifactId>
<version>0.35.0</version>
</dependency>
```
## General checklist
- [ ] There are no breaking changes
- [ ] I have added unit and integration tests for my change
- [X] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [X] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
- [X] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
- [ ] I have added/updated [Spring Boot
starter(s)](https://github.com/langchain4j/langchain4j-spring) (if
applicable)
## Change
In addition to `application/json`, Gemini also supports `text/x.enum` as
a structured output format, which is great for classification.
This PR covers both Google Vertex AI Gemini and Google AI Gemini models.
A minor internal change to create the `GeminiService` at construction
time rather than on each request.
+ Observability (ChatLanguageModel) for Google AI Gemini
## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [ ] There are no breaking changes
- [X] I have added unit and integration tests for my change
- [X] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [/] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
- [ ] I have added/updated [Spring Boot
starter(s)](https://github.com/langchain4j/langchain4j-spring) (if
applicable)
## Issue
#1506
## Change
Resolved version conflict:
```
[ERROR] Rule 0: org.apache.maven.enforcer.rules.dependency.DependencyConvergence failed with message:
[ERROR] Failed while enforcing releasability.
[ERROR]
[ERROR] Dependency convergence error for org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.9.10 paths to dependency are:
[ERROR] +-dev.langchain4j:langchain4j-ollama:jar:0.34.0-SNAPSHOT
[ERROR] +-com.squareup.okhttp3:okhttp:jar:4.12.0:compile
[ERROR] +-com.squareup.okio:okio:jar:3.6.0:compile
[ERROR] +-com.squareup.okio:okio-jvm:jar:3.6.0:compile
[ERROR] +-org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.9.10:compile
[ERROR] and
[ERROR] +-dev.langchain4j:langchain4j-ollama:jar:0.34.0-SNAPSHOT
[ERROR] +-com.squareup.okhttp3:okhttp:jar:4.12.0:compile
[ERROR] +-org.jetbrains.kotlin:kotlin-stdlib-jdk8:jar:1.8.21:compile
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <args> -rf :langchain4j-ollama
```
... caused by 'okhttp' dependency and enabled Maven enforcer plugin in
the following modules:
- LangChain4j :: Integration :: Anthropic
- LangChain4j :: Integration :: ChatGLM
- LangChain4j :: Integration :: Chroma
- LangChain4j :: Integration :: CloudFlare Workers AI
- LangChain4j :: Integration :: Cohere
- LangChain4j :: Integration :: DashScope
- LangChain4j :: Integration :: Hugging Face
- LangChain4j :: Integration :: Jina
- LangChain4j :: Integration :: Judge0
- LangChain4j :: Integration :: MistralAI
- LangChain4j :: Integration :: Nomic
- LangChain4j :: Integration :: OVHcloud AI
- LangChain4j :: Integration :: Ollama
- LangChain4j :: Integration :: Qianfan
- LangChain4j :: Integration :: Vearch
- LangChain4j :: Integration :: Vespa
- LangChain4j :: Integration :: Zhipu AI
- LangChain4j :: Web Search Engine :: SearchApi
- LangChain4j :: Web Search Engine :: Tavily
## Note
Please note that [issue ](https://github.com/square/okhttp/issues/8288)
for this was already created in `httpok` repository but it will not be
fixed in 4.x. It's reportedly already tackled in version 5.x.
With that in mind I suggest we apply temporary changes proposed in this
PR. After upgrading to `httpok` 5.x we will be able to remove these.
## Tests
`mvn clean test` passed
## Issue
Closes#1629
## Change
Upgrade to Milvus SDK 2.3.9 and replace com.alibaba.fastjson with
com.google.gson
## General checklist
- [X] There are no breaking changes
- [N/A] I have added unit and integration tests for my change
- [+-] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [+-] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
* I had to skip the MilvusEmbeddingStoreCloudIT test due to a lack of
credentials. However, other Milvus tests passed.
* While the [core] tests were successful, the [main] tests failed
because I lack credentials for several cloud providers.
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
- [ ] I have added/updated [Spring Boot
starter(s)](https://github.com/langchain4j/langchain4j-spring) (if
applicable)
## Checklist for adding new model integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [ ] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
## Checklist for adding new embedding store integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [ ] I have added a `{NameOfIntegration}EmbeddingStoreIT` that extends
from either `EmbeddingStoreIT` or `EmbeddingStoreWithFilteringIT`
- [ ] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
## Checklist for changing existing embedding store integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [ ] I have manually verified that the
`{NameOfIntegration}EmbeddingStore` works correctly with the data
persisted using the latest released version of LangChain4j
The current implementation uses a slow function score query which is
known to be slow because you have to iterate over all the hits to
compute the score.
Elasticsearch in the recent versions, now also allow using knn search.
This PR now allows to change the configuration:
```java
// Use the Knn implementation
ElasticsearchEmbeddingStore store = ElasticsearchEmbeddingStore.builder()
.configuration(ElasticsearchConfigurationKnn.builder().build())
.restClient(restClient)
.build();
// Note that this configuration is optional as it's the default behavior. So this code is equivalent:
ElasticsearchEmbeddingStore store = ElasticsearchEmbeddingStore.builder()
.restClient(restClient)
.build();
// Use the Scripting implementation (slower)
ElasticsearchEmbeddingStore store = ElasticsearchEmbeddingStore.builder()
.configuration(ElasticsearchConfigurationScript.builder().build())
.restClient(restClient)
.build();
```
In more details, this is what this PR is bringing:
* Add a `configuration` new option in the builder. This helps to choose
the implementation to use. It could be
`ElasticsearchConfigurationScript` (the behavior before this change) or
`ElasticsearchConfigurationKnn` for the new Knn implementation
* The Knn implementation (`ElasticsearchConfigurationKnn`) is now the
default implementation
* Removes the `dimension` parameter as Elasticsearch can guess
automatically this value based on the first document indexed. And
anyway, it's better to let the user create an index template or the
index mapping he wants.
* Modify the IT. In which case if a local instance of Elasticsearch is
yet running on https://localhost:9200, we will use it by default and
start Testcontainers only if needed.
* Make Elasticsearch container secured (which is closer to production
use case)
* We now use the `elastic.version` `pom.xml` property to launch the
right Elasticsearch TestContainer.
* Replaced the `removeAll()` implementation from a `DeleteByQuery(match
all query)` to a Delete index API call which is always the recommended
way to remove all documents. Note that this does not recreate an index.
* `ElasticsearchEmbeddingStoreRemoveIT` now extends the common removal
test class `EmbeddingStoreWithRemovalIT`. But we need to implement in
the future a `EmbeddingStoreWithRemovalIT.wait_for_ready()` method to
allow even more common tests.
* Add more logs in implementation and tests
## Issue
Closes#1581
## Change
- OpenAI: added support for [Structured
Outputs](https://openai.com/index/introducing-structured-outputs-in-the-api/):
- for tools
- for json mode
- Introduced new (still experimental) `ChatLanguageModel` API (which
supports specifying json schema)
### OpenAI Structured Outputs for tools
To enable Structured Outputs feature for tools, set `.strictTools(true)`
when buidling the model:
```java
OpenAiChatModel.builder()
...
.strictTools(true)
.build(),
```
Please note that this will automatically make all tool parameters
mandatory (`required` in json schema)
and set `additionalProperties=false` for each `object` in json schema.
This is due to the current OpenAI limitations.
### OpenAI Structured Outputs for json mode
To enable Structured Outputs feature for json mode, set
`.responseFormat("json_schema")` and `.strictJsonSchema(true)` when
buidling the model:
```java
OpenAiChatModel.builder()
...
.responseFormat("json_schema")
.strictJsonSchema(true)
.build(),
```
In this case `AiServices` will not append "You must answer strictly in
the following JSON format: ..." string to the end of the last
`UserMessage`, but will create a Json schema from the given POJO and
pass it to the LLM.
Please note that this works only when method return type is a POJO. If
the return type is something else, (like an enum or a `List<String>`),
the old behaviour is applied (with "You must answer strictly ..."). All
return types will be supported in the near future.
Please note that this feature is available now only for `gpt-4o-mini`
and `gpt-4o-2024-08-06` models.
### Experimental `ChatLanguageModel` API
This was drafted in
https://github.com/langchain4j/langchain4j/pull/1261, but now it has to
be rushed a bit in order to enable new Structured Outputs feature for
OpenAI.
A new method `ChatResponse chat(ChatRequest request)` was added into
`ChatLanguageModel` which allows to specify messages, tools and response
format (with json schema). In the future it will also support specifying
model parameters like temperature.
## Upcoming Changes
- Adopt new `ChatLanguageModel` API for Gemini
- Adopt new `ChatLanguageModel` API for Azure OpenAI (once available)
- Support Structured Outputs with all other method return types like
`List<Pojo>`
- Adopt new `JsonSchema` type for tools (instead of `ToolParameters`)
Reated changes in openai4j:
https://github.com/ai-for-java/openai4j/pull/33
## General checklist
- [X] There are no breaking changes
- [X] I have added unit and integration tests for my change
- [X] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [X] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
- [ ] I have added/updated [Spring Boot
starter(s)](https://github.com/langchain4j/langchain4j-spring) (if
applicable)
## Issue
Closes#1169Fixes#1418
## Change
1. Refactor `PineconeEmbeddingStore`, update `pinecone-client` version
to latest.
2. Support storing metadata
3. Support embedding removal method (not include `removeAll(Filter filter)` because I
don't find a way to convert `Filter` to
`com.google.protobuf.StructStruct` :(, but I will try my best to work on
it, and will create a new PR if I complete it.)
## General checklist
- [ ] There are no breaking changes
- [x] I have added unit and integration tests for my change
- [x] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [x] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
- [ ] I have added/updated [Spring Boot
starter(s)](https://github.com/langchain4j/langchain4j-spring) (if
applicable)
## Checklist for changing existing embedding store integration
- [x] I have manually verified that the
`{NameOfIntegration}EmbeddingStore` works correctly with the data
persisted using the latest released version of LangChain4j
## Issue
<!-- Please paste the link to the issue this PR is addressing. For
example: https://github.com/langchain4j/langchain4j/issues/1012 -->
#1168
## Change
<!-- Please describe the changes you made. -->
Add the `MilvusEmbeddingStore` method to implement `remove(String id)`,
`removeAll(Collection<String> ids)`, `removeAll(Filter filter)`,
`removeAll()`.
Add unit test `MilvusEmbeddingStoreRemoveIT`
## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] There are no breaking changes
- [x] I have added unit and integration tests for my change
- [x] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [ ] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
## Checklist for changing existing embedding store integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [ ] I have manually verified that the
`{NameOfIntegration}EmbeddingStore` works correctly with the data
persisted using the latest released version of LangChain4j
## Issue
https://github.com/langchain4j/langchain4j/issues/232
## Change
An experimental `SqlDatabaseContentRetriever` has been added.
Simplest usage example:
```java
ContentRetriever contentRetriever = SqlDatabaseContentRetriever.builder()
.dataSource(dataSource)
.chatLanguageModel(openAiChatModel)
.build();
```
In this case SQL dialect and table structure will be determined from the
`DataSource`.
But it can be customized:
```java
ContentRetriever contentRetriever = SqlDatabaseContentRetriever.builder()
.dataSource(dataSource)
.sqlDialect("PostgreSQL")
.databaseStructure(...)
.promptTemplate(...)
.chatLanguageModel(openAiChatModel)
.maxRetries(2)
.build();
```
See `SqlDatabaseContentRetrieverIT` for a full example.
## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [X] There are no breaking changes
- [X] I have added unit and integration tests for my change
- [X] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [X] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
## Checklist for adding new model integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [X] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)