37 Commits

Author SHA1 Message Date
LangChain4j 11855157dd updated version to 0.36.0-SNAPSHOT 2024-09-25 15:23:52 +02:00
LangChain4j 79f03dff36
Release 0.35.0 (#1829) 2024-09-25 13:16:03 +02:00
LangChain4j 8c625c3caf capitalize maven module names 2024-09-24 15:33:17 +02:00
LangChain4j 21d35e4434 changed version to 0.35.0-SNAPSHOT 2024-09-09 10:11:09 +02:00
LangChain4j b0a8e6f45b
Release 0.34.0 (#1711) 2024-09-05 16:49:39 +02:00
Felipe Zambrin 9b25b59c7b
[Feature] ApachePdfBoxDocumentParser should return metadata (#1475)
## Issue
Closes #1406 

## Change
Added metadata to Document returned by ApachePdfBoxDocumentParser.

## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [X] There are no breaking changes
- [X] I have added unit and integration tests for my change
- [X] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [ ] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
- [ ] I have added/updated [Spring Boot
starter(s)](https://github.com/langchain4j/langchain4j-spring) (if
applicable)
2024-09-03 14:42:30 +02:00
PrimosK e535f0153d
re #1506 Enabling Maven (version) enforcer plugin in modules with no version conflicts (#1507)
## Issue

#1506

## Change

Enabled the Maven Enforcer Plugin on modules without existing version
conflicts to ensure they remain conflict-free. The plugin now fails the
build if new version conflicts are introduced, guarding against
regressions.

## Tests

`mvn clean test` passed
2024-08-06 15:21:25 +02:00
LangChain4j 1cccfdfa65 changed version to 0.34.0-SNAPSHOT 2024-07-26 15:12:26 +02:00
LangChain4j 822f09cb1c
Release 0.33.0 (#1514) 2024-07-25 10:12:20 +02:00
LangChain4j fe50c88e77 changed version to 0.33.0-SNAPSHOT 2024-07-08 14:47:07 +02:00
LangChain4j c2366a226c
Release 0.32.0 (#1409) 2024-07-04 12:04:29 +02:00
Alex K 62fdc16185
Fix deprecated methods (#1213)
This is a small refactoring.

There are a bunch of places that use deprecated methods.

These changes fix this issue.

## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] There are no breaking changes
- [ ] I have added unit and integration tests for my change
- [x] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [x] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
2024-06-13 15:03:34 +02:00
LangChain4j a1b733d96d bumped version to 0.32.0-SNAPSHOT 2024-05-24 16:25:13 +02:00
LangChain4j d9cb1e9b81
Release 0.31.0 (#1151) 2024-05-23 17:40:52 +02:00
Kais Neffati f34c5432ee
[BUG] Introduce parser supplier support in FileSystemDocumentLoader (#1031)
## Issue
https://github.com/langchain4j/langchain4j/issues/1026


## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [X] There are no breaking changes
- [X] I have added unit and integration tests for my change
- [X] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [X] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
- [X] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
2024-05-06 08:33:12 +02:00
LangChain4j 66c338c135 changed version to 0.31.0-SNAPSHOT 2024-04-29 11:21:00 +02:00
LangChain4j 1a340893ec
Release 0.30.0 (#945) 2024-04-16 18:21:01 +02:00
LangChain4j 03528af8a1
Fix #913: FileSystemDocumentLoader: ignore empty/blank documents, improved error/warn messages (#920)
## Context
When `FileSystemDocumentLoader` encounters an empty file while loading,
the WARN message in the logs looks like an exception and is confusing.
See https://github.com/langchain4j/langchain4j/issues/913

## Change
- Ignore empty/blank documents when loading multiple files with
`DocumentIsBlankException`
- Changed the log message so that it no longer looks like an exception
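
The behaviour described above can be sketched in plain Java. This is only an illustration under stated assumptions: `BlankSkippingLoader` and `loadNonBlank` are hypothetical names, the sketch reads files as UTF-8 text instead of using a `DocumentParser`, and it is not the code from this PR.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for a loader; NOT the library's FileSystemDocumentLoader.
class BlankSkippingLoader {

    /** Returns the text of every non-blank regular file directly inside dir, sorted by path. */
    static List<String> loadNonBlank(Path dir) throws IOException {
        List<Path> files;
        try (Stream<Path> stream = Files.list(dir)) {
            files = stream.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }

        List<String> texts = new ArrayList<>();
        for (Path file : files) {
            String text = Files.readString(file, StandardCharsets.UTF_8);
            if (text.isBlank()) {
                // A single-line warning without a stack trace, so it does not read like a failure.
                System.err.println("WARN: skipping blank document " + file.getFileName());
                continue;
            }
            texts.add(text);
        }
        return texts;
    }
}
```

Calling `BlankSkippingLoader.loadNonBlank(dir)` on a directory that contains one blank file and one file with text returns a list with only the second file's contents.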

## Checklist
Before submitting this PR, please check the following points:
- [X] I have added unit and integration tests for my change
- [X] All unit and integration tests in the module I have added/changed
are green
- [X] All unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules are green
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
- [ ] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
(only when a new module is added)

## Checklist for adding new embedding store integration
- [ ] I have added a {NameOfIntegration}EmbeddingStoreIT that extends
from either EmbeddingStoreIT or EmbeddingStoreWithFilteringIT
2024-04-16 08:58:50 +02:00
LangChain4j d1d9b45adc bumped to 0.30.0-SNAPSHOT 2024-04-08 17:36:52 +02:00
LangChain4j 45b58ac993
released 0.29.1 (#857) 2024-03-28 16:42:45 +01:00
LangChain4j d1e3cc1693
Release 0.29.0 (#830) 2024-03-26 11:54:43 +01:00
LangChain4j 2f425da9f7
POC: Easy RAG (#686)
Implementing RAG applications is hard. Especially for those who are just
getting started exploring LLMs and RAG.

This PR introduces an "Easy RAG" feature that should help developers
get started with RAG as easily as possible.

With it, there is no need to learn about
chunking/splitting/segmentation, embeddings, embedding models, vector
databases, retrieval techniques and other RAG-related concepts.

This is similar to how one can simply upload one or multiple files into
[OpenAI Assistants
API](https://platform.openai.com/docs/assistants/overview) and the LLM
will automagically know about their contents when answering questions.

Easy RAG uses a local embedding model running on your CPU (GPU support
can be added later).
Your files are ingested into an in-memory embedding store.

Please note that "Easy RAG" will not replace manual RAG setups and
especially [advanced RAG
techniques](https://github.com/langchain4j/langchain4j/pull/538), but
will provide an easier way to get started with RAG.
The quality of "Easy RAG" should be sufficient for demos, proofs of
concept and for getting started.


To use "Easy RAG", simply import the `langchain4j-easy-rag` dependency, which
includes everything needed to do RAG:
- Apache Tika document loader (to parse all document types
automatically)
- Quantized [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5) in-process embedding model, which has an impressive (for its size) 51.68 [score](https://huggingface.co/spaces/mteb/leaderboard) for retrieval


Here is the proposed API:

```java
List<Document> documents = FileSystemDocumentLoader.loadDocuments(directoryPath); // one can also load documents recursively and filter with glob/regex

EmbeddingStore<TextSegment> embeddingStore = new InMemoryEmbeddingStore<>(); // we will use an in-memory embedding store for simplicity

EmbeddingStoreIngestor.ingest(documents, embeddingStore);

Assistant assistant = AiServices.builder(Assistant.class)
                .chatLanguageModel(model)
                .contentRetriever(EmbeddingStoreContentRetriever.from(embeddingStore))
                .build();

String answer = assistant.chat("Who is Charlie?"); // Charlie is a carrot...
```

`FileSystemDocumentLoader` in the above code loads documents using a
`DocumentParser` available on the classpath via SPI, in this case an
`ApacheTikaDocumentParser` imported with the `langchain4j-easy-rag`
dependency.
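
For readers unfamiliar with SPI, here is a minimal sketch of the general lookup mechanism using the JDK's `java.util.ServiceLoader`. The `TextParser` interface, the `PlainTextParser` fallback, and the fallback behaviour itself are assumptions made for illustration; they are not the library's actual types or logic.

```java
import java.nio.charset.StandardCharsets;
import java.util.ServiceLoader;

class SpiParserLookup {

    /** Hypothetical parser contract; the real DocumentParser interface differs. */
    interface TextParser {
        String parse(byte[] raw);
    }

    /** Fallback used by this sketch when no provider is registered on the classpath. */
    static final class PlainTextParser implements TextParser {
        @Override
        public String parse(byte[] raw) {
            return new String(raw, StandardCharsets.UTF_8);
        }
    }

    /**
     * Looks up the first TextParser provider registered under META-INF/services
     * and falls back to PlainTextParser if none is found.
     */
    static TextParser loadParser() {
        return ServiceLoader.load(TextParser.class)
                .findFirst()
                .orElseGet(PlainTextParser::new);
    }
}
```

With this pattern, adding a dependency that registers a provider changes which parser is used without any change to the calling code.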

The `EmbeddingStoreIngestor` in the above code:
- splits documents into smaller text segments using a `DocumentSplitter`
loaded via SPI from the `langchain4j-easy-rag` dependency. Currently it
uses `DocumentSplitters.recursive(300, 30, new HuggingFaceTokenizer())`
- embeds text segments using an `AllMiniLmL6V2QuantizedEmbeddingModel`
loaded via SPI from the `langchain4j-easy-rag` dependency
- stores text segments and their embeddings into the specified embedding
store
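
To illustrate what a maximum segment size with overlap means, here is a deliberately simplified sketch. It is not the algorithm behind `DocumentSplitters.recursive(...)`: this sketch counts characters and uses a fixed sliding window, whereas the splitter above is given a tokenizer. `OverlapSplitter` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; counts characters, not tokens.
class OverlapSplitter {

    /**
     * Splits text into windows of at most maxSize characters, where each window
     * repeats the last `overlap` characters of the previous one.
     */
    static List<String> split(String text, int maxSize, int overlap) {
        if (maxSize <= 0 || overlap < 0 || overlap >= maxSize) {
            throw new IllegalArgumentException("require 0 <= overlap < maxSize");
        }
        List<String> segments = new ArrayList<>();
        int step = maxSize - overlap; // how far the window advances each time
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxSize, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the last window reached the end of the text
            }
        }
        return segments;
    }
}
```

For example, `OverlapSplitter.split("abcdefghij", 4, 1)` yields `abcd`, `defg`, `ghij`: each segment starts with the last character of the previous one.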

When using `InMemoryEmbeddingStore`, one can serialize/persist it into a
JSON string or into a file.
This way one can skip loading documents and embedding them on each
application run.

It is easy to customize the ingestion in the above code; just change
```java
EmbeddingStoreIngestor.ingest(documents, embeddingStore);
```
into
```java
EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                //.documentTransformer(...) // you can optionally transform (clean, enrich, etc) documents before splitting
                //.documentSplitter(...) // you can optionally specify another splitter
                //.textSegmentTransformer(...) // you can optionally transform (clean, enrich, etc) segments before embedding
                //.embeddingModel(...) // you can optionally specify another embedding model to use for embedding
                .embeddingStore(embeddingStore)
                .build();

ingestor.ingest(documents);
```

Over time, we can add an auto-eval feature that will find the most
suitable hyperparameters for a given set of documents (e.g. which embedding
model to use, which splitting method, possibly advanced RAG techniques,
etc.) so that "easy RAG" can be comparable to the "advanced RAG".

Related:
https://github.com/langchain4j/langchain4j-embeddings/pull/16

---------

Co-authored-by: dliubars <dliubars@redhat.com>
2024-03-21 17:37:38 +01:00
LangChain4j 91db3d354a bumped to 0.29.0-SNAPSHOT 2024-03-14 13:31:28 +01:00
LangChain4j 90fe3040b9
released 0.28.0 (#735) 2024-03-11 20:08:55 +01:00
LangChain4j 197b4af9d1 bumped version to 0.28.0-SNAPSHOT 2024-02-09 15:11:52 +01:00
LangChain4j c1462c087f
release 0.27.1 (#621) 2024-02-09 15:00:42 +01:00
LangChain4j ad2fd90f32 bumped version to 0.28.0-SNAPSHOT 2024-02-09 08:12:28 +01:00
LangChain4j a22d297104
Release 0.27.0 (#615) 2024-02-09 08:00:34 +01:00
Antonio Goncalves baac759766
Beautifying Maven output (#572)
Looking at the Maven output I thought it could benefit from a little
renaming. I just changed the `<name>` in the `pom.xml`, nothing more.
The output is like this at the moment:

![Screenshot 2024-01-30 at 16 26
53](https://github.com/langchain4j/langchain4j/assets/729277/940886d1-565e-416f-a58e-91f609fc0c00)

It could look like this if this PR is merged:

![Screenshot 2024-01-30 at 16 42
38](https://github.com/langchain4j/langchain4j/assets/729277/f8787af2-b869-4e95-90bd-72bce5622737)

Just a personal taste. Let me know if you like it or not (or want to
change it). If not, just discard it, it's fine ;o)
2024-01-30 16:54:54 +01:00
LangChain4j fca8ca48f7 bump version to 0.27.0-SNAPSHOT 2024-01-30 16:18:40 +01:00
LangChain4j 3958e01738
release 0.26.1 (#570) 2024-01-30 16:11:21 +01:00
LangChain4j 469699b944 bump version to 0.27.0-SNAPSHOT 2024-01-30 08:07:45 +01:00
LangChain4j a8ad9e48d9
Automate release (#562) 2024-01-30 07:20:20 +01:00
LangChain4j 7e5e82b7b2 updated to 0.26.0-SNAPSHOT 2023-12-22 18:08:19 +01:00
LangChain4j 2a5308b794 released 0.25.0 2023-12-22 18:02:04 +01:00
LangChain4j e1dddb33a2
bumped version to 0.25.0-SNAPSHOT (#369) 2023-12-19 13:03:48 +01:00
LangChain4j 3731f3326f
Extract document loaders and parsers into separate modules (#354)
- extract PDF, POI document parsers into separate modules
- extract and simplify S3 document loader into a separate module
2023-12-18 16:32:22 +01:00