langchain4j

Commit Graph

Author	SHA1	Message	Date
LangChain4j	21d35e4434	changed version to 0.35.0-SNAPSHOT	2024-09-09 10:11:09 +02:00
LangChain4j	b0a8e6f45b	Release 0.34.0 (#1711)	2024-09-05 16:49:39 +02:00
LangChain4j	20a30eb253	fixed failing tests	2024-08-23 15:38:06 +02:00
LangChain4j	3e6d50ee40	EmbeddingStoreIT: use awaitility (#1610) ## Change Use awaitility in `EmbeddingStoreIT` ## General checklist - [X] There are no breaking changes - [X] I have added unit and integration tests for my change - [x] I have manually run all the unit and integration tests in the module I have added/changed, and they are all green - [x] I have manually run all the unit and integration tests in the [core]	2024-08-22 16:17:53 +02:00
PrimosK	e535f0153d	re #1506 Enabling Maven (version) enforcer plugin in modules with no version conflicts (#1507) ## Issue #1506 ## Change Enabled the Maven Enforcer Plugin on modules without existing version conflicts to ensure they remain conflict-free. The Maven Enforcer Plugin will now fail the build if new conflicts are introduced, guarding against them. ## Tests `mvn clean test` passed	2024-08-06 15:21:25 +02:00
Michael Simons	586307a036	fix: Use `NODE` type to check for the value's type. (#1539) Fixes #1537.	2024-08-05 17:43:07 +02:00
LangChain4j	1cccfdfa65	changed version to 0.34.0-SNAPSHOT	2024-07-26 15:12:26 +02:00
LangChain4j	209f2825ea	Neo4jEmbeddingStoreTest -> Neo4jEmbeddingStoreIT	2024-07-25 15:42:03 +02:00
LangChain4j	822f09cb1c	Release 0.33.0 (#1514)	2024-07-25 10:12:20 +02:00
LangChain4j	8537e897ba	Fix split packages (#1433) ## Issue Closes #1066 ## Change These are the changes for each split package (each change was done in a separate commit, so they can be reviewed in isolation): - `dev.langchain4j.retriever` -> Moved `EmbeddingStoreRetriever` into the `langchain4j-core` module - `dev.langchain4j.agent.tool` -> Moved `DefaultToolExecutor` and `ToolExecutor` into the `dev.langchain4j.service.tool` package - `dev.langchain4j.classification` -> Moved `TextClassifier` into the `langchain4j` module - `dev.langchain4j.chain` -> Moved `Chain` into the `langchain4j` module - `dev.langchain4j.model.embedding` -> [All in-process embedding models should have unique package name](https://github.com/langchain4j/langchain4j-embeddings/pull/33) - `dev.langchain4j.model.output` -> Moved `OutputParser` and all its implementations into the `dev.langchain4j.service.output` package of the `langchain4j` module. More details can be found [here](https://docs.google.com/spreadsheets/d/1U7f2MIfDgWA1tydPpzWpOGTHiBjBVZjsu0uZnXBT9qE/edit?usp=sharing).
## Breaking Changes - All in-process ONNX model classes moved into their own unique packages: - `AllMiniLmL6V2EmbeddingModel` moved into `dev.langchain4j.model.embedding.onnx.allminilml6v2` - `AllMiniLmL6V2QuantizedEmbeddingModel` moved into `dev.langchain4j.model.embedding.onnx.allminilml6v2q` - `OnnxEmbeddingModel` moved into the `dev.langchain4j.model.embedding.onnx` package - etc. - `ToolExecutor` and `DefaultToolExecutor` moved into the `dev.langchain4j.service.tool` package - Moved `OutputParser` and all its implementations into the `dev.langchain4j.service.output` package of the `langchain4j` module - Moved `Chain` into the `langchain4j` module - Moved `TextClassifier` into the `langchain4j` module ## General checklist - [ ] There are no breaking changes - [ ] I have added unit and integration tests for my change - [X] I have manually run all the unit and integration tests in the module I have added/changed, and they are all green - [X] I have manually run all the unit and integration tests in the [core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core) and [main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j) modules, and they are all green - [ ] I have added/updated the [documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs) - [ ] I have added an example in the [examples repo](https://github.com/langchain4j/langchain4j-examples) (only for "big" features) - [ ] I have added/updated [Spring Boot starter(s)](https://github.com/langchain4j/langchain4j-spring) (if applicable)	2024-07-19 12:59:59 +02:00
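For code written against the old packages, the breaking moves listed above amount to a lookup from old to new fully qualified names. The helper below is a hypothetical sketch, not part of LangChain4j: the new names come from the "Breaking Changes" list, while the old names are inferred from the split-package list (`dev.langchain4j.model.embedding`, `dev.langchain4j.agent.tool`). It only covers the five classes named explicitly and only rewrites plain single-class `import` lines.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical migration helper (not a LangChain4j API): rewrites imports of classes moved by this PR.
public class ImportMigration {

    // Old fully qualified name -> new fully qualified name (old names inferred from the split-package list).
    static final Map<String, String> MOVED = new LinkedHashMap<>();

    static {
        MOVED.put("dev.langchain4j.model.embedding.AllMiniLmL6V2EmbeddingModel",
                "dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel");
        MOVED.put("dev.langchain4j.model.embedding.AllMiniLmL6V2QuantizedEmbeddingModel",
                "dev.langchain4j.model.embedding.onnx.allminilml6v2q.AllMiniLmL6V2QuantizedEmbeddingModel");
        MOVED.put("dev.langchain4j.model.embedding.OnnxEmbeddingModel",
                "dev.langchain4j.model.embedding.onnx.OnnxEmbeddingModel");
        MOVED.put("dev.langchain4j.agent.tool.ToolExecutor",
                "dev.langchain4j.service.tool.ToolExecutor");
        MOVED.put("dev.langchain4j.agent.tool.DefaultToolExecutor",
                "dev.langchain4j.service.tool.DefaultToolExecutor");
    }

    // Rewrites "import x.y.Z;" if Z was moved; any other line is returned unchanged.
    static String migrate(String importLine) {
        String trimmed = importLine.trim();
        if (!trimmed.startsWith("import ") || !trimmed.endsWith(";")) {
            return importLine;
        }
        String fqcn = trimmed.substring("import ".length(), trimmed.length() - 1).trim();
        String target = MOVED.get(fqcn);
        return target == null ? importLine : "import " + target + ";";
    }

    public static void main(String[] args) {
        // prints: import dev.langchain4j.service.tool.ToolExecutor;
        System.out.println(migrate("import dev.langchain4j.agent.tool.ToolExecutor;"));
    }
}
```

Wildcard and static imports are deliberately left alone in this sketch; an IDE's "optimize imports" after a build failure is usually the simpler route for those.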
Eddú Meléndez Gonzales	d1beab5fba	Update testcontainers version to 1.20.0 (#1488) It contains an enhancement for the Weaviate module.	2024-07-19 10:02:54 +02:00
LangChain4j	fe50c88e77	changed version to 0.33.0-SNAPSHOT	2024-07-08 14:47:07 +02:00
LangChain4j	c2366a226c	Release 0.32.0 (#1409)	2024-07-04 12:04:29 +02:00
LangChain4j	a1b733d96d	bumped version to 0.32.0-SNAPSHOT	2024-05-24 16:25:13 +02:00
LangChain4j	d9cb1e9b81	Release 0.31.0 (#1151)	2024-05-23 17:40:52 +02:00
LangChain4j	66c338c135	changed version to 0.31.0-SNAPSHOT	2024-04-29 11:21:00 +02:00
LangChain4j	1a340893ec	Release 0.30.0 (#945)	2024-04-16 18:21:01 +02:00
LangChain4j	d1d9b45adc	bumped to 0.30.0-SNAPSHOT	2024-04-08 17:36:52 +02:00
LangChain4j	45b58ac993	released 0.29.1 (#857)	2024-03-28 16:42:45 +01:00
LangChain4j	d1e3cc1693	Release 0.29.0 (#830)	2024-03-26 11:54:43 +01:00
Siben Nayak	86a5908f8a	Add support for Neo4J Graph and ContentRetriever (#741) For Issue #685 ## Summary by CodeRabbit - New Features - Introduced a `Neo4jGraph` class for enhanced interaction with Neo4j databases, including read/write operations and schema management. - Added a `Neo4jContentRetriever` for generating and executing Cypher queries from user questions, improving content retrieval from Neo4j databases. - Tests - Implemented tests for Neo4j database interactions and content retrieval functionalities, ensuring reliability and performance.	2024-03-25 08:43:35 +01:00
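The `Neo4jContentRetriever` summary in the commit above describes a text-to-Cypher flow: a user question and the graph schema go to a language model, the generated Cypher query is executed, and the resulting rows become the retrieved content. The sketch below illustrates only that flow under stated assumptions: the class name, the prompt wording, and the two `Function` hooks standing in for the language model and for query execution are hypothetical, not the LangChain4j or Neo4j driver API.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of question -> Cypher -> rows; not the actual Neo4jContentRetriever.
public class TextToCypherSketch {

    private final String schema;                                // graph schema text, e.g. labels and relationship types
    private final Function<String, String> languageModel;       // assumed hook: prompt -> generated Cypher
    private final Function<String, List<String>> cypherRunner;  // assumed hook: Cypher -> result rows

    TextToCypherSketch(String schema,
                       Function<String, String> languageModel,
                       Function<String, List<String>> cypherRunner) {
        this.schema = schema;
        this.languageModel = languageModel;
        this.cypherRunner = cypherRunner;
    }

    // Include the schema so the model can reference the graph's actual labels and relationships.
    String prompt(String question) {
        return "Based on the Neo4j graph schema below, write a Cypher query that answers the question.\n"
                + "Schema:\n" + schema + "\n"
                + "Question: " + question + "\n"
                + "Return only the Cypher query.";
    }

    // Generate the query, trim surrounding whitespace, then execute it.
    List<String> retrieve(String question) {
        String cypher = languageModel.apply(prompt(question)).trim();
        return cypherRunner.apply(cypher);
    }

    public static void main(String[] args) {
        TextToCypherSketch retriever = new TextToCypherSketch(
                "(:Person {name})-[:ACTED_IN]->(:Movie {title})",
                prompt -> "MATCH (p:Person)-[:ACTED_IN]->(:Movie {title: 'Forrest Gump'}) RETURN p.name",
                cypher -> List.of("Tom Hanks")); // stub; a real runner would send the query through a Neo4j driver session
        System.out.println(retriever.retrieve("Who acted in Forrest Gump?"));
    }
}
```

Keeping the model and the query runner behind plain functions makes the flow testable with stubs, as in `main`; a production retriever would also need to guard against malformed or write queries returned by the model.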
LangChain4j	91db3d354a	bumped to 0.29.0-SNAPSHOT	2024-03-14 13:31:28 +01:00
LangChain4j	90fe3040b9	released 0.28.0 (#735)	2024-03-11 20:08:55 +01:00
LangChain4j	1acb7a607f	EmbeddingStore (Metadata) Filter API (#610) ## New EmbeddingStore (metadata) `Filter` API Many embedding stores, such as [Pinecone](https://docs.pinecone.io/docs/metadata-filtering) and [Milvus](https://milvus.io/docs/boolean.md), support strict filtering (think of an SQL "WHERE" clause) during similarity search. So, if one has an embedding store with movies, for example, one could search not only for the movies most semantically similar to the given user query but also apply strict filtering by metadata fields like year, genre, rating, etc. In this case, the similarity search will be performed only on those movies that match the filter expression. Since LangChain4j supports (and abstracts away) many embedding stores, users need an embedding-store-agnostic way to define the filter expression. This PR introduces a `Filter` interface, which can represent both simple (e.g., `type = "documentation"`) and composite (e.g., `type in ("documentation", "tutorial") AND year > 2020`) filter expressions in an embedding-store-agnostic manner. `Filter` currently supports the following operations: - Comparison: - `IsEqualTo` - `IsNotEqualTo` - `IsGreaterThan` - `IsGreaterThanOrEqualTo` - `IsLessThan` - `IsLessThanOrEqualTo` - `IsIn` - `IsNotIn` - Logical: - `And` - `Not` - `Or` These operations are supported by most embedding stores and serve as a good starting point. However, the list of operations will expand over time to include other operations (e.g., `Contains`) supported by embedding stores.
Currently, the DSL looks like this:
```java
Filter onlyDocs = metadataKey("type").isEqualTo("documentation");

Filter docsAndTutorialsAfter2020 = metadataKey("type").isIn("documentation", "tutorial").and(metadataKey("year").isGreaterThan(2020));
// or
Filter docsAndTutorialsAfter2020 = and(
        metadataKey("type").isIn("documentation", "tutorial"),
        metadataKey("year").isGreaterThan(2020)
);
```
## Filter expression as a `String` A filter expression can also be specified as a `String`. This might be necessary, for example, if the filter expression is generated dynamically by the application or by the LLM (as in [self querying](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/)). This PR introduces a `FilterParser` interface with a simple `Filter parse(String)` API, allowing for future support of multiple syntaxes (if that is ever required). For the out-of-the-box filter syntax, ANSI SQL's `WHERE` clause is proposed as a suitable candidate for several reasons: - SQL is well-known among Java developers - There is extensive tooling available for SQL (e.g., parsers) - LLMs are pretty good at generating valid SQL, as the many SQL queries on the internet are included in LLM training datasets. There are also specialized LLMs trained for the text-to-SQL task, such as [SQLCoder](https://huggingface.co/defog). The downside is that SQL's `WHERE` clause might not support all operations and data types that various embedding stores could support in the future. In such a case, we could extend it to a superset of ANSI SQL `WHERE` syntax and/or provide an option to express filters in the native syntax of the store. An out-of-the-box implementation of the SQL `FilterParser` is provided as `SqlFilterParser` in a separate module, `langchain4j-embedding-store-filter-parser-sql`, using [JSqlParser](https://github.com/JSQLParser/JSqlParser) under the hood.
`SqlFilterParser` can parse a SQL "SELECT" statement (or just a "WHERE" clause) into a `Filter` object: - `SELECT * FROM fake_table WHERE userId = '123-456'` -> `metadataKey("userId").isEqualTo("123-456")` - `userId = '123-456'` -> `metadataKey("userId").isEqualTo("123-456")` It can also resolve `CURDATE()` and `CURRENT_DATE`/`CURRENT_TIME`/`CURRENT_TIMESTAMP`: `SELECT * FROM fake_table WHERE year = EXTRACT(YEAR FROM CURRENT_DATE)` -> `metadataKey("year").isEqualTo(LocalDate.now().getYear())` ## Changes in `Metadata` API Until now, `Metadata` supported only `String` values. This PR expands the list of supported value types to `Integer`, `Long`, `Float` and `Double`. In the future, more types may be added (if needed). The method `String get(String key)` will be deprecated later in favor of: - `String getString(String key)` - `Integer getInteger(String key)` - `Long getLong(String key)` - etc. New overloaded `put(key, value)` methods are introduced to support more value types: - `put(String key, int value)` - `put(String key, long value)` - etc. ## Changes in `EmbeddingStore` API A new method, `search`, is added and will become the main entry point for search in the future. All `findRelevant` methods will be deprecated later. The new `search` method accepts an `EmbeddingSearchRequest` and returns an `EmbeddingSearchResult`. `EmbeddingSearchRequest` contains all search criteria (e.g. `maxResults`, `minScore`), including the new `Filter`. `EmbeddingSearchResult` contains a list of `EmbeddingMatch`.
```java
EmbeddingSearchResult search(EmbeddingSearchRequest request);
```
## Changes in `EmbeddingStoreContentRetriever` API `EmbeddingStoreContentRetriever` can now be configured in the builder with a static `filter`, as well as with dynamic `dynamicMaxResults`, `dynamicMinScore` and `dynamicFilter`:
```java
ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        ...
        .maxResults(3)
        // or
        .dynamicMaxResults(query -> 3) // You can define maxResults dynamically. The value could, for example, depend on the query or the user associated with the query.
        ...
        .minScore(0.3)
        // or
        .dynamicMinScore(query -> 0.3)
        ...
        .filter(metadataKey("userId").isEqualTo("123-456")) // Assuming your TextSegments contain Metadata with key "userId"
        // or
        .dynamicFilter(query -> metadataKey("userId").isEqualTo(query.metadata().chatMemoryId().toString()))
        ...
        .build();
```
So now you can define `maxResults`, `minScore` and `filter` both statically and dynamically (they can depend on the query, user, etc.). These values will be propagated to the underlying `EmbeddingStore`. ## ["Self-querying"](https://python.langchain.com/docs/modules/data_connection/retrievers/self_query/) This PR also introduces `LanguageModelSqlFilterBuilder` in the `langchain4j-embedding-store-filter-parser-sql` module, which can be used with `EmbeddingStoreContentRetriever`'s `dynamicFilter` to automatically build a `Filter` object from the `Query` using a language model and `SqlFilterParser`.
For example:
```java
TextSegment groundhogDay = TextSegment.from("Groundhog Day", new Metadata().put("genre", "comedy").put("year", 1993));
TextSegment forrestGump = TextSegment.from("Forrest Gump", new Metadata().put("genre", "drama").put("year", 1994));
TextSegment dieHard = TextSegment.from("Die Hard", new Metadata().put("genre", "action").put("year", 1998));

// describe metadata keys as if they were columns in the SQL table
TableDefinition tableDefinition = TableDefinition.builder()
        .name("movies")
        .addColumn("genre", "VARCHAR", "one of [comedy, drama, action]")
        .addColumn("year", "INT")
        .build();

LanguageModelSqlFilterBuilder sqlFilterBuilder = new LanguageModelSqlFilterBuilder(model, tableDefinition);

ContentRetriever contentRetriever = EmbeddingStoreContentRetriever.builder()
        .embeddingStore(embeddingStore)
        .embeddingModel(embeddingModel)
        .dynamicFilter(sqlFilterBuilder::build)
        .build();

String answer = assistant.answer("Recommend me a good drama from 90s"); // Forrest Gump
```
## Which embedding store integrations will support `Filter`? In the long run, all of them (provided the embedding store itself supports it). In the first iteration, I aim to add support for just a few: - `InMemoryEmbeddingStore` - Elasticsearch - Milvus ## Summary by CodeRabbit - New Features - Introduced filters for checking whether a key's value exists in a collection, for improved data handling. - Enhancements - Updated `InMemoryEmbeddingStoreTest` to extend a different class for improved testing coverage and added a new test method. - Refactor - Made minor formatting adjustments in the assertion block for better readability. - Documentation - Updated class hierarchy information for clarity.	2024-03-08 17:06:58 +01:00
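The core idea of the `Filter` API in the commit above is a store-agnostic predicate over metadata, so that similarity search only considers segments matching the filter. The dependency-free sketch below illustrates that idea; `Filter`, `Entry`, `isEqualTo`, `isIn`, `isGreaterThan` and `search` are hypothetical stand-ins, not LangChain4j classes. It evaluates the filter in memory before ranking by cosine similarity, whereas an actual store integration would presumably translate the filter into the store's native filtering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, dependency-free sketch: these types are stand-ins, not LangChain4j's Filter/EmbeddingStore.
public class FilteredSearchSketch {

    // A filter is a predicate over one segment's metadata, independent of any particular store.
    interface Filter {
        boolean test(Map<String, Object> metadata);

        // Logical AND of two filters.
        default Filter and(Filter other) {
            return m -> this.test(m) && other.test(m);
        }
    }

    static Filter isEqualTo(String key, Object value) {
        return m -> value.equals(m.get(key));
    }

    static Filter isIn(String key, Set<?> values) {
        return m -> values.contains(m.get(key));
    }

    // Numeric comparison; entries whose value is missing or non-numeric do not match.
    static Filter isGreaterThan(String key, double threshold) {
        return m -> {
            Object v = m.get(key);
            return v instanceof Number && ((Number) v).doubleValue() > threshold;
        };
    }

    // One stored segment: its text, its embedding and its metadata.
    static final class Entry {
        final String text;
        final float[] embedding;
        final Map<String, Object> metadata;

        Entry(String text, float[] embedding, Map<String, Object> metadata) {
            this.text = text;
            this.embedding = embedding;
            this.metadata = metadata;
        }
    }

    static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Keep only entries matching the filter (null means no filter), then rank them by cosine similarity.
    static List<String> search(List<Entry> store, float[] query, Filter filter, int maxResults) {
        List<Entry> candidates = new ArrayList<>();
        for (Entry e : store) {
            if (filter == null || filter.test(e.metadata)) {
                candidates.add(e);
            }
        }
        candidates.sort(Comparator.comparingDouble((Entry e) -> cosine(query, e.embedding)).reversed());
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < Math.min(maxResults, candidates.size()); i++) {
            texts.add(candidates.get(i).text);
        }
        return texts;
    }

    public static void main(String[] args) {
        List<Entry> store = List.of(
                new Entry("Groundhog Day", new float[]{1f, 0f}, Map.of("genre", "comedy", "year", 1993)),
                new Entry("Forrest Gump", new float[]{0.9f, 0.1f}, Map.of("genre", "drama", "year", 1994)),
                new Entry("Die Hard", new float[]{0.8f, 0.2f}, Map.of("genre", "action", "year", 1998)));

        // Roughly what metadataKey("genre").isEqualTo("drama").and(metadataKey("year").isGreaterThan(1989)) expresses.
        Filter dramasAfter1989 = isEqualTo("genre", "drama").and(isGreaterThan("year", 1989));
        System.out.println(search(store, new float[]{1f, 0f}, dramasAfter1989, 3)); // prints [Forrest Gump]
    }
}
```

In the same spirit, a `dynamicFilter` would just be a function from the incoming query (or its user) to such a `Filter`, evaluated per request.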
Tim te Beek	5f522e51d6	refactor: AssertJ best practices (#622) Hi! Noticed _almost all_ tests used AssertJ, but in some cases JUnit was still used. In addition, some tests didn't use the most expressive assertions. Figured I'd clean that up so that you get more informative messages if any tests were to fail. Compare for instance
```diff
- assertThat(document.metadata().asMap().size()).isEqualTo(4);
+ assertThat(document.metadata().asMap()).hasSize(4);
```
The first one will only print "expected 5 to be equal to 4", whereas the second one also shows the contents of the map involved. Being consistent with your test library also stops bad patterns from spreading accidentally through copy-and-paste. If you want to enforce these best practices through an automated pull request check, that's also an option. Let me know if you'd want that as well. Hope that helps!	2024-03-05 18:33:22 +01:00
Hervé Boutemy	677d3e091e	use maven.compiler.release instead of source+target (#617) With this setting, you can safely build the whole project once with JDK 17 or even 21, without fear of APIs from the newer JDK ending up in the .class files	2024-03-05 16:50:16 +01:00
LangChain4j	197b4af9d1	bumped version to 0.28.0-SNAPSHOT	2024-02-09 15:11:52 +01:00
LangChain4j	c1462c087f	release 0.27.1 (#621)	2024-02-09 15:00:42 +01:00
LangChain4j	ad2fd90f32	bumped version to 0.28.0-SNAPSHOT	2024-02-09 08:12:28 +01:00
LangChain4j	a22d297104	Release 0.27.0 (#615)	2024-02-09 08:00:34 +01:00
Antonio Goncalves	baac759766	Beautifying Maven output (#572) Looking at the Maven output, I thought it could benefit from a little renaming. I just changed the `<name>` in the `pom.xml`, nothing more. The output is like this at the moment: ![Screenshot 2024-01-30 at 16 26 53](https://github.com/langchain4j/langchain4j/assets/729277/940886d1-565e-416f-a58e-91f609fc0c00) It could look like this if this PR is merged: ![Screenshot 2024-01-30 at 16 42 38](https://github.com/langchain4j/langchain4j/assets/729277/f8787af2-b869-4e95-90bd-72bce5622737) Just personal taste. Let me know if you like it or not (or want to change it). If not, just discard it, it's fine ;o)	2024-01-30 16:54:54 +01:00
LangChain4j	fca8ca48f7	bump version to 0.27.0-SNAPSHOT	2024-01-30 16:18:40 +01:00
LangChain4j	3958e01738	release 0.26.1 (#570)	2024-01-30 16:11:21 +01:00
LangChain4j	469699b944	bump version to 0.27.0-SNAPSHOT	2024-01-30 08:07:45 +01:00
LangChain4j	a8ad9e48d9	Automate release (#562)	2024-01-30 07:20:20 +01:00
Giuseppe Villani	df6683645e	Fix another Neo4jEmbeddingStoreTest error (#441) See here: https://github.com/langchain4j/langchain4j/actions/runs/7396118145/job/20120618008?pr=396 The fix made [here](https://github.com/langchain4j/langchain4j/pull/368) is not enough, because the test property is not always populated (see [here](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-neo4j/src/main/java/dev/langchain4j/store/embedding/neo4j/Neo4jEmbeddingUtils.java#L71)). Therefore, I added an if-else on the iterator where needed.	2024-01-03 17:43:18 +01:00
Crutcher Dunnavant	be331724c7	Utils test coverage and docs. (#396) Add UtilsTest. Add JacocoIgnoreCoverageGenerated. Migrate Neo4jEmbeddingStore to non-deprecated method.	2024-01-03 11:31:58 +01:00
LangChain4j	7e5e82b7b2	updated to 0.26.0-SNAPSHOT	2023-12-22 18:08:19 +01:00
LangChain4j	2a5308b794	released 0.25.0	2023-12-22 18:02:04 +01:00
Giuseppe Villani	968bb71891	Fix Neo4jEmbeddingStoreTest error (#368) See https://github.com/langchain4j/langchain4j/actions/runs/7259989386/job/19778223144 Changed to `MATCH (n:%s) RETURN n ORDER BY n.text` to make tests more deterministic, since the unordered `MATCH (n:%s) RETURN n` occasionally returns nodes in an order different from the order in which the `embeddingModel.embed(..)` results arrived.	2023-12-19 15:03:05 +01:00
LangChain4j	e1dddb33a2	bumped version to 0.25.0-SNAPSHOT (#369)	2023-12-19 13:03:48 +01:00
Giuseppe Villani	c35a4e22ea	Fixes #241: Added support for Neo4j Vector Index (#282) This commit brings support for the Neo4j graph database in general, and uses its vector index functionality, generally available since version 5.13. It is mostly aligned with the existing WeaviateEmbeddingStoreImpl implementation and tests. The tests have some additional Neo4j node assertions to check that the nodes involved are correctly created. If needed, the module creates the index for vector search via `"CALL db.index.vector.createNodeIndex(<indexName>, <label>, <embeddingProperty>, <dimension>, <distanceType>)"`. The required configurations are: - the Neo4j index dimension parameter - the Neo4j Java Driver connection instance - as an alternative to the Neo4j Java Driver, we can create a `Neo4jEmbeddingStore.builder().withBasicAuth(<url>, <username>, <password>)`, which will create a Driver connection instance under the hood. It is possible to customize, via the builder: - the index name (with default `langchain-embedding-index`) - the Neo4j node label (with default `Document`) - the Neo4j property key which saves the embeddings (with default `embeddingProp`) - the Neo4j index distanceType parameter - the metadata prefix (with default `metadata.`) - the text property key (with default `text`), which stores the text field of the `TextSegment`. I also created an example PR in the `langchain4j-examples` repo: https://github.com/langchain4j/langchain4j-examples/pull/23	2023-12-18 18:04:05 +01:00
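The commit above lists the builder defaults and the index-creation procedure. The sketch below shows, under stated assumptions, how those defaults could be turned into parameters for that procedure call and into the properties of one stored node, with each metadata key carrying the `metadata.` prefix. It is hypothetical and not the `Neo4jEmbeddingStore` source: the class, method names and parameter names are illustrative, and only the defaults and the `db.index.vector.createNodeIndex` call come from the commit message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not the Neo4jEmbeddingStore implementation.
public class Neo4jStoreSketch {

    // Defaults as listed in the commit message.
    static final String INDEX_NAME = "langchain-embedding-index";
    static final String LABEL = "Document";
    static final String EMBEDDING_PROPERTY = "embeddingProp";
    static final String METADATA_PREFIX = "metadata.";
    static final String TEXT_PROPERTY = "text";

    // The procedure quoted in the commit message, with values passed as Cypher parameters.
    static final String CREATE_INDEX =
            "CALL db.index.vector.createNodeIndex($indexName, $label, $embeddingProperty, $dimension, $distanceType)";

    // Parameters for CREATE_INDEX; dimension must match the embedding model's output size.
    static Map<String, Object> indexParams(int dimension, String distanceType) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("indexName", INDEX_NAME);
        params.put("label", LABEL);
        params.put("embeddingProperty", EMBEDDING_PROPERTY);
        params.put("dimension", dimension);
        params.put("distanceType", distanceType); // e.g. "cosine" or "euclidean"
        return params;
    }

    // Flattens one segment into node properties, prefixing each metadata key with METADATA_PREFIX.
    static Map<String, Object> nodeProperties(String text, float[] embedding, Map<String, Object> metadata) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put(TEXT_PROPERTY, text);
        props.put(EMBEDDING_PROPERTY, embedding);
        metadata.forEach((key, value) -> props.put(METADATA_PREFIX + key, value));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(CREATE_INDEX + " " + indexParams(384, "cosine"));
        // prints: [text, embeddingProp, metadata.year]
        System.out.println(nodeProperties("Forrest Gump", new float[]{0.1f, 0.2f}, Map.of("year", 1994)).keySet());
    }
}
```

Prefixing metadata keys keeps them visually separate from the `text` and `embeddingProp` properties on the same node; whether the real store relies on that for anything beyond naming is not stated in the commit.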

42 Commits