Commit Graph

29 Commits

Author SHA1 Message Date
LangChain4j f5039b6eea
excluded neo4j module from java 8 and 11 builds 2023-12-18 09:31:10 +01:00
Julien Dubois 52fdbaf6df
Update GitHub Actions versions (#357)
In the GitHub Actions workflow:

- Update actions/checkout to the latest version
- Update actions/setup-java to the latest version (Java 21 already works
but is undocumented; it will be documented in the next version thanks to
https://github.com/actions/setup-java/pull/538 😀)
2023-12-15 15:24:22 +01:00
Julien Dubois 3c0943d38b
Support Java 21 (#336)
This PR fixes #335.
2023-12-12 19:16:41 +01:00
shalk(xiao kun) b2f358c926
enable langchain4j-graal build in workflow (#333) 2023-12-08 10:30:36 +01:00
deep-learning-dynamo 06ada5310d disabled jdk17 build temporarily 2023-11-24 12:25:05 +01:00
LangChain4j ba7fabaa50
graal: cleanup (#297) 2023-11-19 12:59:24 +01:00
LangChain4j ff998ac82d
build most modules with jdk 8 (#295)
Since we target Java 8, the CI build was updated to run most modules with
Java 8, and then the modules requiring Java 11 separately with Java 11.
2023-11-18 15:07:11 +01:00
deep-learning-dynamo 21dfc8b317 released 0.24.0 2023-11-12 18:58:31 +01:00
deep-learning-dynamo eef1796963 fixing build 2023-10-09 13:09:25 +02:00
deep-learning-dynamo 315eab8641 released 0.23.0 2023-09-29 14:27:51 +02:00
deep-learning-dynamo ef8f04015b Removed dynamic loading from AstraDB/Cassandra 2023-09-27 17:11:01 +02:00
Cedrick Lunven c632322493
Cassandra and Astra (dbaas) as VectorStore and ChatMemoryStore (#162)
#### Context

Apache Cassandra is a popular open-source database created back in 2008.
This year, with
[CEP-30](https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-30%3A+Approximate+Nearest+Neighbor%28ANN%29+Vector+Search+via+Storage-Attached+Indexes),
support for vector and similarity search has been introduced.
Cassandra is very fast for reads and writes and is used as a cache by many
companies, which also makes it a good opportunity to implement the `ChatMemoryStore`.
Vector search is expected in Cassandra 5 at the end of the year, but some
Docker images are already available.

DataStax AstraDB is a distribution of Apache Cassandra available as SaaS,
providing a free tier (free forever) of 80 million queries/month
([registration](https://astra.datastax.com)). On Astra, the vector capability is
already production ready.

#### Data Modelling

With the proper data model in Cassandra, we can perform similarity
search, keyword search, and metadata search.

```sql
CREATE TABLE sample_vector_table (
    row_id text PRIMARY KEY,
    attributes_blob text,
    body_blob text,
    metadata_s map<text, text>,
    vector vector<float, 1536>
);
```
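
As a hedged illustration of the similarity and metadata searches on `sample_vector_table`, the Java sketch below only builds CQL strings with bind markers; it does not connect to a cluster. It assumes Cassandra 5 / Astra vector syntax (`ORDER BY vector ANN OF ? LIMIT ?`), Storage-Attached Indexes on `vector` and on `entries(metadata_s)`, and a server version that allows combining such filters with ANN ordering. `SampleVectorQueries` and its methods are hypothetical names; keyword search is not shown because its syntax depends on how the text column is indexed.

```java
/**
 * Hypothetical helper that builds CQL statements for sample_vector_table.
 * Assumes SAI indexes such as:
 *   CREATE CUSTOM INDEX ON sample_vector_table (vector) USING 'StorageAttachedIndex';
 *   CREATE CUSTOM INDEX ON sample_vector_table (entries(metadata_s)) USING 'StorageAttachedIndex';
 */
final class SampleVectorQueries {

    static final String TABLE = "sample_vector_table";

    private SampleVectorQueries() {
    }

    /** Similarity search: the query vector and the limit are bound as parameters. */
    static String similaritySearch() {
        return "SELECT row_id, body_blob FROM " + TABLE
                + " ORDER BY vector ANN OF ? LIMIT ?";
    }

    /** Metadata search: one equality predicate per metadata key; values are bound as parameters. */
    static String metadataSearch(Iterable<String> keys) {
        java.util.StringJoiner where = new java.util.StringJoiner(" AND ", " WHERE ", "");
        where.setEmptyValue(""); // no keys: no WHERE clause at all
        for (String key : keys) {
            // Keys are inlined into the statement, so only accept simple identifiers.
            if (!key.matches("[A-Za-z0-9_]+")) {
                throw new IllegalArgumentException("Unsupported metadata key: " + key);
            }
            where.add("metadata_s['" + key + "'] = ?");
        }
        return "SELECT row_id, body_blob FROM " + TABLE + where;
    }

    /** Similarity search restricted by metadata (ANN ordering combined with SAI filters). */
    static String filteredSimilaritySearch(Iterable<String> keys) {
        return metadataSearch(keys) + " ORDER BY vector ANN OF ? LIMIT ?";
    }
}
```

The bind markers keep the vector, metadata values, and limit out of the statement text, so a single prepared statement can be reused for every query with the same set of keys.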

#### Implementation Thoughts

- The **configurations** needed to connect to Astra and to Cassandra are not
exactly the same, so two different classes, each with an associated builder, are
provided:
[Astra](https://github.com/clun/langchain4j/blob/main/langchain4j/src/main/java/dev/langchain4j/store/embedding/cassandra/AstraDbEmbeddingConfiguration.java)
and [OSS
Cassandra](https://github.com/clun/langchain4j/blob/main/langchain4j/src/main/java/dev/langchain4j/store/embedding/cassandra/CassandraEmbeddingConfiguration.java).
A couple of fields are shared, but creating a superclass to inherit
from led to the use of Lombok `@SuperBuilder`, which the Javadoc generation was not
able to handle.

- Instead of passing a large number of arguments like other stores do, I
prefer to wrap them in a bean. This way you can add or remove
attributes and make them optional or mandatory at will. If you need to add
a new attribute to the configuration, you do not have to change the
implementation of `XXXStore` and `XXXStoreImpl`.

- I created an
[`AbstractEmbeddingStore<T>`](https://github.com/clun/langchain4j/blob/main/langchain4j/src/main/java/dev/langchain4j/store/embedding/AbstractEmbeddingStore.java)
that could very well become the superclass for any store. It forwards
the different calls to the real concrete implementation (_delegate
pattern_). Some methods can have a default implementation:

```java
/**
 * Add a list of embeddings to the store.
 *
 * @param embeddings
 *      list of embeddings (each one holds a vector)
 * @return
 *      list of ids
 */
@Override
public List<String> addAll(List<Embedding> embeddings) {
   Objects.requireNonNull(embeddings, "embeddings must not be null");
   return embeddings.stream().map(this::add).collect(Collectors.toList());
}
```

The only method to implement at the Store level is:

```java
/**
 * Initialize the concrete implementation.
 * @return the concrete implementation of the store
 */
protected abstract EmbeddingStore<T> loadImplementation()
throws ClassNotFoundException, NoSuchMethodException, InstantiationException,
       IllegalAccessException, InvocationTargetException;
```
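
The delegate pattern described above can be sketched in plain Java. This is a simplified, hypothetical model rather than the real langchain4j types: `SimpleEmbeddingStore` stands in for `EmbeddingStore<T>`, an embedding is reduced to a `float[]`, and `InMemoryStore` / `InMemoryDelegatingStore` exist only for the demo. The abstract class creates the concrete implementation once, lazily, through `loadImplementation()`, and forwards every call to it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.UUID;
import java.util.stream.Collectors;

/** Simplified, hypothetical stand-in for EmbeddingStore<T>: an embedding is just a float[]. */
interface SimpleEmbeddingStore {
    String add(float[] embedding);

    List<String> addAll(List<float[]> embeddings);

    int size();
}

/** Abstract store that forwards every call to a lazily created concrete implementation. */
abstract class AbstractSimpleEmbeddingStore implements SimpleEmbeddingStore {

    private SimpleEmbeddingStore delegate;

    /**
     * The only method subclasses must implement. ReflectiveOperationException is the common
     * superclass of ClassNotFoundException, NoSuchMethodException, InstantiationException,
     * IllegalAccessException and InvocationTargetException.
     */
    protected abstract SimpleEmbeddingStore loadImplementation() throws ReflectiveOperationException;

    /** Creates the delegate on first use and caches it (not thread-safe, for brevity). */
    protected SimpleEmbeddingStore getDelegate() {
        if (delegate == null) {
            try {
                delegate = loadImplementation();
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load store implementation", e);
            }
        }
        return delegate;
    }

    @Override
    public String add(float[] embedding) {
        return getDelegate().add(Objects.requireNonNull(embedding, "embedding must not be null"));
    }

    /** Default implementation: add the embeddings one by one through the delegate. */
    @Override
    public List<String> addAll(List<float[]> embeddings) {
        Objects.requireNonNull(embeddings, "embeddings must not be null");
        return embeddings.stream().map(this::add).collect(Collectors.toList());
    }

    @Override
    public int size() {
        return getDelegate().size();
    }
}

/** Hypothetical in-memory implementation used as the delegate in this sketch. */
class InMemoryStore implements SimpleEmbeddingStore {
    private final List<float[]> vectors = new ArrayList<>();

    @Override
    public String add(float[] embedding) {
        vectors.add(embedding);
        return UUID.randomUUID().toString();
    }

    @Override
    public List<String> addAll(List<float[]> embeddings) {
        return embeddings.stream().map(this::add).collect(Collectors.toList());
    }

    @Override
    public int size() {
        return vectors.size();
    }
}

/** Concrete store: it only decides which implementation to delegate to. */
class InMemoryDelegatingStore extends AbstractSimpleEmbeddingStore {
    @Override
    protected SimpleEmbeddingStore loadImplementation() {
        return new InMemoryStore();
    }
}
```

Because `addAll` is implemented once in the abstract class, a new store only has to provide `loadImplementation()` and a delegate that knows how to persist a single embedding.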

- [CassandraEmbeddingStore](https://github.com/clun/langchain4j/blob/main/langchain4j/src/main/java/dev/langchain4j/store/embedding/cassandra/CassandraEmbeddingStore.java#L30)
provides two constructors, so users can override the implementation class if
they want (extension point).
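
The configuration-bean and extension-point ideas above can be sketched together under stated assumptions: `StoreConfiguration`, `StoreImpl`, `DefaultStoreImpl`, and `ConfigurableStore` are hypothetical names, not the classes from this PR. The configuration is a plain bean with a hand-written builder (no Lombok), and the second constructor accepts an implementation class that is instantiated reflectively through a public constructor taking the configuration.

```java
import java.lang.reflect.InvocationTargetException;
import java.util.Objects;

/** Hypothetical configuration bean: attributes can be added without changing store constructors. */
final class StoreConfiguration {
    private final String keyspace;
    private final String table;
    private final int dimension;

    private StoreConfiguration(Builder b) {
        this.keyspace = Objects.requireNonNull(b.keyspace, "keyspace is mandatory");
        this.table = Objects.requireNonNull(b.table, "table is mandatory");
        this.dimension = b.dimension;
    }

    static Builder builder() { return new Builder(); }

    String keyspace() { return keyspace; }
    String table() { return table; }
    int dimension() { return dimension; }

    static final class Builder {
        private String keyspace;
        private String table;
        private int dimension = 1536; // optional attribute; default matches vector<float, 1536>

        Builder keyspace(String keyspace) { this.keyspace = keyspace; return this; }
        Builder table(String table) { this.table = table; return this; }
        Builder dimension(int dimension) { this.dimension = dimension; return this; }
        StoreConfiguration build() { return new StoreConfiguration(this); }
    }
}

/** Hypothetical contract for the concrete implementation. */
interface StoreImpl {
    String describe();
}

/** Default implementation; it exposes a public constructor taking the configuration. */
class DefaultStoreImpl implements StoreImpl {
    private final StoreConfiguration config;

    public DefaultStoreImpl(StoreConfiguration config) { this.config = config; }

    @Override
    public String describe() {
        return "default:" + config.keyspace() + "." + config.table();
    }
}

/** Store with two constructors; the second one is the extension point. */
class ConfigurableStore {
    private final StoreImpl impl;

    ConfigurableStore(StoreConfiguration config) {
        this(config, DefaultStoreImpl.class);
    }

    ConfigurableStore(StoreConfiguration config, Class<? extends StoreImpl> implClass) {
        try {
            // The implementation class must declare a public (StoreConfiguration) constructor.
            this.impl = implClass.getConstructor(StoreConfiguration.class).newInstance(config);
        } catch (NoSuchMethodException | InstantiationException
                | IllegalAccessException | InvocationTargetException e) {
            throw new IllegalArgumentException("Cannot instantiate " + implClass.getName(), e);
        }
    }

    String describe() { return impl.describe(); }
}
```

Adding a new setting then means adding a field and a builder method to the bean; neither constructor of `ConfigurableStore` changes.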

#### Tests

- Test classes are provided, including some long-form examples based on
classes found in `langchain4j-examples`, but the tests are disabled.

- To start a local Cassandra, use Docker and the
[docker-compose](https://github.com/clun/langchain4j/blob/main/langchain4j-cassandra/src/test/resources/docker-compose.yml) file:

```console
docker compose up -d
```

- To run the tests with Astra, sign in with your GitHub account and create a token
(API key) with the role `Organization Administrator`, following this
[procedure](https://awesome-astra.github.io/docs/pages/astra/create-token/#c-procedure).

<img width="926" alt="Screenshot 2023-09-06 at 18 14 12"
src="https://github.com/langchain4j/langchain4j/assets/726536/dfd2d9e5-09c9-4504-bfaa-31cfd87704a1">

- Copy the full value of the `token` field from the JSON.

<img width="713" alt="Screenshot 2023-09-06 at 18 15 53"
src="https://github.com/langchain4j/langchain4j/assets/726536/1be56234-dd98-4f59-af71-03df42ed6997">

- Set the environment variable `ASTRA_DB_APPLICATION_TOKEN`:

```console
export ASTRA_DB_APPLICATION_TOKEN=AstraCS:....<your_token>
```
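
A hedged sketch of how a test could decide whether the Astra tests can run, based on the environment variable above. `AstraTestSupport` is a hypothetical helper, and the `AstraCS:` prefix check is an assumption taken from the example token format shown above.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: returns the Astra token only if it looks usable. */
final class AstraTestSupport {

    static final String TOKEN_VARIABLE = "ASTRA_DB_APPLICATION_TOKEN";

    private AstraTestSupport() {
    }

    /** Reads the token from the given environment map (injectable for tests). */
    static Optional<String> astraToken(Map<String, String> env) {
        String token = env.get(TOKEN_VARIABLE);
        // Assumption: application tokens start with "AstraCS:", as in the example above.
        if (token == null || !token.startsWith("AstraCS:")) {
            return Optional.empty();
        }
        return Optional.of(token);
    }

    /** Reads the token from the real process environment. */
    static Optional<String> astraToken() {
        return astraToken(System.getenv());
    }
}
```

With JUnit 5, a similar effect can be obtained declaratively with `@EnabledIfEnvironmentVariable(named = "ASTRA_DB_APPLICATION_TOKEN", matches = "AstraCS:.*")` on the test class.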
2023-09-27 15:50:04 +02:00
deep-learning-dynamo c1cc5be1c7 released 0.22.0 2023-08-29 19:21:56 +02:00
deep-learning-dynamo db1f236ed2 released 0.21.0 2023-08-19 15:57:39 +02:00
deep-learning-dynamo d7b96ca9a6 released 0.20.0 2023-08-14 00:44:07 +02:00
deep-learning-dynamo 1541f214c1 released 0.19.0 2023-08-10 14:34:21 +02:00
Julien Perrochet 5cb371d7bf
[ci] let the compliance check run on all modules (#75)
Some leftovers from an earlier (and now incorrect) CI configuration.

Modules that don't need to comply with the licenses need to deactivate
the relevant plugin on a case-by-case basis.
2023-08-06 21:22:26 +02:00
deep-learning-dynamo d4fca658c1 released 0.18.0 2023-07-26 21:19:24 +02:00
LangChain4j 540741c8e5
Temporarily disabled in-process embedding model tests (#48)
We are out of free Git LFS quota
2023-07-24 19:52:30 +02:00
LangChain4j 529ef6b647
Added in-process embedding models (#41)
- all-minilm-l6-v2
- all-minilm-l6-v2-q
- e5-small-v2
- e5-small-v2-q

The idea is to give users an option to embed documents/texts in the same
Java process without any external dependencies.
ONNX Runtime is used to run the models inside the JVM.
Each model resides in its own Maven module (inside the jar).
2023-07-23 19:05:13 +02:00
deep-learning-dynamo 1976560aeb released 0.16.0 2023-07-18 10:49:43 +02:00
deep-learning-dynamo e439f96466 released 0.15.0 2023-07-18 00:13:08 +02:00
deep-learning-dynamo 14185653c7 released 0.14.0 2023-07-16 12:15:31 +02:00
deep-learning-dynamo 120c6a01d8 released 0.13.0 2023-07-15 17:53:10 +02:00
LangChain4j 482dda9df6
Added feature request template 2023-07-15 11:03:06 +02:00
LangChain4j 8b64ad0049
Added bug report template 2023-07-15 11:00:58 +02:00
Julien Perrochet 9cbbf705b9
[build] run workflows on open PR instead of just push (#21)
So that external PRs get the tests run as well
2023-07-12 11:30:55 +02:00
Julien Perrochet 0534ec91e4
[CI] automated license check as part of CI (Apache 2.0/MIT/Eclipse) (#14)
The title says it all. Relying on [this maven
plugin](https://github.com/chonton/license-maven-plugin) for it.

Note that this adds a separate build step because we need a more recent
JDK to run the needed plugin.
2023-07-07 09:27:44 +02:00
Julien Perrochet d427b7ba06
add maven test github action (#11)
This PR:

- adds a GitHub Action for running unit tests
- tests that require an OpenAI/HuggingFace token and hit their API are now considered integration tests (and have been renamed to end in IT)
- integration tests are now run through a separate goal (`mvn integration-test`) via the maven-failsafe-plugin
- to fix the `PromptTemplate` tests, a `Clock` has been added to that class. Its constructor is now private; whether this is the convention we want to follow can be discussed
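
The `Clock` change can be illustrated with a hedged sketch of the general technique: inject `java.time.Clock` so tests can pin "now", keep the constructor private, and expose factory methods. `DatedTemplate` and its `{{today}}` placeholder are hypothetical and do not reproduce the actual `PromptTemplate` code.

```java
import java.time.Clock;
import java.time.LocalDate;
import java.util.Objects;

/** Hypothetical template whose {{today}} placeholder depends on the injected Clock. */
final class DatedTemplate {
    private final String template;
    private final Clock clock;

    // Private constructor: callers go through the factory methods below.
    private DatedTemplate(String template, Clock clock) {
        this.template = Objects.requireNonNull(template, "template must not be null");
        this.clock = Objects.requireNonNull(clock, "clock must not be null");
    }

    /** Production entry point: uses the system clock in the default time zone. */
    static DatedTemplate from(String template) {
        return new DatedTemplate(template, Clock.systemDefaultZone());
    }

    /** Test entry point: lets tests pin "now", e.g. with Clock.fixed(...). */
    static DatedTemplate from(String template, Clock clock) {
        return new DatedTemplate(template, clock);
    }

    /** Replaces {{today}} with the current date (ISO-8601) according to the clock. */
    String apply() {
        return template.replace("{{today}}", LocalDate.now(clock).toString());
    }
}
```

In a test, `DatedTemplate.from("Today is {{today}}.", Clock.fixed(Instant.parse("2023-07-05T12:00:00Z"), ZoneOffset.UTC)).apply()` always yields `Today is 2023-07-05.`, whatever day the test actually runs.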
2023-07-05 21:55:49 +02:00