## Issue
Closes#1049
## Change
Extracted `HtmlTextExtractor` into
`langchain4j-document-transformer-jsoup` module.
Renamed `HtmlToTextDocumentTransformer` into `HtmlTextExtractor`.
Please import:
```xml
<dependency>
<groupId>dev.langchain4j</groupId>
<artifactId>langchain4j-document-transformer-jsoup</artifactId>
<version>0.35.0</version>
</dependency>
```
## General checklist
- [ ] There are no breaking changes
- [ ] I have added unit and integration tests for my change
- [X] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [X] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
- [X] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
- [ ] I have added/updated [Spring Boot
starter(s)](https://github.com/langchain4j/langchain4j-spring) (if
applicable)
## Change
<!-- Please describe the changes you made. -->
This change adds the document loading process by integrating Selenium
web automation. Unlike the existing UrlDocumentLoader, this method
captures the complete version of a webpage, including all post-redirect
content and dynamically loaded elements via JavaScript. This ensures
that the full content is retrieved, providing a more accurate
representation of the page as rendered in a browser.
## General checklist
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] There are no breaking changes
- [x] I have added unit and integration tests for my change
- [x] I have manually run all the unit and integration tests in the
module I have added/changed, and they are all green
- [ ] I have manually run all the unit and integration tests in the
[core](https://github.com/langchain4j/langchain4j/tree/main/langchain4j-core)
and
[main](https://github.com/langchain4j/langchain4j/tree/main/langchain4j)
modules, and they are all green
<!-- Before adding documentation and example(s) (below), please wait
until the PR is reviewed and approved. -->
- [ ] I have added/updated the
[documentation](https://github.com/langchain4j/langchain4j/tree/main/docs/docs)
- [ ] I have added an example in the [examples
repo](https://github.com/langchain4j/langchain4j-examples) (only for
"big" features)
## Checklist for adding new model integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [x] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
## Checklist for adding new embedding store integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [ ] I have added a `{NameOfIntegration}EmbeddingStoreIT` that extends
from either `EmbeddingStoreIT` or `EmbeddingStoreWithFilteringIT`
- [ ] I have added my new module in the
[BOM](https://github.com/langchain4j/langchain4j/blob/main/langchain4j-bom/pom.xml)
## Checklist for changing existing embedding store integration
<!-- Please double-check the following points and mark them like this:
[X] -->
- [ ] I have manually verified that the
`{NameOfIntegration}EmbeddingStore` works correctly with the data
persisted using the latest released version of LangChain4j