Commit Graph

682 Commits

Author SHA1 Message Date
mpostma 204c743bcc
add json payload check on document addition 2021-03-16 14:28:13 +01:00
mpostma 6a742ee62c
restore version route 2021-03-15 19:11:27 +01:00
mpostma c29b86849b
use actix cors git dependency 2021-03-15 17:40:20 +01:00
mpostma f727dcc8c6
update milli 2021-03-15 14:26:59 +01:00
mpostma 80d0f9c49d
methods to update index time metadata 2021-03-15 14:05:47 +01:00
mpostma adc71a70ce
fix displayed attributes in document retrieval 2021-03-15 10:17:41 +01:00
mpostma 3f68460d6c
fix update dedup 2021-03-11 20:58:51 +01:00
mpostma 79a4bc8129
use meta from milli 2021-03-11 19:40:18 +01:00
mpostma a56e8c1a0c
fix tests 2021-03-10 14:47:04 +01:00
mpostma 5ecf514d28
restructure project 2021-03-10 13:46:49 +01:00
mpostma 562da9dd3f
fix test compilation 2021-03-10 11:56:51 +01:00
Clément Renault b18ec00a7a
Add a logging_timer macro to the criterion next methods 2021-03-08 16:12:06 +01:00
Kerollmops 636a9df177
Temporarily fix the tinytemplate doc hidden issue 2021-03-08 15:57:45 +01:00
mpostma 7d28f8cff0
implement get single update 2021-03-06 10:51:52 +01:00
mpostma f090f42e7a
multi index store
create two channels for the Index handler, one for writes and one for reads,
so writes are processed one at a time, while reads are processed in
parallel.
2021-03-04 19:18:01 +01:00
Kerollmops 07784c8990
Tune the words prefixes threshold to compute for 1/1000 instead 2021-03-03 15:51:28 +01:00
Kerollmops 79a143b32f
Introduce the query tree data structure 2021-03-03 13:40:18 +01:00
mpostma 62532b8f79
WIP concurrent index store 2021-03-02 14:05:03 +01:00
Clément Renault 9423310816
Introduce a helpers crate that exports the database to stdout 2021-03-01 19:55:04 +01:00
mpostma 61ce749122
update tokio and disable all routes 2021-02-26 09:10:04 +01:00
Kerollmops 519b1cb5c9
Update dependencies 2021-02-21 10:26:04 +01:00
mpostma 91d6e90d5d
enable faceted searches 2021-02-16 19:20:39 +01:00
Clément Renault fecf3d6fc1
Move the command lines helpers into different crates 2021-02-14 18:55:15 +01:00
Clément Renault d8f3421608
Update the dependencies and remove the unused ones 2021-02-14 18:32:46 +01:00
Clément Renault e8639517da
Change the project to become a workspace with milli as a default-member 2021-02-12 16:15:09 +01:00
mpostma f8f02af23e
incorporate review changes 2021-02-04 13:21:15 +01:00
mpostma 9af0a08122
post review fixes 2021-02-02 17:34:06 +01:00
mpostma 17c463ca61
remove unused deps 2021-02-01 13:32:21 +01:00
mpostma e9c95f6623
remove useless files 2021-01-28 19:43:54 +01:00
mpostma 6c63ee6798
implement list all indexes 2021-01-28 18:32:24 +01:00
mpostma 74410d8c6b
architecture rework 2021-01-28 14:12:34 +01:00
Clément Renault 433ac8c38a
Remove the ordered-float serde feature 2021-01-27 14:11:10 +01:00
Kerollmops 61dbcfa44a
Bump the roaring to 0.6.4 2021-01-26 14:38:43 +01:00
Clément Renault 51a37de885
Introduce the FacetValue enum type 2021-01-26 14:09:09 +01:00
mpostma 87a56d2bc9
Fix settings bug
replace ids with str in settings

This allows for better maintainability of the settings code, since
updating the searchable attributes is now straightforward.

criterion use string

fix reindexing fieldid remapping

add tests for primary_key compute

fix tests

fix http-ui

fixup! add tests for primary_key compute

code improvements settings

update deps

fixup! code improvements settings

fixup! refactor settings updates and fix bug

fixup! Fix settings bug

fixup! Fix settings bug

fixup! Fix settings bug

Update src/update/index_documents/transform.rs

Co-authored-by: Clément Renault <clement@meilisearch.com>

fixup! Fix settings bug
2021-01-26 13:53:08 +01:00
mpostma 6a3f625e11
WIP: refactor IndexController
change the architecture of the index controller to allow it to own an
index store.
2021-01-16 15:09:48 +01:00
mpostma 686f987180
fix compile errors 2021-01-14 11:27:07 +01:00
mpostma d22fab5bae
implement open index 2021-01-13 18:20:14 +01:00
mpostma ddd7789713
WIP: IndexController 2021-01-13 17:50:36 +01:00
mpostma 4f7f7538f7
highlight with new tokenizer 2021-01-11 21:59:37 +01:00
mpostma 1ae761311e
integrate with meilisearch tokenizer 2021-01-07 16:14:27 +01:00
mpostma b4d447b5cb
temp 2021-01-01 16:59:49 +01:00
mpostma d1e9ded76f
setting builder takes ownership 2020-12-31 00:50:30 +01:00
mpostma d9dc2036a7
support error & return document count on addition 2020-12-30 18:44:33 +01:00
mpostma 54861335a0
retrieve update status 2020-12-30 18:16:07 +01:00
mpostma 0cd9e62fc6
search first iteration 2020-12-24 12:58:34 +01:00
mpostma 1a38bfd31f
data add documents 2020-12-23 13:52:28 +01:00
mpostma 55e1552957
update queue refactor, first iteration 2020-12-22 17:13:50 +01:00
mpostma 7c9eaaeadb
clean code, and fix errors 2020-12-22 14:02:41 +01:00
Kerollmops 77e951e933
Use the byte-unit crate to ease library usage 2020-12-20 12:00:37 +01:00
Clément Renault e7f2ab9138
Bump grenad to fix an indexing bug 2020-12-05 16:39:15 +01:00
Clément Renault 0959e1501f
Introduce the FacetRevRange Iterator struct 2020-12-04 12:02:23 +01:00
Clément Renault 61b383f422
Introduce the criteria update setting 2020-12-04 12:02:22 +01:00
Clément Renault a0adfb5e8e
Introduce a real pest parser and support every facet filter conditions 2020-11-23 16:43:55 +01:00
Clément Renault 07a0c82790
Bump heed to 0.10.4 to be able to lazily decode roaring bitmaps 2020-11-23 16:43:53 +01:00
Clément Renault 38c76754ef
Make the facet level search system generic on f64 and i64 2020-11-23 16:43:52 +01:00
Clément Renault b255be93fa
Bump heed to 0.10.3 2020-11-23 16:43:49 +01:00
Clément Renault a18d9a1f87
Parse and store the faceted fields 2020-11-13 16:13:51 +01:00
Clément Renault 640c7d748a
Modify the highlight function to support any JSON type 2020-11-05 13:59:32 +01:00
Clément Renault 0408c9d66a
Move the http server into its own sub-module 2020-11-05 11:16:39 +01:00
Clément Renault 4fded5bd0e
Bump heed to be able to reference a RoTxn from multiple threads 2020-11-02 12:49:23 +01:00
Clément Renault f0d028d3a4
Update the Transform struct to support JSON updates 2020-10-31 20:52:49 +01:00
Clément Renault 9d47ee52b4
Generate a uuid v4 based document id when missing 2020-10-31 15:11:06 +01:00
Clément Renault 085d3b9d94
Update heed to 0.10.0 2020-10-30 11:42:00 +01:00
Clément Renault b5d52b6b45
Prefer using a smallstr instead of a real String to reduce allocations 2020-10-29 14:32:32 +01:00
Clément Renault 98fc24cbdf
Bump heed to fix a prefix iter bug 2020-10-28 10:55:21 +01:00
Clément Renault b44b04d25b
Serialize the CSV record values as JSON strings 2020-10-24 14:43:46 +02:00
Clément Renault 802e925fd7
Switch to a JSON protocol for the front page 2020-10-21 18:26:29 +02:00
Clément Renault 2210818114
Introduce the obkv heed codec 2020-10-21 15:51:48 +02:00
Clément Renault f948a03be2
Optimise the merge functions to avoid allocations 2020-10-20 16:40:50 +02:00
Clément Renault cde8478388
Replace the panic in the merge function by actual errors 2020-10-20 16:19:07 +02:00
Clément Renault 35c9a3c558
Broadcast the update infos to every ws client 2020-10-20 11:19:34 +02:00
Clément Renault 871222aebd
Introduce some new routes to handle live indexing 2020-10-19 16:06:43 +02:00
Clément Renault 65e32fecb1
Move the binaries into one with subcommands 2020-10-19 13:44:17 +02:00
Clément Renault ff389f1270
Update heed-types to 0.7.1 2020-10-19 11:52:59 +02:00
Clément Renault eca49e3a03
Introduce a notification channel for the UpdateStore 2020-10-18 16:37:37 +02:00
Clément Renault 83c1db8763
Introduce the UpdateStore 2020-10-18 15:26:57 +02:00
Clément Renault 9021b2dba6
Introduce the enable-chunk-fusing flag 2020-10-14 18:44:59 +02:00
Kerollmops f980422c57
Move from oxidized-mtbl to grenad 2020-10-14 12:47:32 +02:00
Kerollmops 4e9bd1fef5
Bump oxidized-mtbl 2020-10-07 14:23:22 +02:00
Kerollmops 433d9bbc6e
Use CompressionType::from_str rather than a custom function 2020-10-06 13:50:34 +02:00
Kerollmops 4b819457c9
Enable the structopt/clap wrap help feature 2020-10-06 13:06:22 +02:00
Clément Renault 770f29fd05
Bump the oxidized-mtbl dependency 2020-10-04 17:04:33 +02:00
Kerollmops 68f4af7d2e
Improve the display of the number of processed documents 2020-09-29 16:08:58 +02:00
Clément Renault ed05999f63
Replace the arc cache by a simple linked hash map 2020-09-23 14:50:52 +02:00
Clément Renault d6fa9c0414
Index the intra documents word pair proximities 2020-09-22 14:04:33 +02:00
Kerollmops 3ded98e5fa
Bump the roaring version that fixes a deserialization bug 2020-09-10 22:37:51 +02:00
Kerollmops d5e5baa20f
Bump the oxidized-mtbl dependency 2020-09-10 13:29:12 +02:00
Kerollmops 0fb086f241
Use the crates.io roaring library 2020-09-08 15:16:04 +02:00
Clément Renault bb1ab428db
Use another function to define the proximity 2020-09-06 17:55:07 +02:00
Clément Renault f928b91e9d
Specify the exact rev for the near-proximity dep 2020-09-06 17:21:38 +02:00
Clément Renault 1c504471d3
Introduce the plane-sweep algorithm 2020-09-05 18:25:27 +02:00
Clément Renault dc88a86259
Store the word positions under the documents 2020-09-05 18:03:06 +02:00
Kerollmops 580ed1119a
Make the engine return CSV string records as documents and headers 2020-08-31 19:02:00 +02:00
Clément Renault bad0663138
Come back to the old tokenizer 2020-08-31 13:34:38 +02:00
Clément Renault 3fe497e129
Improve the Mtbl heed codec to only encode MTBL databases 2020-08-29 11:20:39 +02:00
Clément Renault d19f394630
Make the indexer support gzipped CSV as input 2020-08-21 18:10:24 +02:00
Clément Renault ff479c865d
Replace pipe by ringtail to improve stdin read performance 2020-08-21 17:45:52 +02:00
Clément Renault 8806fcd545
Introduce a better query and document lexer 2020-08-16 14:36:54 +02:00
Clément Renault 1e358e3ae8
Introduce the AstarBagIter that iterates through best paths 2020-08-15 16:24:06 +02:00
Clément Renault fae694a102
Put the documents into an MTBL database 2020-08-07 12:14:40 +02:00
Clément Renault 405a71d3a4
Accept csv from stdin 2020-08-06 13:38:21 +02:00
Clément Renault 6508d497ce
Replace the regex highlighting by a simple algorithm 2020-08-05 13:52:27 +02:00
Clément Renault bd4b18541c
Introduce a new indexer which uses an MTBL sorter 2020-08-04 15:44:37 +02:00
Kerollmops 085c376655
Use the regex crate to highlight "hello" 2020-07-14 11:28:40 +02:00
Kerollmops 12358476da
Use the log crate instead of stderr 2020-07-12 10:55:09 +02:00
Kerollmops 2c62eeea3c
Rename the project milli 2020-07-12 00:16:41 +02:00
Kerollmops f6eae91c7d
Pretty print the new dashboard numbers 2020-07-11 14:17:37 +02:00
Kerollmops 11c7fef80a
Implement a memory dumper
It moves the in memory HashMaps used when indexing to a disk based MTBL file
2020-07-07 16:48:49 +02:00
Kerollmops 7178b6c2c4
First basic version using MTBL again 2020-07-07 11:32:33 +02:00
Kerollmops 2a3b03138b
Use heed 0.8.1 with the RwIter append method 2020-07-05 19:50:28 +02:00
Kerollmops 46ced5c828
Introduce the RwIter append heed API 2020-07-04 12:34:10 +02:00
Kerollmops 2ae3f40971
Make the indexer ignore certain words
This is a preparation for making the indexing fully parallel by making the
indexer only aware of certain words for each thread, to avoid postings list
conflicts for each word
2020-07-01 17:49:46 +02:00
Kerollmops f98b615bf3
Replace the LRU by an Arc cache 2020-06-29 20:48:57 +02:00
Kerollmops 07abebfc46
Introduce a (too big) LRU cache 2020-06-29 18:15:03 +02:00
Kerollmops 5f0088594b
Index by writing directly into LMDB 2020-06-29 13:54:47 +02:00
Kerollmops d6705d5529
Introduce the criterion dependency to bench the engine 2020-06-19 18:32:25 +02:00
Kerollmops 55a8941922
Optimize things 2020-06-19 17:48:17 +02:00
Kerollmops a8cda248b4
Introduce a customized A* algorithm.
This custom algo lazily computes the intersections between words, to avoid too many set operations and database reads
2020-06-14 12:51:57 +02:00
Kerollmops 0a83a86e65
Fix multiple bugs 2020-06-11 11:55:03 +02:00
Kerollmops 13977d9338
squash-me 2020-06-09 23:06:59 +02:00
Kerollmops dfdaceb410
Introduce a first basic working positions-based engine 2020-06-05 20:13:19 +02:00
Kerollmops 3a23dc242e
More efficiently merge MTBLs, more than two at a time 2020-06-04 16:17:24 +02:00
Kerollmops dff68a339a
Use OnceCell to cache levenshtein builders 2020-05-31 19:27:11 +02:00
Kerollmops a26553c90a
Reintroduce a simple HTTP server 2020-05-31 17:48:13 +02:00
Kerollmops ba9527abc0
Support typos with a levenshtein automata 2020-05-31 17:01:11 +02:00
Kerollmops 6c726df9b9
Support multiple space-separated words 2020-05-31 16:09:34 +02:00
Kerollmops 24587148fd
Introduce MTBL parallel merging before LMDB writing 2020-05-31 14:22:57 +02:00
Kerollmops 3a998cf39c
Far better usage of rayon to fold indexed data 2020-05-31 14:22:57 +02:00
Kerollmops 1237306ca8
Introduce a thread that writes to heed 2020-05-31 14:22:57 +02:00
Kerollmops a81f201fad
Introduce the use of RocksDB instead of sled (RAM) 2020-05-31 14:22:06 +02:00
Kerollmops 91ba938953
Initial commit 2020-05-31 14:22:06 +02:00