[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
//===-- SerializationTests.cpp - Binary and YAML serialization unit tests -===//
|
|
|
|
//
|
2019-01-19 16:50:56 +08:00
|
|
|
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
|
|
|
|
// See https://llvm.org/LICENSE.txt for license information.
|
|
|
|
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
//
|
|
|
|
//===----------------------------------------------------------------------===//
|
|
|
|
|
2019-07-04 17:52:04 +08:00
|
|
|
#include "Headers.h"
|
2018-09-21 21:04:57 +08:00
|
|
|
#include "index/Index.h"
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
#include "index/Serialization.h"
|
2019-07-04 17:51:53 +08:00
|
|
|
#include "clang/Tooling/CompilationDatabase.h"
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
#include "llvm/Support/ScopedPrinter.h"
|
|
|
|
#include "gmock/gmock.h"
|
|
|
|
#include "gtest/gtest.h"
|
|
|
|
|
2019-05-06 18:08:47 +08:00
|
|
|
using ::testing::_;
|
|
|
|
using ::testing::AllOf;
|
|
|
|
using ::testing::Pair;
|
|
|
|
using ::testing::UnorderedElementsAre;
|
|
|
|
using ::testing::UnorderedElementsAreArray;
|
2018-10-20 23:30:37 +08:00
|
|
|
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
namespace clang {
|
|
|
|
namespace clangd {
|
|
|
|
namespace {
|
|
|
|
|
[clangd] Merge binary + YAML serialization behind a (mostly) common interface.
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
2018-09-26 02:06:43 +08:00
|
|
|
const char *YAML = R"(
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
---
|
2018-10-04 22:09:55 +08:00
|
|
|
!Symbol
|
2018-11-16 18:58:40 +08:00
|
|
|
ID: 057557CEBF6E6B2D
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
Name: 'Foo1'
|
|
|
|
Scope: 'clang::'
|
|
|
|
SymInfo:
|
|
|
|
Kind: Function
|
|
|
|
Lang: Cpp
|
|
|
|
CanonicalDeclaration:
|
|
|
|
FileURI: file:///path/foo.h
|
|
|
|
Start:
|
|
|
|
Line: 1
|
|
|
|
Column: 0
|
|
|
|
End:
|
|
|
|
Line: 1
|
|
|
|
Column: 1
|
2019-04-30 01:25:58 +08:00
|
|
|
Origin: 128
|
|
|
|
Flags: 129
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
Documentation: 'Foo doc'
|
|
|
|
ReturnType: 'int'
|
|
|
|
IncludeHeaders:
|
|
|
|
- Header: 'include1'
|
|
|
|
References: 7
|
|
|
|
- Header: 'include2'
|
|
|
|
References: 3
|
|
|
|
...
|
|
|
|
---
|
2018-10-04 22:09:55 +08:00
|
|
|
!Symbol
|
2018-11-16 18:58:40 +08:00
|
|
|
ID: 057557CEBF6E6B2E
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
Name: 'Foo2'
|
|
|
|
Scope: 'clang::'
|
|
|
|
SymInfo:
|
|
|
|
Kind: Function
|
|
|
|
Lang: Cpp
|
|
|
|
CanonicalDeclaration:
|
|
|
|
FileURI: file:///path/bar.h
|
|
|
|
Start:
|
|
|
|
Line: 1
|
|
|
|
Column: 0
|
|
|
|
End:
|
|
|
|
Line: 1
|
|
|
|
Column: 1
|
2018-09-07 02:52:26 +08:00
|
|
|
Flags: 2
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
Signature: '-sig'
|
|
|
|
CompletionSnippetSuffix: '-snippet'
|
|
|
|
...
|
2018-10-04 22:09:55 +08:00
|
|
|
!Refs
|
2018-11-16 18:58:40 +08:00
|
|
|
ID: 057557CEBF6E6B2D
|
2018-10-04 22:09:55 +08:00
|
|
|
References:
|
|
|
|
- Kind: 4
|
|
|
|
Location:
|
|
|
|
FileURI: file:///path/foo.cc
|
|
|
|
Start:
|
|
|
|
Line: 5
|
|
|
|
Column: 3
|
|
|
|
End:
|
|
|
|
Line: 5
|
|
|
|
Column: 8
|
2019-06-03 13:07:52 +08:00
|
|
|
...
|
|
|
|
--- !Relations
|
|
|
|
Subject:
|
|
|
|
ID: 6481EE7AF2841756
|
|
|
|
Predicate: 0
|
|
|
|
Object:
|
|
|
|
ID: 6512AEC512EA3A2D
|
|
|
|
...
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
)";
|
|
|
|
|
[clangd] Merge binary + YAML serialization behind a (mostly) common interface.
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
2018-09-26 02:06:43 +08:00
|
|
|
MATCHER_P(ID, I, "") { return arg.ID == cantFail(SymbolID::fromStr(I)); }
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
MATCHER_P(QName, Name, "") { return (arg.Scope + arg.Name).str() == Name; }
|
|
|
|
MATCHER_P2(IncludeHeaderWithRef, IncludeHeader, References, "") {
|
|
|
|
return (arg.IncludeHeader == IncludeHeader) && (arg.References == References);
|
|
|
|
}
|
|
|
|
|
2019-01-08 23:24:47 +08:00
|
|
|
TEST(SerializationTest, NoCrashOnEmptyYAML) {
|
|
|
|
EXPECT_TRUE(bool(readIndexFile("")));
|
|
|
|
}
|
|
|
|
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
TEST(SerializationTest, YAMLConversions) {
|
[clangd] Merge binary + YAML serialization behind a (mostly) common interface.
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
2018-09-26 02:06:43 +08:00
|
|
|
auto In = readIndexFile(YAML);
|
|
|
|
EXPECT_TRUE(bool(In)) << In.takeError();
|
|
|
|
|
|
|
|
auto ParsedYAML = readIndexFile(YAML);
|
|
|
|
ASSERT_TRUE(bool(ParsedYAML)) << ParsedYAML.takeError();
|
|
|
|
ASSERT_TRUE(bool(ParsedYAML->Symbols));
|
2019-01-03 21:28:05 +08:00
|
|
|
EXPECT_THAT(
|
|
|
|
*ParsedYAML->Symbols,
|
|
|
|
UnorderedElementsAre(ID("057557CEBF6E6B2D"), ID("057557CEBF6E6B2E")));
|
[clangd] Merge binary + YAML serialization behind a (mostly) common interface.
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
2018-09-26 02:06:43 +08:00
|
|
|
|
|
|
|
auto Sym1 = *ParsedYAML->Symbols->find(
|
2018-11-16 18:58:40 +08:00
|
|
|
cantFail(SymbolID::fromStr("057557CEBF6E6B2D")));
|
[clangd] Merge binary + YAML serialization behind a (mostly) common interface.
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
2018-09-26 02:06:43 +08:00
|
|
|
auto Sym2 = *ParsedYAML->Symbols->find(
|
2018-11-16 18:58:40 +08:00
|
|
|
cantFail(SymbolID::fromStr("057557CEBF6E6B2E")));
|
[clangd] Merge binary + YAML serialization behind a (mostly) common interface.
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
2018-09-26 02:06:43 +08:00
|
|
|
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
EXPECT_THAT(Sym1, QName("clang::Foo1"));
|
|
|
|
EXPECT_EQ(Sym1.Signature, "");
|
|
|
|
EXPECT_EQ(Sym1.Documentation, "Foo doc");
|
|
|
|
EXPECT_EQ(Sym1.ReturnType, "int");
|
2018-11-14 19:55:45 +08:00
|
|
|
EXPECT_EQ(StringRef(Sym1.CanonicalDeclaration.FileURI), "file:///path/foo.h");
|
2019-04-30 01:25:58 +08:00
|
|
|
EXPECT_EQ(Sym1.Origin, static_cast<SymbolOrigin>(1 << 7));
|
|
|
|
EXPECT_EQ(static_cast<uint8_t>(Sym1.Flags), 129);
|
2018-09-07 02:52:26 +08:00
|
|
|
EXPECT_TRUE(Sym1.Flags & Symbol::IndexedForCodeCompletion);
|
|
|
|
EXPECT_FALSE(Sym1.Flags & Symbol::Deprecated);
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
EXPECT_THAT(Sym1.IncludeHeaders,
|
|
|
|
UnorderedElementsAre(IncludeHeaderWithRef("include1", 7u),
|
|
|
|
IncludeHeaderWithRef("include2", 3u)));
|
|
|
|
|
|
|
|
EXPECT_THAT(Sym2, QName("clang::Foo2"));
|
|
|
|
EXPECT_EQ(Sym2.Signature, "-sig");
|
|
|
|
EXPECT_EQ(Sym2.ReturnType, "");
|
2018-11-14 19:55:45 +08:00
|
|
|
EXPECT_EQ(llvm::StringRef(Sym2.CanonicalDeclaration.FileURI),
|
|
|
|
"file:///path/bar.h");
|
2018-09-07 02:52:26 +08:00
|
|
|
EXPECT_FALSE(Sym2.Flags & Symbol::IndexedForCodeCompletion);
|
|
|
|
EXPECT_TRUE(Sym2.Flags & Symbol::Deprecated);
|
2018-10-04 22:09:55 +08:00
|
|
|
|
|
|
|
ASSERT_TRUE(bool(ParsedYAML->Refs));
|
2018-10-24 14:58:42 +08:00
|
|
|
EXPECT_THAT(
|
|
|
|
*ParsedYAML->Refs,
|
2019-01-03 21:28:05 +08:00
|
|
|
UnorderedElementsAre(Pair(cantFail(SymbolID::fromStr("057557CEBF6E6B2D")),
|
2019-05-06 18:08:47 +08:00
|
|
|
::testing::SizeIs(1))));
|
2018-10-04 22:09:55 +08:00
|
|
|
auto Ref1 = ParsedYAML->Refs->begin()->second.front();
|
|
|
|
EXPECT_EQ(Ref1.Kind, RefKind::Reference);
|
2018-11-14 19:55:45 +08:00
|
|
|
EXPECT_EQ(StringRef(Ref1.Location.FileURI), "file:///path/foo.cc");
|
2019-06-03 13:07:52 +08:00
|
|
|
|
|
|
|
SymbolID Base = cantFail(SymbolID::fromStr("6481EE7AF2841756"));
|
|
|
|
SymbolID Derived = cantFail(SymbolID::fromStr("6512AEC512EA3A2D"));
|
|
|
|
ASSERT_TRUE(bool(ParsedYAML->Relations));
|
|
|
|
EXPECT_THAT(*ParsedYAML->Relations,
|
|
|
|
UnorderedElementsAre(
|
|
|
|
Relation{Base, index::SymbolRole::RelationBaseOf, Derived}));
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
}
|
|
|
|
|
|
|
|
std::vector<std::string> YAMLFromSymbols(const SymbolSlab &Slab) {
|
|
|
|
std::vector<std::string> Result;
|
|
|
|
for (const auto &Sym : Slab)
|
[clangd] Merge binary + YAML serialization behind a (mostly) common interface.
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
2018-09-26 02:06:43 +08:00
|
|
|
Result.push_back(toYAML(Sym));
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
return Result;
|
|
|
|
}
|
2018-10-04 22:09:55 +08:00
|
|
|
std::vector<std::string> YAMLFromRefs(const RefSlab &Slab) {
|
|
|
|
std::vector<std::string> Result;
|
2019-06-03 13:07:52 +08:00
|
|
|
for (const auto &Refs : Slab)
|
|
|
|
Result.push_back(toYAML(Refs));
|
|
|
|
return Result;
|
|
|
|
}
|
|
|
|
|
|
|
|
std::vector<std::string> YAMLFromRelations(const RelationSlab &Slab) {
|
|
|
|
std::vector<std::string> Result;
|
|
|
|
for (const auto &Rel : Slab)
|
|
|
|
Result.push_back(toYAML(Rel));
|
2018-10-04 22:09:55 +08:00
|
|
|
return Result;
|
|
|
|
}
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
|
|
|
|
TEST(SerializationTest, BinaryConversions) {
|
[clangd] Merge binary + YAML serialization behind a (mostly) common interface.
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
2018-09-26 02:06:43 +08:00
|
|
|
auto In = readIndexFile(YAML);
|
|
|
|
EXPECT_TRUE(bool(In)) << In.takeError();
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
|
|
|
|
// Write to binary format, and parse again.
|
2018-10-04 22:09:55 +08:00
|
|
|
IndexFileOut Out(*In);
|
[clangd] Merge binary + YAML serialization behind a (mostly) common interface.
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
2018-09-26 02:06:43 +08:00
|
|
|
Out.Format = IndexFileFormat::RIFF;
|
2019-01-07 23:45:19 +08:00
|
|
|
std::string Serialized = llvm::to_string(Out);
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
|
[clangd] Merge binary + YAML serialization behind a (mostly) common interface.
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
2018-09-26 02:06:43 +08:00
|
|
|
auto In2 = readIndexFile(Serialized);
|
|
|
|
ASSERT_TRUE(bool(In2)) << In.takeError();
|
2018-10-04 22:09:55 +08:00
|
|
|
ASSERT_TRUE(In2->Symbols);
|
|
|
|
ASSERT_TRUE(In2->Refs);
|
2019-06-03 13:07:52 +08:00
|
|
|
ASSERT_TRUE(In2->Relations);
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
|
|
|
|
// Assert the YAML serializations match, for nice comparisons and diffs.
|
[clangd] Merge binary + YAML serialization behind a (mostly) common interface.
Summary:
Interface is in one file, implementation in two as they have little in common.
A couple of ad-hoc YAML functions left exposed:
- symbol -> YAML I expect to keep for tools like dexp
- YAML -> symbol is used for the MR-style indexer, I think we can eliminate
this (merge-on-the-fly, else use a different serialization)
Reviewers: kbobyrev
Subscribers: mgorny, ilya-biryukov, ioeric, MaskRay, jkorous, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D52453
llvm-svn: 342999
2018-09-26 02:06:43 +08:00
|
|
|
EXPECT_THAT(YAMLFromSymbols(*In2->Symbols),
|
|
|
|
UnorderedElementsAreArray(YAMLFromSymbols(*In->Symbols)));
|
2018-10-04 22:09:55 +08:00
|
|
|
EXPECT_THAT(YAMLFromRefs(*In2->Refs),
|
|
|
|
UnorderedElementsAreArray(YAMLFromRefs(*In->Refs)));
|
2019-06-03 13:07:52 +08:00
|
|
|
EXPECT_THAT(YAMLFromRelations(*In2->Relations),
|
|
|
|
UnorderedElementsAreArray(YAMLFromRelations(*In->Relations)));
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
}
|
|
|
|
|
2018-11-28 00:08:53 +08:00
|
|
|
TEST(SerializationTest, SrcsTest) {
|
2018-11-20 02:06:29 +08:00
|
|
|
auto In = readIndexFile(YAML);
|
|
|
|
EXPECT_TRUE(bool(In)) << In.takeError();
|
|
|
|
|
2018-11-28 00:08:53 +08:00
|
|
|
std::string TestContent("TestContent");
|
|
|
|
IncludeGraphNode IGN;
|
[clangd] Use xxhash instead of SHA1 for background index file digests.
Summary:
Currently SHA1 is about 10% of our CPU, this patch reduces it to ~1%.
xxhash is a well-defined (stable) non-cryptographic hash optimized for
fast checksums (like crc32).
Collisions shouldn't be a problem, despite the reduced length:
- for actual file content (used to invalidate bg index shards), there
are only two versions that can collide (new shard and old shard).
- for file paths in bg index shard filenames, we would need 2^32 files
with the same filename to expect a collision. Imperfect hashing may
reduce this a bit but it's well beyond what's plausible.
This will invalidate shards on disk (as usual; I bumped the version),
but this time the filenames are changing so the old files will stick
around :-( So this is more expensive than the usual bump, but would be
good to land before the v9 branch when everyone will start using bg index.
Reviewers: kadircet
Subscribers: ilya-biryukov, MaskRay, jkorous, arphaman, llvm-commits
Tags: #llvm
Differential Revision: https://reviews.llvm.org/D64306
llvm-svn: 365311
2019-07-08 19:33:17 +08:00
|
|
|
IGN.Digest = digest(TestContent);
|
2018-11-28 00:08:53 +08:00
|
|
|
IGN.DirectIncludes = {"inc1", "inc2"};
|
|
|
|
IGN.URI = "URI";
|
2019-07-04 17:52:04 +08:00
|
|
|
IGN.Flags |= IncludeGraphNode::SourceFlag::IsTU;
|
|
|
|
IGN.Flags |= IncludeGraphNode::SourceFlag::HadErrors;
|
2018-11-28 00:08:53 +08:00
|
|
|
IncludeGraph Sources;
|
|
|
|
Sources[IGN.URI] = IGN;
|
2018-11-20 02:06:29 +08:00
|
|
|
// Write to binary format, and parse again.
|
|
|
|
IndexFileOut Out(*In);
|
|
|
|
Out.Format = IndexFileFormat::RIFF;
|
2018-11-28 00:08:53 +08:00
|
|
|
Out.Sources = &Sources;
|
|
|
|
{
|
2019-01-07 23:45:19 +08:00
|
|
|
std::string Serialized = llvm::to_string(Out);
|
2018-11-28 00:08:53 +08:00
|
|
|
|
|
|
|
auto In = readIndexFile(Serialized);
|
|
|
|
ASSERT_TRUE(bool(In)) << In.takeError();
|
|
|
|
ASSERT_TRUE(In->Symbols);
|
|
|
|
ASSERT_TRUE(In->Refs);
|
|
|
|
ASSERT_TRUE(In->Sources);
|
|
|
|
ASSERT_TRUE(In->Sources->count(IGN.URI));
|
|
|
|
// Assert the YAML serializations match, for nice comparisons and diffs.
|
|
|
|
EXPECT_THAT(YAMLFromSymbols(*In->Symbols),
|
|
|
|
UnorderedElementsAreArray(YAMLFromSymbols(*In->Symbols)));
|
|
|
|
EXPECT_THAT(YAMLFromRefs(*In->Refs),
|
|
|
|
UnorderedElementsAreArray(YAMLFromRefs(*In->Refs)));
|
|
|
|
auto IGNDeserialized = In->Sources->lookup(IGN.URI);
|
|
|
|
EXPECT_EQ(IGNDeserialized.Digest, IGN.Digest);
|
|
|
|
EXPECT_EQ(IGNDeserialized.DirectIncludes, IGN.DirectIncludes);
|
|
|
|
EXPECT_EQ(IGNDeserialized.URI, IGN.URI);
|
2019-07-04 17:52:04 +08:00
|
|
|
EXPECT_EQ(IGNDeserialized.Flags, IGN.Flags);
|
2018-11-28 00:08:53 +08:00
|
|
|
}
|
2018-11-20 02:06:29 +08:00
|
|
|
}
|
|
|
|
|
2019-07-04 17:51:53 +08:00
|
|
|
TEST(SerializationTest, CmdlTest) {
|
|
|
|
auto In = readIndexFile(YAML);
|
|
|
|
EXPECT_TRUE(bool(In)) << In.takeError();
|
|
|
|
|
|
|
|
tooling::CompileCommand Cmd;
|
|
|
|
Cmd.Directory = "testdir";
|
|
|
|
Cmd.CommandLine.push_back("cmd1");
|
|
|
|
Cmd.CommandLine.push_back("cmd2");
|
|
|
|
Cmd.Filename = "ignored";
|
|
|
|
Cmd.Heuristic = "ignored";
|
|
|
|
Cmd.Output = "ignored";
|
|
|
|
|
|
|
|
IndexFileOut Out(*In);
|
|
|
|
Out.Format = IndexFileFormat::RIFF;
|
|
|
|
Out.Cmd = &Cmd;
|
|
|
|
{
|
|
|
|
std::string Serialized = llvm::to_string(Out);
|
|
|
|
|
|
|
|
auto In = readIndexFile(Serialized);
|
|
|
|
ASSERT_TRUE(bool(In)) << In.takeError();
|
|
|
|
ASSERT_TRUE(In->Cmd);
|
|
|
|
|
|
|
|
const tooling::CompileCommand &SerializedCmd = In->Cmd.getValue();
|
|
|
|
EXPECT_EQ(SerializedCmd.CommandLine, Cmd.CommandLine);
|
|
|
|
EXPECT_EQ(SerializedCmd.Directory, Cmd.Directory);
|
|
|
|
EXPECT_NE(SerializedCmd.Filename, Cmd.Filename);
|
|
|
|
EXPECT_NE(SerializedCmd.Heuristic, Cmd.Heuristic);
|
|
|
|
EXPECT_NE(SerializedCmd.Output, Cmd.Output);
|
|
|
|
}
|
|
|
|
}
|
[clangd] Define a compact binary serialization fomat for symbol slab/index.
Summary:
This is intended to replace the current YAML format for general use.
It's ~10x more compact than YAML, and ~40% more compact than gzipped YAML:
llvmidx.riff = 20M, llvmidx.yaml = 272M, llvmidx.yaml.gz = 32M
It's also simpler/faster to read and write.
The format is a RIFF container (chunks of (type, size, data)) with:
- a compressed string table
- simple binary encoding of symbols (with varints for compactness)
It can be extended to include occurrences, Dex posting lists, etc.
There's no rich backwards-compatibility scheme, but a version number is included
so we can detect incompatible files and do ad-hoc back-compat.
Alternatives considered:
- compressed YAML or JSON: bulky and slow to load
- llvm bitstream: confusing model and libraries are hard to use. My attempt
produced slightly larger files, and the code was longer and slower.
- protobuf or similar: would be really nice (esp for back-compat) but the
dependency is a big hassle
- ad-hoc binary format without a container: it seems clear we're going
to add posting lists and occurrences here, and that they will benefit
from sharing a string table. The container makes it easy to debug
these pieces in isolation, and make them optional.
Reviewers: ioeric
Subscribers: mgorny, ilya-biryukov, MaskRay, jkorous, mgrang, arphaman, kadircet, cfe-commits
Differential Revision: https://reviews.llvm.org/D51585
llvm-svn: 341375
2018-09-05 00:16:50 +08:00
|
|
|
} // namespace
|
|
|
|
} // namespace clangd
|
|
|
|
} // namespace clang
|