llvm-project/clang-tools-extra
Sam McCall a4962cce49 [clangd] Fix unicode handling, using UTF-16 where LSP requires it.
Summary:
The Language Server Protocol unfortunately mandates that locations in files
be represented by line/column pairs, where the "column" is actually an index
into the UTF-16-encoded text of the line.
(This is because VSCode is written in JavaScript, which is UTF-16-native).

Internally clangd treats source files as UTF-8, the One True Encoding, and
generally deals with byte offsets (though there are exceptions).

Before this patch, conversions between offsets and LSP Position pretended
that Position.character was UTF-8 bytes, which is only true for ASCII lines.
Now we examine the text to convert correctly (but don't actually need to
transcode it, due to some nice details of the encodings).
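
To illustrate why no transcoding is needed, here is a minimal sketch. It is not
the code from this patch: the names utf16Length and utf16ColumnToByteOffset are
hypothetical, and the input is assumed to be well-formed UTF-8. The lead byte
of each UTF-8 sequence already tells us how many UTF-16 code units the
character needs: a 4-byte sequence encodes a code point outside the Basic
Multilingual Plane, which UTF-16 stores as a surrogate pair (2 units), and
every other sequence needs 1 unit.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not clangd's API: the number of UTF-16 code units
// needed to represent U8. Assumes U8 is well-formed UTF-8.
std::size_t utf16Length(const std::string &U8) {
  std::size_t Units = 0;
  for (unsigned char C : U8) {
    if ((C & 0xC0) == 0x80)
      continue; // Continuation byte: accounted for by its lead byte.
    // Lead byte 11110xxx starts a 4-byte sequence, i.e. a surrogate pair.
    Units += (C & 0xF8) == 0xF0 ? 2 : 1;
  }
  return Units;
}

// Hypothetical helper: the byte offset within Line that corresponds to an
// LSP-style column counted in UTF-16 code units. A column that falls inside
// a surrogate pair is rounded up to the end of that character, and a column
// past the end of the line yields Line.size().
std::size_t utf16ColumnToByteOffset(const std::string &Line,
                                    std::size_t Column) {
  std::size_t Units = 0;
  for (std::size_t I = 0; I < Line.size(); ++I) {
    unsigned char C = static_cast<unsigned char>(Line[I]);
    if ((C & 0xC0) == 0x80)
      continue; // Never stop in the middle of a UTF-8 sequence.
    if (Units >= Column)
      return I;
    Units += (C & 0xF8) == 0xF0 ? 2 : 1;
  }
  return Line.size();
}
```

On a pure-ASCII line both functions reduce to the identity, which is why the
old behaviour only went wrong on lines containing non-ASCII text.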

The updated functions in SourceCode are the blessed way to interact with
the Position.character field, and anything else is likely to be wrong.
So I also updated the other accesses:
 - CodeComplete needs a "clang-style" line/column, with column in UTF-8 bytes.
   This is now converted via Position -> offset -> clang line/column
   (a new function is added to SourceCode.h for the second conversion).
 - getBeginningOfIdentifier skipped backwards in UTF-16 space, which will
   behave badly when it splits a surrogate pair. Skipping backwards in UTF-8
   coordinates gives the lexer a fighting chance of getting this right.
   While here, I clarified(?) the logic comments, fixed a bug with identifiers
   containing digits, simplified the signature slightly and added a test.
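
The second half of the Position -> offset -> clang line/column chain could look
roughly like the sketch below. This is an assumption-laden illustration, not
the function added to SourceCode.h: the name offsetToLineColumn and its
signature are hypothetical. It relies only on the fact that clang reports
1-based lines and 1-based columns, with the column counted in bytes.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper, not the function from this patch: converts a byte
// offset into Code to a clang-style (line, column) pair. Both values are
// 1-based and the column counts UTF-8 bytes, not characters.
std::pair<std::size_t, std::size_t>
offsetToLineColumn(const std::string &Code, std::size_t Offset) {
  if (Offset > Code.size())
    Offset = Code.size(); // Clamp out-of-range offsets to end of file.
  std::size_t Line = 1;
  std::size_t LineStart = 0; // Byte offset where the current line begins.
  for (std::size_t I = 0; I < Offset; ++I) {
    if (Code[I] == '\n') {
      ++Line;
      LineStart = I + 1;
    }
  }
  return {Line, Offset - LineStart + 1};
}
```

Because the column is a plain byte count from the start of the line, no
knowledge of the encoding is needed at this step; all UTF-16 awareness stays
in the Position -> offset conversion.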

This seems likely to cause problems with editors that have the same bug, and
treat the protocol as if columns are UTF-8 bytes. But we can find and fix those.

Reviewers: hokein

Subscribers: klimek, ilya-biryukov, ioeric, MaskRay, jkorous, cfe-commits

Differential Revision: https://reviews.llvm.org/D46035

llvm-svn: 331029
2018-04-27 11:59:28 +00:00
change-namespace [change-namespace] Don't match a function call/ref multiple times. 2018-03-15 14:45:02 +00:00
clang-apply-replacements [clang-apply-replacements] Make clang-apply-replacements installable 2018-04-21 15:01:33 +00:00
clang-doc [clang-doc] Removing -Wunused-variable warning 2018-03-26 22:37:31 +00:00
clang-move Revert "[Tooling] [1/1] Refactor FrontendActionFactory::create() to return std::unique_ptr<>" 2018-02-27 15:54:41 +00:00
clang-query Fix for LLVM r326109 2018-02-26 20:21:30 +00:00
clang-reorder-fields [CMake] Use PRIVATE in target_link_libraries for executables 2017-12-05 21:49:56 +00:00
clang-tidy [clang-tidy] Improve bugprone-unused-return-value check 2018-04-24 21:25:16 +00:00
clang-tidy-vs [clang-tidy] Remove google-runtime-member-string-references 2018-04-05 14:51:01 +00:00
clangd [clangd] Fix unicode handling, using UTF-16 where LSP requires it. 2018-04-27 11:59:28 +00:00
docs [clang-tidy] Improve bugprone-unused-return-value check 2018-04-24 21:25:16 +00:00
include-fixer Improve completion experience for headers 2018-04-09 13:31:44 +00:00
modularize clang-tidy, modularize: return non-zero exit code on errors 2018-03-22 14:18:20 +00:00
pp-trace Revert "[Tooling] [1/1] Refactor FrontendActionFactory::create() to return std::unique_ptr<>" 2018-02-27 15:54:41 +00:00
test [clangd] Fix unicode handling, using UTF-16 where LSP requires it. 2018-04-27 11:59:28 +00:00
tool-template [CMake] Use PRIVATE in target_link_libraries for executables 2017-12-05 21:49:56 +00:00
unittests [clangd] Fix unicode handling, using UTF-16 where LSP requires it. 2018-04-27 11:59:28 +00:00
.arcconfig [clang-tools-extra] Set up .arcconfig to point to new Diffusion CTE repository 2017-11-27 15:58:25 +00:00
.gitignore
CMakeLists.txt [clang-doc] Reland "[clang-doc] Setup clang-doc frontend framework" 2018-03-22 23:34:46 +00:00
CODE_OWNERS.TXT Updating the code owners list. 2015-09-02 20:00:41 +00:00
LICENSE.TXT Rename the clang-tidy safety module to be hicpp, for the High-Integrity C++ coding standard from PRQA. 2017-03-19 17:23:23 +00:00
README.txt

README.txt

//===----------------------------------------------------------------------===//
// Clang Tools repository
//===----------------------------------------------------------------------===//

Welcome to the repository of extra Clang Tools.  This repository holds tools
that are developed as part of the LLVM compiler infrastructure project and the
Clang frontend.  These tools are kept in a separate "extra" repository to
allow lighter weight checkouts of the core Clang codebase.

This repository is only intended to be checked out inside of a full LLVM+Clang
tree, and in the 'tools/extra' subdirectory of the Clang checkout.

All discussion regarding Clang, Clang-based tools, and code in this repository
should be held using the standard Clang mailing lists:
  http://lists.llvm.org/mailman/listinfo/cfe-dev

Code review for this tree should take place on the standard Clang patch and
commit lists:
  http://lists.llvm.org/mailman/listinfo/cfe-commits

If you find a bug in these tools, please file it in the LLVM bug tracker:
  http://llvm.org/bugs/