[Clang] Fix invalid utf-8 detection

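The size of valid code point sequences was computed
incorrectly: getUTF8SequenceSize compared the sequence
length against the remaining input with ">" instead of
"<", so it returned 0 (invalid) for every sequence that
fit in the remaining input. This was not caught earlier
because there were no tests for the valid code point
scenario.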

Differential Revision: https://reviews.llvm.org/D129223
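The inverted comparison can be isolated in a minimal sketch that looks only at the bounds half of the condition, assuming the sequence length has already been taken from the lead byte. old_condition and new_condition are hypothetical names, remaining stands for sourceEnd - source, and the isLegalUTF8 call is deliberately left out.

```c
#include <stddef.h>

/* Hypothetical helper: the bounds test before this commit.
 * len is the sequence length derived from the lead byte and
 * remaining is sourceEnd - source. It is true only when the
 * sequence does NOT fit in the remaining input. */
int old_condition(ptrdiff_t len, ptrdiff_t remaining) {
    return len > remaining;
}

/* Hypothetical helper: the bounds test as committed. It is true
 * when at least one byte remains after the sequence. */
int new_condition(ptrdiff_t len, ptrdiff_t remaining) {
    return len < remaining;
}

/* Example: U+1F600 (4 bytes) followed by "abc" leaves 7 bytes.
 *   old_condition(4, 7) == 0  -> a valid sequence got size 0
 *   new_condition(4, 7) == 1
 * The same sequence truncated to 3 bytes:
 *   old_condition(4, 3) == 1  -> it passed the bounds test
 *   new_condition(4, 3) == 0
 */
```

With len == remaining, that is, a sequence that ends exactly at sourceEnd, both conditions are false.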
Corentin Jabot 2022-07-06 22:16:22 +02:00
parent 39ed08f8d4
commit bf45e27a67
2 changed files with 12 additions and 1 deletion

@@ -25,3 +25,14 @@
 // abcd
 // €abcd
 // expected-warning@-1 {{invalid UTF-8 in comment}}
+//§ § § 😀 你好 ©
+/*§ § § 😀 你好 ©*/
+/*
+§ § § 😀 ©
+*/
+/* § § § 😀 你好 © */

@@ -423,7 +423,7 @@ Boolean isLegalUTF8Sequence(const UTF8 *source, const UTF8 *sourceEnd) {
  */
 unsigned getUTF8SequenceSize(const UTF8 *source, const UTF8 *sourceEnd) {
   int length = trailingBytesForUTF8[*source] + 1;
-  return (length > sourceEnd - source && isLegalUTF8(source, length)) ? length
+  return (length < sourceEnd - source && isLegalUTF8(source, length)) ? length
                                                                       : 0;
 }
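A standalone sketch of what the corrected function computes may help when reading the hunk above. It is not the LLVM implementation: sketch_sequence_length stands in for trailingBytesForUTF8[*source] + 1 and sketch_is_legal for isLegalUTF8. Both are hypothetical and simplified, and the sketch performs no overlong, surrogate, or range checks. Unlike the committed line, the sketch uses <= so that a sequence ending exactly at the end of the buffer is accepted.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for trailingBytesForUTF8[b] + 1:
 * the length of the sequence introduced by lead byte b, or 0 if b
 * cannot start a sequence. */
static int sketch_sequence_length(unsigned char b) {
    if (b < 0x80) return 1;   /* 0xxxxxxx: ASCII */
    if (b < 0xC0) return 0;   /* 10xxxxxx: continuation byte */
    if (b < 0xE0) return 2;   /* 110xxxxx */
    if (b < 0xF0) return 3;   /* 1110xxxx */
    if (b < 0xF8) return 4;   /* 11110xxx */
    return 0;                 /* 0xF8..0xFF never start a sequence */
}

/* Hypothetical, simplified stand-in for isLegalUTF8: only checks that
 * every trailing byte is a continuation byte (10xxxxxx). */
static int sketch_is_legal(const unsigned char *s, int length) {
    for (int i = 1; i < length; ++i) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
    }
    return 1;
}

/* Size of the first sequence in [source, end), or 0 if it is invalid
 * or does not fit. The bounds test runs first so that sketch_is_legal
 * never reads past end. */
unsigned sketch_utf8_sequence_size(const unsigned char *source,
                                   const unsigned char *end) {
    if (source >= end)
        return 0;
    int length = sketch_sequence_length(*source);
    if (length == 0)
        return 0;
    return (length <= end - source && sketch_is_legal(source, length))
               ? (unsigned)length
               : 0;
}
```

For example, the bytes of 😀 from the new test lines (F0 9F 98 80) give 4 when all four bytes are present and 0 when the buffer ends after three. The same ordering as the committed line (bounds test before the legality check) is what keeps the legality check inside the buffer.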