Commit Graph

8 Commits

Author SHA1 Message Date
Nico Weber 8d05eb8556 llvm-undname: Fix assert-on->4GiB-string-literal, found by oss-fuzz
llvm-svn: 359109
2019-04-24 16:09:38 +00:00
Nico Weber ce67a41741 llvm-undname: Fix hex escapes in wchar_t, char16_t, char32_t strings
llvm-undname used to put '\x' in front of every pair of nibbles, but
u"\xD7\xFF" produces a string with 6 bytes: \xD7 \0 \xFF \0 (and \0\0). Correct
for a single character (plus terminating \0) is u\xD7FF instead.
Now, wchar_t, char16_t, and char32_t strings roundtrip from source to
clang-cl (and cl.exe) and then llvm-undname.

(...at least as long as it's not a string like L"\xD7FF" L"foo" which
gets demangled as L"\xD7FFfoo", where the compiler then considers the
"f" as part of the hex escape. That seems ok.)

Also add a comment saying that the "almost-valid" char32_t string I
added in my last commit is actually produced by compilers.

llvm-svn: 358857
2019-04-21 17:19:27 +00:00
Nico Weber 8fc9902bbb llvm-undname: Fix stack overflow on almost-valid
If a unsigned with all 4 bytes non-0 was passed to outputHex(), there
were two off-by-ones in it:

- Both MaxPos and Pos left space for the final \0, which left the buffer
  one byte to small. Set MaxPos to 16 instead of 15 to fix.

- The `assert(Pos >= 0);` was after a `Pos--`, move it up one line.

Since valid Unicode codepoints are <= 0x10ffff, this could never really
happen in practice.

Found by oss-fuzz.

llvm-svn: 358856
2019-04-21 16:58:25 +00:00
Nico Weber 8eeaf5178d llvm-undname: Improve string literal demangling with embedded \0 chars
- Don't assert when a string looks like a u32 string to the heuristic
  but doesn't have a length that's 0 mod 4.  Instead, classify those
  as u16 with embedded \0 chars. Found by oss-fuzz.
- Print embedded nul bytes as \0 instead of \x00.

llvm-svn: 358835
2019-04-20 23:59:06 +00:00
Nico Weber dfc08baceb [MS demangler] Use a slightly shorter unmangling for mangled strings.
Before: const wchar_t * {L"%"}
Now: L"%"

See also PR39593.
Differential Revision: https://reviews.llvm.org/D54294

llvm-svn: 346544
2018-11-09 19:28:50 +00:00
Zachary Turner 32a8a2028c [MS Demangler] Fix several crashes and demangling bugs.
These bugs were found by writing a Python script which spidered
the entire Chromium build directory tree demangling every symbol
in every object file.  At the start, the tool printed:

  Processed 27443 object files.
  2926377/2936108 symbols successfully demangled (99.6686%)
  9731 symbols could not be demangled (0.3314%)
  14589 files crashed while demangling (53.1611%)

After this patch, it prints:

  Processed 27443 object files.
  41295518/41295617 symbols successfully demangled (99.9998%)
  99 symbols could not be demangled (0.0002%)
  0 files crashed while demangling (0.0000%)

The issues fixed in this patch are:

  * Ignore empty parameter packs.  Previously we would encounter
    a mangling for an empty parameter pack and add a null node
    to the AST.  Since we don't print these anyway, we now just
    don't add anything to the AST and ignore it entirely.  This
    fixes some of the crashes.

  * Account for "incorrect" string literal demanglings.  Apparently
    an older version of clang would not truncate mangled string
    literals to 32 bytes of encoded character data.  The demangling
    code however would allocate a 32 byte buffer thinking that it
    would not encounter more than this, and overrun the buffer.
    We now demangle up to 128 bytes of data, since the buggy
    clang would encode up to 32 *characters* of data.

  * Extended support for demangling init-fini stubs.  If you had
    something like
      struct Foo {
        static vector<string> S;
      };
    this would generate a dynamic atexit initializer *for the
    variable*.  We didn't handle this, but now we print something
    nice.  This is actually an improvement over undname, which will
    fail to demangle this at all.

  * Fixed one case of static this adjustment.  We weren't handling
    several thunk codes so we didn't recognize the mangling.  These
    are now handled.

  * Fixed a back-referencing problem.  Member pointer templates
    should have their components considered for back-referencing

The remaining 99 symbols which can't be demangled are all symbols
which are compiler-generated and undname can't demangle either.

llvm-svn: 341000
2018-08-29 23:56:09 +00:00
Zachary Turner 3461bfaa9c [MS Demangler] Rework the way operators are demangled.
Previously, some of the code for actually parsing mangled
operator names was more like formatting code in nature,
and was interspersed with the demangling code which builds
the AST.  This means that by the time we got to the printing
code, we had lost all information about what type of operator
we had, and all we were left with was a string that we just
had to print.  However, not all operators are actually even
operators.  it's basically just a catch-all mangling for
"special names", and for some of the other types it helps
to know when we're actually doing the printing what it is.

This patch changes the way things work by introducing an
OperatorInfo structure and corresponding enumeration.  When
we demangle we store the enumeration value and demangled
components separately.  This gives more flexibility during
printing.

In doing so, some demanglings of special names which we didn't
previously support come out of this for free, so we now demangle
those.

A few are more complex and are better left for a followup patch
though.

An exhaustive test of every possible operator code is included,
with the ones that don't yet work commented out.

llvm-svn: 340046
2018-08-17 16:14:05 +00:00
Zachary Turner 970fdc3236 [MS Demangler] Demangle string literals.
When demangling string literals, Microsoft's undname
simply prints 'string'.  This patch implements string
literal demangling while doing a bit better than this
by decoding as much of the string as possible and
trying to faithfully reproduce the original string
literal definition.

This is a bit tricky because the different character
types char, char16_t, and char32_t are not uniquely
identified by the mangling, so we have to use a
heuristic to try to guess the character type.  But
it works pretty well, and many tests are added to
illustrate the behavior.

Differential Revision: https://reviews.llvm.org/D50806

llvm-svn: 339892
2018-08-16 16:17:36 +00:00