Commit Graph

133 Commits

Author SHA1 Message Date
David Majnemer 8f6b04cb57 llvm-objdump: Handle BSS sections larger than the object file
The size of the uninitialized sections, like BSS, can exceed the size of
the object file.

Do not attempt to grab the contents of such sections.

llvm-svn: 212953
2014-07-14 16:20:14 +00:00
Rafael Espindola cbc5ac7a7e Move CFG building code to a new lib/MC/MCAnalysis library.
The new library is 150KB on a Release+Asserts build, so it is quiet a bit of
code that regular users of MC don't need to link with now.

llvm-svn: 212209
2014-07-02 19:49:34 +00:00
Alp Toker e69170a110 Revert "Introduce a string_ostream string builder facilty"
Temporarily back out commits r211749, r211752 and r211754.

llvm-svn: 211814
2014-06-26 22:52:05 +00:00
Alp Toker 614717388c Introduce a string_ostream string builder facilty
string_ostream is a safe and efficient string builder that combines opaque
stack storage with a built-in ostream interface.

small_string_ostream<bytes> additionally permits an explicit stack storage size
other than the default 128 bytes to be provided. Beyond that, storage is
transferred to the heap.

This convenient class can be used in most places an
std::string+raw_string_ostream pair or SmallString<>+raw_svector_ostream pair
would previously have been used, in order to guarantee consistent access
without byte truncation.

The patch also converts much of LLVM to use the new facility. These changes
include several probable bug fixes for truncated output, a programming error
that's no longer possible with the new interface.

llvm-svn: 211749
2014-06-26 00:00:48 +00:00
Rafael Espindola ae460027a4 Convert the Archive API to use ErrorOr.
Now that we have c++11, even things like ErrorOr<std::unique_ptr<...>> are
easy to use.

No intended functionality change.

llvm-svn: 211033
2014-06-16 16:08:36 +00:00
Rafael Espindola 4453e42945 Remove 'using std::error_code' from tools.
llvm-svn: 210876
2014-06-13 03:07:50 +00:00
Rafael Espindola 3acea39853 Don't use 'using std::error_code' in include/llvm.
This should make sure that most new uses use the std prefix.

llvm-svn: 210835
2014-06-12 21:46:39 +00:00
Rafael Espindola a6e9c3e43a Remove system_error.h.
This is a minimal change to remove the header. I will remove the occurrences
of "using std::error_code" in a followup patch.

llvm-svn: 210803
2014-06-12 17:38:55 +00:00
Craig Topper e6cb63e471 [C++] Use 'nullptr'. Tools edition.
llvm-svn: 207176
2014-04-25 04:24:47 +00:00
Saleem Abdulrasool 98938f1e0a objdump: identify WoA WinCOFF/ARM correctly
Since LLVM currently only supports WinCOFF, assume that the input is WinCOFF
rather than another type of COFF file (ECOFF/XCOFF).  If the architecture is
detected as thumb (e.g. the file has a IMAGE_FILE_MACHINE_ARMNT magic) then use
a triple of thumbv7-windows.

This allows for objdump to properly handle WoA object files without having to
specify the target triple manually.

llvm-svn: 206446
2014-04-17 06:17:23 +00:00
Lang Hames a1bc0f5662 [MC] Require an MCContext when constructing an MCDisassembler.
This patch re-introduces the MCContext member that was removed from
MCDisassembler in r206063, and requires that an MCContext be passed in at
MCDisassembler construction time. (Previously the MCContext member had been
initialized in an ad-hoc fashion after construction). The MCCContext member
can be used by MCDisassembler sub-classes to construct constant or
target-specific MCExprs.

This patch updates disassemblers for in-tree targets, and provides the
MCRegisterInfo instance that some disassemblers were using through the
MCContext (previously those backends were constructing their own
MCRegisterInfo instances).

llvm-svn: 206241
2014-04-15 04:40:56 +00:00
Saleem Abdulrasool 13a3f6914b tools: fix heap-buffer-overrun detected via ASAN
Once the auxiliary fields relating to the filename have been inspected, any
following auxiliary fields need not be visited as they have been consumed (the
following fields comprise the filepath as a single unit).

Adjust the test to catch this even if ASAN is not enabled.

llvm-svn: 206190
2014-04-14 16:38:25 +00:00
Saleem Abdulrasool 7050eedb61 tools: simplify symbol handling in objdump
Rather than switching behaviour on whether a previous symbol has an auxiliary
symbol record for the next count of elements, simply iterate over the auxiliary
symbols right after processing the current symbol entry.  This makes the
behaviour much simpler to follow and similar to llvm-readobj and yaml2obj.

llvm-svn: 206146
2014-04-14 02:37:28 +00:00
Saleem Abdulrasool d38c6b1e4b tools: address possible non-null terminated filenames
If a filename is a multiple of 18 characters, there will be no null-terminator.
This will result in an invalid access by the constructed StringRef.  Add a test
case to exercise this and fix that handling.  Address this same vulnerability in
llvm-readobj as well.

llvm-svn: 206145
2014-04-14 02:37:23 +00:00
Saleem Abdulrasool 63a0dd6540 tools: avoid a string duplication
The auxiliary file records are contiguous and only contain the filename.
Construct a StringRef directly rather than copying to a temporary buffer.

Suggested by majnemer on IRC!

llvm-svn: 206139
2014-04-13 22:54:11 +00:00
Saleem Abdulrasool 9ede5c7dd0 tools: teach objdump about FILE aux records
Add support for file auxiliary symbol entries in COFF symbol tables.  A COFF
symbol table with a FILE entry is followed by sizeof(__FILE__) / 18 auxiliary
symbol records which contain the filename.  Read them and form the original
filename that the record contains.  Then display the name in the output.

llvm-svn: 206126
2014-04-13 03:11:08 +00:00
Lang Hames eb37092342 Update MCSymbolizer and its subclasses' constructors to reflect the fact that
they take ownership of the RelocationInfo they're constructed with.

llvm-svn: 204891
2014-03-27 02:39:01 +00:00
Greg Fitzgerald 1843227551 llvm-objdump output hex to match binutils' objdump
Patch by Ted Woodward

llvm-svn: 204409
2014-03-20 22:55:15 +00:00
David Majnemer ddf28f2b79 Object: Provide a richer means of describing auxiliary symbols
The current state of affairs has auxiliary symbols described as a big
bag of bytes. This is less than satisfying, it detracts from the YAML
file as being human readable.

Instead, allow for symbols to optionally contain their auxiliary data.
This allows us to have a much higher level way of describing things like
weak symbols, function definitions and section definitions.

This depends on D3105.

Differential Revision: http://llvm-reviews.chandlerc.com/D3092

llvm-svn: 204214
2014-03-19 04:47:47 +00:00
Rui Ueyama 4e39f717ff Use early returns to reduce nesting.
llvm-svn: 204171
2014-03-18 18:58:51 +00:00
Alexey Samsonov 464d2e448b [C++11] Introduce ObjectFile::symbols() to use range-based loops.
Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D3081

llvm-svn: 204031
2014-03-17 07:28:19 +00:00
Alexey Samsonov aa4d29571c [C++11] Introduce SectionRef::relocations() to use range-based loops
Reviewers: rafael

Reviewed By: rafael

CC: llvm-commits

Differential Revision: http://llvm-reviews.chandlerc.com/D3077

llvm-svn: 203927
2014-03-14 14:22:49 +00:00
Alexey Samsonov 48803e5ca9 [C++11] Use ObjectFile::sections() in commandline llvm tools
llvm-svn: 203802
2014-03-13 14:37:36 +00:00
Ahmed Charles df17c83fa8 Change MCDisassembler::setSymbolizer to take unique_ptr by value.
This changes the interface to be more explicit that ownership is being
transferred.

llvm-svn: 203223
2014-03-07 09:38:02 +00:00
Saleem Abdulrasool 35476334e9 Support: split object format out of environment
This is a preliminary setup change to support a renaming of Windows target
triples.  Split the object file format information out of the environment into a
separate entity.  Unfortunately, file format was previously treated as an
environment with an unknown OS.  This is most obvious in the ARM subtarget where
the handling for macho on an arbitrary platform switches to AAPCS rather than
APCS (as per Apple's needs).

llvm-svn: 203160
2014-03-06 20:47:11 +00:00
Ahmed Charles 56440fd820 Replace OwningPtr<T> with std::unique_ptr<T>.
This compiles with no changes to clang/lld/lldb with MSVC and includes
overloads to various functions which are used by those projects and llvm
which have OwningPtr's as parameters. This should allow out of tree
projects some time to move. There are also no changes to libs/Target,
which should help out of tree targets have time to move, if necessary.

llvm-svn: 203083
2014-03-06 05:51:42 +00:00
Simon Atanasyan 2b614e1163 llvm-objdump: Do not attempt to disassemble symbols outside of section
boundaries.

It is possible to create an ELF executable where symbol from say .text
section 'points' to the address outside the section boundaries. It does
not have a sense to disassemble something outside the section.

Without this fix llvm-objdump prints finite or infinite (depends on
the executable file architecture) number of 'invalid instruction
encoding' warnings.

llvm-svn: 202083
2014-02-24 22:12:11 +00:00
Rafael Espindola 90c7f1cc16 Replace the F_Binary flag with a F_Text one.
After this I will set the default back to F_None. The advantage is that
before this patch forgetting to set F_Binary would corrupt a file on windows.
Forgetting to set F_Text produces one that cannot be read in notepad, which
is a better failure mode :-)

llvm-svn: 202052
2014-02-24 18:20:12 +00:00
Rafael Espindola 7dbcdd08c2 Don't make F_None the default.
This will make it easier to switch the default to being binary files.

llvm-svn: 202042
2014-02-24 15:07:20 +00:00
Rafael Espindola b5155a572f Change the begin and end methods in ObjectFile to match the style guide.
llvm-svn: 201108
2014-02-10 20:24:04 +00:00
Rafael Espindola 20122a436c Simplify getSymbolFlags.
None of the object formats require extra parsing to compute these flags,
so the method cannot fail.

llvm-svn: 200574
2014-01-31 20:57:12 +00:00
Rafael Espindola 5e812afaeb Simplify the handling of iterators in ObjectFile.
None of the object file formats reported error on iterator increment. In
retrospect, that is not too surprising: no object format stores symbols or
sections in a linked list or other structure that requires chasing pointers.
As a consequence, all error checking can be done on begin() and end().

This reduces the text segment of bin/llvm-readobj in my machine from 521233 to
518526 bytes.

llvm-svn: 200442
2014-01-30 02:49:50 +00:00
Mark Seaborn 0929d3d855 Fix "llvm-objdump -d -r" to show relocations inline for ELF files
This fixes a regression introduced by r182908, which broke
llvm-objdump's ability to display relocations inline in a disassembly
dump for ELF object files.

That change removed a SectionRelocMap from Object/ELF.h, which we
recreate in llvm-objdump.cpp.

I discovered this regression via an out-of-tree test
(test/NaCl/X86/pnacl-hides-sandbox-x86-64.ll) which used llvm-objdump.

Note that the "Unknown" string in the test output on i386 isn't quite
right, but this appears to be a pre-existing bug.

Differential Revision: http://llvm-reviews.chandlerc.com/D2559

llvm-svn: 200090
2014-01-25 17:38:19 +00:00
Mark Seaborn eb03ac50ed llvm-objdump: Some style cleanups to follow LLVM coding style
Rename "ec" to "EC", and rename some iterators.

Then fix whitespace using clang-format-diff.

(As requested in http://llvm-reviews.chandlerc.com/D2559)

Differential Revision: http://llvm-reviews.chandlerc.com/D2594

llvm-svn: 200053
2014-01-25 00:32:01 +00:00
Rafael Espindola 23a9750c47 Rename these methods to match the style guide.
llvm-svn: 199751
2014-01-21 16:09:45 +00:00
Rafael Espindola 63da295045 Return an ErrorOr<Binary *> from createBinary.
I did write a version returning ErrorOr<OwningPtr<Binary> >, but it is too
cumbersome to use without std::move. I will keep the patch locally and submit
when we switch to c++11.

llvm-svn: 199326
2014-01-15 19:37:43 +00:00
Rui Ueyama c2bed42904 Re-submit r191472 with a fix for big endian.
llvm-objdump: Dump COFF import table if -private-headers option is given.
llvm-svn: 191557
2013-09-27 21:04:00 +00:00
Rui Ueyama 333d28a0bb Revert "llvm-objdump: Dump COFF import table if -private-headers option is given."
This reverts commit r191472 because it's failing on BE machine.

llvm-svn: 191480
2013-09-27 01:29:36 +00:00
Rui Ueyama 5b1adbaad9 llvm-objdump: Dump COFF import table if -private-headers option is given.
This is a patch to add capability to llvm-objdump to dump COFF Import Table
entries, so that we can write tests for LLD checking Import Table contents.

llvm-objdump did not print anything but just file name if the format is COFF
and -private-headers option is given. This is a patch adds capability for
dumping DLL Import Table, which is specific to the COFF format.

In this patch I defined a new iterator to iterate over import table entries.
Also added a few functions to COFFObjectFile.cpp to access fields of the entry.

Differential Revision: http://llvm-reviews.chandlerc.com/D1719

llvm-svn: 191472
2013-09-27 00:07:01 +00:00
Ahmed Bougacha d56f705d87 Add basic YAML MC CFG testcase.
Drive-by llvm-objdump cleanup (don't hardcode ToolName).

llvm-svn: 188904
2013-08-21 16:13:25 +00:00
Ahmed Bougacha 1792647942 MC CFG: Add YAML MCModule representation to enable MC CFG testing.
Like yaml ObjectFiles, this will be very useful for testing the MC CFG
implementation (mostly MCObjectDisassembler), by matching the output
with YAML, and for potential users of the MC CFG, by using it as an input.

There isn't much to the actual format, it is just a serialization of the
MCModule class. Of note:
  - Basic block references (pred/succ, ..) are represented by the BB's
    start address.
  - Just as in the MC CFG, instructions are MCInsts with a size.
  - Operands have a prefix representing the type (only register and
    immediate supported here).
  - Instruction opcodes are represented by their names; enum values aren't
    stable, enum names mostly are: usually, a change to a name would need
    lots of changes in the backend anyway.
    Same with registers.

All in all, an example is better than 1000 words, here goes:

A simple binary:

  Disassembly of section __TEXT,__text:
  _main:
  100000f9c:      48 8b 46 08             movq    8(%rsi), %rax
  100000fa0:      0f be 00                movsbl  (%rax), %eax
  100000fa3:      3b 04 25 48 00 00 00    cmpl    72, %eax
  100000faa:      0f 8c 07 00 00 00       jl      7 <.Lend>
  100000fb0:      2b 04 25 48 00 00 00    subl    72, %eax
  .Lend:
  100000fb7:      c3                      ret

And the (pretty verbose) generated YAML:

  ---
  Atoms:
    - StartAddress:    0x0000000100000F9C
      Size:            20
      Type:            Text
      Content:
        - Inst:            MOV64rm
          Size:            4
          Ops:             [ RRAX, RRSI, I1, R, I8, R ]
        - Inst:            MOVSX32rm8
          Size:            3
          Ops:             [ REAX, RRAX, I1, R, I0, R ]
        - Inst:            CMP32rm
          Size:            7
          Ops:             [ REAX, R, I1, R, I72, R ]
        - Inst:            JL_4
          Size:            6
          Ops:             [ I7 ]
    - StartAddress:    0x0000000100000FB0
      Size:            7
      Type:            Text
      Content:
        - Inst:            SUB32rm
          Size:            7
          Ops:             [ REAX, REAX, R, I1, R, I72, R ]
    - StartAddress:    0x0000000100000FB7
      Size:            1
      Type:            Text
      Content:
        - Inst:            RET
          Size:            1
          Ops:             [  ]
  Functions:
    - Name:            __text
      BasicBlocks:
        - Address:         0x0000000100000F9C
          Preds:           [  ]
          Succs:           [ 0x0000000100000FB7, 0x0000000100000FB0 ]
     <snip>
  ...

llvm-svn: 188890
2013-08-21 07:29:02 +00:00
Bill Wendling bc07a8900c Use pointers to the MCAsmInfo and MCRegInfo.
Someone may want to do something crazy, like replace these objects if they
change or something.

No functionality change intended.

llvm-svn: 184175
2013-06-18 07:20:20 +00:00
NAKAMURA Takumi d5c2e60b19 llvm-objdump.cpp: Appease MSC16 x64. utostr(n++) causes internal compiler error.
llvm-svn: 182722
2013-05-27 00:02:48 +00:00
Ahmed Bougacha aa79068157 MC: Disassembled CFG reconstruction.
This patch builds on some existing code to do CFG reconstruction from
a disassembled binary:
- MCModule represents the binary, and has a list of MCAtoms.
- MCAtom represents either disassembled instructions (MCTextAtom), or
  contiguous data (MCDataAtom), and covers a specific range of addresses.
- MCBasicBlock and MCFunction form the reconstructed CFG. An MCBB is
  backed by an MCTextAtom, and has the usual successors/predecessors.
- MCObjectDisassembler creates a module from an ObjectFile using a
  disassembler. It first builds an atom for each section. It can also
  construct the CFG, and this splits the text atoms into basic blocks.

MCModule and MCAtom were only sketched out; MCFunction and MCBB were
implemented under the experimental "-cfg" llvm-objdump -macho option.
This cleans them up for further use; llvm-objdump -d -cfg now generates
graphviz files for each function found in the binary.

In the future, MCObjectDisassembler may be the right place to do
"intelligent" disassembly: for example, handling constant islands is just
a matter of splitting the atom, using information that may be available
in the ObjectFile. Also, better initial atom formation than just using
sections is possible using symbols (and things like Mach-O's
function_starts load command).

This brings two minor regressions in llvm-objdump -macho -cfg:
- The printing of a relocation's referenced symbol.
- An annotation on loop BBs, i.e., which are their own successor.

Relocation printing is replaced by the MCSymbolizer; the basic CFG
annotation will be superseded by more related functionality.

llvm-svn: 182628
2013-05-24 01:07:04 +00:00
Ahmed Bougacha ad1084de84 Add MCSymbolizer for symbolic/annotated disassembly.
This is a basic first step towards symbolization of disassembled
instructions. This used to be done using externally provided (C API)
callbacks. This patch introduces:
- the MCSymbolizer class, that mimics the same functions that were used
  in the X86 and ARM disassemblers to symbolize immediate operands and
  to annotate loads based off PC (for things like c string literals).
- the MCExternalSymbolizer class, which implements the old C API.
- the MCRelocationInfo class, which provides a way for targets to
  translate relocations (either object::RelocationRef, or disassembler
  C API VariantKinds) to MCExprs.
- the MCObjectSymbolizer class, which does symbolization using what it
  finds in an object::ObjectFile. This makes simple symbolization (with
  no fancy relocation stuff) work for all object formats!
- x86-64 Mach-O and ELF MCRelocationInfos.
- A basic ARM Mach-O MCRelocationInfo, that provides just enough to
  support the C API VariantKinds.

Most of what works in otool (the only user of the old symbolization API
that I know of) for x86-64 symbolic disassembly (-tvV) works, namely:
- symbol references: call _foo; jmp 15 <_foo+50>
- relocations:       call _foo-_bar; call _foo-4
- __cf?string:       leaq 193(%rip), %rax ## literal pool for "hello"
Stub support is the main missing part (because libObject doesn't know,
among other things, about mach-o indirect symbols).

As for the MCSymbolizer API, instead of relying on the disassemblers
to call the tryAdding* methods, maybe this could be done automagically
using InstrInfo? For instance, even though PC-relative LEAs are used
to get the address of string literals in a typical Mach-O file, a MOV
would be used in an ELF file. And right now, the explicit symbolization
only recognizes PC-relative LEAs. InstrInfo should have already have
most of what is needed to know what to symbolize, so this can
definitely be improved.

I'd also like to remove object::RelocationRef::getValueString (it seems
only used by relocation printing in objdump), as simply printing the
created MCExpr is definitely enough (and cleaner than string concats).

llvm-svn: 182625
2013-05-24 00:39:57 +00:00
Ahmed Bougacha 0835ca12ef llvm-objdump: Initialize MCDisassembler once instead of for each section.
llvm-svn: 182054
2013-05-16 21:28:23 +00:00
Rafael Espindola 227144c23c Remove the MachineMove class.
It was just a less powerful and more confusing version of
MCCFIInstruction. A side effect is that, since MCCFIInstruction uses
dwarf register numbers, calls to getDwarfRegNum are pushed out, which
should allow further simplifications.

I left the MachineModuleInfo::addFrameMove interface unchanged since
this patch was already fairly big.

llvm-svn: 181680
2013-05-13 01:16:13 +00:00
Rafael Espindola 1e48387962 Clarify getRelocationAddress x getRelocationOffset a bit.
getRelocationAddress is for dynamic libraries and executables,
getRelocationOffset for relocatable objects.

Mark the getRelocationAddress of COFF and MachO as not implemented yet. Add a
test of ELF's. llvm-readobj -r now prints the same values as readelf -r.

llvm-svn: 180259
2013-04-25 12:28:45 +00:00
Rafael Espindola 56f976f6bd At Jim Grosbach's request detemplate Object/MachO.h.
We are still able to handle mixed endian objects by swapping one struct at a
time.

llvm-svn: 179778
2013-04-18 18:08:55 +00:00
Alexey Samsonov 209095cd9f llvm-objdump: Don't print contents of BSS sections: it makes no sense and crashes llvm-objdump on relocated objects with large bss
llvm-svn: 179589
2013-04-16 10:53:11 +00:00