Commit Graph

76152 Commits

Author SHA1 Message Date
Owen Anderson c4d245c391 Fix the preprocessor checks used to determine if backtraces have been enabled.
llvm-svn: 227424
2015-01-29 07:53:13 +00:00
Owen Anderson 9253bb933a Use the existing build configuration parameter ENABLE_BACKTRACE to compile out all pretty stack trace support when backtraces are disabled.
This has the nice secondary effect of allowing LLVM to continue to build
for targets without __thread or thread_local support to continue to work
so long as they build without support for backtraces.

llvm-svn: 227423
2015-01-29 07:35:31 +00:00
Simon Atanasyan 99cd1fb012 [ELFYAML] Provide default value 0 for YAML relocation addendum field
Follow up to r227318.

llvm-svn: 227422
2015-01-29 06:56:24 +00:00
Chandler Carruth a1a622746e Remove an unused private field added r227405 to fix a Clang warning.
llvm-svn: 227415
2015-01-29 02:44:53 +00:00
Chandler Carruth fb3139ad9e [LPM] Clean up the use of TLS in pretty stack trace and disable it
entirely when threads are not enabled. This should allow anyone who
needs to bootstrap or cope with a host loader without TLS support to
limp along without threading support.

There is still some bug in the PPC TLS stuff that is not worked around.
I'm getting access to a machine to reproduce and debug this further.
There is some chance that I'll have to add a terrible workaround for
PPC.

There is also some problem with iOS, but I have no ability to really
evaluate what the issue is there. I'm leaving it to folks maintaining
that platform to suggest a path forward -- personally I don't see any
useful path forward that supports threading in LLVM but does so without
support for *very basic* TLS. Note that we don't need more than some
pointers, and we don't need constructors, destructors, or any of the
other fanciness which remains widely unimplemented.

llvm-svn: 227411
2015-01-29 01:23:04 +00:00
Reid Kleckner 8fcb0ad8a6 Remove unused variable
llvm-svn: 227408
2015-01-29 00:55:44 +00:00
Reid Kleckner 1185fced3d Add a Windows EH preparation pass that zaps resumes
If the personality is not a recognized MSVC personality function, this
pass delegates to the dwarf EH preparation pass. This chaining supports
people on *-windows-itanium or *-windows-gnu targets.

Currently this recognizes some personalities used by MSVC and turns
resume instructions into traps to avoid link errors.  Even if cleanups
are not used in the source program, LLVM requires the frontend to emit a
code path that resumes unwinding after an exception.  Clang does this,
and we get unreachable resume instructions. PR20300 covers cleaning up
these unreachable calls to resume.

Reviewers: majnemer

Differential Revision: http://reviews.llvm.org/D7216

llvm-svn: 227405
2015-01-29 00:41:44 +00:00
Eric Christopher 905f12d96d Remove getSubtargetImpl from AArch64ISelLowering and cache the
correct subtarget by passing it in during the constructor as
TargetLowering is Subtarget specific.

llvm-svn: 227402
2015-01-29 00:19:42 +00:00
Eric Christopher 1889fdc142 Remove getSubtargetImpl from ARMISelLowering and cache the
correct subtarget by passing it in during the constructor as
TargetLowering is Subtarget specific.

llvm-svn: 227401
2015-01-29 00:19:39 +00:00
Eric Christopher c125e12261 Small cleanup in ARMFastISel initialization.
llvm-svn: 227400
2015-01-29 00:19:37 +00:00
Eric Christopher 1b21f00904 Migrate ARM except for TTI, AsmPrinter, and frame lowering
away from getSubtargetImpl.

llvm-svn: 227399
2015-01-29 00:19:33 +00:00
Manuel Jacob 6f508c578b Add nullptr checks for TargetSelectionDAGInfo in SelectionDAG.
TSI is not guaranteed be non-null in SelectionDAG.

llvm-svn: 227397
2015-01-28 23:50:40 +00:00
Kostya Serebryany 265cf04f9c [fuzzer] add option -save_minimized_corpus
llvm-svn: 227395
2015-01-28 23:48:39 +00:00
Chandler Carruth b2fe3e5c35 [LPM] Fix the PPC attribute to be spelled 'global-dynamic'. This should
let the build bot make finish compiling stage2.

llvm-svn: 227391
2015-01-28 23:10:57 +00:00
Philip Reames 9198b33b48 Teach SplitBlockPredecessors how to handle landingpad blocks.
Patch by: Igor Laevsky <igor@azulsystems.com>

"Currently SplitBlockPredecessors generates incorrect code in case if basic block we are going to split has a landingpad. Also seems like it is fairly common case among it's users to conditionally call either SplitBlockPredecessors or SplitLandingPadPredecessors. Because of this I think it is reasonable to add this condition directly into SplitBlockPredecessors."

Differential Revision: http://reviews.llvm.org/D7157

llvm-svn: 227390
2015-01-28 23:06:47 +00:00
Kostya Serebryany a8fbcf0c1f Add lit-style tests for the Fuzzer library
Summary: Add test targets and the lit-style runner.

Test Plan: Run the tests on bot.

Reviewers: samsonov

Reviewed By: samsonov

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7217

llvm-svn: 227389
2015-01-28 22:49:25 +00:00
Sanjay Patel 08efcd9039 fix typos; NFC
llvm-svn: 227386
2015-01-28 22:37:32 +00:00
Chris Bieneman b6866425f3 Build fix for Visual Studio. NFC.
llvm-svn: 227385
2015-01-28 22:25:00 +00:00
Colin LeMahieu 4379d10273 [Hexagon] Updating several V5 intrinsics and adding FP tests.
llvm-svn: 227379
2015-01-28 22:08:16 +00:00
Simon Pilgrim 80bd3c9e5f Spelling fixes. NFC.
llvm-svn: 227376
2015-01-28 22:03:52 +00:00
Simon Pilgrim b55bd1e2ac Line endings fix. NFC.
llvm-svn: 227374
2015-01-28 21:56:52 +00:00
Zoran Jovanovic 14c567be90 [mips][microMIPS] Implement SWM and LWM aliases
Differential Revision: http://reviews.llvm.org/D5820

llvm-svn: 227373
2015-01-28 21:52:27 +00:00
Kostya Serebryany e6972a029f [fuzzer] instructions for building/running clang-format-fuzzer
llvm-svn: 227357
2015-01-28 19:51:58 +00:00
Sanjay Patel 4058dd9f3f invert check for less indentation; use local vars to reduce duplication; NFC
llvm-svn: 227355
2015-01-28 19:44:21 +00:00
Colin LeMahieu 1de7e0d923 [Hexagon] Updating many V4 intrinsic patterns. Adding missing instruction and deleting unused classes.
llvm-svn: 227353
2015-01-28 19:39:09 +00:00
Chandler Carruth be09eb75aa [LPM] Try to work around a bug with local-dynamic TLS on PowerPC 64.
Sadly, this precludes optimizing it down to initial-exec or local-exec
when statically linking, and in general makes the code slower on PPC 64,
but there's nothing else for it until we can arrange to produce the
correct bits for the linker.

Lots of thanks to Ulirch for tracking this down and Bill for working on
the long-term fix to LLVM so that we can relegate this to old host
clang versions.

I'll be watching the PPC build bots to make sure this effectively
revives them.

llvm-svn: 227352
2015-01-28 19:29:22 +00:00
Philip Reames 23cf2e2f97 Remove gc.root's performCustomLowering
This is a refactoring to restructure the single user of performCustomLowering as a specific lowering pass and remove the custom lowering hook entirely.

Before this change, the LowerIntrinsics pass (note to self: rename!) was essentially acting as a pass manager, but without being structured in terms of passes. Instead, it proxied calls to a set of GCStrategies internally. This adds a lot of conceptual complexity (i.e. GCStrategies are stateful!) for very little benefit. Since there's been interest in keeping the ShadowStackGC working, I extracting it's custom lowering pass into a dedicated pass and just added that to the pass order. It will only run for functions which opt-in to that gc.

I wasn't able to find an easy way to preserve the runtime registration of custom lowering functionality. Given that no user of this exists that I'm aware of, I made the choice to just remove that. If someone really cares, we can look at restoring it via dynamic pass registration in the future.

Note that despite the large diff, none of the lowering code actual changes. I added the framing needed to make it a pass and rename the class, but that's it.

Differential Revision: http://reviews.llvm.org/D7218

llvm-svn: 227351
2015-01-28 19:28:03 +00:00
Colin LeMahieu 94c33218e3 [Hexagon] Adding XTYPE/MPY intrinsic tests and some missing multiply instructions.
llvm-svn: 227347
2015-01-28 19:16:17 +00:00
Chris Bieneman d1d9430a05 Refactoring llvm command line parsing and option registration.
Summary:
The primary goal of this patch is to remove the need for MarkOptionsChanged(). That goal is accomplished by having addOption and removeOption properly sort the options.

This patch puts the new add and remove functionality on a CommandLineParser class that is a placeholder. Some of the functionality in this class will need to be merged into the OptionRegistry, and other bits can hopefully be in a better abstraction.

This patch also removes the RegisteredOptionList global, and the need for cl::Option objects to be linked list nodes.

The changes in CommandLineTest.cpp are required because these changes shift when we validate that options are not duplicated. Before this change duplicate options were only found during certain cl API calls (like cl::ParseCommandLine). With this change duplicate options are found during option construction.

Reviewers: dexonsmith, chandlerc, pete

Reviewed By: pete

Subscribers: pete, majnemer, llvm-commits

Differential Revision: http://reviews.llvm.org/D7132

llvm-svn: 227345
2015-01-28 19:00:25 +00:00
Colin LeMahieu 19ed07c75a [Hexagon] Deleting a lot of old variants of intrinsics and updating references.
llvm-svn: 227338
2015-01-28 18:29:11 +00:00
Colin LeMahieu 39b846ce0f [Hexagon] Converting XTYPE/BIT intrinsic patterns and adding tests.
llvm-svn: 227335
2015-01-28 18:06:23 +00:00
Sanjay Patel 9bb601856e use SDValue methods directly instead of getNode()->* ; NFCI
llvm-svn: 227334
2015-01-28 18:01:31 +00:00
Rafael Espindola a05b3b73a4 Simplify code. NFC.
llvm-svn: 227333
2015-01-28 17:54:19 +00:00
Colin LeMahieu fe03c9a678 [Hexagon] Replacing XTYPE/SHIFT intrinsic patternss. Adding tests and missing instructions with tests.
llvm-svn: 227330
2015-01-28 17:37:59 +00:00
Jozef Kolek e10a02ecf0 [mips][microMIPS] Implement LWGP instruction
Differential Revision: http://reviews.llvm.org/D6650

llvm-svn: 227325
2015-01-28 17:27:26 +00:00
Colin LeMahieu fdbc5adbb6 [Hexagon] Replacing intrinsics for halfword adds and max/min word/dword.
llvm-svn: 227322
2015-01-28 17:06:40 +00:00
Bjorn Steinbrink a09ac0085d Fix LLVMSetMetadata and LLVMAddNamedMetadataOperand for single value MDNodes
Summary:
MetadataAsValue uses a canonical format that strips the MDNode if it
contains only a single constant value. This triggers an assertion when
trying to cast the value to a MDNode.

Subscribers: llvm-commits

Differential Revision: http://reviews.llvm.org/D7165

llvm-svn: 227319
2015-01-28 16:35:59 +00:00
Michael Kuperstein 90e08320c9 [x32] Change the condition from bitness to LP64 for TCRETURNdi64.
TCRETURNmi64, which was mistakenly changed in r227307 will wait for another day.

llvm-svn: 227317
2015-01-28 16:11:35 +00:00
Tom Stellard 40ce8af4a5 R600: Move DataLayout to AMDGPUTargetMachine
This is a follow up to r227113.

It is now required to use the amdgcn target for SI and newer GPUs.

llvm-svn: 227316
2015-01-28 16:04:26 +00:00
Tom Stellard eba5648ad2 R600: Use a Southern Islands GPU as the default for the amdgcn target
llvm-svn: 227314
2015-01-28 15:38:42 +00:00
Hal Finkel 34c94d5caa Correct the AggressiveAntiDepBreaker's handling of subregisters defining super registers
As the AggressiveAntiDepBreaker iterated backward through a scheduling region,
we must leave super registers live through subregister definitions so that all
relevant subregister definitions are renamed together. The problem was that we
were also discarding sub-register use locations as the sub-registers are
redefined. The result is that we'd rename the super register along with some,
but not all, subregister definitions.

  R0_D = {R0_L, R1_L}
  R0_L = {R0_S, R1_S}

  %R0_L<def> = TRLi9 16, pred:8, pred:%noreg
  %R1_L<def> = LSRLrr %R1_L<kill>, %R0_S, pred:8, pred:%noreg
  %R0_L<def> = LSRLrr %R2_L, %R0_S, pred:8, pred:%noreg, %R0_L<imp-use,kill>
  %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg
  %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg
  %R4_D<def> = ASRDrr %R0_D<kill>, %R6_S

  Anti:   %R4_D<def> = ASRDrr %R0_D<kill>, %R6_S
   Def Groups: R4_D=g213->g215(via R4_S)->g214(via R4_L)->g216(via R5_S)->g216(via R4_L)->g217(via R5_L)
   Use Groups: R0_D=g0->g218(last-use) R1_L->g219(last-use) R6_S=g204->g220(last-use)
  Anti:   %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg
   Def Groups: R0_L=g208->g209(via R0_S)->g218(via R0_D)->g210(via R1_S)->g210(via R0_D)
   Antidep reg: R0_L (real dependency)
   Use Groups: R0_L=g210->g224(last-use) R0_S->g225(last-use) R1_S->g226(last-use)
  Anti:   %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg
   Def Groups: R1_L=g219->g210(via R0_D)
   Antidep reg: R1_L (real dependency)
   Use Groups: R1_L=g210->g229(last-use)
  Anti:   %R0_L<def> = LSRLrr %R2_L, %R0_S, pred:8, pred:%noreg, %R0_L<imp-use,kill>
   Def Groups: R0_L=g224->g225(via R0_S)->g210(via R0_D)->g226(via R1_S)->g226(via R0_D)
   Antidep reg: R0_L Use Groups: R2_L=g192 R0_S=g226->g230(last-use) R0_L=g226->g231(last-use) R1_S->g232(last-use)
  Anti:   %R1_L<def> = LSRLrr %R1_L<kill>, %R0_S, pred:8, pred:%noreg
   Def Groups: R1_L=g229->g226(via R0_D)
   Antidep reg: R1_L Use Groups: R1_L=g226->g233(last-use) R0_S=g230
  Anti:   %R0_L<def> = TRLi9 16, pred:8, pred:%noreg
   Def Groups: R0_L=g231->g230(via R0_S)->g226(via R0_D)->g232(via R1_S)->g232(via R0_D)
   Antidep reg: R0_L
   Rename Candidates for Group g232:
    R0_D: elcInt64Regs :: R0_D R1_D R2_D R3_D R4_D R5_D R8_D R9_D R10_D R11_D R12_D R13_D R14_D R15_D R16_D R17_D R18_D R19_D R20_D R21_D R22_D R23_D R24_D R25_D
    R0_L: elcIntRegs :: R0_L R1_L R2_L R3_L R4_L R5_L R8_L R9_L R10_L R11_L R12_L R13_L R14_L R15_L R16_L R17_L R18_L R19_L R20_L R21_L R22_L R23_L R24_L R25_L
    R0_S: elcShrtRegs elcShrtRegs :: R0_S R1_S R2_S R3_S R4_S R5_S R8_S R9_S R10_S R11_S R12_S R13_S R14_S R15_S R16_S R17_S R18_S R19_S R20_S R21_S R22_S R23_S R24_S R25_S
   Find Registers: [R12_D: R12_D R12_L R12_S]
   Breaking anti-dependence edge on R0_L: R0_D->R12_D(1 refs) R0_L->R12_L(2 refs) R0_S->R12_S(2 refs)
   Use Groups:
  ...

  %R12_L<def> = TRLi9 16, pred:8, pred:%noreg
  %R1_L<def> = LSRLrr %R1_L<kill>, %R12_S, pred:8, pred:%noreg
  %R0_L<def> = LSRLrr %R2_L<kill>, %R12_S, pred:8, pred:%noreg, %R12_L<imp-use>
  %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg
  %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg
  %R4_D<def> = ASRDrr %R12_D<kill>, %R6_S

With this change, we now produce:

  Anti:   %R4_D<def> = ASRDrr %R0_D<kill>, %R6_S
   Def Groups: R4_D=g213->g215(via R4_S)->g214(via R4_L)->g216(via R5_S)->g216(via R4_L)->g217(via R5_L)
   Use Groups: R0_D=g0->g218(last-use) R1_L->g219(last-use) R6_S=g204->g220(last-use)
  Anti:   %R0_L<def> = ANDLri %R0_L<kill>, 2047, pred:8, pred:%noreg
   Def Groups: R0_L=g208->g209(via R0_S)->g218(via R0_D)->g210(via R1_S)->g210(via R0_D)
   Antidep reg: R0_L (real dependency)
   Use Groups: R0_L=g210
  Anti:   %R1_L<def> = ANDLri %R1_L<kill>, 2047, pred:8, pred:%noreg
   Def Groups: R1_L=g219->g210(via R0_D)
   Antidep reg: R1_L (real dependency)
   Use Groups: R1_L=g210
  Anti:   %R0_L<def> = LSRLrr %R2_L, %R0_S, pred:8, pred:%noreg, %R0_L<imp-use,kill>
   Def Groups: R0_L=g210->g210(via R0_D)->g210(via R0_D)
   Antidep reg: R0_L Use Groups: R2_L=g192 R0_S=g210 R0_L=g210
  Anti:   %R1_L<def> = LSRLrr %R1_L<kill>, %R0_S, pred:8, pred:%noreg
   Def Groups: R1_L=g210->g210(via R0_D)
   Antidep reg: R1_L Use Groups: R1_L=g210 R0_S=g210
  Anti:   %R0_L<def> = TRLi9 16, pred:8, pred:%noreg
   Def Groups: R0_L=g210->g210(via R0_D)->g210(via R0_D)
   Antidep reg: R0_L
   Rename Candidates for Group g210:
    R0_D: elcInt64Regs :: R0_D R1_D R2_D R3_D R4_D R5_D R8_D R9_D R10_D R11_D R12_D R13_D R14_D R15_D R16_D R17_D R18_D R19_D R20_D R21_D R22_D R23_D R24_D R25_D
    R0_L: elcIntRegs elcIntAIRegs elcIntRegs elcIntRegs elcIntRegs elcIntRegs :: R0_L R1_L R2_L R3_L R4_L R5_L R8_L R9_L R10_L R11_L R12_L R13_L R14_L R15_L R16_L R17_L R18_L R19_L R20_L R21_L R22_L R23_L R24_L R25_L
    R1_L: elcIntRegs elcIntRegs elcIntRegs elcIntRegs elcIntRegs :: R0_L R1_L R2_L R3_L R4_L R5_L R8_L R9_L R10_L R11_L R12_L R13_L R14_L R15_L R16_L R17_L R18_L R19_L R20_L R21_L R22_L R23_L R24_L R25_L
    R0_S: elcShrtRegs elcShrtRegs :: R0_S R1_S R2_S R3_S R4_S R5_S R8_S R9_S R10_S R11_S R12_S R13_S R14_S R15_S R16_S R17_S R18_S R19_S R20_S R21_S R22_S R23_S R24_S R25_S
   Find Registers: [R12_D: R12_D R12_L R13_L R12_S]
   Breaking anti-dependence edge on R0_L: R0_D->R12_D(1 refs) R0_L->R12_L(7 refs) R1_L->R13_L(5 refs) R0_S->R12_S(2 refs)
   Use Groups:
  ...

  %R12_L<def> = TRLi9 16, pred:8, pred:%noreg
  %R13_L<def> = LSRLrr %R13_L<kill>, %R12_S, pred:8, pred:%noreg
  %R12_L<def> = LSRLrr %R2_L<kill>, %R12_S<kill>, pred:8, pred:%noreg, %R12_L<imp-use,kill>
  %R13_L<def> = ANDLri %R13_L<kill>, 2047, pred:8, pred:%noreg
  %R12_L<def> = ANDLri %R12_L<kill>, 2047, pred:8, pred:%noreg
  %R4_D<def> = ASRDrr %R12_D, %R6_S, %R12_L<imp-def>, %R12_S<imp-def>, %R13_S<imp-def>

As demonstrated by this example, this is also somewhat unfortunate, because
there is actually no need to rename the super register in this case (it is
fully covered by later subregister definitions), but we don't seem to track
enough information here to exploit that either.

Thanks to Daniil Troshkov for reporting the issue. The debug outputs in this
commit message are from Daniil.

llvm-svn: 227311
2015-01-28 14:44:14 +00:00
Michael Kuperstein 951995821a [X86] Reduce some 32-bit imuls into lea + shl
Reduce integer multiplication by a constant of the form k*2^c, where k is in {3,5,9} into a lea + shl. Previously it was only done for imulq on 64-bit platforms, but it makes sense for imull and 32-bit as well.

Differential Revision: http://reviews.llvm.org/D7196

llvm-svn: 227308
2015-01-28 14:08:22 +00:00
Michael Kuperstein f387611ac2 [x32] Enable sibcall optimization on x32.
This includes two things:
1) Fix TCRETURNdi and TCRETURN64di patterns to check the right thing (LP64 as opposed to target bitness).
2) Allow LEA64_32 in MatchingStackOffset.

llvm-svn: 227307
2015-01-28 13:38:48 +00:00
Elena Demikhovsky 7b0dd39db6 AVX-512: Added FMA intrinsics with rounding mode
By Asaf Badouh and Elena Demikhovsky

Added special nodes for rounding: FMADD_RND, FMSUB_RND..
It will prevent merge between nodes with rounding and other standard nodes.

llvm-svn: 227303
2015-01-28 10:21:27 +00:00
Craig Topper 7d3c6d307a [X86] Teach disassembler to handle illegal immediates on AVX512 integer compare instructions.
llvm-svn: 227302
2015-01-28 10:09:56 +00:00
Craig Topper 6772eac490 [X86] Merge printSSECC and printAVXCC. They only differed by an assertion.
llvm-svn: 227301
2015-01-28 10:09:52 +00:00
Chandler Carruth 16b670ec20 [LPM] Rip all of ManagedStatic and ThreadLocal out of the pretty stack
tracing code.

Managed static was just insane overhead for this. We took memory fences
and external function calls in every path that pushed a pretty stack
frame. This includes a multitude of layers setting up and tearing down
passes, the parser in Clang, everywhere. For the regression test suite
or low-overhead JITs, this was contributing to really significant
overhead.

Even the LLVM ThreadLocal is really overkill here because it uses
pthread_{set,get}_specific logic, and has careful code to both allocate
and delete the thread local data. We don't actually want any of that,
and this code in particular has problems coping with deallocation. What
we want is a single TLS pointer that is valid to use during global
construction and during global destruction, any time we want. That is
exactly what every host compiler and OS we use has implemented for
a long time, and what was standardized in C++11. Even though not all of
our host compilers support the thread_local keyword, we can directly use
the platform-specific keywords to get the minimal functionality needed.
Provided this limited trial survives the build bots, I will move this to
Compiler.h so it is more widely available as a light weight if limited
alternative to the ThreadLocal class. Many thanks to David Majnemer for
helping me think through the implications across platforms and craft the
MSVC-compatible syntax.

The end result is *substantially* faster. When running llc in a tight
loop over a small IR file targeting the aarch64 backend, this improves
its performance by over 10% for me. It also seems likely to fix the
remaining regressions seen by JIT users with threading enabled.

This may actually have more impact on real-world compile times due to
the use of the pretty stack tracing utility throughout the rest of Clang
or LLVM, but I've not collected any detailed measurements.

llvm-svn: 227300
2015-01-28 09:52:14 +00:00
Chandler Carruth 5b0d3e3f3a [LPM] A targeted but somewhat horrible fix to the legacy pass manager's
querying of the pass registry.

The pass manager relies on the static registry of PassInfo objects to
perform all manner of its functionality. I don't understand why it does
much of this. My very vague understanding is that this registry is
touched both during static initialization *and* while each pass is being
constructed. As a consequence it is hard to make accessing it not
require a acquiring some lock. This lock ends up in the hot path of
setting up, tearing down, and invaliditing analyses in the legacy pass
manager.

On most systems you can observe this as a non-trivial % of the time
spent in 'ninja check-llvm'. However, I haven't really seen it be more
than 1% in extreme cases of compiling more real-world software,
including LTO.

Unfortunately, some of the GPU JITs are seeing this taking essentially
all of their time because they have very small IR running through
a small pass pipeline very many times (at least, this is the vague
understanding I have of it).

This patch tries to minimize the cost of looking up PassInfo objects by
leveraging the fact that the objects themselves are immutable and they
are allocated separately on the heap and so don't have their address
change. It also requires a change I made the last time I tried to debug
this problem which removed the ability to de-register a pass from the
registry. This patch creates a single access path to these objects
inside the PMTopLevelManager which memoizes the result of querying the
registry. This is somewhat gross as I don't really know if
PMTopLevelManager is the *right* place to put it, and I dislike using
a mutable member to memoize things, but it seems to work.

For long-lived pass managers this should completely eliminate
the cost of acquiring locks to look into the pass registry once the
memoized cache is warm. For 'ninja check' I measured about 1.5%
reduction in CPU time and in total time on a machine with 32 hardware
threads. For normal compilation, I don't know how much this will help,
sadly. We will still pay the cost while we populate the memoized cache.
I don't think it will hurt though, and for LTO or compiles with many
small functions it should still be a win. However, for tight loops
around a pass manager with many passes and small modules, this will help
tremendously. On the AArch64 backend I saw nearly 50% reductions in time
to complete 2000 cycles of spinning up and tearing down the pipeline.
Measurements from Owen of an actual long-lived pass manager show more
along the lines of 10% improvements.

Differential Revision: http://reviews.llvm.org/D7213

llvm-svn: 227299
2015-01-28 09:47:21 +00:00
Elena Demikhovsky 45f0448081 Fold fcmp in cases where value is provably non-negative. By Arch Robison.
This patch folds fcmp in some cases of interest in Julia. The patch adds a function CannotBeOrderedLessThanZero that returns true if a value is provably not less than zero. I.e. the function returns true if the value is provably -0, +0, positive, or a NaN. The patch extends InstructionSimplify.cpp to fold instances of fcmp where:
 - the predicate is olt or uge
 - the first operand is provably not less than zero
 - the second operand is zero
The motivation for handling these cases optimizing away domain checks for sqrt in Julia for common idioms such as sqrt(x*x+y*y)..

http://reviews.llvm.org/D6972

llvm-svn: 227298
2015-01-28 08:03:58 +00:00
Chandler Carruth b81dfa6378 [LPM] Stop using the string based preservation API. It is an
abomination.

For starters, this API is incredibly slow. In order to lookup the name
of a pass it must take a memory fence to acquire a pointer to the
managed static pass registry, and then potentially acquire locks while
it consults this registry for information about what passes exist by
that name. This stops the world of LLVMs in your process no matter
how little they cared about the result.

To make this more joyful, you'll note that we are preserving many passes
which *do not exist* any more, or are not even analyses which one might
wish to have be preserved. This means we do all the work only to say
"nope" with no error to the user.

String-based APIs are a *bad idea*. String-based APIs that cannot
produce any meaningful error are an even worse idea. =/

I have a patch that simply removes this API completely, but I'm hesitant
to commit it as I don't really want to perniciously break out-of-tree
users of the old pass manager. I'd rather they just have to migrate to
the new one at some point. If others disagree and would like me to kill
it with fire, just say the word. =]

llvm-svn: 227294
2015-01-28 04:57:56 +00:00