Summary:
Use the SP32 physical register as the base for FrameIndex
lowering. Update it and the __stack_pointer global var in the prolog and
epilog. Extend the mapping of virtual registers to wasm locals to
include the physical registers.
Rather than modify the target-independent PrologEpilogInserter (which
asserts that there are no virtual registers left) include a
slightly-modified copy for Wasm that does not have this assertion and
only clears the virtual registers if scavenging was needed (which of
course it isn't for wasm).
Differential Revision: http://reviews.llvm.org/D15344
llvm-svn: 255392
Summary: This patch adds support of conversion (mul x, 2^N + 1) => (add (shl x, N), x) and (mul x, 2^N - 1) => (sub (shl x, N), x) if the multiplication can not be converted to LEA + SHL or LEA + LEA. LLVM has already supported this on ARM, and it should also be useful on X86. Note the patch currently only applies to cases where the constant operand is positive, and I am planing to add another patch to support negative cases after this.
Reviewers: craig.topper, RKSimon
Subscribers: aemerson, llvm-commits
Differential Revision: http://reviews.llvm.org/D14603
llvm-svn: 255391
DenseMap is the wrong data structure to use for sample records and call
sites. The keys are too large, causing massive core memory growth when
reading profiles.
Before this patch, a 21Mb input profile was causing the compiler to grow
to 3Gb in memory. By switching to std::map, the compiler now grows to
300Mb in memory.
There still are some opportunities for memory footprint reduction. I'll
be looking at those next.
llvm-svn: 255389
After much discussion, ending here:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315620.html
it has been decided that, instead of having the vectorizer directly generate
special absdiff and horizontal-add intrinsics, we'll recognize the relevant
reduction patterns during CodeGen. Accordingly, these intrinsics are not needed
(the operations they represent can be pattern matched, as is already done in
some backends). Thus, we're backing these out in favor of the current
development work.
r248483 - Codegen: Fix llvm.*absdiff semantic.
r242546 - [ARM] Use [SU]ABSDIFF nodes instead of intrinsics for VABD/VABA
r242545 - [AArch64] Use [SU]ABSDIFF nodes instead of intrinsics for ABD/ABA
r242409 - [Codegen] Add intrinsics 'absdiff' and corresponding SDNodes for absolute difference operation
llvm-svn: 255387
I am seeing disappointing clang performance on a large PowerPC64
Linux box. GetRandomNumberSeed() does a buffered read from
/dev/urandom to seed its PRNG. As a result we read an entire page
even though we only need 4 bytes.
With every clang task reading a page worth of /dev/urandom we
end up spending a large amount of time stuck on kernel spinlock.
Patch by Anton Blanchard!
llvm-svn: 255386
have a nested name specifier. Strictly speaking, forward declarations of class
template partial specializations are not permitted at all, but that seems like
an obvious wording defect, and if we allow them without a nested name specifier
we should also allow them with a nested name specifier.
llvm-svn: 255383
There's no way to make a flag alias to two flags, so add a /WCL4 flag that
maps to the All, Extra diag groups. Fixes PR25563.
http://reviews.llvm.org/D15350
llvm-svn: 255382
If we don't filter these out we can end up, generating bogus paths, for example:
/home/user/lld/build/bin -> /home/user/home/user/lld/build/bin/lld/build/bin.
llvm-svn: 255378
This change set includes all changes to make the code conform to the OMP 4.5 specification:
* Removed hint / hinted_init definitions from include/40 files
* Hint values are powers of 2 to enable composition (4.5 spec)
* Hinted lock initialization functions were renamed (4.5 spec)
kmp_init_lock_hinted -> omp_init_lock_with_hint
kmp_init_nest_lock_hinted -> omp_init_nest_lock_with_hint
* __kmpc_critical_section_with_hint was added to support a critical section with
a hint (4.5 spec)
* __kmp_map_hint_to_lock was added to convert a hint (possibly a composite) to
an internal lock type
* kmpc_init_lock_with_hint and kmpc_init_nest_lock_with_hint were added as
internal entries for the hinted lock initializers. The preivous internal
functions (__kmp_init*) were moved to kmp_csupport.c and reused in multiple
places
* Added the two init functions to dllexports
* KMP_USE_DYNAMIC_LOCK is turned on if OMP_41_ENABLED is turned on
Differential Revision: http://reviews.llvm.org/D15205
llvm-svn: 255376
* Added a new user TSX lock implementation, RTM, This implementation is a
light-weight version of the adaptive lock implementation, omitting the
back-off logic for deciding when to specualte (or not). The fall-back lock is
still the queuing lock.
* Changed indirect lock table management. The data for indirect lock management
was encapsulated in the "kmp_indirect_lock_table_t" type. Also, the lock table
dimension was changed to 2D (was linear), and each entry is a
kmp_indirect_lock_t object now (was a pointer to an object).
* Some clean up in the critical section code
* Removed the limits of the tuning parameters read from KMP_ADAPTIVE_LOCK_PROPS
* KMP_USE_DYNAMIC_LOCK=1 also turns on these two switches:
KMP_USE_TSX, KMP_USE_ADAPTIVE_LOCKS
Differential Revision: http://reviews.llvm.org/D15204
llvm-svn: 255375
Summary: The Hexagon ABI plugin uses hardcoded registers when setting up function calls. This is OK for the Hexagon simulator, but the register numbers are different on the gdbserver running on hardware. Change the hardcoded registers to LLDB generic registers.
Reviewers: clayborg
Subscribers: lldb-commits
Differential Revision: http://reviews.llvm.org/D15457
llvm-svn: 255374
There are going to be two more patches which bring this feature up to date and in line with OpenMP 4.5.
* Renamed jump tables for the lock functions (and some clean up).
* Renamed some macros to be in KMP_ namespace.
* Return type of unset functions changed from void to int.
* Enabled use of _xebgin() et al. intrinsics for accessing TSX instructions.
Differential Revision: http://reviews.llvm.org/D15199
llvm-svn: 255373
(This is part-2 of the patch -- fixing test cases)
Before the patch, -fprofile-instr-generate compile will fail
if no integrated-as is specified when the file contains
any static functions (the -S output is also invalid).
This patch fixed the issue. With the change, the index format
version will be bumped up by 1. Backward compatibility is
preserved with this change.
Differential Revision: http://reviews.llvm.org/D15243
llvm-svn: 255366
Before the patch, -fprofile-instr-generate compile will fail
if no integrated-as is specified when the file contains
any static functions (the -S output is also invalid).
This patch fixed the issue. With the change, the index format
version will be bumped up by 1. Backward compatibility is
preserved with this change.
Differential Revision: http://reviews.llvm.org/D15243
llvm-svn: 255365
computeRegisterLiveness() was broken in that it reported dead for a
register even if a subregister was alive. I assume this was because the
results of analayzePhysRegs() are hard to understand with respect to
subregisters.
This commit: Changes the results of analyzePhysRegs (=struct
PhysRegInfo) to be clearly understandable, also renames the fields to
avoid silent breakage of third-party code (and improve the grammar).
Fix all (two) users of computeRegisterLiveness() in llvm: By reenabling
it and removing workarounds for the bug.
This fixes http://llvm.org/PR24535 and http://llvm.org/PR25033
Differential Revision: http://reviews.llvm.org/D15320
llvm-svn: 255362
These are redundant pairs of nodes defined for
INSERT_VECTOR_ELEMENT/EXTRACT_VECTOR_ELEMENT.
insertelement/extractelement are slightly closer to the corresponding
C++ node name, and has stricter type checking so prefer it.
Update targets to only use these nodes where it is trivial to do so.
AArch64, ARM, and Mips all have various type errors on simple replacement,
so they will need work to fix.
Example from AArch64:
def : Pat<(sext_inreg (vector_extract (v16i8 V128:$Rn), VectorIndexB:$idx), i8),
(i32 (SMOVvi8to32 V128:$Rn, VectorIndexB:$idx))>;
Which is trying to do sext_inreg i8, i8.
llvm-svn: 255359
Summary:
ADJCALLSTACK{DOWN,UP} (aka CALLSEQ_{START,END}) MIs are supposed to use
and def the stack pointer. Since they do not, all the nodes are being
eliminated by DeadMachineInstructionElim, so they aren't in the IR when
PrologEpilogInserter/eliminateCallFramePseudo needs them.
This change fixes that, but since RegStackify will not stackify across
them (and it runs early, before PEI), change LowerCall to only emit them
when the call frame size is > 0. That makes the current code work the
same way and makes code handled by D15344 also work the same way. We can
expand the condition beyond NumBytes > 0 in the future if needed.
Reviewers: sunfish, jfb
Subscribers: jfb, dschuff, llvm-commits
Differential Revision: http://reviews.llvm.org/D15459
llvm-svn: 255356
Revert "[DSE] Disable non-local DSE to see if the bots go green."
Revert "[DeadStoreElimination] Use range-based loops. NFC."
Revert "[DeadStoreElimination] Add support for non-local DSE."
llvm-svn: 255354
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.
We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.
Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.
We add CSRsViaCopy, it will be explicitly handled during lowering.
1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
supports it for the given calling convention and the function has only return
exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
virtual registers at beginning of the entry block and copies from virtual
registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.
rdar://problem/23557469
Differential Revision: http://reviews.llvm.org/D15340
llvm-svn: 255353
This will be used in a future change to support rerunning flakey tests
that hit a test result isue in a low-load, single worker test runner phase.
This is implemented as an additive-style event rather than being
evaluated and added to the start_test event because the decorator code
only runs after the start_test event is created and sent. i.e.
LLDBTestResult.startTest() runs before the test method decorators run.
llvm-svn: 255351
Quoting from the comment added to the code:
// Objective-C on i386 uses artificial absolute symbols to
// perform some link time checks. Those symbols have a fixed 0
// address that might conflict with real symbols in the object
// file. As I cannot see a way for absolute symbols to find
// their way into the debug information, let's just ignore those.
llvm-svn: 255350
It is reasonable to specify an entry point for shared objects - for
example, for the FreeBSD rtld ld-elf.so.1.
Unlike GNU ld we leave the entry address as 0 if -shared is specified
without -e.
Differential Revision: http://reviews.llvm.org/D15454
llvm-svn: 255349
GlobalsAA's assumptions that passes do not escape globals not previously
escaped is not violated by AlignmentFromAssumptions and SLPVectorizer. Marking
them as such allows GlobalsAA to be preserved until GVN in the LTO pipeline.
http://lists.llvm.org/pipermail/llvm-dev/2015-December/092972.html
Patch by Vaivaswatha Nagaraj!
llvm-svn: 255348
address space unless address space is explicitly specified.
Correct the behavior of NULL constant detection -
generic AS void pointer should be accepted as a valid NULL constant.
http://reviews.llvm.org/D15293
llvm-svn: 255346
This patch enables the safestack for aarch64. The frontend already have
it enabled on all supported architectures and no adjustment is required
in llvm.
The compiler-rt adjustments are basically add on the cmake configuration
to enable the tests and fix the pagesize debug check by getting its
value at runtime (since aarch64 has multiple pagesize depending of
kernel configuration).
llvm-svn: 255345
AsmWriterEmitter will generate a getRegisterName function with an alternate
register name index as its second argument if the target makes use of them. The
enum of these values is generated in RegisterInfoEmitter. The getRegisterName
generator would assume the namespace could always be found by reading index 1
of the list of AltNameIndices, but this will fail if this list is sorted such
that the NoRegAltName is at index 1. Because this list is sorted by record name
(in CodeGenTarget::ReadRegAltNameIndices), you only run in to problems if your
MyTargetRegisterInfo.td defines a single RegAltNameIndex that sorts lexically
before NoRegAltName.
For example, if a target has something like
def AnAltNameIndex : RegAltNameIndex
and defines RegAltNameIndices for some registers then, prior to this change,
AsmWriterEmitter would generate references to
::AnAltNameIndex and ::NoRegAltName
Patch by Alex Bradbury!
llvm-svn: 255344
LLDB don't detect the loading of a shared object file linked against the
main executable before the static initializers are executed for the
given module. Because of this it is not possible to get breakpoint hits
in these static initializers and to display proper debug info in case of
a crash in these codes.
llvm-svn: 255342