Commit Graph

2826 Commits

Author SHA1 Message Date
Amara Emerson 7a4d2df04a [AArch64][GlobalISel] Optimize compare and branch cases with G_INTTOPTR and unknown values.
Since we have distinct types for pointers and scalars, G_INTTOPTRs can sometimes
obstruct attempts to find constant source values. These usually come about when
we try to do some kind of null pointer check. Teaching getConstantVRegValWithLookThrough
about this operation allows the CBZ/CBNZ optimization to catch more cases.

This change also improves the case where we can't find a constant source at all.
Previously we would emit a cmp, cset and tbnz for that. Now we try to just emit
a cmp and conditional branch, saving an instruction.
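
A minimal sketch of the look-through idea (simplified; the real logic lives in
getConstantVRegValWithLookThrough and handles more opcodes than this):

```cpp
// Walk the def chain, treating G_INTTOPTR as a transparent cast,
// until a G_CONSTANT (or an unknown producer) is reached.
while (MachineInstr *Def = MRI.getVRegDef(VReg)) {
  if (Def->getOpcode() == TargetOpcode::G_INTTOPTR) {
    VReg = Def->getOperand(1).getReg(); // look through the cast
    continue;
  }
  if (Def->getOpcode() == TargetOpcode::G_CONSTANT)
    return Def->getOperand(1).getCImm()->getValue(); // found it
  break; // unknown producer; no constant to report
}
```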

The cumulative code size improvement of this change plus D64354 is 5.5% geomean
on arm64 CTMark -O0.

Differential Revision: https://reviews.llvm.org/D64377

llvm-svn: 365690
2019-07-10 19:21:43 +00:00
Jessica Paquette 7c95925b13 [GlobalISel][AArch64] Use getOpcodeDef instead of findMIFromReg
Some minor cleanup.

The `getOpcodeDef` function in Utils does the same thing as `findMIFromReg`, except
that it also looks through copies, which `findMIFromReg` didn't.

Delete `findMIFromReg` and use `getOpcodeDef` instead. This only happens in
`tryOptVectorDup` right now.

Update opt-shuffle-splat to show that we can look through the copies now, too.
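
Roughly, a call site now looks like this (a hedged sketch; `SrcReg` is illustrative):

```cpp
// getOpcodeDef walks through COPYs before testing the opcode, and
// returns the defining MachineInstr or nullptr on a mismatch.
MachineInstr *BuildVec =
    getOpcodeDef(TargetOpcode::G_BUILD_VECTOR, SrcReg, MRI);
if (!BuildVec)
  return false; // not fed by a G_BUILD_VECTOR, even through copies
```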

Differential Revision: https://reviews.llvm.org/D64520

llvm-svn: 365684
2019-07-10 18:46:56 +00:00
Jessica Paquette 3132968ae9 [GlobalISel][AArch64][NFC] Use getDefIgnoringCopies from Utils where we can
There are a few places where we walk over copies throughout
AArch64InstructionSelector.cpp. In Utils, there's a function that does exactly
this which we can use instead.

Note that the utility function handles the case where we run into a COPY
from a physical register. We've run into bugs with this a couple of times, so
using it should defend us from similar bugs in the future.
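
A hedged sketch of the defensive pattern this enables (`Reg` is illustrative,
and the null result on a physical-register copy is the assumption described above):

```cpp
// getDefIgnoringCopies walks up through COPYs; if the chain ends in a
// copy from a physical register there is no vreg def to return.
MachineInstr *Def = getDefIgnoringCopies(Reg, MRI);
if (!Def)
  return false; // e.g. the value came straight from a physreg
```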

Also update opt-fold-compare.mir to show that we still handle physical registers
properly.

Differential Revision: https://reviews.llvm.org/D64513

llvm-svn: 365683
2019-07-10 18:44:57 +00:00
Nick Desaulniers 8728e45706 [TargetLowering] support BlockAddress as "i" inline asm constraint
Summary:
This allows passing address of labels to inline assembly "i" input
constraints.

Fixes pr/42502.
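
A rough example of what this enables (a sketch using the GNU labels-as-values
extension; not taken from the patch's tests):

```cpp
// The address of a local label can now feed an "i" (immediate)
// inline asm input constraint.
void f(void) {
  asm volatile("// %0 is a link-time constant here" : : "i"(&&resume));
resume:;
}
```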

Reviewers: ostannard

Reviewed By: ostannard

Subscribers: void, echristo, nathanchance, ostannard, javed.absar, hiraditya, llvm-commits, srhines

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64167

llvm-svn: 365664
2019-07-10 17:08:25 +00:00
Matt Arsenault e595a2c964 GlobalISel: Define the full family of FP min/max instructions
llvm-svn: 365657
2019-07-10 16:31:15 +00:00
Peter Collingbourne 1366262b74 hwasan: Improve precision of checks using short granule tags.
A short granule is a granule of size between 1 and `TG-1` bytes. The size
of a short granule is stored at the location in shadow memory where the
granule's tag is normally stored, while the granule's actual tag is stored
in the last byte of the granule. This means that in order to verify that a
pointer tag matches a memory tag, HWASAN must check for two possibilities:

* the pointer tag is equal to the memory tag in shadow memory, or
* the shadow memory tag is actually a short granule size, the value being loaded
  is in bounds of the granule and the pointer tag is equal to the last byte of
  the granule.
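
In pseudocode, the two possibilities above combine into roughly this check
(a sketch assuming TG = 16; the names are illustrative, not the runtime's):

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t kTG = 16; // granule size assumed for illustration

bool tagCheckPasses(uint8_t ptrTag, uint8_t shadowTag,
                    const uint8_t *granule, size_t offsetInGranule) {
  if (ptrTag == shadowTag)
    return true; // ordinary full-tag match
  // Short granule: the shadow byte holds the granule's size (1..15)
  // and the granule's real tag sits in its last byte.
  return shadowTag >= 1 && shadowTag < kTG &&
         offsetInGranule < shadowTag && ptrTag == granule[kTG - 1];
}
```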

Pointer tags between 1 and `TG-1` are possible and are as likely as any other
tag. This means that these tags in memory have two interpretations: the full
tag interpretation (where the pointer tag is between 1 and `TG-1` and the
last byte of the granule is ordinary data) and the short tag interpretation
(where the pointer tag is stored in the granule).

When HWASAN detects an error near a memory tag between 1 and `TG-1`, it
will show both the memory tag and the last byte of the granule. Currently,
it is up to the user to disambiguate the two possibilities.

Because this functionality obsoletes the right-aligned heap feature of
the HWASAN memory allocator (and because we can no longer easily test
it), the feature is removed.

Also update the documentation to cover both short granule tags and
outlined checks.

Differential Revision: https://reviews.llvm.org/D63908

llvm-svn: 365551
2019-07-09 20:22:36 +00:00
Amara Emerson 6616e269a6 [AArch64][GlobalISel] Optimize conditional branches followed by unconditional branches
If we have an icmp->brcond->br sequence where the brcond just branches to the
next block jumping over the br, while the br takes the false edge, then we can
modify the conditional branch to jump to the br's target while inverting the
condition of the incoming icmp. The br then becomes an unconditional branch
to the fallthrough block and can be eliminated.
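
Schematically (block and value names are made up):

```cpp
// Before:                         After:
//   bb0: %c = G_ICMP eq ...        bb0: %c = G_ICMP ne ...  ; inverted
//        G_BRCOND %c, %bb1              G_BRCOND %c, %bb2
//        G_BR %bb2                      ; falls through to bb1
//   bb1: ...                       bb1: ...
```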

Differential Revision: https://reviews.llvm.org/D64354

llvm-svn: 365510
2019-07-09 16:05:59 +00:00
Jessica Paquette 55d19247ef [AArch64][GlobalISel] Use TST for comparisons when possible
Porting over the part of `emitComparison` in AArch64ISelLowering where we use
TST to represent a compare.

- Rename `tryOptCMN` to `tryFoldIntegerCompare`, since it now also emits TSTs
  when possible.

- Add a utility function for emitting a TST with register operands.

- Rename opt-fold-cmn.mir to opt-fold-compare.mir, since it now tests the
  TST fold as well.
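
At the source level, this is the kind of pattern that benefits (illustrative;
TST is an ANDS that discards its result):

```cpp
// A compare of an AND against zero becomes a single TST instead of
// an AND followed by a CMP.
bool anyBitsSet(unsigned long x, unsigned long mask) {
  return (x & mask) != 0; // -> tst x0, x1 ; cset w0, ne
}
```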

Differential Revision: https://reviews.llvm.org/D64371

llvm-svn: 365404
2019-07-08 22:58:36 +00:00
Jessica Paquette a99cfeea44 [GlobalISel][AArch64] Use getConstantVRegValWithLookThrough for selectArithImmed
Instead of just checking whether we directly have a G_CONSTANT, look through
G_TRUNCs, G_SEXTs, and G_ZEXTs.

This gives an average ~1.3% code size improvement on CINT2000 at -O3.

Differential Revision: https://reviews.llvm.org/D64108

llvm-svn: 365063
2019-07-03 17:46:23 +00:00
Roman Lebedev c4b83a6054 [Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457)
Summary:
This is the backend part of PR42457 (https://bugs.llvm.org/show_bug.cgi?id=42457).
In the middle-end, we'd want to prefer the form with two adds (D63992),
but as this diff shows, not every target will prefer that pattern.

Out of the 4 targets for which I added tests, all seem to be OK with inc-of-add
for scalars, but only X86 prefers that same pattern for vectors.

Here I'm adding a new TLI hook, always defaulting to the inc-of-add,
but adding AArch64, ARM and PowerPC overrides to prefer inc-of-add only for scalars.
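
The two forms in question, which are equivalent because ~b == -b - 1 in
two's complement:

```cpp
// "inc-of-add" vs "sub-of-not": both compute a + b + 1.
unsigned incOfAdd(unsigned a, unsigned b) { return (a + b) + 1; }
unsigned subOfNot(unsigned a, unsigned b) { return a - ~b; }
```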

Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel

Reviewed By: efriedma

Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D64090

llvm-svn: 365010
2019-07-03 09:41:35 +00:00
Amara Emerson cac1151845 [AArch64][GlobalISel] Overhaul legalization & isel of shifts to select immediate forms.
There are two main issues preventing us from generating immediate form shifts:
1) We have partially imported SelectionDAG support for the G_ASHR and G_LSHR
immediate forms, but they currently don't work because the amount type is
expected to be an s64 constant, while we only legalize the shifts to have
homogeneous types.

To deal with this, we first introduce a custom legalizer that *only* custom
legalizes s32 shifts with a constant amount operand, widening that amount to s64.

There is also an additional artifact combiner to fold zexts(g_constant) to a
larger G_CONSTANT if it's legal, a counterpart to the anyext version committed
in an earlier patch.

2) For G_SHL the importer can't cope with the pattern. For this I introduced an
early selection phase in the arm64 selector to select these forms manually
before the tablegen selector pessimizes them to a register-register variant.
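
The payoff, sketched at the source level (the assembly in the comments is
illustrative):

```cpp
// With the immediate form selected, a constant shift amount no longer
// has to be materialized into a register first.
unsigned shiftLeft3(unsigned x) {
  return x << 3; // before: mov w8, #3 ; lsl w0, w0, w8
                 // after:  lsl w0, w0, #3
}
```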

Differential Revision: https://reviews.llvm.org/D63910

llvm-svn: 364994
2019-07-03 01:49:06 +00:00
Jessica Paquette 99316043bb [AArch64][GlobalISel] Teach tryOptSelect to handle G_ICMP
This teaches `tryOptSelect` to handle folding G_ICMP, and removes the
requirement that the G_SELECT we're dealing with is floating point.

Some refactoring to make this work nicely as well:

- Factor out the scalar case from the selection code for G_ICMP into
  `emitIntegerCompare`.
- Make `tryOptCMN` return a MachineInstr* instead of a bool.
- Make `tryOptCMN` not modify the instruction being selected.
- Factor out the CMN emission into `emitCMN` for readability.

By doing it this way, we get all of the compare selection optimizations in
select emission.

Differential Revision: https://reviews.llvm.org/D64084

llvm-svn: 364961
2019-07-02 19:44:16 +00:00
Roman Lebedev 059f495831 [NFC][Codegen][X86][AArch64][ARM][PowerPC] Recommit: Add test coverage for "add-of-inc" vs "sub-of-not"
I initially committed it with --check-prefix instead of --check-prefixes
(again, shame on me, and utils/update_*.py not complaining!)
and did not have a moment to understand the failure,
so I initially reverted it in rL364939.

llvm-svn: 364945
2019-07-02 16:48:49 +00:00
Roman Lebedev 893bbc9001 Revert "[NFC][Codegen][X86][AArch64][ARM][PowerPC] Add test coverage for "add-of-inc" vs "sub-of-not""
Some test failures I don't have a moment to investigate.

This reverts commit r364930.

llvm-svn: 364939
2019-07-02 15:54:24 +00:00
Roman Lebedev 39639261cc [NFC][Codegen][X86][AArch64][ARM][PowerPC] Add test coverage for "add-of-inc" vs "sub-of-not"
As pointed out in https://reviews.llvm.org/D63992,
before we get to pick the canonical variant in the middle-end
we should ensure the best codegen in the backend.

llvm-svn: 364930
2019-07-02 14:48:52 +00:00
Matt Arsenault ce690544a6 GlobalISel: Add G_FENCE
The pattern importer is for some reason emitting checks for G_CONSTANT
for the immediate operands.

llvm-svn: 364926
2019-07-02 14:16:39 +00:00
Amara Emerson 000ef2c2ae [TailDuplicator] Fix copy instruction emitting into the wrong block.
The code for duplicating instructions could sometimes try to emit copies
intended to deal with unconstrainable register classes to the tail block of the
original instruction, rather than before the newly cloned instruction in the
predecessor block.

This was exposed by GlobalISel on arm64.

Differential Revision: https://reviews.llvm.org/D64049

llvm-svn: 364888
2019-07-02 06:04:46 +00:00
Roman Lebedev 0b8b419537 [NFC][Codegen] Revisit test coverage for X % C == 0 fold once more (add tests with '1' divisor)
llvm-svn: 364661
2019-06-28 17:26:28 +00:00
Roman Lebedev 9af4474253 [NFC][Codegen] Revisit test coverage for X % C == 0 fold
llvm-svn: 364642
2019-06-28 11:36:34 +00:00
Amara Emerson ecb7ac35f9 [GlobalISel][IRTranslator] Fix some PHI bugs related to jump tables when optimizations are used.
The new switch lowering code that tries to generate jump tables and range checks
was tested at -O0 on arm64, but at -O3 the generic switch lowering code goes to
town trying to generate optimized lowerings, e.g. multiple jump tables, range
checks etc. This exposed bugs in the way PHI nodes are handled, because the CFG
looks even stranger after all of this is done.

llvm-svn: 364613
2019-06-27 23:56:34 +00:00
Roman Lebedev 29d05c005f [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 3)
Summary:
I'm submitting a new revision since I don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in the "Add Action" menu...

This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.
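
A concrete instance of the trick (a sketch for the odd divisor 3; 0xAAAAAAAB is
the multiplicative inverse of 3 modulo 2^32):

```cpp
#include <cstdint>

// X % 3 == 0 without computing the remainder: under multiplication by
// 3's inverse, exactly the multiples of 3 land in [0, (2^32 - 1) / 3].
bool divisibleBy3(uint32_t x) {
  return x * 0xAAAAAAABu <= 0x55555555u; // 0x55555555 = (2^32 - 1) / 3
}
```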

This is a recommit; the original commit rL364563 was reverted in rL364568
because the test-suite detected a miscompile - the new comparison constant 'Q'
was being computed incorrectly (we divided by `D0` instead of `D`).

Original patch D50222 by @hermord (Dmytro Shynkevych)

Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
  This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
  the `X % C == 0` optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
  I haven't managed to find a better way to avoid generating worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` test regresses, and is being fixed by the follow-up patch D63390.

Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: dexonsmith, kristina, xbolva00, javed.absar, llvm-commits, hermord

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63391

llvm-svn: 364600
2019-06-27 21:52:10 +00:00
Roman Lebedev bd34e50cf0 [NFC][CodeGen] Add negative test for X u% C == 0 fold (D63391)
The fold (D63391) uses multiplicativeInverse(),
but it is not guaranteed to always succeed (an even divisor such as '100'
has no multiplicative inverse modulo 2^N), and '100' appears to be one of
the problematic values.

llvm-svn: 364578
2019-06-27 19:09:51 +00:00
Roman Lebedev 0a2b7b79fa Revert "[CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)"
*Appears* to break test-suite on
http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/23790

FAIL: burg.execution_time
FAIL: spiff.execution_time
FAIL: employ.execution_time
FAIL: llu.execution_time
FAIL: gramschmidt.execution_time
FAIL: fdtd-apml.execution_time

This reverts commit r364563.

llvm-svn: 364568
2019-06-27 17:22:31 +00:00
Roman Lebedev 0627b09863 [CodeGen] [SelectionDAG] More efficient code for X % C == 0 (UREM case) (try 2)
Summary:
I'm submitting a new revision since I don't understand how to reclaim/reopen/take over the existing one, D50222.
There is no such action in the "Add Action" menu...
Original patch D50222 by @hermord (Dmytro Shynkevych).

This implements an optimization described in Hacker's Delight 10-17: when `C` is constant,
the result of `X % C == 0` can be computed more cheaply without actually calculating the remainder.
The motivation is discussed here: https://bugs.llvm.org/show_bug.cgi?id=35479.

Notes:
- In principle, it's possible to also handle the `X % C1 == C2` case, as discussed on bugzilla.
  This seems to require an extra branch on overflow, so I refrained from implementing this for now.
- An explicit check for when the `REM` can be reduced to just its LHS is included:
  the `X % C == 0` optimization breaks `test1` in `test/CodeGen/X86/jump_sign.ll` otherwise.
  I haven't managed to find a better way to avoid generating worse output in this case.
- The `test/CodeGen/X86/jump_sign.ll` test regresses, and is being fixed by the follow-up patch D63390.

Reviewers: RKSimon, craig.topper, spatel, hermord, xbolva00

Reviewed By: RKSimon, xbolva00

Subscribers: xbolva00, javed.absar, llvm-commits, hermord

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D63391

llvm-svn: 364563
2019-06-27 16:45:42 +00:00
Diana Picus 43fb5ae50c [GlobalISel] Accept multiple vregs for lowerCall's args
Change the interface of CallLowering::lowerCall to accept several
virtual registers for each argument, instead of just one.  This is a
follow-up to D46018.

CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.

With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.
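
The interface change, roughly (a hedged simplification of CallLowering's
ArgInfo, with fields trimmed):

```cpp
// Each argument now carries one or more vregs (one per piece of a
// split aggregate) instead of exactly one register.
struct ArgInfo {
  SmallVector<Register, 4> Regs; // was: a single Register
  Type *Ty;                      // IR type of the argument
  // ... flags etc. omitted ...
};
```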

ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.

NFCI for AMDGPU, Mips and X86.

Differential Revision: https://reviews.llvm.org/D63551

llvm-svn: 364512
2019-06-27 09:18:03 +00:00
Diana Picus 8138996128 [GlobalISel] Accept multiple vregs for lowerCall's result
Change the interface of CallLowering::lowerCall to accept several
virtual registers for the call result, instead of just one.  This is a
follow-up to D46018.

CallLowering::lowerReturn was similarly refactored in D49660 and
lowerFormalArguments in D63549.

With this change, we no longer pack the virtual registers generated for
aggregates into one big lump before delegating to the target. Therefore,
the target can decide itself whether it wants to handle them as separate
pieces or use one big register.

ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.

NFCI for AMDGPU, Mips and X86.

Differential Revision: https://reviews.llvm.org/D63550

llvm-svn: 364511
2019-06-27 09:15:53 +00:00
Diana Picus c3dbe23977 [GlobalISel] Accept multiple vregs in lowerFormalArgs
Change the interface of CallLowering::lowerFormalArguments to accept
several virtual registers for each formal argument, instead of just one.
This is a follow-up to D46018.

CallLowering::lowerReturn was similarly refactored in D49660. lowerCall
will be refactored in the same way in follow-up patches.

With this change, we forward the virtual registers generated for
aggregates to CallLowering. Therefore, the target can decide itself
whether it wants to handle them as separate pieces or use one big
register. We also copy the pack/unpackRegs helpers to CallLowering to
facilitate this.

ARM and AArch64 have been updated to use the passed in virtual registers
directly, which means we no longer need to generate so many
merge/extract instructions.

AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was
put into an s64 instead of a p0. Added a test-case which illustrates the
problem more clearly (it crashes without this patch) and fixed the
existing test-case to expect p0.

AMDGPU has been updated to unpack into the virtual registers for
kernels. I think the other code paths fall back for aggregates, so this
should be NFC.

Mips doesn't support aggregates yet, so it's also NFC.

x86 seems to have code for dealing with aggregates, but I couldn't find
the tests for it, so I just added a fallback to DAGISel if we get more
than one virtual register for an argument.

Differential Revision: https://reviews.llvm.org/D63549

llvm-svn: 364510
2019-06-27 08:54:17 +00:00
Evandro Menezes 42e13c8328 [CodeGen] Improve formatting of jump tables (NFC)
Split jump tables into individual lines and fix spacing.

llvm-svn: 364436
2019-06-26 15:11:31 +00:00
Clement Courbet 2851248fa1 Revert "r364412 [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline."
Breaks sanitizers:
    libFuzzer :: cxxstring.test
    libFuzzer :: memcmp.test
    libFuzzer :: recommended-dictionary.test
    libFuzzer :: strcmp.test
    libFuzzer :: value-profile-mem.test
    libFuzzer :: value-profile-strcmp.test

llvm-svn: 364416
2019-06-26 12:13:13 +00:00
Clement Courbet 7b3a5f0e6d [ExpandMemCmp][MergeICmps] Move passes out of CodeGen into opt pipeline.
This allows later passes (in particular InstCombine) to optimize more
cases.

Cases that are important to us include `memcmp(p, q, constant) < 0` and `memcmp(p, q, constant) > 0`.
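
For instance, a sketch of the kind of code that benefits:

```cpp
#include <cstring>

// With the expansion now happening inside the opt pipeline,
// InstCombine gets to simplify the expanded form of the comparison.
bool lessThan16(const void *p, const void *q) {
  return std::memcmp(p, q, 16) < 0;
}
```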

llvm-svn: 364412
2019-06-26 11:50:18 +00:00
Sanjay Patel 0baacea2c7 [AArch64][x86] add tests for ctpop != 1; NFC
This is the inverted predicate pattern for D63004.

llvm-svn: 364314
2019-06-25 13:37:16 +00:00
Simon Pilgrim fd7d0d4e3f [AArch64] Regenerate vcvt tests. NFCI.
Prep work for an upcoming patch

llvm-svn: 364205
2019-06-24 17:18:20 +00:00
Simon Pilgrim de1ce8230d [AArch64] Regenerate 2velem tests. NFCI.
Prep work for an upcoming patch

llvm-svn: 364204
2019-06-24 16:58:19 +00:00
Simon Pilgrim 0f0bbbd4bb [AArch64] Regenerate merge-store tests. NFCI.
Prep work for an upcoming patch

llvm-svn: 364203
2019-06-24 16:57:12 +00:00
Peter Collingbourne 8cd780b432 AArch64: Add support for reading pc using llvm.read_register.
This is useful for allowing code to efficiently take an address
that can be later mapped onto debug info. Currently the hwasan
pass achieves this by taking the address of the current function:
http://llvm-cs.pcc.me.uk/lib/Transforms/Instrumentation/HWAddressSanitizer.cpp#921

but this costs two instructions (plus a GOT entry in PIC code) per function
with stack variables. This will allow the cost to be reduced to a single
instruction.

Differential Revision: https://reviews.llvm.org/D63471

llvm-svn: 364126
2019-06-22 03:03:25 +00:00
Peter Collingbourne 4608868d2f AArch64: Prefer FP-relative debug locations in HWASANified functions.
To help produce better diagnostics for stack use-after-return, we'd like
to be able to determine the addresses of each HWASANified function's local
variables given a small amount of information recorded on entry to the
function. Currently we require all HWASANified functions to use frame pointers
and record (PC, FP) on function entry. This works better than recording SP
because FP cannot change during the function, unlike SP which can change
e.g. due to dynamic alloca.

However, most variables currently end up using SP-relative locations in their
debug info. This prevents us from recomputing the address of most variables
because the distance between SP and FP isn't recorded in the debug info. To
address this, make the AArch64 backend prefer FP-relative debug locations
when producing debug info for HWASANified functions.

Differential Revision: https://reviews.llvm.org/D63300

llvm-svn: 364117
2019-06-22 00:06:51 +00:00
Tom Tan 7ecb5145ba [COFF, ARM64] Fix encoding of debugtrap for Windows
On Windows ARM64, the intrinsic __debugbreak is compiled into brk #0xF000, which
is mapped to llvm.debugtrap in Clang. The instruction brk #0xF000 is the defined
breakpoint instruction on ARM64, recognized by the Windows debugger and
exception handling code, so llvm.debugtrap should map to it instead of
redirecting to llvm.trap (brk #1) as in the default implementation.
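
For example (illustrative; __debugbreak is the MSVC intrinsic mentioned above):

```cpp
#include <intrin.h> // assumed here for __debugbreak when not built in

// On Windows ARM64 this now emits "brk #0xF000" rather than the
// generic "brk #1" used by llvm.trap.
void stopForDebugger() {
  __debugbreak();
}
```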

Differential Revision: https://reviews.llvm.org/D63635

llvm-svn: 364115
2019-06-21 23:38:05 +00:00
Amara Emerson 6e71b34fe6 [AArch64][GlobalISel] Implement selection support for the new G_JUMP_TABLE and G_BRJT ops.
With this we can now fully code generate jump tables, which is important for code size.

Differential Revision: https://reviews.llvm.org/D63223

llvm-svn: 364086
2019-06-21 18:10:41 +00:00
Amara Emerson fe4625fb24 [GlobalISel][IRTranslator] Change switch table translation to generate jump tables and range checks.
This change makes use of the newly refactored SwitchLoweringUtils code from
SelectionDAG in order to generate jump tables and range checks where appropriate.

Much of this code is ported from SDAG with some modifications. We generate
G_JUMP_TABLE and G_BRJT instructions when JT opportunities are found. This means
that targets which previously relied on the naive one MBB per case stmt
translation will now start falling back until they add support for the new opcodes.
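
For example, a dense switch like the following (illustrative) now becomes a
G_JUMP_TABLE/G_BRJT pair rather than one comparison block per case:

```cpp
int dispatch(int v) {
  switch (v) { // dense case values -> jump table
  case 0: return 10;
  case 1: return 11;
  case 2: return 12;
  case 3: return 13;
  default: return -1;
  }
}
```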

For range checks, we don't generate any previously unused operations. This
just recognizes contiguous ranges of case values and generates a single block per
range. Single case value blocks are just a special case of ranges so we get that
support almost for free.

There are still some optimizations missing that I haven't ported over, and
bit-tests are also unimplemented. This patch series is already complex enough.

Actual arm64 support for selection of jump tables is coming in a later patch.

Differential Revision: https://reviews.llvm.org/D63169

llvm-svn: 364085
2019-06-21 18:10:38 +00:00
Amara Emerson 8f25a021dd [AArch64][GlobalISel] Make s8 and s16 G_CONSTANTs legal.
We sometimes get poor code size because constants of types < 32b are legalized
as 32 bit G_CONSTANTs with a truncate to fit. This works but means that the
localizer can no longer sink them (although it's possible to extend it to do so).

On AArch64 however s8 and s16 constants can be selected in the same way as s32
constants, with a mov pseudo into a W register. If we make s8 and s16 constants
legal then we can avoid unnecessary truncates, they can be CSE'd, and the
localizer can sink them as normal.

There is a caveat: if the user of a smaller constant has to widen the sources,
we end up with an anyext of the smaller typed G_CONSTANT. This can cause
regressions because of the additional extend and missed pattern matching. To
remedy this, there's a new artifact combiner to generate the wider G_CONSTANT
if it's legal for the target.
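
Schematically, the new combine does the following (generic MIR shown as
comments; register names made up):

```cpp
// %c:_(s8)  = G_CONSTANT i8 42
// %w:_(s32) = G_ANYEXT %c(s8)
//   ==>   (when a wider G_CONSTANT is legal)
// %w:_(s32) = G_CONSTANT i32 42
```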

Differential Revision: https://reviews.llvm.org/D63587

llvm-svn: 364075
2019-06-21 16:43:50 +00:00
Amara Emerson bc0d08e0ee [GlobalISel][Localizer] Allow localization of G_INTTOPTR and chains of instructions.
G_INTTOPTR can prevent the localizer from moving G_CONSTANTs, but since it's
essentially a side effect free cast instruction we can remat both instructions.
This patch changes the localizer to enable localization of the chains by
iterating over the entry block instructions in reverse order. That way, uses will
be localized first, and then the defs are free to be localized as well.
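
A hedged sketch of the traversal (the helper names are made up):

```cpp
// Walking the entry block backwards localizes the G_INTTOPTR user
// before the G_CONSTANT feeding it, so the constant's remaining uses
// move out of the way and it can be localized in turn.
for (MachineInstr &MI : llvm::reverse(EntryMBB)) {
  if (shouldLocalize(MI)) // hypothetical predicate
    localize(MI);         // hypothetical helper
}
```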

This also changes the previous SmallPtrSet of localized instructions to use a
SetVector instead. We're dealing with pointers and need deterministic iteration
order.

Overall, this change improves ARM64 -O0 CTMark code size by around 0.7% geomean.

Differential Revision: https://reviews.llvm.org/D63630

llvm-svn: 364001
2019-06-21 00:36:19 +00:00
Evandro Menezes aa10f05044 [CodeGen] Fix formatting and comments (NFC)
llvm-svn: 363947
2019-06-20 16:34:00 +00:00
Peter Collingbourne 2742eeb78e hwasan: Shrink outlined checks by 1 instruction.
Turns out that we can save an instruction by folding the right shift into
the compare.

Differential Revision: https://reviews.llvm.org/D63568

llvm-svn: 363874
2019-06-19 20:40:03 +00:00
Evandro Menezes a7ed3a627b [AArch64] Improve jump tables testing (NFC)
Improve testing of the minimum and maximum sizes of jump tables.

llvm-svn: 363839
2019-06-19 16:59:34 +00:00
Evandro Menezes 54252b8243 [AArch64] Improve jump tables testing (NFC)
Improve testing of the minimum and maximum sizes of jump tables.

llvm-svn: 363837
2019-06-19 16:35:30 +00:00
Matt Arsenault 9cac4e6d14 Rename ExpandISelPseudo->FinalizeISel, delay register reservation
This allows targets to make more decisions about reserved registers
after isel. For example, it should now be certain whether there are calls or
stack objects in the frame, which could have been introduced by
legalization.

Patch by Matthias Braun

llvm-svn: 363757
2019-06-19 00:25:39 +00:00
Peter Collingbourne fb9ce100d1 hwasan: Add a tag_offset DWARF attribute to instrumented stack variables.
The goal is to improve hwasan's error reporting for stack use-after-return by
recording enough information to allow the specific variable that was accessed
to be identified based on the pointer's tag. Currently we record the PC and
lower bits of SP for each stack frame we create (which will eventually be
enough to derive the base tag used by the stack frame) but that's not enough
to determine the specific tag for each variable, which is the stack frame's
base tag XOR a value (the "tag offset") that is unique for each variable in
a function.
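
Conceptually (a sketch, not the runtime's actual code):

```cpp
#include <cstdint>

// A variable's tag is the stack frame's base tag XOR the per-variable
// "tag offset" that this patch records in DWARF.
uint8_t variableTag(uint8_t frameBaseTag, uint8_t tagOffset) {
  return frameBaseTag ^ tagOffset;
}
```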

In IR, the tag offset is most naturally represented as part of a location
expression on the llvm.dbg.declare instruction. However, the presence of the
tag offset in the variable's actual location expression is likely to confuse
debuggers which won't know about tag offsets, and moreover the tag offset
is not required for a debugger to determine the location of the variable on
the stack, so at the DWARF level it is represented as an attribute so that
it will be ignored by debuggers that don't know about it.

Differential Revision: https://reviews.llvm.org/D63119

llvm-svn: 363635
2019-06-17 23:39:41 +00:00
Amara Emerson 146882242f [GlobalISel][Localizer] Rewrite localizer to run in 2 phases, inter & intra block.
Inter-block localization is the same as what currently happens, except now it
only runs on the entry block because that's where the problematic constants with
long live ranges come from.

The second phase is a new intra-block localization phase which attempts to
re-sink the already-localized instructions further, to just before one of
their multiple uses.

One additional change is to also localize G_GLOBAL_VALUE as they're constants
too. However, on some targets like arm64 it takes multiple instructions to
materialize the value, so some additional heuristics with a TTI hook have been
introduced to attempt to prevent code size regressions when localizing these.

Overall, these changes improve CTMark code size on arm64 by 1.2%.

Full code size results:

Program                                         baseline       new       diff
------------------------------------------------------------------------------
 test-suite...-typeset/consumer-typeset.test    1249984      1217216     -2.6%
 test-suite...:: CTMark/ClamAV/clamscan.test    1264928      1232152     -2.6%
 test-suite :: CTMark/SPASS/SPASS.test          1394092      1361316     -2.4%
 test-suite...Mark/mafft/pairlocalalign.test    731320       714928      -2.2%
 test-suite :: CTMark/lencod/lencod.test        1340592      1324200     -1.2%
 test-suite :: CTMark/kimwitu++/kc.test         3853512      3820420     -0.9%
 test-suite :: CTMark/Bullet/bullet.test        3406036      3389652     -0.5%
 test-suite...ark/tramp3d-v4/tramp3d-v4.test    8017000      8016992     -0.0%
 test-suite...TMark/7zip/7zip-benchmark.test    2856588      2856588      0.0%
 test-suite...:: CTMark/sqlite3/sqlite3.test    765704       765704       0.0%
 Geomean difference                                                      -1.2%

Differential Revision: https://reviews.llvm.org/D63303

llvm-svn: 363632
2019-06-17 23:20:29 +00:00
Michael Berg f9bff2a55e Propagate fmf in IRTranslate for fneg
Summary: This case is related to D63405 in that we need to propagate FMF on negates.

Reviewers: volkan, spatel, arsenm

Reviewed By: arsenm

Subscribers: wdng, javed.absar

Differential Revision: https://reviews.llvm.org/D63458

llvm-svn: 363631
2019-06-17 23:19:40 +00:00
Volkan Keles 689509edab [test][AArch64] Relax the check line for G_BRJT in legalizer-info-validation.mir
Replace the specific number with a pattern to relax the test.

llvm-svn: 363621
2019-06-17 21:25:25 +00:00