Commit Graph

303636 Commits

Author SHA1 Message Date
Sanjay Patel 08c0a0ac58 [Hexagon] make test immune to improvements in undef simplification
llvm-svn: 347218
2018-11-19 15:34:09 +00:00
Sanjay Patel 60abc29b0a [x86] add/make tests immune to improvements in undef simplification
llvm-svn: 347217
2018-11-19 15:33:44 +00:00
Zachary Turner 58db03a116 Fix some issues with LLDB's lit configuration files.
Recently I tried to port LLDB's lit configuration files over to use a
on the surface, but broke some cases that weren't broken before and also
exposed some additional problems with the old approach that we were just
getting lucky with.

When we set up a lit environment, the goal is to make it as hermetic as
possible. We should not be relying on PATH and enabling the use of
arbitrary shell commands. Instead, only whitelisted commands should be
allowed. These are, generally speaking, the lit builtins such as echo,
cd, etc, as well as anything for which substitutions have been
explicitly set up for. These substitutions should map to the build
output directory, but in some cases it's useful to be able to override
this (for example to point to an installed tools directory).

This is, of course, how it's supposed to work. What was actually
happening is that we were bringing in PATH and LD_LIBRARY_PATH and then
just running the given run line as a shell command. This led to problems
such as finding the wrong version of clang-cl on PATH since it wasn't
even a substitution, and flakiness / non-determinism since the
environment the tests were running in would change per-machine. On the
other hand, it also made other things possible. For example, we had some
tests that were explicitly running cl.exe and link.exe instead of
clang-cl and lld-link and the only reason it worked at all is because it
was finding them on PATH. Unfortunately we can't entirely get rid of
these tests, because they support a few things in debug info that
clang-cl and lld-link don't (notably, the LF_UDT_MOD_SRC_LINE record
which makes some of the tests fail.

The high level changes introduced in this patch are:

1. Removal of functionality - The lit test suite no longer respects
   LLDB_TEST_C_COMPILER and LLDB_TEST_CXX_COMPILER. This means there is no
   more support for gcc, but nobody was using this anyway (note: The
   functionality is still there for the dotest suite, just not the lit test
   suite). There is no longer a single substitution %cxx and %cc which maps
   to <arbitrary-compiler>, you now explicitly specify the compiler with a
   substitution like %clang or %clangxx or %clang_cl. We can revisit this
   in the future when someone needs gcc.

2. Introduction of the LLDB_LIT_TOOLS_DIR directory. This does in spirit
   what LLDB_TEST_C_COMPILER and LLDB_TEST_CXX_COMPILER used to do, but now
   more friendly. If this is not specified, all tools are expected to be
   the just-built tools. If it is specified, the tools which are not
   themselves being tested but are being used to construct and run checks
   (e.g. clang, FileCheck, llvm-mc, etc) will be searched for in this
   directory first, then the build output directory.

3. Changes to core llvm lit files. The use_lld() and use_clang()
   functions were introduced long ago in anticipation of using them in
   lldb, but since they were never actually used anywhere but their
   respective problems, there were some issues to be resolved regarding
   generality and ability to use them outside their project.

4. Changes to .test files - These are all just replacing things like
   clang-cl with %clang_cl and %cxx with %clangxx, etc.

5. Changes to lit.cfg.py - Previously we would load up some system
   environment variables and then add some new things to them. Then do a
   bunch of work building out our own substitutions. First, we delete the
   system environment variable code, making the environment hermetic. Then,
   we refactor the substitution logic into two separate helper functions,
   one which sets up substitutions for the tools we want to test (which
   must come from the build output directory), and another which sets up
   substitutions for support tools (like compilers, etc).

6. New substitutions for MSVC -- Previously we relied on location of
   MSVC by bringing in the entire parent's PATH and letting
   subprocess.Popen just run the command line. Now we set up real
   substitutions that should have the same effect. We use PATH to find
   them, and then look for INCLUDE and LIB to construct a substitution
   command line with appropriate /I and /LIBPATH: arguments. The nice thing
   about this is that it opens the door to having separate %msvc-cl32 and
   %msvc-cl64 substitutions, rather than only requiring the user to run
   vcvars first. Because we can deduce the path to 32-bit libraries from
   64-bit library directories, and vice versa. Without these substitutions
   this would have been impossible.

Differential Revision: https://reviews.llvm.org/D54567

llvm-svn: 347216
2018-11-19 15:12:34 +00:00
Fedor Sergeev 3a3d688cc8 [LoopPass] fixing 'Modification' messages in -debug-pass=Executions for loop passes
Legacy loop pass manager is issuing "Made Modification" message after each Loop Pass
run, however condition for issuing it is accumulated among all the runs.
That leads to confusing 'modification' messages as soon as the first modification is done.

Changing condition to be "current pass made modifications", similar to how
it is being done in all other pass managers.

llvm-svn: 347215
2018-11-19 15:10:59 +00:00
Patrick Lyster 8f7f586e53 [OpenMP] Check target architecture supports unified shared memory for requires directive. Differential Review: https://reviews.llvm.org/D54493
llvm-svn: 347214
2018-11-19 15:09:33 +00:00
Zachary Turner 28d0131d40 Don't use -O in lit tests.
Because of different shell quoting rules, and the fact that LLDB
commands often contain spaces, -O is not portable for writing command
lines. Instead, we should use explicit lldbinit files.

Differential Revision: https://reviews.llvm.org/D54680

llvm-svn: 347213
2018-11-19 15:06:10 +00:00
Sanjay Patel a1dca3553e [SelectionDAG] simplify select FP with undef condition
llvm-svn: 347212
2018-11-19 14:42:28 +00:00
Sanjay Patel 7a51bdcf3b [x86] add test for select FP with undef condition; NFC
llvm-svn: 347211
2018-11-19 14:39:57 +00:00
Sanjay Patel c036d844be [SelectionDAG] add simplifySelect() to reduce code duplication; NFC
This should be extended to handle FP and vectors in follow-up patches.

llvm-svn: 347210
2018-11-19 14:35:22 +00:00
Clement Courbet bbab546a71 [llvm-exegesis][NFC] More tests for ExegesisTarget::fillMemoryOperands().
Reviewers: gchatelet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54304

llvm-svn: 347209
2018-11-19 14:31:43 +00:00
Martin Elshuber fef3036d37 Subject: [PATCH] [CodeGen] Add pass to combine interleaved loads.
This patch defines an interleaved-load-combine pass. The pass searches
for ShuffleVector instructions that represent interleaved loads. Matches are
converted such that they will be captured by the InterleavedAccessPass.

The pass extends LLVMs capabilities to use target specific instruction
selection of interleaved load patterns (e.g.: ld4 on Aarch64
architectures).

Differential Revision: https://reviews.llvm.org/D52653

llvm-svn: 347208
2018-11-19 14:26:10 +00:00
Eugene Leviant 0c7460ad05 [ThinLTO] Fix comment. NFC
llvm-svn: 347207
2018-11-19 14:19:37 +00:00
Sanjay Patel 3fa76bd08f [SelectionDAG] fix formatting; NFC
llvm-svn: 347206
2018-11-19 14:03:07 +00:00
Sam McCall e84385fef8 [FileManager] getFile(open=true) after getFile(open=false) should open the file.
Summary:
Old behavior is to just return the cached entry regardless of opened-ness.
That feels buggy (though I guess nobody ever actually needed this).

This came up in the context of clangd+clang-tidy integration: we're
going to getFile(open=false) to replay preprocessor actions obscured by
the preamble, but the compilation may subsequently getFile(open=true)
for non-preamble includes.

Reviewers: ilya-biryukov

Subscribers: ioeric, kadircet, cfe-commits

Differential Revision: https://reviews.llvm.org/D54691

llvm-svn: 347205
2018-11-19 13:37:46 +00:00
Roman Lebedev 71fdb57640 [llvm-exegesis] (+final perf overview) InstructionBenchmarkClustering::rangeQuery(): reserve for the upper bound of Neighbors
Summary:
As it was pointed out in D54388+D54390, the maximal size of `Neighbors` is known,
it will contain at most Points_.size() minus one (the center of the cluster)

While that is the upper bound, meaning in the most cases, the actual count
will be much smaller, since D54390 made the allocation persistent,
we no longer have to worry about overly-optimistically `reserve()`ing.

Old: (D54393)
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       6553.167456      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.21% )
...
            6.5547 +- 0.0134 seconds time elapsed  ( +-  0.20% )
```
New:
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       6315.057872      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.24% )
...
            6.3187 +- 0.0160 seconds time elapsed  ( +-  0.25% )
```
And that is another -~4%.


Since this is the last (as of this moment) patch in this patch series,
it is a good time to summarize:
Old: (svn trunk, as stated in D54381)
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null

real    0m24.884s
user    0m24.099s
sys     0m0.785s
```
So these patches, on a given benchmark,
has decreased llvm-exegesis analysis time by 74.62%.

There surely is more room for further improvements.
D54514 may improve thins by -11.5% more (relative to this patch).
Parallelization may improve things further significantly, too.


Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet, MaskRay

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54415

llvm-svn: 347204
2018-11-19 13:28:41 +00:00
Roman Lebedev 8e315b66c2 [llvm-exegesis] Move InstructionBenchmarkClustering::isNeighbour() into header
Summary:
Old: (D54390)
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       7432.421721      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.15% )
...
            7.4336 +- 0.0115 seconds time elapsed  ( +-  0.15% )
```
New:
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       6569.936144      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.22% )
...
            6.5711 +- 0.0143 seconds time elapsed  ( +-  0.22% )
```
And another -12%. You'd think it would be `inline`d anyway, but no! :)

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet, MaskRay

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54393

llvm-svn: 347203
2018-11-19 13:28:36 +00:00
Roman Lebedev 666d855fbb [llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): write into llvm::SmallVectorImpl& output parameter
Summary:
I do believe this is the correct fix.
We call `rangeQuery()` *very* often. And many times it's output vector is large (tens of thousands entries), so small-size-opt won't help.

Old: (D54389)
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       7934.528363      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.19% )
...
            7.9354 +- 0.0148 seconds time elapsed  ( +-  0.19% )
```
New:
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       7383.793440      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.47% )
...
            7.3868 +- 0.0340 seconds time elapsed  ( +-  0.46% )
```
And another -7%. And that isn't even the good bit yet.

Old:
* calls to allocation functions: 2081419
* temporary allocations: 219658 (10.55%)
* bytes allocated in total (ignoring deallocations): 4.31 GB

New:
* calls to allocation functions: 1880295 (-10%)
* temporary allocations: 18758 (1%) (-91% *sic*)
* bytes allocated in total (ignoring deallocations): 545.15 MB (-88% *sic*)

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet, MaskRay

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54390

llvm-svn: 347202
2018-11-19 13:28:31 +00:00
Roman Lebedev 5c5b1ea725 [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): replace std::vector<> with std::deque<> in llvm::SetVector<>
Summary:
Old: (D54388)
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       8606.323981      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.11% )
...
           8.60773 +- 0.00978 seconds time elapsed  ( +-  0.11% )
```
New:
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       7971.403653      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.14% )
...
            7.9728 +- 0.0113 seconds time elapsed  ( +-  0.14% )
```
Another -~7%.

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet, RKSimon

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54389

llvm-svn: 347201
2018-11-19 13:28:26 +00:00
Roman Lebedev 8aecb0c489 [llvm-exegesis] InstructionBenchmarkClustering::rangeQuery(): use llvm::SmallVector<size_t, 0> for storage.
Summary:
Old: (D54383)
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       9098.781978      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.16% )
...
            9.1015 +- 0.0148 seconds time elapsed  ( +-  0.16% )
```
New:
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (10 runs):

       8553.352480      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.12% )
...
            8.5539 +- 0.0105 seconds time elapsed  ( +-  0.12% )
```
So another -6%.
That is because the `SmallVector` **doubles** it size when reallocating, which is great here,
since we can't `reserve()` since we can't know how many `Neighbors` we will have.

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54388

llvm-svn: 347200
2018-11-19 13:28:22 +00:00
Roman Lebedev b311c1d6b8 [llvm-exegesis] Analysis: writeMeasurementValue(): don't alloc string for double each time.
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time: (D54382)
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       9024.354355      task-clock (msec)         #    1.000 CPUs utilized            ( +-  0.18% )
...
            9.0262 +- 0.0161 seconds time elapsed  ( +-  0.18% )
```
New time:
```
 Performance counter stats for './bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html' (16 runs):

       8996.541057      task-clock (msec)         #    0.999 CPUs utilized            ( +-  0.19% )
...
            9.0045 +- 0.0172 seconds time elapsed  ( +-  0.19% )
```
-~0.3%, not that much. But this isn't the important part.

Old:
* calls to allocation functions: 2109712
* temporary allocations: 33112
* bytes allocated in total (ignoring deallocations): 4.43 GB

New:
* calls to allocation functions: 2095345 (-0.68%)
* temporary allocations: 18745 (-43.39% !!!)
* bytes allocated in total (ignoring deallocations): 4.31 GB (-2.71%)

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54383

llvm-svn: 347199
2018-11-19 13:28:17 +00:00
Roman Lebedev f8b28e9bf4 [llvm-exegesis] Analysis::writeSnippet(): be smarter about memory allocations.
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time: (D54381)
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null

real    0m10.487s
user    0m9.745s
sys     0m0.740s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null

real    0m9.599s
user    0m8.824s
sys     0m0.772s

```
Not that much, around -9%. But that is not the good part yet, again.

Old:
* calls to allocation functions: 3347676
* temporary allocations: 277818
* bytes allocated in total (ignoring deallocations): 10.52 GB

New:
* calls to allocation functions: 2109712 (-36%)
* temporary allocations: 33112 (-88%)
* bytes allocated in total (ignoring deallocations): 4.43 GB (-58% *sic*)

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet, MaskRay

Subscribers: tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54382

llvm-svn: 347198
2018-11-19 13:28:14 +00:00
Roman Lebedev 0b4b512826 [llvm-exegesis] InstructionBenchmarkClustering::dbScan(): use llvm::SetVector<> instead of ILLEGAL std::unordered_set<>
Summary:
Test data: 500kLOC of benchmark.yaml, 23Mb. (that is a subset of the actual uops benchmark i was trying to analyze!)
Old time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null

real    0m24.884s
user    0m24.099s
sys     0m0.785s
```
New time:
```
$ time ./bin/llvm-exegesis -mode=analysis -analysis-epsilon=100000 -benchmarks-file=/tmp/benchmarks.yaml -analysis-inconsistencies-output-file=/tmp/clusters.html &> /dev/null

real    0m10.469s
user    0m9.797s
sys     0m0.672s
```
So -60%. And that isn't the good bit yet.

Old:
* calls to allocation functions: 106560180  (yes, 107 *million* allocations.)
* bytes allocated in total (ignoring deallocations): 12.17 GB

New:
* calls to allocation functions: 3347676  (-96.86%)  (just 3 mil)
* bytes allocated in total (ignoring deallocations): 10.52 GB (~2GB less)

---

Two points i want to raise:
* `std::unordered_set<>` should not have been used there in the first place.
  It is banned by the https://llvm.org/docs/ProgrammersManual.html#other-set-like-container-options
* There is no tests, so i'm not fully sure this is correct.
  Since it was unordered set, i guess there are zero restrictions on the order, and anything will be ok?
* I tried other containers suggested in https://llvm.org/docs/ProgrammersManual.html#set-like-containers-std-set-smallset-setvector-etc,
  this `llvm::SetVector<>` seems to be best here.

Reviewers: courbet, MaskRay, RKSimon, gchatelet, john.brawn

Reviewed By: courbet

Subscribers: kristina, bobsayshilol, tschuett, llvm-commits

Differential Revision: https://reviews.llvm.org/D54381

llvm-svn: 347197
2018-11-19 13:28:09 +00:00
Anastasia Stulova b879df24c9 Fixed uninitialized variable issue.
This commit should fix failing bots.

llvm-svn: 347196
2018-11-19 12:43:39 +00:00
Simon Pilgrim f6c2fbdd1a [X86] Add codegen tests for slow-shld scalar funnel shifts
llvm-svn: 347195
2018-11-19 12:29:41 +00:00
Michael Platings 49af748672 Test commit - delete trailing space.
llvm-svn: 347194
2018-11-19 12:16:05 +00:00
Michael Platings dad611c1a0 Test commit - delete a trailing space.
llvm-svn: 347193
2018-11-19 12:10:07 +00:00
Nicolai Haehnle c548d91419 AMDGPU/InsertWaitcnts: Some more const-correctness
Reviewers: msearles, rampitec, scott.linder, kanarayan

Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam

Differential Revision: https://reviews.llvm.org/D54225

llvm-svn: 347192
2018-11-19 12:03:11 +00:00
Sam Parker e7c42dd7e2 [ARM] Remove trunc sinks in ARM CGP
Truncs are treated as sources if their produce a value of the same
type as the one we currently trying to promote. Truncs used to be
considered as a sink if their operand was the same value type.
    
We now allow smaller types in the search, so we should search through
truncs that produce a smaller value. These truncs can then be
converted to an AND mask.
    
This leaves sinks as being:
  - points where the value in the register is being observed, such as
    an icmp, switch or store.
  - points where value types have to match, such as calls and returns.
  - zext are included to ease the transformation and are generally
    removed later on.
    
During this change, it also became apart from truncating sinks was
broken: if a sink used a source, its type information had already
been lost by the time the truncation happens. So I've changed the
method of caching the type information.

Differential Revision: https://reviews.llvm.org/D54515

llvm-svn: 347191
2018-11-19 11:34:40 +00:00
John Brawn 12c046fba0 [LICM] Make LICM able to hoist phis
The general approach taken is to make note of loop invariant branches, then when
we see something conditional on that branch, such as a phi, we create a copy of
the branch and (empty versions of) its successors and hoist using that.

This has no impact by itself that I've been able to see, as LICM typically
doesn't see such phis as they will have been converted into selects by the time
LICM is run, but once we start doing phi-to-select conversion later it will be
important.

Differential Revision: https://reviews.llvm.org/D52827

llvm-svn: 347190
2018-11-19 11:31:24 +00:00
Anastasia Stulova 7eb6938c48 [OpenCL] Fix address space deduction in template args.
Don't deduce address spaces for non-pointer-like types
in template args.

Fixes PR38603!

Differential Revision: https://reviews.llvm.org/D54634

llvm-svn: 347189
2018-11-19 11:00:14 +00:00
Benjamin Kramer 2775b5caae Remove unused variable. NFC.
llvm-svn: 347188
2018-11-19 10:59:12 +00:00
Anton Korobeynikov 4df19b75c0 [MSP430] Optimize srl/sra in case of A >> (8 + N)
There is no variable-length shifts on MSP430. Therefore
"eat" 8 bits of shift via bswap & ext.

Path by Kristina Bessonova!

Differential Revision: https://reviews.llvm.org/D54623

llvm-svn: 347187
2018-11-19 10:43:02 +00:00
Serge Guelton 12c7a96064 Fix disturbing warning - NFCI
llvm-svn: 347186
2018-11-19 10:05:28 +00:00
Craig Topper 8b22bcd39f [X86] Use a pcmpgt with 0 instead of psrad 31, to fill elements with the sign bit in v4i32 MULH lowering.
The shift requires a copy to avoid clobbering a register. Comparing with 0 uses an xor to produce 0 that will be overwritten with the compare results. So still requires 2 instructions, but should be one byte shorter since it doesn't need to encode an immediate.

llvm-svn: 347185
2018-11-19 07:22:26 +00:00
Fangrui Song 209cfbe60e [LoopSimplifyCFG] Add requires: asserts after rL347183
llvm-svn: 347184
2018-11-19 06:28:15 +00:00
Max Kazantsev 8e3e33d138 [LoopSimplifyCFG] Teach LoopSimplifyCFG to constant-fold branches and switches
This patch introduces infrastructure and the simplest case for constant-folding
of branch and switch instructions within loop into unconditional branches.
It is useful as a cleanup for such passes as loop unswitching that sometimes
produce such branches.

Only the simplest case supported in this patch: after the folding, no block
should become dead or stop being part of the loop. Support for more
sophisticated cases will go separately in follow-up patches.

Differential Revision: https://reviews.llvm.org/D54021
Reviewed By: anna

llvm-svn: 347183
2018-11-19 05:54:38 +00:00
Vedant Kumar e7b789b529 [ProfileSummary] Standardize methods and fix comment
Every Analysis pass has a get method that returns a reference of the Result of
the Analysis, for example, BlockFrequencyInfo
&BlockFrequencyInfoWrapperPass::getBFI().  I believe that
ProfileSummaryInfo::getPSI() is the only exception to that, as it was returning
a pointer.

Another change is renaming isHotBB and isColdBB to isHotBlock and isColdBlock,
respectively.  Most methods use BB as the argument of variable names while
methods usually refer to Basic Blocks as Blocks, instead of BB.  For example,
Function::getEntryBlock, Loop:getExitBlock, etc.

I also fixed one of the comments.

Patch by Rodrigo Caetano Rocha!

Differential Revision: https://reviews.llvm.org/D54669

llvm-svn: 347182
2018-11-19 05:23:16 +00:00
Craig Topper 3616891046 [X86] Use compare with 0 to fill an element with sign bits when sign extending to v2i64 pre-sse4.1
Previously we used an arithmetic shift right by 31, but that requires a copy to preserve the input. So we might as well materialize a zero and compare to it since the comparison will overwrite the register that contains the zeros. This should be one byte shorter.

llvm-svn: 347181
2018-11-19 04:33:20 +00:00
Craig Topper 053f1eea96 [X86] Remove most of the SEXTLOAD Custom setOperationAction calls under -x86-experimental-vector-widening-legalization.
Leave just the v4i8->v4i64 and v8i8->v8i64, but only enable them on pre-sse4.1 targets when 64-bit mode is enabled. In those cases we end up creating sext loads that get scalarized to code that looks better than what we get from loading into a vector register and doing a multiple step sign extend using unpacks and shifts.

llvm-svn: 347180
2018-11-19 00:33:16 +00:00
Brad Smith a7b204b44f [PowerPC] Set the default PLT mode on OpenBSD/powerpc to Secure PLT.
OpenBSD/powerpc only supports Secure PLT.

llvm-svn: 347179
2018-11-19 00:21:06 +00:00
Brad Smith 58ceba6e46 Replace the UTF-8 characters in the error message.
llvm-svn: 347178
2018-11-18 22:30:58 +00:00
Simon Pilgrim 7f92efa5a9 [X86][SSE] Add SimplifyDemandedVectorElts support for SSE packed i2fp conversions.
llvm-svn: 347177
2018-11-18 22:13:31 +00:00
Craig Topper 0468c860b7 [X86] Add custom type legalization for extending v4i8/v4i16->v4i64.
Pre-SSE4.1 sext_invec for v2i64 is complicated because we don't have a v2i64 sra instruction. So instead we sign extend to i32 using unpack and sra, then copy the elements and do a v4i32 sra to fill with sign bits, then interleave the i32 sign extend and the sign bits. So really we're doing to two sign extends but only using half of the v4i32 intermediate result.

When the result is more than 128 bits, default type legalization would prefer to split the destination type all the way down to v2i64 with shuffles followed by v16i8/v8i16->v2i64 sext_inreg operations. This results in more instructions than necessary because we are only utilizing the lower 2 elements of the v4i32 intermediate result. Instead we can custom split a v4i8/v4i16->v4i64 sign_extend. Then we can sign extend v4i8/v4i16->v4i32 invec producing a full v4i32 result. Create the sign bit vector as a v4i32 then split and interleave with the sign bits using an punpackldq and punpackhdq.

llvm-svn: 347176
2018-11-18 21:28:50 +00:00
Craig Topper 950f3842cc [X86] Add a 32-bit command line with only sse2 to vector-sext.ll and vector-sext.ll to show some of the scalarized load sequences without 64-bit scalar support.
Some of these sequeces look pretty bad since we have to copy the sign bit from a 32 bit register to a 64 bit register to finish a sign extend.

llvm-svn: 347175
2018-11-18 21:28:47 +00:00
Zachary Turner f8610fc4e7 Revert "Implement basic DidAttach and DidLaunch for DynamicLoaderWindowsDYLD"
This breaks many tests on Windows, which now all fail with an error such
as "Unable to read memory at address <xxxxxxxx>".

llvm-svn: 347174
2018-11-18 20:48:25 +00:00
Simon Pilgrim b31bdbd2e9 [X86][SSE] Add SimplifyDemandedVectorElts support for SSE splat-vector-shifts.
SSE vector shifts only use the bottom 64-bits of the shift amount vector.

llvm-svn: 347173
2018-11-18 20:21:52 +00:00
Craig Topper 11d50948e2 [X86] Disable combineToExtendVectorInReg under -x86-experimental-vector-widening-legalization. Add custom type legalization for extends.
If we widen illegal types instead of promoting, we should be able to rely on the type legalizer to create the vector_inreg operations for us with some caveats.

This patch disables combineToExtendVectorInReg when we are using widening.

I've enabled custom legalization for v8i8->v8i64 extends under avx512f since the type legalizer would want to create a vector_inreg with a v64i8 input type which isn't legal without avx512bw. So we go to v16i8 with custom code using the relaxation of rules we get from D54346.

I've also enable custom legalization of v8i64 and v16i32 operations with with AVX. When the input type is 128 bits, the default splitting legalization would extend first 128->256, then do the a split to two 128 pieces. Extend each half to 256 and then concat the result. The custom legalization I've added instead uses a 128->256 bit vector_inreg extend that only reads the lower 64-bits for the low half of the split. Then shuffles the high 64-bits to the low 64-bits and does another vector_inreg extend.

llvm-svn: 347172
2018-11-18 18:11:25 +00:00
Craig Topper bc8148f7b0 [X86] Lower v16i16->v8i16 truncate using an 'and' with 255, an extract_subvector, and a packuswb instruction.
Summary: This is an improvement over the two pshufbs and punpcklqdq we'd get otherwise.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D54671

llvm-svn: 347171
2018-11-18 17:59:28 +00:00
Sanjay Patel 8c0cd77bff [DAG] add undef simplifications for select nodes
Sadly, this duplicates (twice) the logic from InstSimplify. There
might be some way to at least share the DAG versions of the code, 
but copying the folds seems to be the standard method to ensure 
that we don't miss these folds. 

Unlike in IR, we don't run DAGCombiner to fixpoint, so there's no 
way to ensure that we do these kinds of simplifications unless the 
code is repeated at node creation time and during combines.

There were other tests that would become worthless with this
improvement that I changed as pre-commits:
rL347161
rL347164
rL347165
rL347166
rL347167

I'm not sure how to salvage the remaining tests (diffs in this patch).
So the x86 tests verify that the new code is working as intended.
The AMDGPU test is actually similar to my motivating case: we have
some undef value that has survived to machine IR in an x86 test, and 
then it gets folded in some weird way, or we crash if we don't transfer
the undef flag. But we would have been better off never getting to that
point by doing these simplifications.

This will lead back to PR32023 someday...
https://bugs.llvm.org/show_bug.cgi?id=32023

llvm-svn: 347170
2018-11-18 17:36:23 +00:00
Simon Pilgrim ec808cf541 Remove unused variable. NFCI.
llvm-svn: 347169
2018-11-18 17:24:59 +00:00