Commit Graph

29 Commits

Author SHA1 Message Date
Dávid Bolvanský cd54c57919 Reland "[Libcalls, Attrs] Annotate libcalls with noundef"
Fixed Clang tests.
2021-02-20 06:18:48 +01:00
Florian Hahn 01089c876b
[InstCombine] Preserve !annotation on newly created instructions.
If the source instruction has !annotation metadata, all instructions
created during combining should also have it. Tell the builder to
add it.

The !annotation system was discussed on llvm-dev as part of
'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)

This patch is based on an earlier patch by Francis Visoiu Mistrih.

Reviewed By: thegameg, lebedev.ri

Differential Revision: https://reviews.llvm.org/D91444
2020-12-17 15:20:23 +00:00
Florian Hahn ca2e7e5999 [IRGen] Add !annotation metadata for auto-init stores.
This patch updates Clang's IRGen to add !annotation nodes with an
"auto-init" annotation to all stores for auto-initialization.

As discussed in 'RFC: Combining Annotation Metadata and Remarks'
(http://lists.llvm.org/pipermail/llvm-dev/2020-November/146393.html)
this allows using optimization remarks to track down where auto-init
code was inserted (and not removed by optimizations).

There are a few cases in the tests where !annotation gets dropped by
optimizations. Those optimizations will be updated in subsequent
patches.

This patch is based on a patch by Francis Visoiu Mistrih.

Reviewed By: thegameg, paquette

Differential Revision: https://reviews.llvm.org/D91417
2020-11-16 10:37:02 +00:00
Jon Roelofs 38b39c34ab [clang] Add missing FileCheck colons 2020-04-14 12:32:48 -06:00
Eli Friedman 3f1defa6e2 [clang codegen] Clean up handling of vectors with trivial-auto-var-init.
The code was pretending to be doing something useful with vectors, but
really it was doing nothing: the element type of a vector is always a
scalar type, so constWithPadding would always just return the input constant.

Split off from D75661 so it can be reviewed separately.

While I'm here, also add testcase to show missing vector handling.

Differential Revision: https://reviews.llvm.org/D76528
2020-03-24 14:34:40 -07:00
Eric Christopher fd39b1bb20 Revert "Revert "As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there.""
This reapplies: 8ff85ed905

Original commit message:

As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there.

This change doesn't include any change to move from selection dag to fast isel
and that will come with other numbers that should help inform that decision.
There also haven't been any real debuggability studies with this pipeline yet,
this is just the initial start done so that people could see it and we could start
tweaking after.

Test updates: Outside of the newpm tests most of the updates are coming from either
optimization passes not run anymore (and without a compelling argument at the moment)
that were largely used for canonicalization in clang.

Original post:

http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html

Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410

This reverts commit c9ddb02659.
2019-11-26 20:28:52 -08:00
Muhammad Omair Javaid c9ddb02659 Revert "As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there."
This reverts commit 8ff85ed905.

This commit introduced 9 new failures on lldb buildbot host at http://lab.llvm.org:8014/builders/lldb-aarch64-ubuntu

Following tests were failing:
    lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq1/TestAmbiguousTailCallSeq1.py
    lldb-api :: functionalities/tail_call_frames/ambiguous_tail_call_seq2/TestAmbiguousTailCallSeq2.py
    lldb-api :: functionalities/tail_call_frames/disambiguate_call_site/TestDisambiguateCallSite.py
    lldb-api :: functionalities/tail_call_frames/disambiguate_paths_to_common_sink/TestDisambiguatePathsToCommonSink.py
    lldb-api :: functionalities/tail_call_frames/disambiguate_tail_call_seq/TestDisambiguateTailCallSeq.py
    lldb-api :: functionalities/tail_call_frames/inlining_and_tail_calls/TestInliningAndTailCalls.py
    lldb-api :: functionalities/tail_call_frames/sbapi_support/TestTailCallFrameSBAPI.py
    lldb-api :: functionalities/tail_call_frames/thread_step_out_message/TestArtificialFrameStepOutMessage.py
    lldb-api :: functionalities/tail_call_frames/thread_step_out_or_return/TestSteppingOutWithArtificialFrames.py
    lldb-api :: functionalities/tail_call_frames/unambiguous_sequence/TestUnambiguousTailCalls.py

Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
2019-11-26 09:32:13 +05:00
Eric Christopher 8ff85ed905 As a follow-up to my initial mail to llvm-dev here's a first pass at the O1 described there.
This change doesn't include any change to move from selection dag to fast isel
and that will come with other numbers that should help inform that decision.
There also haven't been any real debuggability studies with this pipeline yet,
this is just the initial start done so that people could see it and we could start
tweaking after.

Test updates: Outside of the newpm tests most of the updates are coming from either
optimization passes not run anymore (and without a compelling argument at the moment)
that were largely used for canonicalization in clang.

Original post:

http://lists.llvm.org/pipermail/llvm-dev/2019-April/131494.html

Tags: #llvm
Differential Revision: https://reviews.llvm.org/D65410
2019-11-25 17:16:46 -08:00
Clement Courbet 30b5331df8 [clang][codegen][NFC] Make test patterns more permissive.
See the discussion in:
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190909/692736.html

llvm-svn: 371522
2019-09-10 14:20:08 +00:00
David Bolvansky c3012b2c26 [NFC] Updated tests after r368657
llvm-svn: 368658
2019-08-13 09:12:07 +00:00
Vitaly Buka c2ac925d6e CodeGet: Init 32bit pointers with 0xFFFFFFFF
Summary:
Patch makes D63967 effective for 32bit platforms and improves pattern
initialization there. It cuts size of 32bit binary compiled with
-ftrivial-auto-var-init=pattern by 2% (3% with -Os).

Binary size change on CTMark, (with -fuse-ld=lld -Wl,--icf=all, similar results with default linker options)
```
                   master           patch      diff
Os pattern   7.915580e+05    7.698424e+05 -0.028387
O3 pattern   9.953688e+05    9.752952e+05 -0.019325
```

Zero vs Pattern on master
```
               zero       pattern      diff
Os     7.689712e+05  7.915580e+05  0.031380
O3     9.744796e+05  9.953688e+05  0.021133
```

Zero vs Pattern with the patch
```
               zero       pattern      diff
Os     7.689712e+05  7.698424e+05  0.000789
O3     9.744796e+05  9.752952e+05  0.000742
```

Reviewers: pcc, eugenis, glider, jfb

Reviewed By: jfb

Subscribers: hubert.reinterpretcast, dexonsmith, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D64597

llvm-svn: 365921
2019-07-12 17:21:55 +00:00
Vitaly Buka c559e63798 Handle IntToPtr in isBytewiseValue
Summary:
This helps with more efficient use of memset for pattern initialization

From @pcc prototype for -ftrivial-auto-var-init=pattern optimizations

Binary size change on CTMark, (with -fuse-ld=lld -Wl,--icf=all, similar results with default linker options)
```
                   master           patch      diff
Os           8.238864e+05    8.238864e+05       0.0
O3           1.054797e+06    1.054797e+06       0.0
Os zero      8.292384e+05    8.292384e+05       0.0
O3 zero      1.062626e+06    1.062626e+06       0.0
Os pattern   8.579712e+05    8.338048e+05 -0.030299
O3 pattern   1.090502e+06    1.067574e+06 -0.020481
```

Zero vs Pattern on master
```
               zero       pattern      diff
Os     8.292384e+05  8.579712e+05  0.036578
O3     1.062626e+06  1.090502e+06  0.025124
```

Zero vs Pattern with the patch
```
               zero       pattern      diff
Os     8.292384e+05  8.338048e+05  0.003333
O3     1.062626e+06  1.067574e+06  0.003193
```

Reviewers: pcc, eugenis

Subscribers: hiraditya, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D63967

llvm-svn: 365858
2019-07-12 01:42:03 +00:00
Vitaly Buka 669ad5ff15 Codegen, NFC: 32bit test in auto-var-init.cpp
llvm-svn: 365857
2019-07-12 01:36:11 +00:00
Vitaly Buka f55aad0356 CodeGen: Suppress c++ warnings in test
llvm-svn: 365835
2019-07-11 21:59:09 +00:00
Vitaly Buka 07bfa5b870 CodeGen, NFC: Test for auto-init for 32bit pointers
llvm-svn: 365822
2019-07-11 20:51:59 +00:00
Leonard Chan f66309203e [clang][NewPM] Add -fno-experimental-new-pass-manager to tests
As per the discussion on D58375, we disable test that have optimizations under
the new PM. This patch adds -fno-experimental-new-pass-manager to RUNS that:

- Already run with optimizations (-O1 or higher) that were missed in D58375.
- Explicitly test new PM behavior along side some new PM RUNS, but are missing
  this flag if new PM is enabled by default.
- Specify -O without the number. Based on getOptimizationLevel(), it seems the
  default is 2, and the IR appears to be the same when changed to -O2, so
  update the test to explicitly say -O2 and provide -fno-experimental-new-pass-manager`.

Differential Revision: https://reviews.llvm.org/D63156

llvm-svn: 364066
2019-06-21 16:03:06 +00:00
JF Bastien cdf26f15d1 Fix auto-init test
r359628 changed the initialization of padding to follow C, but I didn't update the C++ tests.

llvm-svn: 359636
2019-04-30 23:27:28 +00:00
Peter Collingbourne 68b4673fea CodeGen: Preserve packed attribute in constStructWithPadding.
Otherwise the object may have an incorrect size due to tail padding.

Differential Revision: https://reviews.llvm.org/D59446

llvm-svn: 356328
2019-03-16 19:25:39 +00:00
JF Bastien b5e5bc760e Variable auto-init: split out small arrays
Summary: Following up with r355181, initialize small arrays as well.

LLVM stage2 shows a tiny size gain.

<rdar://48523005>

Reviewers: glider, pcc, kcc, rjmccall

Subscribers: jkorous, dexonsmith, jdoerfert, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D58885

llvm-svn: 355660
2019-03-08 01:26:49 +00:00
Alexander Potapenko fa61dddf5d CodeGen: Fix PR40605 by splitting constant struct initializers
When emitting initializers for local structures for code built with
-ftrivial-auto-var-init, replace constant structures with sequences of
stores.

This appears to greatly help removing dead initialization stores to those
locals that are later overwritten by other data.
This also removes a lot of .rodata constants (see PR40605), replacing most
of them with immediate values (for Linux kernel the .rodata size is
reduced by ~1.9%)

llvm-svn: 355181
2019-03-01 09:00:41 +00:00
Alexander Potapenko 4f7bc0eee7 CodeGen: Explicitly initialize structure padding in the -ftrivial-auto-var-init mode
When generating initializers for local structures in the
-ftrivial-auto-var-init mode, explicitly wipe the padding bytes with
either 0x00 or 0xAA.

This will allow us to automatically handle the padding when splitting
the initialization stores (see https://reviews.llvm.org/D57898).

Reviewed at https://reviews.llvm.org/D58188

llvm-svn: 354861
2019-02-26 10:46:21 +00:00
JF Bastien 17aec7bd36 [NFC] Reorder some mis-ordered tests
I somehow had misaligned some of the tests when I originally wrote this. Re-order them properly.

llvm-svn: 354831
2019-02-25 23:09:34 +00:00
Nemanja Ivanovic db64e7e9fa [NFC] Explicitly add -std=c++14 option to tests that rely on the C++14 default
When Clang/LLVM is built with the CLANG_DEFAULT_STD_CXX CMake macro that sets
the default standard to something other than C++14, there are a number of lit
tests that fail as they rely on the C++14 default.
This patch just adds the language standard option explicitly to such test cases.

Differential revision: https://reviews.llvm.org/D57581

llvm-svn: 353163
2019-02-05 12:05:53 +00:00
JF Bastien 14daa20be1 Automatic variable initialization
Summary:
Add an option to initialize automatic variables with either a pattern or with
zeroes. The default is still that automatic variables are uninitialized. Also
add attributes to request uninitialized on a per-variable basis, mainly to disable
initialization of large stack arrays when deemed too expensive.

This isn't meant to change the semantics of C and C++. Rather, it's meant to be
a last-resort when programmers inadvertently have some undefined behavior in
their code. This patch aims to make undefined behavior hurt less, which
security-minded people will be very happy about. Notably, this means that
there's no inadvertent information leak when:

  - The compiler re-uses stack slots, and a value is used uninitialized.
  - The compiler re-uses a register, and a value is used uninitialized.
  - Stack structs / arrays / unions with padding are copied.

This patch only addresses stack and register information leaks. There's many
more infoleaks that we could address, and much more undefined behavior that
could be tamed. Let's keep this patch focused, and I'm happy to address related
issues elsewhere.

To keep the patch simple, only some `undef` is removed for now, see
`replaceUndef`. The padding-related infoleaks are therefore not all gone yet.
This will be addressed in a follow-up, mainly because addressing padding-related
leaks should be a stand-alone option which is implied by variable
initialization.

There are three options when it comes to automatic variable initialization:

  0. Uninitialized

    This is C and C++'s default. It's not changing. Depending on code
    generation, a programmer who runs into undefined behavior by using an
    uninialized automatic variable may observe any previous value (including
    program secrets), or any value which the compiler saw fit to materialize on
    the stack or in a register (this could be to synthesize an immediate, to
    refer to code or data locations, to generate cookies, etc).

  1. Pattern initialization

    This is the recommended initialization approach. Pattern initialization's
    goal is to initialize automatic variables with values which will likely
    transform logic bugs into crashes down the line, are easily recognizable in
    a crash dump, without being values which programmers can rely on for useful
    program semantics. At the same time, pattern initialization tries to
    generate code which will optimize well. You'll find the following details in
    `patternFor`:

    - Integers are initialized with repeated 0xAA bytes (infinite scream).
    - Vectors of integers are also initialized with infinite scream.
    - Pointers are initialized with infinite scream on 64-bit platforms because
      it's an unmappable pointer value on architectures I'm aware of. Pointers
      are initialize to 0x000000AA (small scream) on 32-bit platforms because
      32-bit platforms don't consistently offer unmappable pages. When they do
      it's usually the zero page. As people try this out, I expect that we'll
      want to allow different platforms to customize this, let's do so later.
    - Vectors of pointers are initialized the same way pointers are.
    - Floating point values and vectors are initialized with a negative quiet
      NaN with repeated 0xFF payload (e.g. 0xffffffff and 0xffffffffffffffff).
      NaNs are nice (here, anways) because they propagate on arithmetic, making
      it more likely that entire computations become NaN when a single
      uninitialized value sneaks in.
    - Arrays are initialized to their homogeneous elements' initialization
      value, repeated. Stack-based Variable-Length Arrays (VLAs) are
      runtime-initialized to the allocated size (no effort is made for negative
      size, but zero-sized VLAs are untouched even if technically undefined).
    - Structs are initialized to their heterogeneous element's initialization
      values. Zero-size structs are initialized as 0xAA since they're allocated
      a single byte.
    - Unions are initialized using the initialization for the largest member of
      the union.

    Expect the values used for pattern initialization to change over time, as we
    refine heuristics (both for performance and security). The goal is truly to
    avoid injecting semantics into undefined behavior, and we should be
    comfortable changing these values when there's a worthwhile point in doing
    so.

    Why so much infinite scream? Repeated byte patterns tend to be easy to
    synthesize on most architectures, and otherwise memset is usually very
    efficient. For values which aren't entirely repeated byte patterns, LLVM
    will often generate code which does memset + a few stores.

  2. Zero initialization

    Zero initialize all values. This has the unfortunate side-effect of
    providing semantics to otherwise undefined behavior, programs therefore
    might start to rely on this behavior, and that's sad. However, some
    programmers believe that pattern initialization is too expensive for them,
    and data might show that they're right. The only way to make these
    programmers wrong is to offer zero-initialization as an option, figure out
    where they are right, and optimize the compiler into submission. Until the
    compiler provides acceptable performance for all security-minded code, zero
    initialization is a useful (if blunt) tool.

I've been asked for a fourth initialization option: user-provided byte value.
This might be useful, and can easily be added later.

Why is an out-of band initialization mecanism desired? We could instead use
-Wuninitialized! Indeed we could, but then we're forcing the programmer to
provide semantics for something which doesn't actually have any (it's
uninitialized!). It's then unclear whether `int derp = 0;` lends meaning to `0`,
or whether it's just there to shut that warning up. It's also way easier to use
a compiler flag than it is to manually and intelligently initialize all values
in a program.

Why not just rely on static analysis? Because it cannot reason about all dynamic
code paths effectively, and it has false positives. It's a great tool, could get
even better, but it's simply incapable of catching all uses of uninitialized
values.

Why not just rely on memory sanitizer? Because it's not universally available,
has a 3x performance cost, and shouldn't be deployed in production. Again, it's
a great tool, it'll find the dynamic uses of uninitialized variables that your
test coverage hits, but it won't find the ones that you encounter in production.

What's the performance like? Not too bad! Previous publications [0] have cited
2.7 to 4.5% averages. We've commmitted a few patches over the last few months to
address specific regressions, both in code size and performance. In all cases,
the optimizations are generally useful, but variable initialization benefits
from them a lot more than regular code does. We've got a handful of other
optimizations in mind, but the code is in good enough shape and has found enough
latent issues that it's a good time to get the change reviewed, checked in, and
have others kick the tires. We'll continue reducing overheads as we try this out
on diverse codebases.

Is it a good idea? Security-minded folks think so, and apparently so does the
Microsoft Visual Studio team [1] who say "Between 2017 and mid 2018, this
feature would have killed 49 MSRC cases that involved uninitialized struct data
leaking across a trust boundary. It would have also mitigated a number of bugs
involving uninitialized struct data being used directly.". They seem to use pure
zero initialization, and claim to have taken the overheads down to within noise.
Don't just trust Microsoft though, here's another relevant person asking for
this [2]. It's been proposed for GCC [3] and LLVM [4] before.

What are the caveats? A few!

  - Variables declared in unreachable code, and used later, aren't initialized.
    This goto, Duff's device, other objectionable uses of switch. This should
    instead be a hard-error in any serious codebase.
  - Volatile stack variables are still weird. That's pre-existing, it's really
    the language's fault and this patch keeps it weird. We should deprecate
    volatile [5].
  - As noted above, padding isn't fully handled yet.

I don't think these caveats make the patch untenable because they can be
addressed separately.

Should this be on by default? Maybe, in some circumstances. It's a conversation
we can have when we've tried it out sufficiently, and we're confident that we've
eliminated enough of the overheads that most codebases would want to opt-in.
Let's keep our precious undefined behavior until that point in time.

How do I use it:

  1. On the command-line:

    -ftrivial-auto-var-init=uninitialized (the default)
    -ftrivial-auto-var-init=pattern
    -ftrivial-auto-var-init=zero -enable-trivial-auto-var-init-zero-knowing-it-will-be-removed-from-clang

  2. Using an attribute:

    int dont_initialize_me __attribute((uninitialized));

  [0]: https://users.elis.ugent.be/~jsartor/researchDocs/OOPSLA2011Zero-submit.pdf
  [1]: https://twitter.com/JosephBialek/status/1062774315098112001
  [2]: https://outflux.net/slides/2018/lss/danger.pdf
  [3]: https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00615.html
  [4]: 776a0955ef
  [5]: http://wg21.link/p1152

I've also posted an RFC to cfe-dev: http://lists.llvm.org/pipermail/cfe-dev/2018-November/060172.html

<rdar://problem/39131435>

Reviewers: pcc, kcc, rsmith

Subscribers: JDevlieghere, jkorous, dexonsmith, cfe-commits

Differential Revision: https://reviews.llvm.org/D54604

llvm-svn: 349442
2018-12-18 05:12:21 +00:00
JF Bastien fe2ea8249d CDDecl More automatic variable tail padding test
Test tail padded automatic variable at different width, because they encounter different codegen.

llvm-svn: 339273
2018-08-08 17:05:17 +00:00
JF Bastien cc365c376a [NFC] Improve auto-var-init alignment check
We're not actually testing for alignment, we just want to know that whatever incoming alignment got propagated. Do that by capturing the alignment and checking that it's actually what's passed later, instead of hard-coding an alignment value.

llvm-svn: 339196
2018-08-07 22:43:44 +00:00
JF Bastien 8137793a8f Auto var init test fix #2
It turns out that the AVX bots have different alignment for their vectors, and my test mistakenly assumed a particular vector alignent on the stack. Instead, capture the alignment and test for it in subsequent operations.

llvm-svn: 339093
2018-08-07 04:44:13 +00:00
JF Bastien 1b222cea08 Remove broken command flag
I was using it for testing, r339089 shouldn't have contained it.

llvm-svn: 339090
2018-08-07 04:03:03 +00:00
JF Bastien c70f65e86e [NFC] Test automatic variable initialization
Summary:
r337887 started using memset for automatic variable initialization where sensible. A follow-up discussion leads me to believe that we should better test automatic variable initialization, and that there are probably follow-up patches in clang and LLVM to improve codegen. It’ll be important to measure -O0 compile time, and figure out which transforms should be in the frontend versus the backend.

This patch is just a test of the current behavior, no questions asked. Follow-up patches will tune the code generation.

<rdar://problem/42981573>

Subscribers: dexonsmith, cfe-commits

Differential Revision: https://reviews.llvm.org/D50361

llvm-svn: 339089
2018-08-07 03:12:52 +00:00