Commit Graph

82497 Commits

Author SHA1 Message Date
Ilya Dryomov 9dd2845ccb libceph: protect osdc->osd_lru list with a spinlock
OSD client is getting moved from the big per-client lock to a set of
per-session locks.  The big rwlock would only be held for read most of
the time, so a global osdc->osd_lru needs additional protection.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:12:30 +02:00
Ilya Dryomov 42c1b12403 libceph: handle_one_map()
Separate osdmap handling from decoding and iterating over a bag of maps
in a fresh MOSDMap message.  This sets up the scene for the updated OSD
client.

Of particular importance here is the addition of pi->was_full, which
can be used to answer "did this pool go full -> not-full in this map?".
This is the key bit for supporting pool quotas.

We won't be able to downgrade map_sem for much longer, so drop
downgrade_write().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 01:12:27 +02:00
Ilya Dryomov e5253a7bde libceph: allocate dummy osdmap in ceph_osdc_init()
This leads to a simpler osdmap handling code, particularly when dealing
with pi->was_full, which is introduced in a later commit.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:29 +02:00
Ilya Dryomov fe5da05e97 libceph: redo callbacks and factor out MOSDOpReply decoding
If you specify ACK | ONDISK and set ->r_unsafe_callback, both
->r_callback and ->r_unsafe_callback(true) are called on ack.  This is
very confusing.  Redo this so that only one of them is called:

    ->r_unsafe_callback(true), on ack
    ->r_unsafe_callback(false), on commit

or

    ->r_callback, on ack|commit

Decode everything in decode_MOSDOpReply() to reduce clutter.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:28 +02:00
Ilya Dryomov 85e084feb4 libceph: drop msg argument from ceph_osdc_callback_t
finish_read(), its only user, uses it to get to hdr.data_len, which is
what ->r_result is set to on success.  This gains us the ability to
safely call callbacks from contexts other than reply, e.g. map check.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:27 +02:00
Ilya Dryomov bb873b5391 libceph: switch to calc_target(), part 2
The crux of this is getting rid of ceph_osdc_build_request(), so that
MOSDOp can be encoded not before but after calc_target() calculates the
actual target.  Encoding now happens within ceph_osdc_start_request().

Also nuked is the accompanying bunch of pointers into the encoded
buffer that was used to update fields on each send - instead, the
entire front is re-encoded.  If we want to support target->name_len !=
base->name_len in the future, there is no other way, because oid is
surrounded by other fields in the encoded buffer.

Encoding OSD ops and adding data items to the request message were
mixed together in osd_req_encode_op().  While we want to re-encode OSD
ops, we don't want to add duplicate data items to the message when
resending, so all call to ceph_osdc_msg_data_add() are factored out
into a new setup_request_data().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:27 +02:00
Ilya Dryomov a66dd38309 libceph: switch to calc_target(), part 1
Replace __calc_request_pg() and most of __map_request() with
calc_target() and start using req->r_t.

ceph_osdc_build_request() however still encodes base_oid, because it's
called before calc_target() is and target_oid is empty at that point in
time; a printf in osdc_show() also shows base_oid.  This is fixed in
"libceph: switch to calc_target(), part 2".

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:26 +02:00
Ilya Dryomov 63244fa123 libceph: introduce ceph_osd_request_target, calc_target()
Introduce ceph_osd_request_target, containing all mapping-related
fields of ceph_osd_request and calc_target() for calculating mappings
and populating it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:26 +02:00
Ilya Dryomov 04812acf57 libceph: pi->min_size, pi->last_force_request_resend
Add and decode pi->min_size and pi->last_force_request_resend.  These
are going to be used by calc_target().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:26 +02:00
Ilya Dryomov f984cb76cc libceph: make pgid_cmp() global
calc_target() code is going to need to know how to compare PGs.  Take
lhs and rhs pgid by const * while at it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:25 +02:00
Ilya Dryomov f81f16339a libceph: rename ceph_calc_pg_primary()
Rename ceph_calc_pg_primary() to ceph_pg_to_acting_primary() to
emphasise that it returns acting primary.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:25 +02:00
Ilya Dryomov 6f3bfd45cd libceph: ceph_osds, ceph_pg_to_up_acting_osds()
Knowning just acting set isn't enough, we need to be able to record up
set as well to detect interval changes.  This means returning (up[],
up_len, up_primary, acting[], acting_len, acting_primary) and passing
it around.  Introduce and switch to ceph_osds to help with that.

Rename ceph_calc_pg_acting() to ceph_pg_to_up_acting_osds() and return
both up and acting sets from it.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:25 +02:00
Ilya Dryomov d9591f5e28 libceph: rename ceph_oloc_oid_to_pg()
Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg().  Emphasise
that returned is raw PG and return -ENOENT instead of -EIO if the pool
doesn't exist.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:24 +02:00
Ilya Dryomov 985c167388 libceph: fix ceph_eversion encoding
eversion_t is version+epoch in userspace and is encoded in that order.
ceph_eversion is defined as epoch+version in rados.h, yet we memcpy it
in __send_request().  Reoder ceph_eversion fields.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:24 +02:00
Ilya Dryomov fcd00b68bb libceph: DEFINE_RB_FUNCS macro
Given

    struct foo {
        u64 id;
        struct rb_node bar_node;
    };

generate insert_bar(), erase_bar() and lookup_bar() functions with

    DEFINE_RB_FUNCS(bar, struct foo, id, bar_node)

The key is assumed to be an integer (u64, int, etc), compared with
< and >.  nodefld has to be initialized with RB_CLEAR_NODE().

Start using it for MDS, MON and OSD requests and OSD sessions.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:23 +02:00
Ilya Dryomov 0c0a8de13f libceph: nuke unused fields and functions
Either unused or useless:

    osdmap->mkfs_epoch
    osd->o_marked_for_keepalive
    monc->num_generic_requests
    osdc->map_waiters
    osdc->last_requested_map
    osdc->timeout_tid

    osd_req_op_cls_response_data()

    osdmap_apply_incremental() @msgr arg

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:23 +02:00
Ilya Dryomov d30291b985 libceph: variable-sized ceph_object_id
Currently ceph_object_id can hold object names of up to 100
(CEPH_MAX_OID_NAME_LEN) characters.  This is enough for all use cases,
expect one - long rbd image names:

- a format 1 header is named "<imgname>.rbd"
- an object that points to a format 2 header is named "rbd_id.<imgname>"

We operate on these potentially long-named objects during rbd map, and,
for format 1 images, during header refresh.  (A format 2 header name is
a small system-generated string.)

Lift this 100 character limit by making ceph_object_id be able to point
to an externally-allocated string.  Apart from being able to work with
almost arbitrarily-long named objects, this allows us to reduce the
size of ceph_object_id from >100 bytes to 64 bytes.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:22 +02:00
Ilya Dryomov 13d1ad16d0 libceph: move message allocation out of ceph_osdc_alloc_request()
The size of ->r_request and ->r_reply messages depends on the size of
the object name (ceph_object_id), while the size of ceph_osd_request is
fixed.  Move message allocation into a separate function that would
have to be called after ceph_object_id and ceph_object_locator (which
is also going to become variable in size with RADOS namespaces) have
been filled in:

    req = ceph_osdc_alloc_request(...);
    <fill in req->r_base_oid>
    <fill in req->r_base_oloc>
    ceph_osdc_alloc_messages(req);

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2016-05-26 00:36:21 +02:00
Linus Torvalds 6ba5b85fd4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "Overlayfs fixes from Miklos, assorted fixes from me.

  Stable fodder of varying severity, all sat in -next for a while"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  ovl: ignore permissions on underlying lookup
  vfs: add lookup_hash() helper
  vfs: rename: check backing inode being equal
  vfs: add vfs_select_inode() helper
  get_rock_ridge_filename(): handle malformed NM entries
  ecryptfs: fix handling of directory opening
  atomic_open(): fix the handling of create_error
  fix the copy vs. map logics in blk_rq_map_user_iov()
  do_splice_to(): cap the size before passing to ->splice_read()
2016-05-14 11:59:43 -07:00
Linus Torvalds 1410b74e40 Merge branch 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "During v4.6-rc1 cgroup namespace support was merged.  There is an
  issue where it's impossible to tell whether a given cgroup mount point
  is bind mounted or namespaced.  Serge has been working on the issue
  but it took longer than expected to resolve, so the late pull request.

  Given that it's a completely new feature and the patches don't touch
  anything else, the risk seems acceptable.  However, if this is too
  late, an alternative is plugging new cgroup ns creation for v4.6 and
  retrying for v4.7"

* 'for-4.6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix compile warning
  kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call
  cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces
  kernfs_path_from_node_locked: don't overwrite nlen
2016-05-13 16:26:46 -07:00
Linus Torvalds c3548b730d regulator: Fixes for v4.6
A small collection of driver specific fixes for the regulator
 subsysetem:
 
  - Fix handling of probe deferral for GPIO regulators.
  - Fix a typo in the module alias for DA9053.
  - Fix the definition of BUCK9 in the S2MPS11 driver.  This change looks
    larger than it is because an irregularity in the hardware means that
    the macro used to define bucks 6-10 needs duplicating and tweaking
    to have a separate macro for 9.
  - Fix a series of errors in the definitions of the LDOs the AXP20x
    regulators, some of which had always been present and some of which
    were introduced in the merge window.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXNazxAAoJECTWi3JdVIfQd9EH/0x/3Era4ctuLFP/dKVg+lAQ
 PDwgGRlHg1RVyEzzy1ZWqew//P7hgBo6urwFHjcczBE95e0bBhhMJERGnhya07uq
 eWYfPARl1zHxEm0gOfLBDaFAW8YIaNyXiEsW/aTeViNIL9/HIAd/ZTZhJiDLPiAF
 FdNs3yzBuAR5SHsnvCTXZY1d20LeQC6AC2nWUo2WrcyKKZZ0HamrSLP0lMq0OHVQ
 n2h/8f1g9TcykakCStB/4D8vLKMt0OYmalh3Y/C/JGkWnhMoBTLRgktrNFh6lnh9
 E0SnMXSuZbC9+mv8MLRHQft6XagDYzNq+odwSb9tfqbXqBdncOpvvVr6hLmzxGk=
 =sVmG
 -----END PGP SIGNATURE-----

Merge tag 'regulator-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
 "A small collection of driver specific fixes for the regulator
  subsysetem:

   - Fix handling of probe deferral for GPIO regulators

   - Fix a typo in the module alias for DA9053

   - Fix the definition of BUCK9 in the S2MPS11 driver.  This change
     looks larger than it is because an irregularity in the hardware
     means that the macro used to define bucks 6-10 needs duplicating
     and tweaking to have a separate macro for 9

   - Fix a series of errors in the definitions of the LDOs the AXP20x
     regulators, some of which had always been present and some of which
     were introduced in the merge window"

* tag 'regulator-fix-v4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: da9063: Correct module alias prefix to fix module autoloading
  regulator: axp20x: Fix axp22x ldo_io registration error on cold boot
  regulator: axp20x: Fix axp22x ldo_io voltage ranges
  regulator: axp20x: Fix LDO4 linear voltage range
  regulator: s2mps11: Fix invalid selector mask and voltages for buck9
  regulator: gpio: check return value of of_get_named_gpio
2016-05-13 09:46:00 -07:00
Mark Brown 9689dab30a Merge remote-tracking branches 'regulator/fix/axp20x', 'regulator/fix/da9063', 'regulator/fix/gpio' and 'regulator/fix/s2mps11' into regulator-linus 2016-05-13 11:11:08 +01:00
Andrea Arcangeli 6d0a07edd1 mm: thp: calculate the mapcount correctly for THP pages during WP faults
This will provide fully accuracy to the mapcount calculation in the
write protect faults, so page pinning will not get broken by false
positive copy-on-writes.

total_mapcount() isn't the right calculation needed in
reuse_swap_page(), so this introduces a page_trans_huge_mapcount()
that is effectively the full accurate return value for page_mapcount()
if dealing with Transparent Hugepages, however we only use the
page_trans_huge_mapcount() during COW faults where it strictly needed,
due to its higher runtime cost.

This also provide at practical zero cost the total_mapcount
information which is needed to know if we can still relocate the page
anon_vma to the local vma. If page_trans_huge_mapcount() returns 1 we
can reuse the page no matter if it's a pte or a pmd_trans_huge
triggering the fault, but we can only relocate the page anon_vma to
the local vma->anon_vma if we're sure it's only this "vma" mapping the
whole THP physical range.

Kirill A. Shutemov discovered the problem with moving the page
anon_vma to the local vma->anon_vma in a previous version of this
patch and another problem in the way page_move_anon_rmap() was called.

Andrew Morton discovered that CONFIG_SWAP=n wouldn't build in a
previous version, because reuse_swap_page must be a macro to call
page_trans_huge_mapcount from swap.h, so this uses a macro again
instead of an inline function. With this change at least it's a less
dangerous usage than it was before, because "page" is used only once
now, while with the previous code reuse_swap_page(page++) would have
called page_mapcount on page+1 and it would have increased page twice
instead of just once.

Dean Luick noticed an uninitialized variable that could result in a
rmap inefficiency for the non-THP case in a previous version.

Mike Marciniszyn said:

: Our RDMA tests are seeing an issue with memory locking that bisects to
: commit 61f5d698cc ("mm: re-enable THP")
:
: The test program registers two rather large MRs (512M) and RDMA
: writes data to a passive peer using the first and RDMA reads it back
: into the second MR and compares that data.  The sizes are chosen randomly
: between 0 and 1024 bytes.
:
: The test will get through a few (<= 4 iterations) and then gets a
: compare error.
:
: Tracing indicates the kernel logical addresses associated with the individual
: pages at registration ARE correct , the data in the "RDMA read response only"
: packets ARE correct.
:
: The "corruption" occurs when the packet crosse two pages that are not physically
: contiguous.   The second page reads back as zero in the program.
:
: It looks like the user VA at the point of the compare error no longer points to
: the same physical address as was registered.
:
: This patch totally resolves the issue!

Link: http://lkml.kernel.org/r/1462547040-1737-2-git-send-email-aarcange@redhat.com
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: "Kirill A. Shutemov" <kirill@shutemov.name>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Tested-by: Josh Collier <josh.d.collier@intel.com>
Cc: Marc Haber <mh+linux-kernel@zugschlus.de>
Cc: <stable@vger.kernel.org>	[4.5]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-12 15:52:50 -07:00
Al Viro e4d35be584 Merge branch 'ovl-fixes' into for-linus 2016-05-11 00:00:29 -04:00
Miklos Szeredi 3c9fe8cdff vfs: add lookup_hash() helper
Overlayfs needs lookup without inode_permission() and already has the name
hash (in form of dentry->d_name on overlayfs dentry).  It also doesn't
support filesystems with d_op->d_hash() so basically it only needs
the actual hashed lookup from lookup_one_len_unlocked()

So add a new helper that does unlocked lookup of a hashed name.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
2016-05-10 23:56:28 -04:00
Miklos Szeredi 54d5ca871e vfs: add vfs_select_inode() helper
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org> # v4.2+
2016-05-10 23:55:01 -04:00
Jamal Hadi Salim d99079e2fb export tc ife uapi header
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-10 01:08:39 -04:00
Mikko Rapeli 4a91cb61bb uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h
glibc's net/if.h contains copies of definitions from linux/if.h and these
conflict and cause build failures if both files are included by application
source code. Changes in uapi headers, which fixed header file dependencies to
include linux/if.h when it was needed, e.g. commit 1ffad83d, made the
net/if.h and linux/if.h incompatibilities visible as build failures for
userspace applications like iproute2 and xtables-addons.

This patch fixes compile errors when glibc net/if.h is included before
linux/if.h:

./linux/if.h:99:21: error: redeclaration of enumerator ‘IFF_NOARP’
./linux/if.h:98:23: error: redeclaration of enumerator ‘IFF_RUNNING’
./linux/if.h:97:26: error: redeclaration of enumerator ‘IFF_NOTRAILERS’
./linux/if.h:96:27: error: redeclaration of enumerator ‘IFF_POINTOPOINT’
./linux/if.h:95:24: error: redeclaration of enumerator ‘IFF_LOOPBACK’
./linux/if.h:94:21: error: redeclaration of enumerator ‘IFF_DEBUG’
./linux/if.h:93:25: error: redeclaration of enumerator ‘IFF_BROADCAST’
./linux/if.h:92:19: error: redeclaration of enumerator ‘IFF_UP’
./linux/if.h:252:8: error: redefinition of ‘struct ifconf’
./linux/if.h:203:8: error: redefinition of ‘struct ifreq’
./linux/if.h:169:8: error: redefinition of ‘struct ifmap’
./linux/if.h:107:23: error: redeclaration of enumerator ‘IFF_DYNAMIC’
./linux/if.h:106:25: error: redeclaration of enumerator ‘IFF_AUTOMEDIA’
./linux/if.h:105:23: error: redeclaration of enumerator ‘IFF_PORTSEL’
./linux/if.h:104:25: error: redeclaration of enumerator ‘IFF_MULTICAST’
./linux/if.h:103:21: error: redeclaration of enumerator ‘IFF_SLAVE’
./linux/if.h:102:22: error: redeclaration of enumerator ‘IFF_MASTER’
./linux/if.h:101:24: error: redeclaration of enumerator ‘IFF_ALLMULTI’
./linux/if.h💯23: error: redeclaration of enumerator ‘IFF_PROMISC’

The cases where linux/if.h is included before net/if.h need a similar fix in
the glibc side, or the order of include files can be changed userspace
code as a workaround.

This change was tested in x86 userspace on Debian unstable with
scripts/headers_compile_test.sh:

$ make headers_install && \
  cd usr/include && ../../scripts/headers_compile_test.sh -l -k
...
cc -Wall -c -nostdinc -I /usr/lib/gcc/i586-linux-gnu/5/include -I /usr/lib/gcc/i586-linux-gnu/5/include-fixed -I . -I /home/mcfrisk/src/linux-2.6/usr/headers_compile_test_include.2uX2zH -I /home/mcfrisk/src/linux-2.6/usr/headers_compile_test_include.2uX2zH/i586-linux-gnu -o /dev/null ./linux/if.h_libc_before_kernel.h
PASSED libc before kernel test: ./linux/if.h

Reported-by: Jan Engelhardt <jengelh@inai.de>
Reported-by: Josh Boyer <jwboyer@fedoraproject.org>
Reported-by: Stephen Hemminger <shemming@brocade.com>
Reported-by: Waldemar Brodkorb <mail@waldemar-brodkorb.de>
Cc: Gabriel Laskar <gabriel@lse.epita.fr>
Signed-off-by: Mikko Rapeli <mikko.rapeli@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09 21:29:31 -04:00
Linus Torvalds 26acc792c9 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Check klogctl failure correctly, from Colin Ian King.

 2) Prevent OOM when under memory pressure in flowcache, from Steffen
    Klassert.

 3) Fix info leak in llc and rtnetlink ifmap code, from Kangjie Lu.

 4) Memory barrier and multicast handling fixes in bnxt_en, from Michael
    Chan.

 5) Endianness bug in mlx5, from Daniel Jurgens.

 6) Fix disconnect handling in VSOCK, from Ian Campbell.

 7) Fix locking of netdev list walking in get_bridge_ifindices(), from
    Nikolay Aleksandrov.

 8) Bridge multicast MLD parser can look at wrong packet offsets, fix
    from Linus Lüssing.

 9) Fix chip hang in qede driver, from Sudarsana Reddy Kalluru.

10) Fix missing setting of encapsulation before inner handling completes
    in udp_offload code, from Jarno Rajahalme.

11) Missing rollbacks during LAG join and flood configuration failures
    in mlxsw driver, from Ido Schimmel.

12) Fix error code checks in netxen driver, from Dan Carpenter.

13) Fix key size in new macsec driver, from Sabrina Dubroca.

14) Fix mlx5/VXLAN dependencies, from Arnd Bergmann.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (29 commits)
  net/mlx5e: make VXLAN support conditional
  Revert "net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue"
  macsec: key identifier is 128 bits, not 64
  Documentation/networking: more accurate LCO explanation
  macvtap: segmented packet is consumed
  tools: bpf_jit_disasm: check for klogctl failure
  qede: uninitialized variable in qede_start_xmit()
  netxen: netxen_rom_fast_read() doesn't return -1
  netxen: reversed condition in netxen_nic_set_link_parameters()
  netxen: fix error handling in netxen_get_flash_block()
  mlxsw: spectrum: Add missing rollback in flood configuration
  mlxsw: spectrum: Fix rollback order in LAG join failure
  udp_offload: Set encapsulation before inner completes.
  udp_tunnel: Remove redundant udp_tunnel_gro_complete().
  qede: prevent chip hang when increasing channels
  net: ipv6: tcp reset, icmp need to consider L3 domain
  bridge: fix igmp / mld query parsing
  net: bridge: fix old ioctl unlocked net device walk
  VSOCK: do not disconnect socket when peer has shutdown SEND only
  net/mlx4_en: Fix endianness bug in IPV6 csum calculation
  ...
2016-05-09 12:11:37 -07:00
Josh Poimboeuf 8634de6d25 compiler-gcc: require gcc 4.8 for powerpc __builtin_bswap16()
gcc support for __builtin_bswap16() was supposedly added for powerpc in
gcc 4.6, and was then later added for other architectures in gcc 4.8.

However, Stephen Rothwell reported that attempting to use it on powerpc
in gcc 4.6 fails with:

  lib/vsprintf.c:160:2: error: initializer element is not constant
  lib/vsprintf.c:160:2: error: (near initialization for 'decpair[0]')
  lib/vsprintf.c:160:2: error: initializer element is not constant
  lib/vsprintf.c:160:2: error: (near initialization for 'decpair[1]')
  ...

I'm not entirely sure what those errors mean, but I don't see them on
gcc 4.8.  So let's consider gcc 4.8 to be the official starting point
for __builtin_bswap16().

Arnd Bergmann adds:
 "I found the commit in gcc-4.8 that replaced the powerpc-specific
  implementation of __builtin_bswap16 with an architecture-independent
  one.  Apparently the powerpc version (gcc-4.6 and 4.7) just mapped to
  the lhbrx/sthbrx instructions, so it ended up not being a constant,
  though the intent of the patch was mainly to add support for the
  builtin to x86:

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52624

  has the patch that went into gcc-4.8 and more information."

Fixes: 7322dd755e ("byteswap: try to avoid __builtin_constant_p gcc bug")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-09 11:54:29 -07:00
Serge E. Hallyn 4f41fc5962 cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces
Patch summary:

When showing a cgroupfs entry in mountinfo, show the path of the mount
root dentry relative to the reader's cgroup namespace root.

Short explanation (courtesy of mkerrisk):

If we create a new cgroup namespace, then we want both /proc/self/cgroup
and /proc/self/mountinfo to show cgroup paths that are correctly
virtualized with respect to the cgroup mount point.  Previous to this
patch, /proc/self/cgroup shows the right info, but /proc/self/mountinfo
does not.

Long version:

When a uid 0 task which is in freezer cgroup /a/b, unshares a new cgroup
namespace, and then mounts a new instance of the freezer cgroup, the new
mount will be rooted at /a/b.  The root dentry field of the mountinfo
entry will show '/a/b'.

 cat > /tmp/do1 << EOF
 mount -t cgroup -o freezer freezer /mnt
 grep freezer /proc/self/mountinfo
 EOF

 unshare -Gm  bash /tmp/do1
 > 330 160 0:34 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,freezer
 > 355 133 0:34 /a/b /mnt rw,relatime - cgroup freezer rw,freezer

The task's freezer cgroup entry in /proc/self/cgroup will simply show
'/':

 grep freezer /proc/self/cgroup
 9:freezer:/

If instead the same task simply bind mounts the /a/b cgroup directory,
the resulting mountinfo entry will again show /a/b for the dentry root.
However in this case the task will find its own cgroup at /mnt/a/b,
not at /mnt:

 mount --bind /sys/fs/cgroup/freezer/a/b /mnt
 130 25 0:34 /a/b /mnt rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,freezer

In other words, there is no way for the task to know, based on what is
in mountinfo, which cgroup directory is its own.

Example (by mkerrisk):

First, a little script to save some typing and verbiage:

echo -e "\t/proc/self/cgroup:\t$(cat /proc/self/cgroup | grep freezer)"
cat /proc/self/mountinfo | grep freezer |
        awk '{print "\tmountinfo:\t\t" $4 "\t" $5}'

Create cgroup, place this shell into the cgroup, and look at the state
of the /proc files:

2653
2653                         # Our shell
14254                        # cat(1)
        /proc/self/cgroup:      10:freezer:/a/b
        mountinfo:              /       /sys/fs/cgroup/freezer

Create a shell in new cgroup and mount namespaces. The act of creating
a new cgroup namespace causes the process's current cgroups directories
to become its cgroup root directories. (Here, I'm using my own version
of the "unshare" utility, which takes the same options as the util-linux
version):

Look at the state of the /proc files:

        /proc/self/cgroup:      10:freezer:/
        mountinfo:              /       /sys/fs/cgroup/freezer

The third entry in /proc/self/cgroup (the pathname of the cgroup inside
the hierarchy) is correctly virtualized w.r.t. the cgroup namespace, which
is rooted at /a/b in the outer namespace.

However, the info in /proc/self/mountinfo is not for this cgroup
namespace, since we are seeing a duplicate of the mount from the
old mount namespace, and the info there does not correspond to the
new cgroup namespace. However, trying to create a new mount still
doesn't show us the right information in mountinfo:

                                      # propagating to other mountns
        /proc/self/cgroup:      7:freezer:/
        mountinfo:              /a/b    /mnt/freezer

The act of creating a new cgroup namespace caused the process's
current freezer directory, "/a/b", to become its cgroup freezer root
directory. In other words, the pathname directory of the directory
within the newly mounted cgroup filesystem should be "/",
but mountinfo wrongly shows us "/a/b". The consequence of this is
that the process in the cgroup namespace cannot correctly construct
the pathname of its cgroup root directory from the information in
/proc/PID/mountinfo.

With this patch, the dentry root field in mountinfo is shown relative
to the reader's cgroup namespace.  So the same steps as above:

        /proc/self/cgroup:      10:freezer:/a/b
        mountinfo:              /       /sys/fs/cgroup/freezer
        /proc/self/cgroup:      10:freezer:/
        mountinfo:              /../..  /sys/fs/cgroup/freezer
        /proc/self/cgroup:      10:freezer:/
        mountinfo:              /       /mnt/freezer

cgroup.clone_children  freezer.parent_freezing  freezer.state      tasks
cgroup.procs           freezer.self_freezing    notify_on_release
3164
2653                   # First shell that placed in this cgroup
3164                   # Shell started by 'unshare'
14197                  # cat(1)

Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2016-05-09 12:15:03 -04:00
Sabrina Dubroca 8acca6aceb macsec: key identifier is 128 bits, not 64
The MACsec standard mentions a key identifier for each key, but
doesn't specify anything about it, so I arbitrarily chose 64 bits.

IEEE 802.1X-2010 specifies MKA (MACsec Key Agreement), and defines the
key identifier to be 128 bits (96 bits "member identifier" + 32 bits
"key number").

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-09 00:09:01 -04:00
Jarno Rajahalme 229740c631 udp_offload: Set encapsulation before inner completes.
UDP tunnel segmentation code relies on the inner offsets being set for
an UDP tunnel GSO packet, but the inner *_complete() functions will
set the inner offsets only if 'encapsulation' is set before calling
them.  Currently, udp_gro_complete() sets 'encapsulation' only after
the inner *_complete() functions are done.  This causes the inner
offsets having invalid values after udp_gro_complete() returns, which
in turn will make it impossible to properly segment the packet in case
it needs to be forwarded, which would be visible to the user either as
invalid packets being sent or as packet loss.

This patch fixes this by setting skb's 'encapsulation' in
udp_gro_complete() before calling into the inner complete functions,
and by making each possible UDP tunnel gro_complete() callback set the
inner_mac_header to the beginning of the tunnel payload.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Reviewed-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:25:26 -04:00
Jarno Rajahalme 43b8448cd7 udp_tunnel: Remove redundant udp_tunnel_gro_complete().
The setting of the UDP tunnel GSO type is already performed by
udp[46]_gro_complete().

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-06 18:25:26 -04:00
Linus Torvalds 01ec716761 Power management and ACPI fixes for v4.6-rc7
- Fix for a recent regression in the intel_pstate driver causing
    it to fail to restore the HWP (HW-managed P-states) configuration
    of the boot CPU after suspend-to-RAM (Rafael Wysocki).
 
  - Fix for two recent regressions in the intel_pstate driver, one
    that can trigger a divide by zero if the driver is accessed via
    sysfs before it manages to take the first sample and one causing
    it to fail to update a structure field used in a trace point, so
    the information coming from it is less useful (Rafael Wysocki).
 
  - Fix for a problem in the sti-cpufreq driver introduced during
    the 4.5 cycle that causes it to break CPU PM in multi-platform
    kernels by registering cpufreq-dt (which subsequently doesn't
    work) unconditionally and preventing the driver that would
    actually work from registering (Sudeep Holla).
 
  - Stable-candidate fix for an ARM64 cpuidle issue causing idle
    state usage counters to be incorrectly updated for idle states
    that were not entered due to errors (James Morse).
 
  - Fix for a recently introduced issue in the OPP (Operating
    Performance Points) framework causing it to print bogus error
    messages for missing optional regulators (Viresh Kumar).
 
  - Fix for a recently introduced issue in the generic device
    properties framework that may cause it to attempt to dereferece
    and invalid pointer in some cases (Heikki Krogerus).
 
  - Fix for a deadlock in the ACPICA core that may be triggered
    by device (eg. Thunderbolt) hotplug (Prarit Bhargava).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJXLIwPAAoJEILEb/54YlRxT+wP/ROEo/r5IaRZ2k8cphWjsiKk
 k9eDuWBL2KZ29ikghXs/vVY2fMbtQkaDT5h57imsUEKoEzI3MlYA3OkQyffFOcsY
 dz/9EnG6K9Efi6VS1dS1tNCgl45aIeHLCqlVPOBCZ9TwSoAERdNJGqItJdS2YKIA
 +C1LGrWl4UiJ95AOof9PHfKfnWxrnRbpIsB2PbxD0Swe5vfskrHoRWGOAMLJIwpF
 7NvEJ15fryDIvlMR/ggNrg2L2piOu1fJl2kVZYWZTb/u+qAO3utxTQN4y++zTSNb
 LAN78Hq/nJu156SSioO9fLa0wPaU+k2OChfWXtlMsTDK+L5EQz4G3pJwi5FA8QTD
 nfeZNC9VgqfP4LtqWw05h/AOw4A0XUeuwB8Edbc+WG5twzULqDhS57jew4A4xX8d
 jOsvK5syygnR+/rExWc0NWSmCH0g1u6mCUWXQuocfSb/oOEcUGq5RSixRNRfmJUq
 9XNF3hbp7W/Vnp9GWT30Md+CenrEtQXFK8ZQtg0ckBl+b5bEqKYs6FXGqCkUmjZy
 Qgt5sqxgdLWtslS3vSu1/mdryeaLmXNO6c6wueSPMmLyYODEoIHSSka9N9O0Inwv
 d106p7gUy3/ETamC3lbnyHkUrAru74Qh8rErKpqaRLkKfcIq7YCB073fxbqlamzz
 X4n8a1H37LefLqmKwIbF
 =pU+A
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI fixes from Rafael Wysocki:
 "Fixes for problems introduced or discovered recently (intel_pstate,
  sti-cpufreq, ARM64 cpuidle, Operating Performance Points framework,
  generic device properties framework) and one fix for a hotplug-related
  deadlock in ACPICA that's been there forever, but is nasty enough.

  Specifics:

   - Fix for a recent regression in the intel_pstate driver causing it
     to fail to restore the HWP (HW-managed P-states) configuration of
     the boot CPU after suspend-to-RAM (Rafael Wysocki).

   - Fix for two recent regressions in the intel_pstate driver, one that
     can trigger a divide by zero if the driver is accessed via sysfs
     before it manages to take the first sample and one causing it to
     fail to update a structure field used in a trace point, so the
     information coming from it is less useful (Rafael Wysocki).

   - Fix for a problem in the sti-cpufreq driver introduced during the
     4.5 cycle that causes it to break CPU PM in multi-platform kernels
     by registering cpufreq-dt (which subsequently doesn't work)
     unconditionally and preventing the driver that would actually work
     from registering (Sudeep Holla).

   - Stable-candidate fix for an ARM64 cpuidle issue causing idle state
     usage counters to be incorrectly updated for idle states that were
     not entered due to errors (James Morse).

   - Fix for a recently introduced issue in the OPP (Operating
     Performance Points) framework causing it to print bogus error
     messages for missing optional regulators (Viresh Kumar).

   - Fix for a recently introduced issue in the generic device
     properties framework that may cause it to attempt to dereferece and
     invalid pointer in some cases (Heikki Krogerus).

   - Fix for a deadlock in the ACPICA core that may be triggered by
     device (eg Thunderbolt) hotplug (Prarit Bhargava)"

* tag 'pm+acpi-4.6-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / OPP: Remove useless check
  ACPICA: Dispatcher: Update thread ID for recursive method calls
  intel_pstate: Fix intel_pstate_get()
  cpufreq: intel_pstate: Fix HWP on boot CPU after system resume
  cpufreq: st: enable selective initialization based on the platform
  ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
  device property: Avoid potential dereferences of invalid pointers
2016-05-06 11:58:45 -07:00
Rafael J. Wysocki 7c21b38ca9 Merge branches 'acpica-fixes' and 'device-properties-fixes'
* acpica-fixes:
  ACPICA: Dispatcher: Update thread ID for recursive method calls

* device-properties-fixes:
  device property: Avoid potential dereferences of invalid pointers
2016-05-06 13:15:52 +02:00
Linus Torvalds 9caa7e7848 Merge branch 'akpm' (patches from Andrew)
Merge fixes from Andrew Morton:
 "14 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  byteswap: try to avoid __builtin_constant_p gcc bug
  lib/stackdepot: avoid to return 0 handle
  mm: fix kcompactd hang during memory offlining
  modpost: fix module autoloading for OF devices with generic compatible property
  proc: prevent accessing /proc/<PID>/environ until it's ready
  mm/zswap: provide unique zpool name
  mm: thp: kvm: fix memory corruption in KVM with THP enabled
  MAINTAINERS: fix Rajendra Nayak's address
  mm, cma: prevent nr_isolated_* counters from going negative
  mm: update min_free_kbytes from khugepaged after core initialization
  huge pagecache: mmap_sem is unlocked when truncation splits pmd
  rapidio/mport_cdev: fix uapi type definitions
  mm: memcontrol: let v2 cgroups follow changes in system swappiness
  mm: thp: correct split_huge_pages file permission
2016-05-05 20:48:35 -07:00
Arnd Bergmann 7322dd755e byteswap: try to avoid __builtin_constant_p gcc bug
This is another attempt to avoid a regression in wwn_to_u64() after that
started using get_unaligned_be64(), which in turn ran into a bug on
gcc-4.9 through 6.1.

The regression got introduced due to the combination of two separate
workarounds (commits e3bde9568d99: "include/linux/unaligned: force
inlining of byteswap operations" and ef3fb2422ffe: "scsi: fc: use
get/put_unaligned64 for wwn access") that each try to sidestep distinct
problems with gcc behavior (code growth and increased stack usage).

Unfortunately after both have been applied, a more serious gcc bug has
been uncovered, leading to incorrect object code that discards part of a
function and causes undefined behavior.

As part of this problem is how __builtin_constant_p gets evaluated on an
argument passed by reference into an inline function, this avoids the
use of __builtin_constant_p() for all architectures that set
CONFIG_ARCH_USE_BUILTIN_BSWAP.  Most architectures do not set
ARCH_SUPPORTS_OPTIMIZED_INLINING, which means they probably do not
suffer from the problem in the qla2xxx driver, but they might still run
into it elsewhere.

Both of the original workarounds were only merged in the 4.6 kernel, and
the bug that is fixed by this patch should only appear if both are
there, so we probably don't need to backport the fix.  On the other
hand, it works by simplifying the code path and should not have any
negative effects.

[arnd@arndb.de: fix older gcc warnings]
  (http://lkml.kernel.org/r/12243652.bxSxEgjgfk@wuerfel)
Link: https://lkml.org/lkml/headers/2016/4/12/1103
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66122
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70232
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70646
Fixes: e3bde9568d ("include/linux/unaligned: force inlining of byteswap operations")
Fixes: ef3fb2422f ("scsi: fc: use get/put_unaligned64 for wwn access")
Link: http://lkml.kernel.org/r/1780465.XdtPJpi8Tt@wuerfel
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Tested-by: Josh Poimboeuf <jpoimboe@redhat.com> # on gcc-5.3
Tested-by: Quinn Tran <quinn.tran@qlogic.com>
Cc: Martin Jambor <mjambor@suse.cz>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
Cc: Jan Hubicka <hubicka@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-05 17:38:53 -07:00
Andrea Arcangeli 127393fbe5 mm: thp: kvm: fix memory corruption in KVM with THP enabled
After the THP refcounting change, obtaining a compound pages from
get_user_pages() no longer allows us to assume the entire compound page
is immediately mappable from a secondary MMU.

A secondary MMU doesn't want to call get_user_pages() more than once for
each compound page, in order to know if it can map the whole compound
page.  So a secondary MMU needs to know from a single get_user_pages()
invocation when it can map immediately the entire compound page to avoid
a flood of unnecessary secondary MMU faults and spurious
atomic_inc()/atomic_dec() (pages don't have to be pinned by MMU notifier
users).

Ideally instead of the page->_mapcount < 1 check, get_user_pages()
should return the granularity of the "page" mapping in the "mm" passed
to get_user_pages().  However it's non trivial change to pass the "pmd"
status belonging to the "mm" walked by get_user_pages up the stack (up
to the caller of get_user_pages).  So the fix just checks if there is
not a single pte mapping on the page returned by get_user_pages, and in
turn if the caller can assume that the whole compound page is mapped in
the current "mm" (in a pmd_trans_huge()).  In such case the entire
compound page is safe to map into the secondary MMU without additional
get_user_pages() calls on the surrounding tail/head pages.  In addition
of being faster, not having to run other get_user_pages() calls also
reduces the memory footprint of the secondary MMU fault in case the pmd
split happened as result of memory pressure.

Without this fix after a MADV_DONTNEED (like invoked by QEMU during
postcopy live migration or balloning) or after generic swapping (with a
failure in split_huge_page() that would only result in pmd splitting and
not a physical page split), KVM would map the whole compound page into
the shadow pagetables, despite regular faults or userfaults (like
UFFDIO_COPY) may map regular pages into the primary MMU as result of the
pte faults, leading to the guest mode and userland mode going out of
sync and not working on the same memory at all times.

Any other secondary MMU notifier manager (KVM is just one of the many
MMU notifier users) will need the same information if it doesn't want to
run a flood of get_user_pages_fast and it can support multiple
granularity in the secondary MMU mappings, so I think it is justified to
be exposed not just to KVM.

The other option would be to move transparent_hugepage_adjust to
mm/huge_memory.c but that currently has all kind of KVM data structures
in it, so it's definitely not a cut-and-paste work, so I couldn't do a
fix as cleaner as this one for 4.6.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Li, Liang Z" <liang.z.li@intel.com>
Cc: Amit Shah <amit.shah@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-05 17:38:53 -07:00
Alexandre Bounine 4e1016dac1 rapidio/mport_cdev: fix uapi type definitions
Fix problems in uapi definitions reported by Gabriel Laskar: (see
https://lkml.org/lkml/2016/4/5/205 for details)

 - move public header file rio_mport_cdev.h to include/uapi/linux directory
 - change types in data structures passed as IOCTL parameters
 - improve parameter checking in some IOCTL service routines

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Reported-by: Gabriel Laskar <gabriel@lse.epita.fr>
Tested-by: Barry Wood <barry.wood@idt.com>
Cc: Gabriel Laskar <gabriel@lse.epita.fr>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Andre van Herk <andre.van.herk@prodrive-technologies.com>
Cc: Barry Wood <barry.wood@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-05 17:38:53 -07:00
Johannes Weiner 4550c4e157 mm: memcontrol: let v2 cgroups follow changes in system swappiness
Cgroup2 currently doesn't have a per-cgroup swappiness setting.  We
might want to add one later - that's a different discussion - but until
we do, the cgroups should always follow the system setting.  Otherwise
it will be unchangeably set to whatever the ancestor inherited from the
system setting at the time of cgroup creation.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: <stable@vger.kernel.org>	[4.5]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-05 17:38:53 -07:00
Linus Torvalds 85f397a97a asm-generic syscall fix for 4.6-rc
My last pull request for asm-generic had just one patch that added two
 new system calls to asm/unistd.h, but unfortunately it turned out
 to be wrong, pointing arch/tile compat mode at the native handlers
 rather than the compat ones.
 
 This was spotted by Yury Norov, who is working on ILP32 mode
 for arch/arm64, which would have the same problem when merged.
 This fixes the table to use the correct compat syscalls, like
 the other 64-bit architectures do.
 
 I'll try to find the time to come up with a solution that
 prevents this problem from happening again, by allowing all
 future system calls to just get added in a single file
 for use by all architectures.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIVAwUAVyuiZWCrR//JCVInAQLgcxAAnDsLXnepV7gYfkF3MjoN3GVR2BzehI+a
 f7YWTAoA/7MA9DJsSeSlqz0F0/M0TnVM7Yg3VkG4RvxhgSpHCnpol9/CEXuG4TLe
 1Yn3CqNyMfNv9G3WfwWwSu4NeRWUeZAYbNkmWovhx3uWzmk1I+BnShd+IWmzDo0v
 +KC9tkiq7NYlManpdUR+e80Eoougqkrryk6VAdNcgmVvVSCEhSA3VVxiTx26kcdd
 mI7oz0gcJxlCwMZvNfRFFtrEAN9XGwV8bwkO5gYD/1nQSbzXcGLkmFpJmw8eCIX7
 oM5gRs46tcKAEUA9fGVG58drrn0itwKqQO1LlUAhp+fsXU96c9rcgvZfY4Twehhk
 lGVIGPfRUJFOtXtVICofR0DPBkNvZB+EVTPV12gZlPOQKzSPHefEzQMkmUemZO13
 Pv1lB8EeKeKlXsC9cSfKyNgBUNAkV35gV1s6wg/MrZo4Asx5/TlqO85n2wU0fspg
 d8yb866F+guz3OvU+RPyJpZL4sZ6tlg0/4TpBpDJetUwxYZNo4c1KpdwMFr64AFI
 Xh3wckbIBfLSUmB4ex6GCchqDlhla4QMA8Vl/ij/bfLMR6SwQEvfrtGegCluJsVX
 LRWV2arusSJTgcdCxa+chVnXjd6xwet8wPgahpOpTKtkJ9B7fnTFheMY+RRIV5Bb
 V7h/3AEbkvs=
 =yaH0
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic syscall fix from Arnd Bergmann:
 "My last pull request for asm-generic had just one patch that added two
  new system calls to asm/unistd.h, but unfortunately it turned out to
  be wrong, pointing arch/tile compat mode at the native handlers rather
  than the compat ones.

  This was spotted by Yury Norov, who is working on ILP32 mode for
  arch/arm64, which would have the same problem when merged.  This fixes
  the table to use the correct compat syscalls, like the other 64-bit
  architectures do.

  I'll try to find the time to come up with a solution that prevents
  this problem from happening again, by allowing all future system calls
  to just get added in a single file for use by all architectures"

* tag 'asm-generic-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  asm-generic: use compat version for preadv2 and pwritev2
2016-05-05 15:40:38 -07:00
Yury Norov 1f93e9f231 asm-generic: use compat version for preadv2 and pwritev2
Compat architectures that does not use generic unistd (mips, s390),
declare compat version in their syscall tables for preadv2 and
pwritev2. Generic unistd syscall table should do it as well.

[arnd: this initially slipped through the review and an
 incorrect patch got merged. arch/tile/ is the only architecture
 that could be affected for their 32-bit compat mode, every
 other architecture we support today is fine.]

Signed-off-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2016-05-05 00:42:20 +02:00
David S. Miller 32b583a0cb Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:

====================
pull request (net): ipsec 2016-05-04

1) The flowcache can hit an OOM condition if too
   many entries are in the gc_list. Fix this by
   counting the entries in the gc_list and refuse
   new allocations if the value is too high.

2) The inner headers are invalid after a xfrm transformation,
   so reset the skb encapsulation field to ensure nobody tries
   access the inner headers. Otherwise tunnel devices stacked
   on top of xfrm may build the outer headers based on wrong
   informations.

3) Add pmtu handling to vti, we need it to report
   pmtu informations for local generated packets.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-04 16:35:31 -04:00
Linus Torvalds 41143b774a xen: regression fixes for 4.6-rc6
- Fix two regressions causing crashes in 32-bit PV guests.
 - Fix a regression in the evtchn driver.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJXKhf2AAoJEFxbo/MsZsTRYeEH/jkKi9CsCpcHBdJudqJ/RCmr
 ZLWu9JT5+iYUOD2o+NVumiMbNE7Ary/5rDVYzskzgtD2OL1MQJWQia3FC9Mhvj7y
 6lFEl5m6SPNMi1VOW+BPU8k6tduSgnPISYyJbsQ5/YoAiHNX+ieaWX5UhFFlfDkj
 8kpjVNclc3efgh6RqV1GzrmqhkYwVFATwG3SQFujzGSIC7KZNJHy5RQEH3UBvTU2
 ymcL+eO35VSZrH9BXVvOberI3ME3UOkFxFIlAVqlBEgO8MoOlV0tFs/ciqjsHZvH
 ieKAb5jxLlHTwIu6ZxL2vCMp/ili9jeDNsiADkk9utJUUuI5WlHDjrLbqnUf2aY=
 =ckNq
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen regression fixes from David Vrabel:

 - Fix two regressions causing crashes in 32-bit PV guests

 - Fix a regression in the evtchn driver

* tag 'for-linus-4.6-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/evtchn: fix ring resize when binding new events
  xen/balloon: Fix crash when ballooning on x86 32 bit PAE
  xen: Fix page <-> pfn conversion on 32 bit systems
2016-05-04 11:00:05 -07:00
Linus Torvalds 7391daf2ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "Some straggler bug fixes:

   1) Batman-adv DAT must consider VLAN IDs when choosing candidate
      nodes, from Antonio Quartulli.

   2) Fix botched reference counting of vlan objects and neigh nodes in
      batman-adv, from Sven Eckelmann.

   3) netem can crash when it sees GSO packets, the fix is to segment
      then upon ->enqueue.  Fix from Neil Horman with help from Eric
      Dumazet.

   4) Fix VXLAN dependencies in mlx5 driver Kconfig, from Matthew
      Finlay.

   5) Handle VXLAN ops outside of rcu lock, via a workqueue, in mlx5,
      since it can sleep.  Fix also from Matthew Finlay.

   6) Check mdiobus_scan() return values properly in pxa168_eth and macb
      drivers.  From Sergei Shtylyov.

   7) If the netdevice doesn't support checksumming, disable
      segmentation.  From Alexandery Duyck.

   8) Fix races between RDS tcp accept and sending, from Sowmini
      Varadhan.

   9) In macb driver, probe MDIO bus before we register the netdev,
      otherwise we can try to open the device before it is really ready
      for that.  Fix from Florian Fainelli.

  10) Netlink attribute size for ILA "tunnels" not calculated properly,
      fix from Nicolas Dichtel"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  ipv6/ila: fix nlsize calculation for lwtunnel
  net: macb: Probe MDIO bus before registering netdev
  RDS: TCP: Synchronize accept() and connect() paths on t_conn_lock.
  RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sock
  vxlan: Add checksum check to the features check function
  net: Disable segmentation if checksumming is not supported
  net: mvneta: Remove superfluous SMP function call
  macb: fix mdiobus_scan() error check
  pxa168_eth: fix mdiobus_scan() error check
  net/mlx5e: Use workqueue for vxlan ops
  net/mlx5e: Implement a mlx5e workqueue
  net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue
  net/mlx5: Unmap only the relevant IO memory mapping
  netem: Segment GSO packets on enqueue
  batman-adv: Fix reference counting of hardif_neigh_node object for neigh_node
  batman-adv: Fix reference counting of vlan object for tt_local_entry
  batman-adv: B.A.T.M.A.N V - make sure iface is reactivated upon NETDEV_UP event
  batman-adv: fix DAT candidate selection (must use vid)
2016-05-03 15:07:50 -07:00
Alexander Duyck af67eb9e7e vxlan: Add checksum check to the features check function
We need to perform an additional check on the inner headers to determine if
we can offload the checksum for them.  Previously this check didn't occur
so we would generate an invalid frame in the case of an IPv6 header
encapsulated inside of an IPv4 tunnel.  To fix this I added a secondary
check to vxlan_features_check so that we can verify that we can offload the
inner checksum.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-03 16:00:54 -04:00
Linus Torvalds 689de1d6ca Minimal fix-up of bad hashing behavior of hash_64()
This is a fairly minimal fixup to the horribly bad behavior of hash_64()
with certain input patterns.

In particular, because the multiplicative value used for the 64-bit hash
was intentionally bit-sparse (so that the multiply could be done with
shifts and adds on architectures without hardware multipliers), some
bits did not get spread out very much.  In particular, certain fairly
common bit ranges in the input (roughly bits 12-20: commonly with the
most information in them when you hash things like byte offsets in files
or memory that have block factors that mean that the low bits are often
zero) would not necessarily show up much in the result.

There's a bigger patch-series brewing to fix up things more completely,
but this is the fairly minimal fix for the 64-bit hashing problem.  It
simply picks a much better constant multiplier, spreading the bits out a
lot better.

NOTE! For 32-bit architectures, the bad old hash_64() remains the same
for now, since 64-bit multiplies are expensive.  The bigger hashing
cleanup will replace the 32-bit case with something better.

The new constants were picked by George Spelvin who wrote that bigger
cleanup series.  I just picked out the constants and part of the comment
from that series.

Cc: stable@vger.kernel.org
Cc: George Spelvin <linux@horizon.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2016-05-02 13:01:51 -07:00
Linus Torvalds 9c5d1bc2b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) MODULE_FIRMWARE firmware string not correct for iwlwifi 8000 chips,
    from Sara Sharon.

 2) Fix SKB size checks in batman-adv stack on receive, from Sven
    Eckelmann.

 3) Leak fix on mac80211 interface add error paths, from Johannes Berg.

 4) Cannot invoke napi_disable() with BH disabled in myri10ge driver,
    fix from Stanislaw Gruszka.

 5) Fix sign extension problem when computing feature masks in
    net_gso_ok(), from Marcelo Ricardo Leitner.

 6) lan78xx driver doesn't count packets and packet lengths in its
    statistics properly, fix from Woojung Huh.

 7) Fix the buffer allocation sizes in pegasus USB driver, from Petko
    Manolov.

 8) Fix refcount overflows in bpf, from Alexei Starovoitov.

 9) Unified dst cache handling introduced a preempt warning in
    ip_tunnel, fix by resetting rather then setting the cached route.
    From Paolo Abeni.

10) Listener hash collision test fix in soreuseport, from Craig Gallak

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (47 commits)
  gre: do not pull header in ICMP error processing
  net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
  tipc: only process unicast on intended node
  cxgb3: fix out of bounds read
  net/smscx5xx: use the device tree for mac address
  soreuseport: Fix TCP listener hash collision
  net: l2tp: fix reversed udp6 checksum flags
  ip_tunnel: fix preempt warning in ip tunnel creation/updating
  samples/bpf: fix trace_output example
  bpf: fix check_map_func_compatibility logic
  bpf: fix refcnt overflow
  drivers: net: cpsw: use of_phy_connect() in fixed-link case
  dt: cpsw: phy-handle, phy_id, and fixed-link are mutually exclusive
  drivers: net: cpsw: don't ignore phy-mode if phy-handle is used
  drivers: net: cpsw: fix segfault in case of bad phy-handle
  drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config
  MAINTAINERS: net: Change maintainer for GRETH 10/100/1G Ethernet MAC device driver
  gre: reject GUE and FOU in collect metadata mode
  pegasus: fixes reported packet length
  pegasus: fixes URB buffer allocation size;
  ...
2016-05-02 09:40:42 -07:00
Tim Bingham 2c94b53738 net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
Prior to commit d92cff89a0 ("net_dbg_ratelimited: turn into no-op
when !DEBUG") the implementation of net_dbg_ratelimited() was buggy
for both the DEBUG and CONFIG_DYNAMIC_DEBUG cases.

The bug was that net_ratelimit() was being called and, despite
returning true, nothing was being printed to the console. This
resulted in messages like the following -

"net_ratelimit: %d callbacks suppressed"

with no other output nearby.

After commit d92cff89a0 ("net_dbg_ratelimited: turn into no-op when
!DEBUG") the bug is fixed for the DEBUG case. However, there's no
output at all for CONFIG_DYNAMIC_DEBUG case.

This patch restores debug output (if enabled) for the
CONFIG_DYNAMIC_DEBUG case.

Add a definition of net_dbg_ratelimited() for the CONFIG_DYNAMIC_DEBUG
case. The implementation takes care to check that dynamic debugging is
enabled before calling net_ratelimit().

Fixes: d92cff89a0 ("net_dbg_ratelimited: turn into no-op when !DEBUG")
Signed-off-by: Tim Bingham <tbingham@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-01 21:34:01 -04:00