Commit Graph

76271 Commits

Author SHA1 Message Date
Alexei Starovoitov 4e10df9a60 bpf: introduce bpf_skb_vlan_push/pop() helpers
Allow eBPF programs attached to TC qdiscs call skb_vlan_push/pop via
helper functions. These functions may change skb->data/hlen which are
cached by some JITs to improve performance of ld_abs/ld_ind instructions.
Therefore JITs need to recognize bpf_skb_vlan_push/pop() calls,
re-compute header len and re-cache skb->data/hlen back into cpu registers.
Note, skb->data/hlen are not directly accessible from the programs,
so any changes to skb->data done either by these helpers or by other
TC actions are safe.

eBPF JIT supported by three architectures:
- arm64 JIT is using bpf_load_pointer() without caching, so it's ok as-is.
- x64 JIT re-caches skb->data/hlen unconditionally after vlan_push/pop calls
  (experiments showed that conditional re-caching is slower).
- s390 JIT falls back to interpreter for now when bpf_skb_vlan_push() is present
  in the program (re-caching is tbd).

These helpers allow more scalable handling of vlan from the programs.
Instead of creating thousands of vlan netdevs on top of eth0 and attaching
TC+ingress+bpf to all of them, the program can be attached to eth0 directly
and manipulate vlans as necessary.

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:52:31 -07:00
David S. Miller f3120acc78 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2015-07-17

This series contains updates to igb, ixgbe, ixgbevf, i40e, bnx2x,
freescale, siena and dp83640.

Jacob provides several patches to clarify the intended way to implement
both SIOCSHWTSTAMP and ethtool's get_ts_info().  It is okay to support
the specific filters in SIOCSHWTSTAMP by upscaling them to the generic
filters.

Alex Duyck provides a igb patch to pull the time stamp from the fragment
before it gets added to the skb, to avoid a possible issue in which the
fragment can possibly be less than IGB_RX_HDR_LEN due to the time stamp
being pulled after the copybreak check.  Also provides a ixgbevf patch to
fold the ixgbevf_pull_tail() call into ixgbevf_add_rx_frag(), which gives
the advantage that the fragment does not have to be modified after it is
added to the skb.

Fan provides patches for ixgbe/ixgbevf to set the receive hash type
based on receive descriptor RSS type.

Todd provides a fix for igb where on check for link on any media other
than copper was not being detected since it was looking on the incorrect
PHY page (due to the page being used gets switched before the function
to check link gets executed).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:50:19 -07:00
Joachim Eastwood f4c190eb8b stmmac: drop custom_* fields from plat_stmmacenet_data
Both of these fields are unused and has been unused since they
were added 3 and 5 years ago. Drop them since they are clearly
not very useful.

Signed-off-by: Joachim Eastwood <manabian@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:45:57 -07:00
Jiri Benc 6acc232660 net: remove skb_frag_add_head
It's not used anywhere.

Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:38:49 -07:00
Scott Feldman 1a3b2ec93d switchdev: add offload_fwd_mark generator helper
skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be
unique for device and may even be unique for a sub-set of ports within
device, so add switchdev helper function to generate unique marks based on
port's switch ID and group_ifindex.  group_ifindex would typically be the
container dev's ifindex, such as the bridge's ifindex.

The generator uses a global hash table to store offload_fwd_marks hashed by
{switch ID, group_ifindex} key.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 18:32:44 -07:00
Scott Feldman d754f98b50 net: add phys ID compare helper to test if two IDs are the same
Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 18:32:44 -07:00
Scott Feldman 0c4f691ff6 net: don't reforward packets already forwarded by offload device
Just before queuing skb for xmit on port, check if skb has been marked by
switchdev port driver as already fordwarded by device.  If so, drop skb.  A
non-zero skb->offload_fwd_mark field is set by the switchdev port
driver/device on ingress to indicate the skb has already been forwarded by
the device to egress ports with matching dev->skb_mark.  The switchdev port
driver would assign a non-zero dev->offload_skb_mark for each device port
netdev during registration, for example.

Signed-off-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 18:32:44 -07:00
Daniel Borkmann 8d20aabe1c ebpf: add helper to retrieve net_cls's classid cookie
It would be very useful to retrieve the net_cls's classid from an eBPF
program to allow for a more fine-grained classification, it could be
directly used or in conjunction with additional policies. I.e. docker,
but also tooling such as cgexec, can easily run applications via net_cls
cgroups:

  cgcreate -g net_cls:/foo
  echo 42 > foo/net_cls.classid
  cgexec -g net_cls:foo <prog>

Thus, their respecitve classid cookie of foo can then be looked up on
the egress path to apply further policies. The helper is desigend such
that a non-zero value returns the cgroup id.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Thomas Graf <tgraf@suug.ch>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:41:30 -07:00
Daniel Borkmann b87a173e25 cls_cgroup: factor out classid retrieval
Split out retrieving the cgroups net_cls classid retrieval into its
own function, so that it can be reused later on from other parts of
the traffic control subsystem. If there's no skb->sk, then the small
helper returns 0 as well, which in cls_cgroup terms means 'could not
classify'.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:41:30 -07:00
Jacob Keller eff3cddc22 clarify implementation of ethtool's get_ts_info op
This patch adds some clarification about the intended way to implement
both SIOCSHWTSTAMP and ethtool's get_ts_info. The HWTSTAMP API has
several Rx filters which are very specific, as well as more general
filters. The specific filters really only exist to support some broken
hardware which can't fully implement the generic filters. This patch
adds clarification that it is okay to support the specific filters in
SIOCSHWTSTAMP by upscaling them to the generic filters. In addition,
update the header for ethtool_ts_info to specify that drivers ought to
only report the filters they support without upscaling in this manner.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Reviewed-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-07-17 19:59:04 -07:00
Anuradha Karuppiah 88d6378bd6 netlink: changes for setting and clearing protodown via netlink.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15 21:39:40 -07:00
Anuradha Karuppiah d746d707a8 net core: Add protodown support.
This patch introduces the proto_down flag that can be used by user space
applications to notify switch drivers that errors have been detected on the
device.

The switch driver can react to protodown notification by doing a phys down
on the associated switch port.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-15 21:39:40 -07:00
David S. Miller 638d3c6381 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/bridge/br_mdb.c

Minor conflict in br_mdb.c, in 'net' we added a memset of the
on-stack 'ip' variable whereas in 'net-next' we assign a new
member 'vid'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-13 17:28:09 -07:00
Nikolay Aleksandrov 74fe61f17e bridge: mdb: add vlan support for user entries
Until now all user mdb entries were added in vlan 0, this patch adds
support to allow the user to specify the vlan for the entry.
About the uapi change a hole in struct br_mdb_entry is used so the size
and offsets are kept the same (verified with pahole and tested with older
iproute2).

Example:
$ bridge mdb
dev br0 port eth1 grp 239.0.0.1 permanent vlan 2000
dev br0 port eth1 grp 239.0.0.1 permanent vlan 200
dev br0 port eth1 grp 239.0.0.1 permanent

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-13 14:41:26 -07:00
Linus Torvalds f760b87f8f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Missing list head init in bluetooth hidp session creation, from Tedd
    Ho-Jeong An.

 2) Don't leak SKB in bridge netfilter error paths, from Florian
    Westphal.

 3) ipv6 netdevice private leak in netfilter bridging, fixed by Julien
    Grall.

 4) Fix regression in IP over hamradio bpq encapsulation, from Ralf
    Baechle.

 5) Fix race between rhashtable resize events and table walks, from Phil
    Sutter.

 6) Missing validation of IFLA_VF_INFO netlink attributes, fix from
    Daniel Borkmann.

 7) Missing security layer socket state initialization in tipc code,
    from Stephen Smalley.

 8) Fix shared IRQ handling in boomerang 3c59x interrupt handler, from
    Denys Vlasenko.

 9) Missing minor_idr destroy on module unload on macvtap driver, from
    Johannes Thumshirn.

10) Various pktgen kernel thread races, from Oleg Nesterov.

11) Fix races that can cause packets to be processed in the backlog even
    after a device attached to that SKB has been fully unregistered.
    From Julian Anastasov.

12) bcmgenet driver doesn't account packet drops vs.  errors properly,
    fix from Petri Gynther.

13) Array index validation and off by one fix in DSA layer from Florian
    Fainelli

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (66 commits)
  can: replace timestamp as unique skb attribute
  ARM: dts: dra7x-evm: Prevent glitch on DCAN1 pinmux
  can: c_can: Fix default pinmux glitch at init
  can: rcar_can: unify error messages
  can: rcar_can: print request_irq() error code
  can: rcar_can: fix typo in error message
  can: rcar_can: print signed IRQ #
  can: rcar_can: fix IRQ check
  net: dsa: Fix off-by-one in switch address parsing
  net: dsa: Test array index before use
  net: switchdev: don't abort unsupported operations
  net: bcmgenet: fix accounting of packet drops vs errors
  cdc_ncm: update specs URL
  Doc: z8530book: Fix typo in API-z8530-sync-txdma-open.html
  net: inet_diag: always export IPV6_V6ONLY sockopt for listening sockets
  bridge: mdb: allow the user to delete mdb entry if there's a querier
  net: call rcu_read_lock early in process_backlog
  net: do not process device backlog during unregistration
  bridge: fix potential crash in __netdev_pick_tx()
  net: axienet: Fix devm_ioremap_resource return value check
  ...
2015-07-13 11:18:25 -07:00
Oliver Hartkopp d3b58c47d3 can: replace timestamp as unique skb attribute
Commit 514ac99c64 "can: fix multiple delivery of a single CAN frame for
overlapping CAN filters" requires the skb->tstamp to be set to check for
identical CAN skbs.

Without timestamping to be required by user space applications this timestamp
was not generated which lead to commit 36c01245eb "can: fix loss of CAN frames
in raw_rcv" - which forces the timestamp to be set in all CAN related skbuffs
by introducing several __net_timestamp() calls.

This forces e.g. out of tree drivers which are not using alloc_can{,fd}_skb()
to add __net_timestamp() after skbuff creation to prevent the frame loss fixed
in mainline Linux.

This patch removes the timestamp dependency and uses an atomic counter to
create an unique identifier together with the skbuff pointer.

Btw: the new skbcnt element introduced in struct can_skb_priv has to be
initialized with zero in out-of-tree drivers which are not using
alloc_can{,fd}_skb() too.

Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2015-07-12 21:13:22 +02:00
Linus Torvalds 7b732169e9 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Thomas Gleixner:
 "This update from the timer departement contains:

   - A series of patches which address a shortcoming in the tick
     broadcast code.

     If the broadcast device is not available or an hrtimer emulated
     broadcast device, some of the original assumptions lead to boot
     failures.  I rather plugged all of the corner cases instead of only
     addressing the issue reported, so the change got a little larger.

     Has been extensivly tested on x86 and arm.

   - Get rid of the last holdouts using do_posix_clock_monotonic_gettime()

   - A regression fix for the imx clocksource driver

   - An update to the new state callbacks mechanism for clockevents.
     This is required to simplify the conversion, which will take place
     in 4.3"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  tick/broadcast: Prevent NULL pointer dereference
  time: Get rid of do_posix_clock_monotonic_gettime
  cris: Replace do_posix_clock_monotonic_gettime()
  tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
  tick/broadcast: Handle spurious interrupts gracefully
  tick/broadcast: Check for hrtimer broadcast active early
  tick/broadcast: Return busy when IPI is pending
  tick/broadcast: Return busy if periodic mode and hrtimer broadcast
  tick/broadcast: Move the check for periodic mode inside state handling
  tick/broadcast: Prevent deep idle if no broadcast device available
  tick/broadcast: Make idle check independent from mode and config
  tick/broadcast: Sanity check the shutdown of the local clock_event
  tick/broadcast: Prevent hrtimer recursion
  clockevents: Allow set-state callbacks to be optional
  clocksource/imx: Define clocksource for mx27
2015-07-12 09:36:59 -07:00
Linus Torvalds c4bc680cf7 Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fix from Thomas Gleixner:
 "A single fix for a cpu hotplug race vs. interrupt descriptors:

  Prevent irq setup/teardown across the cpu starting/dying parts of cpu
  hotplug so that the starting/dying cpu has a stable view of the
  descriptor space.  This has been an issue for all architectures in the
  cpu dying phase, where interrupts are migrated away from the dying
  cpu.  In the starting phase its mostly a x86 issue vs the vector space
  update"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  hotplug: Prevent alloc/free of irq descriptors during cpu up/down
2015-07-12 09:15:02 -07:00
Linus Torvalds 59c3cb553f Merge branch 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm
Pull libnvdimm fixes from Dan Williams:
 "1) Fixes for a handful of smatch reports (Thanks Dan C.!) and minor
     bug fixes (patches 1-6)

  2) Correctness fixes to the BLK-mode nvdimm driver (patches 7-10).

     Granted these are slightly large for a -rc update.  They have been
     out for review in one form or another since the end of May and were
     deferred from the merge window while we settled on the "PMEM API"
     for the PMEM-mode nvdimm driver (ie memremap_pmem, memcpy_to_pmem,
     and wmb_pmem).

     Now that those apis are merged we implement them in the BLK driver
     to guarantee that mmio aperture moves stay ordered with respect to
     incoming read/write requests, and that writes are flushed through
     those mmio-windows and platform-buffers to be persistent on media.

  These pass the sub-system unit tests with the updates to
  tools/testing/nvdimm, and have received a successful build-report from
  the kbuild robot (468 configs).

  With acks from Rafael for the touches to drivers/acpi/"

* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm:
  nfit: add support for NVDIMM "latch" flag
  nfit: update block I/O path to use PMEM API
  tools/testing/nvdimm: add mock acpi_nfit_flush_address entries to nfit_test
  tools/testing/nvdimm: fix return code for unimplemented commands
  tools/testing/nvdimm: mock ioremap_wt
  pmem: add maintainer for include/linux/pmem.h
  nfit: fix smatch "use after null check" report
  nvdimm: Fix return value of nvdimm_bus_init() if class_create() fails
  libnvdimm: smatch cleanups in __nd_ioctl
  sparse: fix misplaced __pmem definition
2015-07-11 20:44:31 -07:00
Linus Torvalds 84e3e9d04d ARM: SoC: fixes for v4.2-rc2
A fairly random colletion of fixes based on -rc1 for OMAP, sunxi and
 prima2 as well as a few arm64-specific DT fixes.
 
 This series also includes a late to support a new Allwinner (sunxi)
 SoC, but since it's rather simple and isolated to the
 platform-specific code, it's included it for this -rc.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJVoCLHAAoJEFk3GJrT+8ZlDcQP/jVIDk0MvuvfeIsbgWw4Bhys
 +ISmgdSTRwSAaI9oHp3ApNSOmq7QspqORdYsZinR6+Em1Seul5vvT9BN9bYAs4fP
 Uefvcyo9YSgiKQCLVbOkWnp1pJIPq7BKSvfNco159N4vi6RX+4A4XRrHhEbdLkGa
 OhKDnrh0TmbM5b2RkLXlMZR1vsBYEeKxpUlBe3FhKnXYo16yP9Aix2q6oMJBuf99
 1kKNfp0DlGhBwkH+nqbUCgNi8OShFcIBrtR1X4fg7LjANEVNvE1Rv0yAJDzsz5hd
 g8v2xWaB+ONY08c4NelMLu0ZpspMV+fmeDmTuYpvOEPSYWvGamqEZUsdFMe1Vurm
 yqxIoMHSG71dW4SK35QtuvB5LJ/QPytaXidTBU4noFzTVqGaAsvZDHjbRh3YbFm1
 3mB+l2oiWtS1zTOjNLK7fGpyWMZ5OKtKdIxMrDPdWR+IHQy7RDGomMIyT1KenrgF
 FO2a/1l4CDHumWFAiDx/vyfAm/KSO9uB8p5XTNIdVqge+uT3dVpmwpVSrl9IGDZy
 n0YCpqN94lqRR8tEZ+vzyK/zbaUN50t0xOIj3wQKqRUxQxWG//wuX8m1pYW3iJtB
 q/CbgladY4jYcEZFcKoeddBBVzI7E/ntPfL54O36Ubv/BA3dSUZ8RHmYi97OzhLf
 YuvxhEnO0zwLHvghibZk
 =UEUK
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Kevin Hilman:
 "A fairly random colletion of fixes based on -rc1 for OMAP, sunxi and
  prima2 as well as a few arm64-specific DT fixes.

  This series also includes a late to support a new Allwinner (sunxi)
  SoC, but since it's rather simple and isolated to the
  platform-specific code, it's included it for this -rc"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  arm64: dts: add device tree for ARM SMM-A53x2 on LogicTile Express 20MG
  arm: dts: vexpress: add missing CCI PMU device node to TC2
  arm: dts: vexpress: describe all PMUs in TC2 dts
  GICv3: Add ITS entry to THUNDER dts
  arm64: dts: Add poweroff button device node for APM X-Gene platform
  ARM: dts: am4372.dtsi: disable rfbi
  ARM: dts: am57xx-beagle-x15: Provide supply for usb2_phy2
  ARM: dts: am4372: Add emif node
  Revert "ARM: dts: am335x-boneblack: disable RTC-only sleep"
  ARM: sunxi: Enable simplefb in the defconfig
  ARM: Remove deprecated symbol from defconfig files
  ARM: sunxi: Add Machine support for A33
  ARM: sunxi: Introduce Allwinner H3 support
  Documentation: sunxi: Update Allwinner SoC documentation
  ARM: prima2: move to use REGMAP APIs for rtciobrg
  ARM: dts: atlas7: add pinctrl and gpio descriptions
  ARM: OMAP2+: Remove unnessary return statement from the void function, omap2_show_dma_caps
  memory: omap-gpmc: Fix parsing of devices
2015-07-11 10:20:36 -07:00
David Thomson 634ec36cc0 net: phy: Pass mdix ethtool setting through to phy driver
Pass the mdix setting from ethtool down to the phy driver, to allow
driver specific implementations of manually setting the polarity.

Signed-off-by: David Thomson <david.thomson@alliedtelesis.co.nz>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-10 23:17:32 -07:00
Tom Herbert 35a256fee5 ipv6: Nonlocal bind
Add support to allow non-local binds similar to how this was done for IPv4.
Non-local binds are very useful in emulating the Internet in a box, etc.

This add the ip_nonlocal_bind sysctl under ipv6.

Testing:

Set up nonlocal binding and receive routing on a host, e.g.:

ip -6 rule add from ::/0 iif eth0 lookup 200
ip -6 route add local 2001:0:0:1::/64 dev lo proto kernel scope host table 200
sysctl -w net.ipv6.ip_nonlocal_bind=1

Set up routing to 2001:0:0:1::/64 on peer to go to first host

ping6 -I 2001:0:0:1::1 peer-address -- to verify

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 21:09:10 -07:00
Eric Dumazet dbe7faa404 inet: inet_twsk_deschedule factorization
inet_twsk_deschedule() calls are followed by inet_twsk_put().

Only particular case is in inet_twsk_purge() but there is no point
to defer the inet_twsk_put() after re-enabling BH.

Lets rename inet_twsk_deschedule() to inet_twsk_deschedule_put()
and move the inet_twsk_put() inside.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:12:20 -07:00
Eric Dumazet fc01538f9f inet: simplify timewait refcounting
timewait sockets have a complex refcounting logic.
Once we realize it should be similar to established and
syn_recv sockets, we can use sk_nulls_del_node_init_rcu()
and remove inet_twsk_unhash()

In particular, deferred inet_twsk_put() added in commit
13475a30b6 ("tcp: connect() race with timewait reuse")
looks unecessary : When removing a timewait socket from
ehash or bhash, caller must own a reference on the socket
anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:12:20 -07:00
Eric Dumazet 3fd2f1b9d9 inet: remove BUG_ON() in twsk_destructor()
Kernel will crash the same if one of the pointer is NULL anyway.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:12:20 -07:00
Florian Westphal 8b58a39846 ipv6: use flag instead of u16 for hop in inet6_skb_parm
Hop was always either 0 or sizeof(struct ipv6hdr).

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 15:06:59 -07:00
Enrico Mioso 4a0e3e989d cdc_ncm: Add support for moving NDP to end of NCM frame
NCM specs are not actually mandating a specific position in the frame for
the NDP (Network Datagram Pointer). However, some Huawei devices will
ignore our aggregates if it is not placed after the datagrams it points
to. Add support for doing just this, in a per-device configurable way.
While at it, update NCM subdrivers, disabling this functionality in all of
them, except in huawei_cdc_ncm where it is enabled instead.
We aren't making any distinction between different Huawei NCM devices,
based on what the vendor driver does. Standard NCM devices are left
unaffected: if they are compliant, they should be always usable, still
stay on the safe side.

This change has been tested and working with a Huawei E3131 device (which
works regardless of NDP position), a Huawei E3531 (also working both
ways) and an E3372 (which mandates NDP to be after indexed datagrams).

V1->V2:
- corrected wrong NDP acronym definition
- fixed possible NULL pointer dereference
- patch cleanup
V2->V3:
- Properly account for the NDP size when writing new packets to SKB

Signed-off-by: Enrico Mioso <mrkiko.rs@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 14:58:31 -07:00
Yuchung Cheng 76174004a0 tcp: do not slow start when cwnd equals ssthresh
In the original design slow start is only used to raise cwnd
when cwnd is stricly below ssthresh. It makes little sense
to slow start when cwnd == ssthresh: especially
when hystart has set ssthresh in the initial ramp, or after
recovery when cwnd resets to ssthresh. Not doing so will
also help reduce the buffer bloat slightly.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 14:22:52 -07:00
Yuchung Cheng 071d5080e3 tcp: add tcp_in_slow_start helper
Add a helper to test the slow start condition in various congestion
control modules and other places. This is to prepare a slight improvement
in policy as to exactly when to slow start.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-09 14:22:52 -07:00
Linus Torvalds 4c0a9f7458 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fixes from Sage Weil:
 "There is a fix for CephFS and RBD when used within containers/namespaces,
   and a fix for the address learning the client is supposed to do when
  initially talking to the Ceph cluster.

  There are also two patches updating MAINTAINERS.  One breaks out the
  common Ceph code shared by fs/ceph and drivers/block/rbd.c into a
  separate entry with the appropriate maintainers listed.  The second
  adds a second reference to the github tree where the Ceph client
  development takes place (before it is pushed to korg and then to you).

  The goal here is to move closer to a situation where Ilya Dryomov or
  one of the other maintainers can push things to you if I am
  unavailable.  Ilya has done most of the work preparing branches for
  upstream recently; you should not be surprised to hear from him if I
  am trapped in some internet-less wasteland or hit by a bus or
  something.  In the meantime, we'll work on getting him added to the
  kernel web of trust"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  MAINTAINERS: add secondary tree for ceph modules
  MAINTAINERS: update ceph entries
  libceph: treat sockaddr_storage with uninitialized family as blank
  libceph: enable ceph in a non-default network namespace
2015-07-09 13:13:11 -07:00
Ilya Dryomov 757856d2b9 libceph: enable ceph in a non-default network namespace
Grab a reference on a network namespace of the 'rbd map' (in case of
rbd) or 'mount' (in case of ceph) process and use that to open sockets
instead of always using init_net and bailing if network namespace is
anything but init_net.  Be careful to not share struct ceph_client
instances between different namespaces and don't add any code in the
!CONFIG_NET_NS case.

This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
2015-07-09 20:30:34 +03:00
Thomas Gleixner 1f6823faa8 time: Get rid of do_posix_clock_monotonic_gettime
All users gone. Remove it before we get another one.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-09 10:51:46 +02:00
Andy Gospodarek 974d7af5fc ipv4: add support for linkdown sysctl to netconf
This kernel patch exports the value of the new
ignore_routes_with_linkdown via netconf.

v2: changes to notify userspace via netlink when sysctl values change
and proposed for 'net' since this could be considered a bugfix

Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Suggested-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 23:34:53 -07:00
Linus Torvalds 883a2dfd6f Power management and ACPI material for v4.2-rc2
- Fix for an ACPI resources management regression introduced
    during the 4.1 cycle (that unfortunately went into -stable)
    effectively reverting the bad commit along with the recent
    fixups on top of it and using an alternative approach to
    address the underlying issue (Rafael J Wysocki).
 
  - Fix for a memory leak and an incorrect return value in an
    error code path in the ACPI LPSS (Low-Power Subsystem) driver
    (Rafael J Wysocki).
 
  - Fix for a leftover dangling pointer in an error code path in
    the new wakeup IRQ support code (Rafael J Wysocki).
 
  - Fix to prevent infinite loops (due to errors in other places)
    from happening in the core generic PM domains support code
    (Geert Uytterhoeven).
 
  - Hibernation documentation update/clarification (Uwe Geuder).
 
  - Support for _CLS-based device enumeration in the ACPI core
    and in the ATA subsystem (Suravee Suthikulpanit).
 
 /
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABCAAGBQJVnZFwAAoJEILEb/54YlRxuysQALZ3y6QAtVFWurwOLXg0Zwoy
 qoBWlfCRCEKsiLQNe1sDK+dqmfi+ZXvsEqm4y/4uU8dsJdTv1MByaX0aQxP0qmOj
 Lre9AGNq28JxtTmU2o00qjLW9d9WIQKJn1N51+qqGKTVoqijmEcVlz5xVnqoJmBQ
 s0Af2kAzG7CLFtNzPI6RbOPMm7S0VEiI/WQhKZXWqGheppfDaRG+wl0pKPv95G73
 7wCG+yklSi4j5u2Aox/x1samOy+c+S/l2VgU+Decnv0ccPlChYBCMOzWSJuFn5v7
 qVEvUrFYc2ItQElzaqTWc1uEPzf0yg/S4pWJjLBADpDAUBQNXExmaQ2wLlzgR00/
 K/cGUqCn0APcw4tjZjVlGvTHmJg6ivdeceQuLdgxGoGgvph9V9JlHD9zmzFLqAXl
 3VCJiuHnT9QyfEcAkOhZUvO5WcX892fNpHYe/riLQOll9ODwoWCr+TofGfIvHKZp
 1XQ1r68rBpsZnnb5JpYT8F/LgaKa4nWZboLroFWi9SZ0jYdH4U7k88C2bCYpaZGz
 S07l4GEICSuTPHpQNzshzYw6lWJqoXPFQ5okqnZxwNDWAE30ZII0LQwasuupmuR4
 S1fFi30YTI23tiDEZRWgVWbMPD++jZdQoflBZq5Lo8s4EfPD5yS+LtxACzktDlM0
 fq6bcUUO85Weo+zNxEa3
 =qomE
 -----END PGP SIGNATURE-----

Merge tag 'pm+acpi-4.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management and ACPI updates from Rafael Wysocki:
 "These are fixes on top of the previous PM+ACPI pull requests
  (including one fix for a 4.1 regression) and two commits adding
  _CLS-based device enumeration support to the ACPI core and the ATA
  subsystem that waited for the latest ACPICA changes to be merged.

  Specifics:

   - Fix for an ACPI resources management regression introduced during
     the 4.1 cycle (that unfortunately went into -stable) effectively
     reverting the bad commit along with the recent fixups on top of it
     and using an alternative approach to address the underlying issue
     (Rafael J Wysocki).

   - Fix for a memory leak and an incorrect return value in an error
     code path in the ACPI LPSS (Low-Power Subsystem) driver (Rafael J
     Wysocki).

   - Fix for a leftover dangling pointer in an error code path in the
     new wakeup IRQ support code (Rafael J Wysocki).

   - Fix to prevent infinite loops (due to errors in other places) from
     happening in the core generic PM domains support code (Geert
     Uytterhoeven).

   - Hibernation documentation update/clarification (Uwe Geuder).

   - Support for _CLS-based device enumeration in the ACPI core and in
     the ATA subsystem (Suravee Suthikulpanit)"

* tag 'pm+acpi-4.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / wakeirq: Avoid setting power.wakeirq too hastily
  ata: ahci_platform: Add ACPI _CLS matching
  ACPI / scan: Add support for ACPI _CLS device matching
  PM / hibernate: clarify resume documentation
  PM / Domains: Avoid infinite loops in attach/detach code
  ACPI / LPSS: Fix up acpi_lpss_create_device()
  ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage
2015-07-08 17:34:51 -07:00
Kevin Hilman 8dfaf05682 move CSR rtc iobrg read/write API to be regmap
this moves to general APIs, and all drivers will be changed based
 on it.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJVd+OkAAoJEDIv4aC191RhuWkP/2LzJBRoBP1WJmTOvR0Kofl7
 4PG5Xi/+1NtSHubvTSJJX2kZ9I9FHemQAJba8pqZ0pgZkb+7t0dA2+G1mDi8SOr8
 VyWlMotI8Im0PoUHXmt5GPHEI4E6WqyX1TQFUD4JM/Ln07mevH/ocjd94LN79eO6
 tc1z8AEw1/d5MCcEI34FP+vNXyLOE/9hM7IsmBtf4Niq7VrO8SWS5y1tJBOz+d2v
 HmZZYEP/WS2mX8E77cz89pf0+BkmhF6nj1zoFCmlLVxCXrlKncs+Uvnsb+n/k1BI
 gA4UNOfSoA+k+h+1F/7BhyEABqYkEuBDYs9m5NsypL+WITaVnI9pmNWNAA7IZNTg
 dEYoVKqpJX3lVPWJ0i0hvgK1xXNMKlbwFb6iV80upvUVNfzZs9JveSL97Iiy9WjI
 LlRdCTlteBvgsNBJyVG//788VU5+dr8HXV1UCfm+drs3H43TudCnJ8AhF3JaToO+
 wv045McJTUL/CI+WQiXGNlAoXDjprgjLSiXRyEKF1xNsem3u78Z5bpjGX5crpzgC
 HIdnsxb39jCHCNVJLuK/eCJ9Oocpo9LdKU0wSqIx0GNAx+xdoPo5H8jmlE4bnSWo
 9f/9784A1SI/tpc0mKRz/z3FEKiP0i3CTJfcmC9EAd4LWFYk54WgtlzPztRkE625
 YQMHjM+cavYWpzzgnQ+B
 =Beg0
 -----END PGP SIGNATURE-----

Merge tag 'sirf-iobrg2regmap-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/baohua/linux into fixes

Merge "CSR SiRFSoC rtc iobrg move to regmap for 4.2" from Barry Song:

move CSR rtc iobrg read/write API to be regmap

this moves to general APIs, and all drivers will be changed based
on it.

* tag 'sirf-iobrg2regmap-for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/baohua/linux:
  ARM: prima2: move to use REGMAP APIs for rtciobrg
2015-07-08 14:20:12 -07:00
Eric Dumazet 2ee22a90c7 net_sched: act_mirred: remove spinlock in fast path
Like act_gact, act_mirred can be lockless in packet processing

1) Use percpu stats
2) update lastuse only every clock tick to avoid false sharing
3) use rcu to protect tcfm_dev
4) Remove spinlock usage, as it is no longer needed.

Next step : add multi queue capability to ifb device

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 13:50:42 -07:00
Eric Dumazet 56e5d1ca18 net_sched: act_gact: remove spinlock in fast path
Final step for gact RCU operation :

1) Use percpu stats
2) update lastuse only every clock tick to avoid false sharing
3) Remove spinlock acquisition, as it is no longer needed.

Since this is the last contended lock in packet RX when tc gact is used,
this gives impressive gain.

My host with 8 RX queues was handling 5 Mpps before the patch,
and more than 11 Mpps after patch.

Tested:

On receiver :

dev=eth0
tc qdisc del dev $dev ingress 2>/dev/null
tc qdisc add dev $dev ingress
tc filter del dev $dev root pref 10 2>/dev/null
tc filter del dev $dev pref 10 2>/dev/null
tc filter add dev $dev est 1sec 4sec parent ffff: protocol ip prio 1 \
	u32 match ip src 7.0.0.0/8 flowid 1:15 action drop

Sender sends packets flood from 7/8 network

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 13:50:42 -07:00
Eric Dumazet cc6510a950 net_sched: act_gact: use a separate packet counters for gact_determ()
Second step for gact RCU operation :

We want to get rid of the spinlock protecting gact operations.
Stats (packets/bytes) will soon be per cpu.

gact_determ() would not work without a central packet counter,
so lets add it for this mode.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 13:50:42 -07:00
Eric Dumazet 519c818e8f net: sched: add percpu stats to actions
Reuse existing percpu infrastructure John Fastabend added for qdisc.

This patch adds a new cpustats parameter to tcf_hash_create() and all
actions pass false, meaning this patch should have no effect yet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 13:50:41 -07:00
Eric Dumazet 24ea591d22 net: sched: extend percpu stats helpers
qdisc_bstats_update_cpu() and other helpers were added to support
percpu stats for qdisc.

We want to add percpu stats for tc action, so this patch add common
helpers.

qdisc_bstats_update_cpu() is renamed to qdisc_bstats_cpu_update()
qdisc_qstats_drop_cpu() is renamed to qdisc_qstats_cpu_drop()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 13:50:41 -07:00
Thomas Gleixner a899418167 hotplug: Prevent alloc/free of irq descriptors during cpu up/down
When a cpu goes up some architectures (e.g. x86) have to walk the irq
space to set up the vector space for the cpu. While this needs extra
protection at the architecture level we can avoid a few race
conditions by preventing the concurrent allocation/free of irq
descriptors and the associated data.

When a cpu goes down it moves the interrupts which are targeted to
this cpu away by reassigning the affinities. While this happens
interrupts can be allocated and freed, which opens a can of race
conditions in the code which reassignes the affinities because
interrupt descriptors might be freed underneath.

Example:

CPU1				CPU2
cpu_up/down
 irq_desc = irq_to_desc(irq);
				remove_from_radix_tree(desc);
 raw_spin_lock(&desc->lock);
				free(desc);

We could protect the irq descriptors with RCU, but that would require
a full tree change of all accesses to interrupt descriptors. But
fortunately these kind of race conditions are rather limited to a few
things like cpu hotplug. The normal setup/teardown is very well
serialized. So the simpler and obvious solution is:

Prevent allocation and freeing of interrupt descriptors accross cpu
hotplug.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: xiao jin <jin.xiao@intel.com>
Cc: Joerg Roedel <jroedel@suse.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com>
Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de
2015-07-08 11:32:25 +02:00
Rafael J. Wysocki 8076ca480f Merge branch 'acpi-scan'
* acpi-scan:
  ata: ahci_platform: Add ACPI _CLS matching
  ACPI / scan: Add support for ACPI _CLS device matching
2015-07-07 22:48:25 +02:00
Thomas Gleixner 37b64a4206 tick/broadcast: Unbreak CONFIG_GENERIC_CLOCKEVENTS=n build
Making tick_broadcast_oneshot_control() independent from
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST broke the build for
CONFIG_GENERIC_CLOCKEVENTS=n because the function is not defined
there.

Provide a proper stub inline.

Fixes: f32dd11705 'tick/broadcast: Make idle check independent from mode and config'
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-07 21:56:34 +02:00
Thomas Gleixner f32dd11705 tick/broadcast: Make idle check independent from mode and config
Currently the broadcast busy check, which prevents the idle code from
going into deep idle, works only in one shot mode.

If NOHZ and HIGHRES are off (config or command line) there is no
sanity check at all, so under certain conditions cpus are allowed to
go into deep idle, where the local timer stops, and are not woken up
again because there is no broadcast timer installed or a hrtimer based
broadcast device is not evaluated.

Move tick_broadcast_oneshot_control() into the common code and provide
proper subfunctions for the various config combinations.

The common check in tick_broadcast_oneshot_control() is for the C3STOP
misfeature flag of the local clock event device. If its not set, idle
can proceed. If set, further checks are necessary.

Provide checks for the trivial cases:

 - If broadcast is disabled in the config, then return busy

 - If oneshot mode (NOHZ/HIGHES) is disabled in the config, return
   busy if the broadcast device is hrtimer based.

 - If oneshot mode is enabled in the config call the original
   tick_broadcast_oneshot_control() function. That function needs
   extra checks which will be implemented in seperate patches.

[ Split out from a larger combo patch ]

Reported-and-tested-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Suzuki Poulose <Suzuki.Poulose@arm.com>
Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com>
Cc: Catalin Marinas <Catalin.Marinas@arm.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
2015-07-07 18:46:47 +02:00
Suthikulpanit, Suravee 26095a01d3 ACPI / scan: Add support for ACPI _CLS device matching
Device drivers typically use ACPI _HIDs/_CIDs listed in struct device_driver
acpi_match_table to match devices. However, for generic drivers, we do not
want to list _HID for all supported devices. Also, certain classes of devices
do not have _CID (e.g. SATA, USB). Instead, we can leverage ACPI _CLS,
which specifies PCI-defined class code (i.e. base-class, subclass and
programming interface). This patch adds support for matching ACPI devices using
the _CLS method.

To support loadable module, current design uses _HID or _CID to match device's
modalias. With the new way of matching with _CLS this would requires modification
to the current ACPI modalias key to include _CLS. This patch appends PCI-defined
class-code to the existing ACPI modalias as following.

    acpi:<HID>:<CID1>:<CID2>:..:<CIDn>:<bbsspp>:
E.g:
    # cat /sys/devices/platform/AMDI0600:00/modalias
    acpi:AMDI0600:010601:

where bb is th base-class code, ss is te sub-class code, and pp is the
programming interface code

Since there would not be _HID/_CID in the ACPI matching table of the driver,
this patch adds a field to acpi_device_id to specify the matching _CLS.

    static const struct acpi_device_id ahci_acpi_match[] = {
        { ACPI_DEVICE_CLASS(PCI_CLASS_STORAGE_SATA_AHCI, 0xffffff) },
        {},
    };

In this case, the corresponded entry in modules.alias file would be:

    alias acpi*:010601:* ahci_platform

Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-07-07 01:55:20 +02:00
Rafael J. Wysocki 0294112ee3 ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage
This effectively reverts the following three commits:

 7bc10388cc ACPI / resources: free memory on error in add_region_before()
 0f1b414d19 ACPI / PNP: Avoid conflicting resource reservations
 b9a5e5e18f ACPI / init: Fix the ordering of acpi_reserve_resources()

(commit b9a5e5e18f introduced regressions some of which, but not
all, were addressed by commit 0f1b414d19 and commit 7bc10388cc
was a fixup on top of the latter) and causes ACPI fixed hardware
resources to be reserved at the fs_initcall_sync stage of system
initialization.

The story is as follows.  First, a boot regression was reported due
to an apparent resource reservation ordering change after a commit
that shouldn't lead to such changes.  Investigation led to the
conclusion that the problem happened because acpi_reserve_resources()
was executed at the device_initcall() stage of system initialization
which wasn't strictly ordered with respect to driver initialization
(and with respect to the initialization of the pcieport driver in
particular), so a random change causing the device initcalls to be
run in a different order might break things.

The response to that was to attempt to run acpi_reserve_resources()
as soon as we knew that ACPI would be in use (commit b9a5e5e18f).
However, that turned out to be too early, because it caused resource
reservations made by the PNP system driver to fail on at least one
system and that failure was addressed by commit 0f1b414d19.

That fix still turned out to be insufficient, though, because
calling acpi_reserve_resources() before the fs_initcall stage of
system initialization caused a boot regression to happen on the
eCAFE EC-800-H20G/S netbook.  That meant that we only could call
acpi_reserve_resources() at the fs_initcall initialization stage
or later, but then we might just as well call it after the PNP
initalization in which case commit 0f1b414d19 wouldn't be
necessary any more.

For this reason, the changes made by commit 0f1b414d19 are reverted
(along with a memory leak fixup on top of that commit), the changes
made by commit b9a5e5e18f that went too far are reverted too and
acpi_reserve_resources() is changed into fs_initcall_sync, which
will cause it to be executed after the PNP subsystem initialization
(which is an fs_initcall) and before device initcalls (including
the pcieport driver initialization) which should avoid the initial
issue.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=100581
Link: http://marc.info/?t=143092384600002&r=1&w=2
Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831
Link: http://marc.info/?t=143389402600001&r=1&w=2
Fixes: b9a5e5e18f "ACPI / init: Fix the ordering of acpi_reserve_resources()"
Reported-by: Roland Dreier <roland@purestorage.com>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-07-06 23:52:21 +02:00
Linus Torvalds 1c4c7159ed Bug fixes (all for stable kernels) for ext4:
* address corner cases for indirect blocks->extent migration
   * fix reserved block accounting invalidate_page when
 	page_size != block_size (i.e., ppc or 1k block size file systems)
   * fix deadlocks when a memcg is under heavy memory pressure
   * fix fencepost error in lazytime optimization
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABCAAGBQJVmW27AAoJEPL5WVaVDYGjmEkIAJsGHVIKur1Kp//FhejSB/wI
 B0d+UuQt5kdAE3lNxC7lHO1NqIhvnS7eBho+52LG8V4JDRrzTbE1GdbsBhAIk6FW
 CcsQvsHAI99QJMdqOCachu/+nhCwIINGkxmbumhNaZoJPn6wmGQzCA3Cn5qmnGnK
 Ctbk6li1HuMXyzbbvxCLfaD/xCUs1NCdufEnRU44i0U4OfaYNpiAhddeGIQ8WMEQ
 G14l2JvhIfye6fG8lnCzfacFvnT9zvvSGfRO3ZQjC4Az1EogIUbhCPLvq0ebDbPp
 i4eRfrSRdXmMojqmW/knET8skXQVZVnD7LWuvkue+n47UbTH2c0roTbp4l76W+U=
 =x8Cc
 -----END PGP SIGNATURE-----

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 bugfixes from Ted Ts'o:
 "Bug fixes (all for stable kernels) for ext4:

   - address corner cases for indirect blocks->extent migration

   - fix reserved block accounting invalidate_page when
     page_size != block_size (i.e., ppc or 1k block size file systems)

   - fix deadlocks when a memcg is under heavy memory pressure

   - fix fencepost error in lazytime optimization"

* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: replace open coded nofail allocation in ext4_free_blocks()
  ext4: correctly migrate a file with a hole at the beginning
  ext4: be more strict when migrating to non-extent based file
  ext4: fix reservation release on invalidatepage for delalloc fs
  ext4: avoid deadlocks in the writeback path by using sb_getblk_gfp
  bufferhead: Add _gfp version for sb_getblk()
  ext4: fix fencepost error in lazytime optimization
2015-07-05 16:24:54 -07:00
Linus Torvalds 1dc51b8288 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 "Assorted VFS fixes and related cleanups (IMO the most interesting in
  that part are f_path-related things and Eric's descriptor-related
  stuff).  UFS regression fixes (it got broken last cycle).  9P fixes.
  fs-cache series, DAX patches, Jan's file_remove_suid() work"

[ I'd say this is much more than "fixes and related cleanups".  The
  file_table locking rule change by Eric Dumazet is a rather big and
  fundamental update even if the patch isn't huge.   - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
  9p: cope with bogus responses from server in p9_client_{read,write}
  p9_client_write(): avoid double p9_free_req()
  9p: forgetting to cancel request on interrupted zero-copy RPC
  dax: bdev_direct_access() may sleep
  block: Add support for DAX reads/writes to block devices
  dax: Use copy_from_iter_nocache
  dax: Add block size note to documentation
  fs/file.c: __fget() and dup2() atomicity rules
  fs/file.c: don't acquire files->file_lock in fd_install()
  fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
  vfs: avoid creation of inode number 0 in get_next_ino
  namei: make set_root_rcu() return void
  make simple_positive() public
  ufs: use dir_pages instead of ufs_dir_pages()
  pagemap.h: move dir_pages() over there
  remove the pointless include of lglock.h
  fs: cleanup slight list_entry abuse
  xfs: Correctly lock inode when removing suid and file capabilities
  fs: Call security_ops->inode_killpriv on truncate
  fs: Provide function telling whether file_remove_privs() will do anything
  ...
2015-07-04 19:36:06 -07:00
Linus Torvalds 5c755fe142 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "It's been a busy development cycle for target-core in a number of
  different areas.

  The fabric API usage for se_node_acl allocation is now within
  target-core code, dropping the external API callers for all fabric
  drivers tree-wide.

  There is a new conversion to RCU hlists for se_node_acl and
  se_portal_group LUN mappings, that turns fast-past LUN lookup into a
  completely lockless code-path.  It also removes the original
  hard-coded limitation of 256 LUNs per fabric endpoint.

  The configfs attributes for backends can now be shared between core
  and driver code, allowing existing drivers to use common code while
  still allowing flexibility for new backend provided attributes.

  The highlights include:

   - Merge sbc_verify_dif_* into common code (sagi)
   - Remove iscsi-target support for obsolete IFMarker/OFMarker
     (Christophe Vu-Brugier)
   - Add bidi support in target/user backend (ilias + vangelis + agover)
   - Move se_node_acl allocation into target-core code (hch)
   - Add crc_t10dif_update common helper (akinobu + mkp)
   - Handle target-core odd SGL mapping for data transfer memory
     (akinobu)
   - Move transport ID handling into target-core (hch)
   - Move task tag into struct se_cmd + support 64-bit tags (bart)
   - Convert se_node_acl->device_list[] to RCU hlist (nab + hch +
     paulmck)
   - Convert se_portal_group->tpg_lun_list[] to RCU hlist (nab + hch +
     paulmck)
   - Simplify target backend driver registration (hch)
   - Consolidate + simplify target backend attribute implementations
     (hch + nab)
   - Subsume se_port + t10_alua_tg_pt_gp_member into se_lun (hch)
   - Drop lun_sep_lock for se_lun->lun_se_dev RCU usage (hch + nab)
   - Drop unnecessary core_tpg_register TFO parameter (nab)
   - Use 64-bit LUNs tree-wide (hannes)
   - Drop left-over TARGET_MAX_LUNS_PER_TRANSPORT limit (hannes)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (76 commits)
  target: Bump core version to v5.0
  target: remove target_core_configfs.h
  target: remove unused TARGET_CORE_CONFIG_ROOT define
  target: consolidate version defines
  target: implement WRITE_SAME with UNMAP bit using ->execute_unmap
  target: simplify UNMAP handling
  target: replace se_cmd->execute_rw with a protocol_data field
  target/user: Fix inconsistent kmap_atomic/kunmap_atomic
  target: Send UA when changing LUN inventory
  target: Send UA upon LUN RESET tmr completion
  target: Send UA on ALUA target port group change
  target: Convert se_lun->lun_deve_lock to normal spinlock
  target: use 'se_dev_entry' when allocating UAs
  target: Remove 'ua_nacl' pointer from se_ua structure
  target_core_alua: Correct UA handling when switching states
  xen-scsiback: Fix compile warning for 64-bit LUN
  target: Remove TARGET_MAX_LUNS_PER_TRANSPORT
  target: use 64-bit LUNs
  target: Drop duplicate + unused se_dev_check_wce
  target: Drop unnecessary core_tpg_register TFO parameter
  ...
2015-07-04 14:13:43 -07:00
Linus Torvalds 6d7c8e1b3a A very significant modification to NTB in this series.
An abstraction layer was added to allow the hardware and clients to be
 easily added.  This required rewriting the NTB transport layer for this
 abstraction layer.  This modification will allow future
 "high performance" NTB clients.
 
 In addition to this change, a number of performance modifications were
 added.  These changes include NUMA enablement, using CPU memcpy instead
 of asyncdma, and modification of NTB layer MTU size.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJVmCNDAAoJEG5mS6x6i9IjI/QQAINNe7XCENdIL8iQZU2NDWCB
 BmVhQjKKnS5d774qrCIc/29FoxolTo4MG5H7j1zyGUUrx0mWbjrvRMA4AGt+yE0p
 Xf4DvFUi+Ptmf/dxlzxwt7ySBAQtAJiPRK0xJjEUqXpJqR/u5eTcsG2og7rXnXE1
 DHtgqmh05D4Noi725yyn4qqVrqnUnFtgJ0hp6s7BE5ReeNPJNrD5yRpByEH81TBw
 +FiUHoCIuhZ4taGbfUU3G6lbBoWqztV8RmjI1AQGRGiij5BmYNZUBTEuSoEt86Df
 jxoLIz+77A9SGKejSmZbomeeBT3FnlnwHurC8hKdlqF05m3BebiTlCGEmddUdRAa
 u+8v/z5lmCaxH2Bg0rdQFn1HIvwtnToT4N3nfRtOkywg8tDHfYv2s1IaGO9bosJn
 XaIS5TR548IgEcsrIjJPX8ab/Q7nBUPkhaIEtXLyPdo41rON7J8aaJnv7bjPq/22
 BoHU8fIoiYjIsiDPpsIgRAEXSTz8pr3uHU68NDR8R8pzf9OoysZQV9vy7N04YKTy
 3Hr+AtEmiPel9YIx9Z8rgvLoMzP/nyxecJeFqUQpLkVR5YWdI50eg5TO9lBMU6Io
 hqROco8RUxJDC5J3IE75eWpc2YIxUsNo+5IFsxbNJ78OnY6GiF+pWnKx5rxwxs/G
 YHILdVMg/OBbVX3+fvJV
 =CyfQ
 -----END PGP SIGNATURE-----

Merge tag 'ntb-4.2' of git://github.com/jonmason/ntb

Pull NTB updates from Jon Mason:
 "This includes a pretty significant reworking of the NTB core code, but
  has already produced some significant performance improvements.

  An abstraction layer was added to allow the hardware and clients to be
  easily added.  This required rewriting the NTB transport layer for
  this abstraction layer.  This modification will allow future "high
  performance" NTB clients.

  In addition to this change, a number of performance modifications were
  added.  These changes include NUMA enablement, using CPU memcpy
  instead of asyncdma, and modification of NTB layer MTU size"

* tag 'ntb-4.2' of git://github.com/jonmason/ntb: (22 commits)
  NTB: Add split BAR output for debugfs stats
  NTB: Change WARN_ON_ONCE to pr_warn_once on unsafe
  NTB: Print driver name and version in module init
  NTB: Increase transport MTU to 64k from 16k
  NTB: Rename Intel code names to platform names
  NTB: Default to CPU memcpy for performance
  NTB: Improve performance with write combining
  NTB: Use NUMA memory in Intel driver
  NTB: Use NUMA memory and DMA chan in transport
  NTB: Rate limit ntb_qp_link_work
  NTB: Add tool test client
  NTB: Add ping pong test client
  NTB: Add parameters for Intel SNB B2B addresses
  NTB: Reset transport QP link stats on down
  NTB: Do not advance transport RX on link down
  NTB: Differentiate transport link down messages
  NTB: Check the device ID to set errata flags
  NTB: Enable link for Intel root port mode in probe
  NTB: Read peer info from local SPAD in transport
  NTB: Split ntb_hw_intel and ntb_transport drivers
  ...
2015-07-04 14:07:47 -07:00