Commit Graph

49750 Commits

Author SHA1 Message Date
David S. Miller 9bb862beb6 Merge branch 'master' of git://1984.lsi.us.es/net-next 2012-05-08 14:40:21 -04:00
Pablo Neira Ayuso d16cf20e2f netfilter: remove ip_queue support
This patch removes ip_queue support which was marked as obsolete
years ago. The nfnetlink_queue modules provides more advanced
user-space packet queueing mechanism.

This patch also removes capability code included in SELinux that
refers to ip_queue. Otherwise, we break compilation.

Several warning has been sent regarding this to the mailing list
in the past month without anyone rising the hand to stop this
with some strong argument.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 20:25:42 +02:00
Pablo Neira Ayuso 6714cf5465 netfilter: nf_conntrack: fix explicit helper attachment and NAT
Explicit helper attachment via the CT target is broken with NAT
if non-standard ports are used. This problem was hidden behind
the automatic helper assignment routine. Thus, it becomes more
noticeable now that we can disable the automatic helper assignment
with Eric Leblond's:

9e8ac5a netfilter: nf_ct_helper: allow to disable automatic helper assignment

Basically, nf_conntrack_alter_reply asks for looking up the helper
up if NAT is enabled. Unfortunately, we don't have the conntrack
template at that point anymore.

Since we don't want to rely on the automatic helper assignment,
we can skip the second look-up and stick to the helper that was
attached by iptables. With the CT target, the user is in full
control of helper attachment, thus, the policy is to trust what
the user explicitly configures via iptables (no automatic magic
anymore).

Interestingly, this bug was hidden by the automatic helper look-up
code. But it can be easily trigger if you attach the helper in
a non-standard port, eg.

iptables -I PREROUTING -t raw -p tcp --dport 8888 \
	-j CT --helper ftp

And you disabled the automatic helper assignment.

I added the IPS_HELPER_BIT that allows us to differenciate between
a helper that has been explicitly attached and those that have been
automatically assigned. I didn't come up with a better solution
(having backward compatibility in mind).

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:44:42 +02:00
Pablo Neira Ayuso f73181c828 ipvs: add support for sync threads
Allow master and backup servers to use many threads
for sync traffic. Add sysctl var "sync_ports" to define the
number of threads. Every thread will use single UDP port,
thread 0 will use the default port 8848 while last thread
will use port 8848+sync_ports-1.

	The sync traffic for connections is scheduled to many
master threads based on the cp address but one connection is
always assigned to same thread to avoid reordering of the
sync messages.

	Remove ip_vs_sync_switch_mode because this check
for sync mode change is still risky. Instead, check for mode
change under sync_buff_lock.

	Make sure the backup socks do not block on reading.

Special thanks to Aleksey Chudov for helping in all tests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:33 +02:00
Julian Anastasov 749c42b620 ipvs: reduce sync rate with time thresholds
Add two new sysctl vars to control the sync rate with the
main idea to reduce the rate for connection templates because
currently it depends on the packet rate for controlled connections.
This mechanism should be useful also for normal connections
with high traffic.

sync_refresh_period: in seconds, difference in reported connection
	timer that triggers new sync message. It can be used to
	avoid sync messages for the specified period (or half of
	the connection timeout if it is lower) if connection state
	is not changed from last sync.

sync_retries: integer, 0..3, defines sync retries with period of
	sync_refresh_period/8. Useful to protect against loss of
	sync messages.

	Allow sysctl_sync_threshold to be used with
sysctl_sync_period=0, so that only single sync message is sent
if sync_refresh_period is also 0.

	Add new field "sync_endtime" in connection structure to
hold the reported time when connection expires. The 2 lowest
bits will represent the retry count.

	As the sysctl_sync_period now can be 0 use ACCESS_ONCE to
avoid division by zero.

	Special thanks to Aleksey Chudov for being patient with me,
for his extensive reports and helping in all tests.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:40:10 +02:00
Pablo Neira Ayuso 1c003b1580 ipvs: wakeup master thread
High rate of sync messages in master can lead to
overflowing the socket buffer and dropping the messages.
Fixed sleep of 1 second without wakeup events is not suitable
for loaded masters,

	Use delayed_work to schedule sending for queued messages
and limit the delay to IPVS_SYNC_SEND_DELAY (20ms). This will
reduce the rate of wakeups but to avoid sending long bursts we
wakeup the master thread after IPVS_SYNC_WAKEUP_RATE (8) messages.

	Add hard limit for the queued messages before sending
by using "sync_qlen_max" sysctl var. It defaults to 1/32 of
the memory pages but actually represents number of messages.
It will protect us from allocating large parts of memory
when the sending rate is lower than the queuing rate.

	As suggested by Pablo, add new sysctl var
"sync_sock_size" to configure the SNDBUF (master) or
RCVBUF (slave) socket limit. Default value is 0 (preserve
system defaults).

	Change the master thread to detect and block on
SNDBUF overflow, so that we do not drop messages when
the socket limit is low but the sync_qlen_max limit is
not reached. On ENOBUFS or other errors just drop the
messages.

	Change master thread to enter TASK_INTERRUPTIBLE
state early, so that we do not miss wakeups due to messages or
kthread_should_stop event.

Thanks to Pablo Neira Ayuso for his valuable feedback!

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:39:53 +02:00
Julian Anastasov cdcc5e905d ipvs: always update some of the flags bits in backup
As the goal is to mirror the inactconns/activeconns
counters in the backup server, make sure the cp->flags are
updated even if cp is still not bound to dest. If cp->flags
are not updated ip_vs_bind_dest will rely only on the initial
flags when updating the counters. To avoid mistakes and
complicated checks for protocol state rely only on the
IP_VS_CONN_F_INACTIVE bit when updating the counters.

Signed-off-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Aleksey Chudov <aleksey.chudov@gmail.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-05-08 19:38:31 +02:00
Eric Dumazet ac3a546ac8 netfilter: nf_conntrack: use this_cpu_inc()
this_cpu_inc() is IRQ safe and faster than
local_bh_disable()/__this_cpu_inc()/local_bh_enable(), at least on x86.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Christoph Lameter <cl@linux.com>
Cc: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:36:33 +02:00
Eric Leblond a900689264 netfilter: nf_ct_helper: allow to disable automatic helper assignment
This patch allows you to disable automatic conntrack helper
lookup based on TCP/UDP ports, eg.

echo 0 > /proc/sys/net/netfilter/nf_conntrack_helper

[ Note: flows that already got a helper will keep using it even
  if automatic helper assignment has been disabled ]

Once this behaviour has been disabled, you have to explicitly
use the iptables CT target to attach helper to flows.

There are good reasons to stop supporting automatic helper
assignment, for further information, please read:

http://www.netfilter.org/news.html#2012-04-03

This patch also adds one message to inform that automatic helper
assignment is deprecated and it will be removed soon (this is
spotted only once, with the first flow that gets a helper attached
to make it as less annoying as possible).

Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-05-08 19:35:18 +02:00
Joe Perches b44907e64c etherdev.h: Convert int is_<foo>_ether_addr to bool
Make the return value explicitly true or false.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-08 13:06:16 -04:00
David S. Miller 0d6c4a2e46 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/e1000e/param.c
	drivers/net/wireless/iwlwifi/iwl-agn-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans-pcie-rx.c
	drivers/net/wireless/iwlwifi/iwl-trans.h

Resolved the iwlwifi conflict with mainline using 3-way diff posted
by John Linville and Stephen Rothwell.  In 'net' we added a bug
fix to make iwlwifi report a more accurate skb->truesize but this
conflicted with RX path changes that happened meanwhile in net-next.

In e1000e a conflict arose in the validation code for settings of
adapter->itr.  'net-next' had more sophisticated logic so that
logic was used.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-07 23:35:40 -04:00
David Daney 0ca2997d14 netdev/of/phy: Add MDIO bus multiplexer support.
This patch adds a somewhat generic framework for MDIO bus
multiplexers.  It is modeled on the I2C multiplexer.

The multiplexer is needed if there are multiple PHYs with the same
address connected to the same MDIO bus adepter, or if there is
insufficient electrical drive capability for all the connected PHY
devices.

Conceptually it could look something like this:

                   ------------------
                   | Control Signal |
                   --------+---------
                           |
 ---------------   --------+------
 | MDIO MASTER |---| Multiplexer |
 ---------------   --+-------+----
                     |       |
                     C       C
                     h       h
                     i       i
                     l       l
                     d       d
                     |       |
     ---------       A       B   ---------
     |       |       |       |   |       |
     | PHY@1 +-------+       +---+ PHY@1 |
     |       |       |       |   |       |
     ---------       |       |   ---------
     ---------       |       |   ---------
     |       |       |       |   |       |
     | PHY@2 +-------+       +---+ PHY@2 |
     |       |                   |       |
     ---------                   ---------

This framework configures the bus topology from device tree data.  The
mechanics of switching the multiplexer is left to device specific
drivers.

The follow-on patch contains a multiplexer driven by GPIO lines.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-07 22:58:09 -04:00
David Daney 2510602200 netdev/of/phy: New function: of_mdio_find_bus().
Add of_mdio_find_bus() which allows an mii_bus to be located given its
associated the device tree node.

This is needed by the follow-on patch to add a driver for MDIO bus
multiplexers.

The of_mdiobus_register() function is modified so that the device tree
node is recorded in the mii_bus.  Then we can find it again by
iterating over all mdio_bus_class devices.

Because the OF device tree has now become an integral part of the
kernel, this can live in mdio_bus.c (which contains the needed
mdio_bus_class structure) instead of of_mdio.c.

Signed-off-by: David Daney <david.daney@cavium.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-07 22:58:09 -04:00
Johannes Berg 1c430a727f net: compare_ether_addr[_64bits]() has no ordering
Neither compare_ether_addr() nor compare_ether_addr_64bits()
(as it can fall back to the former) have comparison semantics
like memcmp() where the sign of the return value indicates sort
order. We had a bug in the wireless code due to a blind memcmp
replacement because of this.

A cursory look suggests that the wireless bug was the only one
due to this semantic difference.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-07 19:21:29 -04:00
Alexander Duyck ec47ea8247 skb: Add inline helper for getting the skb end offset from head
With the recent changes for how we compute the skb truesize it occurs to me
we are probably going to have a lot of calls to skb_end_pointer -
skb->head.  Instead of running all over the place doing that it would make
more sense to just make it a separate inline skb_end_offset(skb) that way
we can return the correct value without having gcc having to do all the
optimization to cancel out skb->head - skb->head.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-06 13:13:19 -04:00
Eric Dumazet bd14b1b2e2 tcp: be more strict before accepting ECN negociation
It appears some networks play bad games with the two bits reserved for
ECN. This can trigger false congestion notifications and very slow
transferts.

Since RFC 3168 (6.1.1) forbids SYN packets to carry CT bits, we can
disable TCP ECN negociation if it happens we receive mangled CT bits in
the SYN packet.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Perry Lorier <perryl@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Wilmer van der Gaast <wilmer@google.com>
Cc: Ankur Jain <jankur@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Dave Täht <dave.taht@bufferbloat.net>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-04 12:05:27 -04:00
Karsten Keil f45ebf3a6b mISDN: Help to identify the card
With multiple cards is hard to figure out which port caused trouble
int the layer2 routines (e.g. got a timeout).
Now we have the informations in the log output.

Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-04 11:56:19 -04:00
Karsten Keil c626c12727 mISDN: Make layer1 timer 3 value configurable
For certification test it is very useful to change the layer1
timer3 value on runtime.

Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-04 11:55:05 -04:00
Karsten Keil 8423e6b212 mISDN: L2 timeouts need to be queued as L2 event
To be full preemptiv safe, we cannot handle a L2 timeout in the timer
context itself, we should do all actions via the D-channel thread.

Signed-off-by: Karsten Keil <kkeil@linux-pingi.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-04 11:54:27 -04:00
Linus Torvalds c42f1d4b52 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Transfer padding was wrong for full-speed USB in ASIX driver, fix
    from Ingo van Lil.

 2) Propagate the negative packet offset fix into the PowerPC BPF JIT.
    From Jan Seiffert.

 3) dl2k driver's private ioctls were letting unprivileged tasks make
    MII writes and other ugly bits like that.  Fix from Jeff Mahoney.

 4) Fix TX VLAN and RX packet drops in ucc_geth, from Joakim Tjernlund.

 5) OOPS and network namespace fixes in IPVS from Hans Schillstrom and
    Julian Anastasov.

 6) Fix races and sleeping in locked context bugs in drop_monitor, from
    Neil Horman.

 7) Fix link status indication in smsc95xx driver, from Paolo Pisati.

 8) Fix bridge netfilter OOPS, from Peter Huang.

 9) L2TP sendmsg can return on error conditions with the socket lock
    held, oops.  Fix from Sasha Levin.

10) udp_diag should return meaningful values for socket memory usage,
    from Shan Wei.

11) Eric Dumazet is so awesome he gets his own section:

       Socket memory cgroup code (I never should have applied those
       patches, grumble...) made erroneous changes to
       sk_sockets_allocated_read_positive().  It was changed to
       use percpu_counter_sum_positive (which requires BH disabling)
       instead of percpu_counter_read_positive (which does not).
       Revert back to avoid crashes and lockdep warnings.

       Adjust the default tcp_adv_win_scale and tcp_rmem[2] values
       to fix throughput regressions.  This is necessary as a result
       of our more precise skb->truesize tracking.

       Fix SKB leak in netem packet scheduler.

12) New device IDs for various bluetooth devices, from Manoj Iyer,
    AceLan Kao, and Steven Harms.

13) Fix command completion race in ipw2200, from Stanislav Yakovlev.

14) Fix rtlwifi oops on unload, from Larry Finger.

15) Fix hard_mtu when adjusting hard_header_len in smsc95xx driver.
    From Stephane Fillod.

16) ehea driver registers it's IRQ before all the necessary state is
    setup, resulting in crashes.  Fix from Thadeu Lima de Souza
    Cascardo.

17) Fix PHY connection failures in davinci_emac driver, from Anatolij
    Gustschin.

18) Missing break; in switch statement in bluetooth's
    hci_cmd_complete_evt().  Fix from Szymon Janc.

19) Fix queue programming in iwlwifi, from Johannes Berg.

20) Interrupt throttling defaults not being actually programmed into the
    hardware, fix from Jeff Kirsher and Ying Cai.

21) TLAN driver SKB encoding in descriptor busted on 64-bit, fix from
    Benjamin Poirier.

22) Fix blind status block RX producer pointer deref in TG3 driver, from
    Matt Carlson.

23) Promisc and multicast are busted on ehea, fixes from Thadeu Lima de
    Souza Cascardo.

24) Fix crashes in 6lowpan, from Alexander Smirnov.

25) tcp_complete_cwr() needs to be careful to not rewind the CWND to
    ssthresh if ssthresh has the "infinite" value.  Fix from Yuchung
    Cheng.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (81 commits)
  sungem: Fix WakeOnLan
  tcp: change tcp_adv_win_scale and tcp_rmem[2]
  net: l2tp: unlock socket lock before returning from l2tp_ip_sendmsg
  drop_monitor: prevent init path from scheduling on the wrong cpu
  usbnet: fix failure handling in usbnet_probe
  usbnet: fix leak of transfer buffer of dev->interrupt
  ucc_geth: Add 16 bytes to max TX frame for VLANs
  net: ucc_geth, increase no. of HW RX descriptors
  netem: fix possible skb leak
  sky2: fix receive length error in mixed non-VLAN/VLAN traffic
  sky2: propogate rx hash when packet is copied
  net: fix two typos in skbuff.h
  cxgb3: Don't call cxgb_vlan_mode until q locks are initialized
  ixgbe: fix calling skb_put on nonlinear skb assertion bug
  ixgbe: Fix a memory leak in IEEE DCB
  igbvf: fix the bug when initializing the igbvf
  smsc75xx: enable mac to detect speed/duplex from phy
  smsc75xx: declare smsc75xx's MII as GMII capable
  smsc75xx: fix phy interrupt acknowledge
  smsc75xx: fix phy init reset loop
  ...
2012-05-03 17:10:39 -07:00
Alexander Duyck 3a7c1ee4ab skb: Add skb_head_is_locked helper function
This patch adds support for a skb_head_is_locked helper function.  It is
meant to be used any time we are considering transferring the head from
skb->head to a paged frag.  If the head is locked it means we cannot remove
the head from the skb so it must be copied or we must take the skb as a
whole.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-03 13:18:37 -04:00
Eric Dumazet b081f85c29 net: implement tcp coalescing in tcp_queue_rcv()
Extend tcp coalescing implementing it from tcp_queue_rcv(), the main
receiver function when application is not blocked in recvmsg().

Function tcp_queue_rcv() is moved a bit to allow its call from
tcp_data_queue()

This gives good results especially if GRO could not kick, and if skb
head is a fragment.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexander.h.duyck@intel.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 21:11:11 -04:00
Yuchung Cheng 750ea2bafa tcp: early retransmit: delayed fast retransmit
Implementing the advanced early retransmit (sysctl_tcp_early_retrans==2).
Delays the fast retransmit by an interval of RTT/4. We borrow the
RTO timer to implement the delay. If we receive another ACK or send
a new packet, the timer is cancelled and restored to original RTO
value offset by time elapsed.  When the delayed-ER timer fires,
we enter fast recovery and perform fast retransmit.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 20:56:10 -04:00
Yuchung Cheng eed530b6c6 tcp: early retransmit
This patch implements RFC 5827 early retransmit (ER) for TCP.
It reduces DUPACK threshold (dupthresh) if outstanding packets are
less than 4 to recover losses by fast recovery instead of timeout.

While the algorithm is simple, small but frequent network reordering
makes this feature dangerous: the connection repeatedly enter
false recovery and degrade performance. Therefore we implement
a mitigation suggested in the appendix of the RFC that delays
entering fast recovery by a small interval, i.e., RTT/4. Currently
ER is conservative and is disabled for the rest of the connection
after the first reordering event. A large scale web server
experiment on the performance impact of ER is summarized in
section 6 of the paper "Proportional Rate Reduction for TCP”,
IMC 2011. http://conferences.sigcomm.org/imc/2011/docs/p155.pdf

Note that Linux has a similar feature called THIN_DUPACK. The
differences are THIN_DUPACK do not mitigate reorderings and is only
used after slow start. Currently ER is disabled if THIN_DUPACK is
enabled. I would be happy to merge THIN_DUPACK feature with ER if
people think it's a good idea.

ER is enabled by sysctl_tcp_early_retrans:
  0: Disables ER

  1: Reduce dupthresh to packets_out - 1 when outstanding packets < 4.

  2: (Default) reduce dupthresh like mode 1. In addition, delay
     entering fast recovery by RTT/4.

Note: mode 2 is implemented in the third part of this patch series.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-02 20:56:10 -04:00
John W. Linville 076e7779c0 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-05-01 14:14:05 -04:00
Eric Dumazet d961949660 net: fix two typos in skbuff.h
fix kernel doc typos in function names

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:40:19 -04:00
Eric Dumazet e4ae004b84 netem: add ECN capability
Add ECN (Explicit Congestion Notification) marking capability to netem

tc qdisc add dev eth0 root netem drop 0.5 ecn

Instead of dropping packets, try to ECN mark them.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:39:48 -04:00
Eric Dumazet 18d0700024 net: skb_peek()/skb_peek_tail() cleanups
remove useless casts and rename variables for less confusion.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:39:48 -04:00
Chris Elston a32e0eec70 l2tp: introduce L2TPv3 IP encapsulation support for IPv6
L2TPv3 defines an IP encapsulation packet format where data is carried
directly over IP (no UDP). The kernel already has support for L2TP IP
encapsulation over IPv4 (l2tp_ip). This patch introduces support for
L2TP IP encapsulation over IPv6.

The implementation is derived from ipv6/raw and ipv4/l2tp_ip.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
Chris Elston f9bac8df90 l2tp: netlink api for l2tpv3 ipv6 unmanaged tunnels
This patch adds support for unmanaged L2TPv3 tunnels over IPv6 using
the netlink API. We already support unmanaged L2TPv3 tunnels over
IPv4. A patch to iproute2 to make use of this feature will be
submitted separately.

Signed-off-by: Chris Elston <celston@katalix.com>
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
James Chapman 9d4ec1aeda pppox: Replace __attribute__((packed)) in if_pppox.h
Checkpatch warns about the use of __attribute__((packed)). So use the
recommended __packed syntax instead.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-01 09:30:55 -04:00
Eric Dumazet d7e8883cfc net: make GRO aware of skb->head_frag
GRO can check if skb to be merged has its skb->head mapped to a page
fragment, instead of a kmalloc() area.

We 'upgrade' skb->head as a fragment in itself

This avoids the frag_list fallback, and permits to build true GRO skb
(one sk_buff and up to 16 fragments), using less memory.

This reduces number of cache misses when user makes its copy, since a
single sk_buff is fetched.

This is a followup of patch "net: allow skb->head to be a page fragment"

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 21:35:49 -04:00
Eric Dumazet d3836f21b0 net: allow skb->head to be a page fragment
skb->head is currently allocated from kmalloc(). This is convenient but
has the drawback the data cannot be converted to a page fragment if
needed.

We have three spots were it hurts :

1) GRO aggregation

 When a linear skb must be appended to another skb, GRO uses the
frag_list fallback, very inefficient since we keep all struct sk_buff
around. So drivers enabling GRO but delivering linear skbs to network
stack aren't enabling full GRO power.

2) splice(socket -> pipe).

 We must copy the linear part to a page fragment.
 This kind of defeats splice() purpose (zero copy claim)

3) TCP coalescing.

 Recently introduced, this permits to group several contiguous segments
into a single skb. This shortens queue lengths and save kernel memory,
and greatly reduce probabilities of TCP collapses. This coalescing
doesnt work on linear skbs (or we would need to copy data, this would be
too slow)

Given all these issues, the following patch introduces the possibility
of having skb->head be a fragment in itself. We use a new skb flag,
skb->head_frag to carry this information.

build_skb() is changed to accept a frag_size argument. Drivers willing
to provide a page fragment instead of kmalloc() data will set a non zero
value, set to the fragment size.

Then, on situations we need to convert the skb head to a frag in itself,
we can check if skb->head_frag is set and avoid the copies or various
fallbacks we have.

This means drivers currently using frags could be updated to avoid the
current skb->head allocation and reduce their memory footprint (aka skb
truesize). (thats 512 or 1024 bytes saved per skb). This also makes
bpf/netfilter faster since the 'first frag' will be part of skb linear
part, no need to copy data.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Matt Carlson <mcarlson@broadcom.com>
Cc: Michael Chan <mchan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 21:35:11 -04:00
Linus Torvalds e7a7c9ab41 SCSI fixes on 20120430
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iQEcBAABAgAGBQJPnjtXAAoJEDeqqVYsXL0MNWkIAMYc5n06Ac/gdLY8xIIYD6GA
 OzLd/NQ2Q4RqZJsX26OueX2ZjMRYDixBAmiw837K9PvrLUHGfMAaninNyzDgVZ3R
 pbIK45OylIQrvMAl/R7ZzqihfxJjg5utqEUt1M2qv/ikkBJ8+zWAJTQKxRYJ1OAW
 9QDhIP986hljNyZfDuqXBvA4XkngYeCO0WLjBOeYCseSzeV8vOLpmsD/ANHDcqA6
 Ct4uM4KJWF4jHz3sN3w5T3morNwvh42onNcy++911HcH5Nc9bkOqBWQAZ5RtINp9
 8rkUz3OtCCQpZz6ffF3ZXJaopuU3ELfP80t03OScr2T75SxLqLf6mFYVAcR7RpQ=
 =nXrE
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of SAS and SATA fixes; there are one or two longstanding
  bug fixes, but most of this is regression fixes."

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  [SCSI] libfc: update mfs boundry checking
  [SCSI] Revert "[SCSI] libsas: fix sas port naming"
  [SCSI] libsas: fix false positive 'device attached' conditions
  [SCSI] libsas, libata: fix start of life for a sas ata_port
  [SCSI] libsas: fix ata_eh clobbering ex_phys via smp_ata_check_ready
  [SCSI] libsas: unify domain_device sas_rphy lifetimes
  [SCSI] libsas: fix sas_get_port_device regression
  [SCSI] libsas: fix sas_find_bcast_phy() in the presence of 'vacant' phys
  [SCSI] libsas: introduce sas_work to fix sas_drain_work vs sas_queue_work
  [SCSI] libata: Pass correct DMA device to scsi host
  [SCSI] scsi_lib: use correct DMA device in __scsi_alloc_queue
2012-04-30 15:33:50 -07:00
Matthew Garrett 41b3254c93 efi: Add new variable attributes
More recent versions of the UEFI spec have added new attributes for
variables. Add them.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-30 15:30:18 -07:00
Eric Dumazet 518fbf9cdf net: fix sk_sockets_allocated_read_positive
Denys Fedoryshchenko reported frequent crashes on a proxy server and kindly
provided a lockdep report that explains it all :

  [  762.903868]
  [  762.903880] =================================
  [  762.903890] [ INFO: inconsistent lock state ]
  [  762.903903] 3.3.4-build-0061 #8 Not tainted
  [  762.904133] ---------------------------------
  [  762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
  [  762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
  [  762.904542]  (key#3){+.?...}, at: [<c0232cc4>]
__percpu_counter_sum+0xd/0x58
  [  762.904542] {IN-SOFTIRQ-W} state was registered at:
  [  762.904542]   [<c0158b84>] __lock_acquire+0x284/0xc26
  [  762.904542]   [<c01598e8>] lock_acquire+0x71/0x85
  [  762.904542]   [<c0349765>] _raw_spin_lock+0x33/0x40
  [  762.904542]   [<c0232c93>] __percpu_counter_add+0x58/0x7c
  [  762.904542]   [<c02cfde1>] sk_clone_lock+0x1e5/0x200
  [  762.904542]   [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
  [  762.904542]   [<c0315778>] tcp_create_openreq_child+0x1b/0x404
  [  762.904542]   [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
  [  762.904542]   [<c031615a>] tcp_check_req+0x1fd/0x2d7
  [  762.904542]   [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
  [  762.904542]   [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
  [  762.904542]   [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
  [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
  [  762.904542]   [<c02fc652>] ip_local_deliver+0x41/0x45
  [  762.904542]   [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
  [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
  [  762.904542]   [<c02fc857>] ip_rcv+0x201/0x23e
  [  762.904542]   [<c02daa3a>] __netif_receive_skb+0x319/0x368
  [  762.904542]   [<c02dac07>] netif_receive_skb+0x4e/0x7d
  [  762.904542]   [<c02dacf6>] napi_skb_finish+0x1e/0x34
  [  762.904542]   [<c02db122>] napi_gro_receive+0x20/0x24
  [  762.904542]   [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
  [  762.904542]   [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
  [  762.904542]   [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
  [  762.904542]   [<c02db228>] net_rx_action+0x90/0x160
  [  762.904542]   [<c012a445>] __do_softirq+0x7b/0x118
  [  762.904542] irq event stamp: 156915469
  [  762.904542] hardirqs last  enabled at (156915469): [<c019b4f4>]
__slab_alloc.clone.58.clone.63+0xc4/0x2de
  [  762.904542] hardirqs last disabled at (156915468): [<c019b452>]
__slab_alloc.clone.58.clone.63+0x22/0x2de
  [  762.904542] softirqs last  enabled at (156915466): [<c02ce677>]
lock_sock_nested+0x64/0x6c
  [  762.904542] softirqs last disabled at (156915464): [<c0349914>]
_raw_spin_lock_bh+0xe/0x45
  [  762.904542]
  [  762.904542] other info that might help us debug this:
  [  762.904542]  Possible unsafe locking scenario:
  [  762.904542]
  [  762.904542]        CPU0
  [  762.904542]        ----
  [  762.904542]   lock(key#3);
  [  762.904542]   <Interrupt>
  [  762.904542]     lock(key#3);
  [  762.904542]
  [  762.904542]  *** DEADLOCK ***
  [  762.904542]
  [  762.904542] 1 lock held by squid/1603:
  [  762.904542]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>]
lock_sock+0xa/0xc
  [  762.904542]
  [  762.904542] stack backtrace:
  [  762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
  [  762.904542] Call Trace:
  [  762.904542]  [<c0347b73>] ? printk+0x18/0x1d
  [  762.904542]  [<c015873a>] valid_state+0x1f6/0x201
  [  762.904542]  [<c0158816>] mark_lock+0xd1/0x1bb
  [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
  [  762.904542]  [<c015805d>] ? check_usage_forwards+0x77/0x77
  [  762.904542]  [<c0158bf8>] __lock_acquire+0x2f8/0xc26
  [  762.904542]  [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
  [  762.904542]  [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
  [  762.904542]  [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
  [  762.904542]  [<c01598e8>] lock_acquire+0x71/0x85
  [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
  [  762.904542]  [<c0349765>] _raw_spin_lock+0x33/0x40
  [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
  [  762.904542]  [<c0232cc4>] __percpu_counter_sum+0xd/0x58
  [  762.904542]  [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
  [  762.904542]  [<c02d178d>] ? __alloc_skb+0x76/0x100
  [  762.904542]  [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
  [  762.904542]  [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
  [  762.904542]  [<c0306567>] tcp_sendmsg+0x18f/0x68b
  [  762.904542]  [<c031f3dc>] ? ip_fast_csum+0x30/0x30
  [  762.904542]  [<c0320193>] inet_sendmsg+0x53/0x5a
  [  762.904542]  [<c02cb633>] sock_aio_write+0xd2/0xda
  [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
  [  762.904542]  [<c01a1017>] do_sync_write+0x9f/0xd9
  [  762.904542]  [<c01a2111>] ? file_free_rcu+0x2f/0x2f
  [  762.904542]  [<c01a17a1>] vfs_write+0x8f/0xab
  [  762.904542]  [<c01a284d>] ? fget_light+0x75/0x7c
  [  762.904542]  [<c01a1900>] sys_write+0x3d/0x5e
  [  762.904542]  [<c0349ec9>] syscall_call+0x7/0xb
  [  762.904542]  [<c0340000>] ? rp_sidt+0x41/0x83

Bug is that sk_sockets_allocated_read_positive() calls
percpu_counter_sum_positive() without BH being disabled.

This bug was added in commit 180d8cd942
(foundations of per-cgroup memory pressure controlling.), since previous
code was using percpu_counter_read_positive() which is IRQ safe.

In __sk_mem_schedule() we dont need the precise count of allocated
sockets and can revert to previous behavior.

Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Sined-off-by: Eric Dumazet <edumazet@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Acked-by: Neal Cardwell <ncardwell@google.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-30 13:37:59 -04:00
David S. Miller 5414fc12e3 Merge branch 'master' of git://1984.lsi.us.es/net 2012-04-30 13:23:22 -04:00
Hans Schillstrom 8537de8a7a ipvs: kernel oops - do_ip_vs_get_ctl
Change order of init so netns init is ready
when register ioctl and netlink.

Ver2
	Whitespace fixes and __init added.

Reported-by: "Ryan O'Hara" <rohara@redhat.com>
Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:35 +02:00
Hans Schillstrom 582b8e3ead ipvs: take care of return value from protocol init_netns
ip_vs_create_timeout_table() can return NULL
All functions protocol init_netns is affected of this patch.

Signed-off-by: Hans Schillstrom <hans.schillstrom@ericsson.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>
2012-04-30 10:40:35 +02:00
Linus Torvalds 9883035ae7 pipes: add a "packetized pipe" mode for writing
The actual internal pipe implementation is already really about
individual packets (called "pipe buffers"), and this simply exposes that
as a special packetized mode.

When we are in the packetized mode (marked by O_DIRECT as suggested by
Alan Cox), a write() on a pipe will not merge the new data with previous
writes, so each write will get a pipe buffer of its own.  The pipe
buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn
will tell the reader side to break the read at that boundary (and throw
away any partial packet contents that do not fit in the read buffer).

End result: as long as you do writes less than PIPE_BUF in size (so that
the pipe doesn't have to split them up), you can now treat the pipe as a
packet interface, where each read() system call will read one packet at
a time.  You can just use a sufficiently big read buffer (PIPE_BUF is
sufficient, since bigger than that doesn't guarantee atomicity anyway),
and the return value of the read() will naturally give you the size of
the packet.

NOTE! We do not support zero-sized packets, and zero-sized reads and
writes to a pipe continue to be no-ops.  Also note that big packets will
currently be split at write time, but that the size at which that
happens is not really specified (except that it's bigger than PIPE_BUF).
Currently that limit is the system page size, but we might want to
explicitly support bigger packets some day.

The main user for this is going to be the autofs packet interface,
allowing us to stop having to care so deeply about exact packet sizes
(which have had bugs with 32/64-bit compatibility modes).  But user
space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will
fail with an EINVAL on kernels that do not support this interface.

Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: David Miller <davem@davemloft.net>
Cc: Ian Kent <raven@themaw.net>
Cc: Thomas Meyer <thomas@m3y3r.de>
Cc: stable@kernel.org  # needed for systemd/autofs interaction fix
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-04-29 13:12:42 -07:00
Linus Torvalds 8d7d1adcd7 USB fixes for 3.4-rc5
Here are a number of small USB fixes for 3.4-rc5.
 
 Nothing major, as before, some USB gadget fixes.  There's a crash fix
 for a number of ASUS laptops on resume that had been reported by a
 number of different people.  We think the fix might also pertain to
 other machines, as this was a BIOS bug, and they seem to travel to
 different models and manufacturers quite easily.  Other than that, some
 other reported problems fixed as well.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.18 (GNU/Linux)
 
 iEYEABECAAYFAk+dZ7kACgkQMUfUDdst+ylxwACdFS4V69bnBi2nw9spEYClh+FB
 SToAoLUFsDiTDexIMIMWZkCyq+bqVLG+
 =Pzvi
 -----END PGP SIGNATURE-----

Merge tag 'usb-3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg Kroah-Hartman:
 "Here are a number of small USB fixes for 3.4-rc5.

  Nothing major, as before, some USB gadget fixes.  There's a crash fix
  for a number of ASUS laptops on resume that had been reported by a
  number of different people.  We think the fix might also pertain to
  other machines, as this was a BIOS bug, and they seem to travel to
  different models and manufacturers quite easily.  Other than that,
  some other reported problems fixed as well."

* tag 'usb-3.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: gadget: udc-core: fix incompatibility with dummy-hcd
  usb: gadget: udc-core: fix wrong call order
  USB: cdc-wdm: fix race leading leading to memory corruption
  USB: EHCI: fix crash during suspend on ASUS computers
  usb gadget: uvc: uvc_request_data::length field must be signed
  usb: gadget: dummy: do not call pullup() on udc_stop()
  usb: musb: davinci.c: add missing unregister
  usb: musb: drop __deprecated flag
  USB: gadget: storage gadgets send wrong error code for unknown commands
  usb: otg: gpio_vbus: Add otg transceiver events and notifiers
2012-04-29 12:17:54 -07:00
Benjamin LaHaise d2cf336167 net/l2tp: add support for L2TP over IPv6 UDP
Now that encap_rcv() works on IPv6 UDP sockets, wire L2TP up to IPv6.
Support has been tested with and without hardware offloading.  This
version fixes the L2TP over localhost issue with incorrect checksums
being reported.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 22:21:51 -04:00
Benjamin LaHaise d7f3f62167 net/ipv6/udp: UDP encapsulation: introduce encap_rcv hook into IPv6
Now that the sematics of udpv6_queue_rcv_skb() match IPv4's
udp_queue_rcv_skb(), introduce the UDP encap_rcv() hook for IPv6.

Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-28 22:21:51 -04:00
Linus Torvalds b990f9b3cb ARM: SoC fixes for 3.4-rc
Nothing controversial, just another batch of fixes:
 
 - Samsung/exynos fixes for more merge window fallout: build errors and
   warnings mostly, but also some clock/device setup issues on exynos4/5
 - PXA bug and warning fixes related to gpio and pinmux
 - IRQ domain conversion bugfixes for U300 and MSM
 - A regulator setup fix for U300
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPmzKwAAoJEIwa5zzehBx36jMP/3D/oE3AkqI/fhulDU2g8KmO
 9/dVGOsdeKWqH14xcjc79l1V0aX5dtgqRIBQ/c13LetKhpyxY7J+ZVQml2c1jN7G
 +h3DhmA+EzRUxXNuC8Iut6UgoITx+Rk8aGu6s1gD/osn9Ynm3B4+xxjhcEf0cZkD
 aWq7GLipBa8uOd2ryrGftW1BS+tgf+Tcmc6yCxebbv5ARA5lJQRPuP+YgnYNyhzU
 f7SCgkt6IpgcCcxAAzzgWLeFoPK0FfsOUXTkzLYOhJpOrkn82g5MWXDhe1tGddUr
 JqJpYoIEwNzDcsEcImWDuoSFoaqgO6GiOU+8Y/tx6KNVnya9520JAoxBjU1EqgnV
 +jHNHW9JaZs/oc7LbJ774NuV5YZ2XDhpZm1Cc0dBT2oDh2oFmeaXIt6cSA4jGYpm
 4OP7BMDuxBdLlrKk8coDXCUT21HNsgRSkyx6Ic6dVNxvz86sKKE8FMj5m5c82GBk
 n+1C+jCEs6rJi7e/ddCNMa6P0R9my5lq40S7WwOXDPw6YeetYPs5QYz7GJhM+n7z
 f4uT20ggm1qmUWHVtAA2MiuTWlinSEFThRDS7+yRNioTNEfvcT5LmOTYt1UAtFQj
 f9StJUgkBuoB+kuUNDKPtNqy5dT3y6d9L+9gbCIpalxwBOLs/s+0k8nHTYeu/lZ5
 Ir56+pqGpr+7TCuBg28N
 =V4y6
 -----END PGP SIGNATURE-----

Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "Nothing controversial, just another batch of fixes:

   - Samsung/exynos fixes for more merge window fallout: build errors
     and warnings mostly, but also some clock/device setup issues on
     exynos4/5
   - PXA bug and warning fixes related to gpio and pinmux
   - IRQ domain conversion bugfixes for U300 and MSM
   - A regulator setup fix for U300"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: PXA2xx: MFP: fix potential direction bug
  ARM: PXA2xx: MFP: fix bug with MFP_LPM_KEEP_OUTPUT
  arm/sa1100: fix sa1100-rtc memory resource
  ARM: pxa: fix gpio wakeup setting
  ARM: SAMSUNG: add missing MMC_CAP2_BROKEN_VOLTAGE capability
  ARM: EXYNOS: Fix compilation error when CONFIG_OF is not defined
  ARM: EXYNOS: Fix resource on dev-dwmci.c
  ARM: S3C24XX: Fix build warning for S3C2410_PM
  ARM: mini2440_defconfig: Fix build error
  ARM: msm: Fix gic irqdomain support
  ARM: EXYNOS: Fix incorrect initialization of GIC
  ARM: EXYNOS: use 'exynos4-sdhci' as device name for sdhci controllers
  ARM: u300: bump all IRQ numbers by one
  ARM: ux300: Fix unimplementable regulation constraints
2012-04-28 09:28:43 -07:00
Linus Torvalds 84c6a81bc6 Miscellaneous SPI device driver bug fixes for v3.4
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJPmubjAAoJEEFnBt12D9kBPZ4P/2wrFqNy2+Fxatstmb1GKwqW
 Yx0acIZkJK/gfZetFDSiTzM8bcxp/oz2lfqHE7xFq96s36wSZyO9B9Ry+YiD9x4g
 wzN/Gb7UufnuSEEJo19E13sJxMXJ/4dVFHKtJnxSL3k9lC6nO/YkEn7C2OXE9gXy
 d3a8H2jyGK9CPPHzQVxDP+9vYj8RpZcXz1M49H8dZ7UZblPfHlvPNzsclKbuTiFK
 8zg16sX01BjvbawwMuAuJH/SGytKpM7WtlzDni/OplqX3CDjIoNZ8BEqDgXDVcS8
 dwhmhPEf89v5ctWhaDHlskwAkL8oFjwFAHMZ7pjiIZ1aTxPPEKtyegVqPKoQwQmK
 XCbjb9SqFNRXI2nIkbeX1xGHQS1wngRcfWJ13qLDEDi3d3aGLUJDYTtMHPrjvNnI
 kzqbNC6fFuKZNOaDYMb1nqV3oWa8VLlDhXgEn5K8ZzBInZ/12LsqybRqImwLQnN0
 flRCcOWApbHmJB64IAI1NzbG5Y6FxG53lwW+bwkw83W1fWw1WaUNNvJ8HBfDoN3G
 QflSqY8TtRE5vGvTburRbDIF3YBeGsaPDTksIg2t82PZ1SabUfIhpWRbhzS6YvMH
 NiKlgk88Q3VHLUF4TOxDThXZEJpL4VcJWOV0/Y4eSDNNhOtBKfGMoo9EBCfLwevM
 q6EyapSjiDz0pu9jUN/P
 =V4y2
 -----END PGP SIGNATURE-----

Merge tag 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6

Pull misc SPI device driver bug fixes from Grant Likely.

* tag 'spi-for-linus' of git://git.secretlab.ca/git/linux-2.6:
  spi/spi-bfin5xx: Fix flush of last bit after each spi transfer
  spi/spi-bfin5xx: fix reversed if condition in interrupt mode
  spi/spi_bfin_sport: drop bits_per_word from client data
  spi/bfin_spi: drop bits_per_word from client data
  spi/spi-bfin-sport: move word length setup to transfer handler
  spi/bfin5xx: rename config macro name for bfin5xx spi controller driver
  spi/pl022: Allow request for higher frequency than maximum possible
  spi/bcm63xx: set master driver mode_bits.
  spi/bcm63xx: don't use the stopping state
  spi/bcm63xx: convert to the pump message infrastructure
  spi/spi-ep93xx.c: use dma_transfer_direction instead of dma_data_direction
  spi: fix spi.h kernel-doc warning
  spi/pl022: Fix calculate_effective_freq()
  spi/pl022: Fix range checking for bits per word
2012-04-27 19:52:30 -07:00
John W. Linville 4dcc0637fc Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2012-04-27 15:16:43 -04:00
Randy Dunlap dbabe0d659 spi: fix spi.h kernel-doc warning
Fix kernel-doc warning in spi.h (copy/paste):

Warning(include/linux/spi/spi.h:365): No description found for parameter 'unprepare_transfer_hardware'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2012-04-27 11:03:38 -06:00
Eric Dumazet 6746960140 ipv6: RTAX_FEATURE_ALLFRAG causes inefficient TCP segment sizing
Quoting Tore Anderson from :
https://bugzilla.kernel.org/show_bug.cgi?id=42572

When RTAX_FEATURE_ALLFRAG is set on a route, the effective TCP segment
size does not take into account the size of the IPv6 Fragmentation
header that needs to be included in outbound packets, causing every
transmitted TCP segment to be fragmented across two IPv6 packets, the
latter of which will only contain 8 bytes of actual payload.

RTAX_FEATURE_ALLFRAG is typically set on a route in response to
receving a ICMPv6 Packet Too Big message indicating a Path MTU of less
than 1280 bytes. 1280 bytes is the minimum IPv6 MTU, however ICMPv6
PTBs with MTU < 1280 are still valid, in particular when an IPv6
packet is sent to an IPv4 destination through a stateless translator.
Any ICMPv4 Need To Fragment packets originated from the IPv4 part of
the path will be translated to ICMPv6 PTB which may then indicate an
MTU of less than 1280.

The Linux kernel refuses to reduce the effective MTU to anything below
1280 bytes, instead it sets it to exactly 1280 bytes, and
RTAX_FEATURE_ALLFRAG is also set. However, the TCP segment size appears
to be set to 1240 bytes (1280 Path MTU - 40 bytes of IPv6 header),
instead of 1232 (additionally taking into account the 8 bytes required
by the IPv6 Fragmentation extension header).

This in turn results in rather inefficient transmission, as every
transmitted TCP segment now is split in two fragments containing
1232+8 bytes of payload.

After this patch, all the outgoing packets that includes a
Fragmentation header all are "atomic" or "non-fragmented" fragments,
i.e., they both have Offset=0 and More Fragments=0.

With help from David S. Miller

Reported-by: Tore Anderson <tore@fud.no>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Tom Herbert <therbert@google.com>
Tested-by: Tore Anderson <tore@fud.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-27 00:03:34 -04:00
Robert Jarzmik b95ace54a2 ARM: pxa: fix gpio wakeup setting
In 3.3, gpio wakeup setting was broken. The call
enable_irq_wake() didn't set up the PXA gpio registers
(PWER, ...) anymore.

Fix it at least for pxa27x. The driver doesn't seem to be
used in pxa25x (weird ...), and the fix doesn't extend to
pxa3xx and pxa95x (which don't have a gpio_set_wake()
available).

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Haojian Zhuang <haojian.zhuang@gmail.com>
2012-04-27 10:46:45 +08:00
Linus Torvalds 110a5c8b38 Merge branch 'akpm' (Andrew's patch-bomb)
Merge fixes from Andrew Morton:
 "13 fixes.  The acerhdf patches aren't (really) fixes.  But they've
  been stuck in my tree for up to two years, sent to Matthew multiple
  times and the developers are unhappy."

* emailed from Andrew Morton <akpm@linux-foundation.org>: (13 patches)
  mm: fix NULL ptr dereference in move_pages
  mm: fix NULL ptr dereference in migrate_pages
  revert "proc: clear_refs: do not clear reserved pages"
  drivers/rtc/rtc-ds1307.c: fix BUG shown with lock debugging enabled
  arch/arm/mach-ux500/mbox-db5500.c: world-writable sysfs fifo file
  hugetlbfs: lockdep annotate root inode properly
  acerhdf: lowered default temp fanon/fanoff values
  acerhdf: add support for new hardware
  acerhdf: add support for Aspire 1410 BIOS v1.3314
  fs/buffer.c: remove BUG() in possible but rare condition
  mm: fix up the vmscan stat in vmstat
  epoll: clear the tfile_check_list on -ELOOP
  mm/hugetlb: fix warning in alloc_huge_page/dequeue_huge_page_vma
2012-04-26 15:24:45 -07:00