OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Eric Dumazet	18e8c134f4	net: Increase NET_SKB_PAD to 64 bytes eth_type_trans() & get_rps_cpus() currently need two 64bytes cache lines in packet to compute rxhash. Increasing NET_SKB_PAD from 32 to 64 reduces the need to one cache line only, and makes RPS faster. NET_IP_ALIGN(2) + ethernet_header(14) + IP_header(20/40) + ports(8) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-06 21:58:51 -07:00
David S. Miller	ffb273623b	netpoll: Use 'bool' for netpoll_rx() return type. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-06 01:31:27 -07:00
WANG Cong	0e34e93177	netpoll: add generic support for bridge and bonding devices This whole patchset is for adding netpoll support to bridge and bonding devices. I already tested it for bridge, bonding, bridge over bonding, and bonding over bridge. It looks fine now. To make bridge and bonding support netpoll, we need to adjust some netpoll generic code. This patch does the following things: 1) introduce two new priv_flags for struct net_device: IFF_IN_NETPOLL which identifies we are processing a netpoll; IFF_DISABLE_NETPOLL is used to disable netpoll support for a device at run-time; 2) introduce one new method for netdev_ops: ->ndo_netpoll_cleanup() is used to clean up netpoll when a device is removed. 3) introduce netpoll_poll_dev() which takes a struct net_device * parameter; export netpoll_send_skb() and netpoll_poll_dev() which will be used later; 4) hide a pointer to struct netpoll in struct netpoll_info, ditto. 5) introduce ->real_dev for struct netpoll. 6) introduce a new status NETDEV_BONDING_DESLAE, which is used to disable netconsole before releasing a slave, to avoid deadlocks. Cc: David Miller <davem@davemloft.net> Cc: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: WANG Cong <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-06 00:47:21 -07:00
David S. Miller	58544feb67	Merge branch 'vhost' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost	2010-05-06 00:26:49 -07:00
David S. Miller	2861a185e3	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2010-05-05 15:09:05 -07:00
John W. Linville	83163244f8	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem Conflicts: drivers/net/wireless/libertas_tf/cmd.c drivers/net/wireless/libertas_tf/main.c	2010-05-05 16:14:16 -04:00
Eric Dumazet	ec7d2f2cf3	net: __alloc_skb() speedup With following patch I can reach maximum rate of my pktgen+udpsink simulator : - 'old' machine : dual quad core E5450 @3.00GHz - 64 UDP rx flows (only differ by destination port) - RPS enabled, NIC interrupts serviced on cpu0 - rps dispatched on 7 other cores. (~130.000 IPI per second) - SLAB allocator (faster than SLUB in this workload) - tg3 NIC - 1.080.000 pps without a single drop at NIC level. Idea is to add two prefetchw() calls in __alloc_skb(), one to prefetch first sk_buff cache line, the second to prefetch the shinfo part. Also using one memset() to initialize all skb_shared_info fields instead of one by one to reduce number of instructions, using long word moves. All skb_shared_info fields before 'dataref' are cleared in __alloc_skb(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-05 01:07:37 -07:00
Eric Dumazet	4f70ecca9c	net: rcu fixes Add hlist_for_each_entry_rcu_bh() and hlist_for_each_entry_continue_rcu_bh() macros, and use them in ipv6_get_ifaddr(), if6_get_first() and if6_get_next() to fix lockdeps warnings. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-03 15:53:54 -07:00
Michael S. Tsirkin	d9d52b5178	tun: add ioctl to modify vnet header size virtio added mergeable buffers mode where 2 bytes of extra info is put after vnet header but before actual data (tun does not need this data). In hindsight, it would have been better to add the new info before the packet: as it is, users need a lot of tricky code to skip the extra 2 bytes in the middle of the iovec, and in fact applications seem to get it wrong, and only work with specific iovec layout. The fact we might need to split iovec also means we might in theory overflow iovec max size. This patch adds a simpler way for applications to handle this, and future proofs the interface against further extensions, by making the size of the virtio net header configurable from userspace. As a result, tun driver will simply skip the extra 2 bytes on both input and output. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: David S. Miller <davem@davemloft.net>	2010-05-03 12:33:13 +03:00
David S. Miller	cd7b5396e7	net: Use explicit "unsigned int" instead of plain "unsigned" in netdevice.h Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-02 22:27:59 -07:00
Changli Gao	dee42870a4	net: fix softnet_stat Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context without any protection. And enqueue_to_backlog should update the netdev_rx_stat of the target CPU. This patch renames softnet_data.total to softnet_data.processed: the number of packets processed in uppper levels(IP stacks). softnet_stat data is moved into softnet_data. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/linux/netdevice.h \| 17 +++++++---------- net/core/dev.c \| 26 ++++++++++++-------------- net/sched/sch_generic.c \| 2 +- 3 files changed, 20 insertions(+), 25 deletions(-) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-02 22:26:57 -07:00
David S. Miller	47d29646a2	net: Inline skb_pull() in eth_type_trans(). In commit `6be8ac2f` ("[NET]: uninline skb_pull, de-bloats a lot") we uninlined skb_pull. But in some critical paths it makes sense to inline this thing and it helps performance significantly. Create an skb_pull_inline() so that we can do this in a way that serves also as annotation. Based upon a patch by Eric Dumazet. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-02 02:21:44 -07:00
Eric Dumazet	4381548237	net: sock_def_readable() and friends RCU conversion sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we need two atomic operations (and associated dirtying) per incoming packet. RCU conversion is pretty much needed : 1) Add a new structure, called "struct socket_wq" to hold all fields that will need rcu_read_lock() protection (currently: a wait_queue_head_t and a struct fasync_struct pointer). [Future patch will add a list anchor for wakeup coalescing] 2) Attach one of such structure to each "struct socket" created in sock_alloc_inode(). 3) Respect RCU grace period when freeing a "struct socket_wq" 4) Change sk_sleep pointer in "struct sock" by sk_wq, pointer to "struct socket_wq" 5) Change sk_sleep() function to use new sk->sk_wq instead of sk->sk_sleep 6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside a rcu_read_lock() section. 7) Change all sk_has_sleeper() callers to : - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock) - Use wq_has_sleeper() to eventually wakeup tasks. - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock) 8) sock_wake_async() is modified to use rcu protection as well. 9) Exceptions : macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq" instead of dynamically allocated ones. They dont need rcu freeing. Some cleanups or followups are probably needed, (possible sk_callback_lock conversion to a spinlock for example...). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-01 15:00:15 -07:00
Sjur Braendeland	bece7b2398	caif: Rewritten socket implementation Changes: This is a complete re-write of the socket layer. Making the socket implementation more aligned with the other socket layers and using more of the support functions available in sock.c. Lots of code is copied from af_unix (and some from af_irda). Non-blocking mode should be working as well. Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-28 12:55:14 -07:00
Changli Gao	6e7676c1a7	net: batch skb dequeueing from softnet input_pkt_queue batch skb dequeueing from softnet input_pkt_queue to reduce potential lock contention when RPS is enabled. Note: in the worst case, the number of packets in a softnet_data may be double of netdev_max_backlog. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-27 15:11:49 -07:00
Changli Gao	a9cbd588fd	net: reimplement softnet_data.output_queue as a FIFO queue reimplement softnet_data.output_queue as a FIFO queue to keep the fairness among the qdiscs rescheduled. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> ---- include/linux/netdevice.h \| 1 + net/core/dev.c \| 22 ++++++++++++---------- 2 files changed, 13 insertions(+), 10 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-27 14:32:12 -07:00
Felix Fietkau	fd8aaaf351	cfg80211: add ap isolation support This is used to configure APs to not bridge traffic between connected stations. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-04-27 16:09:21 -04:00
David S. Miller	bb61187465	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/ipmr-2.6	2010-04-27 12:57:39 -07:00
Alexander Duyck	ff846f5293	igb: add support for reporting 5GT/s during probe on PCIe Gen2 This change corrects the fact that we were not reporting Gen2 link speeds when we were in fact connected at Gen2 rates. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-27 12:53:28 -07:00
Rafał Miłecki	5af5542885	ssb: Fix order of definitions and some text space indents Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-04-26 13:51:09 -04:00
Rafał Miłecki	0a182fd88f	ssb: Use relative offsets for SPROM Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-04-26 13:51:08 -04:00
Rafał Miłecki	ea2db495f9	ssb: Look for SPROM at different offset on higher rev CC Our offset handling becomes even a little more hackish now. For some reason I do not understand all offsets as inrelative. It assumes base offset is 0x1000 but it will work for now as we make offsets relative anyway by removing base 0x1000. Should be cleaner however. Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-04-26 13:51:08 -04:00
John W. Linville	d53cdbb94a	ssb: do not read SPROM if it does not exist Attempting to read registers that don't exist on the SSB bus can cause hangs on some boxes. At least some b43 devices are 'in the wild' that don't have SPROMs at all. When the SSB bus support loads, it attempts to read these (non-existant) SPROMs and causes hard hangs on the box -- no console output, etc. This patch adds some intelligence to determine whether or not the SPROM is present before attempting to read it. This avoids those hard hangs on those devices with no SPROM attached to their SSB bus. The SSB-attached devices (e.g. b43, et al.) won't work, but at least the box will survive to test further patches. :-) Signed-off-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: Rafał Miłecki <zajec5@gmail.com> Cc: Larry Finger <Larry.Finger@lwfinger.net> Cc: Michael Buesch <mb@bu3sch.de>	2010-04-26 13:50:54 -04:00
Patrick McHardy	25239cee7e	net: rtnetlink: decouple rtnetlink address families from real address families Decouple rtnetlink address families from real address families in socket.h to be able to add rtnetlink interfaces to code that is not a real address family without increasing AF_MAX/NPROTO. This will be used to add support for multicast route dumping from all tables as the proc interface can't be extended to support anything but the main table without breaking compatibility. This partialy undoes the patch to introduce independant families for routing rules and converts ipmr routing rules to a new rtnetlink family. Similar to that patch, values up to 127 are reserved for real address families, values above that may be used arbitrarily. Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-04-26 16:13:54 +02:00
Brian Haley	4b340ae20d	IPv6: Complete IPV6_DONTFRAG support Finally add support to detect a local IPV6_DONTFRAG event and return the relevant data to the user if they've enabled IPV6_RECVPATHMTU on the socket. The next recvmsg() will return no data, but have an IPV6_PATHMTU as ancillary data. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-23 23:35:29 -07:00
Brian Haley	793b147316	IPv6: data structure changes for new socket options Add underlying data structure changes and basic setsockopt() and getsockopt() support for IPV6_RECVPATHMTU, IPV6_PATHMTU, and IPV6_DONTFRAG. IPV6_PATHMTU is actually fully functional at this point. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-23 23:35:28 -07:00
John W. Linville	3b51cc996e	Merge branch 'master' into for-davem Conflicts: drivers/net/wireless/ath/ath9k/phy.c drivers/net/wireless/iwlwifi/iwl-6000.c drivers/net/wireless/iwlwifi/iwl-debugfs.c	2010-04-23 14:43:45 -04:00
Scott Feldman	286d1e7f73	remove DCB_PROTO_VERSION as we don't do netlink versioning remove DCB_PROTO_VERSION as we don't do netlink versioning Signed-off-by: Scott Feldman <scofeldm@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-22 18:32:12 -07:00
Andrew Hendry	5ebfbc06aa	X25: Add if_x25.h and x25 to device identifiers V2 Feedback from John Hughes. - Add header for userspace implementations such as xot/xoe to use - Use explicit values for interface stability - No changes to driver patches V1 - Use identifiers instead of magic numbers for X25 layer 3 to device interface. - Also fixed checkpatch notes on updated code. [ Add new user header to include/linux/Kbuild -DaveM ] Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-22 16:12:36 -07:00
Paul LeoNerd Evans	40eaf96271	net: Socket filter ancilliary data access for skb->dev->type Add an SKF_AD_HATYPE field to the packet ancilliary data area, giving access to skb->dev->type, as reported in the sll_hatype field. When capturing packets on a PF_PACKET/SOCK_RAW socket bound to all interfaces, there doesn't appear to be a way for the filter program to actually find out the underlying hardware type the packet was captured on. This patch adds such ability. This patch also handles the case where skb->dev can be NULL, such as on netlink sockets. Signed-off-by: Paul Evans <leonerd@leonerd.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-22 16:05:44 -07:00
Stephen Hemminger	e802af9cab	IPv6: Generic TTL Security Mechanism (final version) This patch adds IPv6 support for RFC5082 Generalized TTL Security Mechanism. Not to users of mapped address; the IPV6 and IPV4 socket options are seperate. The server does have to deal with both IPv4 and IPv6 socket options and the client has to handle the different for each family. On client: int ttl = 255; getaddrinfo(argv[1], argv[2], &hint, &result); for (rp = result; rp != NULL; rp = rp->ai_next) { s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol); if (s < 0) continue; if (rp->ai_family == AF_INET) { setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)); } else if (rp->ai_family == AF_INET6) { setsockopt(s, IPPROTO_IPV6, IPV6_UNICAST_HOPS, &ttl, sizeof(ttl))) } if (connect(s, rp->ai_addr, rp->ai_addrlen) == 0) { ... On server: int minttl = 255 - maxhops; getaddrinfo(NULL, port, &hints, &result); for (rp = result; rp != NULL; rp = rp->ai_next) { s = socket(rp->ai_family, rp->ai_socktype, rp->ai_protocol); if (s < 0) continue; if (rp->ai_family == AF_INET6) setsockopt(s, IPPROTO_IPV6, IPV6_MINHOPCOUNT, &minttl, sizeof(minttl)); setsockopt(s, IPPROTO_IP, IP_MINTTL, &minttl, sizeof(minttl)); if (bind(s, rp->ai_addr, rp->ai_addrlen) == 0) break ... Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-22 15:24:53 -07:00
Richard Röjfors	a1aa8822d5	ks8842: Add platform data for setting mac address This patch adds platform data to the ks8842 driver. Via the platform data a MAC address, to be used by the controller, can be passed. To ensure this MAC address is used, the MAC address is written after each hardware reset. Signed-off-by: Richard Röjfors <richard.rojfors@pelagicore.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-21 16:33:29 -07:00
Eric Dumazet	989a297920	fasync: RCU and fine grained locking kill_fasync() uses a central rwlock, candidate for RCU conversion, to avoid cache line ping pongs on SMP. fasync_remove_entry() and fasync_add_entry() can disable IRQS on a short section instead during whole list scan. Use a spinlock per fasync_struct to synchronize kill_fasync_rcu() and fasync_{remove\|add}_entry(). This spinlock is IRQ safe, so sock_fasync() doesnt need its own implementation and can use fasync_helper(), to reduce code size and complexity. We can remove __kill_fasync() direct use in net/socket.c, and rename it to kill_fasync_rcu(). Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-21 16:19:29 -07:00
David S. Miller	87eb367003	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-6000.c net/core/dev.c	2010-04-21 01:14:25 -07:00
Rami Rosen	ccb7c7732e	net: Remove two unnecessary exports (skbuff). There is no need to export skb_under_panic() and skb_over_panic() in skbuff.c, since these methods are used only in skbuff.c ; this patch removes these two exports. It also marks these functions as 'static' and removeS the extern declarations of them from include/linux/skbuff.h Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-20 22:39:53 -07:00
Felix Fietkau	f79d9bad37	mac80211: add flags for STBC (Space-Time Block Coding) Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-04-20 11:52:21 -04:00
Eric Dumazet	e36fa2f7e9	rps: cleanups struct softnet_data holds many queues, so consistent use "sd" name instead of "queue" is better. Adds a rps_ipi_queued() helper to cleanup enqueue_to_backlog() Adds a _and_irq_disable suffix to net_rps_action() name, as David suggested. incr_input_queue_head() becomes input_queue_head_incr() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-20 01:18:05 -07:00
Eric Dumazet	88751275b8	rps: shortcut net_rps_action() net_rps_action() is a bit expensive on NR_CPUS=64..4096 kernels, even if RPS is not active. Tom Herbert used two bitmasks to hold information needed to send IPI, but a single LIFO list seems more appropriate. Move all RPS logic into net_rps_action() to cleanup net_rx_action() code (remove two ifdefs) Move rps_remote_softirq_cpus into softnet_data to share its first cache line, filling an existing hole. In a future patch, we could call net_rps_action() from process_backlog() to make sure we send IPI before handling this cpu backlog. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-19 13:20:34 -07:00
Linus Torvalds	85341c6136	Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: rcu: Make RCU lockdep check the lockdep_recursion variable rcu: Update docs for rcu_access_pointer and rcu_dereference_protected rcu: Better explain the condition parameter of rcu_dereference_check() rcu: Add rcu_access_pointer and rcu_dereference_protected	2010-04-19 08:35:47 -07:00
Paul E. McKenney	bc293d62b2	rcu: Make RCU lockdep check the lockdep_recursion variable The lockdep facility temporarily disables lockdep checking by incrementing the current->lockdep_recursion variable. Such disabling happens in NMIs and in other situations where lockdep might expect to recurse on itself. This patch therefore checks current->lockdep_recursion, disabling RCU lockdep splats when this variable is non-zero. In addition, this patch removes the "likely()", as suggested by Lai Jiangshan. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Reported-by: David Miller <davem@davemloft.net> Tested-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: dhowells@redhat.com Cc: eric.dumazet@gmail.com LKML-Reference: <20100415195039.GA22623@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-04-19 08:37:19 +02:00
Tom Herbert	fec5e652e5	rfs: Receive Flow Steering This patch implements receive flow steering (RFS). RFS steers received packets for layer 3 and 4 processing to the CPU where the application for the corresponding flow is running. RFS is an extension of Receive Packet Steering (RPS). The basic idea of RFS is that when an application calls recvmsg (or sendmsg) the application's running CPU is stored in a hash table that is indexed by the connection's rxhash which is stored in the socket structure. The rxhash is passed in skb's received on the connection from netif_receive_skb. For each received packet, the associated rxhash is used to look up the CPU in the hash table, if a valid CPU is set then the packet is steered to that CPU using the RPS mechanisms. The convolution of the simple approach is that it would potentially allow OOO packets. If threads are thrashing around CPUs or multiple threads are trying to read from the same sockets, a quickly changing CPU value in the hash table could cause rampant OOO packets-- we consider this a non-starter. To avoid OOO packets, this solution implements two types of hash tables: rps_sock_flow_table and rps_dev_flow_table. rps_sock_table is a global hash table. Each entry is just a CPU number and it is populated in recvmsg and sendmsg as described above. This table contains the "desired" CPUs for flows. rps_dev_flow_table is specific to each device queue. Each entry contains a CPU and a tail queue counter. The CPU is the "current" CPU for a matching flow. The tail queue counter holds the value of a tail queue counter for the associated CPU's backlog queue at the time of last enqueue for a flow matching the entry. Each backlog queue has a queue head counter which is incremented on dequeue, and so a queue tail counter is computed as queue head count + queue length. When a packet is enqueued on a backlog queue, the current value of the queue tail counter is saved in the hash entry of the rps_dev_flow_table. And now the trick: when selecting the CPU for RPS (get_rps_cpu) the rps_sock_flow table and the rps_dev_flow table for the RX queue are consulted. When the desired CPU for the flow (found in the rps_sock_flow table) does not match the current CPU (found in the rps_dev_flow table), the current CPU is changed to the desired CPU if one of the following is true: - The current CPU is unset (equal to RPS_NO_CPU) - Current CPU is offline - The current CPU's queue head counter >= queue tail counter in the rps_dev_flow table. This checks if the queue tail has advanced beyond the last packet that was enqueued using this table entry. This guarantees that all packets queued using this entry have been dequeued, thus preserving in order delivery. Making each queue have its own rps_dev_flow table has two advantages: 1) the tail queue counters will be written on each receive, so keeping the table local to interrupting CPU s good for locality. 2) this allows lockless access to the table-- the CPU number and queue tail counter need to be accessed together under mutual exclusion from netif_receive_skb, we assume that this is only called from device napi_poll which is non-reentrant. This patch implements RFS for TCP and connected UDP sockets. It should be usable for other flow oriented protocols. There are two configuration parameters for RFS. The "rps_flow_entries" kernel init parameter sets the number of entries in the rps_sock_flow_table, the per rxqueue sysfs entry "rps_flow_cnt" contains the number of entries in the rps_dev_flow table for the rxqueue. Both are rounded to power of two. The obvious benefit of RFS (over just RPS) is that it achieves CPU locality between the receive processing for a flow and the applications processing; this can result in increased performance (higher pps, lower latency). The benefits of RFS are dependent on cache hierarchy, application load, and other factors. On simple benchmarks, we don't necessarily see improvement and sometimes see degradation. However, for more complex benchmarks and for applications where cache pressure is much higher this technique seems to perform very well. Below are some benchmark results which show the potential benfit of this patch. The netperf test has 500 instances of netperf TCP_RR test with 1 byte req. and resp. The RPC test is an request/response test similar in structure to netperf RR test ith 100 threads on each host, but does more work in userspace that netperf. e1000e on 8 core Intel No RFS or RPS 104K tps at 30% CPU No RFS (best RPS config): 290K tps at 63% CPU RFS 303K tps at 61% CPU RPC test tps CPU% 50/90/99% usec latency Latency StdDev No RFS/RPS 103K 48% 757/900/3185 4472.35 RPS only: 174K 73% 415/993/2468 491.66 RFS 223K 73% 379/651/1382 315.61 Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-16 16:01:27 -07:00
Grazvydas Ignotas	a02a295680	wl1251: add support for dedicated IRQ line wl1251 has WLAN_IRQ pin for generating interrupts to host processor, which is mandatory in SPI mode and optional in SDIO mode (which can use SDIO interrupts instead). However TI recommends using deditated IRQ line for SDIO too. Add support for using dedicated interrupt line with SDIO, but also leave ability to switch to SDIO interrupts in case it's needed. Signed-off-by: Grazvydas Ignotas <notasas@gmail.com> Reviewed-by: Bob Copeland <me@bobcopeland.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2010-04-16 15:47:14 -04:00
David S. Miller	3eb14b944f	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2010-04-15 14:31:06 -07:00
John W. Linville	5c01d56693	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem Conflicts: Documentation/feature-removal-schedule.txt drivers/net/wireless/ath/ath5k/phy.c drivers/net/wireless/wl12xx/wl1271_main.c	2010-04-15 16:21:34 -04:00
Linus Torvalds	2fed94c032	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: firewire: cdev: change license of exported header files to MIT license firewire: cdev: comment fixlet firewire: cdev: iso packet documentation firewire: cdev: fix information leak firewire: cdev: require quadlet-aligned headers for transmit packets firewire: cdev: disallow receive packets without header	2010-04-15 11:56:20 -07:00
Linus Torvalds	00eef7bd01	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: wacom - switch mode upon system resume Revert "Input: wacom - merge out and in prox events" Input: matrix_keypad - allow platform to disable key autorepeat Input: ALPS - add signature for HP Pavilion dm3 laptops Input: i8042 - spelling fix Input: sparse-keymap - implement safer freeing of the keymap Input: update the status of the Multitouch X driver project Input: clarify the no-finger event in multitouch protocol Input: bcm5974 - retract efi-broken suspend_resume Input: sparse-keymap - free the right keymap on error	2010-04-15 11:49:55 -07:00
Stefan Richter	19b3eecc21	firewire: cdev: change license of exported header files to MIT license Among else, this allows projects like libdc1394 to carry copies of the ABI related header files without them or distributors having to worry about effects on the project's overall license terms. Switch to MIT license as suggested by Kristian. Also update the year in the copyright statement according to source history. Cc: Jay Fenlason <fenlason@redhat.com> Acked-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Kristian Høgsberg <krh@bitplanet.net>	2010-04-15 17:50:49 +02:00
Changli Gao	fd793d8905	net: CONFIG_SMP should be CONFIG_RPS Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-15 00:16:59 -07:00
Giuseppe CAVALLARO	e326e8503d	stmmac: new descriptor field for the driver's platform The new enh_desc is used for selecting the enhanced descriptors structure. There are several scenarios; some chips (mac10/100 or gmac) want to use the enhanced descriptors; others want the normal ones. For example, on ST platforms: MAC10/100 uses the normal desc structure and the GMAC uses the enhanced one. It can be useful to get this information from the platform. This could also be decided at run-time looking at the chip's ID number; but it could happen that chips with the same ID want to use different descriptor structure. Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-14 04:49:51 -07:00
David Howells	c08c68dd76	rcu: Better explain the condition parameter of rcu_dereference_check() Better explain the condition parameter of rcu_dereference_check() that describes the conditions under which the dereference is permitted to take place (and incorporate Yong Zhang's suggestion). This condition is only checked under lockdep proving. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: laijs@cn.fujitsu.com Cc: dipankar@in.ibm.com Cc: mathieu.desnoyers@polymtl.ca Cc: josh@joshtriplett.org Cc: dvhltc@us.ibm.com Cc: niv@us.ibm.com Cc: peterz@infradead.org Cc: rostedt@goodmis.org Cc: Valdis.Kletnieks@vt.edu Cc: eric.dumazet@gmail.com LKML-Reference: <1270852752-25278-2-git-send-email-paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-04-14 12:20:04 +02:00

1 2 3 4 5 ...

19776 Commits