OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
David S. Miller	a07aa004c8	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2011-01-20 00:06:15 -08:00
Eric Dumazet	cc7ec456f8	net_sched: cleanups Cleanup net/sched code to current CodingStyle and practices. Reduce inline abuse Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 23:31:12 -08:00
John Fastabend	b8970f0bfc	net_sched: implement a root container qdisc sch_mqprio This implements a mqprio queueing discipline that by default creates a pfifo_fast qdisc per tx queue and provides the needed configuration interface. Using the mqprio qdisc the number of tcs currently in use along with the range of queues alloted to each class can be configured. By default skbs are mapped to traffic classes using the skb priority. This mapping is configurable. Configurable parameters, struct tc_mqprio_qopt { __u8 num_tc; __u8 prio_tc_map[TC_BITMASK + 1]; __u8 hw; __u16 count[TC_MAX_QUEUE]; __u16 offset[TC_MAX_QUEUE]; }; Here the count/offset pairing give the queue alignment and the prio_tc_map gives the mapping from skb->priority to tc. The hw bit determines if the hardware should configure the count and offset values. If the hardware bit is set then the operation will fail if the hardware does not implement the ndo_setup_tc operation. This is to avoid undetermined states where the hardware may or may not control the queue mapping. Also minimal bounds checking is done on the count/offset to verify a queue does not exceed num_tx_queues and that queue ranges do not overlap. Otherwise it is left to user policy or hardware configuration to create useful mappings. It is expected that hardware QOS schemes can be implemented by creating appropriate mappings of queues in ndo_tc_setup(). One expected use case is drivers will use the ndo_setup_tc to map queue ranges onto 802.1Q traffic classes. This provides a generic mechanism to map network traffic onto these traffic classes and removes the need for lower layer drivers to know specifics about traffic types. Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-19 23:31:11 -08:00
Patrick McHardy	14f0290ba4	Merge branch 'master' of /repos/git/net-next-2.6	2011-01-19 23:51:37 +01:00
Linus Torvalds	d018b6f4f1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits) GRETH: resolve SMP issues and other problems GRETH: handle frame error interrupts GRETH: avoid writing bad speed/duplex when setting transfer mode GRETH: fixed skb buffer memory leak on frame errors GRETH: GBit transmit descriptor handling optimization GRETH: fix opening/closing GRETH: added raw AMBA vendor/device number to match against. cassini: Fix build bustage on x86. e1000e: consistent use of Rx/Tx vs. RX/TX/rx/tx in comments/logs e1000e: update Copyright for 2011 e1000: Avoid unhandled IRQ r8169: keep firmware in memory. netdev: tilepro: Use is_unicast_ether_addr helper etherdevice.h: Add is_unicast_ether_addr function ks8695net: Use default implementation of ethtool_ops::get_link ks8695net: Disable non-working ethtool operations USB CDC NCM: Don't deref NULL in cdc_ncm_rx_fixup() and don't use uninitialized variable. vxge: Remember to release firmware after upgrading firmware netdev: bfin_mac: Remove is_multicast_ether_addr use in netdev_for_each_mc_addr ipsec: update MAX_AH_AUTH_LEN to support sha512 ...	2011-01-14 13:25:30 -08:00
Patrick McHardy	0134e89c7b	Merge branch 'master' of git://1984.lsi.us.es/net-next-2.6 Conflicts: net/ipv4/route.c Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-14 14:12:37 +01:00
Patrick McHardy	c7066f70d9	netfilter: fix Kconfig dependencies Fix dependencies of netfilter realm match: it depends on NET_CLS_ROUTE, which itself depends on NET_SCHED; this dependency is missing from netfilter. Since matching on realms is also useful without having NET_SCHED enabled and the option really only controls whether the tclassid member is included in route and dst entries, rename the config option to IP_ROUTE_CLASSID and move it outside of traffic scheduling context to get rid of the NET_SCHED dependeny. Reported-by: Vladis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-01-14 13:36:42 +01:00
Eric Dumazet	1ac9ad1394	net: remove dev_txq_stats_fold() After recent changes, (percpu stats on vlan/tunnels...), we dont need anymore per struct netdev_queue tx_bytes/tx_packets/tx_dropped counters. Only remaining users are ixgbe, sch_teql, gianfar & macvlan : 1) ixgbe can be converted to use existing tx_ring counters. 2) macvlan incremented txq->tx_dropped, it can use the dev->stats.tx_dropped counter. 3) sch_teql : almost revert `ab35cd4b8f` (Use net_device internal stats) Now we have ndo_get_stats64(), use it, even for "unsigned long" fields (No need to bring back a struct net_device_stats) 4) gianfar adds a stats structure per tx queue to hold tx_bytes/tx_packets This removes a lockdep warning (and possible lockup) in rndis gadget, calling dev_get_stats() from hard IRQ context. Ref: http://www.spinics.net/lists/netdev/msg149202.html Reported-by: Neil Jones <neiljay@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Jarek Poplawski <jarkao2@gmail.com> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Jeff Kirsher <jeffrey.t.kirsher@intel.com> CC: Sandeep Gopalpet <sandeep.kumar@freescale.com> CC: Michal Nazarewicz <mina86@mina86.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-13 21:44:34 -08:00
Linus Torvalds	008d23e485	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits) Documentation/trace/events.txt: Remove obsolete sched_signal_send. writeback: fix global_dirty_limits comment runtime -> real-time ppc: fix comment typo singal -> signal drivers: fix comment typo diable -> disable. m68k: fix comment typo diable -> disable. wireless: comment typo fix diable -> disable. media: comment typo fix diable -> disable. remove doc for obsolete dynamic-printk kernel-parameter remove extraneous 'is' from Documentation/iostats.txt Fix spelling milisec -> ms in snd_ps3 module parameter description Fix spelling mistakes in comments Revert conflicting V4L changes i7core_edac: fix typos in comments mm/rmap.c: fix comment sound, ca0106: Fix assignment to 'channel'. hrtimer: fix a typo in comment init/Kconfig: fix typo anon_inodes: fix wrong function name in comment fix comment typos concerning "consistent" poll: fix a typo in comment ... Fix up trivial conflicts in: - drivers/net/wireless/iwlwifi/iwl-core.c (moved to iwl-legacy.c) - fs/ext4/ext4.h Also fix missed 'diabled' typo in drivers/net/bnx2x/bnx2x.h while at it.	2011-01-13 10:05:56 -08:00
Eric Dumazet	bfe0d0298f	net_sched: factorize qdisc stats handling HTB takes into account skb is segmented in stats updates. Generalize this to all schedulers. They should use qdisc_bstats_update() helper instead of manipulating bstats.bytes and bstats.packets Add bstats_update() helper too for classes that use gnet_stats_basic_packed fields. Note : Right now, TCQ_F_CAN_BYPASS shortcurt can be taken only if no stab is setup on qdisc. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-10 16:07:54 -08:00
Eric Dumazet	44b8288308	net_sched: pfifo_head_drop problem commit `57dbb2d83d` (sched: add head drop fifo queue) introduced pfifo_head_drop, and broke the invariant that sch->bstats.bytes and sch->bstats.packets are COUNTER (increasing counters only) This can break estimators because est_timer() handles unsigned deltas only. A decreasing counter can then give a huge unsigned delta. My mid term suggestion would be to change things so that sch->bstats.bytes and sch->bstats.packets are incremented in dequeue() only, not at enqueue() time. We also could add drop_bytes/drop_packets and provide estimations of drop rates. It would be more sensible anyway for very low speeds, and big bursts. Right now, if we drop packets, they still are accounted in byte/packets abolute counters and rate estimators. Before this mid term change, this patch makes pfifo_head_drop behavior similar to other qdiscs in case of drops : Dont decrement sch->bstats.bytes and sch->bstats.packets Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-05 13:39:17 -08:00
Eric Dumazet	0dfb33a0d7	sch_red: report backlog information Provide child qdisc backlog (byte count) information so that "tc -s qdisc" can report it to user. packet count is already correctly provided. qdisc red 11: parent 1:11 limit 60Kb min 15Kb max 45Kb ecn Sent 3116427684 bytes 1415782 pkt (dropped 8, overlimits 7866 requeues 0) rate 242385Kbit 13630pps backlog 13560b 8p requeues 0 marked 7865 early 1 pdrop 7 other 0 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-01-03 12:13:15 -08:00
Eric Dumazet	18c8d82ae5	sfq: fix slot_dequeue_head() slot_dequeue_head() should make sure slot skb chain is correct in both ways, or we can crash if all possible flows are in use. Jarek pointed out slot_queue_init() can now be done in sfq_init() once, instead each time a flow is setup. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-31 12:48:55 -08:00
Eric Dumazet	eeaeb068f1	sch_sfq: allow big packets and be fair SFQ is currently 'limited' to small packets, because it uses a 15bit allotment number per flow. Introduce a scale by 8, so that we can handle full size TSO/GRO packets. Use appropriate handling to make sure allot is positive before a new packet is dequeued, so that fairness is respected. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Jarek Poplawski <jarkao2@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-31 12:47:37 -08:00
Eric Dumazet	ee09b3c1cf	sfq: fix sfq class stats handling sfq_walk() runs without qdisc lock. By the time it selects a non empty hash slot and sfq_dump_class_stats() is run (with lock held), slot might have been freed : We then access q->slots[SFQ_EMPTY_SLOT], out of bounds, and crash in slot_queue_walk() On previous kernels, bug is here but out of bounds qs[SFQ_DEPTH] and allot[SFQ_DEPTH] are located in struct sfq_sched_data, so no illegal memory access happens, only possibly wrong data reported to user. Also, slot_dequeue_tail() should make sure slot skb chain is correctly terminated, or sfq_dump_class_stats() can access freed skbs. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-22 11:39:59 -08:00
Jiri Kosina	4b7bd36470	Merge branch 'master' into for-next Conflicts: MAINTAINERS arch/arm/mach-omap2/pm24xx.c drivers/scsi/bfa/bfa_fcpim.c Needed to update to apply fixes for which the old branch was too outdated.	2010-12-22 18:57:02 +01:00
Eric Dumazet	eda83e3b63	net_sched: sch_sfq: better struct layouts Here is a respin of patch. I'll send a short patch to make SFQ more fair in presence of large packets as well. Thanks [PATCH v3 net-next-2.6] net_sched: sch_sfq: better struct layouts This patch shrinks sizeof(struct sfq_sched_data) from 0x14f8 (or more if spinlocks are bigger) to 0x1180 bytes, and reduce text size as well. text data bss dec hex filename 4821 152 0 4973 136d old/net/sched/sch_sfq.o 4627 136 0 4763 129b new/net/sched/sch_sfq.o All data for a slot/flow is now grouped in a compact and cache friendly structure, instead of being spreaded in many different points. struct sfq_slot { struct sk_buff skblist_next; struct sk_buff skblist_prev; sfq_index qlen; /* number of skbs in skblist / sfq_index next; / next slot in sfq chain / struct sfq_head dep; / anchor in dep[] chains / unsigned short hash; / hash value (index in ht[]) / short allot; / credit for this slot */ }; Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jarek Poplawski <jarkao2@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-20 21:32:59 -08:00
David S. Miller	d9993be65a	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-12-20 13:24:14 -08:00
Eric Dumazet	aa3e219997	net_sched: sch_sfq: fix allot handling When deploying SFQ/IFB here at work, I found the allot management was pretty wrong in sfq, even changing allot from short to int... We should init allot for each new flow, not using a previous value found in slot. Before patch, I saw bursts of several packets per flow, apparently denying the default "quantum 1514" limit I had on my SFQ class. class sfq 11:1 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 7p requeues 0 allot 11546 class sfq 11:46 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 1p requeues 0 allot -23873 class sfq 11:78 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 5p requeues 0 allot 11393 After patch, better fairness among each flow, allot limit being respected, allot is positive : class sfq 11:e parent 11: (dropped 0, overlimits 0 requeues 86) backlog 0b 3p requeues 86 allot 596 class sfq 11:94 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 3p requeues 0 allot 1468 class sfq 11:a4 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 4p requeues 0 allot 650 class sfq 11:bb parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 3p requeues 0 allot 596 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-20 13:18:16 -08:00
Eric Dumazet	c426626324	net_sched: sch_sfq: add backlog info in sfq_dump_class_stats() We currently return for each active SFQ slot the number of packets in queue. We can also give number of bytes accounted for these packets. tc -s class show dev ifb0 Before patch : class sfq 11:3d9 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 0b 3p requeues 0 allot 1266 After patch : class sfq 11:3e4 parent 11: (dropped 0, overlimits 0 requeues 0) backlog 4380b 3p requeues 0 allot 1212 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-20 13:13:56 -08:00
Octavian Purdila	443457242b	net: factorize sync-rcu call in unregister_netdevice_many Add dev_close_many and dev_deactivate_many to factorize another sync-rcu operation on the netdevice unregister path. $ modprobe dummy numdummies=10000 $ ip link set dev dummy* up $ time rmmod dummy Without the patch With the patch real 0m 24.63s real 0m 5.15s user 0m 0.00s user 0m 0.00s sys 0m 6.05s sys 0m 5.14s Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-16 14:04:44 -08:00
Eric Dumazet	f2cd2d3e9b	net sched: use xps information for qdisc NUMA affinity Allocate qdisc memory according to NUMA properties of cpus included in xps map. To be effective, qdisc should be (re)setup after changes of /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus I added a numa_node field in struct netdev_queue, containing NUMA node if all cpus included in xps_cpus share same node, else -1. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-12-01 12:47:42 -08:00
Eric Dumazet	5a0d2268d2	net: add netif_tx_queue_frozen_or_stopped When testing struct netdev_queue state against FROZEN bit, we also test XOFF bit. We can test both bits at once and save some cycles. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-28 10:47:18 -08:00
Michael Witten	c996d8b9a8	Docs/Kconfig: Update: osdl.org -> linuxfoundation.org Some of the documentation refers to web pages under the domain `osdl.org'. However, `osdl.org' now redirects to `linuxfoundation.org'. Rather than rely on redirections, this patch updates the addresses appropriately; for the most part, only documentation that is meant to be current has been updated. The patch should be pretty quick to scan and check; each new web-page url was gotten by trying out the original URL in a browser and then simply copying the the redirected URL (formatting as necessary). There is some conflict as to which one of these domain names is preferred: linuxfoundation.org linux-foundation.org So, I wrote: info@linuxfoundation.org and got this reply: Message-ID: <4CE17EE6.9040807@linuxfoundation.org> Date: Mon, 15 Nov 2010 10:41:42 -0800 From: David Ames <david@linuxfoundation.org> ... linuxfoundation.org is preferred. The canonical name for our web site is www.linuxfoundation.org. Our list site is actually lists.linux-foundation.org. Regarding email linuxfoundation.org is preferred there are a few people who choose to use linux-foundation.org for their own reasons. Consequently, I used `linuxfoundation.org' for web pages and `lists.linux-foundation.org' for mailing-list web pages and email addresses; the only personal email address I updated from `@osdl.org' was that of Andrew Morton, who prefers `linux-foundation.org' according `git log'. Signed-off-by: Michael Witten <mfwitten@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2010-11-15 23:50:13 +01:00
stephen hemminger	4c46ee5258	classifier: report statistics for basic classifier The basic classifier keeps statistics but does not report it to user space. This showed up when using basic classifier (with police) as a default catch all on ingress; no statistics were reported. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-08 12:17:05 -08:00
Herbert Xu	c00b2c9e79	cls_cgroup: Fix crash on module unload Somewhere along the lines net_cls_subsys_id became a macro when cls_cgroup is built as a module. Not only did it make cls_cgroup completely useless, it also causes it to crash on module unload. This patch fixes this by removing that macro. Thanks to Eric Dumazet for diagnosing this problem. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-11-03 18:50:50 -07:00
Thomas Graf	5ec1cea057	text ematch: check for NULL pointer before destroying textsearch config While validating the configuration em_ops is already set, thus the individual destroy functions are called, but the ematch data has not been allocated and associated with the ematch yet. Signed-off-by: Thomas Graf <tgraf@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-31 09:37:38 -07:00
Linus Torvalds	5f05647dd8	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits) bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL. vlan: Calling vlan_hwaccel_do_receive() is always valid. tproxy: use the interface primary IP address as a default value for --on-ip tproxy: added IPv6 support to the socket match cxgb3: function namespace cleanup tproxy: added IPv6 support to the TPROXY target tproxy: added IPv6 socket lookup function to nf_tproxy_core be2net: Changes to use only priority codes allowed by f/w tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled tproxy: added tproxy sockopt interface in the IPV6 layer tproxy: added udp6_lib_lookup function tproxy: added const specifiers to udp lookup functions tproxy: split off ipv6 defragmentation to a separate module l2tp: small cleanup nf_nat: restrict ICMP translation for embedded header can: mcp251x: fix generation of error frames can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set can-raw: add msg_flags to distinguish local traffic 9p: client code cleanup rds: make local functions/variables static ... Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and drivers/net/wireless/ath/ath9k/debug.c as per David	2010-10-23 11:47:02 -07:00
David S. Miller	9941fb6276	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2010-10-21 08:21:34 -07:00
Changli Gao	3511c9132f	net_sched: remove the unused parameter of qdisc_create_dflt() The first parameter dev isn't in use in qdisc_create_dflt(). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 03:09:47 -07:00
Eric Dumazet	7b5edbc4cf	net/sched: fix missing spinlock init Under network load, doing : tc qdisc del dev eth0 root triggers : [ 167.193087] BUG: spinlock bad magic on CPU#3, udpflood/4928 [ 167.193139] lock: c15bc324, .magic: 00000000, .owner: <none>/-1, .owner_cpu: -1 [ 167.193193] Pid: 4928, comm: udpflood Not tainted 2.6.36-rc7-11417-g215340c-dirty #323 [ 167.193245] Call Trace: [ 167.193292] [<c13abaa0>] ? printk+0x18/0x20 [ 167.193342] [<c11afb53>] spin_bug+0xa3/0xf0 [ 167.193389] [<c11afcdd>] do_raw_spin_lock+0x7d/0x160 [ 167.193440] [<c1313d4e>] ? __dev_xmit_skb+0x27e/0x2b0 [ 167.193496] [<c107382b>] ? trace_hardirqs_on+0xb/0x10 [ 167.193545] [<c13ae99a>] _raw_spin_lock+0x3a/0x40 [ 167.193593] [<c1313d4e>] ? __dev_xmit_skb+0x27e/0x2b0 [ 167.193641] [<c1313d4e>] __dev_xmit_skb+0x27e/0x2b0 commit `79640a4ca6` (add additional lock to qdisc to increase throughput) forgot to initialize noop_qdisc and noqueue_qdisc busylock Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-21 03:09:40 -07:00
Venkatesh Pallipadi	75e1056f5c	sched: Fix softirq time accounting Peter Zijlstra found a bug in the way softirq time is accounted in VIRT_CPU_ACCOUNTING on this thread: http://lkml.indiana.edu/hypermail//linux/kernel/1009.2/01366.html The problem is, softirq processing uses local_bh_disable internally. There is no way, later in the flow, to differentiate between whether softirq is being processed or is it just that bh has been disabled. So, a hardirq when bh is disabled results in time being wrongly accounted as softirq. Looking at the code a bit more, the problem exists in !VIRT_CPU_ACCOUNTING as well. As account_system_time() in normal tick based accouting also uses softirq_count, which will be set even when not in softirq with bh disabled. Peter also suggested solution of using 2*SOFTIRQ_OFFSET as irq count for local_bh_{disable,enable} and using just SOFTIRQ_OFFSET while softirq processing. The patch below does that and adds API in_serving_softirq() which returns whether we are currently processing softirq or not. Also changes one of the usages of softirq_count in net/sched/cls_cgroup.c to in_serving_softirq. Looks like many usages of in_softirq really want in_serving_softirq. Those changes can be made individually on a case by case basis. Signed-off-by: Venkatesh Pallipadi <venki@google.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <1286237003-12406-2-git-send-email-venki@google.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>	2010-10-18 20:52:20 +02:00
Jan Engelhardt	243bf6e29e	netfilter: xtables: resolve indirect macros 3/3	2010-10-13 18:00:46 +02:00
Jan Engelhardt	87a2e70db6	netfilter: xtables: resolve indirect macros 2/3 Signed-off-by: Jan Engelhardt <jengelh@medozas.de>	2010-10-13 18:00:41 +02:00
Eric Dumazet	0ed8ddf404	neigh: Protect neigh->ha[] with a seqlock Add a seqlock in struct neighbour to protect neigh->ha[], and avoid dirtying neighbour in stress situation (many different flows / dsts) Dirtying takes place because of read_lock(&n->lock) and n->used writes. Switching to a seqlock, and writing n->used only on jiffies changes permits less dirtying. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-11 12:54:04 -07:00
Changli Gao	e18434c457	net_sched: use __TCA_HTB_MAX and TCA_HTB_MAX Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-09 09:22:53 -07:00
David S. Miller	69259abb64	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/pcmcia/pcnet_cs.c net/caif/caif_socket.c	2010-10-06 19:39:31 -07:00
Dan Carpenter	4e18b3edf7	cls_u32: signedness bug skb_headroom() is unsigned so "skb_headroom(skb) + toff" is also unsigned and can't be less than zero. This test was added in 66d50d25: "u32: negative offset fix" It was supposed to fix a regression. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 00:40:39 -07:00
Eric Dumazet	24824a09e3	net: dynamic ingress_queue allocation ingress being not used very much, and net_device->ingress_queue being quite a big object (128 or 256 bytes), use a dynamic allocation if needed (tc qdisc add dev eth0 ingress ...) dev_ingress_queue(dev) helper should be used only with RTNL taken. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-10-05 00:23:44 -07:00
Eric Dumazet	bfa5ae63b8	net: rename netdev rx_queue to ingress_queue There is some confusion with rx_queue name after RPS, and net drivers private rx_queue fields. I suggest to rename "struct net_device"->rx_queue to ingress_queue. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-29 13:25:53 -07:00
David S. Miller	e40051d134	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/qlcnic/qlcnic_init.c net/ipv4/ip_output.c	2010-09-27 01:03:03 -07:00
David S. Miller	a505b3b30f	sch_atm: Fix potential NULL deref. The list_head conversion unearther an unnecessary flow check. Since flow is always NULL here we don't need to see if a matching flow exists already. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-12 11:56:44 -07:00
David S. Miller	e548833df8	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/mac80211/main.c	2010-09-09 22:27:33 -07:00
Michal Soltys	3b2eb6131e	net/sched/sch_hfsc.c: initialize parent's cl_cfmin properly in init_vf() This patch fixes init_vf() function, so on each new backlog period parent's cl_cfmin is properly updated (including further propgation towards the root), even if the activated leaf has no upperlimit curve defined. Signed-off-by: Michal Soltys <soltys@ziu.info> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-01 14:29:35 -07:00
Jeff Mahoney	0f04cfd098	net sched: fix kernel leak in act_police While reviewing commit `1c40be12f7`, I audited other users of tc_action_ops->dump for information leaks. That commit covered almost all of them but act_police still had a leak. opt.limit and opt.capab aren't zeroed out before the structure is passed out. This patch uses the C99 initializers to zero everything unused out. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-09-01 14:29:34 -07:00
Stephen Hemminger	c2e3143e3c	tc: add meta match on receive hash Trivial extension to existing meta data match rules to allow matching on skb receive hash value. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-24 14:48:10 -07:00
Changli Gao	0eec32ff35	net_sched: act_csum: coding style cleanup Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-23 20:43:15 -07:00
David S. Miller	7abac68602	pkt_sched: Make act_csum depend upon INET. It uses ip_send_check() and stuff like that. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-23 20:42:11 -07:00
Stephen Rothwell	2436243a39	net/sched: need to include net/ip6_checksum.h for the declararion of csum_ipv6_magic. Fixes this build error on PowerPC (at least): net/sched/act_csum.c: In function 'tcf_csum_ipv6_icmp': net/sched/act_csum.c:178: error: implicit declaration of function 'csum_ipv6_magic' Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-22 20:31:14 -07:00
Changli Gao	739a91ef06	net_sched: cls_flow: add key rxhash We can use rxhash to classify the traffic into flows. As rxhash maybe supplied by NIC or RPS, it is cheaper. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-21 23:40:14 -07:00
David S. Miller	d3c6e7ad09	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-08-21 23:32:24 -07:00
Grégoire Baron	eb4d406545	net/sched: add ACT_CSUM action to update packets checksums net/sched: add ACT_CSUM action to update packets checksums ACT_CSUM can be called just after ACT_PEDIT in order to re-compute some altered checksums in IPv4 and IPv6 packets. The following checksums are supported by this patch: - IPv4: IPv4 header, ICMP, IGMP, TCP, UDP & UDPLite - IPv6: ICMPv6, TCP, UDP & UDPLite It's possible to request in the same action to update different kind of checksums, if the packets flow mix TCP, UDP and UDPLite, ... An example of usage is done in the associated iproute2 patch. Version 3 changes: - remove useless goto instructions - improve IPv6 hop options decoding Version 2 changes: - coding style correction - remove useless arguments of some functions - use stack in tcf_csum_dump() - add tcf_csum_skb_nextlayer() to factor code Signed-off-by: Gregoire Baron <baronchon@n7mm.org> Acked-by: jamal <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-20 01:42:59 -07:00
Changli Gao	b9959c2e44	net_sched: sch_sfq: use proto_ports_offset() to support AH message Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-19 17:16:25 -07:00
Changli Gao	78d3307ede	net_sched: cls_flow: use proto_ports_offset() to support AH message Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-19 17:16:24 -07:00
Dan Carpenter	00093fab98	net/sched: remove unneeded NULL check There is no need to check "s". nla_data() doesn't return NULL. Also we already dereferenced "s" at this point so it would have oopsed ealier if it were NULL. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-18 14:24:51 -07:00
Eric Dumazet	1c40be12f7	net sched: fix some kernel memory leaks We leak at least 32bits of kernel memory to user land in tc dump, because we dont init all fields (capab ?) of the dumped structure. Use C99 initializers so that holes and non explicit fields are zeroed. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-17 15:12:15 -07:00
Jarek Poplawski	3e9e5a5921	pkt_sched: Check .walk and .leaf class handlers Require qdisc class ops .walk and .leaf for classful qdisc in register_qdisc(). The checks could be done later insted, but these ops are really needed and used by most of classful qdiscs. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-11 01:37:00 -07:00
Jarek Poplawski	41065fba84	pkt_sched: Fix sch_sfq vs tc_modify_qdisc oops sch_sfq as a classful qdisc needs the .leaf handler. Otherwise, there is an oops possible in tc_modify_qdisc()/check_loop(). Fixes commit `7d2681a6ff` Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-11 01:36:59 -07:00
Ben Greear	9871e50edd	net: Use NET_XMIT_SUCCESS where possible. This is based on work originally done by Patric McHardy. Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-10 02:51:11 -07:00
Jarek Poplawski	68fd26b598	pkt_sched: Add some basic qdisc class ops verification. Was: [PATCH] sfq: add dummy bind/unbind handles Verify in register_qdisc() some basic qdisc class handlers are present. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-10 01:39:14 -07:00
Jarek Poplawski	da7115d94a	pkt_sched: sch_sfq: Add dummy unbind_tcf and put handles. Was: [PATCH] sfq: add dummy bind/unbind handles Add dummy .unbind_tcf and .put qdisc class ops for easier verification. (All other schedulers have it like this.) Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-10 01:39:13 -07:00
Jarek Poplawski	eb4a5527b1	pkt_sched: Fix sch_sfq vs tcf_bind_filter oops Since there was added ->tcf_chain() method without ->bind_tcf() to sch_sfq class options, there is oops when a filter is added with the classid parameter. Fixes commit `7d2681a6ff` netdev thread: null pointer at cls_api.c Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Reported-by: Franchoze Eric <franchoze@yandex.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-07 22:45:41 -07:00
Changli Gao	f2f009812f	sch_sfq: add sanity check for the packet length The packet length should be checked before the packet data is dereferenced. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-04 21:53:16 -07:00
Changli Gao	12dc96d167	cls_rsvp: add sanity check for the packet length The packet length should be checked before the packet data is dereferenced. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-04 21:53:15 -07:00
Changli Gao	4b95c3d40d	cls_flow: add sanity check for the packet length The packet length should be checked before the packet data is dereferenced. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-04 21:53:15 -07:00
Changli Gao	36d12690a2	act_nat: fix on the TX path On the TX path, skb->data points to the ethernet header, not the network header. So when validating the packet length for accessing we should take the ethernet header into account. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-04 21:53:14 -07:00
David S. Miller	00dad5e479	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/e1000e/hw.h net/bridge/br_device.c net/bridge/br_input.c	2010-08-02 22:22:46 -07:00
stephen hemminger	66d50d2550	u32: negative offset fix It was possible to use a negative offset in a u32 match to reference the ethernet header or other parts of the link layer header. This fixes the regression caused by: commit `fbc2e7d9cf` Author: Changli Gao <xiaosuo@gmail.com> Date: Wed Jun 2 07:32:42 2010 -0700 cls_u32: use skb_header_pointer() to dereference data safely Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-08-02 22:07:45 -07:00
Changli Gao	3a3dfb062c	act_nat: the checksum of ICMP doesn't have pseudo header after updating the value of the ICMP payload, inet_proto_csum_replace4() should be called with zero pseudohdr. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-31 22:04:55 -07:00
Changli Gao	072d79a31a	act_nat: fix wild pointer pskb_may_pull() may change skb pointers, so adjust icmph after pskb_may_pull(). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-31 22:04:54 -07:00
David S. Miller	bb7e95c8fd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x_main.c Merge bnx2x bug fixes in by hand... :-/ Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-27 21:01:35 -07:00
stephen hemminger	3b87956ea6	net sched: fix race in mirred device removal This fixes hang when target device of mirred packet classifier action is removed. If a mirror or redirection action is configured to cause packets to go to another device, the classifier holds a ref count, but was assuming the adminstrator cleaned up all redirections before removing. The fix is to add a notifier and cleanup during unregister. The new list is implicitly protected by RTNL mutex because it is held during filter add/delete as well as notifier. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-24 21:04:20 -07:00
David S. Miller	11fe883936	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/vhost/net.c net/bridge/br_device.c Fix merge conflict in drivers/vhost/net.c with guidance from Stephen Rothwell. Revert the effects of net-2.6 commit `573201f36f` since net-next-2.6 has fixes that make bridge netpoll work properly thus we don't need it disabled. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-20 18:25:24 -07:00
Eric Dumazet	d6d9ca0fec	net: this_cpu_xxx conversions Use modern this_cpu_xxx() api, saving few bytes on x86 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-19 15:12:51 -07:00
David S. Miller	6accec76f6	sch_atm: Convert to use standard list_head facilities. Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-18 19:52:55 -07:00
Dan Carpenter	0eff683f73	net/sched: potential data corruption The reset_policy() does: memset(d->tcfd_defdata, 0, SIMP_MAX_DATA); strlcpy(d->tcfd_defdata, defdata, SIMP_MAX_DATA); In the original code, the size of d->tcfd_defdata wasn't fixed and if strlen(defdata) was less than 31, reset_policy() would cause memory corruption. Please Note: The original alloc_defdata() assumes defdata is 32 characters and a NUL terminator while reset_policy() assumes defdata is 31 characters and a NUL. This patch updates alloc_defdata() to match reset_policy() (ie a shorter string). I'm not very familiar with this code so please review carefully. Signed-off-by: Dan Carpenter <error27@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-14 17:56:37 -07:00
Changli Gao	70c2efa5a3	act_nat: not all of the ICMP packets need an IP header payload not all of the ICMP packets need an IP header payload, so we check the length of the skbs only when the packets should have an IP header payload. Based upon analysis and initial patch by Rodrigo Partearroyo González. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> ---- net/sched/act_nat.c \| 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) Signed-off-by: David S. Miller <davem@davemloft.net>	2010-07-12 20:00:19 -07:00
Changli Gao	504f85c9d0	act_nat: use stack variable act_nat: use stack variable structure tc_nat isn't too big for stack, so we can put it in stack. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- net/sched/act_nat.c \| 31 ++++++++++--------------------- 1 file changed, 10 insertions(+), 21 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-30 12:12:37 -07:00
Changli Gao	5acbf7f10b	act_mirred: combine duplicate code act_mirred: combine duplicate code tcf_bstats is updated in any way, so we can do it earlier to reduce the size of the code. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> ---- net/sched/act_mirred.c \| 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-30 12:12:36 -07:00
Changli Gao	210d6de78c	act_mirred: don't clone skb when skb isn't shared don't clone skb when skb isn't shared When the tcf_action is TC_ACT_STOLEN, and the skb isn't shared, we don't need to clone a new skb. As the skb will be freed after this function returns, we can use it freely once we get a reference to it. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/net/sch_generic.h \| 11 +++++++++-- net/sched/act_mirred.c \| 6 +++--- 2 files changed, 12 insertions(+), 5 deletions(-) Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-28 23:24:32 -07:00
David S. Miller	8244132ea8	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: net/ipv4/ip_output.c	2010-06-23 18:26:27 -07:00
Tom Hughes	fa68a78227	Clear IFF_XMIT_DST_RELEASE for teql interfaces https://bugzilla.kernel.org/show_bug.cgi?id=16183 The sch_teql module, which can be used to load balance over a set of underlying interfaces, stopped working after 2.6.30 and has been broken in all kernels since then for any underlying interface which requires the addition of link level headers. The problem is that the transmit routine relies on being able to access the destination address in the skb in order to do address resolution once it has decided which underlying interface it is going to transmit through. In 2.6.31 the IFF_XMIT_DST_RELEASE flag was introduced, and set by default for all interfaces, which causes the destination address to be released before the transmit routine for the interface is called. The solution is to clear that flag for teql interfaces. Signed-off-by: Tom Hughes <tom@compton.nu> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-16 14:47:30 -07:00
Eric Dumazet	c7de2cf053	pkt_sched: gen_kill_estimator() rcu fixes gen_kill_estimator() API is incomplete or not well documented, since caller should make sure an RCU grace period is respected before freeing stats_lock. This was partially addressed in commit `5d944c640b` (gen_estimator: deadlock fix), but same problem exist for all gen_kill_estimator() users, if lock they use is not already RCU protected. A code review shows xt_RATEEST.c, act_api.c, act_police.c have this problem. Other are ok because they use qdisc lock, already RCU protected. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-11 18:37:08 -07:00
jamal	9dacaf17a6	net sched: make pedit check for clones instead Now that the core path doesnt set OK to munge we detect writable skbs by looking to see if they are cloned. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-07 01:10:43 -07:00
Changli Gao	f2a03367c0	htb: remove two unnecessary assignments remove two unnecessary assignments we don't need to assign NULL when initialize structure objects. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- net/sched/sch_htb.c \| 2 -- 1 file changed, 2 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-07 01:08:11 -07:00
David S. Miller	eedc765ca4	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/sfc/net_driver.h drivers/net/sfc/siena.c	2010-06-06 17:42:02 -07:00
Changli Gao	db2c24175d	act_pedit: access skb->data safely access skb->data safely we should use skb_header_pointer() and skb_store_bits() to access skb->data to handle small or non-linear skbs. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- net/sched/act_pedit.c \| 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-03 03:28:27 -07:00
Changli Gao	fbc2e7d9cf	cls_u32: use skb_header_pointer() to dereference data safely use skb_header_pointer() to dereference data safely the original skb->data dereference isn't safe, as there isn't any skb->len or skb_is_nonlinear() check. skb_header_pointer() is used instead in this patch. And when the skb isn't long enough, we terminate the function u32_classify() immediately with -1. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-02 07:32:42 -07:00
Changli Gao	33c29dde7d	act_nat: fix the wrong checksum when addr isn't in old_addr/mask fix the wrong checksum when addr isn't in old_addr/mask For TCP and UDP packets, when addr isn't in old_addr/mask we don't do SNAT or DNAT, and we should not update layer 4 checksum. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- net/sched/act_nat.c \| 4 ++++ 1 file changed, 4 insertions(+) Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-02 06:51:34 -07:00
Eric Dumazet	79640a4ca6	net: add additional lock to qdisc to increase throughput When many cpus compete for sending frames on a given qdisc, the qdisc spinlock suffers from very high contention. The cpu owning __QDISC_STATE_RUNNING bit has same priority to acquire the lock, and cannot dequeue packets fast enough, since it must wait for this lock for each dequeued packet. One solution to this problem is to force all cpus spinning on a second lock before trying to get the main lock, when/if they see __QDISC_STATE_RUNNING already set. The owning cpu then compete with at most one other cpu for the main lock, allowing for higher dequeueing rate. Based on a previous patch from Alexander Duyck. I added the heuristic to avoid the atomic in fast path, and put the new lock far away from the cache line used by the dequeue worker. Also try to release the busylock lock as late as possible. Tests with following script gave a boost from ~50.000 pps to ~600.000 pps on a dual quad core machine (E5450 @3.00GHz), tg3 driver. (A single netperf flow can reach ~800.000 pps on this platform) for j in `seq 0 3`; do for i in `seq 0 7`; do netperf -H 192.168.0.1 -t UDP_STREAM -l 60 -N -T $i -- -m 6 & done done Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-02 05:09:29 -07:00
Eric Dumazet	bc135b23d0	net: Define accessors to manipulate QDISC_STATE_RUNNING Define three helpers to manipulate QDISC_STATE_RUNNIG flag, that a second patch will move on another location. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-06-02 03:23:51 -07:00
Ian Campbell	06c4648d46	arp_notify: allow drivers to explicitly request a notification event. Currently such notifications are only generated when the device comes up or the address changes. However one use case for these notifications is to enable faster network recovery after a virtual machine migration (by causing switches to relearn their MAC tables). A migration appears to the network stack as a temporary loss of carrier and therefore does not trigger either of the current conditions. Rather than adding carrier up as a trigger (which can cause issues when interfaces a flapping) simply add an interface which the driver can use to explicitly trigger the notification. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: Stephen Hemminger <shemminger@linux-foundation.org> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: David S. Miller <davem@davemloft.net> Cc: netdev@vger.kernel.org Cc: stable@kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-31 00:27:44 -07:00
Herbert Xu	f845172531	cls_cgroup: Store classid in struct sock Up until now cls_cgroup has relied on fetching the classid out of the current executing thread. This runs into trouble when a packet processing is delayed in which case it may execute out of another thread's context. Furthermore, even when a packet is not delayed we may fail to classify it if soft IRQs have been disabled, because this scenario is indistinguishable from one where a packet unrelated to the current thread is processed by a real soft IRQ. In fact, the current semantics is inherently broken, as a single skb may be constructed out of the writes of two different tasks. A different manifestation of this problem is when the TCP stack transmits in response of an incoming ACK. This is currently unclassified. As we already have a concept of packet ownership for accounting purposes in the skb->sk pointer, this is a natural place to store the classid in a persistent manner. This patch adds the cls_cgroup classid in struct sock, filling up an existing hole on 64-bit :) The value is set at socket creation time. So all sockets created via socket(2) automatically gains the ID of the thread creating it. Whenever another process touches the socket by either reading or writing to it, we will change the socket classid to that of the process if it has a valid (non-zero) classid. For sockets created on inbound connections through accept(2), we inherit the classid of the original listening socket through sk_clone, possibly preceding the actual accept(2) call. In order to minimise risks, I have not made this the authoritative classid. For now it is only used as a backup when we execute with soft IRQs disabled. Once we're completely happy with its semantics we can use it as the sole classid. Footnote: I have rearranged the error path on cls_group module creation. If we didn't do this, then there is a window where someone could create a tc rule using cls_group before the cgroup subsystem has been registered. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-24 00:12:34 -07:00
Eric Dumazet	53b0f08042	net_sched: Fix qdisc_notify() Ben Pfaff reported a kernel oops and provided a test program to reproduce it. https://kerneltrap.org/mailarchive/linux-netdev/2010/5/21/6277805 tc_fill_qdisc() should not be called for builtin qdisc, or it dereference a NULL pointer to get device ifindex. Fix is to always use tc_qdisc_dump_ignore() before calling tc_fill_qdisc(). Reported-by: Ben Pfaff <blp@nicira.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-23 23:11:07 -07:00
Joe Perches	3fa21e07e6	net: Remove unnecessary returns from void function()s This patch removes from net/ (but not any netfilter files) all the unnecessary return; statements that precede the last closing brace of void functions. It does not remove the returns that are immediately preceded by a label as gcc doesn't like that. Done via: $ grep -rP --include=*.[ch] -l "return;\n}" net/ \| \ xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }' Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 23:23:14 -07:00
stephen hemminger	b60b6592ba	net sched: cleanup and rate limit warning If the user has a bad classification configuration, and gets a packet that goes through too many steps. Chances are more packets will arrive, and the message spew will overrun syslog because it is not rate limited. And because it is not tagged with appropriate priority it can't not be screened. Added the qdisc to the message to try and give some more context when the message does arrive. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 23:23:13 -07:00
stephen hemminger	6ff9c3644e	net sched: printk message severity The previous patch encourage me to go look at all the messages in the network scheduler and fix them. Many messages were missing any severity level. Some serious ones that should never happen were turned into WARN(), and the random noise messages that were handled changed to pr_debug(). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 23:23:12 -07:00
Patrick McHardy	a2f7922713	net_sched: sch_hfsc: fix classification loops When attaching filters to a class pointing to a class higher up in the hierarchy, classification may enter an endless loop. Currently this is prevented for filters that are already resolved, but not for filters resolved at runtime. Only allow filters to point downwards in the hierarchy, similar to what CBQ does. Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 17:44:46 -07:00
stephen hemminger	f0cd15081a	tbf: stop wanton destruction of children (v2) Several netem users use TBF for rate control. But every time the parameters of TBF are changed it destroys the child qdisc, requiring reconfigation. Better to just keep child qdisc and just notify it of changed limit. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 17:44:35 -07:00
Eric Dumazet	7fee226ad2	net: add a noref bit on skb dst Use low order bit of skb->_skb_dst to tell dst is not refcounted. Change _skb_dst to _skb_refdst to make sure all uses are catched. skb_dst() returns the dst, regardless of noref bit set or not, but with a lockdep check to make sure a noref dst is not given if current user is not rcu protected. New skb_dst_set_noref() helper to set an notrefcounted dst on a skb. (with lockdep check) skb_dst_drop() drops a reference only if skb dst was refcounted. skb_dst_force() helper is used to force a refcount on dst, when skb is queued and not anymore RCU protected. Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if !IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in sock_queue_rcv_skb(), in __nf_queue(). Use skb_dst_force() in dev_requeue_skb(). Note: dst_use_noref() still dirties dst, we might transform it later to do one dirtying per jiffies. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-17 17:18:50 -07:00
Patrick McHardy	cba7a98a47	Merge branch 'master' of git://dev.medozas.de/linux	2010-05-11 18:59:21 +02:00
Jan Engelhardt	4b560b447d	netfilter: xtables: substitute temporary defines by final name Signed-off-by: Jan Engelhardt <jengelh@medozas.de>	2010-05-11 18:31:17 +02:00
Patrick McHardy	1e4b105712	Merge branch 'master' of /repos/git/net-next-2.6 Conflicts: net/bridge/br_device.c net/bridge/br_forward.c Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-05-10 18:39:28 +02:00
Changli Gao	dee42870a4	net: fix softnet_stat Per cpu variable softnet_data.total was shared between IRQ and SoftIRQ context without any protection. And enqueue_to_backlog should update the netdev_rx_stat of the target CPU. This patch renames softnet_data.total to softnet_data.processed: the number of packets processed in uppper levels(IP stacks). softnet_stat data is moved into softnet_data. Signed-off-by: Changli Gao <xiaosuo@gmail.com> ---- include/linux/netdevice.h \| 17 +++++++---------- net/core/dev.c \| 26 ++++++++++++-------------- net/sched/sch_generic.c \| 2 +- 3 files changed, 20 insertions(+), 25 deletions(-) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-05-02 22:26:57 -07:00
Eric Dumazet	0eae88f31c	net: Fix various endianness glitches Sparse can help us find endianness bugs, but we need to make some cleanups to be able to more easily spot real bugs. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-20 19:06:52 -07:00
Patrick McHardy	6291055465	Merge branch 'master' of /repos/git/net-next-2.6 Conflicts: Documentation/feature-removal-schedule.txt net/ipv6/netfilter/ip6t_REJECT.c net/netfilter/xt_limit.c Signed-off-by: Patrick McHardy <kaber@trash.net>	2010-04-20 16:02:01 +02:00
David S. Miller	871039f02f	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/stmmac/stmmac_main.c drivers/net/wireless/wl12xx/wl1271_cmd.c drivers/net/wireless/wl12xx/wl1271_main.c drivers/net/wireless/wl12xx/wl1271_spi.c net/core/ethtool.c net/mac80211/scan.c	2010-04-11 14:53:53 -07:00
David S. Miller	4a35ecf8bf	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bonding/bond_main.c drivers/net/via-velocity.c drivers/net/wireless/iwlwifi/iwl-agn.c	2010-04-06 23:53:30 -07:00
Eric Dumazet	5d944c640b	gen_estimator: deadlock fix One of my test machine got a deadlock during "tc" sessions, adding/deleting classes & filters, using traffic estimators. After some analysis, I believe we have a potential use after free case in est_timer() : spin_lock(e->stats_lock); << HERE >> read_lock(&est_lock); if (e->bstats == NULL) << TEST >> goto skip; Test is done a bit late, because after estimator is killed, and before rcu grace period elapsed, we might already have freed/reuse memory where e->stats_locks points to (some qdisc->q.lock) A possible fix is to respect a rcu grace period at Qdisc dismantle time. On 64bit, sizeof(struct Qdisc) is exactly 192 bytes. Adding 16 bytes to it (for struct rcu_head) is a problem because it might change performance, given QDISC_ALIGNTO is 32 bytes. This is why I also change QDISC_ALIGNTO to 64 bytes, to satisfy most current alignment requirements. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-04-01 18:38:48 -07:00
Tom Goff	7e5ab15781	net_sched: minor netns related cleanup These changes were suggested by Alexey Dobriyan <adobriyan@gmail.com>: - psched_show() does not use any private data so just pass NULL to psched_open() - remove unnecessary return statement Signed-off-by: Tom Goff <thomas.goff@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-30 19:44:56 -07:00
Tejun Heo	5a0e3ad6af	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>	2010-03-30 22:02:32 +09:00
Jan Engelhardt	d2a7b6bad2	netfilter: xtables: make use of xt_request_find_target Signed-off-by: Jan Engelhardt <jengelh@medozas.de>	2010-03-25 15:02:19 +01:00
Frans Pop	b138338056	net: remove trailing space in messages Signed-off-by: Frans Pop <elendil@planet.nl> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-24 14:01:54 -07:00
Ben Blum	8e039d84b3	cgroups: net_cls as module Allows the net_cls cgroup subsystem to be compiled as a module This patch modifies net/sched/cls_cgroup.c to allow the net_cls subsystem to be optionally compiled as a module instead of builtin. The cgroup_subsys struct is moved around a bit to allow the subsys_id to be either declared as a compile-time constant by the cgroup_subsys.h include in cgroup.h, or, if it's a module, initialized within the struct by cgroup_load_subsys. Signed-off-by: Ben Blum <bblum@andrew.cmu.edu> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Paul Menage <menage@google.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-23 13:06:14 -07:00
Tom Goff	7316ae88c4	net_sched: make traffic control network namespace aware Mostly minor changes to add a net argument to various functions and remove initial network namespace checks. Make /proc/net/psched per network namespace. Signed-off-by: Tom Goff <thomas.goff@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-03-22 20:26:25 -07:00
David S. Miller	b1109bf085	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2010-02-09 11:44:44 -08:00
Jan Luebbe	d4ae20b379	net/sched: Fix module name in Kconfig The action modules have been prefixed with 'act_', but the Kconfig description was not changed. Signed-off-by: Jan Luebbe <jluebbe@debian.org> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-02-08 22:41:44 -08:00
Hagen Paul Pfeifer	57dbb2d83d	sched: add head drop fifo queue This adds an additional queuing strategy, called pfifo_head_drop, to remove the oldest skb in the case of an overflow within the queue - the head element - instead of the last skb (tail). To remove the oldest skb in congested situations is useful for sensor network environments where newer packets reflect the superior information. Reviewed-by: Florian Westphal <fw@strlen.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2010-01-28 21:27:00 -08:00
Linus Torvalds	4ef58d4e2a	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits) tree-wide: fix misspelling of "definition" in comments reiserfs: fix misspelling of "journaled" doc: Fix a typo in slub.txt. inotify: remove superfluous return code check hdlc: spelling fix in find_pvc() comment doc: fix regulator docs cut-and-pasteism mtd: Fix comment in Kconfig doc: Fix IRQ chip docs tree-wide: fix assorted typos all over the place drivers/ata/libata-sff.c: comment spelling fixes fix typos/grammos in Documentation/edac.txt sysctl: add missing comments fs/debugfs/inode.c: fix comment typos sgivwfb: Make use of ARRAY_SIZE. sky2: fix sky2_link_down copy/paste comment error tree-wide: fix typos "couter" -> "counter" tree-wide: fix typos "offest" -> "offset" fix kerneldoc for set_irq_msi() spidev: fix double "of of" in comment comment typo fix: sybsystem -> subsystem ...	2009-12-09 19:43:33 -08:00
Joe Perches	f64f9e7192	net: Move && and \|\| to end of previous line Not including net/atm/ Compiled tested x86 allyesconfig only Added a > 80 column line or two, which I ignored. Existing checkpatch plaints willfully, cheerfully ignored. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-29 16:55:45 -08:00
Octavian Purdila	09ad9bc752	net: use net_eq to compare nets Generated with the following semantic patch @@ struct net n1; struct net n2; @@ - n1 == n2 + net_eq(n1, n2) @@ struct net n1; struct net n2; @@ - n1 != n2 + !net_eq(n1, n2) applied over {include,net,drivers/net}. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-25 15:14:13 -08:00
Eric Dumazet	8964be4a9a	net: rename skb->iif to skb->skb_iif To help grep games, rename iif to skb_iif Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-20 15:35:04 -08:00
Eric Dumazet	2939e27599	netsched: Allow var_sk_bound_if meta to work on all namespaces This fix can probably wait 2.6.33, or should use another patch if needed in 2.6.32 (no get_dev_by_index_rcu() before 2.6.33) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-18 23:24:41 -08:00
Changli Gao	b76965e02b	act_mirred: optimization. move checking if eaction is valid in tcf_mirred_init() Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-17 04:15:38 -08:00
Changli Gao	feed1f1724	act_mirred: cleanup 1. don't let go back using goto. 2. don't call skb_act_clone() until it is necessary. 3. one exit of the critical context. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-17 04:15:37 -08:00
Jarek Poplawski	9a1654ba0b	net: Optimize hard_start_xmit() return checking Recent changes in the TX error propagation require additional checking and masking of values returned from hard_start_xmit(), mainly to separate cases where skb was consumed. This aim can be simplified by changing the order of NETDEV_TX and NET_XMIT codes, because the latter are treated similarly to negative (ERRNO) values. After this change much simpler dev_xmit_complete() is also used in sch_direct_xmit(), so it is moved to netdevice.h. Additionally NET_RX definitions in netdevice.h are moved up from between TX codes to avoid confusion while reading the TX comment. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-15 22:08:33 -08:00
Patrick McHardy	572a9d7b6f	net: allow to propagate errors through ->ndo_hard_start_xmit() Currently the ->ndo_hard_start_xmit() callbacks are only permitted to return one of the NETDEV_TX codes. This prevents any kind of error propagation for virtual devices, like queue congestion of the underlying device in case of layered devices, or unreachability in case of tunnels. This patches changes the NET_XMIT codes to avoid clashes with the NETDEV_TX codes and changes the two callers of dev_hard_start_xmit() to expect either errno codes, NET_XMIT codes or NETDEV_TX codes as return value. In case of qdisc_restart(), all non NETDEV_TX codes are mapped to NETDEV_TX_OK since no error propagation is possible when using qdiscs. In case of dev_queue_xmit(), the error is propagated upwards. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-13 14:07:32 -08:00
stephen hemminger	f1e9016da6	net: use rcu for network scheduler API Use RCU to walk list of network devices in qdisc dump. This could be optimized for large number of devices. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-10 22:26:30 -08:00
Dirk Hohndel	06fe9fb418	tree-wide: fix a very frequent spelling mistake something-bility is spelled as something-blity so a grep for 'blit' would find these lines this is so trivial that I didn't split it by subsystem / copy additional maintainers - all changes are to comments The only purpose is to get fewer false positives when grepping around the kernel sources. Signed-off-by: Dirk Hohndel <hohndel@infradead.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-11-09 09:40:54 +01:00
Eric Dumazet	bd27a8750c	net_cls: Use __dev_get_by_index() We hold RTNL in tc_dump_tfilter(), we can avoid dev_hold()/dev_put() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-05 22:34:21 -08:00
Eric Dumazet	d0075634cf	em_meta: avoid one dev_put() Another rcu conversion to avoid one dev_hold()/dev_put() pair Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-11-04 05:23:31 -08:00
jamal	1c55d62e77	pkt_sched: skbedit add support for setting mark This adds support for setting the skb mark. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-22 21:56:42 -07:00
David S. Miller	421355de87	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2009-10-13 12:55:20 -07:00
David S. Miller	7fe13c5733	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2009-10-11 23:15:47 -07:00
jamal	53f7e35f8b	pkt_sched: pedit use proper struct This probably deserves to go into -stable. Pedit will reject a policy that is large because it uses the wrong structure in the policy validation. This fixes it. Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-11 23:03:47 -07:00
Jiri Pirko	ad61df918c	netlink: fix typo in initialization Commit `9ef1d4c7c7` ("[NETLINK]: Missing initializations in dumped data") introduced a typo in initialization. This patch fixes this. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-08 01:21:46 -07:00
Eric Dumazet	d250a5f90e	pkt_sched: gen_estimator: Dont report fake rate estimators Jarek Poplawski a écrit : > > > Hmm... So you made me to do some "real" work here, and guess what?: > there is one serious checkpatch warning! ;-) Plus, this new parameter > should be added to the function description. Otherwise: > Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> > > Thanks, > Jarek P. > > PS: I guess full "Don't" would show we really mean it... Okay :) Here is the last round, before the night ! Thanks again [RFC] pkt_sched: gen_estimator: Don't report fake rate estimators We currently send TCA_STATS_RATE_EST elements to netlink users, even if no estimator is running. # tc -s -d qdisc qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Sent 112833764978 bytes 1495081739 pkt (dropped 0, overlimits 0 requeues 0) rate 0bit 0pps backlog 0b 0p requeues 0 User has no way to tell if the "rate 0bit 0pps" is a real estimation, or a fake one (because no estimator is active) After this patch, tc command output is : $ tc -s -d qdisc qdisc pfifo_fast 0: dev eth0 root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Sent 561075 bytes 1196 pkt (dropped 0, overlimits 0 requeues 0) backlog 0b 0p requeues 0 We add a parameter to gnet_stats_copy_rate_est() function so that it can use gen_estimator_active(bstats, r), as suggested by Jarek. This parameter can be NULL if check is not necessary, (htb for example has a mandatory rate estimator) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-10-07 01:07:42 -07:00
Anand Gadiyar	fd589a8f0a	trivial: fix typo "to to" in multiple files Signed-off-by: Anand Gadiyar <gadiyar@ti.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>	2009-09-21 15:14:55 +02:00
Jarek Poplawski	a19d215843	pkt_sched: Fix qstats.qlen updating in dump_stats Some classful qdiscs miss qstats.qlen updating with q.qlen of their child qdiscs in dump_stats methods. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-17 10:26:07 -07:00
Jarek Poplawski	7c64b9f3f5	pkt_sched: Fix qdisc_create on stab error handling If qdisc_get_stab returns error in qdisc_create there is skipped qdisc ops->destroy, which is necessary because it's after ops->init at the moment, so memory leaks are quite probable. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-15 23:42:05 -07:00
Jarek Poplawski	926e61b7c4	pkt_sched: Fix tx queue selection in tc_modify_qdisc After the recent mq change there is the new select_queue qdisc class method used in tc_modify_qdisc, but it works OK only for direct child qdiscs of mq qdisc. Grandchildren always get the first tx queue, which would give wrong qdisc_root etc. results (e.g. for sch_htb as child of sch_prio). This patch fixes it by using parent's dev_queue for such grandchildren qdiscs. The select_queue method's return type is changed BTW. With feedback from: Patrick McHardy <kaber@trash.net> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-15 02:53:07 -07:00
Jarek Poplawski	036d6a673f	pkt_sched: Fix qdisc_graft WRT ingress qdisc After the recent mq change using ingress qdisc overwrites dev->qdisc; there is also a wrong old qdisc pointer passed to notify_and_destroy. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-14 17:03:57 -07:00
Linus Torvalds	d7e9660ad9	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits) netxen: update copyright netxen: fix tx timeout recovery netxen: fix file firmware leak netxen: improve pci memory access netxen: change firmware write size tg3: Fix return ring size breakage netxen: build fix for INET=n cdc-phonet: autoconfigure Phonet address Phonet: back-end for autoconfigured addresses Phonet: fix netlink address dump error handling ipv6: Add IFA_F_DADFAILED flag net: Add DEVTYPE support for Ethernet based devices mv643xx_eth.c: remove unused txq_set_wrr() ucc_geth: Fix hangs after switching from full to half duplex ucc_geth: Rearrange some code to avoid forward declarations phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs drivers/net/phy: introduce missing kfree drivers/net/wan: introduce missing kfree net: force bridge module(s) to be GPL Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded ... Fixed up trivial conflicts: - arch/x86/include/asm/socket.h converted to <asm-generic/socket.h> in the x86 tree. The generic header has the same new #define's, so that works out fine. - drivers/net/tun.c fix conflict between `89f56d1e9` ("tun: reuse struct sock fields") that switched over to using 'tun->socket.sk' instead of the redundantly available (and thus removed) 'tun->sk', and `2b980dbd` ("lsm: Add hooks to the TUN driver") which added a new 'tun->sk' use. Noted in 'next' by Stephen Rothwell.	2009-09-14 10:37:28 -07:00
David S. Miller	9a0da0d19c	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2009-09-10 18:17:09 -07:00
Patrick McHardy	23bcf634c8	net_sched: fix estimator lock selection for mq child qdiscs When new child qdiscs are attached to the mq qdisc, they are actually attached as root qdiscs to the device queues. The lock selection for new estimators incorrectly picks the root lock of the existing and to be replaced qdisc, which results in a use-after-free once the old qdisc has been destroyed. Mark mq qdisc instances with a new flag and treat qdiscs attached to mq as children similar to regular root qdiscs. Additionally prevent estimators from being attached to the mq qdisc itself since it only updates its byte and packet counters during dumps. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-09 18:11:23 -07:00
David S. Miller	6ec1c69a8f	net_sched: add classful multiqueue dummy scheduler This patch adds a classful dummy scheduler which can be used as root qdisc for multiqueue devices and exposes each device queue as a child class. This allows to address queues individually and graft them similar to regular classes. Additionally it presents an accumulated view of the statistics of all real root qdiscs in the dummy root. Two new callbacks are added to the qdisc_ops and qdisc_class_ops: - cl_ops->select_queue selects the tx queue number for new child classes. - qdisc_ops->attach() overrides root qdisc device grafting to attach non-shared qdiscs to the queues. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-06 02:07:05 -07:00
Patrick McHardy	589983cd21	net_sched: move dev_graft_qdisc() to sch_generic.c It will be used in a following patch by the multiqueue qdisc. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-06 02:07:05 -07:00
Patrick McHardy	af356afa01	net_sched: reintroduce dev->qdisc for use by sch_api Currently the multiqueue integration with the qdisc API suffers from a few problems: - with multiple queues, all root qdiscs use the same handle. This means they can't be exposed to userspace in a backwards compatible fashion. - all API operations always refer to queue number 0. Newly created qdiscs are automatically shared between all queues, its not possible to address individual queues or restore multiqueue behaviour once a shared qdisc has been attached. - Dumps only contain the root qdisc of queue 0, in case of non-shared qdiscs this means the statistics are incomplete. This patch reintroduces dev->qdisc, which points to the (single) root qdisc from userspace's point of view. Currently it either points to the first (non-shared) default qdisc, or a qdisc shared between all queues. The following patches will introduce a classful dummy qdisc, which will be used as root qdisc and contain the per-queue qdiscs as children. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-06 02:07:03 -07:00
Patrick McHardy	5b9a9ccfad	net_sched: remove some unnecessary checks in classful schedulers The class argument to the ->graft(), ->leaf(), ->dump(), ->dump_stats() all originate from either ->get() or ->walk() and are always valid. Remove unnecessary checks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-06 02:07:02 -07:00
Patrick McHardy	de6d5cdf88	net_sched: make cls_ops->change and cls_ops->delete optional Some schedulers don't support creating, changing or deleting classes. Make the respective callbacks optionally and consistently return -EOPNOTSUPP for unsupported operations, instead of currently either -EOPNOTSUPP, -ENOSYS or no error. In case of sch_prio and sch_multiq, the removed operations additionally checked for an invalid class. This is not necessary since the class argument can only orginate from ->get() or in case of ->change is 0 for creation of new classes, in which case ->change() incorrectly returned -ENOENT. As a side-effect, this patch fixes a possible (root-only) NULL pointer function call in sch_ingress, which didn't implement a so far mandatory ->delete() operation. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-06 02:07:02 -07:00
Patrick McHardy	71ebe5e919	net_sched: make cls_ops->tcf_chain() optional Some qdiscs don't support attaching filters. Handle this centrally in cls_api and return a proper errno code (EOPNOTSUPP) instead of EINVAL. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-06 02:06:12 -07:00
Patrick McHardy	c9f1d0389b	net_sched: fix class grafting errno codes If the parent qdisc doesn't support classes, use EOPNOTSUPP. If the parent class doesn't exist, use ENOENT. Currently EINVAL is returned in both cases. Additionally check whether grafting is supported and remove a now unnecessary graft function from sch_ingress. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-04 23:10:15 -07:00
Eric Dumazet	16ebb5e0b3	tc: Fix unitialized kernel memory leak Three bytes of uninitialized kernel memory are currently leaked to user Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Jiri Pirko <jpirko@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-02 22:55:17 -07:00
David S. Miller	6cdee2f96a	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/yellowfin.c	2009-09-02 00:32:56 -07:00
David S. Miller	2fbd3da387	pkt_sched: Revert tasklet_hrtimer changes. These are full of unresolved problems, mainly that conversions don't work 1-1 from hrtimers to tasklet_hrtimers because unlike hrtimers tasklets can't be killed from softirq context. And when a qdisc gets reset, that's exactly what we need to do here. We'll work this out in the net-next-2.6 tree and if warranted we'll backport that work to -stable. This reverts the following 3 changesets: `a2cb6a4dd4` ("pkt_sched: Fix bogon in tasklet_hrtimer changes.") `38acce2d79` ("pkt_sched: Convert CBQ to tasklet_hrtimer.") `ee5f9757ea` ("pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer") Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-01 17:59:25 -07:00
Stephen Hemminger	6fef4c0c8e	netdev: convert pseudo-devices to netdev_tx_t Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-09-01 01:13:07 -07:00
Krishna Kumar	a453e0689a	pkt_sched: Fix resource limiting in pfifo_fast pfifo_fast_enqueue has this check: if (skb_queue_len(list) < qdisc_dev(qdisc)->tx_queue_len) { which allows each band to enqueue upto tx_queue_len skbs for a total of 3*tx_queue_len skbs. I am not sure if this was the intention of limiting in qdisc. Patch compiled and 32 simultaneous netperf testing ran fine. Also: # tc -s qdisc show dev eth2 qdisc pfifo_fast 0: root bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 Sent 16835026752 bytes 373116 pkt (dropped 0, overlimits 0 requeues 25) rate 0bit 0pps backlog 0b 0p requeues 25 Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-08-30 22:20:28 -07:00
Krishna Kumar	fd3ae5e8fc	Speed-up pfifo_fast lookup using a private bitmap Maintain a per-qdisc bitmap for pfifo_fast giving availability of skbs for each band. This allows faster lookup for a skb when there are no high priority skbs. Also, it helps in (rare) cases when there are no skbs on the list, where an immediate lookup is faster than iterating through the three bands. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-08-29 00:19:21 -07:00
Patrick McHardy	3a6c2b419b	netlink: constify nlmsghdr arguments Consitfy nlmsghdr arguments to a couple of functions as preparation for the next patch, which will constify the netlink message data in all nfnetlink users. Signed-off-by: Patrick McHardy <kaber@trash.net>	2009-08-25 16:07:40 +02:00
David S. Miller	a2cb6a4dd4	pkt_sched: Fix bogon in tasklet_hrtimer changes. Reported by Stephen Rothwell, luckily it's harmless: net/sched/sch_api.c: In function 'qdisc_watchdog': net/sched/sch_api.c:460: warning: initialization from incompatible pointer type net/sched/sch_cbq.c: In function 'cbq_undelay': net/sched/sch_cbq.c:595: warning: initialization from incompatible pointer type Signed-off-by: David S. Miller <davem@davemloft.net>	2009-08-24 19:37:05 -07:00
David S. Miller	38acce2d79	pkt_sched: Convert CBQ to tasklet_hrtimer. This code expects to run in softirq context, and bare hrtimers run in hw IRQ context. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-23 18:52:38 -07:00
David S. Miller	ee5f9757ea	pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer None of this stuff should execute in hw IRQ context, therefore use a tasklet_hrtimer so that it runs in softirq context. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Thomas Gleixner <tglx@linutronix.de>	2009-08-22 18:09:17 -07:00
Eric Dumazet	c1a8f1f1c8	net: restore gnet_stats_basic to previous definition In `5e140dfc1f` "net: reorder struct Qdisc for better SMP performance" the definition of struct gnet_stats_basic changed incompatibly, as copies of this struct are shipped to userland via netlink. Restoring old behavior is not welcome, for performance reason. Fix is to use a private structure for kernel, and teach gnet_stats_copy_basic() to convert from kernel to user land, using legacy structure (struct gnet_stats_basic) Based on a report and initial patch from Michael Spang. Reported-by: Michael Spang <mspang@csclub.uwaterloo.ca> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-08-17 21:33:49 -07:00
Krishna Kumar	bbd8a0d3a3	net: Avoid enqueuing skb for default qdiscs dev_queue_xmit enqueue's a skb and calls qdisc_run which dequeue's the skb and xmits it. In most cases, the skb that is enqueue'd is the same one that is dequeue'd (unless the queue gets stopped or multiple cpu's write to the same queue and ends in a race with qdisc_run). For default qdiscs, we can remove the redundant enqueue/dequeue and simply xmit the skb since the default qdisc is work-conserving. The patch uses a new flag - TCQ_F_CAN_BYPASS to identify the default fast queue. The controversial part of the patch is incrementing qlen when a skb is requeued - this is to avoid checks like the second line below: + } else if ((q->flags & TCQ_F_CAN_BYPASS) && !qdisc_qlen(q) && >> !q->gso_skb && + !test_and_set_bit(__QDISC_STATE_RUNNING, &q->state)) { Results of a 2 hour testing for multiple netperf sessions (1, 2, 4, 8, 12 sessions on a 4 cpu system-X). The BW numbers are aggregate Mb/s across iterations tested with this version on System-X boxes with Chelsio 10gbps cards: ---------------------------------- Size \| ORG BW NEW BW \| ---------------------------------- 128K \| 156964 159381 \| 256K \| 158650 162042 \| ---------------------------------- Changes from ver1: 1. Move sch_direct_xmit declaration from sch_generic.h to pkt_sched.h 2. Update qdisc basic statistics for direct xmit path. 3. Set qlen to zero in qdisc_reset. 4. Changed some function names to more meaningful ones. Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-08-06 20:10:18 -07:00
Patrick McHardy	6ed106549d	net: use NETDEV_TX_OK instead of 0 in ndo_start_xmit() functions This patch is the result of an automatic spatch transformation to convert all ndo_start_xmit() return values of 0 to NETDEV_TX_OK. Some occurences are missed by the automatic conversion, those will be handled in a seperate patch. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-07-05 19:16:04 -07:00
Eric Dumazet	31e6d363ab	net: correct off-by-one write allocations reports commit `2b85a34e91` (net: No more expensive sock_hold()/sock_put() on each tx) changed initial sk_wmem_alloc value. We need to take into account this offset when reporting sk_wmem_alloc to user, in PROC_FS files or various ioctls (SIOCOUTQ/TIOCOUTQ) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-18 00:29:12 -07:00
Jarek Poplawski	b964758050	pkt_sched: Update drops stats in act_police Action police statistics could be misleading because drops are not shown when expected. With feedback from: Jamal Hadi Salim <hadi@cyberus.ca> Reported-by: Pawel Staszewski <pstaszewski@itcare.pl> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Acked-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-17 18:56:45 -07:00
David S. Miller	9cbc1cb8cd	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/scsi/fcoe/fcoe.c net/core/drop_monitor.c net/core/net-traces.c	2009-06-15 03:02:23 -07:00
Jarek Poplawski	ca44d6e60f	pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US Let's use TICKS instead of US, so PSCHED_TICKS2NS and PSCHED_NS2TICKS (like in PSCHED_TICKS_PER_SEC already) to avoid misleading. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-15 02:31:47 -07:00
Patrick McHardy	5b54814022	net: use symbolic values for ndo_start_xmit() return codes Convert magic values 1 and -1 to NETDEV_TX_BUSY and NETDEV_TX_LOCKED respectively. 0 (NETDEV_TX_OK) is not changed to keep the noise down, except in very few cases where its in direct proximity to one of the other values. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-13 01:18:50 -07:00
Jarek Poplawski	728bf09827	pkt_sched: Use PSCHED_SHIFT in PSCHED time conversion Use PSCHED_SHIFT constant instead of '10' in PSCHED_US2NS() and PSCHED_NS2US() macros to enable changing this value later. Additionally use PSCHED_SHIFT in sch_hfsc SM_SHIFT and ISM_SHIFT definitions. This part of the patch is based on feedback from Patrick McHardy <kaber@trash.net>. Reported-by: Antonio Almeida <vexwek@gmail.com> Tested-by: Antonio Almeida <vexwek@gmail.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-09 05:25:29 -07:00
Minoru Usui	52ea3a56a3	cls_cgroup: Fix oops when user send improperly 'tc filter add' request I found a bug in cls_cgroup_change() in cls_cgroup.c. cls_cgroup_change() expected tca[TCA_OPTIONS] was set from user space properly, but tc in iproute2-2.6.29-1 (which I used) didn't set it. In the current source code of tc in git, it set tca[TCA_OPTIONS]. git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git If we always use a newest iproute2 in git when we use cls_cgroup, we don't face this oops probably. But I think, kernel shouldn't panic regardless of use program's behaviour. Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-09 04:03:09 -07:00
Eric Dumazet	adf30907d6	net: skb->dst accessors Define three accessors to get/set dst attached to a skb struct dst_entry skb_dst(const struct sk_buff skb) void skb_dst_set(struct sk_buff skb, struct dst_entry dst) void skb_dst_drop(struct sk_buff *skb) This one should replace occurrences of : dst_release(skb->dst) skb->dst = NULL; Delete skb->dst field Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-03 02:51:04 -07:00
Eric Dumazet	511c3f92ad	net: skb->rtable accessor Define skb_rtable(const struct sk_buff *skb) accessor to get rtable from skb Delete skb->rtable field Setting rtable is not allowed, just set dst instead as rtable is an alias. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-03 02:51:02 -07:00
David S. Miller	b2f8f7525c	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/forcedeth.c	2009-06-03 02:43:41 -07:00
Minoru Usui	12186be7d2	net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup This patch fixes a bug which unconfigured struct tcf_proto keeps chaining in tc_ctl_tfilter(), and avoids kernel panic in cls_cgroup_classify() when we use cls_cgroup. When we execute 'tc filter add', tcf_proto is allocated, initialized by classifier's init(), and chained. After it's chained, tc_ctl_tfilter() calls classifier's change(). When classifier's change() fails, tc_ctl_tfilter() does not free and keeps tcf_proto. In addition, cls_cgroup is initialized in change() not in init(). It accesses unconfigured struct tcf_proto which is chained before change(), then hits Oops. Signed-off-by: Minoru Usui <usui@mxm.nes.nec.co.jp> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Tested-by: Minoru Usui <usui@mxm.nes.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-06-02 02:17:34 -07:00
Paul Menage	e65fcfd63a	cls_cgroup: read classid atomically in classifier Avoid reading the unsynchronized value cs->classid multiple times, since it could change concurrently from non-zero to zero; this would result in the classifier returning a positive result with a bogus (zero) classid. Signed-off-by: Paul Menage <menage@google.com> Reviewed-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-26 20:47:02 -07:00
Eric Dumazet	08baf56108	net: txq_trans_update() helper We would like to get rid of netdev->trans_start = jiffies; that about all net drivers have to use in their start_xmit() function, and use txq->trans_start instead. This can be done generically in core network, as suggested by David. Some devices, (particularly loopback) dont need trans_start update, because they dont have transmit watchdog. We could add a new device flag, or rely on fact that txq->tran_start can be updated is txq->xmit_lock_owner is different than -1. Use a helper function to hide our choice. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-25 22:58:01 -07:00
Eric Dumazet	ab35cd4b8f	sch_teql: Use net_device internal stats We can slightly reduce size of teqlN structure, not duplicating stats structure in teql_master but using stats field from net_device.stats for tx_errors and from netdev_queue for tx_bytes/tx_packets/tx_dropped values. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-19 15:36:15 -07:00
David S. Miller	bb803cfbec	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/scsi/fcoe/fcoe.c	2009-05-18 21:08:20 -07:00
Eric Dumazet	c0f84d0d4b	sch_teql: should not dereference skb after ndo_start_xmit() It is illegal to dereference a skb after a successful ndo_start_xmit() call. We must store skb length in a local variable instead. Bug was introduced in 2.6.27 by commit `0abf77e55a` (net_sched: Add accessor function for packet length for qdiscs) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-18 15:12:31 -07:00
Eric Dumazet	9d21493b4b	net: tx scalability works : trans_start struct net_device trans_start field is a hot spot on SMP and high performance devices, particularly multi queues ones, because every transmitter dirties it. Is main use is tx watchdog and bonding alive checks. But as most devices dont use NETIF_F_LLTX, we have to lock a netdev_queue before calling their ndo_start_xmit(). So it makes sense to move trans_start from net_device to netdev_queue. Its update will occur on a already present (and in exclusive state) cache line, for free. We can do this transition smoothly. An old driver continue to update dev->trans_start, while an updated one updates txq->trans_start. Further patches could also put tx_bytes/tx_packets counters in netdev_queue to avoid dirtying dev->stats (vlan device comes to mind) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-17 20:55:16 -07:00
Li Zefan	cb1c4b71f6	cls_cgroup: remove unneeded cgroup_lock We can remove this lock here, since we are in cgroup write handler and thus the cgrp is guaranteed to be valid, and no lock is needed when writing a u32 variable. Signed-off-by: Li Zefan <lizf@cn.fujitsuc.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-17 11:59:48 -07:00
Patrick McHardy	6473990c7f	net-sched: fix bfifo default limit When no limit is given, the bfifo uses a default of tx_queue_len * mtu. Packets handled by qdiscs include the link layer header, so this should be taken into account, similar to what other qdiscs do. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-06 16:45:07 -07:00
Robert Love	d0ab8ff81b	net: Only store high 16 bits of kernel generated filter priorities The kernel should only be using the high 16 bits of a kernel generated priority. Filter priorities in all other cases only use the upper 16 bits of the u32 'prio' field of 'struct tcf_proto', but when the kernel generates the priority of a filter is saves all 32 bits which can result in incorrect lookup failures when a filter needs to be deleted or modified. Signed-off-by: Robert Love <robert.w.love@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-05-02 13:48:32 -07:00
Jarek Poplawski	8caf153974	net: sch_netem: Fix an inconsistency in ingress netem timestamps. Alex Sidorenko reported: "while experimenting with 'netem' we have found some strange behaviour. It seemed that ingress delay as measured by 'ping' command shows up on some hosts but not on others. After some investigation I have found that the problem is that skbuff->tstamp field value depends on whether there are any packet sniffers enabled. That is: - if any ptype_all handler is registered, the tstamp field is as expected - if there are no ptype_all handlers, the tstamp field does not show the delay" This patch prevents unnecessary update of tstamp in dev_queue_xmit_nit() on ingress path (with act_mirred) adding a check, so minimal overhead on the fast path, but only when sniffers etc. are active. Since netem at ingress seems to logically emulate a network before a host, tstamp is zeroed to trigger the update and pretend delays are from the outside. Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com> Tested-by: Alex Sidorenko <alexandre.sidorenko@hp.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-20 02:14:59 -07:00
Stephen Hemminger	1a31f2042e	netsched: Allow meta match on vlan tag on receive When vlan acceleration is used on receive, the vlan tag is maintained outside of the skb data. The existing vlan tag match only works on TX path because it uses vlan_get_tag which tests for VLAN_HW_TX_ACCEL. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-04-13 18:12:57 -07:00
Ilpo Järvinen	a0bffffc14	net/*: use linux/kernel.h swap() tcp_sack_swap seems unnecessary so I pushed swap to the caller. Also removed comment that seemed then pointless, and added include when not already there. Compile tested. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-21 13:36:17 -07:00
Jarek Poplawski	7cd0a63872	pkt_sched: Change misleading code in class delete. While looking for a possible reason of bugzilla report on HTB oops: http://bugzilla.kernel.org/show_bug.cgi?id=12858 I found the code in htb_delete calling htb_destroy_class on zero refcount is very misleading: it can suggest this is a common path, and destroy is called under sch_tree_lock. Actually, this can never happen like this because before deletion cops->get() is done, and after delete a class is still used by tclass_notify. The class destroy is always called from cops->put(), so without sch_tree_lock. This doesn't mean much now (since 2.6.27) because all vulnerable calls were moved from htb_destroy_class to htb_delete, but there was a bug in older kernels. The same change is done for other classful scheds, which, it seems, didn't have similar locking problems here. Reported-by: m0sia <m0sia@m0sia.ru> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-15 20:00:19 -07:00
David S. Miller	508827ff0a	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/tokenring/tmspci.c drivers/net/ucc_geth_mii.c	2009-03-05 02:06:47 -08:00
Jarek Poplawski	a883bf564e	pkt_sched: act_police: Fix a rate estimator test. A commit `c1b56878fb` "tc: policing requires a rate estimator" introduced a test which invalidates previously working configs, based on examples from iproute2: doc/actions/actions-general. This is too rigorous: a rate estimator is needed only when police's "avrate" option is used. Reported-by: Joao Correia <joaomiguelcorreia@gmail.com> Diagnosed-by: John Dykstra <john.dykstra1@gmail.com> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-03-04 17:38:10 -08:00
David S. Miller	aa4abc9bcc	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/wireless/iwlwifi/iwl-tx.c net/8021q/vlan_core.c net/core/dev.c	2009-03-01 21:35:16 -08:00
Jarek Poplawski	1844f74794	pkt_sched: sch_drr: Fix oops in drr_change_class. drr_change_class lacks a check for NULL of tca[TCA_OPTIONS], so oops is possible. Reported-by: Denys Fedoryschenko <denys@visp.net.lb> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-27 02:42:38 -08:00
Jarek Poplawski	149490f131	pkt_sched: sch_multiq: Change errno on non-multiqueue devices use. Current "RTNETLINK answers: Invalid argument" warning, while trying to add multiq qdisc to non-multiqueue device, isn't very helpful and some of these devs can be changed btw., so let's use a better errno. With feedback from Stephen Hemminger <shemminger@vyatta.com> Reported-by: Badalian Vyacheslav <slavon@bigtelecom.ru> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-10 00:11:21 -08:00
Jarek Poplawski	1224736d97	pkt_sched: sch_htb: Use workqueue to schedule after too many events. Patrick McHardy <kaber@trash.net> suggested using a workqueue instead of hrtimers to trigger netif_schedule() when there is a problem with setting exact time of this event: 'The differnce - yeah, it shouldn't make much, mainly wake up the qdisc earlier (but not too early) after "too many events" occured _and_ no further enqueue events wake up the qdisc anyways.' Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-01 01:13:22 -08:00
Jarek Poplawski	e82181de5e	pkt_sched: sch_htb: Warn on too many events. Let's get some info on possible config problems. This patch brings back an old warning, but is printed only once now. With feedback from Patrick McHardy <kaber@trash.net> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-01 01:13:05 -08:00
Jarek Poplawski	b00355db3f	pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler. Patrick McHardy <kaber@trash.net> suggested: > How about making this flag and the warning message (in a out-of-line > function) globally available? Other qdiscs (f.i. HFSC) can't deal with > inner non-work-conserving qdiscs as well. This patch uses qdisc->flags field of "suspected" child qdisc. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-02-01 01:12:42 -08:00
Jarek Poplawski	a73be04065	pkt_sched: sch_htb: Break all htb_do_events() after 2 jiffies Currently htb_do_events() breaks events recounting for a level after 2 jiffies, but there is no reason to repeat this for next levels and increase delays even more (with softirqs disabled). htb_dequeue_tree() can add to this too, btw. In such a case q->now time is invalid anyway. Thanks to Patrick McHardy for spotting an error around earlier version of this patch. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-12 21:54:40 -08:00
Jarek Poplawski	c085134719	pkt_sched: sch_htb: Consider used jiffies in htb_do_events() Next event time should consider jiffies used for recounting. Otherwise qdisc_watchdog_schedule() triggers hrtimer immediately with the event in the past, and may cause very high ksoftirqd cpu usage (if highres is on). There is also removed checking "event" for zero in htb_dequeue(): it's always true in this place. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2009-01-12 21:54:16 -08:00
Linus Torvalds	5fbbf5f648	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (84 commits) wimax: fix kernel-doc for debufs_dentry member of struct wimax_dev net: convert pegasus driver to net_device_ops bnx2x: Prevent eeprom set when driver is down net: switch kaweth driver to netdevops pcnet32: round off carrier watch timer i2400m/usb: wrap USB power saving in #ifdef CONFIG_PM wimax: testing for rfkill support should also test for CONFIG_RFKILL_MODULE wimax: fix kconfig interactions with rfkill and input layers wimax: fix '#ifndef CONFIG_BUG' layout to avoid warning r6040: bump release number to 0.20 r6040: warn about MAC address being unset r6040: check PHY status when bringing interface up r6040: make printks consistent with DRV_NAME gianfar: Fixup use of BUS_ID_SIZE mlx4_en: Returning real Max in get_ringparam mlx4_en: Consider inline packets on completion netdev: bfin_mac: enable bfin_mac net dev driver for BF51x qeth: convert to net_device_ops vlan: add neigh_setup dm9601: warn on invalid mac address ...	2009-01-08 14:25:41 -08:00

... 2 3 4 5 6 ...

914 Commits