* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
block: kill loop_mutex
blktrace: Remove blk_fill_rwbs_rq.
block: blk-flush shouldn't call directly into q->request_fn() __blk_run_queue()
block: add @force_kblockd to __blk_run_queue()
block: fix kernel-doc format for blkdev_issue_zeroout
blk-throttle: Do not use kblockd workqueue for throtl work
Netlink message processing in the kernel is synchronous these days,
capabilities can be checked directly in security_netlink_recv() from
the current process.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Reviewed-by: James Morris <jmorris@namei.org>
[chrisw: update to include pohmelfs and uvesafb]
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Netlink message processing in the kernel is synchronous these days, the
session information can be collected when needed.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we enable trace events to trace block actions, We use
blk_fill_rwbs_rq to analyze the corresponding actions
in request's cmd_flags, but we only choose the minor 2 bits
from it, so most of other flags(e.g, REQ_SYNC) are missing.
For example, with a sync write we get:
write_test-2409 [001] 160.013869: block_rq_insert: 3,64 W 0 () 258135 + =
8 [write_test]
Since now we have integrated the flags of both bio and request,
it is safe to pass rq->cmd_flags directly to blk_fill_rwbs and
blk_fill_rwbs_rq isn't needed any more.
With this patch, after a sync write we get:
write_test-2417 [000] 226.603878: block_rq_insert: 3,64 WS 0 () 258135 +=
8 [write_test]
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This patch adds the support for retrieving the remote or peer DCBX
configuration via dcbnl for embedded DCBX stacks supporting the CEE DCBX
standard.
Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These 2 patches add the support for retrieving the remote or peer DCBX
configuration via dcbnl for embedded DCBX stacks. The peer configuration
is part of the DCBX MIB and is useful for debugging and diagnostics of
the overall DCB configuration. The first patch add this support for IEEE
802.1Qaz standard the second patch add the same support for the older
CEE standard. Diff for v2 - the peer-app-info is CEE specific.
Signed-off-by: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
INIT_NETDEV_GROUP is needed by userspace, move it outside __KERNEL__
guards.
Signed-off-by: Vlad Dogaru <ddvlad@rosedu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
__blk_run_queue() automatically either calls q->request_fn() directly
or schedules kblockd depending on whether the function is recursed.
blk-flush implementation needs to be able to explicitly choose
kblockd. Add @force_kblockd.
All the current users are converted to specify %false for the
parameter and this patch doesn't introduce any behavior change.
stable: This is prerequisite for fixing ide oops caused by the new
blk-flush implementation.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
ASoC supports keeping the audio subsysetm active over suspend in order
to support use cases such as audio passthrough from a cellular modem
with the main CPU suspended. Ensure that we don't power down the CODEC
when this is happening by checking to see if VMID is up and skipping
suspend and resume when it is. If the CODEC has suspended then it'll
turn VMID off before the core suspend() gets called.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Also fix a typo in the STATION_INFO_TX_BITRATE description
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
o Dominik Klein reported a system hang issue while doing some blkio
throttling testing.
https://lkml.org/lkml/2011/2/24/173
o Some tracing revealed that CFQ was not dispatching any more jobs as
queue unplug was not happening. And queue unplug was not happening
because unplug work was not being called as there was one throttling
work on same cpu which as not finished yet. And throttling work had not
finished as it was tyring to dispatch a bio to CFQ but all the request
descriptors were consume to it was put to sleep.
o So basically it is a cyclic dependecny between CFQ unplug work and
throtl dispatch work. Tejun suggested that use separate workqueue for
such cases.
o This patch uses a separate workqueue for throttle related work and
does not rely on kblockd workqueue anymore.
Cc: stable@kernel.org
Reported-by: Dominik Klein <dk@in-telegence.net>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Several ACPI drivers fail to build if CONFIG_NET is unset, because
they refer to things depending on CONFIG_THERMAL that in turn depends
on CONFIG_NET. However, CONFIG_THERMAL doesn't really need to depend
on CONFIG_NET, because the only part of it requiring CONFIG_NET is
the netlink interface in thermal_sys.c.
Put the netlink interface in thermal_sys.c under #ifdef CONFIG_NET
and remove the dependency of CONFIG_THERMAL on CONFIG_NET from
drivers/thermal/Kconfig.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Len Brown <lenb@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Luming Yu <luming.yu@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
V4: rebase to net-next-2.6
This patch removes the flag IFF_IN_NETPOLL, we don't need it any more since
we have netpoll_tx_running() now.
Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
PM: Make ACPI wakeup from S5 work again when CONFIG_PM_SLEEP is unset
Fixes sysfs config attribute to allow access to entire 16MB maintenance
space of RapidIO devices.
Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: Thomas Moll <thomas.moll@sysgo.com>
Cc: Micha Nelissen <micha@neli.hopto.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://neil.brown.name/md:
md: Fix - again - partition detection when array becomes active
Fix over-zealous flush_disk when changing device size.
md: avoid spinlock problem in blk_throtl_exit
md: correctly handle probe of an 'mdp' device.
md: don't set_capacity before array is active.
md: Fix raid1->raid0 takeover
This is a patch originated with Stefano Salsano and Fabio Ludovici.
It provides several alternative loss models for use with netem.
This patch adds two state machine based loss models.
See: http://netgroup.uniroma2.it/twiki/bin/view.cgi/Main/NetemCLG
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Rather than magic constant in code, expose the maximum size of
packet distribution table in API. In iproute2, q_netem defines
MAX_DIST as 16K already.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 074037e (PM / Wakeup: Introduce wakeup source objects and
event statistics (v3)) caused ACPI wakeup to only work if
CONFIG_PM_SLEEP is set, but it also worked for CONFIG_PM_SLEEP unset
before. This can be fixed by making device_set_wakeup_enable(),
device_init_wakeup() and device_may_wakeup() work in the same way
as before commit 074037e when CONFIG_PM_SLEEP is unset.
Reported-and-tested-by: Justin Maggard <jmaggard10@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
There are two cases when we call flush_disk.
In one, the device has disappeared (check_disk_change) so any
data will hold becomes irrelevant.
In the oter, the device has changed size (check_disk_size_change)
so data we hold may be irrelevant.
In both cases it makes sense to discard any 'clean' buffers,
so they will be read back from the device if needed.
In the former case it makes sense to discard 'dirty' buffers
as there will never be anywhere safe to write the data. In the
second case it *does*not* make sense to discard dirty buffers
as that will lead to file system corruption when you simply enlarge
the containing devices.
flush_disk calls __invalidate_devices.
__invalidate_device calls both invalidate_inodes and invalidate_bdev.
invalidate_inodes *does* discard I_DIRTY inodes and this does lead
to fs corruption.
invalidate_bev *does*not* discard dirty pages, but I don't really care
about that at present.
So this patch adds a flag to __invalidate_device (calling it
__invalidate_device2) to indicate whether dirty buffers should be
killed, and this is passed to invalidate_inodes which can choose to
skip dirty inodes.
flusk_disk then passes true from check_disk_change and false from
check_disk_size_change.
dm avoids tripping over this problem by calling i_size_write directly
rathher than using check_disk_size_change.
md does use check_disk_size_change and so is affected.
This regression was introduced by commit 608aeef17a which causes
check_disk_size_change to call flush_disk, so it is suitable for any
kernel since 2.6.27.
Cc: stable@kernel.org
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Andrew Patterson <andrew.patterson@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NeilBrown <neilb@suse.de>
Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"
Gurudas Pai reported the same bug on NFS.
The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode. For example:
thread1: going through a big range, stops in the middle of a vma and
stores the restart address in vm_truncate_count.
thread2: comes in with a small (e.g. single page) unmap request on
the same vma, somewhere before restart_address, finds that the
vma was already unmapped up to the restart address and happily
returns without doing anything.
Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value. This could go on forever without any of them being able to
finish.
Truncate and hole punching already serialize with i_mutex. Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers. In particular ->d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.
This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.
[ We'll hopefully get rid of all this with the upcoming mm
preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
lockbreak" patch in particular. But that is for 2.6.39 ]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reported-by: Michael Leun <lkml20101129@newton.leun.net>
Reported-by: Gurudas Pai <gurudas.pai@oracle.com>
Tested-by: Gurudas Pai <gurudas.pai@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (33 commits)
Added support for usb ethernet (0x0fe6, 0x9700)
r8169: fix RTL8168DP power off issue.
r8169: correct settings of rtl8102e.
r8169: fix incorrect args to oob notify.
DM9000B: Fix PHY power for network down/up
DM9000B: Fix reg_save after spin_lock in dm9000_timeout
net_sched: long word align struct qdisc_skb_cb data
sfc: lower stack usage in efx_ethtool_self_test
bridge: Use IPv6 link-local address for multicast listener queries
bridge: Fix MLD queries' ethernet source address
bridge: Allow mcast snooping for transient link local addresses too
ipv6: Add IPv6 multicast address flag defines
bridge: Add missing ntohs()s for MLDv2 report parsing
bridge: Fix IPv6 multicast snooping by correcting offset in MLDv2 report
bridge: Fix IPv6 multicast snooping by storing correct protocol type
p54pci: update receive dma buffers before and after processing
fix cfg80211_wext_siwfreq lock ordering...
rt2x00: Fix WPA TKIP Michael MIC failures.
ath5k: Fix fast channel switching
tcp: undo_retrans counter fixes
...
Remove all instances of legacy or proposed-but-not-implemented code
that lives within an #if 0 ... #endif block. If some of it is needed
in the future it can recovered out of history, but there is no need
for it to clutter up the active code base.
Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Enhances TIPC link code to ignore an invalid link tolerance value
contained in an incoming LINK_PROTOCOL message, rather than
processing the value and potentially causing a divide-by-zero error.
Also add a compile-time check that catches attempts to redefine
TIPC's minimum link tolerance value in a manner that might result
in the same divide-by-zero error at run-time.
Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Use discrete setting ops for not updated drivers. This will not make
them conform to full G/SFEATURES semantics, though.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
This is the Stochastic Fair Blue scheduler, based on work from :
W. Feng, D. Kandlur, D. Saha, K. Shin. Blue: A New Class of Active Queue
Management Algorithms. U. Michigan CSE-TR-387-99, April 1999.
http://www.thefengs.com/wuchang/blue/CSE-TR-387-99.pdf
This implementation is based on work done by Juliusz Chroboczek
General SFB algorithm can be found in figure 14, page 15:
B[l][n] : L x N array of bins (L levels, N bins per level)
enqueue()
Calculate hash function values h{0}, h{1}, .. h{L-1}
Update bins at each level
for i = 0 to L - 1
if (B[i][h{i}].qlen > bin_size)
B[i][h{i}].p_mark += p_increment;
else if (B[i][h{i}].qlen == 0)
B[i][h{i}].p_mark -= p_decrement;
p_min = min(B[0][h{0}].p_mark ... B[L-1][h{L-1}].p_mark);
if (p_min == 1.0)
ratelimit();
else
mark/drop with probabilty p_min;
I did the adaptation of Juliusz code to meet current kernel standards,
and various changes to address previous comments :
http://thread.gmane.org/gmane.linux.network/90225http://thread.gmane.org/gmane.linux.network/90375
Default flow classifier is the rxhash introduced by RPS in 2.6.35, but
we can use an external flow classifier if wanted.
tc qdisc add dev $DEV parent 1:11 handle 11: \
est 0.5sec 2sec sfb limit 128
tc filter add dev $DEV protocol ip parent 11: handle 3 \
flow hash keys dst divisor 1024
Notes:
1) SFB default child qdisc is pfifo_fast. It can be changed by another
qdisc but a child qdisc MUST not drop a packet previously queued. This
is because SFB needs to handle a dequeued packet in order to maintain
its virtual queue states. pfifo_head_drop or CHOKe should not be used.
2) ECN is enabled by default, unlike RED/CHOKe/GRED
With help from Patrick McHardy & Andi Kleen
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Juliusz Chroboczek <Juliusz.Chroboczek@pps.jussieu.fr>
CC: Stephen Hemminger <shemminger@vyatta.com>
CC: Patrick McHardy <kaber@trash.net>
CC: Andi Kleen <andi@firstfloor.org>
CC: John W. Linville <linville@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add proper RCU annotations/verbs to sk_wq and wq members
Fix __sctp_write_space() sk_sleep() abuse (and sock->wq access)
Fix sunrpc sk_sleep() abuse too
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We force particular alignment when we generate attribute structures
when generation MODULE_VERSION() data and we need to make sure that
this alignment is followed when we iterate over these structures,
otherwise we may crash on platforms whose natural alignment is not
sizeof(void *), such as m68k.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dmitry Torokhov <dtor@vmware.com>
[ There are more issues here, but the fixes are incredibly ugly - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The dcb_app protocol field is a __u32 however the 802.1Qaz
specification defines it as a 16 bit field. This patch brings
the structure inline with the spec making it a __u16.
CC: Shmulik Ravid <shmulikr@broadcom.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It was incorrectly introduced in d2730b2a6a. We
have already fixed function to use correct define, but forgot remove old one.
Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
Cc: Gábor Stefanik <netrolller.3d@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long
workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
workqueue: wake up a worker when a rescuer is leaving a gcwq
When list debugging is enabled, we aim to readably show list corruption
errors, and the basic list_add/list_del operations end up having extra
debugging code in them to do some basic validation of the list entries.
However, "list_del_init()" and "list_move[_tail]()" ended up avoiding
the debug code due to how they were written. This fixes that.
So the _next_ time we have list_move() problems with stale list entries,
we'll hopefully have an easier time finding them..
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch re-enables UIE timer/polling emulation for rtc devices
that do not support alarm irqs.
CC: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Tested-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Uwe pointed out that my alarm based UIE emulation is not sufficient
to replace the older timer/polling based UIE emulation on devices
where there is no alarm irq. This causes rtc devices without alarms
to return -EINVAL to UIE ioctls. The fix is to re-instate the old
timer/polling method for devices without alarm irqs.
This patch reverts the following commits:
042620a018 - Remove UIE emulation
1daeddd596 - Cleanup removed UIE emulation declaration
b5cc8ca1c9 - Remove Kconfig symbol for UIE emulation
The emulation mode will still need to be wired-in with a following
patch before it will work.
CC: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
CC: Thomas Gleixner <tglx@linutronix.de>
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Introduce NETIF_F_RXCSUM to replace device-private flags for RX checksum
offload. Integrate it with ndo_fix_features.
ethtool_op_get_rx_csum() is removed altogether as nothing in-tree uses it.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
This introduces a new framework to handle device features setting.
It consists of:
- new fields in struct net_device:
+ hw_features - features that hw/driver supports toggling
+ wanted_features - features that user wants enabled, when possible
- new netdev_ops:
+ feat = ndo_fix_features(dev, feat) - API checking constraints for
enabling features or their combinations
+ ndo_set_features(dev) - API updating hardware state to match
changed dev->features
- new ethtool commands:
+ ETHTOOL_GFEATURES/ETHTOOL_SFEATURES: get/set dev->wanted_features
and trigger device reconfiguration if resulting dev->features
changed
+ ETHTOOL_GSTRINGS(ETH_SS_FEATURES): get feature bits names (meaning)
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows to enable GRO even if RX csum is disabled. GRO will not
be used for packets without hardware checksum anyway.
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are two spellings in use for 'freeze' + 'able' - 'freezable' and
'freezeable'. The former is the more prominent one. The latter is
mostly used by workqueue and in a few other odd places. Unify the
spelling to 'freezable'.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>