and add negotiate request type to let set_credits know that
we are only on negotiate stage and no need to make a decision
about disabling echos and oplocks.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Use SMB2 header size values for allocation and memset because they
are bigger and suitable for both CIFS and SMB2.
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
Now we can process SMB2 messages: check message, get message id
and wakeup awaiting routines.
Signed-off-by: Pavel Shilovsky <piastryyy@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Commit 30d9049474 caused a regression
in cifs open codepath.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Pull networking changes from David S Miller:
1) Remove the ipv4 routing cache. Now lookups go directly into the FIB
trie and use prebuilt routes cached there.
No more garbage collection, no more rDOS attacks on the routing
cache. Instead we now get predictable and consistent performance,
no matter what the pattern of traffic we service.
This has been almost 2 years in the making. Special thanks to
Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who
have helped along the way.
I'm sure that with a change of this magnitude there will be some
kind of fallout, but such things ought the be simple to fix at this
point. Luckily I'm not European so I'll be around all of August to
fix things :-)
The major stages of this work here are each fronted by a forced
merge commit whose commit message contains a top-level description
of the motivations and implementation issues.
2) Pre-demux of established ipv4 TCP sockets, saves a route demux on
input.
3) TCP SYN/ACK performance tweaks from Eric Dumazet.
4) Add namespace support for netfilter L4 conntrack helpers, from Gao
Feng.
5) Add config mechanism for Energy Efficient Ethernet to ethtool, from
Yuval Mintz.
6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet.
7) Support for connection tracker helpers in userspace, from Pablo
Neira Ayuso.
8) Allow userspace driven TX load balancing functions in TEAM driver,
from Jiri Pirko.
9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with
embedded gotos.
10) TCP Small Queues, essentially minimize the amount of TCP data queued
up in the packet scheduler layer. Whereas the existing BQL (Byte
Queue Limits) limits the pkt_sched --> netdevice queuing levels,
this controls the TCP --> pkt_sched queueing levels.
From Eric Dumazet.
11) Reduce the number of get_page/put_page ops done on SKB fragments,
from Alexander Duyck.
12) Implement protection against blind resets in TCP (RFC 5961), from
Eric Dumazet.
13) Support the client side of TCP Fast Open, basically the ability to
send data in the SYN exchange, from Yuchung Cheng.
Basically, the sender queues up data with a sendmsg() call using
MSG_FASTOPEN, then they do the connect() which emits the queued up
fastopen data.
14) Avoid all the problems we get into in TCP when timers or PMTU events
hit a locked socket. The TCP Small Queues changes added a
tcp_release_cb() that allows us to queue work up to the
release_sock() caller, and that's what we use here too. From Eric
Dumazet.
15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits)
genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP
r8169: revert "add byte queue limit support".
ipv4: Change rt->rt_iif encoding.
net: Make skb->skb_iif always track skb->dev
ipv4: Prepare for change of rt->rt_iif encoding.
ipv4: Remove all RTCF_DIRECTSRC handliing.
ipv4: Really ignore ICMP address requests/replies.
decnet: Don't set RTCF_DIRECTSRC.
net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse.
ipv4: Remove redundant assignment
rds: set correct msg_namelen
openvswitch: potential NULL deref in sample()
tcp: dont drop MTU reduction indications
bnx2x: Add new 57840 device IDs
tcp: avoid oops in tcp_metrics and reset tcpm_stamp
niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value
niu: Fix to check for dma mapping errors.
net: Fix references to out-of-scope variables in put_cmsg_compat()
net: ethernet: davinci_emac: add pm_runtime support
net: ethernet: davinci_emac: Remove unnecessary #include
...
Pull s390 changes from Martin Schwidefsky:
"No new functions, a few changes to make the code more robust, some
cleanups and bug fixes."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (21 commits)
s390/vtimer: rework virtual timer interface
s390/dis: Add the servc instruction to the disassembler.
s390/comments: unify copyright messages and remove file names
s390/lgr: Add init check to lgr_info_log()
s390/cpu init: use __get_cpu_var instead of per_cpu
s390/idle: reduce size of s390_idle_data structure
s390/idle: fix sequence handling vs cpu hotplug
s390/ap: resend enable adapter interrupt request.
s390/hypfs: Add missing get_next_ino()
s390/dasd: add shutdown action
s390/ipl: Fix ipib handling for "dumpreipl" shutdown action
s390/smp: make absolute lowcore / cpu restart parameter accesses more robust
s390/vmlogrdr: cleanup driver attribute usage
s390/vmlogrdr: cleanup device attribute usage
s390/ccwgroup: remove unused ccwgroup_device member
s390/cio/chp: cleanup attribute usage
s390/sigp: use sigp order code defines in assembly code
s390/smp: use sigp cpu status definitions
s390/smp/kvm: unifiy sigp definitions
s390/smp: remove redundant check
...
Pull blackfin changes from Bob Liu:
"The big changes are adding PM and HDMI support for bf60x, other
patches are various bug fix and code cleanup."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lliubbo/blackfin: (48 commits)
bf60x: fix build warning
PM: add BF60x flash suspend and resume support
blackfin: twi: read twi mmr via bfin_read macro
dpm: deepsleep: reserve stack
bf60x: cpufreq: fix anomaly 05000273
bf609: add adv7511 display support
blackfin: cplb-nompu: fix ROM cplb size for bf609-ezkit
bf60x: Add double fault, hardware error and NMI SEC handler
bf60x: update anomaly id in serial and twi driver headers.
bf60x: vs6624 pin update
bf60x: add default anomaly setting.
bf60x: update bf60x anomaly list.
bf60x: sec: Enable sec interrupt source priority configuration.
bf60x: sec: Clean up interrupt initialization code for SEC.
bf609: reuse bf5xx-i2s-pcm.c as i2s pcm driver
bf561: add capabilities in adv7183_inputs
bf609: convert vs6624 blank_clocks to black_pixels
blackfin: fix musb macro name
cleanup: sec and linkport only built on bf60x
bfin: pint: add pint suspend and resume
...
- remove use of legacy irqs which really wasn't needed
- add support for C66x SoC on EVMC6678 board
- clean up compiler warning
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABAgAGBQJQDI3MAAoJEOiN4VijXeFPKvUP/2zn/na3klMiJIV8K5fOh8U0
j4mf5vCAtk7p4Xd3iAzD2jAKKEbdGX4Cgy9iZAtvIMONdGEjF7B3+pSaC1aW9Uni
EHCs5S3bJVwDa2rkKOwWwBu/WVC5vcx1hsu34A3+75lTTWVzQ1Fz9twTxnfs9+hH
mVtclpucNZa3dn4CWT+1USD0JHVO/f4D1zsesGgPSOqwkYruaudRk48zepoH0j+S
aYZTgCc8u64laUYeIhPHXLanRUUFf3Ozuf1mfHCbZyJEF24Kulx+tYkrVakK7eoD
mTkZuPTydc4wk7dHFz0bxwd83kP9z6pyRvPF+0tLi7V0/DTC5w3MkDq4feR0KI9I
Rs8baHwnTwTEmwJ7vcJlxUijLIlNn3OJ3PDqYcH1GTckVuoLUFqGt0xfck8t9NPg
jky6FWY0cjS4ypstaq3MYXIC+5c15ICBx2MptLGev0CwSIJco8fdqReXq+KYoC0d
Ne/om10UO4jb4BDKyceFPXC4U/TngBIbaydqqiteqNHDSPClDVXPPy3yuuxPDWPi
BU6vf+Z91AGdb90xWQHnjKmQOQNX52Mpdg2qXHIhG47lvUY2Wa43zQsGkN+sixUq
BWVFat1bwb/dzwuDGueuhNR18gyeYhftfJ/qURRSyTRxVuhTMWh/CHCCqT2p4UOW
p4f7a0yaFDT++HPhSZ1x
=BAES
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming
Pull C6X changes from Mark Salter:
- remove use of legacy irqs which really wasn't needed
- add support for C66x SoC on EVMC6678 board
- clean up compiler warning
* tag 'for-linus' of git://linux-c6x.org/git/projects/linux-c6x-upstreaming:
C6X: clean up compiler warning
C6X: add basic support for TMS320C6678 SoC
C6X: remove dependence on legacy IRQs
C6X: remove megamod-pic requirement on direct-mapped core pic
For SMB2 protocol we can add more than one credit for one received
request: it depends on CreditRequest field in SMB2 response header.
Also we divide all requests by type: echoes, oplocks and others.
Each type uses its own slot pull.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Add mapping table for 32 bit SMB2 status codes to linux errors.
Note that SMB2 does not use DOS/OS2 errors (ever) so mapping to
DOS/OS2 errors as a common network subset (as we do for cifs)
doesn't help. And note that the set of status codes is much more
complete here.
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
and consider such codes as CIFS errors.
Reviewed-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
and rename variables around the code changes.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Generate a stop condition after each message marked with I2C_M_STOP.
[JD: Add I2C_FUNC_PROTOCOL_MANGLING.]
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Adapter drivers might support only a subset of the SMBus operations
natively. Those drivers currently have to manually emulate unsupported
operations using I2C.
Make the i2c_smbus_xfer() function fall back to
i2c_smbus_xfer_emulated() when the adapter's .smbus_xfer() operation
returns -EOPNOTSUPP, like it already does when the .smbus_xfer()
operation isn't available at all.
[JD: Minor optimization.]
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
SCCB is a serial communication bus developed by Omnivision. Its 2-wire
mode is very similar to SMBus byte data transactions, but requires the
controller to ignore the ACK bit and to insert a stop condition after
each message.
Add a device SCCB flag and a message stop flag to be passed to
controller drivers.
[JD: Kill rogue definition in go7007 driver.]
Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Robofuzz OSIF is a generic USB/iIC interface that embeds an ATMega8A
AVR-RISC microcontroler.
The device is based upon Till Harbaum's i2c-tiny-usb and although it
enhances the original design with further functionnalities it still
maintain compatibility with it with respect to the USB/I2C interface.
Signed-off-by: Emmanuel Deloget <logout@free.fr>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Byte-by-byte transactions are used primarily for accessing I2C devices
with an SMBus controller. For these transactions, for each byte that is
read or written, the SMBus controller generates a BYTE_DONE IRQ. The isr
reads/writes the next byte, and clears the IRQ flag to start the next byte.
On the penultimate IRQ, the isr also sets the LAST_BYTE flag.
There is no locking around the cmd/len/count/data variables, since the
I2C adapter lock ensures there is never multiple simultaneous transactions
for the same device, and the driver thread never accesses these variables
while interrupts might be occurring.
The end result is faster I2C block read and write transactions.
Note: This patch has only been tested and verified by doing I2C read and
write block transfers on Cougar Point 6 Series PCH, as well as I2C read
block transfers on ICH5.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Enable interrupts on more devices. ICH5, ICH7(-M) and ICH10 have been
tested to work OK. ICH8 and ICH9 are expected to work just fine as
they are very close to ICH7 and ICH10.
Ultimately we want to enable this feature on at least every device
since the ICH5, but for now we limit the exposure. We'll enable it for
other devices if we don't get negative feedback.
As a bonus, let the user know when interrupts are used.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Daniel Kurtz <djkurtz@chromium.org>
Add a new 'feature' to i2c-i801 to enable using PCI interrupts.
When the feature is enabled, then an isr is installed for the device's
PCI IRQ.
An I2C/SMBus transaction is always terminated by one of the following
interrupt sources: FAILED, BUS_ERR, DEV_ERR, or on success: INTR.
When the isr fires for one of these cases, it sets the ->status variable
and wakes up the waitq. The waitq then saves off the status code, and
clears ->status (in preparation for some future transaction).
The SMBus controller generates an INTR irq at the end of each
transaction where INTREN was set in the HST_CNT register.
No locking is needed around accesses to priv->status since all writes to
it are serialized: it is only ever set once in the isr at the end of a
transaction, and cleared while no interrupts can occur. In addition, the
I2C adapter lock guarantees that entire I2C transactions for a single
adapter are always serialized.
For this patch, the INTREN bit is set only for SMBus block, byte and word
transactions, but not for I2C reads or writes. The use of the DS
(BYTE_DONE) interrupt with byte-by-byte I2C transactions is implemented in
a subsequent patch.
The interrupt feature has only been enabled for COUGARPOINT hardware.
In addition, it is disabled if SMBus is using the SMI# interrupt.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
(Based on earlier work by Daniel Kurtz.)
Come up with a consistent, driver-wide strategy for event polling. For
intermediate steps of byte-by-byte block transactions, check for
BYTE_DONE or any error flag being set. At the end of every transaction
(regardless of PEC being used), check for both BUSY being cleared and
INTR or any error flag being set. This ensures proper action for all
transaction types.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Daniel Kurtz <djkurtz@chromium.org>
Later patches enable interrupts. This preliminary patch removes the older
unsupported ENABLE_INT9 flag.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Rename the SMBHSTCNT register bit access constants to match the style of
other register bits.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
If an error is detected in the polling loop, abort the transaction and
return an error code.
* DEV_ERR is set if the device does not respond with an acknowledge, and
the SMBus controller times out (minimum 25ms).
* BUS_ERR is set if a bus arbitration collision is detected. In other
words, when the SMBus controller tries to generate a START condition, but
detects that the SMBDATA is being held low, usually by another SMBus/I2C
master.
* FAILED is only set if a transaction is stopped by software (using
the SMBHSTCNT KILL bit).
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Writing back the whole status register could clear unwanted bits.
In particular, it could clear the "INUSE_STS" bit, which is a
'hardware semaphore', that might be useful to use some day.
To prepare for this, let's ban writing back the whole status to register
HST_STS, of which this is the only instance.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
As a slight optimization, pull some logic out of the polling loop during
byte-by-byte transactions by just setting the I801_LAST_BYTE bit, as
defined in the i801 (PCH) datasheet, when reading the last byte of a
byte-by-byte I2C_SMBUS_READ.
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Using module_i2c_driver() makes the code smaller and cleaner.
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Based on a previous patch from Peter Meerwald.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Peter Meerwald <p.meerwald@bct-electronic.com>
Some AMD chipsets, such as the SP5100, have an auxiliary SMBus
controller with a second set of registers. This patch adds
support for this auxiliary controller.
Tested on ASUS KCMA-D8 motherboard.
Signed-off-by: Andrew Armenia <andrew@asquaredlabs.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Some chipsets have multiple sets of SMBus registers each controlling a
separate SMBus. Supporting these chipsets properly will require registering
multiple I2C adapters for one piix4.
The code to initialize and register the i2c_adapter structure has been
separated from piix4_probe and allows registration of a piix4 adapter
given its base address. Note that the i2c_adapter and i2c_piix4_adapdata
structures are now dynamically allocated.
Signed-off-by: Andrew Armenia <andrew@asquaredlabs.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Some chipsets have multiple sets of piix4-compatible SMBus registers.
Eliminating the global variable will allow these chipsets to be fully
supported.
Return value from piix4_setup and piix4_sb800_setup now returns the smba
value detected. This is stored in a struct i2c_piix4_adapdata. Thus
the global variable is eliminated.
Signed-off-by: Andrew Armenia <andrew@asquaredlabs.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Convert the drivers in drivers/i2c/busses/* to usemodule_pci_driver()
macro which makes the code smaller and a bit simpler.
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Cc: Rudolf Marek <r.marek@assembler.cz>
Cc: Olof Johansson <olof@lixom.net>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Cc: Tomoya MORINAGA <tomoya.rohm@gmail.com>
My old e-mail address won't be valid for much longer. Time to update it.
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Stefan reported a crash on a kernel before a3e5d1091c ("sched:
Don't call task_group() too many times in set_task_rq()"), he
found the reason to be that the multiple task_group()
invocations in set_task_rq() returned different values.
Looking at all that I found a lack of serialization and plain
wrong comments.
The below tries to fix it using an extra pointer which is
updated under the appropriate scheduler locks. Its not pretty,
but I can't really see another way given how all the cgroup
stuff works.
Reported-and-tested-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1340364965.18025.71.camel@twins
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Current load balance scheme requires only one cpu in a
sched_group (balance_cpu) to look at other peer sched_groups for
imbalance and pull tasks towards itself from a busy cpu. Tasks
thus pulled by balance_cpu could later get picked up by cpus
that are in the same sched_group as that of balance_cpu.
This scheme however fails to pull tasks that are not allowed to
run on balance_cpu (but are allowed to run on other cpus in its
sched_group). That can affect fairness and in some worst case
scenarios cause starvation.
Consider a two core (2 threads/core) system running tasks as
below:
Core0 Core1
/ \ / \
C0 C1 C2 C3
| | | |
v v v v
F0 T1 F1 [idle]
T2
F0 = SCHED_FIFO task (pinned to C0)
F1 = SCHED_FIFO task (pinned to C2)
T1 = SCHED_OTHER task (pinned to C1)
T2 = SCHED_OTHER task (pinned to C1 and C2)
F1 could become a cpu hog, which will starve T2 unless C1 pulls
it. Between C0 and C1 however, C0 is required to look for
imbalance between cores, which will fail to pull T2 towards
Core0. T2 will starve eternally in this case. The same scenario
can arise in presence of non-rt tasks as well (say we replace F1
with high irq load).
We tackle this problem by having balance_cpu move pinned tasks
to one of its sibling cpus (where they can run). We first check
if load balance goal can be met by ignoring pinned tasks,
failing which we retry move_tasks() with a new env->dst_cpu.
This patch modifies load balance semantics on who can move load
towards a given cpu in a given sched_domain.
Before this patch, a given_cpu or a ilb_cpu acting on behalf of
an idle given_cpu is responsible for moving load to given_cpu.
With this patch applied, balance_cpu can in addition decide on
moving some load to a given_cpu.
There is a remote possibility that excess load could get moved
as a result of this (balance_cpu and given_cpu/ilb_cpu deciding
*independently* and at *same* time to move some load to a
given_cpu). However we should see less of such conflicting
decisions in practice and moreover subsequent load balance
cycles should correct the excess load moved to given_cpu.
Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FE06CDB.2060605@linux.vnet.ibm.com
[ minor edits ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While load balancing, if all tasks on the source runqueue are pinned,
we retry after excluding the corresponding source cpu. However, loop counters
env.loop and env.loop_break are not reset before retrying, which can lead
to failure in moving the tasks. In this patch we reset env.loop and
env.loop_break to their inital values before we retry.
Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FE06EEF.2090709@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Members of 'struct lb_env' are not in appropriate order to reuse compiler
added padding on 64bit architectures. In this patch we reorder those struct
members and help reduce the size of the structure from 96 bytes to 80
bytes on 64 bit architectures.
Suggested-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FE06DDE.7000403@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Traversing an entire package is not only expensive, it also leads to tasks
bouncing all over a partially idle and possible quite large package. Fix
that up by assigning a 'buddy' CPU to try to motivate. Each buddy may try
to motivate that one other CPU, if it's busy, tough, it may then try its
SMT sibling, but that's all this optimization is allowed to cost.
Sibling cache buddies are cross-wired to prevent bouncing.
4 socket 40 core + SMT Westmere box, single 30 sec tbench runs, higher is better:
clients 1 2 4 8 16 32 64 128
..........................................................................
pre 30 41 118 645 3769 6214 12233 14312
post 299 603 1211 2418 4697 6847 11606 14557
A nice increase in performance.
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1339471112.7352.32.camel@marge.simpson.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
cpuset_track_online_cpus() is no longer present. So remove the
outdated comment and replace it with reference to cpuset_update_active_cpus()
which is its equivalent.
Also, we don't lack memory hot-unplug anymore. And David Rientjes pointed
out how it is dealt with. So update that comment as well.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141700.3692.98192.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Separate out the cpuset related handling for CPU/Memory online/offline.
This also helps us exploit the most obvious and basic level of optimization
that any notification mechanism (CPU/Mem online/offline) has to offer us:
"We *know* why we have been invoked. So stop pretending that we are lost,
and do only the necessary amount of processing!".
And while at it, rename scan_for_empty_cpusets() to
scan_cpusets_upon_hotplug(), which is more appropriate considering how
it is restructured.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141650.3692.48637.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
At present, the functions that deal with cpusets during CPU/Mem hotplug
are quite messy, since a lot of the functionality is mixed up without clear
separation. And this takes a toll on optimization as well. For example,
the function cpuset_update_active_cpus() is called on both CPU offline and CPU
online events; and it invokes scan_for_empty_cpusets(), which makes sense
only for CPU offline events. And hence, the current code ends up unnecessarily
traversing the cpuset tree during CPU online also.
As a first step towards cleaning up those functions, encapsulate the cpuset
tree traversal in a helper function, so as to facilitate upcoming changes.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141635.3692.893.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
In the event of CPU hotplug, the kernel modifies the cpusets' cpus_allowed
masks as and when necessary to ensure that the tasks belonging to the cpusets
have some place (online CPUs) to run on. And regular CPU hotplug is
destructive in the sense that the kernel doesn't remember the original cpuset
configurations set by the user, across hotplug operations.
However, suspend/resume (which uses CPU hotplug) is a special case in which
the kernel has the responsibility to restore the system (during resume), to
exactly the same state it was in before suspend.
In order to achieve that, do the following:
1. Don't modify cpusets during suspend/resume. At all.
In particular, don't move the tasks from one cpuset to another, and
don't modify any cpuset's cpus_allowed mask. So, simply ignore cpusets
during the CPU hotplug operations that are carried out in the
suspend/resume path.
2. However, cpusets and sched domains are related. We just want to avoid
altering cpusets alone. So, to keep the sched domains updated, build
a single sched domain (containing all active cpus) during each of the
CPU hotplug operations carried out in s/r path, effectively ignoring
the cpusets' cpus_allowed masks.
(Since userspace is frozen while doing all this, it will go unnoticed.)
3. During the last CPU online operation during resume, build the sched
domains by looking up the (unaltered) cpusets' cpus_allowed masks.
That will bring back the system to the same original state as it was in
before suspend.
Ultimately, this will not only solve the cpuset problem related to suspend
resume (ie., restores the cpusets to exactly what it was before suspend, by
not touching it at all) but also speeds up suspend/resume because we avoid
running cpuset update code for every CPU being offlined/onlined.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20120524141611.3692.20155.stgit@srivatsabhat.in.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>