This problem was reported by Moni Shoua <monis@mellanox.com> and Amir
Vadai <amirv@mellanox.com>:
When destroying a cm_id from a context of a work queue and if
the lap_state of this cm_id is IB_CM_LAP_SENT, we need to
release the reference of this id that was taken upon the send
of the LAP message. Otherwise, if the expected APR message
gets lost, it is only after a long time that the reference
will be released, while during that the work handler thread is
not available to process other things.
It turns out that we need to cancel any pending LAP messages whenever
we transition out of the IB_CM_ESTABLISH state. This occurs when
disconnecting - either sending or receiving a DREQ. It can also
happen in a corner case where we receive a REJ message after sending
an RTU, followed by a LAP. Add checks and cancel any outstanding LAP
messages in these three cases.
Canceling the LAP when sending a DREQ fixes the destroy problem
reported by Moni. When a cm_id is destroyed in the IB_CM_ESTABLISHED
state, it sends a DREQ to the remote side to notify the peer that the
connection is going away.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
When processing a SIDR REQ, the ib_cm allocates a new cm_id. The
refcount of the cm_id is initialized to 1. However, cm_process_work
will decrement the refcount after invoking all callbacks. The result
is that the cm_id will end up with refcount set to 0 by the end of the
sidr req handler.
If a user tries to destroy the cm_id, the destruction will proceed,
under the incorrect assumption that no other threads are referencing
the cm_id. This can lead to a crash when the cm callback thread tries
to access the cm_id.
This problem was noticed as part of a larger investigation with kernel
crashes in the rdma_cm when running on a real time OS.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Doug Ledford and Red Hat reported a crash when running the rdma_cm on
a real-time OS. The crash has the following call trace:
cm_process_work
cma_req_handler
cma_disable_callback
rdma_create_id
kzalloc
init_completion
cma_get_net_info
cma_save_net_info
cma_any_addr
cma_zero_addr
rdma_translate_ip
rdma_copy_addr
cma_acquire_dev
rdma_addr_get_sgid
ib_find_cached_gid
cma_attach_to_dev
ucma_event_handler
kzalloc
ib_copy_ah_attr_to_user
cma_comp
[ preempted ]
cma_write
copy_from_user
ucma_destroy_id
copy_from_user
_ucma_find_context
ucma_put_ctx
ucma_free_ctx
rdma_destroy_id
cma_exch
cma_cancel_operation
rdma_node_get_transport
rt_mutex_slowunlock
bad_area_nosemaphore
oops_enter
They were able to reproduce the crash multiple times with the
following details:
Crash seems to always happen on the:
mutex_unlock(&conn_id->handler_mutex);
as conn_id looks to have been freed during this code path.
An examination of the code shows that a race exists in the request
handlers. When a new connection request is received, the rdma_cm
allocates a new connection identifier. This identifier has a single
reference count on it. If a user calls rdma_destroy_id() from another
thread after receiving a callback, rdma_destroy_id will proceed to
destroy the id and free the associated memory. However, the request
handlers may still be in the process of running. When control returns
to the request handlers, they can attempt to access the newly created
identifiers.
Fix this by holding a reference on the newly created rdma_cm_id until
the request handler is through accessing it.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
* 'fixes-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: make sure MAYDAY_INITIAL_TIMEOUT is at least 2 jiffies long
workqueue, freezer: unify spelling of 'freeze' + 'able' to 'freezable'
workqueue: wake up a worker when a rescuer is leaving a gcwq
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
drm/radeon/kms: add missing frac fb div flag for dce4+
drm/radeon/kms: do not reject X16 and Y16X16 floating-point formats on r300
drm/nouveau: fix suspend/resume on GPUs that don't have PM support
drm/nouveau: flips/flipd need to always set 'evict' for move_accel_cleanup()
drm/nv40: fix tiling-related setup for a number of chipsets
drm/nouveau: fix non-EDIDful native mode selection
drm/nouveau: Fix detection of DDC-based LVDS on DCB15 boards.
drm/nv04-nv40: Fix NULL dereference when we fail to find an LVDS native mode.
drm/nv10: Fix crash when allocating a BO larger than half the available VRAM.
There is a double completion associated with error handling for RC QPs.
The sequence is:
- The do_rc_ack() routine fields an RNR nack and there are 0
rnr_retries configured on the QP.
- qib_error_qp() stops the pending timer
- qib_rc_send_complete() is called from sdma_complete()
- qib_rc_send_complete() starts the timer because the msb of the psn
just completed says an ack is needed.
- a bunch of flushes occur as ipoib posts WQEs to an error'ed QP
- rc_timeout() calls qib_restart_rc()
- qib_restart_rc() calls qib_send_complete() with a
IB_WC_RETRY_EXC_ERR on a wqe that has already been completed in the
past
The fix avoids starting the timer since another packet will never
arrive.
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Otherwise we fail to properly suspend/resume all of the emulated devices.
Something between 2.6.38-rc2 and rc3 appears to have exposed this
issue, but it's always been wrong not to do this.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy@goop.org>
* 'nouveau/drm-nouveau-next' of /ssd/git/drm-nouveau-next:
drm/nouveau: fix suspend/resume on GPUs that don't have PM support
drm/nouveau: flips/flipd need to always set 'evict' for move_accel_cleanup()
drm/nv40: fix tiling-related setup for a number of chipsets
drm/nouveau: fix non-EDIDful native mode selection
drm/nouveau: Fix detection of DDC-based LVDS on DCB15 boards.
drm/nv04-nv40: Fix NULL dereference when we fail to find an LVDS native mode.
drm/nv10: Fix crash when allocating a BO larger than half the available VRAM.
The fixed ref/post dividers are set by the AdjustPll table
rather than the ss info table on dce4+. Make sure we enable
the fractional feedback dividers when using a fixed post
or ref divider on them as well.
Fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=29272
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
We free the temporary binding before leaving this function, so we also have
to wait for the move to actually complete.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Due to the default case handling the older chipsets, a bunch of the newer
ones ended up having the wrong tiling regs used. This commit switches the
default case to handle the newest chipsets.
This also makes nv4e touch the "extra" tiling regs. "nv" doesn't touch
them for C51 but traces of the NVIDIA binary driver show it being done
there.
I couldn't find NV41/NV45 traces to confirm the behaviour there, but an
educated guess was taken at each of them.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
The DRM core fills this value, but at too late a stage for this to work,
possibly resulting in an undesirable mode being selected.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reported-by: Alex Buell <alex.buell@munted.org.uk>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reported-by: Alex Buell <alex.buell@munted.org.uk>
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
There are two spellings in use for 'freeze' + 'able' - 'freezable' and
'freezeable'. The former is the more prominent one. The latter is
mostly used by workqueue and in a few other odd places. Unify the
spelling to 'freezable'.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Steven Whitehouse <swhiteho@redhat.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: set flow handler for secondary interrupt controller of 5249
m68knommu: remove use of IRQ_FLG_LOCK from 68360 platform support
m68knommu: fix dereference of port.tty
m68knommu: add missing linker __modver section
m68knommu: fix mis-named variable int set_irq_chip loop
m68knommu: add optimize memmove() function
m68k: remove arch specific non-optimized memcmp()
m68knommu: fix use of un-defined _TIF_WORK_MASK
m68knommu: Rename m548x_wdt.c to m54xx_wdt.c
m68knommu: fix m548x_wdt.c compilation after headers renaming
m68knommu: Remove dependencies on nonexistent M68KNOMMU
The struct_tty associated with a port is now a direct pointer
from within the local private driver info struct. So fix all uses
of it.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
* 'drm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6: (27 commits)
drm/radeon/kms: hopefully fix pll issues for real (v3)
drm/radeon/kms: add bounds checking to avivo pll algo
drm: fix wrong usages of drm_device in DRM Developer's Guide
drm/radeon/kms: fix a few more atombios endian issues
drm/radeon/kms: improve 6xx/7xx CS error output
drm/radeon/kms: check AA resolve registers on r300
drm/radeon/kms: fix tracking of BLENDCNTL, COLOR_CHANNEL_MASK, and GB_Z on r300
drm/radeon/kms: use linear aligned for evergreen/ni bo blits
drm/radeon/kms: use linear aligned for 6xx/7xx bo blits
drm/radeon: fix race between GPU reset and TTM delayed delete thread.
drm/radeon/kms: evergreen/ni big endian fixes (v2)
drm/radeon/kms: 6xx/7xx big endian fixes
drm/radeon/kms: atombios big endian fixes
drm/radeon: 6xx/7xx non-kms endian fixes
drm/radeon/kms: optimize CS state checking for r100->r500
drm: do not leak kernel addresses via /proc/dri/*/vma
drm/radeon/kms: add connector table for mac g5 9600
radeon mkregtable: Add missing fclose() calls
drm/radeon/kms: fix interlaced modes on dce4+
drm/radeon: fix memory debugging since d961db75ce
...
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
pci: use security_capable() when checking capablities during config space read
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
ACPI / Video: Probe for output switch method when searching video devices.
ACPI / Wakeup: Enable button GPEs unconditionally during initialization
ACPI / ACPICA: Avoid crashing if _PRW is defined for the root object
ACPI: Fix acpi_os_read_memory() and acpi_os_write_memory() (v2)
Right now the platform device and its platform data is included in one big
struct which requires its custom ->release function. The problem with the
release function within the driver is that it might be called after the
driver was removed because someone was holding a reference to it and it
was not called right after platform_device_unregister(). So we also free
the platform device memory to which one might hold a reference.
This patch uses the normal pdev functions so this kind of race does not
occur.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (21 commits)
dmaengine: add slave-dma maintainer
dma: ipu_idmac: do not lose valid received data in the irq handler
dmaengine: imx-sdma: fix up param for the last BD in sdma_prep_slave_sg()
dmaengine: imx-sdma: correct sdmac->status in sdma_handle_channel_loop()
dmaengine: imx-sdma: return sdmac->status in sdma_tx_status()
dmaengine: imx-sdma: set sdmac->status to DMA_ERROR in err_out of sdma_prep_slave_sg()
dmaengine: imx-sdma: remove IMX_DMA_SG_LOOP handling in sdma_prep_slave_sg()
dmaengine i.MX dma: initialize dma capabilities outside channel loop
dmaengine i.MX DMA: do not initialize chan_id field
dmaengine i.MX dma: check sg entries for valid addresses and lengths
dmaengine i.MX dma: set maximum segment size for our device
dmaengine i.MX SDMA: reserve channel 0 by not registering it
dmaengine i.MX SDMA: initialize dma capabilities outside channel loop
dmaengine i.MX SDMA: do not initialize chan_id field
dmaengine i.MX sdma: check sg entries for valid addresses and lengths
dmaengine i.MX sdma: set maximum segment size for our device
DMA: PL08x: fix channel pausing to timeout rather than lockup
DMA: PL08x: fix infinite wait when terminating transfers
dmaengine: imx-sdma: fix inconsistent naming in sdma_assign_cookie()
dmaengine: imx-sdma: propagate error in sdma_probe() instead of returning 0
...
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, dmi, debug: Log board name (when present) in dmesg/oops output
x86, ioapic: Don't warn about non-existing IOAPICs if we have none
x86: Fix mwait_usable section mismatch
x86: Readd missing irq_to_desc() in fixup_irq()
x86: Fix section mismatch in LAPIC initialization
If the target device gets lost, this fix is needed, as it causes
negative unintended responses on basic I/O tests. If the target device
gets lost, the upstream qla2xxx driver returns
SCSI_MLQUEUE_TARGET_BUSY which causes an immediate retry without drop
in the number of allowed retries. This semantic change, as a result of
removing FC_DEVICE_LOST check is reasonable, as it only extends a
short transitional period, until the transport is called to notify
that the rport as lost (fc_remote_port_delete()). Once transport
notification is done, fc_remote_port_chkready() check will take over.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Madhuranath Iyengar <Madhu.Iyengar@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This reintroduces commit 47970b1b which was subsequently reverted
as f00eaeea. The original change was broken and caused X startup
failures and generally made privileged processes incapable of reading
device dependent config space. The normal capable() interface returns
true on success, but the LSM interface returns 0 on success. This thinko
is now fixed in this patch, and has been confirmed to work properly.
So, once again...Eric Paris noted that commit de139a3 ("pci: check caps
from sysfs file open to read device dependent config space") caused the
capability check to bypass security modules and potentially auditing.
Rectify this by calling security_capable() when checking the open file's
capabilities for config space reads.
Reported-by: Eric Paris <eparis@redhat.com>
Tested-by: Dave Young <hidave.darkstar@gmail.com>
Acked-by: James Morris <jmorris@namei.org>
Cc: Dave Airlie <airlied@gmail.com>
Cc: Alex Riesen <raa.lkml@gmail.com>
Cc: Sedat Dilek <sedat.dilek@googlemail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: James Morris <jmorris@namei.org>
The "Type 2" SMBIOS record that contains Board Name is not
strictly required and may be absent in the SMBIOS on some
platforms.
( Please note that Type 2 is not listed in Table 3 in Sec 6.2
("Required Structures and Data") of the SMBIOS v2.7
Specification. )
Use the Manufacturer Name (aka System Vendor) name.
Print Board Name only when it is present.
Before the fix:
(i) dmesg output: DMI: /ProLiant DL380 G6, BIOS P62 01/29/2011
(ii) oops output: Pid: 2170, comm: bash Not tainted 2.6.38-rc4+ #3 /ProLiant DL380 G6
After the fix:
(i) dmesg output: DMI: HP ProLiant DL380 G6, BIOS P62 01/29/2011
(ii) oops output: Pid: 2278, comm: bash Not tainted 2.6.38-rc4+ #4 HP ProLiant DL380 G6
Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Reviewed-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: <stable@kernel.org> # .3x - good for debugging, please apply as far back as it applies cleanly
LKML-Reference: <20110214224423.2182.13929.sendpatchset@nchumbalkar.americas.hpqcorp.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The problematic boards have a recommended reference divider
to be used when spread spectrum is enabled on the laptop panel.
Enable the use of the recommended reference divider along with
the new pll algo.
v2: testing options
v3: When using the fixed reference divider with LVDS, prefer
min m to max p and use fractional feedback dividers.
Fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=28852https://bugzilla.kernel.org/show_bug.cgi?id=24462https://bugzilla.kernel.org/show_bug.cgi?id=26552
MacbookPro issues reported by Justin Mattock <justinmattock@gmail.com>
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
* 'rtc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
RTC: Fix minor compile warning
RTC: Convert rtc drivers to use the alarm_irq_enable method
RTC: Fix rtc driver ioctl specific shortcutting
Currently when two or more buffers are queued by the camera driver
and so the double buffering is enabled in the idmac, we lose one
frame comming from CSI since the reporting of arrival of the first
frame is deferred by the DMAIC_7_EOF interrupt handler and reporting
of the arrival of the last frame is not done at all. So when requesting
N frames from the image sensor we actually receive N - 1 frames in
user space.
The reason for this behaviour is that the DMAIC_7_EOF interrupt
handler misleadingly assumes that the CUR_BUF flag is pointing to the
buffer used by the IDMAC. Actually it is not the case since the
CUR_BUF flag will be flipped by the FSU when the FSU is sending the
<TASK>_NEW_FRM_RDY signal when new frame data is delivered by the CSI.
When sending this singal, FSU updates the DMA_CUR_BUF and the
DMA_BUFx_RDY flags: the DMA_CUR_BUF is flipped, the DMA_BUFx_RDY
is cleared, indicating that the frame data is beeing written by
the IDMAC to the pointed buffer. DMA_BUFx_RDY is supposed to be
set to the ready state again by the MCU, when it has handled the
received data. DMAIC_7_CUR_BUF flag won't be flipped here by the
IPU, so waiting for this event in the EOF interrupt handler is wrong.
Actually there is no spurious interrupt as described in the comments,
this is the valid DMAIC_7_EOF interrupt indicating reception of the
frame from CSI.
The patch removes code that waits for flipping of the DMAIC_7_CUR_BUF
flag in the DMAIC_7_EOF interrupt handler. As the comment in the
current code denotes, this waiting doesn't help anyway. As a result
of this removal the reporting of the first arrived frame is not
deferred to the time of arrival of the next frame and the drivers
software flag 'ichan->active_buffer' is in sync with DMAIC_7_CUR_BUF
flag, so the reception of all requested frames works.
This has been verified on the hardware which is triggering the
image sensor by the programmable state machine, allowing to
obtain exact number of frames. On this hardware we do not tolerate
losing frames.
This patch also removes resetting the DMA_BUFx_RDY flags of
all channels in ipu_disable_channel() since transfers on other
DMA channels might be triggered by other running tasks and the
buffers should always be ready for data sending or reception.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Reviewed-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Tested-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* 'intel/drm-intel-fixes' of /ssd/git/drm-next:
drm/i915: Fix resume regression from 5d1d0cc
drm/i915/tv: Use polling rather than interrupt-based hotplug
drm/i915: Trigger modesetting if force-audio changes
drm/i915/sdvo: If we have an EDID confirm it matches the mode of the connection
drm/i915: Disable RC6 on Ironlake
drm/i915/lvds: Restore dithering on native modes for gen2/3
drm/i915: Invalidate TLB caches on SNB BLT/BSD rings
Makes debugging CS rejections much easier.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
This is an important security fix because we allowed arbitrary values
to be passed to AARESOLVE_OFFSET. This also puts the right buffer address
in the register.
Signed-off-by: Marek Olšák <maraeo@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Not only is linear aligned supposedly more performant,
linear general is only supported by the CB in single
slice mode. The texture hardware doesn't support
linear general, but I think the hw automatically
upgrades it to linear aligned.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Not only is linear aligned supposedly more performant,
linear general is only supported by the CB in single
slice mode. The texture hardware doesn't support
linear general, but I think the hw automatically
upgrades it to linear aligned.
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
My evergreen has been in a remote PC for week and reset has never once
saved me from certain doom, I finally relocated to the box with a
serial cable and noticed an oops when the GPU resets, and the TTM
delayed delete thread tries to remove something from the GTT.
This stops the delayed delete thread from executing across the GPU
reset handler, and woot I can GPU reset now.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Based on 6xx/7xx endian fixes from Cédric Cano.
v2: fix typo in shader
Signed-off-by: Alex Deucher <alexdeucher@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>