Commit Graph

159 Commits

Author SHA1 Message Date
Konstantin Khlebnikov 314e51b985 mm: kill vma flag VM_RESERVED and mm->reserved_vm counter
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:

 | effect                 | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump      | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

This patch removes reserved_vm counter from mm_struct.  Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.

Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.

remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.

[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:19 +09:00
Linus Torvalds 7a9a2970b5 First batch of InfiniBand/RDMA changes for the 3.7 merge window:
- mlx4 IB support for SR-IOV
  - A couple of SRP initiator fixes
  - Batch of nes hardware driver fixes
  - Fix for long-standing use-after-free crash in IPoIB
  - Other miscellaneous fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQav4jAAoJEENa44ZhAt0hmL0QAJTuMdSOzYFd/NB38owJCNM2
 kz/N1GlBm3z98fIlGo8u+lzgV2qxqZSAzsJsouMeK38KiAX3CL8HKe44A1QvTM6v
 dXTNL4JFX24/YF+nlmMY8Av518I9Mkte3BZCnpYkBjVFBWe0ePwoRC/btfBXPDIV
 0snq4OtjoBAn00dOOyuZ5PoyY9xf0z4UB0Gple9sM4mzEb8wVWdNDDPOiuPJc6fA
 L+gk6HLkZDg54+QswafdKYwpeTq45wIKLmCdS3oUNmppMLVhZY8rECOwzSa+KiTr
 /Yo19n+zl+IBlvjQHhmUqGHvdD17PaGlr+TckAsQqmVfXUH5qqpEnkF8FoEK59c5
 YA3lVU8Sj4BPhJ7qX54CuN3767mZizakkxCr9iPRzABFTgzWVgcSgCrE8jjx4i0h
 Pam+L5bmANFStgmGR8PmXiNgcrCUcEqYHsOWDDAnHa5ekb2nyv1JL1c18hlY9hC3
 Xb1YTMZFwvofGza89hBu7oHrMbLOUc5kW2lBpvUn2nlyf3i0F8ISlVbVbNjFA54p
 60/jHa2VOQ2CcJUJKnJOk4ajOOEfHnPtMn2q96XJ69Dp8+eSYEO/G+0i1OlChq4h
 ClnG0Yp+NkT1o8WXMd7guDR+RsXt+DXIij5TiUWRIqnIlopIsMTRhNH28tMu4jQL
 fgN5n987wru91ewdX4gW
 =PAcy
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband updates from Roland Dreier:
 "First batch of InfiniBand/RDMA changes for the 3.7 merge window:
   - mlx4 IB support for SR-IOV
   - A couple of SRP initiator fixes
   - Batch of nes hardware driver fixes
   - Fix for long-standing use-after-free crash in IPoIB
   - Other miscellaneous fixes"

This merge also removes a new use of __cancel_delayed_work(), and
replaces it with the regular cancel_delayed_work() that is now irq-safe
thanks to the workqueue updates.

That said, I suspect the sequence in question should probably use
"mod_delayed_work()".  I just did the minimal "don't use deprecated
functions" fixup, though.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits)
  IB/qib: Fix local access validation for user MRs
  mlx4_core: Disable SENSE_PORT for multifunction devices
  mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs
  mlx4_core: Stash PCI ID driver_data in mlx4_priv structure
  IB/srp: Avoid having aborted requests hang
  IB/srp: Fix use-after-free in srp_reset_req()
  IB/qib: Add a qib driver version
  RDMA/nes: Fix compilation error when nes_debug is enabled
  RDMA/nes: Print hardware resource type
  RDMA/nes: Fix for crash when TX checksum offload is off
  RDMA/nes: Cosmetic changes
  RDMA/nes: Fix for incorrect MSS when TSO is on
  RDMA/nes: Fix incorrect resolving of the loopback MAC address
  mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem
  mlx4_core: Trivial cleanups to driver log messages
  mlx4_core: Trivial readability fix: "0X30" -> "0x30"
  IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes
  mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
  mlx4: Paravirtualize Node Guids for slaves
  mlx4: Activate SR-IOV mode for IB
  ...
2012-10-02 17:20:40 -07:00
Linus Torvalds 437589a74b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace changes from Eric Biederman:
 "This is a mostly modest set of changes to enable basic user namespace
  support.  This allows the code to code to compile with user namespaces
  enabled and removes the assumption there is only the initial user
  namespace.  Everything is converted except for the most complex of the
  filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
  nfs, ocfs2 and xfs as those patches need a bit more review.

  The strategy is to push kuid_t and kgid_t values are far down into
  subsystems and filesystems as reasonable.  Leaving the make_kuid and
  from_kuid operations to happen at the edge of userspace, as the values
  come off the disk, and as the values come in from the network.
  Letting compile type incompatible compile errors (present when user
  namespaces are enabled) guide me to find the issues.

  The most tricky areas have been the places where we had an implicit
  union of uid and gid values and were storing them in an unsigned int.
  Those places were converted into explicit unions.  I made certain to
  handle those places with simple trivial patches.

  Out of that work I discovered we have generic interfaces for storing
  quota by projid.  I had never heard of the project identifiers before.
  Adding full user namespace support for project identifiers accounts
  for most of the code size growth in my git tree.

  Ultimately there will be work to relax privlige checks from
  "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
  root in a user names to do those things that today we only forbid to
  non-root users because it will confuse suid root applications.

  While I was pushing kuid_t and kgid_t changes deep into the audit code
  I made a few other cleanups.  I capitalized on the fact we process
  netlink messages in the context of the message sender.  I removed
  usage of NETLINK_CRED, and started directly using current->tty.

  Some of these patches have also made it into maintainer trees, with no
  problems from identical code from different trees showing up in
  linux-next.

  After reading through all of this code I feel like I might be able to
  win a game of kernel trivial pursuit."

Fix up some fairly trivial conflicts in netfilter uid/git logging code.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
  userns: Convert the ufs filesystem to use kuid/kgid where appropriate
  userns: Convert the udf filesystem to use kuid/kgid where appropriate
  userns: Convert ubifs to use kuid/kgid
  userns: Convert squashfs to use kuid/kgid where appropriate
  userns: Convert reiserfs to use kuid and kgid where appropriate
  userns: Convert jfs to use kuid/kgid where appropriate
  userns: Convert jffs2 to use kuid and kgid where appropriate
  userns: Convert hpfs to use kuid and kgid where appropriate
  userns: Convert btrfs to use kuid/kgid where appropriate
  userns: Convert bfs to use kuid/kgid where appropriate
  userns: Convert affs to use kuid/kgid wherwe appropriate
  userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
  userns: On ia64 deal with current_uid and current_gid being kuid and kgid
  userns: On ppc convert current_uid from a kuid before printing.
  userns: Convert s390 getting uid and gid system calls to use kuid and kgid
  userns: Convert s390 hypfs to use kuid and kgid where appropriate
  userns: Convert binder ipc to use kuids
  userns: Teach security_path_chown to take kuids and kgids
  userns: Add user namespace support to IMA
  userns: Convert EVM to deal with kuids and kgids in it's hmac computation
  ...
2012-10-02 11:11:09 -07:00
Linus Torvalds fdb2f9c2eb PCI changes for the 3.7 merge window:
Host bridge hotplug
     - Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi)
     - Clear host bridge resource info to avoid issue when releasing (Yinghai Lu)
     - Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu)
     - Use standard list ops for acpi_pci_drivers (Jiang Liu)
 
   Device hotplug
     - Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang Liu)
     - Remove fakephp driver (Bjorn Helgaas)
     - Fix VGA ref count in hotplug remove path (Yinghai Lu)
     - Allow acpiphp to handle PCIe ports without native hotplug (Jiang Liu)
     - Implement resume regardless of pciehp_force param (Oliver Neukum)
     - Make pci_fixup_irqs() work after init (Thierry Reding)
 
   Miscellaneous
     - Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang)
     - Factor out PCI Express Capability accessors (Jiang Liu)
     - Add pcibios_window_alignment() so powerpc EEH can use generic resource assignment (Gavin Shan)
     - Make pci_error_handlers const (Stephen Hemminger)
     - Cleanup drivers/pci/remove.c (Bjorn Helgaas)
     - Improve Vendor-Specific Extended Capability support (Bjorn Helgaas)
     - Use standard list ops for bus->devices (Bjorn Helgaas)
     - Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang)
     - Reassign invalid bus number ranges (Intel DP43BF workaround) (Yinghai Lu)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQac4hAAoJEPGMOI97Hn6zjZYP/iaqU9kjmgTsBbSyzB4oApv/
 RRxo3I+ad9GF6XlMQfVAtyx1pgCD1gdGAtoDgGSCTqgdYD3CO10AxKU+yleAk1wo
 dbMxLifJNTrT3G1mZ/NL16yEGhCwvhfwzRtB1VoZmCT4lSApO/7cJkXl2DzHfA/i
 pmltOOiQCN8kbUcJbVPtUyTVPi2zl/8bsyCyTkS7YG0VXeGRM+ZUvPWZJ7MnWYYB
 5qoCdrw5ENCCiDQ9yw5SAfgL23b+0p6OI/x3Lkex0QQOWwSqGSiaHt4b7eitrC5b
 2eAJg32f/AzZke1YbKLMfdsL0VJP3GAswhDVHlgmo63rZkOZChm+97dgZ35Mcv5v
 kEXkWyBb1xJ3t8rZir6Qer9Iv2wOB+MkZ5qtU/Vf+l0wLQLYTrRVsKngrEDREONk
 dXbokp6iVSPeA1sTSdH9MmHlTUIj82ZLSGcxcjTsN8NWZjxx6g3rNx1uay+5MYOW
 4ET9zNu5snrAqN6N4Tb81gvtG8qYfxzdvVfrA9AaGKI6xxB7jkqgFJRp55JiEcFc
 x4cmWkhvdlhVsG2TQwFxYNfswOqD+7NCs6V4kSVZX6ezpDrH7I5VvcnnhstF7C8l
 KZul0EV7OW+kDK23pNe24lVP2xtOv6G8eK/3PmeKIXWl1V83nqre/oLufRzTfs+Z
 SxkILwY/MFpuCFteKE1t
 =haBu
 -----END PGP SIGNATURE-----

Merge tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI changes from Bjorn Helgaas:
 "Host bridge hotplug
    - Protect acpi_pci_drivers and acpi_pci_roots (Taku Izumi)
    - Clear host bridge resource info to avoid issue when releasing
      (Yinghai Lu)
    - Notify acpi_pci_drivers when hot-plugging host bridges (Jiang Liu)
    - Use standard list ops for acpi_pci_drivers (Jiang Liu)

  Device hotplug
    - Use pci_get_domain_bus_and_slot() to close hotplug races (Jiang
      Liu)
    - Remove fakephp driver (Bjorn Helgaas)
    - Fix VGA ref count in hotplug remove path (Yinghai Lu)
    - Allow acpiphp to handle PCIe ports without native hotplug (Jiang
      Liu)
    - Implement resume regardless of pciehp_force param (Oliver Neukum)
    - Make pci_fixup_irqs() work after init (Thierry Reding)

  Miscellaneous
    - Add pci_pcie_type(dev) and remove pci_dev.pcie_type (Yijing Wang)
    - Factor out PCI Express Capability accessors (Jiang Liu)
    - Add pcibios_window_alignment() so powerpc EEH can use generic
      resource assignment (Gavin Shan)
    - Make pci_error_handlers const (Stephen Hemminger)
    - Cleanup drivers/pci/remove.c (Bjorn Helgaas)
    - Improve Vendor-Specific Extended Capability support (Bjorn
      Helgaas)
    - Use standard list ops for bus->devices (Bjorn Helgaas)
    - Avoid kmalloc in pci_get_subsys() and pci_get_class() (Feng Tang)
    - Reassign invalid bus number ranges (Intel DP43BF workaround)
      (Yinghai Lu)"

* tag 'for-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (102 commits)
  PCI: acpiphp: Handle PCIe ports without native hotplug capability
  PCI/ACPI: Use acpi_driver_data() rather than searching acpi_pci_roots
  PCI/ACPI: Protect acpi_pci_roots list with mutex
  PCI/ACPI: Use acpi_pci_root info rather than looking it up again
  PCI/ACPI: Pass acpi_pci_root to acpi_pci_drivers' add/remove interface
  PCI/ACPI: Protect acpi_pci_drivers list with mutex
  PCI/ACPI: Notify acpi_pci_drivers when hot-plugging PCI root bridges
  PCI/ACPI: Use normal list for struct acpi_pci_driver
  PCI/ACPI: Use DEVICE_ACPI_HANDLE rather than searching acpi_pci_roots
  PCI: Fix default vga ref_count
  ia64/PCI: Clear host bridge aperture struct resource
  x86/PCI: Clear host bridge aperture struct resource
  PCI: Stop all children first, before removing all children
  Revert "PCI: Use hotplug-safe pci_get_domain_bus_and_slot()"
  PCI: Provide a default pcibios_update_irq()
  PCI: Discard __init annotations for pci_fixup_irqs() and related functions
  PCI: Use correct type when freeing bus resource list
  PCI: Check P2P bridge for invalid secondary/subordinate range
  PCI: Convert "new_id"/"remove_id" into generic pci_bus driver attributes
  xen-pcifront: Use hotplug-safe pci_get_domain_bus_and_slot()
  ...
2012-10-01 12:05:36 -07:00
Mike Marciniszyn c00aaa1a02 IB/qib: Fix local access validation for user MRs
Commit 8aac4cc3a9 ("IB/qib: RCU locking for MR validation") introduced
a bug that broke user post sends.  The proper validation of the MR
was lost in the patch.

This patch corrects that validation.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-01 09:59:26 -07:00
Dean Luick e20d583818 IB/qib: Add a qib driver version
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:36:29 -07:00
Eric W. Biederman d03ca5820d userns: Convert ipathfs to use GLOBAL_ROOT_UID and GLOBAL_ROOT_GID
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-09-21 03:13:19 -07:00
Mike Marciniszyn 4c3550057b IB/qib: Fix failure of compliance test C14-024#06_LocalPortNum
Commit 3236b2d469 ("IB/qib: MADs with misset M_Keys should return
failure") introduced a return code assignment that unfortunately
introduced an unconditional exit for the routine due to missing braces.

This patch adds the braces to correct the original patch.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-14 10:42:32 -07:00
Bjorn Helgaas 78890b5989 Merge commit 'v3.6-rc5' into next
* commit 'v3.6-rc5': (1098 commits)
  Linux 3.6-rc5
  HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured
  Remove user-triggerable BUG from mpol_to_str
  xen/pciback: Fix proper FLR steps.
  uml: fix compile error in deliver_alarm()
  dj: memory scribble in logi_dj
  Fix order of arguments to compat_put_time[spec|val]
  xen: Use correct masking in xen_swiotlb_alloc_coherent.
  xen: fix logical error in tlb flushing
  xen/p2m: Fix one-off error in checking the P2M tree directory.
  powerpc: Don't use __put_user() in patch_instruction
  powerpc: Make sure IPI handlers see data written by IPI senders
  powerpc: Restore correct DSCR in context switch
  powerpc: Fix DSCR inheritance in copy_thread()
  powerpc: Keep thread.dscr and thread.dscr_inherit in sync
  powerpc: Update DSCR on all CPUs when writing sysfs dscr_default
  powerpc/powernv: Always go into nap mode when CPU is offline
  powerpc: Give hypervisor decrementer interrupts their own handler
  powerpc/vphn: Fix arch_update_cpu_topology() return value
  ARM: gemini: fix the gemini build
  ...

Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
	drivers/rapidio/devices/tsi721.c
2012-09-13 08:41:01 -06:00
Bjorn Helgaas 1959ec5f82 Merge branch 'pci/stephen-const' into next
* pci/stephen-const:
  make drivers with pci error handlers const
  scsi: make pci error handlers const
  netdev: make pci_error_handlers const
  PCI: Make pci_error_handlers const
2012-09-12 13:54:10 -06:00
Stephen Hemminger 1d3520357d make drivers with pci error handlers const
Covers the rest of the uses of pci error handler.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-09-07 16:35:00 -06:00
Jiang Liu 0921caf326 IB/qib: Use PCI Express Capability accessors
Use PCI Express Capability access functions to simplify qib driver.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
2012-08-23 10:11:15 -06:00
Roland Dreier c0369b296e Merge branches 'cma', 'ipoib', 'misc', 'mlx4', 'ocrdma', 'qib' and 'srp' into for-next 2012-08-16 09:38:39 -07:00
Julia Lawall 51fa3ca37e IB/qib: Fix error return code in qib_init_7322_variables()
Convert a 0 error return code to a negative one, as returned elsewhere
in the function.

A simplified version of the semantic match that finds this problem is
as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret;
expression e,e1,e2,e3,e4,x;
@@

(
if (\(ret != 0\|ret < 0\) || ...) { ... return ...; }
|
ret = 0
)
... when != ret = e1
*x = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\|ioremap\|ioremap_nocache\|devm_ioremap\|devm_ioremap_nocache\)(...);
... when != x = e2
    when != ret = e3
*if (x == NULL || ...)
{
  ... when != ret = e4
*  return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 11:58:21 -07:00
Masanari Iida 142ad5db2b IB: Fix typos in infiniband drivers
Correct spelling typos in comments in drivers/infiniband.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 11:56:19 -07:00
Mike Marciniszyn 5d7fe4efbf IB/qib: Fix size of cc_supported_table_entries
Commit 36a8f01cd2 ("IB/qib: Add congestion control agent
implementation") tries to store the value 1984 in a u8, which leads to
truncation.  Fix this by making the member big enough.

This bug was detected by a smatch warning.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-29 20:26:10 -07:00
Mike Marciniszyn 7fac33014f IB/qib: checkpatch fixes
Elminate some simple_strto* usage.

checkpatch also noted pr_ conversations, which have been done as
recommended.  The pr_fmt() define is used to shorten line length.

Other multi-line string warnings are also elmininated.

Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-19 11:20:04 -07:00
Mike Marciniszyn 36a8f01cd2 IB/qib: Add congestion control agent implementation
Add a congestion control agent in the driver that handles gets and
sets from the congestion control manager in the fabric for the
Performance Scale Messaging (PSM) library.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-19 11:20:04 -07:00
Mike Marciniszyn 551ace124d IB/qib: Reduce sdma_lock contention
Profiling has shown that sdma_lock is proving a bottleneck for
performance. The situations include:
 - RDMA reads when krcvqs > 1
 - post sends from multiple threads

For RDMA read the current global qib_wq mechanism runs on all CPUs
and contends for the sdma_lock when multiple RMDA read requests are
fielded on differenct CPUs. For post sends, the direct call to
qib_do_send() from multiple threads causes the contention.

Since the sdma mechanism is per port, this fix converts the existing
workqueue to a per port single thread workqueue to reduce the lock
contention in the RDMA read case, and for any other case where the QP
is scheduled via the workqueue mechanism from more than 1 CPU.

For the post send case, This patch modifies the post send code to test
for a non empty sdma engine.  If the sdma is not idle the (now single
thread) workqueue will be used to trigger the send engine instead of
the direct call to qib_do_send().

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-19 11:19:58 -07:00
Betty Dall f3331f88a4 IB/qib: Fix an incorrect log message
There is a cut-and-paste typo in the function qib_pci_slot_reset()
where it prints that the "link_reset" function is called rather than
the "slot_reset" function.  This makes the message misleading.

Signed-off-by: Betty Dall <betty.dall@hp.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-19 11:19:08 -07:00
Mike Marciniszyn 1fb9fed6d4 IB/qib: Fix QP RCU sparse warnings
Commit af061a644a ("IB/qib: Use RCU for qpn lookup") introduced sparse
warnings.

This patch corrects those issues.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-17 10:18:37 -07:00
Mike Marciniszyn 7e23017704 IB/qib: Fix sparse RCU warnings in qib_keys.c
Commit 8aac4cc3a9 ("IB/qib: RCU locking for MR validation") introduced
new sparse warnings in qib_keys.c.

Acked-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-10 10:01:56 -07:00
Mike Marciniszyn 8aac4cc3a9 IB/qib: RCU locking for MR validation
Profiling indicates that MR validation locking is expensive.  The MR
table is largely read-only and is a suitable candidate for RCU locking.

The patch uses RCU locking during validation to eliminate one
lock/unlock during that validation.

Reviewed-by: Mike Heinz <michael.william.heinz@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:05:19 -07:00
Mike Marciniszyn 6a82649f21 IB/qib: Avoid returning EBUSY from MR deregister
A timing issue can occur where qib_mr_dereg can return -EBUSY if the
MR use count is not zero.

This can occur if the MR is de-registered while RDMA read response
packets are being progressed from the SDMA ring.  The suspicion is
that the peer sent an RDMA read request, which has already been copied
across to the peer.  The peer sees the completion of his request and
then communicates to the responder that the MR is not needed any
longer.  The responder tries to de-register the MR, catching some
responses remaining in the SDMA ring holding the MR use count.

The code now uses a get/put paradigm to track MR use counts and
coordinates with the MR de-registration process using a completion
when the count has reached zero.  A timeout on the delay is in place
to catch other EBUSY issues.

The reference count protocol is as follows:
- The return to the user counts as 1
- A reference from the lk_table or the qib_ibdev counts as 1.
- Transient I/O operations increase/decrease as necessary

A lot of code duplication has been folded into the new routines
init_qib_mregion() and deinit_qib_mregion().  Additionally, explicit
initialization of fields to zero is now handled by kzalloc().

Also, duplicated code 'while.*num_sge' that decrements reference
counts have been consolidated in qib_put_ss().

Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:05:19 -07:00
Mike Marciniszyn 354dff1bd8 IB/qib: Fix UC MR refs for immediate operations
An MR reference leak exists when handling UC RDMA writes with
immediate data because we manipulate the reference counts as if the
operation had been a send.

This patch moves the last_imm label so that the RDMA write operations
with immediate data converge at the cq building code.  The copy/mr
deref code is now done correctly prior to the branch to last_imm.

Reviewed-by: Edward Mascarenhas <edward.mascarenhas@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:05:19 -07:00
Mike Marciniszyn 1c94283ddb IB/qib: Add cache line awareness to qib_qp and qib_devdata structures
This patch reorganizes the QP and devdata files to be more cache line aware.

qib_qp fields in particular are split into read-mostly, send, and receive fields.

qib_devdata fields are split into read-mostly and read/write fields

Testing has show that bidirectional tests improve by as much as 100%
with this patch.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14 12:43:34 -07:00
Jim Foraker 3236b2d469 IB/qib: MADs with misset M_Keys should return failure
If a MAD is sent directly to the local HCA rather than placed on a QP
and the MAD fails M_Key checks, there is no means to generate a
timeout for the client, which may hang.  Instead we report
IB_MAD_RESULT_FAILURE, which operates the same for on-the-wire
packets, but will generate a send failure back to the client.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14 12:42:44 -07:00
Jim Foraker 6199c8961e IB/qib: Fix M_Key lease timeout handling
If a port has an M_Key lease set, the M_Key protect bits set to 1, and
a SubnSet arrives with an invalid M_Key, an M_Key mismatch trap is
generated, the lease timer begins as expected, and eventually the
M_Key protect bits will be set back to 0 as per the spec.  However, if
any other SMP with an invalid M_Key arrives, the lease timer is expired
and the M_Key protect bits remain in force.

This is not according to to spec.  In particular, C14-17 says that
a lease timer that is underway is not affected by protection level
checks (ie, at protection level 1, a SubnGet with a bad M_Key may be
successful, but does not stop the timer), and C14-19 says that the timer
shall stop when a valid M_Key has been received.  C14-19 is the only
compliance statement that specifies a stopping condition for the timer.

This behavior is magnified if the port's Master SM LID attribute
points at itself.  In that case, the M_Key mismatch trap is sufficient to
expire the timer, and the mkey lease attribute is rendered useless.

Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14 12:42:42 -07:00
Mitko Haralanov f665acb3cb IB/qib: Fix QLE734X link cycling
The SERDES was using the incorrect Frequency Loop Bandwidth setting
causing the link to cycle through the Physical link negotiation state
machine.  Fixing the Frequency Loop Bandwidth setting in the SERDES
helps the link come up faster and more reliably.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14 12:39:26 -07:00
Mitko Haralanov 6ceaadee34 IB/qib: Display correct value for number of contexts
A "fix" for a bug with the number of contexts on a single-port board
caused the calculation to be off by one, which causes problems with
the upper layers.  The same problem exists for number of free
contexts, which is also fixed here.

Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14 12:39:04 -07:00
Todd Rimmer 4ccf28a26c IB/qib: Correct ordering of reregister vs. port active events
When a port first goes active with SMA Set(PortInfo) and reregister
bit set, the driver sends up the reregister event followed by a port
active event.

The problem is that in response to reregister event most apps try to
issue a SA query of some sort, but that fails because port is not
active.

The qib driver needs to a trivial change to correct this behavior.

This issue has been there for a while; however the recent serdes work
has probably made the delay between the reregister event and the
active event larger and hence opened the race far enough so that its
being seen more often.

The patch also changes the clientrereg local to a u8 and saves off the
rereg bit into it.  The code following the nested subn_get_portinfo()
now restores that bit per o14-12.2.1 with a logical OR from that copy.

Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14 12:38:28 -07:00
Mike Marciniszyn bb77a07723 IB/qib: Optimize pio ack buffer allocation
This patch optimizes pio buffer allocation in the kernel.

For qib, kernel pio buffers are used for sending acks.  The code to
allocate the buffer would always start at 0 until it found a buffer.

This means that an average of 64 comparisions were done on each
allocate, since the busy bit won't be cleared until the bits are
refreshed when buffers are exhausted.

This patch adds two new fields in the devdata struct, last_pio and
min_kernel_pio.  last_pio is the last buffer that was allocated.
min_kernel_pio is the lowest potential available buffer.

min_kernel_pio is modifed as contexts are allocated and deallocted.

Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14 12:37:03 -07:00
Mike Marciniszyn cca195a168 IB/qib: Add prefetch for eager buffers
Add a prefetch call when a packet has been stored.  The nature of the
prefetch is correctly determined by the alternatives mechanism.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14 12:36:18 -07:00
Roland Dreier f0e88aeb19 Merge branches 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iser', 'mad', 'nes', 'qib', 'srp' and 'srpt' into for-next 2012-03-19 09:50:33 -07:00
Or Gerlitz d927d505c5 IB: Change CQE "csum_ok" field to a bit flag
Use a bit in wc_flags rather then a whole integer to hold the
"checksum OK" flag.  By itself, this change doesn't reduce the size of
struct ib_wc on 64bit machines -- it stays on 56 bytes because of
padding.  However, it will allow to add more fields in the future
without enlarging the struct.  Also, it will let us have a unified
approach with future libibverbs checksum offload reporting, because a
bit flag doesn't break the library ABI.

This patch was suggested during conversation with Liran Liss
<liranl@mellanox.com>.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-08 12:34:27 -08:00
Mike Marciniszyn 520b3ee705 IB/qib: Avoid filtering LID on SMA portinfo
The current get portinfo handling filters the LID being sent,
changing zero to 0xffff.

This causes OpenSM to log excessive warning messages.

Reviewed-by: Edward Mascarenhas <edward.mascarenhas@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-25 17:45:50 -08:00
Mike Marciniszyn a778f3fddc IB/qib: Add logic for affinity hint
Call irq_set_affinity_hint() to give userspace programs such as
irqbalance the information to be able to distribute qib interrupts
appropriately.

The logic allocates all non-receive interrupts to the first CPU local
to the HCA.  Receive interrupts are allocated round robin starting
with the second CPU local to the HCA with potential wrap back to the
second CPU.

This patch also adds a refinement to the name registered for MSI-X
interrupts so that user level scripts can determine the device
associated with the IRQs when there are multiple HCAs with a
potentially different set of local CPUs.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-25 17:45:49 -08:00
Mike Marciniszyn b6bfefb041 IB/qib: Roll back PCIe tuning change
Commit 8d4548f2b ("IB/qib: Default some module parameters optimally")
introduced an issue with older root complexes.  They cannot handle the
pcie_caps of 0x51 (MaxReadReq 4096, MaxPayload=256).

A typical diagnostic in this situation reported by syslog contains
the text:

  [PCIe Poisoned TLP][Send DMA memory read]

Restore the module paramter default to zero with will avoid any
changes in the root complex.

Reviewed-by: Mark Debbage <mark.debbage@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-27 10:03:38 -08:00
Julia Lawall 0f3696eb21 IB/qib: Use GFP_ATOMIC when locks are held
alloc_dummy_hdrq() is called with locks held and thus should not use
GFP_KERNEL.

The semantic patch that makes this report is available in
scripts/coccinelle/locks/call_kern.cocci.

Signed-off-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-27 09:59:22 -08:00
Linus Torvalds 48fa57ac2c infiniband changes for 3.3 merge window
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPBQQDAAoJEENa44ZhAt0hy1EP/A/dz741mEd2QZxYHgloK8XU
 sSoHvq+vDxHnOvBrDuaHXT47FoY+OSVE+ESeJJQJ9L+B6g3yacP3hNSIcguXFs8Z
 v011AZAeRvQx3bLu5R9+eDL+YTyotkR0sl/huoLkSwlqrEqGA85eLqf5RSQdxYZf
 iC1ZXfg0KTrtb6rBvohNcijpmIVEe83SWfnD/ZuCGuWq++DyVJxzECnR7p5D8a9q
 eMJEnIKVEIpqkqXrPQr/blVSfGQL54QuUdYtoKAS8ZW6BzjIwCGUmKdoT1vaNqX5
 sIntxXMcgIgE2r0y/nDK+QIFS4U784eUevIC/LeunbhWUEQX05f3l6+V566/T9hX
 lvp5M6aonsSSvtqrVi6SF5rvSHFlwPpvAY3+jhjXKLpZ5OxMqf/ZlTN1xN4bin1A
 whGnznU+51Tjzph6Or8iXo5yExDUQhowX1Z3CYDmh/UqzKHRqaFAuiC071r8GZW3
 BEOV9yf/+qPsgtXAiO4jSKlLrOJbMgEI4BoITXTO9HvZH9dHGXDYLvULdHDmFaBi
 XLg5zcAjou24855miv/gnBQzDc0NWW184BGS9hPE9zmbQlJr7gA4zI0Eggtj+3MO
 7z/SLTrxKSfjZJR8Z3cGsnBjCs1VFqV+YQnTkyZYLORLf4F3RbDLe6aJQ+9WBA1g
 86J11MjrG30erg3gbXun
 =8ZmW
 -----END PGP SIGNATURE-----

Merge tag 'infiniband-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

infiniband changes for 3.3 merge window

* tag 'infiniband-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  rdma/core: Fix sparse warnings
  RDMA/cma: Fix endianness bugs
  RDMA/nes: Fix terminate during AE
  RDMA/nes: Make unnecessarily global nes_set_pau() static
  RDMA/nes: Change MDIO bus clock to 2.5MHz
  IB/cm: Fix layout of APR message
  IB/mlx4: Fix SL to 802.1Q priority-bits mapping for IBoE
  IB/qib: Default some module parameters optimally
  IB/qib: Optimize locking for get_txreq()
  IB/qib: Fix a possible data corruption when receiving packets
  IB/qib: Eliminate 64-bit jiffies use
  IB/qib: Fix style issues
  IB/uverbs: Protect QP multicast list
2012-01-08 14:05:48 -08:00
Mike Marciniszyn 8d4548f2b7 IB/qib: Default some module parameters optimally
Minimize the need for users to have to set module parameters to get
good performance.

The following two parameters are changed:
 - rcvhdrcnt to twice the rcvegrcnt
 - pcie_caps=0x51

The rcvhdrcnt at twice the egrcount allows the preemptive NAK code
during reception to function in 100% of the cases rather than a sender
jiffies-based timeout.

The pcie_caps default of 0x51 will set the proposed MaxPayload and
MaxReceiveReqest to 256 and 4096 respectively.  The capabilities on
the root complex will be used to limit those values.

Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-03 20:54:01 -08:00
Mike Marciniszyn 4894710951 IB/qib: Optimize locking for get_txreq()
The current code locks the QP s_lock, followed by the pending_lock, I
guess to to protect against the allocate failing.

This patch only locks the pending_lock, assuming that the empty case
is an exeception, in which case the pending_lock is dropped, and the
original code is executed.  This will save a lock of s_lock in the
normal case.

The observation is that the sdma descriptors will deplete at twice the
rate of txreq's, so this should be rare.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-03 20:53:31 -08:00
Ram Vepa eddfb67525 IB/qib: Fix a possible data corruption when receiving packets
Prevent a receive data corruption by ensuring that the write to update
the rcvhdrheadn register to generate an interrupt is at the very end
of the receive processing.

Signed-off-by: Ramkrishna Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-03 20:53:02 -08:00
Mike Marciniszyn 8482d5d1bc IB/qib: Eliminate 64-bit jiffies use
The qib driver makes use of the the 64-bit jiffies API.

Code inspection reveals that that version of the API is not really
required.  This patch converts to use the "normal" jiffies.

Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-03 20:52:12 -08:00
Mike Marciniszyn 865b64be86 IB/qib: Fix style issues
More style issues revealed with checkpatch.pl -f.

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-01-03 20:51:42 -08:00
Al Viro f9ec80061a infiniband: umode_t noise, including open-coded S_ISDIR()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:55:03 -05:00
Mike Marciniszyn 29d1b16145 IB/qib: Correct sense on freectxts increment and decrement
Commit 53ab1c6498 ("IB/qib: Correct nfreectxts for multiple HCAs")
reversed the increments and decrements of dd->nfreectxts.  Fix it.

Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-12-19 09:19:34 -08:00
Mike Marciniszyn 8ee887d74b IB/qib: Fix over-scheduling of QSFP work
Don't over-schedule QSFP work on driver initialization.  It could end
up being run simultaneously on two different CPUs resulting in bad
EEPROM reads.  In combination with setting the physical IB link state
prior to the IBC being brought out of reset, this can cause the link
state machine to start training early with wrong settings.

Signed-off-by: Mitko Haralanov <mitko@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-28 12:17:33 -08:00
Mike Marciniszyn 042f36e156 IB/qib: Don't use schedule_work()
It was mistakenly introduced by dde05cbdf8 ("IB/qib: Hold links
until tuning data is available").

Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-08 10:37:53 -08:00
Linus Torvalds 32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00