Commit Graph

29136 Commits

Author SHA1 Message Date
Greg Kroah-Hartman cda5e83fde Revert "driver core: move knode_driver into private structure"
This reverts commit 93e746db18.

Turns out that device_initialize shouldn't fail silently.
This series needs to be reworked in order to get into proper
shape.

Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-09 14:44:18 -08:00
Greg Kroah-Hartman 4db8e282f2 Revert "driver core: move knode_bus into private structure"
This reverts commit b9daa99ee5.

Turns out that device_initialize shouldn't fail silently.
This series needs to be reworked in order to get into proper
shape.

Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-09 14:32:46 -08:00
Linus Torvalds c40f6f8bbc Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommu
* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-nommu:
  NOMMU: Support XIP on initramfs
  NOMMU: Teach kobjsize() about VMA regions.
  FLAT: Don't attempt to expand the userspace stack to fill the space allocated
  FDPIC: Don't attempt to expand the userspace stack to fill the space allocated
  NOMMU: Improve procfs output using per-MM VMAs
  NOMMU: Make mmap allocation page trimming behaviour configurable.
  NOMMU: Make VMAs per MM as for MMU-mode linux
  NOMMU: Delete askedalloc and realalloc variables
  NOMMU: Rename ARM's struct vm_region
  NOMMU: Fix cleanup handling in ramfs_nommu_get_umapped_area()
2009-01-09 14:00:58 -08:00
Linus Torvalds d7d717fa88 Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds:
  leds: ledtrig-timer - on deactivation hardware blinking should be disabled
  leds: Add suspend/resume to the core class
  leds: Add WM8350 LED driver
  leds: leds-pcs9532 - Move i2c work to a workqueque
  leds: leds-pca9532 - fix memory leak and properly handle errors
  leds: Fix wrong loop direction on removal in leds-ams-delta
  leds: fix Cobalt Raq LED dependency
  leds: Fix sparse warning in leds-ams-delta
  leds: Fixup kdoc comment to match parameter names
  leds: Make header variable naming consistent
  leds: eds-pca9532: mark pca9532_event() static
  leds: ALIX.2 LEDs driver
2009-01-09 13:55:37 -08:00
Linus Torvalds b64dc5a484 Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-backlight
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-backlight:
  backlight: Rename the corgi backlight driver to generic
  backlight: add support for Toppoly TDO35S series to tdo24m lcd driver
  backlight: Add suspend/resume support to the backlight core
  bd->props.brightness doesn't reflect the actual backlight level.
  backlight: Support VGA/QVGA mode switching in tosa_lcd
  backlight: Catch invalid input in sysfs attributes
  backlight: Value of ILI9320_RGB_IF2 register should not be hardcoded
  backlight: crbllcd_bl - Use platform_device_register_simple()
  backlight: progear_bl - Use platform_device_register_simple()
  backlight: hp680_bl - Use platform_device_register_simple()
2009-01-09 13:55:13 -08:00
Martin Bachem 3f75e84a6a mISDN: Add layer1 prim MPH_INFORMATION_REQ
MPH_INFORMATION provides full D- and B-Channel status overview

- new layer1 primitive: MPF_INFORMATON_REQ
- layer1 replies with MPH_INFORMATION_IND containing
   - dch->[state,Flags,nrbchan]
   - bch[]->[protocol,Flags]
- hardware driver should send MPH_INFORMATION_IND
  on all ph state changes and BChannel state changes to MISDN_ID_ANY

Signed-off-by: Martin Bachem <m.bachem@gmx.de>
Signed-off-by: Karsten Keil <kkeil@suse.de>
2009-01-09 22:44:29 +01:00
Matthias Urlichs b36b654a7e mISDN: Create /sys/class/mISDN
Create /sys/class/mISDN and implement functions to handle
device renames.

Signed-Off-By: Matthias Urlichs <matthias@urlichs.de>
Signed-off-by: Karsten Keil <kkeil@suse.de>
2009-01-09 22:44:28 +01:00
Andreas Eversberg 1b4d33121f mISDN: Fix deactivation, if peer IP is removed from l1oip instance.
Added GETPEER operation.
 Socket now checks if device is already busy at a differen mode.

Signed-off-by: Andreas Eversberg <andreas@eversberg.eu>
Signed-off-by: Karsten Keil <kkeil@suse.de>
2009-01-09 22:44:27 +01:00
Martin Bachem 02282eee56 mISDN: Add ISDN_P_TE_UP0 / ISDN_P_NT_UP0
- new layer1 protocols for UP0 bus
- helper #defines to test for TE/NT/S0/E1/UP0

Signed-off-by: Martin Bachem <m.bachem@gmx.de>
Signed-off-by: Karsten Keil <kkeil@suse.de>
2009-01-09 22:44:27 +01:00
Andreas Eversberg 3bd69ad197 mISDN: Add ISDN sample clock API to mISDN core
Add ISDN sample clock API to mISDN core (new file clock.c)
hfcmulti and mISDNdsp use clock API.

Signed-off-by: Andreas Eversberg <andreas@eversberg.eu>
Signed-off-by: Karsten Keil <kkeil@suse.de>
2009-01-09 22:44:27 +01:00
Martin Bachem 1f28fa19d3 mISDN: Add E-Channel logging features
New prim PH_DATA_E_IND.

 - all E-ch frames are indicated by recv_Echannel(), which pushes E-Channel
   frames into dch's rqueue
 - if dchannel is opened with channel nr 0, no E-Channel logging
   is requested
 - if dchannel is opened with channel nr 1, E-Channel logging
   is requested. if layer1 does not support that, -EINVAL
   in return is appropriate

Signed-off-by: Martin Bachem <m.bachem@gmx.de>
Signed-off-by: Karsten Keil <kkeil@suse.de>
2009-01-09 22:44:25 +01:00
Matthias Urlichs 837468d135 mISDN: Use struct device name field
struct device already has a 'name' member, use it.

Signed-off-by: Matthias Urlichs <matthias@urlichs.de>
Signed-off-by: Karsten Keil <kkeil@suse.de>
2009-01-09 22:44:24 +01:00
Matthias Urlichs 8b6015f736 mISDN: Added an ioctl to change the device name
To get persistent device names with hotplug we need to rename devices
sometime.

Signed-off-by: Matthias Urlichs <matthias@urlichs.de>
Signed-off-by: Karsten Keil <kkeil@suse.de>
2009-01-09 22:44:23 +01:00
Andreas Eversberg 8dd2f36f31 mISDN: Add feature via MISDN_CTRL_FILL_EMPTY to fill fifo if empty
This prevents underrun of fifo when filled and in case of an underrun it
prevents subsequent underruns due to jitter.
Improve dsp, so buffers are kept filled with a certain delay, so moderate
jitter will not cause underrun all the time -> the audio quality is highly
improved. tones are not interrupted by gaps anymore, except when CPU is
stalling or in high load.

Signed-off-by: Andreas Eversberg <andreas@eversberg.eu>
Signed-off-by: Karsten Keil <kkeil@suse.de>
2009-01-09 22:44:22 +01:00
Linus Torvalds 4ce5f24193 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile: (31 commits)
  powerpc/oprofile: fix whitespaces in op_model_cell.c
  powerpc/oprofile: IBM CELL: add SPU event profiling support
  powerpc/oprofile: fix cell/pr_util.h
  powerpc/oprofile: IBM CELL: cleanup and restructuring
  oprofile: make new cpu buffer functions part of the api
  oprofile: remove #ifdef CONFIG_OPROFILE_IBS in non-ibs code
  ring_buffer: fix ring_buffer_event_length()
  oprofile: use new data sample format for ibs
  oprofile: add op_cpu_buffer_get_data()
  oprofile: add op_cpu_buffer_add_data()
  oprofile: rework implementation of cpu buffer events
  oprofile: modify op_cpu_buffer_read_entry()
  oprofile: add op_cpu_buffer_write_reserve()
  oprofile: rename variables in add_ibs_begin()
  oprofile: rename add_sample() in cpu_buffer.c
  oprofile: rename variable ibs_allowed to has_ibs in op_model_amd.c
  oprofile: making add_sample_entry() inline
  oprofile: remove backtrace code for ibs
  oprofile: remove unused ibs macro
  oprofile: remove unused components in struct oprofile_cpu_buffer
  ...
2009-01-09 12:43:06 -08:00
Linus Torvalds 7c51d57e9d Merge git://git.infradead.org/mtd-2.6
* git://git.infradead.org/mtd-2.6: (67 commits)
  [MTD] [MAPS] Fix printk format warning in nettel.c
  [MTD] [NAND] add cmdline parsing (mtdparts=) support to cafe_nand
  [MTD] CFI: remove major/minor version check for command set 0x0002
  [MTD] [NAND] ndfc driver
  [MTD] [TESTS] Fix some size_t printk format warnings
  [MTD] LPDDR Makefile and KConfig
  [MTD] LPDDR extended physmap driver to support LPDDR flash
  [MTD] LPDDR added new pfow_base parameter
  [MTD] LPDDR Command set driver
  [MTD] LPDDR PFOW definition
  [MTD] LPDDR QINFO records definitions
  [MTD] LPDDR qinfo probing.
  [MTD] [NAND] pxa3xx: convert from ns to clock ticks more accurately
  [MTD] [NAND] pxa3xx: fix non-page-aligned reads
  [MTD] [NAND] fix nandsim sched.h references
  [MTD] [NAND] alauda: use USB API functions rather than constants
  [MTD] struct device - replace bus_id with dev_name(), dev_set_name()
  [MTD] fix m25p80 64-bit divisions
  [MTD] fix dataflash 64-bit divisions
  [MTD] [NAND] Set the fsl elbc ECCM according the settings in bootloader.
  ...

Fixed up trivial debug conflicts in drivers/mtd/devices/{m25p80.c,mtd_dataflash.c}
2009-01-09 12:37:15 -08:00
Linus Torvalds a3a798c88a Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (94 commits)
  ACPICA: hide private headers
  ACPICA: create acpica/ directory
  ACPI: fix build warning
  ACPI : Use RSDT instead of XSDT by adding boot option of "acpi=rsdt"
  ACPI: Avoid array address overflow when _CST MWAIT hint bits are set
  fujitsu-laptop: Simplify SBLL/SBL2 backlight handling
  fujitsu-laptop: Add BL power, LED control and radio state information
  ACPICA: delete utcache.c
  ACPICA: delete acdisasm.h
  ACPICA: Update version to 20081204.
  ACPICA: FADT: Update error msgs for consistency
  ACPICA: FADT: set acpi_gbl_use_default_register_widths to TRUE by default
  ACPICA: FADT parsing changes and fixes
  ACPICA: Add ACPI_MUTEX_TYPE configuration option
  ACPICA: Fixes for various ACPI data tables
  ACPICA: Restructure includes into public/private
  ACPI: remove private acpica headers from driver files
  ACPI: reboot.c: use new acpi_reset interface
  ACPICA: New: acpi_reset interface - write to reset register
  ACPICA: Move all public H/W interfaces to new hwxface
  ...
2009-01-09 11:55:14 -08:00
Linus Torvalds d9e8a3a5b8 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (22 commits)
  ioat: fix self test for multi-channel case
  dmaengine: bump initcall level to arch_initcall
  dmaengine: advertise all channels on a device to dma_filter_fn
  dmaengine: use idr for registering dma device numbers
  dmaengine: add a release for dma class devices and dependent infrastructure
  ioat: do not perform removal actions at shutdown
  iop-adma: enable module removal
  iop-adma: kill debug BUG_ON
  iop-adma: let devm do its job, don't duplicate free
  dmaengine: kill enum dma_state_client
  dmaengine: remove 'bigref' infrastructure
  dmaengine: kill struct dma_client and supporting infrastructure
  dmaengine: replace dma_async_client_register with dmaengine_get
  atmel-mci: convert to dma_request_channel and down-level dma_slave
  dmatest: convert to dma_request_channel
  dmaengine: introduce dma_request_channel and private channels
  net_dma: convert to dma_find_channel
  dmaengine: provide a common 'issue_pending_all' implementation
  dmaengine: centralize channel allocation, introduce dma_find_channel
  dmaengine: up-level reference counting to the module level
  ...
2009-01-09 11:52:14 -08:00
Wolfgang Grandegger fefae48bf8 [MTD] CFI: remove major/minor version check for command set 0x0002
The NOR Flash memory K8P2815UQB from Samsung uses the major version
number '0'. Add a quirk to cope with it.

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2009-01-09 12:16:28 +00:00
Len Brown ec9f168fcc Merge branch 'simplify_PRT' into release
Conflicts:
	drivers/acpi/pci_irq.c

Note that this merge disables
e1d3a90846
pci, acpi: reroute PCI interrupt to legacy boot interrupt equivalent

Signed-off-by: Len Brown <len.brown@intel.com>
2009-01-09 03:41:08 -05:00
Len Brown b2576e1d44 Merge branch 'linus' into release 2009-01-09 03:39:43 -05:00
Len Brown 3cc8a5f4ba Merge branch 'suspend' into release 2009-01-09 03:38:15 -05:00
Len Brown d0302bc62a Merge branch 'misc' into release
Conflicts:
	include/acpi/acpixf.h

Signed-off-by: Len Brown <len.brown@intel.com>
2009-01-09 03:37:48 -05:00
Len Brown e2f7a77728 ACPICA: hide private headers
Signed-off-by: Len Brown <len.brown@intel.com>
2009-01-09 03:31:01 -05:00
Zhao Yakui 237889bf0a ACPI : Use RSDT instead of XSDT by adding boot option of "acpi=rsdt"
On some boxes there exist both RSDT and XSDT table. But unfortunately
sometimes there exists the following error when XSDT table is used:
   a. 32/64X address mismatch
   b. The 32/64X FACS address mismatch

   In such case the boot option of "acpi=rsdt" is provided so that
RSDT is tried instead of XSDT table when the system can't work well.

http://bugzilla.kernel.org/show_bug.cgi?id=8246

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
cc:Thomas Renninger <trenn@suse.de>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-01-09 01:41:58 -05:00
Len Brown 5b929aa1ae ACPICA: delete acdisasm.h
it is referenced only #ifdef ACPI_DISASSEMBLER,
which is never set by the kernel.

Signed-off-by: Len Brown <len.brown@intel.com>
2009-01-08 23:44:17 -05:00
Linus Torvalds 2150edc6c5 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
  jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
  ext4: Remove "extents" mount option
  block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
  ext4: Make printk's consistently prefixed with "EXT4-fs: "
  ext4: Add sanity checks for the superblock before mounting the filesystem
  ext4: Add mount option to set kjournald's I/O priority
  jbd2: Submit writes to the journal using WRITE_SYNC
  jbd2: Add pid and journal device name to the "kjournald2 starting" message
  ext4: Add markers for better debuggability
  ext4: Remove code to create the journal inode
  ext4: provide function to release metadata pages under memory pressure
  ext3: provide function to release metadata pages under memory pressure
  add releasepage hooks to block devices which can be used by file systems
  ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
  ext4: Init the complete page while building buddy cache
  ext4: Don't allow new groups to be added during block allocation
  ext4: mark the blocks/inode bitmap beyond end of group as used
  ext4: Use new buffer_head flag to check uninit group bitmaps initialization
  ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
  ext4: code cleanup
  ...
2009-01-08 17:14:59 -08:00
Linus Torvalds cd764695b6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (45 commits)
  [SCSI] qla2xxx: Update version number to 8.03.00-k1.
  [SCSI] qla2xxx: Add ISP81XX support.
  [SCSI] qla2xxx: Use proper request/response queues with MQ instantiations.
  [SCSI] qla2xxx: Correct MQ-chain information retrieval during a firmware dump.
  [SCSI] qla2xxx: Collapse EFT/FCE copy procedures during a firmware dump.
  [SCSI] qla2xxx: Don't pollute kernel logs with ZIO/RIO status messages.
  [SCSI] qla2xxx: Don't fallback to interrupt-polling during re-initialization with MSI-X enabled.
  [SCSI] qla2xxx: Remove support for reading/writing HW-event-log.
  [SCSI] cxgb3i: add missing include
  [SCSI] scsi_lib: fix DID_RESET status problems
  [SCSI] fc transport: restore missing dev_loss_tmo callback to LLDD
  [SCSI] aha152x_cs: Fix regression that keeps driver from using shared interrupts
  [SCSI] sd: Correctly handle 6-byte commands with DIX
  [SCSI] sd: DIF: Fix tagging on platforms with signed char
  [SCSI] sd: DIF: Show app tag on error
  [SCSI] Fix error handling for DIF/DIX
  [SCSI] scsi_lib: don't decrement busy counters when inserting commands
  [SCSI] libsas: fix test for negative unsigned and typos
  [SCSI] a2091, gvp11: kill warn_unused_result warnings
  [SCSI] fusion: Move a dereference below a NULL test
  ...

Fixed up trivial conflict due to moving the async part of sd_probe
around in the async probes vs using dev_set_name() in naming.
2009-01-08 16:27:31 -08:00
Linus Torvalds 022992ee59 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
  regulator: fix kernel-doc warnings
  regulator: catch some registration errors
  regulator: Add basic DocBook manual
  regulator: Fix some kerneldoc rendering issues
  regulator: Add missing kerneldoc
  regulator: Clean up kerneldoc warnings
  regulator: Remove extraneous kerneldoc annotations
  regulator: init/link earlier
  regulator: move set_machine_constraints after regulator device initialization
  regulator: da903x: make da903x_is_enabled return 0 or 1
  regulator: da903x: add '\n' to error messages
  regulator: sysfs attribute reduction (v2)
  regulator: code shrink (v2)
  regulator: improved mode error checks
  regulator: enable/disable refcounting
  regulator: struct device - replace bus_id with dev_name(), dev_set_name()
2009-01-08 14:51:11 -08:00
Linus Torvalds 5fbbf5f648 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (84 commits)
  wimax: fix kernel-doc for debufs_dentry member of struct wimax_dev
  net: convert pegasus driver to net_device_ops
  bnx2x: Prevent eeprom set when driver is down
  net: switch kaweth driver to netdevops
  pcnet32: round off carrier watch timer
  i2400m/usb: wrap USB power saving in #ifdef CONFIG_PM
  wimax: testing for rfkill support should also test for CONFIG_RFKILL_MODULE
  wimax: fix kconfig interactions with rfkill and input layers
  wimax: fix '#ifndef CONFIG_BUG' layout to avoid warning
  r6040: bump release number to 0.20
  r6040: warn about MAC address being unset
  r6040: check PHY status when bringing interface up
  r6040: make printks consistent with DRV_NAME
  gianfar: Fixup use of BUS_ID_SIZE
  mlx4_en: Returning real Max in get_ringparam
  mlx4_en: Consider inline packets on completion
  netdev: bfin_mac: enable bfin_mac net dev driver for BF51x
  qeth: convert to net_device_ops
  vlan: add neigh_setup
  dm9601: warn on invalid mac address
  ...
2009-01-08 14:25:41 -08:00
Linus Torvalds 894bcdfb1a Merge branch 'for-linus' of git://neil.brown.name/md
* 'for-linus' of git://neil.brown.name/md:
  md: don't retry recovery of raid1 that fails due to error on source drive.
  md: Allow md devices to be created by name.
  md: make devices disappear when they are no longer needed.
  md: centralise all freeing of an 'mddev' in 'md_free'
  md: move allocation of ->queue from mddev_find to md_probe
  md: need another print_sb for mdp_superblock_1
  md: use list_for_each_entry macro directly
  md: raid0: make hash_spacing and preshift sector-based.
  md: raid0: Represent the size of strip zones in sectors.
  md: raid0 create_strip_zones(): Add KERN_INFO/KERN_ERR to printk's.
  md: raid0 create_strip_zones(): Make two local variables sector-based.
  md: raid0: Represent zone->zone_offset in sectors.
  md: raid0: Represent device offset in sectors.
  md: raid0_make_request(): Replace local variable block by sector.
  md: raid0_make_request(): Remove local variable chunk_size.
  md: raid0_make_request(): Replace chunksize_bits by chunksect_bits.
  md: use sysfs_notify_dirent to notify changes to md/sync_action.
  md: fix bitmap-on-external-file bug.
2009-01-08 14:03:34 -08:00
Alan Cox 871af1210f libata: Add 32bit PIO support
This matters for some controllers and in one or two cases almost doubles
PIO performance. Add a bmdma32 operations set we can inherit and activate
it for some controllers

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2009-01-08 16:34:27 -05:00
NeilBrown 4044ba58dd md: don't retry recovery of raid1 that fails due to error on source drive.
If a raid1 has only one working drive and it has a sector which
gives an error on read, then an attempt to recover onto a spare will
fail, but as the single remaining drive is not removed from the
array, the recovery will be immediately re-attempted, resulting
in an infinite recovery loop.

So detect this situation and don't retry recovery once an error
on the lone remaining drive is detected.

Allow recovery to be retried once every time a spare is added
in case the problem wasn't actually a media error.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-01-09 08:31:11 +11:00
NeilBrown efeb53c0e5 md: Allow md devices to be created by name.
Using sequential numbers to identify md devices is somewhat artificial.
Using names can be a lot more user-friendly.

Also, creating md devices by opening the device special file is a bit
awkward.

So this patch provides a new option for creating and naming devices.

Writing a name such as "md_home" to
    /sys/modules/md_mod/parameters/new_array
will cause an array with that name to be created.  It will appear in
/sys/block/ /proc/partitions and /proc/mdstat as 'md_home'.
It will have an arbitrary minor number allocated.

md devices that a created by an open are destroyed on the last
close when the device is inactive.
For named md devices, they will not be destroyed until the array
is explicitly stopped, either with the STOP_ARRAY ioctl or by
writing 'clear' to /sys/block/md_XXXX/md/array_state.

The name of the array must start 'md_' to avoid conflict with
other devices.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-01-09 08:31:10 +11:00
NeilBrown d3374825ce md: make devices disappear when they are no longer needed.
Currently md devices, once created, never disappear until the module
is unloaded.  This is essentially because the gendisk holds a
reference to the mddev, and the mddev holds a reference to the
gendisk, this a circular reference.

If we drop the reference from mddev to gendisk, then we need to ensure
that the mddev is destroyed when the gendisk is destroyed.  However it
is not possible to hook into the gendisk destruction process to enable
this.

So we drop the reference from the gendisk to the mddev and destroy the
gendisk when the mddev gets destroyed.  However this has a
complication.
Between the call
   __blkdev_get->get_gendisk->kobj_lookup->md_probe
and the call
   __blkdev_get->md_open

there is no obvious way to hold a reference on the mddev any more, so
unless something is done, it will disappear and gendisk will be
destroyed prematurely.

Also, once we decide to destroy the mddev, there will be an unlockable
moment before the gendisk is unlinked (blk_unregister_region) during
which a new reference to the gendisk can be created.  We need to
ensure that this reference can not be used.  i.e. the ->open must
fail.

So:
 1/  in md_probe we set a flag in the mddev (hold_active) which
     indicates that the array should be treated as active, even
     though there are no references, and no appearance of activity.
     This is cleared by md_release when the device is closed if it
     is no longer needed.
     This ensures that the gendisk will survive between md_probe and
     md_open.

 2/  In md_open we check if the mddev we expect to open matches
     the gendisk that we did open.
     If there is a mismatch we return -ERESTARTSYS and modify
     __blkdev_get to retry from the top in that case.
     In the -ERESTARTSYS sys case we make sure to wait until
     the old gendisk (that we succeeded in opening) is really gone so
     we loop at most once.

Some udev configurations will always open an md device when it first
appears.   If we allow an md device that was just created by an open
to disappear on an immediate close, then this can race with such udev
configurations and result in an infinite loop the device being opened
and closed, then re-open due to the 'ADD' even from the first open,
and then close and so on.
So we make sure an md device, once created by an open, remains active
at least until some md 'ioctl' has been made on it.  This means that
all normal usage of md devices will allow them to disappear promptly
when not needed, but the worst that an incorrect usage will do it
cause an inactive md device to be left in existence (it can easily be
removed).

As an array can be stopped by writing to a sysfs attribute
  echo clear > /sys/block/mdXXX/md/array_state
we need to use scheduled work for deleting the gendisk and other
kobjects.  This allows us to wait for any pending gendisk deletion to
complete by simply calling flush_scheduled_work().



Signed-off-by: NeilBrown <neilb@suse.de>
2009-01-09 08:31:10 +11:00
Cheng Renquan cd2ac9321c md: need another print_sb for mdp_superblock_1
md_print_devices is called in two code path: MD_BUG(...), and md_ioctl
with PRINT_RAID_DEBUG.  it will dump out all in use md devices
information;

However, it wrongly processed two types of superblock in one:

The header file <linux/raid/md_p.h> has defined two types of superblock,
struct mdp_superblock_s (typedefed with mdp_super_t) according to md with
metadata 0.90, and struct mdp_superblock_1 according to md with metadata
1.0 and later,

These two types of superblock are very different,

The md_print_devices code processed them both in mdp_super_t, that would
lead to wrong informaton dump like:

	[ 6742.345877]
	[ 6742.345887] md:	**********************************
	[ 6742.345890] md:	* <COMPLETE RAID STATE PRINTOUT> *
	[ 6742.345892] md:	**********************************
	[ 6742.345896] md1: <ram7><ram6><ram5><ram4>
	[ 6742.345907] md: rdev ram7, SZ:00065472 F:0 S:1 DN:3
	[ 6742.345909] md: rdev superblock:
	[ 6742.345914] md:  SB: (V:0.90.0) ID:<42ef13c7.598c059a.5f9f1645.801e9ee6> CT:4919856d
	[ 6742.345918] md:     L5 S00065472 ND:4 RD:4 md1 LO:2 CS:65536
	[ 6742.345922] md:     UT:4919856d ST:1 AD:4 WD:4 FD:0 SD:0 CSUM:b7992907 E:00000001
	[ 6742.345924]      D  0:  DISK<N:0,(1,8),R:0,S:6>
	[ 6742.345930]      D  1:  DISK<N:1,(1,10),R:1,S:6>
	[ 6742.345933]      D  2:  DISK<N:2,(1,12),R:2,S:6>
	[ 6742.345937]      D  3:  DISK<N:3,(1,14),R:3,S:6>
	[ 6742.345942] md:     THIS:  DISK<N:3,(1,14),R:3,S:6>
	...
	[ 6742.346058] md0: <ram3><ram2><ram1><ram0>
	[ 6742.346067] md: rdev ram3, SZ:00065472 F:0 S:1 DN:3
	[ 6742.346070] md: rdev superblock:
	[ 6742.346073] md:  SB: (V:1.0.0) ID:<369aad81.00000000.00000000.00000000> CT:9a322a9c
	[ 6742.346077] md:     L-1507699579 S976570180 ND:48 RD:0 md0 LO:65536 CS:196610
	[ 6742.346081] md:     UT:00000018 ST:0 AD:131048 WD:0 FD:8 SD:0 CSUM:00000000 E:00000000
	[ 6742.346084]      D  0:  DISK<N:-1,(-1,-1),R:-1,S:-1>
	[ 6742.346089]      D  1:  DISK<N:-1,(-1,-1),R:-1,S:-1>
	[ 6742.346092]      D  2:  DISK<N:-1,(-1,-1),R:-1,S:-1>
	[ 6742.346096]      D  3:  DISK<N:-1,(-1,-1),R:-1,S:-1>
	[ 6742.346102] md:     THIS:  DISK<N:0,(0,0),R:0,S:0>
	...
	[ 6742.346219] md:	**********************************
	[ 6742.346221]

Here md1 is metadata 0.90.0, and md0 is metadata 1.2

After some more code to distinguish these two types of superblock, in this patch,

it will generate dump information like:

	[ 7906.755790]
	[ 7906.755799] md:	**********************************
	[ 7906.755802] md:	* <COMPLETE RAID STATE PRINTOUT> *
	[ 7906.755804] md:	**********************************
	[ 7906.755808] md1: <ram7><ram6><ram5><ram4>
	[ 7906.755819] md: rdev ram7, SZ:00065472 F:0 S:1 DN:3
	[ 7906.755821] md: rdev superblock (MJ:0):
	[ 7906.755826] md:  SB: (V:0.90.0) ID:<3fca7a0d.a612bfed.5f9f1645.801e9ee6> CT:491989f3
	[ 7906.755830] md:     L5 S00065472 ND:4 RD:4 md1 LO:2 CS:65536
	[ 7906.755834] md:     UT:491989f3 ST:1 AD:4 WD:4 FD:0 SD:0 CSUM:00fb52ad E:00000001
	[ 7906.755836]      D  0:  DISK<N:0,(1,8),R:0,S:6>
	[ 7906.755842]      D  1:  DISK<N:1,(1,10),R:1,S:6>
	[ 7906.755845]      D  2:  DISK<N:2,(1,12),R:2,S:6>
	[ 7906.755849]      D  3:  DISK<N:3,(1,14),R:3,S:6>
	[ 7906.755855] md:     THIS:  DISK<N:3,(1,14),R:3,S:6>
	...
	[ 7906.755972] md0: <ram3><ram2><ram1><ram0>
	[ 7906.755981] md: rdev ram3, SZ:00065472 F:0 S:1 DN:3
	[ 7906.755984] md: rdev superblock (MJ:1):
	[ 7906.755989] md:  SB: (V:1) (F:0) Array-ID:<5fbcf158:55aa:5fbe:9a79:1e939880dcbd>
	[ 7906.755990] md:    Name: "DG5:0" CT:1226410480
	[ 7906.755998] md:       L5 SZ130944 RD:4 LO:2 CS:128 DO:24 DS:131048 SO:8 RO:0
	[ 7906.755999] md:     Dev:00000003 UUID: 9194d744:87f7:a448:85f2:7497b84ce30a
	[ 7906.756001] md:       (F:0) UT:1226410480 Events:0 ResyncOffset:-1 CSUM:0dbcd829
	[ 7906.756003] md:         (MaxDev:384)
	...
	[ 7906.756113] md:	**********************************
	[ 7906.756116]

this md0 (metadata 1.2) information dumping is exactly according to struct
mdp_superblock_1.

Signed-off-by: Cheng Renquan <crquan@gmail.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Dan Williams <dan.j.williams@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-01-09 08:31:08 +11:00
Cheng Renquan 159ec1fc06 md: use list_for_each_entry macro directly
The rdev_for_each macro defined in <linux/raid/md_k.h> is identical to
list_for_each_entry_safe, from <linux/list.h>, it should be defined to
use list_for_each_entry_safe, instead of reinventing the wheel.

But some calls to each_entry_safe don't really need a safe version,
just a direct list_for_each_entry is enough, this could save a temp
variable (tmp) in every function that used rdev_for_each.

In this patch, most rdev_for_each loops are replaced by list_for_each_entry,
totally save many tmp vars; and only in the other situations that will call
list_del to delete an entry, the safe version is used.

Signed-off-by: Cheng Renquan <crquan@gmail.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-01-09 08:31:08 +11:00
Andre Noll ccacc7d2cf md: raid0: make hash_spacing and preshift sector-based.
This patch renames the hash_spacing and preshift members of struct
raid0_private_data to spacing and sector_shift respectively and
changes the semantics as follows:

We always have spacing = 2 * hash_spacing. In case
sizeof(sector_t) > sizeof(u32) we also have sector_shift = preshift + 1
while sector_shift = preshift = 0 otherwise.

Note that the values of nb_zone and zone are unaffected by these changes
because in the sector_div() preceeding the assignement of these two
variables both arguments double.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-01-09 08:31:08 +11:00
Andre Noll 83838ed878 md: raid0: Represent the size of strip zones in sectors.
This completes the block -> sector conversion of struct strip_zone.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-01-09 08:31:07 +11:00
Andre Noll 6199d3db0f md: raid0: Represent zone->zone_offset in sectors.
For the same reason as in the previous patch, rename it from zone_offset
to zone_start.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-01-09 08:31:07 +11:00
Andre Noll 019c4e2f3e md: raid0: Represent device offset in sectors.
Rename zone->dev_offset to zone->dev_start to make sure all users
have been converted.

Signed-off-by: Andre Noll <maan@systemlinux.org>
Signed-off-by: NeilBrown <neilb@suse.de>
2009-01-09 08:31:06 +11:00
NeilBrown 0c3573f19d md: use sysfs_notify_dirent to notify changes to md/sync_action.
There is no compelling need for this, but sysfs_notify_dirent is a
nicer interface and the change is good for consistency.

Signed-off-by: NeilBrown <neilb@suse.de>
2009-01-09 08:31:05 +11:00
Inaky Perez-Gonzalez 56cf391a94 wimax: fix kernel-doc for debufs_dentry member of struct wimax_dev
Reported by Randy Dunlap from a warning in the v2.6.29 merge window
tree as of 2009/1/8.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-08 12:56:57 -08:00
Mike Rapoport f4f6bda00f backlight: add support for Toppoly TDO35S series to tdo24m lcd driver
Signed-off-by: Mike Rapoport <mike@compulab.co.il>
Acked-by: Eric Miao <eric.miao@marvell.com>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
2009-01-08 20:11:07 +00:00
Randy Dunlap 0ba4887c63 regulator: fix kernel-doc warnings
Fix kernel-doc warnings in regulator/driver.h:

Warning(linux-next-20090108//include/linux/regulator/driver.h:95): Excess struct/union/enum/typedef member 'set_current' description in 'regulator_ops'
Warning(linux-next-20090108//include/linux/regulator/driver.h:95): Excess struct/union/enum/typedef member 'get_current' description in 'regulator_ops'
Warning(linux-next-20090108//include/linux/regulator/driver.h:124): No description found for parameter 'irq'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
cc: Liam Girdwood <lrg@slimlogic.co.uk>
cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
2009-01-08 20:10:38 +00:00
Mark Brown c8e7e4640f regulator: Add missing kerneldoc
This is only the documentation that the kerneldoc system warns about.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
2009-01-08 20:10:33 +00:00
Mark Brown 69279fb9a9 regulator: Clean up kerneldoc warnings
Remove kerneldoc warnings that don't relate to missing documentation,
mostly by renaming parameters in the documentation to match their
actual names.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Liam Girdwood <lrg@slimlogic.co.uk>
2009-01-08 20:10:33 +00:00
David S. Miller 7f46b1343f Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 2009-01-08 11:05:59 -08:00
Herbert Xu 787e920836 ipv6: Add GRO support
This patch adds GRO support for IPv6.  IPv6 GRO supports extension
headers in the same way as GSO (by using the same infrastructure).
It's also simpler compared to IPv4 since we no longer have to worry
about fragmentation attributes or header checksums.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-01-08 10:40:57 -08:00
Richard Purdie 859cb7f2a4 leds: Add suspend/resume to the core class
Add suspend/resume to the core class and remove all the now unneeded
code from various drivers. Originally the class code couldn't support
suspend/resume but since class_device can there is no reason for
each driver doing its own suspend/resume anymore.
2009-01-08 17:55:03 +00:00
Linus Torvalds 85da1fb545 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (53 commits)
  serial: Add driver for the Cell Network Processor serial port NWP device
  powerpc: enable dynamic ftrace
  powerpc/cell: Fix the prototype of create_vma_map()
  powerpc/mm: Make clear_fixmap() actually work
  powerpc/kdump: Use ppc_save_regs() in crash_setup_regs()
  powerpc: Export cacheable_memzero as its now used in a driver
  powerpc: Fix missing semicolons in mmu_decl.h
  powerpc/pasemi: local_irq_save uses an unsigned long
  powerpc/cell: Fix some u64 vs. long types
  powerpc/cell: Use correct types in beat files
  powerpc: Use correct type in prom_init.c
  powerpc: Remove unnecessary casts
  mtd/ps3vram: Use _PAGE_NO_CACHE in memory ioremap
  mtd/ps3vram: Use msleep in waits
  mtd/ps3vram: Use proper kernel types
  mtd/ps3vram: Cleanup ps3vram driver messages
  mtd/ps3vram: Remove ps3vram debug routines
  mtd/ps3vram: Add modalias support to the ps3vram driver
  mtd/ps3vram: Add ps3vram driver for accessing video RAM as MTD
  powerpc: Fix iseries drivers build failure without CONFIG_VIOPATH
  ...
2009-01-08 09:10:16 -08:00
Wu Fengguang 91f68b7359 generic swap(): introduce global macro swap(a, b)
There have been some local definitions of swap(), it's time to replace
them all with a uniform one.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:15 -08:00
Kees Cook f06295b44c ELF: implement AT_RANDOM for glibc PRNG seeding
While discussing[1] the need for glibc to have access to random bytes
during program load, it seems that an earlier attempt to implement
AT_RANDOM got stalled.  This implements a random 16 byte string, available
to every ELF program via a new auxv AT_RANDOM vector.

[1] http://sourceware.org/ml/libc-alpha/2008-10/msg00006.html

Ulrich said:

glibc needs right after startup a bit of random data for internal
protections (stack canary etc).  What is now in upstream glibc is that we
always unconditionally open /dev/urandom, read some data, and use it.  For
every process startup.  That's slow.

...

The solution is to provide a limited amount of random data to the
starting process in the aux vector.  I suggested 16 bytes and this is
what the patch implements.  If we need only 16 bytes or less we use the
data directly.  If we need more we'll use the 16 bytes to see a PRNG.
This avoids the costly /dev/urandom use and it allows the kernel to use
the most adequate source of random data for this purpose.  It might not
be the same pool as that for /dev/urandom.

Concerns were expressed about the depletion of the randomness pool.  But
this patch doesn't make the situation worse, it doesn't deplete entropy
more than happens now.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:12 -08:00
Eric W. Biederman 61bce0f137 pid: generalize task_active_pid_ns
Currently task_active_pid_ns is not safe to call after a task becomes a
zombie and exit_task_namespaces is called, as nsproxy becomes NULL.  By
reading the pid namespace from the pid of the task we can trivially solve
this problem at the cost of one extra memory read in what should be the
same cacheline as we read the namespace from.

When moving things around I have made task_active_pid_ns out of line
because keeping it in pid_namespace.h would require adding includes of
pid.h and sched.h that I don't think we want.

This change does make task_active_pid_ns unsafe to call during
copy_process until we attach a pid on the task_struct which seems to be a
reasonable trade off.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Bastian Blank <bastian@waldi.eu.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:12 -08:00
Eric W. Biederman f9fb860f67 pid: implement ns_of_pid
A current problem with the pid namespace is that it is easy to do pid
related work after exit_task_namespaces which drops the nsproxy pointer.

However if we are doing pid namespace related work we are always operating
on some struct pid which retains the pid_namespace pointer of the pid
namespace it was allocated in.

So provide ns_of_pid which allows us to find the pid namespace a pid was
allocated in.

Using this we have the needed infrastructure to do pid namespace related
work at anytime we have a struct pid, removing the chance of accidentally
having a NULL pointer dereference when accessing current->nsproxy.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Bastian Blank <bastian@waldi.eu.org>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:12 -08:00
Li Zefan 6af866af34 cpuset: remove remaining pointers to cpumask_t
Impact: cleanups, use new cpumask API

Final trivial cleanups: mainly s/cpumask_t/struct cpumask

Note there is a FIXME in generate_sched_domains(). A future patch will
change struct cpumask *doms to struct cpumask *doms[].
(I suppose Rusty will do this.)

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Mike Travis <travis@sgi.com>
Cc: Paul Menage <menage@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:11 -08:00
Paul Menage e7c5ec9193 cgroups: add css_tryget()
Add css_tryget(), that obtains a counted reference on a CSS.  It is used
in situations where the caller has a "weak" reference to the CSS, i.e.
one that does not protect the cgroup from removal via a reference count,
but would instead be cleaned up by a destroy() callback.

css_tryget() will return true on success, or false if the cgroup is being
removed.

This is similar to Kamezawa Hiroyuki's patch from a week or two ago, but
with the difference that in the event of css_tryget() racing with a
cgroup_rmdir(), css_tryget() will only return false if the cgroup really
does get removed.

This implementation is done by biasing css->refcnt, so that a refcnt of 1
means "releasable" and 0 means "released or releasing".  In the event of a
race, css_tryget() distinguishes between "released" and "releasing" by
checking for the CSS_REMOVED flag in css->flags.

Signed-off-by: Paul Menage <menage@google.com>
Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:10 -08:00
Paul Menage 999cd8a450 cgroups: add a per-subsystem hierarchy_mutex
These patches introduce new locking/refcount support for cgroups to
reduce the need for subsystems to call cgroup_lock(). This will
ultimately allow the atomicity of cgroup_rmdir() (which was removed
recently) to be restored.

These three patches give:

1/3 - introduce a per-subsystem hierarchy_mutex which a subsystem can
     use to prevent changes to its own cgroup tree

2/3 - use hierarchy_mutex in place of calling cgroup_lock() in the
     memory controller

3/3 - introduce a css_tryget() function similar to the one recently
      proposed by Kamezawa, but avoiding spurious refcount failures in
      the event of a race between a css_tryget() and an unsuccessful
      cgroup_rmdir()

Future patches will likely involve:

- using hierarchy mutex in place of cgroup_lock() in more subsystems
 where appropriate

- restoring the atomicity of cgroup_rmdir() with respect to cgroup_create()

This patch:

Add a hierarchy_mutex to the cgroup_subsys object that protects changes to
the hierarchy observed by that subsystem.  It is taken by the cgroup
subsystem (in addition to cgroup_mutex) for the following operations:

- linking a cgroup into that subsystem's cgroup tree
- unlinking a cgroup from that subsystem's cgroup tree
- moving the subsystem to/from a hierarchy (including across the
  bind() callback)

Thus if the subsystem holds its own hierarchy_mutex, it can safely
traverse its own hierarchy.

Signed-off-by: Paul Menage <menage@google.com>
Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:10 -08:00
KAMEZAWA Hiroyuki b5a84319a4 memcg: fix shmem's swap accounting
Now, you can see following even when swap accounting is enabled.

 1. Create Group 01, and 02.
 2. allocate a "file" on tmpfs by a task under 01.
 3. swap out the "file" (by memory pressure)
 4. Read "file" from a task in group 02.
 5. the charge of "file" is moved to group 02.

This is not ideal behavior. This is because SwapCache which was loaded
by read-ahead is not taken into account..

This is a patch to fix shmem's swapcache behavior.
  - remove mem_cgroup_cache_charge_swapin().
  - Add SwapCache handler routine to mem_cgroup_cache_charge().
    By this, shmem's file cache is charged at add_to_page_cache()
    with GFP_NOWAIT.
  - pass the page of swapcache to shrink_mem_cgroup.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:10 -08:00
Daisuke Nishimura a5e924f5f8 memcg: remove mem_cgroup_try_charge
After previous patch, mem_cgroup_try_charge is not used by anyone, so we
can remove it.

Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:09 -08:00
KOSAKI Motohiro c772be939e memcg: fix calculation of active_ratio
Currently, inactive_ratio of memcg is calculated at setting limit.
because page_alloc.c does so and current implementation is straightforward
porting.

However, memcg introduced hierarchy feature recently.  In hierarchy
restriction, memory limit is not only decided memory.limit_in_bytes of
current cgroup, but also parent limit and sibling memory usage.

Then, The optimal inactive_ratio is changed frequently.  So, everytime
calculation is better.

Tested-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:09 -08:00
KOSAKI Motohiro a7885eb8ad memcg: swappiness
Currently, /proc/sys/vm/swappiness can change swappiness ratio for global
reclaim.  However, memcg reclaim doesn't have tuning parameter for itself.

In general, the optimal swappiness depend on workload.  (e.g.  hpc
workload need to low swappiness than the others.)

Then, per cgroup swappiness improve administrator tunability.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:08 -08:00
KOSAKI Motohiro 9439c1c95b memcg: remove mem_cgroup_cal_reclaim()
Now, get_scan_ratio() return correct value although memcg reclaim.  Then,
mem_cgroup_calc_reclaim() can be removed.

So, memcg reclaim get the same capability of anon/file reclaim balancing
as global reclaim now.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:08 -08:00
KOSAKI Motohiro 3e2f41f1f6 memcg: add zone_reclaim_stat
Introduce mem_cgroup_per_zone::reclaim_stat member and its statics
collecting function.

Now, get_scan_ratio() can calculate correct value on memcg reclaim.

[hugh@veritas.com: avoid reclaim_stat oops when disabled]
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:08 -08:00
KOSAKI Motohiro a3d8e0549d memcg: add mem_cgroup_zone_nr_pages()
Introduce mem_cgroup_zone_nr_pages().  It is called by zone_nr_pages()
helper function.

This patch doesn't have any behavior change.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:08 -08:00
KOSAKI Motohiro 14797e2363 memcg: add inactive_anon_is_low()
The inactive_anon_is_low() is key component of active/inactive anon
balancing on reclaim.  However current inactive_anon_is_low() function
only consider global reclaim.

Therefore, we need following ugly scan_global_lru() condition.

	if (lru == LRU_ACTIVE_ANON &&
	    (!scan_global_lru(sc) || inactive_anon_is_low(zone))) {
		shrink_active_list(nr_to_scan, zone, sc, priority, file);
		return 0;

it cause that memcg reclaim always deactivate pages when shrink_list() is
called.  To make mem_cgroup_inactive_anon_is_low() improve active/inactive
anon balancing of memcgroup.

Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: "Pekka Enberg" <penberg@cs.helsinki.fi>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:08 -08:00
KOSAKI Motohiro 6e9015716a mm: introduce zone_reclaim struct
Add zone_reclam_stat struct for later enhancement.

A later patch uses this.  This patch doesn't any behavior change (yet).

Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:07 -08:00
KOSAKI Motohiro f89eb90e33 inactive_anon_is_low: move to vmscan
The inactive_anon_is_low() is called only vmscan.  Then it can move to
vmscan.c

This patch doesn't have any functional change.

Reviewd-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:07 -08:00
KAMEZAWA Hiroyuki 2c26fdd70c memcg: revert gfp mask fix
My patch, memcg-fix-gfp_mask-of-callers-of-charge.patch changed gfp_mask
of callers of charge to be GFP_HIGHUSER_MOVABLE for showing what will
happen at memory reclaim.

But in recent discussion, it's NACKed because it sounds ugly.

This patch is for reverting it and add some clean up to gfp_mask of
callers of charge.  No behavior change but need review before generating
HUNK in deep queue.

This patch also adds explanation to meaning of gfp_mask passed to charge
functions in memcontrol.h.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:06 -08:00
KAMEZAWA Hiroyuki a636b327f7 memcg: avoid unnecessary system-wide-oom-killer
Current mmtom has new oom function as pagefault_out_of_memory().  It's
added for select bad process rathar than killing current.

When memcg hit limit and calls OOM at page_fault, this handler called and
system-wide-oom handling happens.  (means kernel panics if panic_on_oom is
true....)

To avoid overkill, check memcg's recent behavior before starting
system-wide-oom.

And this patch also fixes to guarantee "don't accnout against process with
TIF_MEMDIE".  This is necessary for smooth OOM.

[akpm@linux-foundation.org: build fix]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Jan Blunck <jblunck@suse.de>
Cc: Hirokazu Takahashi <taka@valinux.co.jp>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:06 -08:00
Lai Jiangshan 2e4d40915f memcontrol: rcu_read_lock() to protect mm_match_cgroup()
mm_match_cgroup() calls cgroup_subsys_state().

We must use rcu_read_lock() to protect cgroup_subsys_state().

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Paul Menage <menage@google.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:06 -08:00
Balbir Singh 28dbc4b6a0 memcg: memory cgroup resource counters for hierarchy
Add support for building hierarchies in resource counters.  Cgroups allows
us to build a deep hierarchy, but we currently don't link the resource
counters belonging to the memory controller control groups, in the same
fashion as the corresponding cgroup entries in the cgroup hierarchy.  This
patch provides the infrastructure for resource counters that have the same
hiearchy as their cgroup counter parts.

These set of patches are based on the resource counter hiearchy patches
posted by Pavel Emelianov.

NOTE: Building hiearchies is expensive, deeper hierarchies imply charging
the all the way up to the root.  It is known that hiearchies are
expensive, so the user needs to be careful and aware of the trade-offs
before creating very deep ones.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
Cc: Paul Menage <menage@google.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:05 -08:00
Hirokazu Takahashi f8d6654226 memcg: add mem_cgroup_disabled()
We check mem_cgroup is disabled or not by checking
mem_cgroup_subsys.disabled.  I think it has more references than expected,
now.

replacing
   if (mem_cgroup_subsys.disabled)
with
   if (mem_cgroup_disabled())

give us good look, I think.

[kamezawa.hiroyu@jp.fujitsu.com: fix typo]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:05 -08:00
KAMEZAWA Hiroyuki 08e552c69c memcg: synchronized LRU
A big patch for changing memcg's LRU semantics.

Now,
  - page_cgroup is linked to mem_cgroup's its own LRU (per zone).

  - LRU of page_cgroup is not synchronous with global LRU.

  - page and page_cgroup is one-to-one and statically allocated.

  - To find page_cgroup is on what LRU, you have to check pc->mem_cgroup as
    - lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);

  - SwapCache is handled.

And, when we handle LRU list of page_cgroup, we do following.

	pc = lookup_page_cgroup(page);
	lock_page_cgroup(pc); .....................(1)
	mz = page_cgroup_zoneinfo(pc);
	spin_lock(&mz->lru_lock);
	.....add to LRU
	spin_unlock(&mz->lru_lock);
	unlock_page_cgroup(pc);

But (1) is spin_lock and we have to be afraid of dead-lock with zone->lru_lock.
So, trylock() is used at (1), now. Without (1), we can't trust "mz" is correct.

This is a trial to remove this dirty nesting of locks.
This patch changes mz->lru_lock to be zone->lru_lock.
Then, above sequence will be written as

        spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
	mem_cgroup_add/remove/etc_lru() {
		pc = lookup_page_cgroup(page);
		mz = page_cgroup_zoneinfo(pc);
		if (PageCgroupUsed(pc)) {
			....add to LRU
		}
        spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU

This is much simpler.
(*) We're safe even if we don't take lock_page_cgroup(pc). Because..
    1. When pc->mem_cgroup can be modified.
       - at charge.
       - at account_move().
    2. at charge
       the PCG_USED bit is not set before pc->mem_cgroup is fixed.
    3. at account_move()
       the page is isolated and not on LRU.

Pros.
  - easy for maintenance.
  - memcg can make use of laziness of pagevec.
  - we don't have to duplicated LRU/Active/Unevictable bit in page_cgroup.
  - LRU status of memcg will be synchronized with global LRU's one.
  - # of locks are reduced.
  - account_move() is simplified very much.
Cons.
  - may increase cost of LRU rotation.
    (no impact if memcg is not configured.)

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:05 -08:00
KAMEZAWA Hiroyuki 8c7c6e34a1 memcg: mem+swap controller core
This patch implements per cgroup limit for usage of memory+swap.  However
there are SwapCache, double counting of swap-cache and swap-entry is
avoided.

Mem+Swap controller works as following.
  - memory usage is limited by memory.limit_in_bytes.
  - memory + swap usage is limited by memory.memsw_limit_in_bytes.

This has following benefits.
  - A user can limit total resource usage of mem+swap.

    Without this, because memory resource controller doesn't take care of
    usage of swap, a process can exhaust all the swap (by memory leak.)
    We can avoid this case.

    And Swap is shared resource but it cannot be reclaimed (goes back to memory)
    until it's used. This characteristic can be trouble when the memory
    is divided into some parts by cpuset or memcg.
    Assume group A and group B.
    After some application executes, the system can be..

    Group A -- very large free memory space but occupy 99% of swap.
    Group B -- under memory shortage but cannot use swap...it's nearly full.

    Ability to set appropriate swap limit for each group is required.

Maybe someone wonder "why not swap but mem+swap ?"

  - The global LRU(kswapd) can swap out arbitrary pages. Swap-out means
    to move account from memory to swap...there is no change in usage of
    mem+swap.

    In other words, when we want to limit the usage of swap without affecting
    global LRU, mem+swap limit is better than just limiting swap.

Accounting target information is stored in swap_cgroup which is
per swap entry record.

Charge is done as following.
  map
    - charge  page and memsw.

  unmap
    - uncharge page/memsw if not SwapCache.

  swap-out (__delete_from_swap_cache)
    - uncharge page
    - record mem_cgroup information to swap_cgroup.

  swap-in (do_swap_page)
    - charged as page and memsw.
      record in swap_cgroup is cleared.
      memsw accounting is decremented.

  swap-free (swap_free())
    - if swap entry is freed, memsw is uncharged by PAGE_SIZE.

There are people work under never-swap environments and consider swap as
something bad. For such people, this mem+swap controller extension is just an
overhead.  This overhead is avoided by config or boot option.
(see Kconfig. detail is not in this patch.)

TODO:
 - maybe more optimization can be don in swap-in path. (but not very safe.)
   But we just do simple accounting at this stage.

[nishimura@mxp.nes.nec.co.jp: make resize limit hold mutex]
[hugh@veritas.com: memswap controller core swapcache fixes]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:05 -08:00
KAMEZAWA Hiroyuki 27a7faa077 memcg: swap cgroup for remembering usage
For accounting swap, we need a record per swap entry, at least.

This patch adds following function.
  - swap_cgroup_swapon() .... called from swapon
  - swap_cgroup_swapoff() ... called at the end of swapoff.

  - swap_cgroup_record() .... record information of swap entry.
  - swap_cgroup_lookup() .... lookup information of swap entry.

This patch just implements "how to record information".  No actual method
for limit the usage of swap.  These routine uses flat table to record and
lookup.  "wise" lookup system like radix-tree requires requires memory
allocation at new records but swap-out is usually called under memory
shortage (or memcg hits limit.) So, I used static allocation.  (maybe
dynamic allocation is not very hard but it adds additional memory
allocation in memory shortage path.)

Note1: In this, we use pointer to record information and this means
      8bytes per swap entry. I think we can reduce this when we
      create "id of cgroup" in the range of 0-65535 or 0-255.

Reported-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Reported-by: Hugh Dickins <hugh@veritas.com>
Reported-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Pavel Emelianov <xemul@openvz.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:05 -08:00
KAMEZAWA Hiroyuki c077719be8 memcg: mem+swap controller Kconfig
Config and control variable for mem+swap controller.

This patch adds CONFIG_CGROUP_MEM_RES_CTLR_SWAP
(memory resource controller swap extension.)

For accounting swap, it's obvious that we have to use additional memory to
remember "who uses swap".  This adds more overhead.  So, it's better to
offer "choice" to users.  This patch adds 2 choices.

This patch adds 2 parameters to enable swap extension or not.
  - CONFIG
  - boot option

Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:05 -08:00
KAMEZAWA Hiroyuki d13d144309 memcg: handle swap caches
SwapCache support for memory resource controller (memcg)

Before mem+swap controller, memcg itself should handle SwapCache in proper
way.  This is cut-out from it.

In current memcg, SwapCache is just leaked and the user can create tons of
SwapCache.  This is a leak of account and should be handled.

SwapCache accounting is done as following.

  charge (anon)
	- charged when it's mapped.
	  (because of readahead, charge at add_to_swap_cache() is not sane)
  uncharge (anon)
	- uncharged when it's dropped from swapcache and fully unmapped.
	  means it's not uncharged at unmap.
	  Note: delete from swap cache at swap-in is done after rmap information
	        is established.
  charge (shmem)
	- charged at swap-in. this prevents charge at add_to_page_cache().

  uncharge (shmem)
	- uncharged when it's dropped from swapcache and not on shmem's
	  radix-tree.

  at migration, check against 'old page' is modified to handle shmem.

Comparing to the old version discussed (and caused troubles), we have
advantages of
  - PCG_USED bit.
  - simple migrating handling.

So, situation is much easier than several months ago, maybe.

[hugh@veritas.com: memcg: handle swap caches build fix]
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Tested-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:05 -08:00
KAMEZAWA Hiroyuki 01b1ae63c2 memcg: simple migration handling
Now, management of "charge" under page migration is done under following
manner. (Assume migrate page contents from oldpage to newpage)

 before
  - "newpage" is charged before migration.
 at success.
  - "oldpage" is uncharged at somewhere(unmap, radix-tree-replace)
 at failure
  - "newpage" is uncharged.
  - "oldpage" is charged if necessary (*1)

But (*1) is not reliable....because of GFP_ATOMIC.

This patch tries to change behavior as following by charge/commit/cancel ops.

 before
  - charge PAGE_SIZE (no target page)
 success
  - commit charge against "newpage".
 failure
  - commit charge against "oldpage".
    (PCG_USED bit works effectively to avoid double-counting)
  - if "oldpage" is obsolete, cancel charge of PAGE_SIZE.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:04 -08:00
KAMEZAWA Hiroyuki 7a81b88cb5 memcg: introduce charge-commit-cancel style of functions
There is a small race in do_swap_page().  When the page swapped-in is
charged, the mapcount can be greater than 0.  But, at the same time some
process (shares it ) call unmap and make mapcount 1->0 and the page is
uncharged.

      CPUA 			CPUB
       mapcount == 1.
   (1) charge if mapcount==0     zap_pte_range()
                                (2) mapcount 1 => 0.
			        (3) uncharge(). (success)
   (4) set page's rmap()
       mapcount 0=>1

Then, this swap page's account is leaked.

For fixing this, I added a new interface.
  - charge
   account to res_counter by PAGE_SIZE and try to free pages if necessary.
  - commit
   register page_cgroup and add to LRU if necessary.
  - cancel
   uncharge PAGE_SIZE because of do_swap_page failure.

     CPUA
  (1) charge (always)
  (2) set page's rmap (mapcount > 0)
  (3) commit charge was necessary or not after set_pte().

This protocol uses PCG_USED bit on page_cgroup for avoiding over accounting.
Usual mem_cgroup_charge_common() does charge -> commit at a time.

And this patch also adds following function to clarify all charges.

  - mem_cgroup_newpage_charge() ....replacement for mem_cgroup_charge()
	called against newly allocated anon pages.

  - mem_cgroup_charge_migrate_fixup()
        called only from remove_migration_ptes().
	we'll have to rewrite this later.(this patch just keeps old behavior)
	This function will be removed by additional patch to make migration
	clearer.

Good for clarifying "what we do"

Then, we have 4 following charge points.
  - newpage
  - swap-in
  - add-to-cache.
  - migration.

[akpm@linux-foundation.org: add missing inline directives to stubs]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:04 -08:00
Paul Menage a47295e6bc cgroups: make cgroup_path() RCU-safe
Fix races between /proc/sched_debug by freeing cgroup objects via an RCU
callback.  Thus any cgroup reference obtained from an RCU-safe source will
remain valid during the RCU section.  Since dentries are also RCU-safe,
this allows us to traverse up the tree safely.

Additionally, make cgroup_path() check for a NULL cgrp->dentry to avoid
trying to report a path for a partially-created cgroup.

[lizf@cn.fujitsu.com: call deactive_super() in cgroup_diput()]
Signed-off-by: Paul Menage <menage@google.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Tested-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:03 -08:00
Lai Jiangshan b2aa30f7bb cgroups: don't put struct cgroupfs_root protected by RCU
We don't access struct cgroupfs_root in fast path, so we should not put
struct cgroupfs_root protected by RCU

But the comment in struct cgroup_subsys.root confuse us.

struct cgroup_subsys.root is used in these places:

1 find_css_set(): if (ss->root->subsys_list.next == &ss->sibling)
2 rebind_subsystems(): if (ss->root != &rootnode)
                       rcu_assign_pointer(ss->root, root);
                       rcu_assign_pointer(subsys[i]->root, &rootnode);
3 cgroup_has_css_refs(): if (ss->root != cgrp->root)
4 cgroup_init_subsys(): ss->root = &rootnode;
5 proc_cgroupstats_show(): ss->name, ss->root->subsys_bits,
                           ss->root->number_of_cgroups, !ss->disabled);
6 cgroup_clone(): root = subsys->root;
                  if ((root != subsys->root) ||

All these place we have held cgroup_lock() or we don't dereference to
struct cgroupfs_root.  It's means wo don't need RCU when use struct
cgroup_subsys.root, and we should not put struct cgroupfs_root protected
by RCU.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Reviewed-by: Paul Menage <menage@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:02 -08:00
Duane Griffin 04143e2fb9 ext3: tighten restrictions on inode flags
At the moment there are few restrictions on which flags may be set on
which inodes.  Specifically DIRSYNC may only be set on directories and
IMMUTABLE and APPEND may not be set on links.  Tighten that to disallow
TOPDIR being set on non-directories and only NODUMP and NOATIME to be set
on non-regular file, non-directories.

Introduces a flags masking function which masks flags based on mode and
use it during inode creation and when flags are set via the ioctl to
facilitate future consistency.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Andreas Dilger <adilger@sun.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:01 -08:00
Duane Griffin 2e8671cb56 ext3: don't inherit inappropriate inode flags from parent
At present INDEX is the only flag that new ext3 inodes do NOT inherit from
their parent.  In addition prevent the flags DIRTY, ECOMPR, IMAGIC and
TOPDIR from being inherited.  List inheritable flags explicitly to prevent
future flags from accidentally being inherited.

This fixes the TOPDIR flag inheritance bug reported at
http://bugzilla.kernel.org/show_bug.cgi?id=9866.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Andreas Dilger <adilger@sun.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:01 -08:00
Pekka Enberg 5df096d67e ext3: allocate ->s_blockgroup_lock separately
As spotted by kmemtrace, struct ext3_sb_info is 17152 bytes on 64-bit
which makes it a very bad fit for SLAB allocators.  The culprit of the
wasted memory is ->s_blockgroup_lock which can be as big as 16 KB when
NR_CPUS >= 32.

To fix that, allocate ->s_blockgroup_lock, which fits nicely in a order 2
page in the worst case, separately.  This shinks down struct ext3_sb_info
enough to fit a 1 KB slab cache so now we allocate 16 KB + 1 KB instead of
32 KB saving 15 KB of memory.

Acked-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:00 -08:00
Josef Bacik f420d4dc42 jbd: improve fsync batching
There is a flaw with the way jbd handles fsync batching.  If we fsync() a
file and we were not the last person to run fsync() on this fs then we
automatically sleep for 1 jiffie in order to wait for new writers to join
into the transaction before forcing the commit.  The problem with this is
that with really fast storage (ie a Clariion) the time it takes to commit
a transaction to disk is way faster than 1 jiffie in most cases, so
sleeping means waiting longer with nothing to do than if we just committed
the transaction and kept going.  Ric Wheeler noticed this when using
fs_mark with more than 1 thread, the throughput would plummet as he added
more threads.

This patch attempts to fix this problem by recording the average time in
nanoseconds that it takes to commit a transaction to disk, and what time
we started the transaction.  If we run an fsync() and we have been running
for less time than it takes to commit the transaction to disk, we sleep
for the delta amount of time and then commit to disk.  We acheive
sub-jiffie sleeping using schedule_hrtimeout.  This means that the wait
time is auto-tuned to the speed of the underlying disk, instead of having
this static timeout.  I weighted the average according to somebody's
comments (Andreas Dilger I think) in order to help normalize random
outliers where we take way longer or way less time to commit than the
average.  I also have a min() check in there to make sure we don't sleep
longer than a jiffie in case our storage is super slow, this was requested
by Andrew.

I unfortunately do not have access to a Clariion, so I had to use a
ramdisk to represent a super fast array.  I tested with a SATA drive with
barrier=1 to make sure there was no regression with local disks, I tested
with a 4 way multipathed Apple Xserve RAID array and of course the
ramdisk.  I ran the following command

fs_mark -d /mnt/ext3-test -s 4096 -n 2000 -D 64 -t $i

where $i was 2, 4, 8, 16 and 32.  I mkfs'ed the fs each time.  Here are my
results

type	threads		with patch	without patch
sata	2		24.6		26.3
sata	4		49.2		48.1
sata	8		70.1		67.0
sata	16		104.0		94.1
sata	32		153.6		142.7

xserve	2		246.4		222.0
xserve	4		480.0		440.8
xserve	8		829.5		730.8
xserve	16		1172.7		1026.9
xserve	32		1816.3		1650.5

ramdisk	2		2538.3		1745.6
ramdisk	4		2942.3		661.9
ramdisk	8		2882.5		999.8
ramdisk	16		2738.7		1801.9
ramdisk	32		2541.9		2394.0

Signed-off-by: Josef Bacik <jbacik@redhat.com>
Cc: Andreas Dilger <adilger@sun.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Ric Wheeler <rwheeler@redhat.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:00 -08:00
Duane Griffin ef8b646183 ext2: tighten restrictions on inode flags
At the moment there are few restrictions on which flags may be set on
which inodes.  Specifically DIRSYNC may only be set on directories and
IMMUTABLE and APPEND may not be set on links.  Tighten that to disallow
TOPDIR being set on non-directories and only NODUMP and NOATIME to be set
on non-regular file, non-directories.

Introduces a flags masking function which masks flags based on mode and
use it during inode creation and when flags are set via the ioctl to
facilitate future consistency.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Andreas Dilger <adilger@sun.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:00 -08:00
Duane Griffin 0e090f1e05 ext2: don't inherit inappropriate inode flags from parent
At present BTREE/INDEX is the only flag that new ext2 inodes do NOT
inherit from their parent.  In addition prevent the flags DIRTY, ECOMPR,
INDEX, IMAGIC and TOPDIR from being inherited.  List inheritable flags
explicitly to prevent future flags from accidentally being inherited.

This fixes the TOPDIR flag inheritance bug reported at
http://bugzilla.kernel.org/show_bug.cgi?id=9866.

Signed-off-by: Duane Griffin <duaneg@dghda.com>
Acked-by: Andreas Dilger <adilger@sun.com>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:00 -08:00
Pekka J Enberg 18a82eb9f9 ext2: allocate ->s_blockgroup_lock separately
As spotted by kmemtrace, struct ext2_sb_info is 17024 bytes on 64-bit
which makes it a very bad fit for SLAB allocators.  The culprit of the
wasted memory is ->s_blockgroup_lock which can be as big as 16 KB when
NR_CPUS >= 32.

To fix that, allocate ->s_blockgroup_lock, which fits nicely in a order 2
page in the worst case, separately.  This shinks down struct ext2_sb_info
enough to fit a 1 KB slab cache so now we allocate 16 KB + 1 KB instead of
32 KB saving 15 KB of memory.

Acked-by: Andreas Dilger <adilger@sun.com>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <linux-ext4@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:31:00 -08:00
Alex Zeffertt 1107ba885e xen: add xenfs to allow usermode <-> Xen interaction
The xenfs filesystem exports various interfaces to usermode.  Initially
this exports a file to allow usermode to interact with xenbus/xenstore.

Traditionally this appeared in /proc/xen.  Rather than extending procfs,
this patch adds a backward-compat mountpoint on /proc/xen, and provides
a xenfs filesystem which can be mounted there.

Signed-off-by: Alex Zeffertt <alex.zeffertt@eu.citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-08 08:30:59 -08:00
Richard Purdie c835ee7f41 backlight: Add suspend/resume support to the backlight core
Add suspend/resume support to the backlight core and enable use of it
by appropriate drivers.

Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
2009-01-08 15:37:43 +00:00
Robert Richter d2852b932f Merge branch 'oprofile/ring_buffer' into oprofile/oprofile-for-tip 2009-01-08 14:27:34 +01:00
Peter Ujfalusi 741555568f ASoC: Merge the soc_value_enum to soc_enum struct
Merge the recently introduced soc_value_enum structure to the soc_enum.
The value based enums are still handled separately from the normal enum types,
but with the merge some of the newly introduced functions can be removed.

Signed-off-by: Peter Ujfalusi <peter.ujfalusi@nokia.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
2009-01-08 13:09:52 +00:00
Mark Brown 0081e8020e leds: Add WM8350 LED driver
The voltage and current regulators on the WM8350 AudioPlus PMIC can be
used in concert to provide a power efficient LED driver.  This driver
implements support for this within the standard LED class.

Platform initialisation code should configure the LED hardware in the
init callback provided by the WM8350 core driver.  The callback should
use wm8350_isink_set_flash(), wm8350_dcdc25_set_mode() and
wm8350_dcdc_set_slot() to configure the operating parameters of the
regulators for their hardware and then then use wm8350_register_led() to
instantiate the LED driver.

This driver was originally written by Liam Girdwood, though it has been
extensively modified since then.

Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
2009-01-08 12:38:58 +00:00
Riku Voipio 934cd3f979 leds: leds-pcs9532 - Move i2c work to a workqueque
Apparently these might be called under atomic context,
and i2c operations may sleep. BUG found by
Ross Burton <ross@burtonini.com>

Signed-off-by: Riku Voipio <riku.voipio@iki.fi>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
2009-01-08 12:38:58 +00:00
Wolfram Sang e2387d6c20 leds: Make header variable naming consistent
There is one place where the struct led_classdev as the function
argument is named differently. Fix it.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Signed-off-by: Richard Purdie <rpurdie@linux.intel.com>
2009-01-08 12:38:57 +00:00
Paul Mundt dd8632a12e NOMMU: Make mmap allocation page trimming behaviour configurable.
NOMMU mmap allocates a piece of memory for an mmap that's rounded up in size to
the nearest power-of-2 number of pages.  Currently it then discards the excess
pages back to the page allocator, making that memory available for use by other
things.  This can, however, cause greater amount of fragmentation.

To counter this, a sysctl is added in order to fine-tune the trimming
behaviour.  The default behaviour remains to trim pages aggressively, while
this can either be disabled completely or set to a higher page-granular
watermark in order to have finer-grained control.

vm region vm_top bits taken from an earlier patch by David Howells.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Mike Frysinger <vapier.adi@gmail.com>
2009-01-08 12:04:47 +00:00
David Howells 8feae13110 NOMMU: Make VMAs per MM as for MMU-mode linux
Make VMAs per mm_struct as for MMU-mode linux.  This solves two problems:

 (1) In SYSV SHM where nattch for a segment does not reflect the number of
     shmat's (and forks) done.

 (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
     exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
     that a VMA might be shared and already have its vm_mm assigned to another
     process or a dead process.

A new struct (vm_region) is introduced to track a mapped region and to remember
the circumstances under which it may be shared and the vm_list_struct structure
is discarded as it's no longer required.

This patch makes the following additional changes:

 (1) Regions are now allocated with alloc_pages() rather than kmalloc() and
     with no recourse to __GFP_COMP, so the pages are not composite.  Instead,
     each page has a reference on it held by the region.  Anything else that is
     interested in such a page will have to get a reference on it to retain it.
     When the pages are released due to unmapping, each page is passed to
     put_page() and will be freed when the page usage count reaches zero.

 (2) Excess pages are trimmed after an allocation as the allocation must be
     made as a power-of-2 quantity of pages.

 (3) VMAs are added to the parent MM's R/B tree and mmap lists.  As an MM may
     end up with overlapping VMAs within the tree, the VMA struct address is
     appended to the sort key.

 (4) Non-anonymous VMAs are now added to the backing inode's prio list.

 (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
     the backing region.  The VMA and region structs will be split if
     necessary.

 (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
     segment instead of all the attachments at that addresss.  Multiple
     shmat()'s return the same address under NOMMU-mode instead of different
     virtual addresses as under MMU-mode.

 (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.

 (8) /proc/maps is now the global list of mapped regions, and may list bits
     that aren't actually mapped anywhere.

 (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
     of RAM currently allocated by mmap to hold mappable regions that can't be
     mapped directly.  These are copies of the backing device or file if not
     anonymous.

These changes make NOMMU mode more similar to MMU mode.  The downside is that
NOMMU mode requires some extra memory to track things over NOMMU without this
patch (VMAs are no longer shared, and there are now region structs).

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Mike Frysinger <vapier.adi@gmail.com>
Acked-by: Paul Mundt <lethal@linux-sh.org>
2009-01-08 12:04:47 +00:00
Benjamin Krill 5886188dc7 serial: Add driver for the Cell Network Processor serial port NWP device
Add support for the nwp serial device which is connected to a DCR bus. It
uses the of_serial device driver to determine necessary properties from
the device tree.  The supported device is added as serial port number 85.

NWP stands for network processor and it is part of the QPACE - Quantum
Chromodynamics Parallel Computing on the Cell Broadband Engine project.
The implementation is a lightweight uart implementation with the focus
to consume as little resources as possible and it is connected to a
DCR bus.

Signed-off-by: Benjamin Krill <ben@codiert.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-01-08 16:25:18 +11:00
Linus Torvalds 713404d608 Merge branch 'for-2.6.29' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.29' of git://linux-nfs.org/~bfields/linux: (67 commits)
  nfsd: get rid of NFSD_VERSION
  nfsd: last_byte_offset
  nfsd: delete wrong file comment from nfsd/nfs4xdr.c
  nfsd: git rid of nfs4_cb_null_ops declaration
  nfsd: dprint each op status in nfsd4_proc_compound
  nfsd: add etoosmall to nfserrno
  NFSD: FIDs need to take precedence over UUIDs
  SUNRPC: The sunrpc server code should not be used by out-of-tree modules
  svc: Clean up deferred requests on transport destruction
  nfsd: fix double-locks of directory mutex
  svc: Move kfree of deferral record to common code
  CRED: Fix NFSD regression
  NLM: Clean up flow of control in make_socks() function
  NLM: Refactor make_socks() function
  nfsd: Ensure nfsv4 calls the underlying filesystem on LOCKT
  SUNRPC: Ensure the server closes sockets in a timely fashion
  NFSD: Add documenting comments for nfsctl interface
  NFSD: Replace open-coded integer with macro
  NFSD: Fix a handful of coding style issues in write_filehandle()
  NFSD: clean up failover sysctl function naming
  ...
2009-01-07 17:21:24 -08:00
Linus Torvalds b424e8d3b4 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (98 commits)
  PCI PM: Put PM callbacks in the order of execution
  PCI PM: Run default PM callbacks for all devices using new framework
  PCI PM: Register power state of devices during initialization
  PCI PM: Call pci_fixup_device from legacy routines
  PCI PM: Rearrange code in pci-driver.c
  PCI PM: Avoid touching devices behind bridges in unknown state
  PCI PM: Move pci_has_legacy_pm_support
  PCI PM: Power-manage devices without drivers during suspend-resume
  PCI PM: Add suspend counterpart of pci_reenable_device
  PCI PM: Fix poweroff and restore callbacks
  PCI: Use msleep instead of cpu_relax during ASPM link retraining
  PCI: PCIe portdrv: Add kerneldoc comments to remining core funtions
  PCI: PCIe portdrv: Rearrange code so that related things are together
  PCI: PCIe portdrv: Fix suspend and resume of PCI Express port services
  PCI: PCIe portdrv: Add kerneldoc comments to some core functions
  x86/PCI: Do not use interrupt links for devices using MSI-X
  net: sfc: Use pci_clear_master() to disable bus mastering
  PCI: Add pci_clear_master() as opposite of pci_set_master()
  PCI hotplug: remove redundant test in cpq hotplug
  PCI: pciehp: cleanup register and field definitions
  ...
2009-01-07 15:41:01 -08:00
Linus Torvalds 7c7758f99d Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (123 commits)
  wimax/i2400m: add CREDITS and MAINTAINERS entries
  wimax: export linux/wimax.h and linux/wimax/i2400m.h with headers_install
  i2400m: Makefile and Kconfig
  i2400m/SDIO: TX and RX path backends
  i2400m/SDIO: firmware upload backend
  i2400m/SDIO: probe/disconnect, dev init/shutdown and reset backends
  i2400m/SDIO: header for the SDIO subdriver
  i2400m/USB: TX and RX path backends
  i2400m/USB: firmware upload backend
  i2400m/USB: probe/disconnect, dev init/shutdown and reset backends
  i2400m/USB: header for the USB bus driver
  i2400m: debugfs controls
  i2400m: various functions for device management
  i2400m: RX and TX data/control paths
  i2400m: firmware loading and bootrom initialization
  i2400m: linkage to the networking stack
  i2400m: Generic probe/disconnect, reset and message passing
  i2400m: host/device procotol and core driver definitions
  i2400m: documentation and instructions for usage
  wimax: Makefile, Kconfig and docbook linkage for the stack
  ...
2009-01-07 15:37:24 -08:00
Linus Torvalds 67acd8b4b7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async
* git://git.kernel.org/pub/scm/linux/kernel/git/arjan/linux-2.6-async:
  async: don't do the initcall stuff post boot
  bootchart: improve output based on Dave Jones' feedback
  async: make the final inode deletion an asynchronous event
  fastboot: Make libata initialization even more async
  fastboot: make the libata port scan asynchronous
  fastboot: make scsi probes asynchronous
  async: Asynchronous function calls to speed up kernel boot
2009-01-07 15:35:47 -08:00
Benny Halevy db43910cb4 nfsd: get rid of NFSD_VERSION
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-07 17:38:32 -05:00
Benny Halevy 87df4de807 nfsd: last_byte_offset
refactor the nfs4 server lock code to use last_byte_offset
to compute the last byte covered by the lock.  Check for overflow
so that the last byte is set to NFS4_MAX_UINT64 if offset + len
wraps around.

Also, use NFS4_MAX_UINT64 for ~(u64)0 where appropriate.

Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
2009-01-07 17:38:31 -05:00
Robert Richter 14f0ca8eae oprofile: make new cpu buffer functions part of the api
This patch creates the new functions

 oprofile_write_reserve()
 oprofile_add_data()
 oprofile_write_commit()

and makes them part of the oprofile api.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2009-01-07 22:48:15 +01:00
Linus Torvalds c6906a2cb7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
  kbuild: fix typos (s/bin_shipped/bin.o_shipped/) in Documentation
  kbuild: add a symlink to the source for separate objdirs
  kconfig: add script to manipulate .config files on the command line
  kbuild: reintroduce ALLSOURCE_ARCHS support for tags/cscope
  bootchart: improve output based on Dave Jones' feedback
  fix modules_install via NFS
  qnx: include <linux/types.h> for definitions of __[us]{8,16,32,64} types
2009-01-07 13:11:28 -08:00
Anders Larsen 8d1a0a13ed qnx: include <linux/types.h> for definitions of __[us]{8,16,32,64} types
On 2008-12-30 11:32:33, Sam Ravnborg wrote:
> We have added a few additional validation checks of the userspace headers:
...
> 3) We should include <linux/types.h> and not <asm/types.h>
> 4) If we use a __[us]{8,16,32,64} type then we must include <linux/types.h>

Satisfy these requirements for the linux/qnx*.h headers.

Signed-off-by: Anders Larsen <al@alarsen.net>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2009-01-07 21:44:20 +01:00
Linus Torvalds fa7b906e7f Merge branch 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6
* 'i2c-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
  i2c: Use snprintf to set adapter names
  Input: apanel - convert to new i2c binding
  i2c: Drop I2C_CLASS_CAM_DIGITAL
  i2c: Drop I2C_CLASS_CAM_ANALOG and I2C_CLASS_SOUND
  i2c: Drop I2C_CLASS_ALL
  i2c: Get rid of remaining bus_id access
  i2c: Replace bus_id with dev_name(), dev_set_name()
2009-01-07 11:59:27 -08:00
Linus Torvalds 08249903ea Merge git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/hskinnemoen/avr32-2.6:
  avr32: Move syscalls.h under arch/avr32/include/asm/
  avr32: Define DIE_OOPS
  avr32: Remove DMATEST from defconfigs
  arch/avr32: Eliminate NULL test and memset after alloc_bootmem
  avr32: data param to at32_add_device_mci() must be non-NULL
  atmel-mci: move atmel-mci.h file to include/linux
  avr32: Hammerhead board support
  avr32: Allow reserving multiple pins at once
  favr-32: Remove deprecated call
  MIMC200: Remove deprecated call
  avr: struct device - replace bus_id with dev_name(), dev_set_name()
  avr32: Introducing asm/syscalls.h
2009-01-07 11:58:30 -08:00
Linus Torvalds 52fefcec97 Merge git://git.kernel.org/pub/scm/linux/kernel/git/czankel/xtensa-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/czankel/xtensa-2.6:
  xtensa: Update platform files to reflect new location of the header files.
  xtensa: switch to packed struct unaligned access implementation
  xtensa: Add xt2000 support files.
  xtensa: move headers files to arch/xtensa/include
  xtensa: use the new byteorder headers
2009-01-07 11:56:29 -08:00
Linus Torvalds 57c44c5f6f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
  trivial: chack -> check typo fix in main Makefile
  trivial: Add a space (and a comma) to a printk in 8250 driver
  trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
  trivial: Fix misspelling of "firmware" in powerpc Makefile
  trivial: Fix misspelling of "firmware" in usb.c
  trivial: Fix misspelling of "firmware" in qla1280.c
  trivial: Fix misspelling of "firmware" in a100u2w.c
  trivial: Fix misspelling of "firmware" in megaraid.c
  trivial: Fix misspelling of "firmware" in ql4_mbx.c
  trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
  trivial: Fix misspelling of "firmware" in ipw2100.c
  trivial: Fix misspelling of "firmware" in atmel.c
  trivial: Fix misspelled firmware in Kconfig
  trivial: fix an -> a typos in documentation and comments
  trivial: fix then -> than typos in comments and documentation
  trivial: update Jesper Juhl CREDITS entry with new email
  trivial: fix singal -> signal typo
  trivial: Fix incorrect use of "loose" in event.c
  trivial: printk: fix indentation of new_text_line declaration
  trivial: rtc-stk17ta8: fix sparse warning
  ...
2009-01-07 11:31:52 -08:00
Detlef Riekenberg 940fbf411e linux/types.h: Don't depend on __GNUC__ for __le64/__be64
The typedefs for __u64 and __s64 where fixed to be available for other
compiler on May 2 2008 by H.  Peter Anvin (in commit edfa5cfa3d)

Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Detlef Riekenberg <wine.dev@web.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-01-07 11:27:12 -08:00
Rafael J. Wysocki 16cf0ebc35 x86/PCI: Do not use interrupt links for devices using MSI-X
pcibios_enable_device() and pcibios_disable_device() don't handle
IRQs for devices that have MSI enabled and it should treat the
devices with MSI-X enabled in the same way.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:25 -08:00
Ben Hutchings 6a479079c0 PCI: Add pci_clear_master() as opposite of pci_set_master()
During an online device reset it may be useful to disable bus-mastering.
pci_disable_device() does that, and far more besides, so is not suitable
for an online reset.

Add pci_clear_master() which does just this.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:23 -08:00
Kenji Kaneshige 322162a71b PCI: pciehp: cleanup register and field definitions
Clean up register definitions related to PCI Express Hot plug.

  - Add register definitions into include/linux/pci_regs.h, and use
    them instead of pciehp's locally definied register definitions.
  - Remove pciehp's locally defined register definitions
  - Remove unused register definitions in pciehp.
  - Some minor cleanups.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:22 -08:00
Stephen Hemminger db5679437a PCI: add interface to set visible size of VPD
The VPD on all devices may not be 32K. Unfortunately, there is no
generic way to find the size, so this adds a simple API hook
to reset it.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:18 -08:00
Stephen Hemminger 287d19ce2e PCI: revise VPD access interface
Change PCI VPD API which was only used by sysfs to something usable
in drivers.
   * move iteration over multiple words to the low level
   * use conventional types for arguments
   * add exportable wrapper

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:17 -08:00
Bjorn Helgaas 68feac87de PCI: add pci_common_swizzle() for INTx swizzling
This patch adds pci_common_swizzle(), which swizzles INTx values all the
way up to a root bridge.

This common implementation can replace several architecture-specific
ones.  This should someday be combined with pci_get_interrupt_pin(),
but I left it separate for now to make reviewing easier.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:12 -08:00
Kenji Kaneshige e8c331e963 PCI hotplug: introduce functions for ACPI slot detection
Some ACPI related PCI hotplug code can be shared among PCI hotplug
drivers. This patch introduces the following functions in
drivers/pci/hotplug/acpi_pcihp.c to share the code, and changes
acpiphp and pciehp to use them.

- int acpi_pci_detect_ejectable(struct pci_bus *pbus)
  This checks if the specified PCI bus has ejectable slots.

- int acpi_pci_check_ejectable(struct pci_bus *pbus, acpi_handle handle)
  This checks if the specified handle is ejectable ACPI PCI slot. The
  'pbus' parameter is needed to check if 'handle' is PCI related ACPI
  object.

This patch also introduces the following inline function in
include/linux/pci-acpi.h, which is useful to get ACPI handle of the
PCI bridge from struct pci_bus of the bridge's secondary bus.

- static inline acpi_handle acpi_pci_get_bridge_handle(struct pci_bus *pbus)
  This returns ACPI handle of the PCI bridge which generates PCI bus
  specified by 'pbus'.

Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:11 -08:00
Yu Zhao fde09c6d8f PCI: define PCI resource names in an 'enum'
This patch moves all definitions of the PCI resource names to an 'enum',
and also replaces some hard-coded resource variables with symbol
names. This change eases introduction of device specific resources.

Reviewed-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:01 -08:00
Yu Zhao 14add80b51 PCI: remove unnecessary arg of pci_update_resource()
This cleanup removes unnecessary argument 'struct resource *res' in
pci_update_resource(), so it takes same arguments as other companion
functions (pci_assign_resource(), etc.).

Signed-off-by: Yu Zhao <yu.zhao@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:13:00 -08:00
Andrew Morton 1684f5ddd4 PCI: uninline pci_ioremap_bar()
It's too large to be inlined.

Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:57 -08:00
Bjorn Helgaas 57c2cf71c1 PCI: add pci_swizzle_interrupt_pin()
This patch adds pci_swizzle_interrupt_pin(), which implements the
INTx swizzling algorithm specified in Table 9-1 of the "PCI-to-PCI
Bridge Architecture Specification," revision 1.2.

There are many architecture-specific implementations of this
swizzle that can be replaced by this common one.

Reviewed-by: David Howells <dhowells@redhat.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:50 -08:00
Arjan van de Ven e8de1481fd resource: allow MMIO exclusivity for device drivers
Device drivers that use pci_request_regions() (and similar APIs) have a
reasonable expectation that they are the only ones accessing their device.
As part of the e1000e hunt, we were afraid that some userland (X or some
bootsplash stuff) was mapping the MMIO region that the driver thought it
had exclusively via /dev/mem or via various sysfs resource mappings.

This patch adds the option for device drivers to cause their reserved
regions to the "banned from /dev/mem use" list, so now both kernel memory
and device-exclusive MMIO regions are banned.
NOTE: This is only active when CONFIG_STRICT_DEVMEM is set.

In addition to the config option, a kernel parameter iomem=relaxed is
provided for the cases where developers want to diagnose, in the field,
drivers issues from userspace.

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:32 -08:00
Andrew Patterson 2361694191 ACPI/PCI: remove obsolete _OSC capability support functions
The acpi_query_osc, __pci_osc_support_set, pci_osc_support_set, and
pcie_osc_support_set functions have been obsoleted in favor of setting
these capabilities during root bridge discovery with
pci_acpi_osc_support.  There are no longer any callers of these
functions, so remove them.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:32 -08:00
Andrew Patterson 07ae95f988 ACPI/PCI: PCI MSI _OSC support capabilities called when root bridge added
The _OSC capability OSC_MSI_SUPPORT is set when the root bridge is added
with pci_acpi_osc_support(), so we no longer need to do it in the PCI
MSI driver.  Also adds the function pci_msi_enabled, which returns true
if pci=nomsi is not on the kernel command-line.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:31 -08:00
Andrew Patterson 3e1b16002a ACPI/PCI: PCIe ASPM _OSC support capabilities called when root bridge added
The _OSC capabilities OSC_ACTIVE_STATE_PWR_SUPPORT and
OSC_CLOCK_PWR_CAPABILITY_SUPPORT are set when the root bridge is added
with pci_acpi_osc_support(), so we no longer need to do it in the ASPM
driver.  Also add the function pcie_aspm_enabled, which returns true if
pcie_aspm=off is not on the kernel command-line.

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:29 -08:00
Andrew Patterson 0ef5f8f615 ACPI/PCI: PCI extended config _OSC support called when root bridge added
The _OSC capability OSC_EXT_PCI_CONFIG_SUPPORT is set when the root
bridge is added with pci_acpi_osc_support() if we can access PCI
extended config space.

This adds the function pci_ext_cfg_avail which returns true if we can
access PCI extended config space (offset greater than 0xff). It
currently only returns false if arch=x86 and raw_pci_ext_ops is not set
(which might happen if pci=nommcfg is set on the kernel command-line).

Signed-off-by: Andrew Patterson <andrew.patterson@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:28 -08:00
Andrew Patterson 990a7ac564 ACPI/PCI: call _OSC support during root bridge discovery
Add pci_acpi_osc_support() and call it when a PCI bridge is added.  This
allows us to avoid having every individual PCI root bridge driver call
_OSC support for every root bridge in their probe functions, a
significant savings in boot time.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:27 -08:00
Andrew Patterson 8b62091e20 ACPI/PCI: include missing acpi.h file in pci-acpi.h.
The pci-acpi.h file will not compile without including linux/acpi.h.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:26 -08:00
Sheng Yang f7b7baae6b PCI: add PCI Advanced Feature Capability defines
PCI Advanced Features Capability is introduced by "Conventional PCI
Advanced Caps ECN" (can be downloaded in pcisig.com).  Add defines for
the various AF capabilities, including function level reset (FLR).

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2009-01-07 11:12:24 -08:00
Inaky Perez-Gonzalez e306987434 wimax: export linux/wimax.h and linux/wimax/i2400m.h with headers_install
These two files are what user space can use to establish communication
with the WiMAX kernel API and to speak the Intel 2400m Wireless WiMAX
connection's control protocol.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:22 -08:00
Inaky Perez-Gonzalez ea24652d25 i2400m: host/device procotol and core driver definitions
The wimax/i2400m.h defines the structures and constants for the
host-device protocols:

 - boot / firmware upload protocol

 - general data transport protocol

 - control protocol

It is done in such a way that can also be used verbatim by user space.

drivers/net/wimax/i2400m.h defines all the APIs used by the core,
bus-generic driver (i2400m) and the bus specific drivers
(i2400m-BUSNAME). It also gives a roadmap to the driver
implementation.

debug-levels.h adds the core driver's debug settings.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:18 -08:00
Inaky Perez-Gonzalez ea912f4e7f wimax: debug macros and debug settings for the WiMAX stack
This file contains a simple debug framework that is used in the stack;
it allows the debug level to be controlled at compile-time (so the
debug code is optimized out) and at run-time (for what wasn't compiled
out).

This is eventually going to be moved to use dynamic_printk(). Just
need to find time to do it.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:17 -08:00
Inaky Perez-Gonzalez ace22f0881 wimax: headers for kernel API and user space interaction
Definitions for the user/kernel API protocol through generic
netlink. User space can copy it verbatim and use it.

Kernel API definition declares the main data types and calls for the
drivers to integrate into the WiMAX stack. Provides usage
documentation.

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:16 -08:00
Inaky Perez-Gonzalez 5e07878787 debugfs: add helpers for exporting a size_t simple value
In the same spirit as debugfs_create_*(), introduce helpers for
exporting size_t values over debugfs.

The only trick done is that the format verifier is kept at %llu
instead of %zu; otherwise type warnings would pop up:

format ‘%zu’ expects type ‘size_t’, but argument 2 has type ‘long long unsigned int’

There is no real way to fix this one--however, we can consider %llu
and %zu to be compatible if we consider that we are using the same for
validating in debugfs_create_{x,u}{8,16,32}().

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:16 -08:00
Greg Kroah-Hartman 34c65d82e0 USB: remove info() macro from usb.h
USB should not be having it's own printk macros, so remove info() and
use the system-wide standard of dev_info() wherever possible.

No one in the tree is using the macro, so it can now be removed.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:14 -08:00
Greg Kroah-Hartman 338b67b0c1 USB: remove warn() macro from usb.h
USB should not be having it's own printk macros, so remove warn() and
use the system-wide standard of dev_warn() wherever possible.  In the
few places that will not work out, use a basic printk().

Now that all in-tree users are gone, remove the macro.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:14 -08:00
Alan Stern 25ff1c316f USB: storage: add last-sector hacks
This patch (as1189b) adds some hacks to usb-storage for dealing with
the growing problems involving bad capacity values and last-sector
accesses:

	A new flag, US_FL_CAPACITY_OK, is created to indicate that
	the device is known to report its capacity correctly.  An
	unusual_devs entry for Linux's own File-backed Storage Gadget
	is added with this flag set, since g_file_storage always
	reports the correct capacity and since the capacity need
	not be even (it is determined by the size of the backing
	file).

	An entry in unusual_devs.h which has only the CAPACITY_OK
	flag set shouldn't prejudice libusual, since the device will
	work perfectly well with either usb-storage or ub.  So a
	new macro, COMPLIANT_DEV, is added to let libusual know
	about these entries.

	When a last-sector access succeeds and the total number of
	sectors is odd (the unexpected case, in which guessing that
	the number is even might cause trouble), a WARN is triggered.
	The kerneloops.org project will collect these warnings,
	allowing us to add CAPACITY_OK flags for the devices in
	question before implementing the default-to-even heuristic.
	If users want to prevent the stack dump produced by the WARN,
	they can disable the hack by adding an unusual_devs entry
	for their device with the CAPACITY_OK flag.

	When a last-sector access fails three times in a row and
	neither the FIX_CAPACITY nor the CAPACITY_OK flag is set,
	we assume the last-sector bug is present.  We replace the
	existing status and sense data with values that will cause
	the SCSI core to fail the access immediately rather than
	retry indefinitely.  This should fix the difficulties
	people have been having with Nokia phones.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: stable <stable@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:11 -08:00
Oliver Neukum 856395d6e1 USB: extension of anchor API to unpoison an anchor
This extension allows unpoisoning an anchor allowing drivers that
resubmit URBs to reuse an anchor for methods like resume()

Signed-off-by: Oliver Neukum <oneukum@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:11 -08:00
Ming Lei 49367d8f1d USB: mark "reject" field of struct urb as atomic_t
It is enough to protect accesses to reject field of urb
by marking it as atomic_t,also it is the only reason of
existence of usb_reject_lock,so remove the lock to make
code more clean.

Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Acked-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:08 -08:00
Alan Stern 3b23dd6f8a USB: utilize the bus notifiers
This patch (as1185) makes usbcore take advantage of the bus
notifications sent out by the driver core.  Now we can create all our
device and interface attribute files before the device or interface
uevent is broadcast.

A side effect is that we no longer create the endpoint "pseudo"
devices at the same time as a device or interface is registered -- it
seems like a bad idea to try registering an endpoint before the
registration of its parent is complete.  So the routines for creating
and removing endpoint devices have been split out and renamed, and
they are called explicitly when needed.  A new bitflag is used for
keeping track of whether or not the interface's endpoint devices have
been created, since (just as with the interface attributes) they vary
with the altsetting and hence can be changed at random times.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:08 -08:00
Bryan Wu 2ffcdb3bda USB: musb: use new platform data interface of musb to replace old one
Signed-off-by: Bryan Wu <cooloney@kernel.org>
Signed-off-by: Felipe Balbi <felipe.balbi@nokia.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:06 -08:00
Alan Stern 65bfd2967c USB: Enhance usage of pm_message_t
This patch (as1177) modifies the USB core suspend and resume
routines.  The resume functions now will take a pm_message_t argument,
so they will know what sort of resume is occurring.  The new argument
is also passed to the port suspend/resume and bus suspend/resume
routines (although they don't use it for anything but debugging).

In addition, special pm_message_t values are used for user-initiated,
device-initiated (i.e., remote wakeup), and automatic suspend/resume.
By testing these values, drivers can tell whether or not a particular
suspend was an autosuspend.  Unfortunately, they can't do the same for
resumes -- not until the pm_message_t argument is also passed to the
drivers' resume methods.  That will require a bigger change.

IMO, the whole Power Management framework should have been set up this
way in the first place.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:03 -08:00
Philipp Zabel 68144e0cc9 USB: otg: add otg_put_transceiver()
As Russell King points out, calling put_device(otg_transceiver->dev)
directly in driver cleanup paths makes assumptions about otg_transceiver
internals.

Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:02 -08:00
Philipp Zabel 6084f1bf0c USB: otg: gpio_vbus transceiver stub
gpio_vbus provides simple GPIO VBUS sensing for peripheral
controllers with an internal transceiver.
Optionally, a second GPIO can be used to control D+ pullup.

It also interfaces with the regulator framework to limit charging
currents when powered via USB. gpio_vbus requests the regulator
supplying "vbus_draw" and can enable/disable it or limit its
current depending on USB state.

[dbrownell@users.sourceforge.net: use drivers/otg, cleanups ]

Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 10:00:02 -08:00
Ben Efros 1537e0ad94 USB: storage devices and SAT
Add the SANE SENSE flag to indicate that a device is capable of handling
more than 18-bytes of sense data.  This functionality is required for
USB-ATA bridges implementing SAT.  A future patch will actually enable this
function for several devices.

The logic behind this is that we can detect support for SANE_SENSE in a few ways:
 1) ATA PASS THROUGH (12) or (16) execute successfully
 2) SPC-3 or higher is in use
 3) A previous CHECK CONDITION occurred with sense format 70-73 and had
    a length greater than 18-bytes total

Signed-off-by: Ben Efros <ben@pc-doctor.com>
Signed-off-by: Matthew Dharm <mdharm-usb@one-eyed-alien.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 09:59:55 -08:00
Pete Zaitcev f150fa1afb USB: Allow usbmon as a module even if usbcore is builtin
usbmon can only be built as a module if usbcore is a module too. Trivial
changes to the relevant Kconfig and Makefile (and a few trivial changes
elsewhere) allow usbmon to be built as a module even if usbcore is
builtin.

This is verified to work in all 9 permutations (3 correctly prohibited
by Kconfig, 6 build a suitable result).

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Pete Zaitcev <zaitcev@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 09:59:54 -08:00
Inaky Perez-Gonzalez dc023dceec USB: Introduce usb_queue_reset() to do resets from atomic contexts
This patch introduces a new call to be able to do a USB reset from an
atomic contect. This is quite helpful in USB callbacks to handle
errors (when the only thing that can be done is to do a device
reset).

It is done queuing a work struct that will do the actual reset. The
struct is "attached" to an interface so pending requests from an
interface are removed when said interface is unbound from the driver.

The call flow then becomes:

usb_queue_reset_device()
  __usb_queue_reset_device() [workqueue]
    usb_reset_device()

usb_probe_interface()
  usb_cancel_queue_reset()      [error path]

usb_unbind_interface()
  usb_cancel_queue_reset()

usb_driver_release_interface()
  usb_cancel_queue_reset()

Note usb_cancel_queue_reset() needs smarts to try not to unqueue when
it is actually being executed. This happens when we run the reset from
the workqueue: usb_reset_device() is called and on interface unbind
time, usb_cancel_queue_reset() would be called. That would deadlock on
cancel_work_sync(). To avoid that, we set (before running
usb_reset_device()) usb_intf->reset_running and clear it inmediately
after returning.

Patch is against 2.6.28-rc2 and depends on
http://marc.info/?l=linux-usb&m=122581634925308&w=2 (as submitted by
Alan Stern).

Signed-off-by: Inaky Perez-Gonzalez <inaky@linux.intel.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-01-07 09:59:53 -08:00