Commit Graph

2000 Commits

Author SHA1 Message Date
Andrei Vagin 769031c92c scsi: target: target/file: Add support of direct and async I/O
There are two advantages:

* Direct I/O allows to avoid the write-back cache, so it reduces affects
  to other processes in the system.
* Async I/O allows to handle a few commands concurrently.

DIO + AIO shows a better perfomance for random write operations:

Mode: O_DSYNC Async: 1
$ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sda --runtime=20 --numjobs=2
  WRITE: bw=45.9MiB/s (48.1MB/s), 21.9MiB/s-23.0MiB/s (22.0MB/s-25.2MB/s), io=919MiB (963MB), run=20002-20020msec

Mode: O_DSYNC Async: 0
$ ./fio --bs=4K --direct=1 --rw=randwrite --ioengine=libaio --iodepth=64 --name=/dev/sdb --runtime=20 --numjobs=2
  WRITE: bw=1607KiB/s (1645kB/s), 802KiB/s-805KiB/s (821kB/s-824kB/s), io=31.8MiB (33.4MB), run=20280-20295msec

Known issue:

DIF (PI) emulation doesn't work when a target uses async I/O, because
DIF metadata is saved in a separate file, and it is another non-trivial
task how to synchronize writing in two files, so that a following read
operation always returns a consisten metadata for a specified block.

Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Andrei Vagin <avagin@openvz.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-14 22:40:08 -04:00
Kent Overstreet f4f8154a08 block: Use bioset_init() for fs_bio_set
Minor optimization - remove a pointer indirection when using fs_bio_set.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14 13:16:06 -06:00
Christoph Hellwig ff005a0662 block: sanitize blk_get_request calling conventions
Switch everyone to blk_get_request_flags, and then rename
blk_get_request_flags to blk_get_request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14 08:55:12 -06:00
Zhu Lingshan 33d065ccb3 scsi: tcmu: refactor nl wr_cache attr with new helpers
use new netlink events helpers tcmu_netlink_init() and
tcmu_netlink_send() to refactor netlink event attribute
TCMU_ATTR_WRITECACHE(belongs to TCMU_CMD_RECONFIG_DEVICE) which is also
emulate_write_cache in configFS.

Removed tcmu_netlink_event() since we have new netlink
events helpers now.

Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-08 01:52:50 -04:00
Zhu Lingshan 84e2850628 scsi: tcmu: refactor nl dev_size attr with new helpers
use new netlink events helpers tcmu_netlink_init() and
tcmu_netlink_send() to refactor netlink event attribute
TCMU_ATTR_DEV_SIZE(belongs to TCMU_CMD_RECONFIG_DEVICE) which is also
dev_size in configFS.

Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-08 01:52:50 -04:00
Zhu Lingshan 02ccfb54ba scsi: tcmu: refactor nl dev_cfg attr with new nl helpers
use new netlink events helpers tcmu_netlink_init() and
tcmu_netlink_send() to refactor netlink event attribute
TCMU_ATTR_DEV_CFG(belongs to TCMU_CMD_RECONFIG_DEVICE) which is also
dev_config in configFS.

Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-08 01:52:50 -04:00
Zhu Lingshan f892bd8ec1 scsi: tcmu: refactor rm_device cmd with new nl helpers
use new netlink events helpers tcmu_netlink_init() and
tcmu_netlink_send() to refactor netlink event TCMU_CMD_REMOVED_DEVICE

Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-08 01:52:50 -04:00
Zhu Lingshan e0c240ac3a scsi: tcmu: refactor add_device cmd with new nl helpers
use new netlink events helpers tcmu_netlink_init() and
tcmu_netlink_send() to refactor netlink event TCMU_CMD_ADDED_DEVICE

Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-08 01:52:49 -04:00
Zhu Lingshan 0e5aee3939 scsi: tcmu: add new netlink events helpers
Add new netlink events helpers tcmu_netlink_event_init() and
tcmu_netlink_event_send(). These new functions intend to replace
existing netlink events helper function tcmu_netlink_event().

The existing function tcmu_netlink_event() works well for events like
TCMU_ADDED_DEVICE and TCMU_REMOVED_DEVICE which only has one netlink
attribute. But if there is a command requires more than one attributes
to send out, we have to use a struct to adapt the paremeter
reconfig_data, it is hard to use one struct or a union in one struct to
adapt every command with different attributes, it may get long and ugly.

With the new two functions, we can call tcmu_netlink_event_init() to
initialize a netlink event, then add all attributes we need by using
nla_put_xxx(), at last use tcmu_netlink_event_send() to send it out. So
that we don't need to use a long struct or union if we want to send
mulitple attributes for different commands.

[mkp: typos]

Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-05-08 01:52:49 -04:00
Bryant G Ly f5957dade4 scsi: target: Fix fortify_panic kernel exception
memcmp() requires the two buffers passed as arguments to be at least
'size' bytes long, otherwise a fortify_panic will trigger.

Use memchr_inv() instead of memcmp() to determine whether the received
payload is zeroed or not.

The bug was found by running a block backstore via LIO.

[  496.212958] Call Trace:
[  496.212960] [c0000007e58e3800] [c000000000cbbefc] fortify_panic+0x24/0x38 (unreliable)
[  496.212965] [c0000007e58e3860] [d00000000f150c28] iblock_execute_write_same+0x3b8/0x3c0 [target_core_iblock]
[  496.212976] [c0000007e58e3910] [d000000006c737d4] __target_execute_cmd+0x54/0x150 [target_core_mod]
[  496.212982] [c0000007e58e3940] [d000000006d32ce4] ibmvscsis_write_pending+0x74/0xe0 [ibmvscsis]
[  496.212991] [c0000007e58e39b0] [d000000006c74fc8] transport_generic_new_cmd+0x318/0x370 [target_core_mod]
[  496.213001] [c0000007e58e3a30] [d000000006c75084] transport_handle_cdb_direct+0x64/0xd0 [target_core_mod]
[  496.213011] [c0000007e58e3aa0] [d000000006c75298] target_submit_cmd_map_sgls+0x1a8/0x320 [target_core_mod]
[  496.213021] [c0000007e58e3b30] [d000000006c75458] target_submit_cmd+0x48/0x60 [target_core_mod]
[  496.213026] [c0000007e58e3bd0] [d000000006d34c20] ibmvscsis_scheduler+0x370/0x600 [ibmvscsis]
[  496.213031] [c0000007e58e3c90] [c00000000013135c] process_one_work+0x1ec/0x580
[  496.213035] [c0000007e58e3d20] [c000000000131798] worker_thread+0xa8/0x600
[  496.213039] [c0000007e58e3dc0] [c00000000013a468] kthread+0x168/0x1b0
[  496.213044] [c0000007e58e3e30] [c00000000000b528] ret_from_kernel_thread+0x5c/0xb4

[mkp: tweaked commit message]

Fixes: 2237498f0b ("target/iblock: Convert WRITE_SAME to blkdev_issue_zeroout")
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-by: Steven Royer <seroyer@linux.vnet.ibm.com>
Tested-by: Taylor Jakobson <tjakobs@us.ibm.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Cc: <stable@vger.kernel.org> # v4.13+
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-20 19:24:01 -04:00
Randy Dunlap 572ccdab50 scsi: target: target_core_user.[ch]: convert comments into DOC:
Make documentation on target-supported userspace-I/O design be
usable by kernel-doc by using "DOC:". This is used in the driver-api
Documentation chapter.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
To: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: linux-scsi@vger.kernel.org
Cc: target-devel@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-20 19:14:39 -04:00
Randy Dunlap 9ad97b8b40 scsi: target: target_core_transport.c: enable+fix kernel-doc
For exported functions that already have near-kernel-doc notation,
fix them to begin with "/**" and make a few corrections so that they
don't have any kernel-doc warnings.

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
To: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: linux-scsi@vger.kernel.org
Cc: target-devel@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-20 19:14:38 -04:00
Randy Dunlap 1e74aff12d scsi: target: target_core_transport.c: fix kernel-doc warnings
Correct a function parameter's name to eliminate kernel-doc warnings
in drivers/target/target_core_transport.c.

Fixes these kernel-doc warnings: (tested by adding these files to a new
target.rst documentation file)

../drivers/target/target_core_transport.c:1671: warning: No description found for parameter 'fabric_tmr_ptr'
../drivers/target/target_core_transport.c:1671: warning: Excess function parameter 'fabric_context' description in 'target_submit_tmr'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
To: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: linux-scsi@vger.kernel.org
Cc: target-devel@vger.kernel.org
Cc: linux-doc@vger.kernel.org
Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-20 19:14:38 -04:00
Souptick Joarder 69589c9bb9 scsi: target: Change return type to vm_fault_t
Use new return type vm_fault_t for fault handler in struct
vm_operations_struct.

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-19 00:49:37 -04:00
Lee Duncan 78a6295c71 scsi: target: prefer dbroot of /etc/target over /var/target
The target database root directory, dbroot, has defaulted to /var/target
for a while, but its main client, targetcli-fb, has been moving it to
/etc/target for quite some time. With the plethora of target drivers now
appearing, it has become more difficult to initialize this attribute
before use by any child drivers.

If the directory /etc/target exists, use that as the DB root. Otherwise,
fall back to using /var/target.

The ability to override this dbroot attribute still exists via sysfs.

Signed-off-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-19 00:47:02 -04:00
Ming Lei 8e1ceafe50 scsi: target: fix crash with iscsi target and dvd
When the current page can't be added to bio, one new bio should be
created for adding this page again, instead of ignoring this page.

This patch fixes kernel crash with iscsi target and dvd, as reported by
Wakko.

Cc: Wakko Warner <wakko@animx.eu.org>
Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
Cc: target-devel@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Christoph Hellwig <hch@lst.de>
Fixes: 84c8590646 ("target: avoid accessing .bi_vcnt directly")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2018-04-19 00:41:03 -04:00
Linus Torvalds 3526dd0c78 for-4.17/block-20180402
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJawr05AAoJEPfTWPspceCmT2UP/1uuaqwzyl4VjFNb/k7KS7UM
 +Cs/1HBlGomgMA8orDTGqtWqLRdR3z4RSh0+MvXTzQ78HpFVYz7CbDc9itHm+G9M
 X0ypD4kF/JGCFb5cxk+x6qv28uO2nv4DP3+0hHqJWLH4UVJBWDY6bs4BPShsf9QB
 I6XjioNMhoqylXgdOITLODJZz+TcChlJMDAqwhpJwh9TH1wjobleAZ6AdmCPfgi5
 h0UCKMUKzcVJlNZwQUrzrs2cxcx9Uhunnbz7HK0ZV4n/FKFtDpGynFpQQ71pZxKe
 Be0ZOBPCQvC3ykOM/egCIvC/e5y7FgrjORD6jxyu1PTwAugI5E1VYSMxHkXvgPAx
 zOo9A7RT4GPO2tDQv+DbzNFpqeSAclTgSmr+/y1wmheBs8DiSt7MPVBiNM4zdCNv
 NLk9z7IEjFhdmluSB/LbTb1aokypMb/q7QTLouPHdwGn80k7yrhFyLHgdjpNTQ2K
 UHfHZvGxkOX6SmFhBNOtIFUkuSceenh64a0RkRle7filx+ImpbCVm2/GYi9zZNCu
 EtctgzLbLmz40zMiyDaZS2bxBgGzfn6yf4xd9LsaAJPMhvZnmXogT0D9ctWXB0WU
 mMaS7sOkLnNjnGkzF1fHkeiZ/oigrstJbe+CA7BtOdwxpWn6MZBgKEoFQ6iA2b3X
 5J1axMgVH5LAsIEcEQVq
 =RVhK
 -----END PGP SIGNATURE-----

Merge tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block

Pull block layer updates from Jens Axboe:
 "It's a pretty quiet round this time, which is nice. This contains:

   - series from Bart, cleaning up the way we set/test/clear atomic
     queue flags.

   - series from Bart, fixing races between gendisk and queue
     registration and removal.

   - set of bcache fixes and improvements from various folks, by way of
     Michael Lyle.

   - set of lightnvm updates from Matias, most of it being the 1.2 to
     2.0 transition.

   - removal of unused DIO flags from Nikolay.

   - blk-mq/sbitmap memory ordering fixes from Omar.

   - divide-by-zero fix for BFQ from Paolo.

   - minor documentation patches from Randy.

   - timeout fix from Tejun.

   - Alpha "can't write a char atomically" fix from Mikulas.

   - set of NVMe fixes by way of Keith.

   - bsg and bsg-lib improvements from Christoph.

   - a few sed-opal fixes from Jonas.

   - cdrom check-disk-change deadlock fix from Maurizio.

   - various little fixes, comment fixes, etc from various folks"

* tag 'for-4.17/block-20180402' of git://git.kernel.dk/linux-block: (139 commits)
  blk-mq: Directly schedule q->timeout_work when aborting a request
  blktrace: fix comment in blktrace_api.h
  lightnvm: remove function name in strings
  lightnvm: pblk: remove some unnecessary NULL checks
  lightnvm: pblk: don't recover unwritten lines
  lightnvm: pblk: implement 2.0 support
  lightnvm: pblk: implement get log report chunk
  lightnvm: pblk: rename ppaf* to addrf*
  lightnvm: pblk: check for supported version
  lightnvm: implement get log report chunk helpers
  lightnvm: make address conversions depend on generic device
  lightnvm: add support for 2.0 address format
  lightnvm: normalize geometry nomenclature
  lightnvm: complete geo structure with maxoc*
  lightnvm: add shorten OCSSD version in geo
  lightnvm: add minor version to generic geometry
  lightnvm: simplify geometry structure
  lightnvm: pblk: refactor init/exit sequences
  lightnvm: Avoid validation of default op value
  lightnvm: centralize permission check for lightnvm ioctl
  ...
2018-04-05 14:27:02 -07:00
Sagi Grimberg 7bfca0cfe6 iscsi-target: use common inet_addr_is_any
Instead of open-coding it.

Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: target-devel@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-26 08:53:43 -06:00
Bart Van Assche bf3a2b310e target/tcm_loop: Use blk_queue_flag_set()
Use blk_queue_flag_set() instead of open-coding this function.

Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-03-08 14:13:48 -07:00
Denys Vlasenko 9b2c45d479 net: make getname() functions return length rather than use int* parameter
Changes since v1:
Added changes in these files:
    drivers/infiniband/hw/usnic/usnic_transport.c
    drivers/staging/lustre/lnet/lnet/lib-socket.c
    drivers/target/iscsi/iscsi_target_login.c
    drivers/vhost/net.c
    fs/dlm/lowcomms.c
    fs/ocfs2/cluster/tcp.c
    security/tomoyo/network.c

Before:
All these functions either return a negative error indicator,
or store length of sockaddr into "int *socklen" parameter
and return zero on success.

"int *socklen" parameter is awkward. For example, if caller does not
care, it still needs to provide on-stack storage for the value
it does not need.

None of the many FOO_getname() functions of various protocols
ever used old value of *socklen. They always just overwrite it.

This change drops this parameter, and makes all these functions, on success,
return length of sockaddr. It's always >= 0 and can be differentiated
from an error.

Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.

rpc_sockname() lost "int buflen" parameter, since its only use was
to be passed to kernel_getsockname() as &buflen and subsequently
not used in any way.

Userspace API is not changed.

    text    data     bss      dec     hex filename
30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
30108109 2633612  873672 33615393 200ee21 vmlinux.o

Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-kernel@vger.kernel.org
CC: netdev@vger.kernel.org
CC: linux-bluetooth@vger.kernel.org
CC: linux-decnet-user@lists.sourceforge.net
CC: linux-wireless@vger.kernel.org
CC: linux-rdma@vger.kernel.org
CC: linux-sctp@vger.kernel.org
CC: linux-nfs@vger.kernel.org
CC: linux-x25@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-02-12 14:15:04 -05:00
Linus Torvalds 858f45bff3 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "The highlights include:

   - numerous target-core-user improvements related to queue full and
     timeout handling. (MNC)

   - prevent target-core-user corruption when invalid data page is
     requested. (MNC)

   - add target-core device action configfs attributes to allow
     user-space to trigger events separate from existing attributes
     exposed to end-users. (MNC)

   - fix iscsi-target NULL pointer dereference 4.6+ regression in CHAP
     error path. (David Disseldorp)

   - avoid target-core backend UNMAP callbacks if range is zero. (Andrei
     Vagin)

   - fix a iscsi-target 4.14+ regression related multiple PDU logins,
     that was exposed due to removal of TCP prequeue support. (Florian
     Westphal + MNC)

  Also, there is a iser-target bug still being worked on for post -rc1
  code to address a long standing issue resulting in persistent
  ib_post_send() failures, for RNICs with small max_send_sge"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (36 commits)
  iscsi-target: make sure to wake up sleeping login worker
  tcmu: Fix trailing semicolon
  tcmu: fix cmd user after free
  target: fix destroy device in target_configure_device
  tcmu: allow userspace to reset ring
  target core: add device action configfs files
  tcmu: fix error return code in tcmu_configure_device()
  target_core_user: add cmd id to broken ring message
  target: add SAM_STAT_BUSY sense reason
  tcmu: prevent corruption when invalid data page requested
  target: don't call an unmap callback if a range length is zero
  target/iscsi: avoid NULL dereference in CHAP auth error path
  cxgbit: call neigh_event_send() to update MAC address
  target: tcm_loop: Use seq_puts() in tcm_loop_show_info()
  target: tcm_loop: Delete an unnecessary return statement in tcm_loop_submission_work()
  target: tcm_loop: Delete two unnecessary variable initialisations in tcm_loop_issue_tmr()
  target: tcm_loop: Combine substrings for 26 messages
  target: tcm_loop: Improve a size determination in two functions
  target: tcm_loop: Delete an error message for a failed memory allocation in four functions
  sbp-target: Delete an error message for a failed memory allocation in three functions
  ...
2018-02-09 14:49:46 -08:00
Linus Torvalds 0a4b6e2f80 Merge branch 'for-4.16/block' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
 "This is the main pull request for block IO related changes for the
  4.16 kernel. Nothing major in this pull request, but a good amount of
  improvements and fixes all over the map. This contains:

   - BFQ improvements, fixes, and cleanups from Angelo, Chiara, and
     Paolo.

   - Support for SMR zones for deadline and mq-deadline from Damien and
     Christoph.

   - Set of fixes for bcache by way of Michael Lyle, including fixes
     from himself, Kent, Rui, Tang, and Coly.

   - Series from Matias for lightnvm with fixes from Hans Holmberg,
     Javier, and Matias. Mostly centered around pblk, and the removing
     rrpc 1.2 in preparation for supporting 2.0.

   - A couple of NVMe pull requests from Christoph. Nothing major in
     here, just fixes and cleanups, and support for command tracing from
     Johannes.

   - Support for blk-throttle for tracking reads and writes separately.
     From Joseph Qi. A few cleanups/fixes also for blk-throttle from
     Weiping.

   - Series from Mike Snitzer that enables dm to register its queue more
     logically, something that's alwways been problematic on dm since
     it's a stacked device.

   - Series from Ming cleaning up some of the bio accessor use, in
     preparation for supporting multipage bvecs.

   - Various fixes from Ming closing up holes around queue mapping and
     quiescing.

   - BSD partition fix from Richard Narron, fixing a problem where we
     can't mount newer (10/11) FreeBSD partitions.

   - Series from Tejun reworking blk-mq timeout handling. The previous
     scheme relied on atomic bits, but it had races where we would think
     a request had timed out if it to reused at the wrong time.

   - null_blk now supports faking timeouts, to enable us to better
     exercise and test that functionality separately. From me.

   - Kill the separate atomic poll bit in the request struct. After
     this, we don't use the atomic bits on blk-mq anymore at all. From
     me.

   - sgl_alloc/free helpers from Bart.

   - Heavily contended tag case scalability improvement from me.

   - Various little fixes and cleanups from Arnd, Bart, Corentin,
     Douglas, Eryu, Goldwyn, and myself"

* 'for-4.16/block' of git://git.kernel.dk/linux-block: (186 commits)
  block: remove smart1,2.h
  nvme: add tracepoint for nvme_complete_rq
  nvme: add tracepoint for nvme_setup_cmd
  nvme-pci: introduce RECONNECTING state to mark initializing procedure
  nvme-rdma: remove redundant boolean for inline_data
  nvme: don't free uuid pointer before printing it
  nvme-pci: Suspend queues after deleting them
  bsg: use pr_debug instead of hand crafted macros
  blk-mq-debugfs: don't allow write on attributes with seq_operations set
  nvme-pci: Fix queue double allocations
  block: Set BIO_TRACE_COMPLETION on new bio during split
  blk-throttle: use queue_is_rq_based
  block: Remove kblockd_schedule_delayed_work{,_on}()
  blk-mq: Avoid that blk_mq_delay_run_hw_queue() introduces unintended delays
  blk-mq: Rename blk_mq_request_direct_issue() into blk_mq_request_issue_directly()
  lib/scatterlist: Fix chaining support in sgl_alloc_order()
  blk-throttle: track read and write request individually
  block: add bdev_read_only() checks to common helpers
  block: fail op_is_write() requests to read-only partitions
  blk-throttle: export io_serviced_recursive, io_service_bytes_recursive
  ...
2018-01-29 11:51:49 -08:00
Florian Westphal 1c130ae00b iscsi-target: make sure to wake up sleeping login worker
Mike Christie reports:
  Starting in 4.14 iscsi logins will fail around 50% of the time.

Problem appears to be that iscsi_target_sk_data_ready() callback may
return without doing anything in case it finds the login work queue
is still blocked in sock_recvmsg().

Nicholas Bellinger says:
  It would indicate users providing their own ->sk_data_ready() callback
  must be responsible for waking up a kthread context blocked on
  sock_recvmsg(..., MSG_WAITALL), when a second ->sk_data_ready() is
  received before the first sock_recvmsg(..., MSG_WAITALL) completes.

So, do this and invoke the original data_ready() callback -- in
case of tcp sockets this takes care of waking the thread.

Disclaimer: I do not understand why this problem did not show up before
tcp prequeue removal.

(Drop WARN_ON usage - nab)

Reported-by: Mike Christie <mchristi@redhat.com>
Bisected-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
Diagnosed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Fixes: e7942d0633 ("tcp: remove prequeue support")
Signed-off-by: Florian Westphal <fw@strlen.de>
Cc: stable@vger.kernel.org # 4.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-23 23:03:54 -08:00
Bart Van Assche 8c7a8d1c4b lib/scatterlist: Fix chaining support in sgl_alloc_order()
This patch avoids that workloads with large block sizes (megabytes)
can trigger the following call stack with the ib_srpt driver (that
driver is the only driver that chains scatterlists allocated by
sgl_alloc_order()):

BUG: Bad page state in process kworker/0:1H  pfn:2423a78
page:fffffb03d08e9e00 count:-3 mapcount:0 mapping:          (null) index:0x0
flags: 0x57ffffc0000000()
raw: 0057ffffc0000000 0000000000000000 0000000000000000 fffffffdffffffff
raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
page dumped because: nonzero _count
CPU: 0 PID: 733 Comm: kworker/0:1H Tainted: G          I      4.15.0-rc7.bart+ #1
Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
Call Trace:
 dump_stack+0x5c/0x83
 bad_page+0xf5/0x10f
 get_page_from_freelist+0xa46/0x11b0
 __alloc_pages_nodemask+0x103/0x290
 sgl_alloc_order+0x101/0x180
 target_alloc_sgl+0x2c/0x40 [target_core_mod]
 srpt_alloc_rw_ctxs+0x173/0x2d0 [ib_srpt]
 srpt_handle_new_iu+0x61e/0x7f0 [ib_srpt]
 __ib_process_cq+0x55/0xa0 [ib_core]
 ib_cq_poll_work+0x1b/0x60 [ib_core]
 process_one_work+0x141/0x340
 worker_thread+0x47/0x3e0
 kthread+0xf5/0x130
 ret_from_fork+0x1f/0x30

Fixes: e80a0af475 ("lib/scatterlist: Introduce sgl_alloc() and sgl_free()")
Reported-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-19 12:31:03 -07:00
Luis de Bethencourt 85fae482a9 tcmu: Fix trailing semicolon
The trailing semicolon is an empty statement that does no operation.
It is completely stripped out by the compiler. Removing it since it doesn't do
anything.

Signed-off-by: Luis de Bethencourt <luisbg@kernel.org>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-18 01:21:24 -08:00
Mike Christie 45dc488c0e tcmu: fix cmd user after free
If we are failing the command due to a qfull timeout we are
also freeing the tcmu command, so we cannot access it later
to get the se_cmd.

Note: The clearing of cmd->se_cmd is not needed. We do not check
it later for something like determining if the command was failed
due to a timeout. As a result I am dropping it.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-18 01:21:23 -08:00
tangwenji c82b59e7c3 target: fix destroy device in target_configure_device
After dev->transport->configure_device succeeds, target_configure_device
exits abnormally, dev_flags has not set DF_CONFIGURED yet, does not call
destroy_device function in free_device.

Signed-off-by: tangwenji <tang.wenji@zte.com.cn>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-18 01:21:11 -08:00
Mike Christie 892782caf1 tcmu: allow userspace to reset ring
This patch adds 2 tcmu attrs to block/unblock a device and
reset the ring buffer. They are used when the userspace
daemon has crashed or forced to shutdown while IO is executing.
On restart, the daemon can block the device so new IO is not
sent to userspace while it puts the ring in a clean state.

Notes: The reset ring opreation is specific to tcmu, but the
block one could be generic. I kept it tcmu specific, because
it requires some extra locking/state checks in the main IO
path and since other backend modules did not need this
functionality I thought only tcmu should take the perf hit.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-16 18:05:04 -08:00
Mike Christie 8dc31ff929 target core: add device action configfs files
This patch adds a new group of files that are to be used to
have the kernel module execution some action. The next patch
will have target_core_user use the group/files to be able to block
a device and to reset its memory buffer used to pass commands
between user/kernel space.

This type of file is different from the existing device attributes
in that they may be write only and when written to they result in
the kernel module executing some function. These need to be
separate from the normal device attributes which get/set device
values so userspace can continue to loop over all the attribs and
get/set them during initialization.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-16 18:05:04 -08:00
Wei Yongjun a24e7917e1 tcmu: fix error return code in tcmu_configure_device()
Fix to return error code -ENOMEM from the kzalloc() error handling
case instead of 0, as done elsewhere in this function.

Fixes: 80eb876 ("tcmu: allow max block and global max blocks to be settable")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 21:17:35 -08:00
Mike Christie 88cf107325 target_core_user: add cmd id to broken ring message
Log cmd id that was not found in the tcmu_handle_completions
lookup failure path.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:30 -08:00
Mike Christie d120c7083f target: add SAM_STAT_BUSY sense reason
Add SAM_STAT_BUSY sense_reason. The next patch will have
target_core_user return this value while it is temporarily
blocked and restarting.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:30 -08:00
Mike Christie c1c390ba53 tcmu: prevent corruption when invalid data page requested
We will always have a page mapped for cmd data if it is
valid command. If the mapping does not exist then something
bad happened in userspace and it should not proceed. This
has us return VM_FAULT_SIGBUS when this happens instead of
returning a freshly allocated paged. The latter can cause
corruption because userspace might write the pages data
overwriting valid data or return it to the initiator.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:29 -08:00
Andrei Vagin 9960f85181 target: don't call an unmap callback if a range length is zero
If a length of a range is zero, it means there is nothing to unmap
and we can skip this range.

Here is one more reason, why we have to skip such ranges.  An unmap
callback calls file_operations->fallocate(), but the man page for the
fallocate syscall says that fallocate(fd, mode, offset, let) returns
EINVAL, if len is zero. It means that file_operations->fallocate() isn't
obligated to handle zero ranges too.

Signed-off-by: Andrei Vagin <avagin@openvz.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:28 -08:00
David Disseldorp ce512d79d0 target/iscsi: avoid NULL dereference in CHAP auth error path
If chap_server_compute_md5() fails early, e.g. via CHAP_N mismatch, then
crypto_free_shash() is called with a NULL pointer which gets
dereferenced in crypto_shash_tfm().

Fixes: 69110e3ced ("iscsi-target: Use shash and ahash")
Suggested-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: David Disseldorp <ddiss@suse.de>
Cc: stable@vger.kernel.org # 4.6+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:27 -08:00
Varun Prakash 093ec1430f cxgbit: call neigh_event_send() to update MAC address
If nud_state is not valid then call neigh_event_send() to update MAC
address.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:26 -08:00
Markus Elfring 09f99a3dfa target: tcm_loop: Use seq_puts() in tcm_loop_show_info()
The script "checkpatch.pl" pointed information out like the following.

WARNING: Prefer seq_puts to seq_printf

Thus fix the affected source code place.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:26 -08:00
Markus Elfring 2dfe3511ce target: tcm_loop: Delete an unnecessary return statement in tcm_loop_submission_work()
The script "checkpatch.pl" pointed information out like the following.

WARNING: void function return statements are not generally useful

Thus remove such a statement in the affected function.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:25 -08:00
Markus Elfring 7deeceb4c7 target: tcm_loop: Delete two unnecessary variable initialisations in tcm_loop_issue_tmr()
The variables "se_cmd" and "tl_cmd" will eventually be set to appropriate
pointers a bit later.
Thus omit the explicit initialisation at the beginning.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:24 -08:00
Markus Elfring c8d1f4b76b target: tcm_loop: Combine substrings for 26 messages
The script "checkpatch.pl" pointed information out like the following.

WARNING: quoted string split across lines

Thus fix the affected source code places.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:23 -08:00
Markus Elfring a572dba9e0 target: tcm_loop: Improve a size determination in two functions
Replace the specification of data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:23 -08:00
Markus Elfring cd3fb32a72 target: tcm_loop: Delete an error message for a failed memory allocation in four functions
Omit an extra message for a memory allocation failure in these functions.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:22 -08:00
Markus Elfring e0f3d4c23e sbp-target: Delete an error message for a failed memory allocation in three functions
Omit an extra message for a memory allocation failure in these functions.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Chris Boot <bootc@boo.tc>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:21 -08:00
Mike Christie 80eb876138 tcmu: allow max block and global max blocks to be settable
Users might have a physical system to a target so they could
have a lot more than 2 gigs of memory they want to devote to
tcmu. OTOH, we could be running in a vm and so a 2 gig
global and 1 gig per dev limit might be too high. This patch
allows the user to specify the limits.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:20 -08:00
Mike Christie 9103575ae3 tcmu: make ring buffer timer configurable
This adds a timer, qfull_time_out, that controls how long a
device will wait for ring buffer space to open before
failing the commands in the queue. It is useful to separate
this timer from the cmd_time_out and default 30 sec one,
because for HA setups cmd_time_out may be disbled and 30
seconds is too long to wait when some OSs like ESX will
timeout commands after as little as 8 - 15 seconds.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:19 -08:00
Mike Christie af1dd7ff46 tcmu: don't block submitting context for block waits
This patch has tcmu internally queue cmds if its ring buffer
is full. It also makes the TCMU_GLOBAL_MAX_BLOCKS limit a
hint instead of a hard limit, so we do not have to add any
new locks/atomics in the main IO path except when IO is not
running.

This fixes the following bugs:

1. We cannot sleep from the submitting context because it might be
called from a target recv context. This results in transport level
commands timing out. For example if the ring is full, we would
sleep, and a iscsi initiator would send a iscsi ping/nop which
times out because the target's recv thread is sleeping here.

2. Devices were not fairly scheduled to run when they hit the global
limit so they could time out waiting for ring space while others
got run.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:19 -08:00
Mike Christie f890f5799a tcmu: simplify dbi thresh handling
We do not really save a lot by trying to increase thresh
a multiple of the existing value. This just simplifies the
code by increasing it to whatever is needed for the command
being executed.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:18 -08:00
Mike Christie 6fd0ce7972 tcmu: prep queue_cmd_ring to be used by unmap wq
In the next patches we will call queue_cmd_ring from the submitting
context and also the completion path. This changes the queue_cmd_ring
return code so in the next patches we can return a sense_reason_t
and also signal if a command was requeued.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:17 -08:00
Xiubo Li 3e60913579 tcmu: clean up the scatter helper
Add some comments to make the scatter code to be more readable,
and drop unused arg to new_iov.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:16 -08:00
Mike Christie 3c0f26ff9d tcmu: fix free block calculation
The blocks_left calculation does not account for free blocks
between 0 and thresh, so we could be queueing/waiting when
there are enough blocks free.

This has us add in the blocks between 0 and thresh as well as
at the end from thresh to DATA_BLOCK_BITS.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:16 -08:00
Mike Christie 1a1fc0b88e tcmu: simplify scatter_data_area error handling
scatter_data_area always returns 0, so stop checking
for errors.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:15 -08:00
Mike Christie 810b8153c4 tcmu: release blocks for partially setup cmds
If we cannot setup a cmd because we run out of ring space
or global pages release the blocks before sleeping. This
prevents a deadlock where dev0 has waiting_blocks set and
needs N blocks, but dev1 to devX have each allocated N / X blocks
and also hit the global block limit so they went to sleep.

find_free_blocks is not able to take the sleeping dev's
blocks becaause their waiting_blocks is set and even
if it was not the block returned by find_last_bit could equal
dbi_max. The latter will probably never happen because
DATA_BLOCK_BITS is so high but in the next patches
DATA_BLOCK_BITS and TCMU_GLOBAL_MAX_BLOCKS will be settable so
it might be lower and could happen.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:14 -08:00
Mike Christie 6fddcb7754 tcmu: remove commands_lock
No need for the commands_lock. The cmdr_lock is already held during
idr addition and deletion, so just grab it during traversal.

Note: This also fixes a issue where we should have been using at
least _bh locking in tcmu_handle_completions when taking the commands
lock to prevent the case where tcmu_handle_completions could be
interrupted by a timer softirq while the commands_lock is held.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:13 -08:00
Mike Christie 488ebe4c35 tcmu: move expired command completion to unmap thread
This moves the expired command completion handling to
the unmap wq, so the next patch can use a mutex
in tcmu_check_expired_cmd.

Note:
tcmu_device_timedout's use of spin_lock_irq was not needed.
The commands_lock is used between thread context (tcmu_queue_cmd_ring
and tcmu_irqcontrol (even though this is named irqcontrol it is not
run in irq context)) and timer/bh context. In the timer/bh context
bhs are disabled, so you need to use the _bh lock calls from the
thread context callers.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:13 -08:00
Mike Christie 9972cebb59 tcmu: fix unmap thread race
If the unmap thread has already run find_free_blocks
but not yet run prepare_to_wait when a wake_up(&unmap_wait)
call is done, the unmap thread is going to miss the wake
call. Instead of adding checks for if new waiters were added
this just has us use a work queue which will run us again
in this type of case.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:12 -08:00
Mike Christie 89ec9cfd3b tcmu: split unmap_thread_fn
Separate unmap_thread_fn to make it easier to read.

Note: this patch does not fix the bug where we might
miss a wake up call. The next patch will fix that.
This patch only separates the code into functions.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:11 -08:00
Mike Christie bf99ec1332 tcmu: merge common block release code
Have unmap_thread_fn use tcmu_blocks_release.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:10 -08:00
tangwenji 26d2b3106f tcmu: fix page addr in tcmu_flush_dcache_range
The page addr should be update.

Signed-off-by: tangwenji <tang.wenji@zte.com.cn>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:10 -08:00
Rasmus Villemoes 094bb5d766 target-core: don't use "const char*" for a buffer that is written to
iscsi_parse_pr_out_transport_id launders the const away via a call to
strstr(), and then modifies the buffer (writing a nul byte) through
the return value. It's cleaner to be honest and simply declare the
parameter as "char*", fixing up the call chain, and allowing us to
drop the cast in the return statement.

Amusingly, the two current callers found it necessary to cast a
non-const pointer to a const.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2018-01-12 15:07:09 -08:00
Bart Van Assche 14db491726 target: Use sgl_alloc_order() and sgl_free()
Use the sgl_alloc_order() and sgl_free() functions instead of open
coding these functions.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-01-06 09:18:00 -07:00
Jens Axboe 0abc2a1038 block: fix blk_rq_append_bio
Commit caa4b02476e3(blk-map: call blk_queue_bounce from blk_rq_append_bio)
moves blk_queue_bounce() into blk_rq_append_bio(), but don't consider
the fact that the bounced bio becomes invisible to caller since the
parameter type is 'struct bio *'. Make it a pointer to a pointer to
a bio, so the caller sees the right bio also after a bounce.

Fixes: caa4b02476 ("blk-map: call blk_queue_bounce from blk_rq_append_bio")
Cc: Christoph Hellwig <hch@lst.de>
Reported-by: Michele Ballabio <barra_cuda@katamail.com>
(handling failure of blk_rq_append_bio(), only call bio_get() after
blk_rq_append_bio() returns OK)
Tested-by: Michele Ballabio <barra_cuda@katamail.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-12-18 13:55:43 -07:00
Linus Torvalds 844056fd74 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:

 - The final conversion of timer wheel timers to timer_setup().

   A few manual conversions and a large coccinelle assisted sweep and
   the removal of the old initialization mechanisms and the related
   code.

 - Remove the now unused VSYSCALL update code

 - Fix permissions of /proc/timer_list. I still need to get rid of that
   file completely

 - Rename a misnomed clocksource function and remove a stale declaration

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
  m68k/macboing: Fix missed timer callback assignment
  treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts
  timer: Remove redundant __setup_timer*() macros
  timer: Pass function down to initialization routines
  timer: Remove unused data arguments from macros
  timer: Switch callback prototype to take struct timer_list * argument
  timer: Pass timer_list pointer to callbacks unconditionally
  Coccinelle: Remove setup_timer.cocci
  timer: Remove setup_*timer() interface
  timer: Remove init_timer() interface
  treewide: setup_timer() -> timer_setup() (2 field)
  treewide: setup_timer() -> timer_setup()
  treewide: init_timer() -> setup_timer()
  treewide: Switch DEFINE_TIMER callbacks to struct timer_list *
  s390: cmm: Convert timers to use timer_setup()
  lightnvm: Convert timers to use timer_setup()
  drivers/net: cris: Convert timers to use timer_setup()
  drm/vc4: Convert timers to use timer_setup()
  block/laptop_mode: Convert timers to use timer_setup()
  net/atm/mpc: Avoid open-coded assignment of timer callback function
  ...
2017-11-25 08:37:16 -10:00
Linus Torvalds eda5d47134 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:

 "This series is predominantly bug-fixes, with a few small improvements
  that have been outstanding over the last release cycle.

  As usual, the associated bug-fixes have CC' tags for stable.

  Also, things have been particularly quiet wrt new developments the
  last months, with most folks continuing to focus on stability atop 4.x
  stable kernels for their respective production configurations.

  Also at this point, the stable trees have been synced up with
  mainline. This will continue to be a priority, as production users
  tend to run exclusively atop stable kernels, a few releases behind
  mainline.

  The highlights include:

   - Fix PR PREEMPT_AND_ABORT null pointer dereference regression in
     v4.11+ (tangwenji)

   - Fix OOPs during removing TCMU device (Xiubo Li + Zhang Zhuoyu)

   - Add netlink command reply supported option for each device (Kenjiro
     Nakayama)

   - cxgbit: Abort the TCP connection in case of data out timeout (Varun
     Prakash)

   - Fix PR/ALUA file path truncation (David Disseldorp)

   - Fix double se_cmd completion during ->cmd_time_out (Mike Christie)

   - Fix QUEUE_FULL + SCSI task attribute handling in 4.1+ (Bryant Ly +
     nab)

   - Fix quiese during transport_write_pending_qf endless loop (nab)

   - Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK in 3.14+
     (Don White + nab)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (35 commits)
  tcmu: Add a missing unlock on an error path
  tcmu: Fix some memory corruption
  iscsi-target: Fix non-immediate TMR reference leak
  iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
  target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
  target: Fix quiese during transport_write_pending_qf endless loop
  target: Fix caw_sem leak in transport_generic_request_failure
  target: Fix QUEUE_FULL + SCSI task attribute handling
  iSCSI-target: Use common error handling code in iscsi_decode_text_input()
  target/iscsi: Detect conn_cmd_list corruption early
  target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd()
  target/iscsi: Modify iscsit_do_crypto_hash_buf() prototype
  target/iscsi: Fix endianness in an error message
  target/iscsi: Use min() in iscsit_dump_data_payload() instead of open-coding it
  target/iscsi: Define OFFLOAD_BUF_SIZE once
  target: Inline transport_put_cmd()
  target: Suppress gcc 7 fallthrough warnings
  target: Move a declaration of a global variable into a header file
  tcmu: fix double se_cmd completion
  target: return SAM_STAT_TASK_SET_FULL for TCM_OUT_OF_RESOURCES
  ...
2017-11-24 19:19:20 -10:00
Kees Cook e99e88a9d2 treewide: setup_timer() -> timer_setup()
This converts all remaining cases of the old setup_timer() API into using
timer_setup(), where the callback argument is the structure already
holding the struct timer_list. These should have no behavioral changes,
since they just change which pointer is passed into the callback with
the same available pointers after conversion. It handles the following
examples, in addition to some other variations.

Casting from unsigned long:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, ptr);

and forced object casts:

    void my_callback(struct something *ptr)
    {
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr);

become:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
    ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

Direct function assignments:

    void my_callback(unsigned long data)
    {
        struct something *ptr = (struct something *)data;
    ...
    }
    ...
    ptr->my_timer.function = my_callback;

have a temporary cast added, along with converting the args:

    void my_callback(struct timer_list *t)
    {
        struct something *ptr = from_timer(ptr, t, my_timer);
    ...
    }
    ...
    ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback;

And finally, callbacks without a data assignment:

    void my_callback(unsigned long data)
    {
    ...
    }
    ...
    setup_timer(&ptr->my_timer, my_callback, 0);

have their argument renamed to verify they're unused during conversion:

    void my_callback(struct timer_list *unused)
    {
    ...
    }
    ...
    timer_setup(&ptr->my_timer, my_callback, 0);

The conversion is done with the following Coccinelle script:

spatch --very-quiet --all-includes --include-headers \
	-I ./arch/x86/include -I ./arch/x86/include/generated \
	-I ./include -I ./arch/x86/include/uapi \
	-I ./arch/x86/include/generated/uapi -I ./include/uapi \
	-I ./include/generated/uapi --include ./include/linux/kconfig.h \
	--dir . \
	--cocci-file ~/src/data/timer_setup.cocci

@fix_address_of@
expression e;
@@

 setup_timer(
-&(e)
+&e
 , ...)

// Update any raw setup_timer() usages that have a NULL callback, but
// would otherwise match change_timer_function_usage, since the latter
// will update all function assignments done in the face of a NULL
// function initialization in setup_timer().
@change_timer_function_usage_NULL@
expression _E;
identifier _timer;
type _cast_data;
@@

(
-setup_timer(&_E->_timer, NULL, _E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E->_timer, NULL, (_cast_data)_E);
+timer_setup(&_E->_timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, &_E);
+timer_setup(&_E._timer, NULL, 0);
|
-setup_timer(&_E._timer, NULL, (_cast_data)&_E);
+timer_setup(&_E._timer, NULL, 0);
)

@change_timer_function_usage@
expression _E;
identifier _timer;
struct timer_list _stl;
identifier _callback;
type _cast_func, _cast_data;
@@

(
-setup_timer(&_E->_timer, _callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, &_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, _E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, &_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E);
+timer_setup(&_E._timer, _callback, 0);
|
 _E->_timer@_stl.function = _callback;
|
 _E->_timer@_stl.function = &_callback;
|
 _E->_timer@_stl.function = (_cast_func)_callback;
|
 _E->_timer@_stl.function = (_cast_func)&_callback;
|
 _E._timer@_stl.function = _callback;
|
 _E._timer@_stl.function = &_callback;
|
 _E._timer@_stl.function = (_cast_func)_callback;
|
 _E._timer@_stl.function = (_cast_func)&_callback;
)

// callback(unsigned long arg)
@change_callback_handle_cast
 depends on change_timer_function_usage@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
identifier _handle;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
(
	... when != _origarg
	_handletype *_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(_handletype *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
|
	... when != _origarg
	_handletype *_handle;
	... when != _handle
	_handle =
-(void *)_origarg;
+from_timer(_handle, t, _timer);
	... when != _origarg
)
 }

// callback(unsigned long arg) without existing variable
@change_callback_handle_cast_no_arg
 depends on change_timer_function_usage &&
                     !change_callback_handle_cast@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _origtype;
identifier _origarg;
type _handletype;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *t
 )
 {
+	_handletype *_origarg = from_timer(_origarg, t, _timer);
+
	... when != _origarg
-	(_handletype *)_origarg
+	_origarg
	... when != _origarg
 }

// Avoid already converted callbacks.
@match_callback_converted
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
	    !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier t;
@@

 void _callback(struct timer_list *t)
 { ... }

// callback(struct something *handle)
@change_callback_handle_arg
 depends on change_timer_function_usage &&
	    !match_callback_converted &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
@@

 void _callback(
-_handletype *_handle
+struct timer_list *t
 )
 {
+	_handletype *_handle = from_timer(_handle, t, _timer);
	...
 }

// If change_callback_handle_arg ran on an empty function, remove
// the added handler.
@unchange_callback_handle_arg
 depends on change_timer_function_usage &&
	    change_callback_handle_arg@
identifier change_timer_function_usage._callback;
identifier change_timer_function_usage._timer;
type _handletype;
identifier _handle;
identifier t;
@@

 void _callback(struct timer_list *t)
 {
-	_handletype *_handle = from_timer(_handle, t, _timer);
 }

// We only want to refactor the setup_timer() data argument if we've found
// the matching callback. This undoes changes in change_timer_function_usage.
@unchange_timer_function_usage
 depends on change_timer_function_usage &&
            !change_callback_handle_cast &&
            !change_callback_handle_cast_no_arg &&
	    !change_callback_handle_arg@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type change_timer_function_usage._cast_data;
@@

(
-timer_setup(&_E->_timer, _callback, 0);
+setup_timer(&_E->_timer, _callback, (_cast_data)_E);
|
-timer_setup(&_E._timer, _callback, 0);
+setup_timer(&_E._timer, _callback, (_cast_data)&_E);
)

// If we fixed a callback from a .function assignment, fix the
// assignment cast now.
@change_timer_function_assignment
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression change_timer_function_usage._E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_func;
typedef TIMER_FUNC_TYPE;
@@

(
 _E->_timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-(_cast_func)_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E->_timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-&_callback;
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-(_cast_func)_callback
+(TIMER_FUNC_TYPE)_callback
 ;
|
 _E._timer.function =
-(_cast_func)&_callback
+(TIMER_FUNC_TYPE)_callback
 ;
)

// Sometimes timer functions are called directly. Replace matched args.
@change_timer_function_calls
 depends on change_timer_function_usage &&
            (change_callback_handle_cast ||
             change_callback_handle_cast_no_arg ||
             change_callback_handle_arg)@
expression _E;
identifier change_timer_function_usage._timer;
identifier change_timer_function_usage._callback;
type _cast_data;
@@

 _callback(
(
-(_cast_data)_E
+&_E->_timer
|
-(_cast_data)&_E
+&_E._timer
|
-_E
+&_E->_timer
)
 )

// If a timer has been configured without a data argument, it can be
// converted without regard to the callback argument, since it is unused.
@match_timer_function_unused_data@
expression _E;
identifier _timer;
identifier _callback;
@@

(
-setup_timer(&_E->_timer, _callback, 0);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0L);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E->_timer, _callback, 0UL);
+timer_setup(&_E->_timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0L);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_E._timer, _callback, 0UL);
+timer_setup(&_E._timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0L);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(&_timer, _callback, 0UL);
+timer_setup(&_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0L);
+timer_setup(_timer, _callback, 0);
|
-setup_timer(_timer, _callback, 0UL);
+timer_setup(_timer, _callback, 0);
)

@change_callback_unused_data
 depends on match_timer_function_unused_data@
identifier match_timer_function_unused_data._callback;
type _origtype;
identifier _origarg;
@@

 void _callback(
-_origtype _origarg
+struct timer_list *unused
 )
 {
	... when != _origarg
 }

Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-21 15:57:07 -08:00
Linus Torvalds abc36be236 A couple of configfs cleanups:
- proper use of the bool type (Thomas Meyer)
   - constification of struct config_item_type (Bhumika Goyal)
 -----BEGIN PGP SIGNATURE-----
 
 iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAloLSTALHGhjaEBsc3Qu
 ZGUACgkQD55TZVIEUYNxfhAAv3cunxiEPEAvs+1xuGd3cZYaxz7qinvIODPxIKoF
 kRWiuy5PUklRMnJ8seOgJ1p1QokX6Sk4cZ8HcctDJVByqODjOq4K5eaKVN1ZqJoz
 BUzO/gOqfs64r9yaFIlKfe8nFA+gpUftSeWyv3lThxAIJ1iSbue7OZ/A10tTOS1m
 RWp9FPepFv+nJMfWqeQU64BsoDQ4kgZ2NcEA+jFxNx5dlmIbLD49tk0lfddvZQXr
 j5WyAH73iugilLtNUGVOqSzHBY4kUvfCKUV7leirCegyMoGhFtA87m6Wzwbo6ZUI
 DwQLzWvuPaGv1P2PpNEHfKiNbfIEp75DRyyyf87DD3lc5ffAxQSm28mGuwcr7Rn5
 Ow/yWL6ERMzCLExoCzEkXYJISy7T5LIzYDgNggKMpeWxysAduF7Onx7KfW1bTuhK
 mHvY7iOXCjEvaIVaF8uMKE6zvuY1vCMRXaJ+kC9jcIE3gwhg+2hmQvrdJ2uAFXY+
 rkeF2Poj/JlblPU4IKWAjiPUbzB7Lv0gkypCB2pD4riaYIN5qCAgF8ULIGQp2hsO
 lYW1EEgp5FBop85oSO/HAGWeH9dFg0WaV7WqNRVv0AGXhKjgy+bVd7iYPpvs7mGw
 z9IqSQDORcG2ETLcFhZgiJpCk/itwqXBD+wgMOjJPP8lL+4kZ8FcuhtY9kc9WlJE
 Tew=
 =+tMO
 -----END PGP SIGNATURE-----

Merge tag 'configfs-for-4.15' of git://git.infradead.org/users/hch/configfs

Pull configfs updates from Christoph Hellwig:
 "A couple of configfs cleanups:

   - proper use of the bool type (Thomas Meyer)

   - constification of struct config_item_type (Bhumika Goyal)"

* tag 'configfs-for-4.15' of git://git.infradead.org/users/hch/configfs:
  RDMA/cma: make config_item_type const
  stm class: make config_item_type const
  ACPI: configfs: make config_item_type const
  nvmet: make config_item_type const
  usb: gadget: configfs: make config_item_type const
  PCI: endpoint: make config_item_type const
  iio: make function argument and some structures const
  usb: gadget: make config_item_type structures const
  dlm: make config_item_type const
  netconsole: make config_item_type const
  nullb: make config_item_type const
  ocfs2/cluster: make config_item_type const
  target: make config_item_type const
  configfs: make ci_type field, some pointers and function arguments const
  configfs: make config_item_type const
  configfs: Fix bool initialization/comparison
2017-11-14 14:44:04 -08:00
Linus Torvalds 2bcc673101 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "Yet another big pile of changes:

   - More year 2038 work from Arnd slowly reaching the point where we
     need to think about the syscalls themself.

   - A new timer function which allows to conditionally (re)arm a timer
     only when it's either not running or the new expiry time is sooner
     than the armed expiry time. This allows to use a single timer for
     multiple timeout requirements w/o caring about the first expiry
     time at the call site.

   - A new NMI safe accessor to clock real time for the printk timestamp
     work. Can be used by tracing, perf as well if required.

   - A large number of timer setup conversions from Kees which got
     collected here because either maintainers requested so or they
     simply got ignored. As Kees pointed out already there are a few
     trivial merge conflicts and some redundant commits which was
     unavoidable due to the size of this conversion effort.

   - Avoid a redundant iteration in the timer wheel softirq processing.

   - Provide a mechanism to treat RTC implementations depending on their
     hardware properties, i.e. don't inflict the write at the 0.5
     seconds boundary which originates from the PC CMOS RTC to all RTCs.
     No functional change as drivers need to be updated separately.

   - The usual small updates to core code clocksource drivers. Nothing
     really exciting"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits)
  timers: Add a function to start/reduce a timer
  pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday()
  timer: Prepare to change all DEFINE_TIMER() callbacks
  netfilter: ipvs: Convert timers to use timer_setup()
  scsi: qla2xxx: Convert timers to use timer_setup()
  block/aoe: discover_timer: Convert timers to use timer_setup()
  ide: Convert timers to use timer_setup()
  drbd: Convert timers to use timer_setup()
  mailbox: Convert timers to use timer_setup()
  crypto: Convert timers to use timer_setup()
  drivers/pcmcia: omap1: Fix error in automated timer conversion
  ARM: footbridge: Fix typo in timer conversion
  drivers/sgi-xp: Convert timers to use timer_setup()
  drivers/pcmcia: Convert timers to use timer_setup()
  drivers/memstick: Convert timers to use timer_setup()
  drivers/macintosh: Convert timers to use timer_setup()
  hwrng/xgene-rng: Convert timers to use timer_setup()
  auxdisplay: Convert timers to use timer_setup()
  sparc/led: Convert timers to use timer_setup()
  mips: ip22/32: Convert timers to use timer_setup()
  ...
2017-11-13 17:56:58 -08:00
Dan Carpenter 97488c7319 tcmu: Add a missing unlock on an error path
We added a new error path here but we forgot to drop the lock first
before returning.

Fixes: 0d44374c1a ("tcmu: fix double se_cmd completion")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-08 01:42:35 -08:00
Dan Carpenter 16b9327704 tcmu: Fix some memory corruption
"udev->nl_reply_supported" is an int but on 64 bit arches we are writing
8 bytes of data to it so it corrupts four bytes beyond the end of the
struct.

Fixes: b849b45675 ("target: Add netlink command reply supported option for each device")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-08 01:42:31 -08:00
Nicholas Bellinger 3fc9fb13a4 iscsi-target: Fix non-immediate TMR reference leak
This patch fixes a se_cmd->cmd_kref reference leak that can
occur when a non immediate TMR is proceeded our of command
sequence number order, and CMDSN_LOWER_THAN_EXP is returned
by iscsit_sequence_cmd().

To address this bug, call target_put_sess_cmd() during this
special case following what iscsit_process_scsi_cmd() does
upon CMDSN_LOWER_THAN_EXP.

Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-07 19:51:24 -08:00
Nicholas Bellinger ae072726f6 iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
Since commit 59b6986dbf fixed a potential NULL pointer dereference
by allocating a se_tmr_req for ISCSI_TM_FUNC_TASK_REASSIGN, the
se_tmr_req is currently leaked by iscsit_free_cmd() because no
iscsi_cmd->se_cmd.se_tfo was associated.

To address this, treat ISCSI_TM_FUNC_TASK_REASSIGN like any other
TMR and call transport_init_se_cmd() + target_get_sess_cmd() to
setup iscsi_cmd->se_cmd.se_tfo with se_cmd->cmd_kref of 2.

This will ensure normal release operation once se_cmd->cmd_kref
reaches zero and target_release_cmd_kref() is invoked, se_tmr_req
will be released via existing target_free_cmd_mem() and
core_tmr_release_req() code.

Reported-by: Donald White <dew@datera.io>
Cc: Donald White <dew@datera.io>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-07 19:51:23 -08:00
Nicholas Bellinger 1c21a48055 target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
This patch fixes bug where early se_cmd exceptions that occur
before backend execution can result in use-after-free if/when
a subsequent ABORT_TASK occurs for the same tag.

Since an early se_cmd exception will have had se_cmd added to
se_session->sess_cmd_list via target_get_sess_cmd(), it will
not have CMD_T_COMPLETE set by the usual target_complete_cmd()
backend completion path.

This causes a subsequent ABORT_TASK + __target_check_io_state()
to signal ABORT_TASK should proceed.  As core_tmr_abort_task()
executes, it will bring the outstanding se_cmd->cmd_kref count
down to zero releasing se_cmd, after se_cmd has already been
queued with error status into fabric driver response path code.

To address this bug, introduce a CMD_T_PRE_EXECUTE bit that is
set at target_get_sess_cmd() time, and cleared immediately before
backend driver dispatch in target_execute_cmd() once CMD_T_ACTIVE
is set.

Then, check CMD_T_PRE_EXECUTE within __target_check_io_state() to
determine when an early exception has occured, and avoid aborting
this se_cmd since it will have already been queued into fabric
driver response path code.

Reported-by: Donald White <dew@datera.io>
Cc: Donald White <dew@datera.io>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-07 19:50:24 -08:00
Nicholas Bellinger 9574a497df target: Fix quiese during transport_write_pending_qf endless loop
This patch fixes a potential end-less loop during QUEUE_FULL,
where cmd->se_tfo->write_pending() callback fails repeatedly
but __transport_wait_for_tasks() has already been invoked to
quiese the outstanding se_cmd descriptor.

To address this bug, this patch adds a CMD_T_STOP|CMD_T_ABORTED
check within transport_write_pending_qf() and invokes the
existing se_cmd->t_transport_stop_comp to signal quiese
completion back to __transport_wait_for_tasks().

Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Cc: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Cc: Potnuri Bharat Teja <bharat@chelsio.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-07 19:45:28 -08:00
Nicholas Bellinger fd2f928b0d target: Fix caw_sem leak in transport_generic_request_failure
With the recent addition of transport_check_aborted_status() within
transport_generic_request_failure() to avoid sending a SCSI status
exception after CMD_T_ABORTED w/ TAS=1 has occured, it introduced
a COMPARE_AND_WRITE early failure regression.

Namely when COMPARE_AND_WRITE fails and se_device->caw_sem has
been taken by sbc_compare_and_write(), if the new check for
transport_check_aborted_status() returns true and exits,
cmd->transport_complete_callback() -> compare_and_write_post()
is skipped never releasing se_device->caw_sem.

This regression was originally introduced by:

  commit e3b88ee95b
  Author: Bart Van Assche <bart.vanassche@sandisk.com>
  Date:   Tue Feb 14 16:25:45 2017 -0800

      target: Fix handling of aborted failed commands

To address this bug, move the transport_check_aborted_status()
call after transport_complete_task_attr() and
cmd->transport_complete_callback().

Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-07 19:43:55 -08:00
Nicholas Bellinger 1c79df1f34 target: Fix QUEUE_FULL + SCSI task attribute handling
This patch fixes a bug during QUEUE_FULL where transport_complete_qf()
calls transport_complete_task_attr() after it's already been invoked
by target_complete_ok_work() or transport_generic_request_failure()
during initial completion, preceeding QUEUE_FULL.

This will result in se_device->simple_cmds, se_device->dev_cur_ordered_id
and/or se_device->dev_ordered_sync being updated multiple times for
a single se_cmd.

To address this bug, clear SCF_TASK_ATTR_SET after the first call
to transport_complete_task_attr(), and avoid updating SCSI task
attribute related counters for any subsequent calls.

Also, when a se_cmd is deferred due to ordered tags and executed
via target_restart_delayed_cmds(), set CMD_T_SENT before execution
matching what target_execute_cmd() does.

Cc: Michael Cyr <mikecyr@linux.vnet.ibm.com>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-07 19:41:37 -08:00
Ingo Molnar 8c5db92a70 Merge branch 'linus' into locking/core, to resolve conflicts
Conflicts:
	include/linux/compiler-clang.h
	include/linux/compiler-gcc.h
	include/linux/compiler-intel.h
	include/uapi/linux/stddef.h

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-11-07 10:32:44 +01:00
Markus Elfring 17c45b9006 iSCSI-target: Use common error handling code in iscsi_decode_text_input()
Add a jump target so that a bit of exception handling can be better reused
at the end of this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:17:49 -07:00
Bart Van Assche 6eaf69e4ec target/iscsi: Detect conn_cmd_list corruption early
Certain behavior of the initiator can cause the target driver to
send both a reject and a SCSI response. If that happens two
target_put_sess_cmd() calls will occur without the command having
been removed from conn_cmd_list. In other words, conn_cmd_list
will get corrupted once the freed memory is reused. Although the
Linux kernel can detect list corruption if list debugging is
enabled, in this case the context in which list corruption is
detected is not related to the context that caused list corruption.
Hence add WARN_ON() statements that report the context that is
causing list corruption.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:16:10 -07:00
Bart Van Assche cfe2b621bb target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd()
Avoid that cmd->se_cmd.se_tfo is read after a command has already been
freed.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:16:06 -07:00
Bart Van Assche e1dfb21f00 target/iscsi: Modify iscsit_do_crypto_hash_buf() prototype
Change the type of the last two arguments from u8 * into const void *
and void * respectively such that the u8 * casts can be left out
from the callers. This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:16:01 -07:00
Bart Van Assche de3493aea6 target/iscsi: Fix endianness in an error message
Since hdr->offset is a big endian number, convert it to CPU endian
before printing it.

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:15:56 -07:00
Bart Van Assche 919765e968 target/iscsi: Use min() in iscsit_dump_data_payload() instead of open-coding it
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:15:50 -07:00
Bart Van Assche 8d973ab5d4 target/iscsi: Define OFFLOAD_BUF_SIZE once
The constant OFFLOAD_BUF_SIZE is defined twice - once in
iscsi_target_seq_pdu_list.c and once in iscsi_target_erl1.c. Since
that constant is not used in the former source file, remove its
definition from that source file.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:15:46 -07:00
Bart Van Assche c017069823 target: Inline transport_put_cmd()
Since all transput_put_cmd() does is to call target_put_sess_cmd(),
inline transport_put_cmd() into its callers. Leave out the BUG_ON()
statement because if cmd->se_tfo == NULL then cmd->cmd_kref is 0
and kref_put() will complain anyway. Notes:
- transport_init_se_cmd() initializes both .se_tfo and .cmd_kref.
- The only target driver that does not call transport_init_se_cmd()
  for all commands is the iSCSI target driver. See also
  iscsi_target_rx_opcode().

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:15:41 -07:00
Bart Van Assche d7e595ddd5 target: Suppress gcc 7 fallthrough warnings
Avoid that gcc 7 reports the following warning when building with W=1:

warning: this statement may fall through [-Wimplicit-fallthrough=]

Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Varun Prakash <varun@chelsio.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:15:35 -07:00
Bart Van Assche c48e5594d0 target: Move a declaration of a global variable into a header file
This patch avoids that sparse reports the following warning:

drivers/target/target_core_configfs.c:2267:33: warning: symbol 'target_core_dev_item_ops' was not declared. Should it be static?

Fixes: c17cd24959 ("target/configfs: Kill se_device->dev_link_magic")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:15:30 -07:00
Mike Christie 0d44374c1a tcmu: fix double se_cmd completion
If cmd_time_out != 0, then tcmu_queue_cmd_ring could end up
sleeping waiting for ring space, timing out and then returning
failure to lio, and tcmu_check_expired_cmd could also detect
the timeout and call target_complete_cmd on the cmd.

This patch just delays setting up the deadline value and adding
the cmd to the udev->commands idr until we have allocated ring
space and are about to send the cmd to userspace.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:01:55 -07:00
Mike Christie a271eac46a target: return SAM_STAT_TASK_SET_FULL for TCM_OUT_OF_RESOURCES
TCM_OUT_OF_RESOURCES is getting translated to
TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE which seems like a heavy
error when we just cannot allocate a resource that may be
allocatable later. This has us translate TCM_OUT_OF_RESOURCES
to SAM_STAT_TASK_SET_FULL instead.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:01:49 -07:00
David Disseldorp 55435badda target: fix ALUA state file path truncation
A sufficiently long Unit Serial string, dbroot path, and/or ALUA target
portal group name may result in truncation of the ALUA state file path
prior to usage. Fix this by using kasprintf() instead.

Fixes: fdddf93226 ("target: use new "dbroot" target attribute")
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:00:30 -07:00
David Disseldorp bdc79f0ed1 target: fix PR state file path truncation
If an LIO backstore is configured with a sufficiently long Unit Serial
string, alongside a similarly long dbroot path, then a truncated
Persistent Reservation APTPL state file path will be used. This
truncation can unintentionally lead to two LUs with differing serial
numbers sharing PR state file.

Fixes: fdddf93226 ("target: use new "dbroot" target attribute")
Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 15:00:17 -07:00
Varun Prakash 1ae01724ae cxgbit: Abort the TCP connection in case of data out timeout
If DDP is programmed for a WRITE cmd and data out
timer gets expired then abort the TCP connection
before freeing the cmd to avoid any possibility of
DDP after freeing the cmd.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 14:54:40 -07:00
Kenjiro Nakayama b849b45675 target: Add netlink command reply supported option for each device
Currently netlink command reply support option
(TCMU_ATTR_SUPP_KERN_CMD_REPLY) can be enabled only on module
scope. Because of that, once an application enables the netlink
command reply support, all applications using target_core_user.ko
would be expected to support the netlink reply. To make matters worse,
users will not be able to add a device via configfs manually.

To fix these issues, this patch adds an option to make netlink command
reply disabled on each device through configfs. Original
TCMU_ATTR_SUPP_KERN_CMD_REPLY is still enabled on module scope to keep
backward-compatibility and used by default, however once users set
nl_reply_supported=<NAGATIVE_VALUE> via configfs for a particular
device, the device disables the netlink command reply support.

Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 14:54:07 -07:00
Kenjiro Nakayama b5ab697c62 target/tcmu: Use macro to call container_of in tcmu_cmd_time_out_show
This patch makes a tiny change that using TCMU_DEV in
tcmu_cmd_time_out_show so it is consistent with other functions.

Signed-off-by: Kenjiro Nakayama <nakayamakenjiro@gmail.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 14:52:39 -07:00
Xiubo Li c22adc0b0c tcmu: fix crash when removing the tcmu device
Before the nl REMOVE msg has been sent to the userspace, the ring's
and other resources have been released, but the userspace maybe still
using them. And then we can see the crash messages like:

ring broken, not handling completions
BUG: unable to handle kernel paging request at ffffffffffffffd0
IP: tcmu_handle_completions+0x134/0x2f0 [target_core_user]
PGD 11bdc0c067
P4D 11bdc0c067
PUD 11bdc0e067
PMD 0

Oops: 0000 [#1] SMP
cmd_id not found, ring is broken
RIP: 0010:tcmu_handle_completions+0x134/0x2f0 [target_core_user]
RSP: 0018:ffffb8a2d8983d88 EFLAGS: 00010296
RAX: 0000000000000000 RBX: ffffb8a2aaa4e000 RCX: 00000000ffffffff
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000220
R10: 0000000076c71401 R11: ffff8d2e76c713f0 R12: ffffb8a2aad56bc0
R13: 000000000000001c R14: ffff8d2e32c90000 R15: ffff8d2e76c713f0
FS:  00007f411ffff700(0000) GS:ffff8d1e7fdc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd0 CR3: 0000001027070000 CR4:
00000000001406e0
Call Trace:
? tcmu_irqcontrol+0x2a/0x40 [target_core_user]
? uio_write+0x7b/0xc0 [uio]
? __vfs_write+0x37/0x150
? __getnstimeofday64+0x3b/0xd0
? vfs_write+0xb2/0x1b0
? syscall_trace_enter+0x1d0/0x2b0
? SyS_write+0x55/0xc0
? do_syscall_64+0x67/0x150
? entry_SYSCALL64_slow_path+0x25/0x25
Code: 41 5d 41 5e 41 5f 5d c3 83 f8 01 0f 85 cf 01 00
00 48 8b 7d d0 e8 dd 5c 1d f3 41 0f b7 74 24 04 48 8b
7d c8 31 d2 e8 5c c7 1b f3 <48> 8b 7d d0 49 89 c7 c6 07
00 0f 1f 40 00 4d 85 ff 0f 84 82 01  RIP:
tcmu_handle_completions+0x134/0x2f0 [target_core_user]
RSP: ffffb8a2d8983d88
CR2: ffffffffffffffd0

And the crash also could happen in tcmu_page_fault and other places.

Signed-off-by: Zhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 14:48:21 -07:00
tangwenji a0884d489e iscsi-target: fix memory leak in iscsit_release_discovery_tpg()
Need to release the param_list for tpg in iscsi_release_discovery_tpg
function, this is also required before the iscsit_load_discovery_tpg
function exits abnormally.

Signed-off-by: tangwenji <tang.wenji@zte.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 14:47:04 -07:00
tangwenji 12d5a43b2d iscsi-target: fix memory leak in lio_target_tiqn_addtpg()
tpg must free when call core_tpg_register() return fail

Signed-off-by: tangwenji <tang.wenji@zte.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 14:45:25 -07:00
tangwenji 24528f089d target:fix condition return in core_pr_dump_initiator_port()
When is pr_reg->isid_present_at_reg is false,this function should return.

This fixes a regression originally introduced by:

  commit d2843c173e
  Author: Andy Grover <agrover@redhat.com>
  Date:   Thu May 16 10:40:55 2013 -0700

      target: Alter core_pr_dump_initiator_port for ease of use

Signed-off-by: tangwenji <tang.wenji@zte.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 14:45:24 -07:00
tangwenji a2db857bf9 target: fix match_token option in target_core_configfs.c
The match_token function does not recognize the option 'l',
so that both the mapped_lun and target_lun parameters
can not be resolved correctly.

And parsed u64-type parameters should use match_u64().

(Use %u instead of %s for Opt_mapped_lun + Opt_target_lun - nab)

Signed-off-by: tangwenji <tang.wenji@zte.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 14:45:23 -07:00
tangwenji 79dd6f2fd1 target: add sense code INSUFFICIENT REGISTRATION RESOURCES
If a PERSISTENT RESERVE OUT command with a REGISTER service action or a
REGISTER AND IGNORE EXISTING KEY service action or REGISTER AND MOVE
service action is attempted, but there are insufficient device server
resources to complete the operation, then the command shall be terminated
with CHECK CONDITION status, with the sense key set to ILLEGAL REQUEST,and
the additonal sense code set to INSUFFICIENT REGISTRATION RESOURCES.

Signed-off-by: tangwenji <tang.wenji@zte.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 14:45:23 -07:00
tangwenji e437fa3e5d target: fix double unmap data sg in core_scsi3_emulate_pro_register_and_move()
Signed-off-by: tangwenji <tang.wenji@zte.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 14:45:22 -07:00
tangwenji c58a252beb target: fix buffer offset in core_scsi3_pri_read_full_status
When at least two initiators register pr on the same LUN,
the target returns the exception data due to buffer offset
error, therefore the initiator executes command 'sg_persist -s'
may cause the initiator to appear segfault error.

This fixes a regression originally introduced by:

  commit a85d667e58
  Author: Bart Van Assche <bart.vanassche@sandisk.com>
  Date:   Tue May 23 16:48:27 2017 -0700

      target: Use {get,put}_unaligned_be*() instead of open coding these functions

Signed-off-by: tangwenji <tang.wenji@zte.com.cn>
Cc: stable@vger.kernel.org # 4.13+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 14:45:21 -07:00
tangwenji 88fb2fa7db target: fix null pointer regression in core_tmr_drain_tmr_list
The target system kernel crash when the initiator executes
the sg_persist -A command,because of the second argument to
be set to NULL when core_tmr_lun_reset is called in
core_scsi3_pro_preempt function.

This fixes a regression originally introduced by:

  commit 51ec502a32
  Author: Bart Van Assche <bart.vanassche@sandisk.com>
  Date:   Tue Feb 14 16:25:54 2017 -0800

      target: Delete tmr from list before processing

Signed-off-by: tangwenji <tang.wenji@zte.com.cn>
Cc: stable@vger.kernel.org # 4.11+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 14:45:20 -07:00
Jiang Yi 594e25e734 target/file: Do not return error for UNMAP if length is zero
The function fd_execute_unmap() in target_core_file.c calles

ret = file->f_op->fallocate(file, mode, pos, len);

Some filesystems implement fallocate() to return error if
length is zero (e.g. btrfs) but according to SCSI Block
Commands spec UNMAP should return success for zero length.

Signed-off-by: Jiang Yi <jiangyilism@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-11-04 14:45:16 -07:00
Greg Kroah-Hartman b24413180f License cleanup: add SPDX GPL-2.0 license identifier to files with no license
Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.

Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 11:10:55 +01:00
Kees Cook f7c9564a7c target/iscsi: Convert timers to use timer_setup()
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Includes a fix for correcting an
on-stack timer usage.

Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Jiang Yi <jiangyilism@gmail.com>
Cc: Varun Prakash <varun@chelsio.com>
Cc: linux-scsi@vger.kernel.org
Cc: target-devel@vger.kernel.org
Reviewed-and-Tested-by: Bart Van Assche <Bart.VanAssche@wdc.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-01 11:44:50 -07:00
Bart Van Assche 8a47aa9dc6 target/iscsi: Simplify timer manipulation code
Move timer initialization from before add_timer() to the context
where the containing object is initialized. Use setup_timer() and
mod_timer() instead of open coding these. Use 'jiffies' instead
of get_jiffies_64() when calculating expiry times because expiry
times have type unsigned long, just like 'jiffies'.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Kees Cook <keescook@chromium.org>
2017-11-01 11:44:49 -07:00
Mark Rutland 6aa7de0591 locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.

For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.

However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:

----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()

// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch

virtual patch

@ depends on patch @
expression E1, E2;
@@

- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)

@ depends on patch @
expression E;
@@

- ACCESS_ONCE(E)
+ READ_ONCE(E)
----

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2017-10-25 11:01:08 +02:00
Bhumika Goyal ece550b575 target: make config_item_type const
Make these structures const as they are either passed to the functions
having the argument as const or stored as a reference in the "ci_type"
const field of a config_item structure.

Done using Coccinelle.

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2017-10-19 16:15:17 +02:00
Linus Torvalds 581bfce969 Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more set_fs removal from Al Viro:
 "Christoph's 'use kernel_read and friends rather than open-coding
  set_fs()' series"

* 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fs: unexport vfs_readv and vfs_writev
  fs: unexport vfs_read and vfs_write
  fs: unexport __vfs_read/__vfs_write
  lustre: switch to kernel_write
  gadget/f_mass_storage: stop messing with the address limit
  mconsole: switch to kernel_read
  btrfs: switch write_buf to kernel_write
  net/9p: switch p9_fd_read to kernel_write
  mm/nommu: switch do_mmap_private to kernel_read
  serial2002: switch serial2002_tty_write to kernel_{read/write}
  fs: make the buf argument to __kernel_write a void pointer
  fs: fix kernel_write prototype
  fs: fix kernel_read prototype
  fs: move kernel_read to fs/read_write.c
  fs: move kernel_write to fs/read_write.c
  autofs4: switch autofs4_write to __kernel_write
  ashmem: switch to ->read_iter
2017-09-14 18:13:32 -07:00
Linus Torvalds a0725ab0c7 Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:
 "This is the first pull request for 4.14, containing most of the code
  changes. It's a quiet series this round, which I think we needed after
  the churn of the last few series. This contains:

   - Fix for a registration race in loop, from Anton Volkov.

   - Overflow complaint fix from Arnd for DAC960.

   - Series of drbd changes from the usual suspects.

   - Conversion of the stec/skd driver to blk-mq. From Bart.

   - A few BFQ improvements/fixes from Paolo.

   - CFQ improvement from Ritesh, allowing idling for group idle.

   - A few fixes found by Dan's smatch, courtesy of Dan.

   - A warning fixup for a race between changing the IO scheduler and
     device remova. From David Jeffery.

   - A few nbd fixes from Josef.

   - Support for cgroup info in blktrace, from Shaohua.

   - Also from Shaohua, new features in the null_blk driver to allow it
     to actually hold data, among other things.

   - Various corner cases and error handling fixes from Weiping Zhang.

   - Improvements to the IO stats tracking for blk-mq from me. Can
     drastically improve performance for fast devices and/or big
     machines.

   - Series from Christoph removing bi_bdev as being needed for IO
     submission, in preparation for nvme multipathing code.

   - Series from Bart, including various cleanups and fixes for switch
     fall through case complaints"

* 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits)
  kernfs: checking for IS_ERR() instead of NULL
  drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set
  drbd: Fix allyesconfig build, fix recent commit
  drbd: switch from kmalloc() to kmalloc_array()
  drbd: abort drbd_start_resync if there is no connection
  drbd: move global variables to drbd namespace and make some static
  drbd: rename "usermode_helper" to "drbd_usermode_helper"
  drbd: fix race between handshake and admin disconnect/down
  drbd: fix potential deadlock when trying to detach during handshake
  drbd: A single dot should be put into a sequence.
  drbd: fix rmmod cleanup, remove _all_ debugfs entries
  drbd: Use setup_timer() instead of init_timer() to simplify the code.
  drbd: fix potential get_ldev/put_ldev refcount imbalance during attach
  drbd: new disk-option disable-write-same
  drbd: Fix resource role for newly created resources in events2
  drbd: mark symbols static where possible
  drbd: Send P_NEG_ACK upon write error in protocol != C
  drbd: add explicit plugging when submitting batches
  drbd: change list_for_each_safe to while(list_first_entry_or_null)
  drbd: introduce drbd_recv_header_maybe_unplug
  ...
2017-09-07 11:59:42 -07:00
Christoph Hellwig e13ec939e9 fs: fix kernel_write prototype
Make the position an in/out argument like all the other read/write
helpers and and make the buf argument a void pointer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-09-04 19:05:15 -04:00
Christoph Hellwig 74d46992e0 block: replace bi_bdev with a gendisk pointer and partitions index
This way we don't need a block_device structure to submit I/O.  The
block_device has different life time rules from the gendisk and
request_queue and is usually only available when the block device node
is open.  Other callers need to explicitly create one (e.g. the lightnvm
passthrough code, or the new nvme multipathing code).

For the actual I/O path all that we need is the gendisk, which exists
once per block device.  But given that the block layer also does
partition remapping we additionally need a partition index, which is
used for said remapping in generic_make_request.

Note that all the block drivers generally want request_queue or
sometimes the gendisk, so this removes a layer of indirection all
over the stack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-08-23 12:49:55 -06:00
Nicholas Bellinger 6f48655fac target: Fix node_acl demo-mode + uncached dynamic shutdown regression
This patch fixes a generate_node_acls = 1 + cache_dynamic_acls = 0
regression, that was introduced by

  commit 01d4d67355
  Author: Nicholas Bellinger <nab@linux-iscsi.org>
  Date:   Wed Dec 7 12:55:54 2016 -0800

which originally had the proper list_del_init() usage, but was
dropped during list review as it was thought unnecessary by HCH.

However, list_del_init() usage is required during the special
generate_node_acls = 1 + cache_dynamic_acls = 0 case when
transport_free_session() does a list_del(&se_nacl->acl_list),
followed by target_complete_nacl() doing the same thing.

This was manifesting as a general protection fault as reported
by Justin:

kernel: general protection fault: 0000 [#1] SMP
kernel: Modules linked in:
kernel: CPU: 0 PID: 11047 Comm: iscsi_ttx Not tainted 4.13.0-rc2.x86_64.1+ #20
kernel: Hardware name: Intel Corporation S5500BC/S5500BC, BIOS S5500.86B.01.00.0064.050520141428 05/05/2014
kernel: task: ffff88026939e800 task.stack: ffffc90007884000
kernel: RIP: 0010:target_put_nacl+0x49/0xb0
kernel: RSP: 0018:ffffc90007887d70 EFLAGS: 00010246
kernel: RAX: dead000000000200 RBX: ffff8802556ca000 RCX: 0000000000000000
kernel: RDX: dead000000000100 RSI: 0000000000000246 RDI: ffff8802556ce028
kernel: RBP: ffffc90007887d88 R08: 0000000000000001 R09: 0000000000000000
kernel: R10: ffffc90007887df8 R11: ffffea0009986900 R12: ffff8802556ce020
kernel: R13: ffff8802556ce028 R14: ffff8802556ce028 R15: ffffffff88d85540
kernel: FS:  0000000000000000(0000) GS:ffff88027fc00000(0000) knlGS:0000000000000000
kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 00007fffe36f5f94 CR3: 0000000009209000 CR4: 00000000003406f0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel:  transport_free_session+0x67/0x140
kernel:  transport_deregister_session+0x7a/0xc0
kernel:  iscsit_close_session+0x92/0x210
kernel:  iscsit_close_connection+0x5f9/0x840
kernel:  iscsit_take_action_for_connection_exit+0xfe/0x110
kernel:  iscsi_target_tx_thread+0x140/0x1e0
kernel:  ? wait_woken+0x90/0x90
kernel:  kthread+0x124/0x160
kernel:  ? iscsit_thread_get_cpumask+0x90/0x90
kernel:  ? kthread_create_on_node+0x40/0x40
kernel:  ret_from_fork+0x22/0x30
kernel: Code: 00 48 89 fb 4c 8b a7 48 01 00 00 74 68 4d 8d 6c 24 08 4c
89 ef e8 e8 28 43 00 48 8b 93 20 04 00 00 48 8b 83 28 04 00 00 4c 89
ef <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 83 20
kernel: RIP: target_put_nacl+0x49/0xb0 RSP: ffffc90007887d70
kernel: ---[ end trace f12821adbfd46fed ]---

To address this, go ahead and use proper list_del_list() for all
cases of se_nacl->acl_list deletion.

Reported-by: Justin Maggard <jmaggard01@gmail.com>
Tested-by: Justin Maggard <jmaggard01@gmail.com>
Cc: Justin Maggard <jmaggard01@gmail.com>
Cc: stable@vger.kernel.org # 4.1+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-08-09 20:55:19 -07:00
Nicholas Bellinger 978d13d60c iscsi-target: Fix iscsi_np reset hung task during parallel delete
This patch fixes a bug associated with iscsit_reset_np_thread()
that can occur during parallel configfs rmdir of a single iscsi_np
used across multiple iscsi-target instances, that would result in
hung task(s) similar to below where configfs rmdir process context
was blocked indefinately waiting for iscsi_np->np_restart_comp
to finish:

[ 6726.112076] INFO: task dcp_proxy_node_:15550 blocked for more than 120 seconds.
[ 6726.119440]       Tainted: G        W  O     4.1.26-3321 #2
[ 6726.125045] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6726.132927] dcp_proxy_node_ D ffff8803f202bc88     0 15550      1 0x00000000
[ 6726.140058]  ffff8803f202bc88 ffff88085c64d960 ffff88083b3b1ad0 ffff88087fffeb08
[ 6726.147593]  ffff8803f202c000 7fffffffffffffff ffff88083f459c28 ffff88083b3b1ad0
[ 6726.155132]  ffff88035373c100 ffff8803f202bca8 ffffffff8168ced2 ffff8803f202bcb8
[ 6726.162667] Call Trace:
[ 6726.165150]  [<ffffffff8168ced2>] schedule+0x32/0x80
[ 6726.170156]  [<ffffffff8168f5b4>] schedule_timeout+0x214/0x290
[ 6726.176030]  [<ffffffff810caef2>] ? __send_signal+0x52/0x4a0
[ 6726.181728]  [<ffffffff8168d7d6>] wait_for_completion+0x96/0x100
[ 6726.187774]  [<ffffffff810e7c80>] ? wake_up_state+0x10/0x10
[ 6726.193395]  [<ffffffffa035d6e2>] iscsit_reset_np_thread+0x62/0xe0 [iscsi_target_mod]
[ 6726.201278]  [<ffffffffa0355d86>] iscsit_tpg_disable_portal_group+0x96/0x190 [iscsi_target_mod]
[ 6726.210033]  [<ffffffffa0363f7f>] lio_target_tpg_store_enable+0x4f/0xc0 [iscsi_target_mod]
[ 6726.218351]  [<ffffffff81260c5a>] configfs_write_file+0xaa/0x110
[ 6726.224392]  [<ffffffff811ea364>] vfs_write+0xa4/0x1b0
[ 6726.229576]  [<ffffffff811eb111>] SyS_write+0x41/0xb0
[ 6726.234659]  [<ffffffff8169042e>] system_call_fastpath+0x12/0x71

It would happen because each iscsit_reset_np_thread() sets state
to ISCSI_NP_THREAD_RESET, sends SIGINT, and then blocks waiting
for completion on iscsi_np->np_restart_comp.

However, if iscsi_np was active processing a login request and
more than a single iscsit_reset_np_thread() caller to the same
iscsi_np was blocked on iscsi_np->np_restart_comp, iscsi_np
kthread process context in __iscsi_target_login_thread() would
flush pending signals and only perform a single completion of
np->np_restart_comp before going back to sleep within transport
specific iscsit_transport->iscsi_accept_np code.

To address this bug, add a iscsi_np->np_reset_count and update
__iscsi_target_login_thread() to keep completing np->np_restart_comp
until ->np_reset_count has reached zero.

Reported-by: Gary Guo <ghg@datera.io>
Tested-by: Gary Guo <ghg@datera.io>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-08-06 14:41:41 -07:00
Varun Prakash d96adb9b07 cxgbit: fix sg_nents calculation
The current logic of calculating sg_nents can fail
if data_offset % PAGE_SIZE is not zero.

For example -
PAGE_SIZE = 4096
data_len = 3072
data_offset = 3072

As per current logic
sg_nents = max(1UL, DIV_ROUND_UP(data_len, PAGE_SIZE));
sg_nents = max(1UL, DIV_ROUND_UP(3072, 4096));
sg_nents = 1

But as data_offset % PAGE_SIZE = 3072 we should skip 3072 bytes
skip = 3K
sg_nents = max(1UL, DIV_ROUND_UP(3K(skip) + 3K(data_len), 4K(PAGE_SIZE));
sg_nents = 2;

This patch fixes this issue by adding skip to data_len.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-30 15:35:34 -07:00
Varun Prakash 310d40a973 iscsi-target: fix invalid flags in text response
In case of multiple text responses iscsi-target
sets both 'F' and 'C' bit for the final text response
pdu, this issue happens because hdr->flags is not
zeroed out before ORing with 'F' bit.

This patch removes the | operator to fix this issue.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-30 15:32:48 -07:00
Varun Prakash ea8dc5b4cd iscsi-target: fix memory leak in iscsit_setup_text_cmd()
On receiving text request iscsi-target allocates buffer for
payload in iscsit_handle_text_cmd() and assigns buffer pointer
to cmd->text_in_ptr, this buffer is currently freed in
iscsit_release_cmd(), if iscsi-target sets 'C' bit in text
response then it will receive another text request from the
initiator with ttt != 0xffffffff in this case iscsi-target
will find cmd using itt and call iscsit_setup_text_cmd()
which will set cmd->text_in_ptr to NULL without freeing
previously allocated buffer.

This patch fixes this issue by calling kfree(cmd->text_in_ptr)
in iscsit_setup_text_cmd() before assigning NULL to it.

For the first text request cmd->text_in_ptr is NULL as
cmd is memset to 0 in iscsit_allocate_cmd().

Signed-off-by: Varun Prakash <varun@chelsio.com>
Cc: <stable@vger.kernel.org> # 4.0+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-30 15:31:34 -07:00
Varun Prakash 66b59f9b1f cxgbit: add missing __kfree_skb()
Call __kfree_skb() after processing skb to avoid
memory leak.

Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-30 15:26:00 -07:00
Bryant G. Ly ededd039d1 tcmu: free old string on reconfig
On initial tcmu_configure_device call the info->name would
have already been allocated and set, so on the second call
make sure to free it first.

Reported-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-30 15:23:21 -07:00
Xiubo Li c542942cb4 tcmu: Fix possible to/from address overflow when doing the memcpy
For most case the sg->length equals to PAGE_SIZE, so this bug won't
be triggered. Otherwise this will crash the kernel, for example when
all segments' sg->length equal to 1K.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-30 15:12:32 -07:00
Linus Torvalds 52f6c588c7 Add wait_for_random_bytes() and get_random_*_wait() functions so that
callers can more safely get random bytes if they can block until the
 CRNG is initialized.
 
 Also print a warning if get_random_*() is called before the CRNG is
 initialized.  By default, only one single-line warning will be printed
 per boot.  If CONFIG_WARN_ALL_UNSEEDED_RANDOM is defined, then a
 warning will be printed for each function which tries to get random
 bytes before the CRNG is initialized.  This can get spammy for certain
 architecture types, so it is not enabled by default.
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEK2m5VNv+CHkogTfJ8vlZVpUNgaMFAllqXNUACgkQ8vlZVpUN
 gaPtAgf/aUbXZuWYsDQzslHsbzEWi+qz4QgL885/w4L00pEImTTp91Q06SDxWhtB
 KPvGnZHS3IofxBh2DC+6AwN6dPMoWDCfYhhO6po3FSz0DiPRIQCTuvOb8fhKY1X7
 rTdDq2xtDxPGxJ25bMJtlrgzH2XlXPpVyPUeoc9uh87zUK5aesXpUn9kBniRexoz
 ume+M/cDzPKkwNQpbLq8vzhNjoWMVv0FeW2akVvrjkkWko8nZLZ0R/kIyKQlRPdG
 LZDXcz0oTHpDS6+ufEo292ZuWm2IGer2YtwHsKyCAsyEWsUqBz2yurtkSj3mAVyC
 hHafyS+5WNaGdgBmg0zJxxwn5qxxLg==
 =ua7p
 -----END PGP SIGNATURE-----

Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

Pull random updates from Ted Ts'o:
 "Add wait_for_random_bytes() and get_random_*_wait() functions so that
  callers can more safely get random bytes if they can block until the
  CRNG is initialized.

  Also print a warning if get_random_*() is called before the CRNG is
  initialized. By default, only one single-line warning will be printed
  per boot. If CONFIG_WARN_ALL_UNSEEDED_RANDOM is defined, then a
  warning will be printed for each function which tries to get random
  bytes before the CRNG is initialized. This can get spammy for certain
  architecture types, so it is not enabled by default"

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
  random: reorder READ_ONCE() in get_random_uXX
  random: suppress spammy warnings about unseeded randomness
  random: warn when kernel uses unseeded randomness
  net/route: use get_random_int for random counter
  net/neighbor: use get_random_u32 for 32-bit hash random
  rhashtable: use get_random_u32 for hash_rnd
  ceph: ensure RNG is seeded before using
  iscsi: ensure RNG is seeded before use
  cifs: use get_random_u32 for 32-bit lock random
  random: add get_random_{bytes,u32,u64,int,long,once}_wait family
  random: add wait_for_random_bytes() API
2017-07-15 12:44:02 -07:00
Linus Torvalds 48ea2cedde Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "It's been usually busy for summer, with most of the efforts centered
  around TCMU developments and various target-core + fabric driver bug
  fixing activities. Not particularly large in terms of LoC, but lots of
  smaller patches from many different folks.

  The highlights include:

   - ibmvscsis logical partition manager support (Michael Cyr + Bryant
     Ly)

   - Convert target/iblock WRITE_SAME to blkdev_issue_zeroout (hch +
     nab)

   - Add support for TMR percpu LUN reference counting (nab)

   - Fix a potential deadlock between EXTENDED_COPY and iscsi shutdown
     (Bart)

   - Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce (Jiang Yi)

   - Fix TMCU module removal (Xiubo Li)

   - Fix iser-target OOPs during login failure (Andrea Righi + Sagi)

   - Breakup target-core free_device backend driver callback (mnc)

   - Perform TCMU add/delete/reconfig synchronously (mnc)

   - Fix TCMU multiple UIO open/close sequences (mnc)

   - Fix TCMU CHECK_CONDITION sense handling (mnc)

   - Fix target-core SAM_STAT_BUSY + TASK_SET_FULL handling (mnc + nab)

   - Introduce TYPE_ZBC support in PSCSI (Damien Le Moal)

   - Fix possible TCMU memory leak + OOPs when recalculating cmd base
     size (Xiubo Li + Bryant Ly + Damien Le Moal + mnc)

   - Add login_keys_workaround attribute for non RFC initiators (Robert
     LeBlanc + Arun Easi + nab)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (68 commits)
  iscsi-target: Add login_keys_workaround attribute for non RFC initiators
  Revert "qla2xxx: Fix incorrect tcm_qla2xxx_free_cmd use during TMR ABORT"
  tcmu: clean up the code and with one small fix
  tcmu: Fix possbile memory leak / OOPs when recalculating cmd base size
  target: export lio pgr/alua support as device attr
  target: Fix return sense reason in target_scsi3_emulate_pr_out
  target: Fix cmd size for PR-OUT in passthrough_parse_cdb
  tcmu: Fix dev_config_store
  target: pscsi: Introduce TYPE_ZBC support
  target: Use macro for WRITE_VERIFY_32 operation codes
  target: fix SAM_STAT_BUSY/TASK_SET_FULL handling
  target: remove transport_complete
  pscsi: finish cmd processing from pscsi_req_done
  tcmu: fix sense handling during completion
  target: add helper to copy sense to se_cmd buffer
  target: do not require a transport_complete for SCF_TRANSPORT_TASK_SENSE
  target: make device_mutex and device_list static
  tcmu: Fix flushing cmd entry dcache page
  tcmu: fix multiple uio open/close sequences
  tcmu: drop configured check in destroy
  ...
2017-07-13 14:27:32 -07:00
Michal Hocko dcda9b0471 mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
__GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to
the page allocator.  This has been true but only for allocations
requests larger than PAGE_ALLOC_COSTLY_ORDER.  It has been always
ignored for smaller sizes.  This is a bit unfortunate because there is
no way to express the same semantic for those requests and they are
considered too important to fail so they might end up looping in the
page allocator for ever, similarly to GFP_NOFAIL requests.

Now that the whole tree has been cleaned up and accidental or misled
usage of __GFP_REPEAT flag has been removed for !costly requests we can
give the original flag a better name and more importantly a more useful
semantic.  Let's rename it to __GFP_RETRY_MAYFAIL which tells the user
that the allocator would try really hard but there is no promise of a
success.  This will work independent of the order and overrides the
default allocator behavior.  Page allocator users have several levels of
guarantee vs.  cost options (take GFP_KERNEL as an example)

 - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_
   attempt to free memory at all. The most light weight mode which even
   doesn't kick the background reclaim. Should be used carefully because
   it might deplete the memory and the next user might hit the more
   aggressive reclaim

 - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic
   allocation without any attempt to free memory from the current
   context but can wake kswapd to reclaim memory if the zone is below
   the low watermark. Can be used from either atomic contexts or when
   the request is a performance optimization and there is another
   fallback for a slow path.

 - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) -
   non sleeping allocation with an expensive fallback so it can access
   some portion of memory reserves. Usually used from interrupt/bh
   context with an expensive slow path fallback.

 - GFP_KERNEL - both background and direct reclaim are allowed and the
   _default_ page allocator behavior is used. That means that !costly
   allocation requests are basically nofail but there is no guarantee of
   that behavior so failures have to be checked properly by callers
   (e.g. OOM killer victim is allowed to fail currently).

 - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior
   and all allocation requests fail early rather than cause disruptive
   reclaim (one round of reclaim in this implementation). The OOM killer
   is not invoked.

 - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator
   behavior and all allocation requests try really hard. The request
   will fail if the reclaim cannot make any progress. The OOM killer
   won't be triggered.

 - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior
   and all allocation requests will loop endlessly until they succeed.
   This might be really dangerous especially for larger orders.

Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL
because they already had their semantic.  No new users are added.
__alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if
there is no progress and we have already passed the OOM point.

This means that all the reclaim opportunities have been exhausted except
the most disruptive one (the OOM killer) and a user defined fallback
behavior is more sensible than keep retrying in the page allocator.

[akpm@linux-foundation.org: fix arch/sparc/kernel/mdesc.c]
[mhocko@suse.com: semantic fix]
  Link: http://lkml.kernel.org/r/20170626123847.GM11534@dhcp22.suse.cz
[mhocko@kernel.org: address other thing spotted by Vlastimil]
  Link: http://lkml.kernel.org/r/20170626124233.GN11534@dhcp22.suse.cz
Link: http://lkml.kernel.org/r/20170623085345.11304-3-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alex Belits <alex.belits@cavium.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: NeilBrown <neilb@suse.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2017-07-12 16:26:03 -07:00
Linus Torvalds 130568d5ea Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull more block updates from Jens Axboe:
 "This is a followup for block changes, that didn't make the initial
  pull request. It's a bit of a mixed bag, this contains:

   - A followup pull request from Sagi for NVMe. Outside of fixups for
     NVMe, it also includes a series for ensuring that we properly
     quiesce hardware queues when browsing live tags.

   - Set of integrity fixes from Dmitry (mostly), fixing various issues
     for folks using DIF/DIX.

   - Fix for a bug introduced in cciss, with the req init changes. From
     Christoph.

   - Fix for a bug in BFQ, from Paolo.

   - Two followup fixes for lightnvm/pblk from Javier.

   - Depth fix from Ming for blk-mq-sched.

   - Also from Ming, performance fix for mtip32xx that was introduced
     with the dynamic initialization of commands"

* 'for-linus' of git://git.kernel.dk/linux-block: (44 commits)
  block: call bio_uninit in bio_endio
  nvmet: avoid unneeded assignment of submit_bio return value
  nvme-pci: add module parameter for io queue depth
  nvme-pci: compile warnings in nvme_alloc_host_mem()
  nvmet_fc: Accept variable pad lengths on Create Association LS
  nvme_fc/nvmet_fc: revise Create Association descriptor length
  lightnvm: pblk: remove unnecessary checks
  lightnvm: pblk: control I/O flow also on tear down
  cciss: initialize struct scsi_req
  null_blk: fix error flow for shared tags during module_init
  block: Fix __blkdev_issue_zeroout loop
  nvme-rdma: unconditionally recycle the request mr
  nvme: split nvme_uninit_ctrl into stop and uninit
  virtio_blk: quiesce/unquiesce live IO when entering PM states
  mtip32xx: quiesce request queues to make sure no submissions are inflight
  nbd: quiesce request queues to make sure no submissions are inflight
  nvme: kick requeue list when requeueing a request instead of when starting the queues
  nvme-pci: quiesce/unquiesce admin_q instead of start/stop its hw queues
  nvme-loop: quiesce/unquiesce admin_q instead of start/stop its hw queues
  nvme-fc: quiesce/unquiesce admin_q instead of start/stop its hw queues
  ...
2017-07-11 15:36:52 -07:00
Nicholas Bellinger 138d351eef iscsi-target: Add login_keys_workaround attribute for non RFC initiators
This patch re-introduces part of a long standing login workaround that
was recently dropped by:

  commit 1c99de981f
  Author: Nicholas Bellinger <nab@linux-iscsi.org>
  Date:   Sun Apr 2 13:36:44 2017 -0700

      iscsi-target: Drop work-around for legacy GlobalSAN initiator

Namely, the workaround for FirstBurstLength ended up being required by
Mellanox Flexboot PXE boot ROMs as reported by Robert.

So this patch re-adds the work-around for FirstBurstLength within
iscsi_check_proposer_for_optional_reply(), and makes the key optional
to respond when the initiator does not propose, nor respond to it.

Also as requested by Arun, this patch introduces a new TPG attribute
named 'login_keys_workaround' that controls the use of both the
FirstBurstLength workaround, as well as the two other existing
workarounds for gPXE iSCSI boot client.

By default, the workaround is enabled with login_keys_workaround=1,
since Mellanox FlexBoot requires it, and Arun has verified the Qlogic
MSFT initiator already proposes FirstBurstLength, so it's uneffected
by this re-adding this part of the original work-around.

Reported-by: Robert LeBlanc <robert@leblancnet.us>
Cc: Robert LeBlanc <robert@leblancnet.us>
Reviewed-by: Arun Easi <arun.easi@cavium.com>
Cc: <stable@vger.kernel.org> # 3.1+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-11 10:56:39 -07:00
Xiubo Li daf78c3051 tcmu: clean up the code and with one small fix
Remove useless blank line and code and at the same time add one error
path to catch the errors.

Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-11 10:48:07 -07:00
Xiubo Li b3743c71b7 tcmu: Fix possbile memory leak / OOPs when recalculating cmd base size
For all the entries allocated from the ring cmd area, the memory is
something like the stack memory, which will always reserve the old
data, so the entry->req.iov_bidi_cnt maybe none zero.

On some environments, the crash could be reproduce very easy and some
not. The following is the crash core trace as reported by Damien:

[  240.143969] CPU: 0 PID: 1285 Comm: iscsi_trx Not tainted 4.12.0-rc1+ #3
[  240.150607] Hardware name: ASUS All Series/H87-PRO, BIOS 2104 10/28/2014
[  240.157331] task: ffff8807de4f5800 task.stack: ffffc900047dc000
[  240.163270] RIP: 0010:memcpy_erms+0x6/0x10
[  240.167377] RSP: 0018:ffffc900047dfc68 EFLAGS: 00010202
[  240.172621] RAX: ffffc9065db85540 RBX: ffff8807f7980000 RCX: 0000000000000010
[  240.179771] RDX: 0000000000000010 RSI: ffff8807de574fe0 RDI: ffffc9065db85540
[  240.186930] RBP: ffffc900047dfd30 R08: ffff8807de41b000 R09: 0000000000000000
[  240.194088] R10: 0000000000000040 R11: ffff8807e9b726f0 R12: 00000006565726b0
[  240.201246] R13: ffffc90007612ea0 R14: 000000065657d540 R15: 0000000000000000
[  240.208397] FS:  0000000000000000(0000) GS:ffff88081fa00000(0000) knlGS:0000000000000000
[  240.216510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  240.222280] CR2: ffffc9065db85540 CR3: 0000000001c0f000 CR4: 00000000001406f0
[  240.229430] Call Trace:
[  240.231887]  ? tcmu_queue_cmd+0x83c/0xa80
[  240.235916]  ? target_check_reservation+0xcd/0x6f0
[  240.240725]  __target_execute_cmd+0x27/0xa0
[  240.244918]  target_execute_cmd+0x232/0x2c0
[  240.249124]  ? __local_bh_enable_ip+0x64/0xa0
[  240.253499]  iscsit_execute_cmd+0x20d/0x270
[  240.257693]  iscsit_sequence_cmd+0x110/0x190
[  240.261985]  iscsit_get_rx_pdu+0x360/0xc80
[  240.267565]  ? iscsi_target_rx_thread+0x54/0xd0
[  240.273571]  iscsi_target_rx_thread+0x9a/0xd0
[  240.279413]  kthread+0x113/0x150
[  240.284120]  ? iscsi_target_tx_thread+0x1e0/0x1e0
[  240.290297]  ? kthread_create_on_node+0x40/0x40
[  240.296297]  ret_from_fork+0x2e/0x40
[  240.301332] Code: 90 90 90 90 90 eb 1e 0f 1f 00 48 89 f8 48 89 d1 48
c1 e9 03 83 e2 07 f3 48 a5 89 d1 f3 a4 c3 66 0f 1f 44 00 00 48 89 f8 48
89 d1 <f3> a4 c3 0f 1f 80 00 00 00 00 48 89 f8 48 83 fa 20 72 7e 40 38
[  240.321751] RIP: memcpy_erms+0x6/0x10 RSP: ffffc900047dfc68
[  240.328838] CR2: ffffc9065db85540
[  240.333667] ---[ end trace b7e5354cfb54d08b ]---

To fix this, just memset all the entry memory before using it, and
also to be more readable we adjust the bidi code.

Fixed: fe25cc34795(tcmu: Recalculate the tcmu_cmd size to save cmd area
		memories)
Reported-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Tested-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-11 10:47:58 -07:00
Mike Christie c17d5d5f51 target: export lio pgr/alua support as device attr
Older kernels could crash or hang if the user write/read some ALUA files
with pscsi and tcmu backends. This patch exports if LIO supports
executing PGR and ALUA scsi commands/checks for the se_device, so userspace
can easily test.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-10 20:05:27 -07:00
Tang Wenji 7ee0031786 target: Fix return sense reason in target_scsi3_emulate_pr_out
The sense reason should be TCM_PARAMETER_LIST_LENGTH_ERROR when
parmeter length error.

Also the cdb[1] & 0x1f has been assigned to local variable sa,
so use sa instead of it.

Signed-off-by: Tang Wenji <tang.wenji@zte.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-09 20:59:17 -07:00
Tang Wenji 388fe6996b target: Fix cmd size for PR-OUT in passthrough_parse_cdb
The cmd size should be 4bytes form byte5 to byte8 when CDB opcode
is PERSISTENT_RESERVE_OUT in SPC3 and SPC4

(Also fix up the same in spc_parse_cdb - MNC)

Signed-off-by: Tang Wenji <tang.wenji@zte.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-09 20:58:49 -07:00
Bryant G. Ly de8c5221aa tcmu: Fix dev_config_store
Currently when there is a reconfig, the uio_info->name
does not get updated to reflect the change in the dev_config
name change.

On restart tcmu-runner there will be a mismatch between
the dev_config string in uio and the tcmu structure that contains
the string. When this occurs it'll reload the one in uio
and you lose the reconfigured device path.

v2: Created a helper function for the updating of uio_info

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-09 20:57:56 -07:00
Damien Le Moal 016a5fec19 target: pscsi: Introduce TYPE_ZBC support
TYPE_ZBC host managed zoned block devices are also block devices
despite the non-standard device type (14h). Handle them similarly to
regular TYPE_DISK devices.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:49 -07:00
Damien Le Moal e5dc9a7055 target: Use macro for WRITE_VERIFY_32 operation codes
Add WRITE_VERIFY_32 definition to scsi prototypes and use this macro
definition isntead of the hard coded value.

(Drop WRITE_VERIFY_16 that's already part of another patch - nab)

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:49 -07:00
Mike Christie 402242c904 target: fix SAM_STAT_BUSY/TASK_SET_FULL handling
If the scsi status was not SAM_STAT_GOOD or there was no transport
sense, we would ignore the scsi status and do a generic not ready
LUN communication failure check condition failure.

The problem is that LUN COMM failure is treated as a hard error
sometimes and will cause apps to get IO errors instead of the OS's SCSI
layer retrying. For example, the tcmu daemon will return SAM_STAT_QUEUE_FULL
when memory runs low and can still make progress but wants the initiator to
reduce the work load. Windows will fail this error directly the app
instead of retrying.

This patch is based on Nick's "target/iblock: Use -EAGAIN/-ENOMEM to
propigate SAM BUSY/TASK_SET_FULL" patch here:

http://comments.gmane.org/gmane.linux.scsi.target.devel/11301

but instead of only setting SAM_STAT_GOOD, SAM_STAT_TASK_SET_FULL
and SAM_STAT_BUSY as success, it sets all non check condition status
as success so they are passed back to the initiator, so passthrough
type backends can return all SCSI status codes. Since only passthrough
uses this, I was not sure if we wanted to add checks for non-passthrough
and specific codes.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:48 -07:00
Mike Christie 1a44417548 target: remove transport_complete
transport_complete is no longer used, so drop the code.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:47 -07:00
Mike Christie dce6ce8cfb pscsi: finish cmd processing from pscsi_req_done
This patch performs the pscsi_transport_complete operations from
pscsi_req_done. It looks like the only difference the transport_complete
callout provides is that it is called under t_state_lock which seems to
only be needed for the SCF_TRANSPORT_TASK_SENSE bit handling. We can
now use transport_copy_sense_to_cmd to handle the se_cmd sense bits, and
we can then drop the code where we have to copy the request info to
the pscsi_plugin_task for transport_complete use.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:46 -07:00
Mike Christie 406f74c20d tcmu: fix sense handling during completion
We were just copying the sense to the cmd sense_buffer and
did not implement a transport_complete or set the
SCF_TRANSPORT_TASK_SENSE, so the sense was ignored.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:46 -07:00
Mike Christie c6d66aba98 target: add helper to copy sense to se_cmd buffer
This adds a helper to copy sense from backend module buffer to
the se_cmd's sense buffer.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:45 -07:00
Mike Christie 9fe3698450 target: do not require a transport_complete for SCF_TRANSPORT_TASK_SENSE
tcmu needs to pass raw sense to target_complete_cmd, but a a
transport_complete callout is akward to implement for it.

This moves the check for SCF_TRANSPORT_TASK_SENSE so any backend
can pass sense.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:44 -07:00
Colin Ian King c82ff239ec target: make device_mutex and device_list static
Variables device_mutex and device_list static are local to the source,
so make them static.

Cleans up sparse warnings:
"symbol 'device_list' was not declared. Should it be static?"
"symbol 'device_mutex' was not declared. Should it be static?"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:44 -07:00
Xiubo Li 9d62bc0e6d tcmu: Fix flushing cmd entry dcache page
When feeding the tcmu's cmd ring, we need to flush the dcache page
for the cmd entry to make sure these kernel stores are visible to
user space mappings of that page.

For the none PAD cmd entry, this will be flushed at the end of the
tcmu_queue_cmd_ring().

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:43 -07:00
Mike Christie 9260695d65 tcmu: fix multiple uio open/close sequences
If the uio device is open and closed multiple times, the
kref count will be off due to tcmu_release getting called
multiple times for each close. This patch integrates
Wenji Tang's patch to add a kref_get on open that now
matches the kref_put done on tcmu_release and adds
a kref_put in tcmu_destroy_device to match the kref_get
done in succesful tcmu_configure_device calls.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Cc: Wenji Tang <tang.wenji@zte.com.cn>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:42 -07:00
Mike Christie 531283ff75 tcmu: drop configured check in destroy
destroy_device is only called if we have successfully run
configure_device, so drop the duplicate tcmu_dev_configured check.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:41 -07:00
Mike Christie be50f538e9 target: remove g_device_list
g_device_list is no longer needed because we now use the idr code
for lookups and seaches.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:41 -07:00
Mike Christie 6906d008b4 xcopy: loop over devices using idr helper
This converts the xcopy code to use the idr helper. The next patch
will drop the g_device_list and make g_device_mutex local to the
target_core_device.c file.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:40 -07:00
Mike Christie b1943fd454 target: add helper to iterate over devices
This adds a wrapper around idr_for_each so the xcopy code can loop over
the devices in the next patch.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:39 -07:00
Mike Christie b3af66e243 tcmu: perfom device add, del and reconfig synchronously
This makes the device add, del reconfig operations sync. It fixes
the issue where for add and reconfig, we do not know if userspace
successfully completely the operation, so we leave invalid kernel
structs or report incorrect status for the config/reconfig operations.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:39 -07:00
Mike Christie 85441e6b8c target: add helper to find se_device by dev_index
This adds a helper to find a se_device by dev_index. It will
be used in the next patches so tcmu's netlink interface can
execute commands on specific devices.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:38 -07:00
Mike Christie 0a5eee647b target: use idr for se_device dev index
In the next patches we will add tcmu netlink support that allows
userspace to send commands to target_core_user. To execute operations
on a se_device/tcmu_dev we need to be able to look up a dev by any old
id. This patch replaces the se_device->dev_index with a idr created
id.

The next patches will also remove the g_device_list and replace it with
the idr.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:37 -07:00
Mike Christie 926347061e target: break up free_device callback
With this patch free_device is now used to free what is allocated in the
alloc_device callback and destroy_device tears down the resources that are
setup in the configure_device callback.

This patch will be needed in the next patch where tcmu needs
to be able to look up the device in the destroy callback.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:37 -07:00
Mike Christie 2d76443e02 tcmu: reconfigure netlink attr changes
1. TCMU_ATTR_TYPE is too generic when it describes only the
reconfiguration type, so rename to TCMU_ATTR_RECONFIG_TYPE.

2. Only return the reconfig type when it is a
TCMU_CMD_RECONFIG_DEVICE command.

3. CONFIG_* type is not needed. We can pass the value along with an
ATTR to userspace, so it does not need to read sysfs/configfs.

4. Fix leak in tcmu_dev_path_store and rename to dev_config to
reflect it is more than just a path that can be changed.

6. Don't update kernel struct value if netlink sending fails.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: "Bryant G. Ly" <bryantly@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:36 -07:00
Colin Ian King 5821783bca tcmu: make array tcmu_attrib_attrs static const
The array tcmu_attrib_attrs does not need to be in global scope, so make
it static.

Cleans up sparse warning:
"symbol 'tcmu_attrib_attrs' was not declared. Should it be static?"

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:34 -07:00
Xiubo Li 07932a023a tcmu: Fix module removal due to stuck unmap_thread thread again
Because the unmap code just after the schdule() returned may take
a long time and if the kthread_stop() is fired just when in this
routine, the module removal maybe stuck too.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:33 -07:00
Jiang Yi 1d6ef27659 target: Fix COMPARE_AND_WRITE caw_sem leak during se_cmd quiesce
This patch addresses a COMPARE_AND_WRITE se_device->caw_sem leak,
that would be triggered during normal se_cmd shutdown or abort
via __transport_wait_for_tasks().

This would occur because target_complete_cmd() would catch this
early and do complete_all(&cmd->t_transport_stop_comp), but since
target_complete_ok_work() or target_complete_failure_work() are
never called to invoke se_cmd->transport_complete_callback(),
the COMPARE_AND_WRITE specific callbacks never release caw_sem.

To address this special case, go ahead and release caw_sem
directly from target_complete_cmd().

(Remove '&& success' from check, to release caw_sem regardless
 of scsi_status - nab)

Signed-off-by: Jiang Yi <jiangyilism@gmail.com>
Cc: <stable@vger.kernel.org> # 3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:32 -07:00
Bryant G. Ly 8a45885c15 tcmu: Add Type of reconfig into netlink
This patch adds more info about the attribute being changed,
so that usersapce can easily figure out what is happening.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-By: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:32 -07:00
Bryant G. Ly ee01825220 tcmu: Make dev_config configurable
This allows for userspace to change the device path after
it has been created. Thus giving the user the ability to change
the path. The use case for this is to allow for virtual optical
to have media change.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-By: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:31 -07:00
Bryant G. Ly 801fc54d5d tcmu: Make dev_size configurable via userspace
Allow tcmu backstores to be able to set the device size
after it has been configured via set attribute.

Part of support in userspace to support certain backstores
changing device size.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-By: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:30 -07:00
Bryant G. Ly 1068be7bd4 tcmu: Add netlink for device reconfiguration
This gives tcmu the ability to handle events that can cause
reconfiguration, such as resize, path changes, write_cache, etc...

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-By: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:30 -07:00
Bryant G. Ly 9a8bb60650 tcmu: Support emulate_write_cache
This will enable the toggling of write_cache in tcmu through targetcli-fb

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Reviewed-By: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:29 -07:00
Bart Van Assche 8fa4011e0d target/iscsi: Remove dead code from iscsit_process_scsi_cmd()
If an iSCSI command is rejected before iscsit_process_scsi_cmd()
is called, .reject_reason is set but iscsit_process_scsi_cmd() is
not called. This means that the "if (cmd->reject_reason) ..." code
in this function can be removed without changing the behavior of
this function.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:28 -07:00
Bart Van Assche d1c26857cd target/iscsi: Simplify iscsit_free_cmd()
Since .se_tfo is only set if a command has been submitted to
the LIO core, check .se_tfo instead of .iscsi_opcode. Since
__iscsit_free_cmd() only affects SCSI commands but not TMFs,
calling that function for TMFs does not change behavior. This
patch does not change the behavior of iscsit_free_cmd().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:27 -07:00
Bart Van Assche 4412a67131 target/iscsi: Remove second argument of __iscsit_free_cmd()
Initialize .data_direction to DMA_NONE in iscsit_allocate_cmd()
such that the second argument of __iscsit_free_cmd() can be left
out. Note: this patch causes the first part of __iscsit_free_cmd()
no longer to be skipped for TMFs. That's fine since no data
segments are associated with TMFs.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:26 -07:00
Bart Van Assche 4c1f0e6539 target/tcm_loop: Make TMF processing slightly faster
Target drivers must guarantee that struct se_cmd and struct se_tmr_req
exist as long as target_tmr_work() is in progress. This is why the
tcm_loop driver today passes 1 as second argument to
transport_generic_free_cmd() from inside the TMF code. Instead of
making the TMF code wait, make the TMF code obtain two references
(SCF_ACK_KREF) and drop one reference from inside the .check_stop_free()
callback.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:25 -07:00
Bart Van Assche 75f141aaf4 target/tcm_loop: Use target_submit_tmr() instead of open-coding this function
Use target_submit_tmr() instead of open-coding this function. The
only functional change is that TMFs are now added to sess_cmd_list,
something the current code does not do. This behavior change is a
bug fix because it makes LUN RESETs wait for other TMFs that are in
progress for the same LUN.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:25 -07:00
Bart Van Assche d17203c411 target/tcm_loop: Replace a waitqueue and a counter by a completion
This patch simplifies the implementation of the tcm_loop driver
but does not change its behavior.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:24 -07:00
Bart Van Assche 4d3895d5ea target/tcm_loop: Merge struct tcm_loop_cmd and struct tcm_loop_tmr
This patch simplifies the tcm_loop implementation but does not
change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 23:11:23 -07:00
Bart Van Assche c00e622023 target: Introduce a function that shows the command state
Introduce target_show_cmd() and use it where appropriate. If
transport_wait_for_tasks() takes too long, make it show the
state of the command it is waiting for.

(Add missing brackets around multi-line conditions - nab)

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 22:58:04 -07:00
Nicholas Bellinger 03db016a1b iscsi-target: Kill left-over iscsi_target_do_cleanup
With commit 25cdda95fd in place to address the initial login
PDU asynchronous socket close OOPs, go ahead and kill off the
left-over iscsi_target_do_cleanup() and ->login_cleanup_work.

Reported-by: Mike Christie <mchristi@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 22:58:04 -07:00
Bart Van Assche d877d7275b target: Fix a deadlock between the XCOPY code and iSCSI session shutdown
Move the code for parsing an XCOPY command from the context of
the iSCSI receiver thread to the context of the XCOPY workqueue.
Keep the simple XCOPY checks in the context of the iSCSI receiver
thread. Move the code for allocating and freeing struct xcopy_op
from the code that parses an XCOPY command to its caller.

This patch fixes the following deadlock:

======================================================
[ INFO: possible circular locking dependency detected ]
4.10.0-rc7-dbg+ #1 Not tainted
-------------------------------------------------------
rmdir/13321 is trying to acquire lock:
 (&sess->cmdsn_mutex){+.+.+.}, at: [<ffffffffa02cb47d>] iscsit_free_all_ooo_cmdsns+0x2d/0xb0 [iscsi_target_mod]

but task is already holding lock:
 (&sb->s_type->i_mutex_key#14){++++++}, at: [<ffffffff811c6e20>] vfs_rmdir+0x50/0x140

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:
-> #1 (&sb->s_type->i_mutex_key#14){++++++}:
 lock_acquire+0x71/0x90
 down_write+0x3f/0x70
 configfs_depend_item+0x3a/0xb0 [configfs]
 target_depend_item+0x13/0x20 [target_core_mod]
 target_xcopy_locate_se_dev_e4+0xdd/0x1a0 [target_core_mod]
 target_do_xcopy+0x34b/0x970 [target_core_mod]
 __target_execute_cmd+0x22/0xa0 [target_core_mod]
 target_execute_cmd+0x233/0x2c0 [target_core_mod]
 iscsit_execute_cmd+0x208/0x270 [iscsi_target_mod]
 iscsit_sequence_cmd+0x10b/0x190 [iscsi_target_mod]
 iscsit_get_rx_pdu+0x37d/0xcd0 [iscsi_target_mod]
 iscsi_target_rx_thread+0x6e/0xa0 [iscsi_target_mod]
 kthread+0x102/0x140
 ret_from_fork+0x31/0x40

-> #0 (&sess->cmdsn_mutex){+.+.+.}:
 __lock_acquire+0x10e6/0x1260
 lock_acquire+0x71/0x90
 mutex_lock_nested+0x5f/0x670
 iscsit_free_all_ooo_cmdsns+0x2d/0xb0 [iscsi_target_mod]
 iscsit_close_session+0xac/0x200 [iscsi_target_mod]
 lio_tpg_close_session+0x9f/0xb0 [iscsi_target_mod]
 target_shutdown_sessions+0xc3/0xd0 [target_core_mod]
 core_tpg_del_initiator_node_acl+0x91/0x140 [target_core_mod]
 target_fabric_nacl_base_release+0x20/0x30 [target_core_mod]
 config_item_release+0x5a/0xc0 [configfs]
 config_item_put+0x1d/0x1f [configfs]
 configfs_rmdir+0x1a6/0x300 [configfs]
 vfs_rmdir+0xb7/0x140
 do_rmdir+0x1f4/0x200
 SyS_rmdir+0x11/0x20
 entry_SYSCALL_64_fastpath+0x23/0xc6

other info that might help us debug this:

 Possible unsafe locking scenario:
       CPU0                    CPU1
       ----                    ----
  lock(&sb->s_type->i_mutex_key#14);
                               lock(&sess->cmdsn_mutex);
                               lock(&sb->s_type->i_mutex_key#14);
  lock(&sess->cmdsn_mutex);

 *** DEADLOCK ***

3 locks held by rmdir/13321:
 #0:  (sb_writers#10){.+.+.+}, at: [<ffffffff811e1aff>] mnt_want_write+0x1f/0x50
 #1:  (&default_group_class[depth - 1]#2/1){+.+.+.}, at: [<ffffffff811cc8ce>] do_rmdir+0x15e/0x200
 #2:  (&sb->s_type->i_mutex_key#14){++++++}, at: [<ffffffff811c6e20>] vfs_rmdir+0x50/0x140

stack backtrace:
CPU: 2 PID: 13321 Comm: rmdir Not tainted 4.10.0-rc7-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Call Trace:
 dump_stack+0x86/0xc3
 print_circular_bug+0x1c7/0x220
 __lock_acquire+0x10e6/0x1260
 lock_acquire+0x71/0x90
 mutex_lock_nested+0x5f/0x670
 iscsit_free_all_ooo_cmdsns+0x2d/0xb0 [iscsi_target_mod]
 iscsit_close_session+0xac/0x200 [iscsi_target_mod]
 lio_tpg_close_session+0x9f/0xb0 [iscsi_target_mod]
 target_shutdown_sessions+0xc3/0xd0 [target_core_mod]
 core_tpg_del_initiator_node_acl+0x91/0x140 [target_core_mod]
 target_fabric_nacl_base_release+0x20/0x30 [target_core_mod]
 config_item_release+0x5a/0xc0 [configfs]
 config_item_put+0x1d/0x1f [configfs]
 configfs_rmdir+0x1a6/0x300 [configfs]
 vfs_rmdir+0xb7/0x140
 do_rmdir+0x1f4/0x200
 SyS_rmdir+0x11/0x20
 entry_SYSCALL_64_fastpath+0x23/0xc6

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 22:58:00 -07:00
Bart Van Assche a85d667e58 target: Use {get,put}_unaligned_be*() instead of open coding these functions
Introduce the function get_unaligned_be24(). Use {get,put}_unaligned_be*()
where appropriate. This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 22:57:59 -07:00
Bart Van Assche f2b72d6a8e target: Fix transport_init_se_cmd()
Avoid that aborting a command before it has been submitted onto
a workqueue triggers the following warning:

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 3 PID: 46 Comm: kworker/u8:1 Not tainted 4.12.0-rc2-dbg+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
Workqueue: tmr-iblock target_tmr_work [target_core_mod]
Call Trace:
 dump_stack+0x86/0xcf
 register_lock_class+0xe8/0x570
 __lock_acquire+0xa1/0x11d0
 lock_acquire+0x59/0x80
 flush_work+0x42/0x2b0
 __cancel_work_timer+0x10c/0x180
 cancel_work_sync+0xb/0x10
 core_tmr_lun_reset+0x352/0x740 [target_core_mod]
 target_tmr_work+0xd6/0x130 [target_core_mod]
 process_one_work+0x1ca/0x3f0
 worker_thread+0x49/0x3b0
 kthread+0x109/0x140
 ret_from_fork+0x31/0x40

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 22:57:59 -07:00
Bart Van Assche 9f2f342892 target: Remove se_device.dev_list
The last user of se_device.dev_list was removed through commit
0fd97ccf45 ("target: kill struct se_subsystem_dev"). Hence
also remove se_device.dev_list.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 22:57:58 -07:00
Bart Van Assche 3e182db787 target: Use symbolic value for WRITE_VERIFY_16
Now that a symbolic value has been introduced for WRITE_VERIFY_16,
use it. This patch does not change any functionality.

References: commit c2d26f18dc ("target: Add WRITE_VERIFY_16")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 22:57:57 -07:00
Nicholas Bellinger 5465e7d3b9 target: Add TARGET_SCF_LOOKUP_LUN_FROM_TAG support for ABORT_TASK
This patch introduces support in target_submit_tmr() for locating a
unpacked_lun from an existing se_cmd->tag during ABORT_TASK.

When TARGET_SCF_LOOKUP_LUN_FROM_TAG is set, target_submit_tmr()
will do the extra lookup via target_lookup_lun_from_tag() and
subsequently invoke transport_lookup_tmr_lun() so a proper
percpu se_lun->lun_ref is taken before workqueue dispatch into
se_device->tmr_wq happens.

Aside from the extra target_lookup_lun_from_tag(), the existing
code-path remains unchanged.

Reviewed-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Quinn Tran <quinn.tran@cavium.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 22:57:56 -07:00
Nicholas Bellinger eeb64d239e target: Add support for TMR percpu reference counting
This patch introduces TMR percpu reference counting using
se_lun->lun_ref in transport_lookup_tmr_lun(), following
how existing non TMR per se_lun reference counting works
within transport_lookup_cmd_lun().

It also adds explicit transport_lun_remove_cmd() calls to
drop the reference in the three tmr related locations that
invoke transport_cmd_check_stop_to_fabric();

   - target_tmr_work() during normal ->queue_tm_rsp()
   - target_complete_tmr_failure() during error ->queue_tm_rsp()
   - transport_generic_handle_tmr() during early failure

Also, note the exception paths in transport_generic_free_cmd()
and transport_cmd_finish_abort() already check SCF_SE_LUN_CMD,
and will invoke transport_lun_remove_cmd() when necessary.

Reviewed-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Quinn Tran <quinn.tran@cavium.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 22:57:55 -07:00
Jiang Yi 12f66e4a0f target: reject COMPARE_AND_WRITE if emulate_caw is not set
In struct se_dev_attrib, there is a field emulate_caw exposed
as a /sys/kernel/config/target/core/$HBA/$DEV/attrib/.

If this field is set zero, it means the corresponding struct se_device
does not support the scsi cmd COMPARE_AND_WRITE

In function sbc_parse_cdb(), go ahead and reject scsi COMPARE_AND_WRITE
if emulate_caw is not set, because it has been explicitly disabled
from user-space.

(Make pr_err ratelimited - nab)

Signed-off-by: Jiang Yi <jiangyilism@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-07-06 22:57:54 -07:00
Linus Torvalds 89fbf5384d Merge branch 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull read/write updates from Al Viro:
 "Christoph's fs/read_write.c series - consolidation and cleanups"

* 'work.read_write' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  nfsd: remove nfsd_vfs_read
  nfsd: use vfs_iter_read/write
  fs: implement vfs_iter_write using do_iter_write
  fs: implement vfs_iter_read using do_iter_read
  fs: move more code into do_iter_read/do_iter_write
  fs: remove __do_readv_writev
  fs: remove do_compat_readv_writev
  fs: remove do_readv_writev
2017-07-05 14:35:57 -07:00
Linus Torvalds 5518b69b76 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Reasonably busy this cycle, but perhaps not as busy as in the 4.12
  merge window:

   1) Several optimizations for UDP processing under high load from
      Paolo Abeni.

   2) Support pacing internally in TCP when using the sch_fq packet
      scheduler for this is not practical. From Eric Dumazet.

   3) Support mutliple filter chains per qdisc, from Jiri Pirko.

   4) Move to 1ms TCP timestamp clock, from Eric Dumazet.

   5) Add batch dequeueing to vhost_net, from Jason Wang.

   6) Flesh out more completely SCTP checksum offload support, from
      Davide Caratti.

   7) More plumbing of extended netlink ACKs, from David Ahern, Pablo
      Neira Ayuso, and Matthias Schiffer.

   8) Add devlink support to nfp driver, from Simon Horman.

   9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa
      Prabhu.

  10) Add stack depth tracking to BPF verifier and use this information
      in the various eBPF JITs. From Alexei Starovoitov.

  11) Support XDP on qed device VFs, from Yuval Mintz.

  12) Introduce BPF PROG ID for better introspection of installed BPF
      programs. From Martin KaFai Lau.

  13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann.

  14) For loads, allow narrower accesses in bpf verifier checking, from
      Yonghong Song.

  15) Support MIPS in the BPF selftests and samples infrastructure, the
      MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David
      Daney.

  16) Support kernel based TLS, from Dave Watson and others.

  17) Remove completely DST garbage collection, from Wei Wang.

  18) Allow installing TCP MD5 rules using prefixes, from Ivan
      Delalande.

  19) Add XDP support to Intel i40e driver, from Björn Töpel

  20) Add support for TC flower offload in nfp driver, from Simon
      Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub
      Kicinski, and Bert van Leeuwen.

  21) IPSEC offloading support in mlx5, from Ilan Tayari.

  22) Add HW PTP support to macb driver, from Rafal Ozieblo.

  23) Networking refcount_t conversions, From Elena Reshetova.

  24) Add sock_ops support to BPF, from Lawrence Brako. This is useful
      for tuning the TCP sockopt settings of a group of applications,
      currently via CGROUPs"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits)
  net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
  dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
  cxgb4: Support for get_ts_info ethtool method
  cxgb4: Add PTP Hardware Clock (PHC) support
  cxgb4: time stamping interface for PTP
  nfp: default to chained metadata prepend format
  nfp: remove legacy MAC address lookup
  nfp: improve order of interfaces in breakout mode
  net: macb: remove extraneous return when MACB_EXT_DESC is defined
  bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
  bpf: fix return in load_bpf_file
  mpls: fix rtm policy in mpls_getroute
  net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t
  net, ax25: convert ax25_route.refcount from atomic_t to refcount_t
  net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t
  net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t
  net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t
  ...
2017-07-05 12:31:59 -07:00
Dmitry Monakhov 128b6f9fdd t10-pi: Move opencoded contants to common header
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-07-03 16:56:25 -06:00
Linus Torvalds c6b1e36c8f Merge branch 'for-4.13/block' of git://git.kernel.dk/linux-block
Pull core block/IO updates from Jens Axboe:
 "This is the main pull request for the block layer for 4.13. Not a huge
  round in terms of features, but there's a lot of churn related to some
  core cleanups.

  Note this depends on the UUID tree pull request, that Christoph
  already sent out.

  This pull request contains:

   - A series from Christoph, unifying the error/stats codes in the
     block layer. We now use blk_status_t everywhere, instead of using
     different schemes for different places.

   - Also from Christoph, some cleanups around request allocation and IO
     scheduler interactions in blk-mq.

   - And yet another series from Christoph, cleaning up how we handle
     and do bounce buffering in the block layer.

   - A blk-mq debugfs series from Bart, further improving on the support
     we have for exporting internal information to aid debugging IO
     hangs or stalls.

   - Also from Bart, a series that cleans up the request initialization
     differences across types of devices.

   - A series from Goldwyn Rodrigues, allowing the block layer to return
     failure if we will block and the user asked for non-blocking.

   - Patch from Hannes for supporting setting loop devices block size to
     that of the underlying device.

   - Two series of patches from Javier, fixing various issues with
     lightnvm, particular around pblk.

   - A series from me, adding support for write hints. This comes with
     NVMe support as well, so applications can help guide data placement
     on flash to improve performance, latencies, and write
     amplification.

   - A series from Ming, improving and hardening blk-mq support for
     stopping/starting and quiescing hardware queues.

   - Two pull requests for NVMe updates. Nothing major on the feature
     side, but lots of cleanups and bug fixes. From the usual crew.

   - A series from Neil Brown, greatly improving the bio rescue set
     support. Most notably, this kills the bio rescue work queues, if we
     don't really need them.

   - Lots of other little bug fixes that are all over the place"

* 'for-4.13/block' of git://git.kernel.dk/linux-block: (217 commits)
  lightnvm: pblk: set line bitmap check under debug
  lightnvm: pblk: verify that cache read is still valid
  lightnvm: pblk: add initialization check
  lightnvm: pblk: remove target using async. I/Os
  lightnvm: pblk: use vmalloc for GC data buffer
  lightnvm: pblk: use right metadata buffer for recovery
  lightnvm: pblk: schedule if data is not ready
  lightnvm: pblk: remove unused return variable
  lightnvm: pblk: fix double-free on pblk init
  lightnvm: pblk: fix bad le64 assignations
  nvme: Makefile: remove dead build rule
  blk-mq: map all HWQ also in hyperthreaded system
  nvmet-rdma: register ib_client to not deadlock in device removal
  nvme_fc: fix error recovery on link down.
  nvmet_fc: fix crashes on bad opcodes
  nvme_fc: Fix crash when nvme controller connection fails.
  nvme_fc: replace ioabort msleep loop with completion
  nvme_fc: fix double calls to nvme_cleanup_cmd()
  nvme-fabrics: verify that a controller returns the correct NQN
  nvme: simplify nvme_dev_attrs_are_visible
  ...
2017-07-03 10:34:51 -07:00
David S. Miller b079115937 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
A set of overlapping changes in macvlan and the rocker
driver, nothing serious.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-30 12:43:08 -04:00
Christoph Hellwig abbb65899a fs: implement vfs_iter_write using do_iter_write
De-dupliate some code and allow for passing the flags argument to
vfs_iter_write.  Additionally it now properly updates timestamps.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-29 17:49:23 -04:00
Christoph Hellwig 18e9710ee5 fs: implement vfs_iter_read using do_iter_read
De-dupliate some code and allow for passing the flags argument to
vfs_iter_read.  Additional it properly updates atime now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2017-06-29 17:49:23 -04:00
Bart Van Assche ca18d6f769 block: Make most scsi_req_init() calls implicit
Instead of explicitly calling scsi_req_init() after blk_get_request(),
call that function from inside blk_get_request(). Add an
.initialize_rq_fn() callback function to the block drivers that need
it. Merge the IDE .init_rq_fn() function into .initialize_rq_fn()
because it is too small to keep it as a separate function. Keep the
scsi_req_init() call in ide_prep_sense() because it follows a
blk_rq_init() call.

References: commit 82ed4db499 ("block: split scsi_request out of struct request")
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-20 19:27:14 -06:00
yuan linyu de77b966ce net: introduce __skb_put_[zero, data, u8]
follow Johannes Berg, semantic patch file as below,
@@
identifier p, p2;
expression len;
expression skb;
type t, t2;
@@
(
-p = __skb_put(skb, len);
+p = __skb_put_zero(skb, len);
|
-p = (t)__skb_put(skb, len);
+p = __skb_put_zero(skb, len);
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, len);
|
-memset(p, 0, len);
)

@@
identifier p;
expression len;
expression skb;
type t;
@@
(
-t p = __skb_put(skb, len);
+t p = __skb_put_zero(skb, len);
)
... when != p
(
-memset(p, 0, len);
)

@@
type t, t2;
identifier p, p2;
expression skb;
@@
t *p;
...
(
-p = __skb_put(skb, sizeof(t));
+p = __skb_put_zero(skb, sizeof(t));
|
-p = (t *)__skb_put(skb, sizeof(t));
+p = __skb_put_zero(skb, sizeof(t));
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, sizeof(*p));
|
-memset(p, 0, sizeof(*p));
)

@@
expression skb, len;
@@
-memset(__skb_put(skb, len), 0, len);
+__skb_put_zero(skb, len);

@@
expression skb, len, data;
@@
-memcpy(__skb_put(skb, len), data, len);
+__skb_put_data(skb, data, len);

@@
expression SKB, C, S;
typedef u8;
identifier fn = {__skb_put};
fresh identifier fn2 = fn ## "_u8";
@@
- *(u8 *)fn(SKB, S) = C;
+ fn2(SKB, C);

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-20 13:30:14 -04:00
Jason A. Donenfeld 6787ab81b2 iscsi: ensure RNG is seeded before use
It's not safe to use weak random data here, especially for the challenge
response randomness. Since we're always in process context, it's safe to
simply wait until we have enough randomness to carry out the
authentication correctly.

While we're at it, we clean up a small memleak during an error
condition.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Chris Leech <cleech@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2017-06-19 22:06:28 -04:00
NeilBrown 011067b056 blk: replace bioset_create_nobvec() with a flags arg to bioset_create()
"flags" arguments are often seen as good API design as they allow
easy extensibility.
bioset_create_nobvec() is implemented internally as a variation in
flags passed to __bioset_create().

To support future extension, make the internal structure part of the
API.
i.e. add a 'flags' argument to bioset_create() and discard
bioset_create_nobvec().

Note that the bio_split allocations in drivers/md/raid* do not need
the bvec mempool - they should have used bioset_create_nobvec().

Suggested-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-18 12:40:59 -06:00
Johannes Berg d58ff35122 networking: make skb_push & __skb_push return void pointers
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions return void * and remove all the casts across
the tree, adding a (u8 *) cast only where the unsigned char pointer
was used directly, all done with the following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

    @@
    expression SKB, LEN;
    identifier fn = { skb_push, __skb_push, skb_push_rcsum };
    @@
    - fn(SKB, LEN)[0]
    + *(u8 *)fn(SKB, LEN)

Note that the last part there converts from push(...)[0] to the
more idiomatic *(u8 *)push(...).

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 11:48:40 -04:00
Johannes Berg 4df864c1d9 networking: make skb_put & friends return void pointers
It seems like a historic accident that these return unsigned char *,
and in many places that means casts are required, more often than not.

Make these functions (skb_put, __skb_put and pskb_put) return void *
and remove all the casts across the tree, adding a (u8 *) cast only
where the unsigned char pointer was used directly, all done with the
following spatch:

    @@
    expression SKB, LEN;
    typedef u8;
    identifier fn = { skb_put, __skb_put };
    @@
    - *(fn(SKB, LEN))
    + *(u8 *)fn(SKB, LEN)

    @@
    expression E, SKB, LEN;
    identifier fn = { skb_put, __skb_put };
    type T;
    @@
    - E = ((T *)(fn(SKB, LEN)))
    + E = fn(SKB, LEN)

which actually doesn't cover pskb_put since there are only three
users overall.

A handful of stragglers were converted manually, notably a macro in
drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
instances in net/bluetooth/hci_sock.c. In the former file, I also
had to fix one whitespace problem spatch introduced.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-16 11:48:39 -04:00
Jens Axboe 8f66439eec Linux 4.12-rc5
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJZPdbLAAoJEHm+PkMAQRiGx4wH/1nCjfnl6fE8oJ24/1gEAOUh
 biFdqJkYZmlLYHVtYfLm4Ueg4adJdg0wx6qM/4RaAzmQVvLfDV34bc1qBf1+P95G
 kVF+osWyXrZo5cTwkwapHW/KNu4VJwAx2D1wrlxKDVG5AOrULH1pYOYGOpApEkZU
 4N+q5+M0ce0GJpqtUZX+UnI33ygjdDbBxXoFKsr24B7eA0ouGbAJ7dC88WcaETL+
 2/7tT01SvDMo0jBSV0WIqlgXwZ5gp3yPGnklC3F4159Yze6VFrzHMKS/UpPF8o8E
 W9EbuzwxsKyXUifX2GY348L1f+47glen/1sedbuKnFhP6E9aqUQQJXvEO7ueQl4=
 =m2Gx
 -----END PGP SIGNATURE-----

Merge tag 'v4.12-rc5' into for-4.13/block

We've already got a few conflicts and upcoming work depends on some of the
changes that have gone into mainline as regression fixes for this series.

Pull in 4.12-rc5 to resolve these conflicts and make it easier on down stream
trees to continue working on 4.13 changes.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2017-06-12 08:30:13 -06:00
Christoph Hellwig 4e4cbee93d block: switch bios to blk_status_t
Replace bi_error with a new bi_status to allow for a clear conversion.
Note that device mapper overloaded bi_error with a private value, which
we'll have to keep arround at least for now and thus propagate to a
proper blk_status_t value.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Christoph Hellwig 2a842acab1 block: introduce new block status code type
Currently we use nornal Linux errno values in the block layer, and while
we accept any error a few have overloaded magic meanings.  This patch
instead introduces a new  blk_status_t value that holds block layer specific
status codes and explicitly explains their meaning.  Helpers to convert from
and to the previous special meanings are provided for now, but I suspect
we want to get rid of them in the long run - those drivers that have a
errno input (e.g. networking) usually get errnos that don't know about
the special block layer overloads, and similarly returning them to userspace
will usually return somethings that strictly speaking isn't correct
for file system operations, but that's left as an exercise for later.

For now the set of errors is a very limited set that closely corresponds
to the previous overloaded errno values, but there is some low hanging
fruite to improve it.

blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
typechecking, so that we can easily catch places passing the wrong values.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-06-09 09:27:32 -06:00
Nicholas Bellinger eceb4459df iscsi-target: Avoid holding ->tpg_state_lock during param update
As originally reported by Jia-Ju, iscsit_tpg_enable_portal_group()
holds iscsi_portal_group->tpg_state_lock while updating AUTHMETHOD
via iscsi_update_param_value(), which performs a GFP_KERNEL
allocation.

However, since iscsit_tpg_enable_portal_group() is already protected
by iscsit_get_tpg() -> iscsi_portal_group->tpg_access_lock in it's
parent caller, ->tpg_state_lock only needs to be held when setting
TPG_STATE_ACTIVE.

Reported-by: Jia-Ju Bai <baijiaju1990@163.com>
Reviewed-by: Jia-Ju Bai <baijiaju1990@163.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-06-08 23:26:38 -07:00
Nicholas Bellinger 9ae0e9ade5 target/configfs: Kill se_lun->lun_link_magic
Instead of using a hardcoded magic value in se_lun when verifying
a target config_item symlink source during target_fabric_mappedlun_link(),
go ahead and use target_fabric_port_item_ops directly instead.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-06-08 23:26:38 -07:00
Nicholas Bellinger c17cd24959 target/configfs: Kill se_device->dev_link_magic
Instead of using a hardcoded magic value in se_device when verifying
a target config_item symlink source during target_fabric_port_link(),
go ahead and use target_core_dev_item_ops directly instead.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-06-08 23:26:38 -07:00
Nicholas Bellinger 2237498f0b target/iblock: Convert WRITE_SAME to blkdev_issue_zeroout
The people who are actively using iblock_execute_write_same_direct() are
doing so in the context of ESX VAAI BlockZero, together with
EXTENDED_COPY and COMPARE_AND_WRITE primitives.

In practice though I've not seen any users of IBLOCK WRITE_SAME for
anything other than VAAI BlockZero, so just using blkdev_issue_zeroout()
when available, and falling back to iblock_execute_write_same() if the
WRITE_SAME buffer contains anything other than zeros should be OK.

(Hook up max_write_zeroes_sectors to signal LBPRZ feature bit in
 target_configure_unmap_from_queue - nab)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-06-08 23:26:37 -07:00
Gustavo A. R. Silva fb418240ec target: remove dead code
Local variable _ret_ is assigned to a constant value and it is never
updated again. Remove this variable and the dead code it guards.

Addresses-Coverity-ID: 140761
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Reviewed-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-06-08 23:26:37 -07:00
Nicholas Bellinger abb85a9b51 iscsi-target: Reject immediate data underflow larger than SCSI transfer length
When iscsi WRITE underflow occurs there are two different scenarios
that can happen.

Normally in practice, when an EDTL vs. SCSI CDB TRANSFER LENGTH
underflow is detected, the iscsi immediate data payload is the
smaller SCSI CDB TRANSFER LENGTH.

That is, when a host fabric LLD is using a fixed size EDTL for
a specific control CDB, the SCSI CDB TRANSFER LENGTH and actual
SCSI payload ends up being smaller than EDTL.  In iscsi, this
means the received iscsi immediate data payload matches the
smaller SCSI CDB TRANSFER LENGTH, because there is no more
SCSI payload to accept beyond SCSI CDB TRANSFER LENGTH.

However, it's possible for a malicous host to send a WRITE
underflow where EDTL is larger than SCSI CDB TRANSFER LENGTH,
but incoming iscsi immediate data actually matches EDTL.

In the wild, we've never had a iscsi host environment actually
try to do this.

For this special case, it's wrong to truncate part of the
control CDB payload and continue to process the command during
underflow when immediate data payload received was larger than
SCSI CDB TRANSFER LENGTH, so go ahead and reject and drop the
bogus payload as a defensive action.

Note this potential bug was originally relaxed by the following
for allowing WRITE underflow in MSFT FCP host environments:

   commit c72c525022
   Author: Roland Dreier <roland@purestorage.com>
   Date:   Wed Jul 22 15:08:18 2015 -0700

      target: allow underflow/overflow for PR OUT etc. commands

Cc: Roland Dreier <roland@purestorage.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <stable@vger.kernel.org> # v4.3+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-06-08 22:25:29 -07:00
Nicholas Bellinger 105fa2f44e iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP
This patch fixes a BUG() in iscsit_close_session() that could be
triggered when iscsit_logout_post_handler() execution from within
tx thread context was not run for more than SECONDS_FOR_LOGOUT_COMP
(15 seconds), and the TCP connection didn't already close before
then forcing tx thread context to automatically exit.

This would manifest itself during explicit logout as:

[33206.974254] 1 connection(s) still exist for iSCSI session to iqn.1993-08.org.debian:01:3f5523242179
[33206.980184] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 2100.772 msecs
[33209.078643] ------------[ cut here ]------------
[33209.078646] kernel BUG at drivers/target/iscsi/iscsi_target.c:4346!

Normally when explicit logout attempt fails, the tx thread context
exits and iscsit_close_connection() from rx thread context does the
extra cleanup once it detects conn->conn_logout_remove has not been
cleared by the logout type specific post handlers.

To address this special case, if the logout post handler in tx thread
context detects conn->tx_thread_active has already been cleared, simply
return and exit in order for existing iscsit_close_connection()
logic from rx thread context do failed logout cleanup.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.14+
Tested-by: Gary Guo <ghg@datera.io>
Tested-by: Chu Yuan Lin <cyl@datera.io>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-06-08 22:25:14 -07:00
Nicholas Bellinger 73d4e580cc target: Fix kref->refcount underflow in transport_cmd_finish_abort
This patch fixes a se_cmd->cmd_kref underflow during CMD_T_ABORTED
when a fabric driver drops it's second reference from below the
target_core_tmr.c based callers of transport_cmd_finish_abort().

Recently with the conversion of kref to refcount_t, this bug was
manifesting itself as:

[705519.601034] refcount_t: underflow; use-after-free.
[705519.604034] INFO: NMI handler (kgdb_nmi_handler) took too long to run: 20116.512 msecs
[705539.719111] ------------[ cut here ]------------
[705539.719117] WARNING: CPU: 3 PID: 26510 at lib/refcount.c:184 refcount_sub_and_test+0x33/0x51

Since the original kref atomic_t based kref_put() didn't check for
underflow and only invoked the final callback when zero was reached,
this bug did not manifest in practice since all se_cmd memory is
using preallocated tags.

To address this, go ahead and propigate the existing return from
transport_put_cmd() up via transport_cmd_finish_abort(), and
change transport_cmd_finish_abort() + core_tmr_handle_tas_abort()
callers to only do their local target_put_sess_cmd() if necessary.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Tested-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Mike Christie <mchristi@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Himanshu Madhani <himanshu.madhani@qlogic.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.14+
Tested-by: Gary Guo <ghg@datera.io>
Tested-by: Chu Yuan Lin <cyl@datera.io>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-06-08 22:24:18 -07:00
Ganesh Goudar 1dec4cec9f cxgb4: Fix tids count for ipv6 offload connection
the adapter consumes two tids for every ipv6 offload
connection be it active or passive, calculate tid usage
count accordingly.

Also change the signatures of relevant functions to get
the address family.

Signed-off-by: Rizwan Ansari <rizwana@chelsio.com>
Signed-off-by: Varun Prakash <varun@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-07 15:14:34 -04:00
Jiang Yi 5e0cf5e6c4 iscsi-target: Always wait for kthread_should_stop() before kthread exit
There are three timing problems in the kthread usages of iscsi_target_mod:

 - np_thread of struct iscsi_np
 - rx_thread and tx_thread of struct iscsi_conn

In iscsit_close_connection(), it calls

 send_sig(SIGINT, conn->tx_thread, 1);
 kthread_stop(conn->tx_thread);

In conn->tx_thread, which is iscsi_target_tx_thread(), when it receive
SIGINT the kthread will exit without checking the return value of
kthread_should_stop().

So if iscsi_target_tx_thread() exit right between send_sig(SIGINT...)
and kthread_stop(...), the kthread_stop() will try to stop an already
stopped kthread.

This is invalid according to the documentation of kthread_stop().

(Fix -ECONNRESET logout handling in iscsi_target_tx_thread and
 early iscsi_target_rx_thread failure case - nab)

Signed-off-by: Jiang Yi <jiangyilism@gmail.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-31 15:12:57 -07:00
Nicholas Bellinger 25cdda95fd iscsi-target: Fix initial login PDU asynchronous socket close OOPs
This patch fixes a OOPs originally introduced by:

   commit bb048357da
   Author: Nicholas Bellinger <nab@linux-iscsi.org>
   Date:   Thu Sep 5 14:54:04 2013 -0700

   iscsi-target: Add sk->sk_state_change to cleanup after TCP failure

which would trigger a NULL pointer dereference when a TCP connection
was closed asynchronously via iscsi_target_sk_state_change(), but only
when the initial PDU processing in iscsi_target_do_login() from iscsi_np
process context was blocked waiting for backend I/O to complete.

To address this issue, this patch makes the following changes.

First, it introduces some common helper functions used for checking
socket closing state, checking login_flags, and atomically checking
socket closing state + setting login_flags.

Second, it introduces a LOGIN_FLAGS_INITIAL_PDU bit to know when a TCP
connection has dropped via iscsi_target_sk_state_change(), but the
initial PDU processing within iscsi_target_do_login() in iscsi_np
context is still running.  For this case, it sets LOGIN_FLAGS_CLOSED,
but doesn't invoke schedule_delayed_work().

The original NULL pointer dereference case reported by MNC is now handled
by iscsi_target_do_login() doing a iscsi_target_sk_check_close() before
transitioning to FFP to determine when the socket has already closed,
or iscsi_target_start_negotiation() if the login needs to exchange
more PDUs (eg: iscsi_target_do_login returned 0) but the socket has
closed.  For both of these cases, the cleanup up of remaining connection
resources will occur in iscsi_target_start_negotiation() from iscsi_np
process context once the failure is detected.

Finally, to handle to case where iscsi_target_sk_state_change() is
called after the initial PDU procesing is complete, it now invokes
conn->login_work -> iscsi_target_do_login_rx() to perform cleanup once
existing iscsi_target_sk_check_close() checks detect connection failure.
For this case, the cleanup of remaining connection resources will occur
in iscsi_target_do_login_rx() from delayed workqueue process context
once the failure is detected.

Reported-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Mike Christie <mchristi@redhat.com>
Cc: Mike Christie <mchristi@redhat.com>
Reported-by: Hannes Reinecke <hare@suse.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Varun Prakash <varun@chelsio.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-31 15:12:31 -07:00
Mike Christie f3cdbe39b2 tcmu: fix crash during device removal
We currently do

tcmu_free_device ->tcmu_netlink_event(TCMU_CMD_REMOVED_DEVICE) ->
uio_unregister_device -> kfree(tcmu_dev).

The problem is that the kernel does not wait for userspace to
do the close() on the uio device before freeing the tcmu_dev.
We can then hit a race where the kernel frees the tcmu_dev before
userspace does close() and so when close() -> release -> tcmu_release
is done, we try to access a freed tcmu_dev.

This patch made over the target-pending master branch moves the freeing
of the tcmu_dev to when the last reference has been dropped.

This also fixes a leak where if tcmu_configure_device was not called on a
device we did not free udev->name which was allocated at tcmu_alloc_device time.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-23 19:50:49 -07:00
Nicholas Bellinger 4ff83daa02 target: Re-add check to reject control WRITEs with overflow data
During v4.3 when the overflow/underflow check was relaxed by
commit c72c525022:

  commit c72c525022
  Author: Roland Dreier <roland@purestorage.com>
  Date:   Wed Jul 22 15:08:18 2015 -0700

       target: allow underflow/overflow for PR OUT etc. commands

to allow underflow/overflow for Windows compliance + FCP, a
consequence was to allow control CDBs to process overflow
data for iscsi-target with immediate data as well.

As per Roland's original change, continue to allow underflow
cases for control CDBs to make Windows compliance + FCP happy,
but until overflow for control CDBs is supported tree-wide,
explicitly reject all control WRITEs with overflow following
pre v4.3.y logic.

Reported-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Roland Dreier <roland@purestorage.com>
Cc: <stable@vger.kernel.org> # v4.3+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-15 20:20:29 -07:00
Linus Torvalds bd1286f964 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Things were a lot more calm than previously expected. It's primarily
  fixes in various areas, with most of the new functionality centering
  around TCMU backend driver work that Xiubo Li has been driving.

  Here's the summary on the feature side:

   - Make T10-PI verify configurable for emulated (FILEIO + RD) backends
    (Dmitry Monakhov)
   - Allow target-core/TCMU pass-through to use in-kernel SPC-PR logic
    (Bryant Ly + MNC)
   - Add TCMU support for growing ring buffer size (Xiubo Li + MNC)
   - Add TCMU support for global block data pool (Xiubo Li + MNC)

  and on the bug-fix side:

   - Fix COMPARE_AND_WRITE non GOOD status handling for READ phase
    failures (Gary Guo + nab)
   - Fix iscsi-target hang with explicitly changing per NodeACL
    CmdSN number depth with concurrent login driven session
    reinstatement.  (Gary Guo + nab)
   - Fix ibmvscsis fabric driver ABORT task handling (Bryant Ly)
   - Fix target-core/FILEIO zero length handling (Bart Van Assche)

  Also, there was an OOPs introduced with the WRITE_VERIFY changes that
  I ended up reverting at the last minute, because as not unusual Bart
  and I could not agree on the fix in time for -rc1. Since it's specific
  to a conformance test, it's been reverted for now.

  There is a separate patch in the queue to address the underlying
  control CDB write overflow regression in >= v4.3 separate from the
  WRITE_VERIFY revert here, that will be pushed post -rc1"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (30 commits)
  Revert "target: Fix VERIFY and WRITE VERIFY command parsing"
  IB/srpt: Avoid that aborting a command triggers a kernel warning
  IB/srpt: Fix abort handling
  target/fileio: Fix zero-length READ and WRITE handling
  ibmvscsis: Do not send aborted task response
  tcmu: fix module removal due to stuck thread
  target: Don't force session reset if queue_depth does not change
  iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
  target: Fix compare_and_write_callback handling for non GOOD status
  tcmu: Recalculate the tcmu_cmd size to save cmd area memories
  tcmu: Add global data block pool support
  tcmu: Add dynamic growing data area feature support
  target: fixup error message in target_tg_pt_gp_tg_pt_gp_id_store()
  target: fixup error message in target_tg_pt_gp_alua_access_type_store()
  target/user: PGR Support
  target: Add WRITE_VERIFY_16
  Documentation/target: add an example script to configure an iSCSI target
  target: Use kmalloc_array() in transport_kmap_data_sg()
  target: Use kmalloc_array() in compare_and_write_callback()
  target: Improve size determinations in two functions
  ...
2017-05-12 11:44:13 -07:00
Nicholas Bellinger 984a9d4c40 Revert "target: Fix VERIFY and WRITE VERIFY command parsing"
This reverts commit 0e2eb7d12e

  Author: Bart Van Assche <bart.vanassche@sandisk.com>
  Date:   Thu Mar 30 10:12:39 2017 -0700

      target: Fix VERIFY and WRITE VERIFY command parsing

This patch broke existing behaviour for WRITE_VERIFY because
it dropped the original SCF_SCSI_DATA_CDB assignment for
bytchk = 0 so target_cmd_size_check() no longer rejected
this case, allowing an overflow case to trigger an OOPs
in iscsi-target.

Since the short term and long term fixes are still being
discussed, revert it for now since it's late in the merge
window and try again in v4.13-rc1.

Conflicts:
	drivers/target/target_core_sbc.c

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-11 01:01:05 -07:00
Bart Van Assche 59ac9c0781 target/fileio: Fix zero-length READ and WRITE handling
This patch fixes zero-length READ and WRITE handling in target/FILEIO,
which was broken a long time back by:

Since:

  commit d81cb44726
  Author: Paolo Bonzini <pbonzini@redhat.com>
  Date:   Mon Sep 17 16:36:11 2012 -0700

      target: go through normal processing for all zero-length commands

which moved zero-length READ and WRITE completion out of target-core,
to doing submission into backend driver code.

To address this, go ahead and invoke target_complete_cmd() for any
non negative return value in fd_do_rw().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: David Disseldorp <ddiss@suse.de>
Cc: <stable@vger.kernel.org> # v3.7+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-07 16:05:16 -07:00
Mike Christie d906d8af28 tcmu: fix module removal due to stuck thread
We need to do a kthread_should_stop to check when kthread_stop has been
called.

This was a regression added in

b6df4b79a5
tcmu: Add global data block pool support

so not sure if you wanted to merge it in with that patch or what.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-04 20:01:41 -07:00
Nicholas Bellinger 46861cdd80 target: Don't force session reset if queue_depth does not change
Keeping in the idempotent nature of target_core_fabric_configfs.c,
if a queue_depth value is set and it's the same as the existing
value, don't attempt to force session reinstatement.

Reported-by: Raghu Krishnamurthy <rk@datera.io>
Cc: Raghu Krishnamurthy <rk@datera.io>
Tested-by: Gary Guo <ghg@datera.io>
Cc: Gary Guo <ghg@datera.io>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-04 20:01:40 -07:00
Nicholas Bellinger 197b806ae5 iscsi-target: Set session_fall_back_to_erl0 when forcing reinstatement
While testing modification of per se_node_acl queue_depth forcing
session reinstatement via lio_target_nacl_cmdsn_depth_store() ->
core_tpg_set_initiator_node_queue_depth(), a hung task bug triggered
when changing cmdsn_depth invoked session reinstatement while an iscsi
login was already waiting for session reinstatement to complete.

This can happen when an outstanding se_cmd descriptor is taking a
long time to complete, and session reinstatement from iscsi login
or cmdsn_depth change occurs concurrently.

To address this bug, explicitly set session_fall_back_to_erl0 = 1
when forcing session reinstatement, so session reinstatement is
not attempted if an active session is already being shutdown.

This patch has been tested with two scenarios.  The first when
iscsi login is blocked waiting for iscsi session reinstatement
to complete followed by queue_depth change via configfs, and
second when queue_depth change via configfs us blocked followed
by a iscsi login driven session reinstatement.

Note this patch depends on commit d36ad77f70 to handle multiple
sessions per se_node_acl when changing cmdsn_depth, and for
pre v4.5 kernels will need to be included for stable as well.

Reported-by: Gary Guo <ghg@datera.io>
Tested-by: Gary Guo <ghg@datera.io>
Cc: Gary Guo <ghg@datera.io>
Cc: <stable@vger.kernel.org> # v4.1+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-04 20:01:39 -07:00
Nicholas Bellinger a71a5dc7f8 target: Fix compare_and_write_callback handling for non GOOD status
Following the bugfix for handling non SAM_STAT_GOOD COMPARE_AND_WRITE
status during COMMIT phase in commit 9b2792c3da, the same bug exists
for the READ phase as well.

This would manifest first as a lost SCSI response, and eventual
hung task during fabric driver logout or re-login, as existing
shutdown logic waited for the COMPARE_AND_WRITE se_cmd->cmd_kref
to reach zero.

To address this bug, compare_and_write_callback() has been changed
to set post_ret = 1 and return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE
as necessary to signal failure status.

Reported-by: Bill Borsari <wgb@datera.io>
Cc: Bill Borsari <wgb@datera.io>
Tested-by: Gary Guo <ghg@datera.io>
Cc: Gary Guo <ghg@datera.io>
Cc: <stable@vger.kernel.org> # v4.1+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-04 20:01:39 -07:00
Xiubo Li fe25cc3479 tcmu: Recalculate the tcmu_cmd size to save cmd area memories
For the "struct tcmu_cmd_entry" in cmd area, the minimum size
will be sizeof(struct tcmu_cmd_entry) == 112 Bytes. And it could
fill about (sizeof(struct rsp) - sizeof(struct req)) /
sizeof(struct iovec) == 68 / 16 ~= 4 data regions(iov[4]) by
default.

For most tcmu_cmds, the data block indexes allocated from the
data area will be continuous. And for the continuous blocks they
will be merged into the same region using only one iovec. For
the current code, it will always allocates the same number of
iovecs with blocks for each tcmu_cmd, and it will wastes much
memories.

For example, when the block size is 4K and the DATA_OUT buffer
size is 64K, and the regions needed is less than 5(on my
environment is almost 99.7%). The current code will allocate
about 16 iovecs, and there will be (16 - 4) * sizeof(struct
iovec) = 192 Bytes cmd area memories wasted.

Here adds two helpers to calculate the base size and full size
of the tcmu_cmd. And will recalculate them again when it make sure
how many iovs is needed before insert it to cmd area.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-02 20:39:38 -07:00
Xiubo Li b6df4b79a5 tcmu: Add global data block pool support
For each target there will be one ring, when the target number
grows larger and larger, it could eventually runs out of the
system memories.

In this patch for each target ring, currently for the cmd area
the size will be fixed to 8MB and for the data area the size
will grow from 0 to max 256K * PAGE_SIZE(1G for 4K page size).

For all the targets' data areas, they will get empty blocks
from the "global data block pool", which has limited to 512K *
PAGE_SIZE(2G for 4K page size) for now.

When the "global data block pool" has been used up, then any
target could wake up the unmap thread routine to shrink other
targets' data area memories. And the unmap thread routine will
always try to truncate the ring vma from the last using block
offset.

When user space has touched the data blocks out of tcmu_cmd
iov[], the tcmu_page_fault() will try to return one zeroed blocks.

Here we move the timeout's tcmu_handle_completions() into unmap
thread routine, that's to say when the timeout fired, it will
only do the tcmu_check_expired_cmd() and then wake up the unmap
thread to do the completions() and then try to shrink its idle
memories. Then the cmdr_lock could be a mutex and could simplify
this patch because the unmap_mapping_range() or zap_* may go to
sleep.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Jianfei Hu <hujianfei@cmss.chinamobile.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:58:04 -07:00
Xiubo Li 141685a391 tcmu: Add dynamic growing data area feature support
Currently for the TCMU, the ring buffer size is fixed to 64K cmd
area + 1M data area, and this will be bottlenecks for high iops.

The struct tcmu_cmd_entry {} size is fixed about 112 bytes with
iovec[N] & N <= 4, and the size of struct iovec is about 16 bytes.

If N == 0, the ratio will be sizeof(cmd entry) : sizeof(datas) ==
112Bytes : (N * 4096)Bytes = 28 : 0, no data area is need.

If 0 < N <=4, the ratio will be sizeof(cmd entry) : sizeof(datas)
== 112Bytes : (N * 4096)Bytes = 28 : (N * 1024), so the max will
be 28 : 1024.

If N > 4, the sizeof(cmd entry) will be [(N - 4) *16 + 112] bytes,
and its corresponding data size will be [N * 4096], so the ratio
of sizeof(cmd entry) : sizeof(datas) == [(N - 4) * 16 + 112)Bytes
: (N * 4096)Bytes == 4/1024 - 12/(N * 1024), so the max is about
4 : 1024.

When N is bigger, the ratio will be smaller.

As the initial patch, we will set the cmd area size to 2M, and
the cmd area size to 32M. The TCMU will dynamically grows the data
area from 0 to max 32M size as needed.

The cmd area memory will be allocated through vmalloc(), and the
data area's blocks will be allocated individually later when needed.

The allocated data area block memory will be managed via radix tree.
For now the bitmap still be the most efficient way to search and
manage the block index, this could be update later.

Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Signed-off-by: Jianfei Hu <hujianfei@cmss.chinamobile.com>
Acked-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:57:36 -07:00
Hannes Reinecke 3d035237a5 target: fixup error message in target_tg_pt_gp_tg_pt_gp_id_store()
When setting up an ALUA target port group with an invalid ID the
error message

kstrtoul() returned -22 for tg_pt_gp_id

is displayed, which is not really helpful.
Convert it to something sane.
And while we're at it, join the messages onto a single line.

Signed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:21:53 -07:00
Hannes Reinecke c0dcafd8c5 target: fixup error message in target_tg_pt_gp_alua_access_type_store()
When setting up a target the error message:

Unable to do set ##_name ALUA state on non valid tg_pt_gp ID: 0

is displayed.
Apparently concatenation doesn't work in a string; one should be using
implicit string concatenation here.

Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Bart van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:21:49 -07:00
Bryant G. Ly 4ec5bf0ea8 target/user: PGR Support
This adds initial PGR support for just TCMU, since tcmu doesn't
have the necessary IT_NEXUS info to process PGR in userspace,
so have those commands be processed in kernel.

HA support is not available yet, we will work on it if this patch
is acceptable.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:21:45 -07:00
Bryant G. Ly c2d26f18dc target: Add WRITE_VERIFY_16
This patch addresses clients who needs write_verify_16 for
large volume groups such as AIX.

Signed-off-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:21:40 -07:00
Markus Elfring df6751f340 target: Use kmalloc_array() in transport_kmap_data_sg()
A multiplication for the size determination of a memory allocation
indicated that an array data structure should be processed.
Thus use the corresponding function "kmalloc_array".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:21:33 -07:00
Markus Elfring f318aef55f target: Use kmalloc_array() in compare_and_write_callback()
* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kmalloc_array".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data structure by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:21:32 -07:00
Markus Elfring 5d68fb72d3 target: Improve size determinations in two functions
Replace the specification of two data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determinations a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:21:30 -07:00
Markus Elfring fbc4040bf6 target: Delete error messages for failed memory allocations
The script "checkpatch.pl" pointed information out like the following.

WARNING: Possible unnecessary 'out of memory' message

Thus remove such statements here.

Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:21:28 -07:00
Markus Elfring 55ec409202 target: Use kcalloc() in two functions
* Multiplications for the size determination of memory allocations
  indicated that array data structures should be processed.
  Thus use the corresponding function "kcalloc".

  This issue was detected by using the Coccinelle software.

* Replace the specification of data structures by pointer dereferences
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:21:27 -07:00
Markus Elfring 3829f38169 iscsi-target: Improve size determinations in four functions
Replace the specification of four data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determinations a bit safer according to the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:21:08 -07:00
Markus Elfring c46e22f106 iscsi-target: Delete error messages for failed memory allocations
The script "checkpatch.pl" pointed information out like the following.

WARNING: Possible unnecessary 'out of memory' message

Thus remove such statements here.

Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdf
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:21:07 -07:00
Markus Elfring f172511032 iscsi-target: Use kcalloc() in iscsit_allocate_iovecs()
* A multiplication for the size determination of a memory allocation
  indicated that an array data structure should be processed.
  Thus use the corresponding function "kcalloc".

  This issue was detected by using the Coccinelle software.

* Replace the specification of a data structure by a pointer dereference
  to make the corresponding size determination a bit safer according to
  the Linux coding style convention.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:21:05 -07:00
Dmitry Monakhov 056e8924a0 tcm: make pi data verification configurable
Currently ramdisk and fileio always perform PI verification
before and after backend IO. This approach is not very flexible.
Because some one may want to postpone this work to other layers in
IO stack. For example if we want to test blk_integrity_profile

testcase:
dee408c868
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:20:58 -07:00
Dmitry Monakhov f11b55d135 tcm_fileio: Prevent information leak for short reads
If we failed to read data from backing file (probably because some one
truncate file under us), we must zerofill cmd's data, otherwise it will
be returned as is. Most likely cmd's data are unitialized pages from
page cache. This result in information leak.

(Change BUG_ON into -EINVAL se_cmd failure - nab)

testcase: e11a1b7b90
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:20:57 -07:00
Bart Van Assche 0e2eb7d12e target: Fix VERIFY and WRITE VERIFY command parsing
Use the value of the BYTCHK field to determine the size of the
Data-Out buffer. For VERIFY, honor the VRPROTECT, DPO and FUA
fields. This patch avoids that LIO complains about a mismatch
between the expected transfer length and the SCSI CDB length
if the value of the BYTCHK field is 0.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Max Lohrmann <post@wickenrode.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:20:49 -07:00
Zhu Lingshan d65e655706 target/pr: update PR out action code table
This commit updated persistent revervation out service action
code table in SPC-5 for development.

Signed-off-by: Zhu Lingshan <lszhu@suse.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:20:44 -07:00
Elena Reshetova 5981c245a8 target/iblock: convert iblock_req.pending from atomic_t to refcount_t
refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.

Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Signed-off-by: Hans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David Windsor <dwindsor@gmail.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-05-01 22:20:43 -07:00
Linus Torvalds 694752922b Merge branch 'for-4.12/block' of git://git.kernel.dk/linux-block
Pull block layer updates from Jens Axboe:

 - Add BFQ IO scheduler under the new blk-mq scheduling framework. BFQ
   was initially a fork of CFQ, but subsequently changed to implement
   fairness based on B-WF2Q+, a modified variant of WF2Q. BFQ is meant
   to be used on desktop type single drives, providing good fairness.
   From Paolo.

 - Add Kyber IO scheduler. This is a full multiqueue aware scheduler,
   using a scalable token based algorithm that throttles IO based on
   live completion IO stats, similary to blk-wbt. From Omar.

 - A series from Jan, moving users to separately allocated backing
   devices. This continues the work of separating backing device life
   times, solving various problems with hot removal.

 - A series of updates for lightnvm, mostly from Javier. Includes a
   'pblk' target that exposes an open channel SSD as a physical block
   device.

 - A series of fixes and improvements for nbd from Josef.

 - A series from Omar, removing queue sharing between devices on mostly
   legacy drivers. This helps us clean up other bits, if we know that a
   queue only has a single device backing. This has been overdue for
   more than a decade.

 - Fixes for the blk-stats, and improvements to unify the stats and user
   windows. This both improves blk-wbt, and enables other users to
   register a need to receive IO stats for a device. From Omar.

 - blk-throttle improvements from Shaohua. This provides a scalable
   framework for implementing scalable priotization - particularly for
   blk-mq, but applicable to any type of block device. The interface is
   marked experimental for now.

 - Bucketized IO stats for IO polling from Stephen Bates. This improves
   efficiency of polled workloads in the presence of mixed block size
   IO.

 - A few fixes for opal, from Scott.

 - A few pulls for NVMe, including a lot of fixes for NVMe-over-fabrics.
   From a variety of folks, mostly Sagi and James Smart.

 - A series from Bart, improving our exposed info and capabilities from
   the blk-mq debugfs support.

 - A series from Christoph, cleaning up how handle WRITE_ZEROES.

 - A series from Christoph, cleaning up the block layer handling of how
   we track errors in a request. On top of being a nice cleanup, it also
   shrinks the size of struct request a bit.

 - Removal of mg_disk and hd (sorry Linus) by Christoph. The former was
   never used by platforms, and the latter has outlived it's usefulness.

 - Various little bug fixes and cleanups from a wide variety of folks.

* 'for-4.12/block' of git://git.kernel.dk/linux-block: (329 commits)
  block: hide badblocks attribute by default
  blk-mq: unify hctx delay_work and run_work
  block: add kblock_mod_delayed_work_on()
  blk-mq: unify hctx delayed_run_work and run_work
  nbd: fix use after free on module unload
  MAINTAINERS: bfq: Add Paolo as maintainer for the BFQ I/O scheduler
  blk-mq-sched: alloate reserved tags out of normal pool
  mtip32xx: use runtime tag to initialize command header
  scsi: Implement blk_mq_ops.show_rq()
  blk-mq: Add blk_mq_ops.show_rq()
  blk-mq: Show operation, cmd_flags and rq_flags names
  blk-mq: Make blk_flags_show() callers append a newline character
  blk-mq: Move the "state" debugfs attribute one level down
  blk-mq: Unregister debugfs attributes earlier
  blk-mq: Only unregister hctxs for which registration succeeded
  blk-mq-debugfs: Rename functions for registering and unregistering the mq directory
  blk-mq: Let blk_mq_debugfs_register() look up the queue name
  blk-mq: Register <dev>/queue/mq after having registered <dev>/queue
  ide-pm: always pass 0 error to ide_complete_rq in ide_do_devset
  ide-pm: always pass 0 error to __blk_end_request_all
  ..
2017-05-01 10:39:57 -07:00
Christoph Hellwig 17d5363b83 scsi: introduce a result field in struct scsi_request
This passes on the scsi_cmnd result field to users of passthrough
requests.  Currently we abuse req->errors for this purpose, but that
field will go away in its current form.

Note that the old IDE code abuses the errors field in very creative
ways and stores all kinds of different values in it.  I didn't dare
to touch this magic, so the abuses are brought forward 1:1.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-20 12:16:10 -06:00
Christoph Hellwig 48920ff2a5 block: remove the discard_zeroes_data flag
Now that we use the proper REQ_OP_WRITE_ZEROES operation everywhere we can
kill this hack.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-08 11:25:38 -06:00
Christoph Hellwig 64c7f1d157 block, scsi: move the retries field to struct scsi_request
Instead of bloating the generic struct request with it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-05 12:05:08 -06:00
Sagi Grimberg 4459e04297 iscsi-target: use generic inet_pton_with_scope
Instead of parsing address strings, use a generic
helper.

Acked-by: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
2017-04-04 09:48:23 -06:00
Xiubo Li a5d68ba858 tcmu: Skip Data-Out blocks before gathering Data-In buffer for BIDI case
For the bidirectional case, the Data-Out buffer blocks will always at
the head of the tcmu_cmd's bitmap, and before gathering the Data-In
buffer, first of all it should skip the Data-Out ones, or the device
supporting BIDI commands won't work.

Fixed: 26418649ee ("target/user: Introduce data_bitmap, replace
		data_length/data_head/data_tail")
Reported-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Tested-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Cc: stable@vger.kernel.org # 4.6+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-04-02 16:18:51 -07:00
Nicholas Bellinger 1c99de981f iscsi-target: Drop work-around for legacy GlobalSAN initiator
Once upon a time back in 2009, a work-around was added to support
the GlobalSAN iSCSI initiator v3.3 for MacOSX, which during login
did not propose nor respond to MaxBurstLength, FirstBurstLength,
DefaultTime2Wait and DefaultTime2Retain keys.

The work-around in iscsi_check_proposer_for_optional_reply()
allowed the missing keys to be proposed, but did not require
waiting for a response before moving to full feature phase
operation.  This allowed GlobalSAN v3.3 to work out-of-the
box, and for many years we didn't run into login interopt
issues with any other initiators..

Until recently, when Martin tried a QLogic 57840S iSCSI Offload
HBA on Windows 2016 which completed login, but subsequently
failed with:

    Got unknown iSCSI OpCode: 0x43

The issue was QLogic MSFT side did not propose DefaultTime2Wait +
DefaultTime2Retain, so LIO proposes them itself, and immediately
transitions to full feature phase because of the GlobalSAN hack.
However, the QLogic MSFT side still attempts to respond to
DefaultTime2Retain + DefaultTime2Wait, even though LIO has set
ISCSI_FLAG_LOGIN_NEXT_STAGE3 + ISCSI_FLAG_LOGIN_TRANSIT
in last login response.

So while the QLogic MSFT side should have been proposing these
two keys to start, it was doing the correct thing per RFC-3720
attempting to respond to proposed keys before transitioning to
full feature phase.

All that said, recent versions of GlobalSAN iSCSI (v5.3.0.541)
does correctly propose the four keys during login, making the
original work-around moot.

So in order to allow QLogic MSFT to run unmodified as-is, go
ahead and drop this long standing work-around.

Reported-by: Martin Svec <martin.svec@zoner.cz>
Cc: Martin Svec <martin.svec@zoner.cz>
Cc: Himanshu Madhani <Himanshu.Madhani@cavium.com>
Cc: Arun Easi <arun.easi@cavium.com>
Cc: stable@vger.kernel.org # 3.1+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-04-02 14:10:16 -07:00
Mike Christie d19c4643a5 target: Fix ALUA transition state race between multiple initiators
Multiple threads could be writing to alua_access_state at
the same time, or there could be multiple STPGs in flight
(different initiators sending them or one initiator sending
them to different ports), or a combo of both and the
core_alua_do_transition_tg_pt calls will race with each other.

Because from the last patches we no longer delay running
core_alua_do_transition_tg_pt_work, there does not seem to be
any point in running that in a workqueue. And, we always
wait for it to complete one way or another, so we can sleep
in this code path. So, this patch made over target-pending just adds a
mutex and does the work core_alua_do_transition_tg_pt_work was doing in
core_alua_do_transition_tg_pt.

There is also no need to use an atomic for the
tg_pt_gp_alua_access_state. In core_alua_do_transition_tg_pt we will
test and set it under the transition mutex. And, it is a int/32 bits
so in the other places where it is read, we will never see it partially
updated.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-30 23:12:40 -07:00
Nicholas Bellinger a4467018c2 iscsi-target: Propigate queue_data_in + queue_status errors
This patch changes iscsi-target to propagate iscsit_transport
->iscsit_queue_data_in() and ->iscsit_queue_status() callback
errors, back up into target-core.

This allows target-core to retry failed iscsit_transport
callbacks using internal queue-full logic.

Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Cc: Potnuri Bharat Teja <bharat@chelsio.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-30 20:34:58 -07:00
Nicholas Bellinger fa7e25cf13 target: Fix unknown fabric callback queue-full errors
This patch fixes a set of queue-full response handling
bugs, where outgoing responses are leaked when a fabric
driver is propagating non -EAGAIN or -ENOMEM errors
to target-core.

It introduces TRANSPORT_COMPLETE_QF_ERR state used to
signal when CHECK_CONDITION status should be generated,
when fabric driver ->write_pending(), ->queue_data_in(),
or ->queue_status() callbacks fail with non -EAGAIN or
-ENOMEM errors, and data-transfer should not be retried.

Note all fabric driver -EAGAIN and -ENOMEM errors are
still retried indefinately with associated data-transfer
callbacks, following existing queue-full logic.

Also fix two missing ->queue_status() queue-full cases
related to CMD_T_ABORTED w/ TAS status handling.

Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Cc: Potnuri Bharat Teja <bharat@chelsio.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-30 20:34:31 -07:00
Xiubo Li abe342a5b4 tcmu: Fix wrongly calculating of the base_command_size
The t_data_nents and t_bidi_data_nents are the numbers of the
segments, but it couldn't be sure the block size equals to size
of the segment.

For the worst case, all the blocks are discontiguous and there
will need the same number of iovecs, that's to say: blocks == iovs.
So here just set the number of iovs to block count needed by tcmu
cmd.

Tested-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Reviewed-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-30 01:36:53 -07:00
Xiubo Li ab22d2604c tcmu: Fix possible overwrite of t_data_sg's last iov[]
If there has BIDI data, its first iov[] will overwrite the last
iov[] for se_cmd->t_data_sg.

To fix this, we can just increase the iov pointer, but this may
introuduce a new memory leakage bug: If the se_cmd->data_length
and se_cmd->t_bidi_data_sg->length are all not aligned up to the
DATA_BLOCK_SIZE, the actual length needed maybe larger than just
sum of them.

So, this could be avoided by rounding all the data lengthes up
to DATA_BLOCK_SIZE.

Reviewed-by: Mike Christie <mchristi@redhat.com>
Tested-by: Ilias Tsitsimpis <iliastsi@arrikto.com>
Reviewed-by: Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Signed-off-by: Xiubo Li <lixiubo@cmss.chinamobile.com>
Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-30 01:36:52 -07:00
Nicholas Bellinger 49cb77e297 target: Avoid mappedlun symlink creation during lun shutdown
This patch closes a race between se_lun deletion during configfs
unlink in target_fabric_port_unlink() -> core_dev_del_lun()
-> core_tpg_remove_lun(), when transport_clear_lun_ref() blocks
waiting for percpu_ref RCU grace period to finish, but a new
NodeACL mappedlun is added before the RCU grace period has
completed.

This can happen in target_fabric_mappedlun_link() because it
only checks for se_lun->lun_se_dev, which is not cleared until
after transport_clear_lun_ref() percpu_ref RCU grace period
finishes.

This bug originally manifested as NULL pointer dereference
OOPsen in target_stat_scsi_att_intr_port_show_attr_dev() on
v4.1.y code, because it dereferences lun->lun_se_dev without
a explicit NULL pointer check.

In post v4.1 code with target-core RCU conversion, the code
in target_stat_scsi_att_intr_port_show_attr_dev() no longer
uses se_lun->lun_se_dev, but the same race still exists.

To address the bug, go ahead and set se_lun>lun_shutdown as
early as possible in core_tpg_remove_lun(), and ensure new
NodeACL mappedlun creation in target_fabric_mappedlun_link()
fails during se_lun shutdown.

Reported-by: James Shen <jcs@datera.io>
Cc: James Shen <jcs@datera.io>
Tested-by: James Shen <jcs@datera.io>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-30 01:36:52 -07:00
Nicholas Bellinger efb2ea770b iscsi-target: Fix TMR reference leak during session shutdown
This patch fixes a iscsi-target specific TMR reference leak
during session shutdown, that could occur when a TMR was
quiesced before the hand-off back to iscsi-target code
via transport_cmd_check_stop_to_fabric().

The reference leak happens because iscsit_free_cmd() was
incorrectly skipping the final target_put_sess_cmd() for
TMRs when transport_generic_free_cmd() returned zero because
the se_cmd->cmd_kref did not reach zero, due to the missing
se_cmd assignment in original code.

The result was iscsi_cmd and it's associated se_cmd memory
would be freed once se_sess->sess_cmd_map where released,
but the associated se_tmr_req was leaked and remained part
of se_device->dev_tmr_list.

This bug would manfiest itself as kernel paging request
OOPsen in core_tmr_lun_reset(), when a left-over se_tmr_req
attempted to dereference it's se_cmd pointer that had
already been released during normal session shutdown.

To address this bug, go ahead and treat ISCSI_OP_SCSI_CMD
and ISCSI_OP_SCSI_TMFUNC the same when there is an extra
se_cmd->cmd_kref to drop in iscsit_free_cmd(), and use
op_scsi to signal __iscsit_free_cmd() when the former
needs to clear any further iscsi related I/O state.

Reported-by: Rob Millner <rlm@daterainc.com>
Cc: Rob Millner <rlm@daterainc.com>
Reported-by: Chu Yuan Lin <cyl@datera.io>
Cc: Chu Yuan Lin <cyl@datera.io>
Tested-by: Chu Yuan Lin <cyl@datera.io>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-30 01:36:51 -07:00
Nicholas Bellinger 740372b76e tcmu: Allow cmd_time_out to be set to zero (disabled)
The new cmd_time_out configfs attribute for TCMU is allowed to
be disabled, so go ahead and drop the tcmu_cmd_time_out_store()
check.

Reported-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-30 01:36:44 -07:00
Nicholas Bellinger 7d7a743543 tcmu: Convert cmd_time_out into backend device attribute
Instead of putting cmd_time_out under ../target/core/user_0/foo/control,
which has historically been used by parameters needed for initial
backend device configuration, go ahead and move cmd_time_out into
a backend device attribute.

In order to do this, tcmu_module_init() has been updated to create
a local struct configfs_attribute **tcmu_attrs, that is based upon
the existing passthrough_attrib_attrs along with the new cmd_time_out
attribute.  Once **tcm_attrs has been setup, go ahead and point
it at tcmu_ops->tb_dev_attrib_attrs so it's picked up by target-core.

Also following MNC's previous change, ->cmd_time_out is stored in
milliseconds but exposed via configfs in seconds.  Also, note this
patch restricts the modification of ->cmd_time_out to before +
after the TCMU device has been configured, but not while it has
active fabric exports.

Cc: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-18 16:32:30 -07:00
Mike Christie af980e46a2 tcmu: make cmd timeout configurable
A single daemon could implement multiple types of devices
using multuple types of real devices that may not support
restarting from crashes and/or handling tcmu timeouts. This
makes the cmd timeout configurable, so handlers that do not
support it can turn if off for now.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-18 16:32:27 -07:00
Mike Christie 972c7f1679 tcmu: add helper to check if dev was configured
This adds a helper to check if the dev was configured. It
will be used in the next patch to prevent updates to some
config settings after the device has been setup.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-18 16:32:19 -07:00
Mike Christie 760bf578ed target: fix race during implicit transition work flushes
This fixes the following races:

1. core_alua_do_transition_tg_pt could have read
tg_pt_gp_alua_access_state and gone into this if chunk:

if (!explicit &&
        atomic_read(&tg_pt_gp->tg_pt_gp_alua_access_state) ==
           ALUA_ACCESS_STATE_TRANSITION) {

and then core_alua_do_transition_tg_pt_work could update the
state. core_alua_do_transition_tg_pt would then only set
tg_pt_gp_alua_pending_state and the tg_pt_gp_alua_access_state would
not get updated with the second calls state.

2. core_alua_do_transition_tg_pt could be setting
tg_pt_gp_transition_complete while the tg_pt_gp_transition_work
is already completing. core_alua_do_transition_tg_pt then waits on the
completion that will never be called.

To handle these issues, we just call flush_work which will return when
core_alua_do_transition_tg_pt_work has completed so there is no need
to do the complete/wait. And, if core_alua_do_transition_tg_pt_work
was running, instead of trying to sneak in the state change, we just
schedule up another core_alua_do_transition_tg_pt_work call.

Note that this does not handle a possible race where there are multiple
threads call core_alua_do_transition_tg_pt at the same time. I think
we need a mutex in target_tg_pt_gp_alua_access_state_store.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2017-03-18 14:47:29 -07:00