OpenCloudOS-Kernel/drivers/s390/crypto/ap_queue.c

919 lines
24 KiB
C
Raw Normal View History

License cleanup: add SPDX GPL-2.0 license identifier to files with no license Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-01 22:07:57 +08:00
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright IBM Corp. 2016
* Author(s): Martin Schwidefsky <schwidefsky@de.ibm.com>
*
* Adjunct processor bus, queue related code.
*/
#define KMSG_COMPONENT "ap"
#define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
#include <linux/init.h>
#include <linux/slab.h>
#include <asm/facility.h>
#include "ap_bus.h"
#include "ap_debug.h"
static void __ap_flush_queue(struct ap_queue *aq);
/**
* ap_queue_enable_irq(): Enable interrupt support on this AP queue.
* @qid: The AP queue number
* @ind: the notification indicator byte
*
* Enables interruption on AP queue via ap_aqic(). Based on the return
* value it waits a while and tests the AP queue if interrupts
* have been switched on using ap_test_queue().
*/
static int ap_queue_enable_irq(struct ap_queue *aq, void *ind)
{
struct ap_queue_status status;
struct ap_qirq_ctrl qirqctrl = { 0 };
qirqctrl.ir = 1;
qirqctrl.isc = AP_ISC;
status = ap_aqic(aq->qid, qirqctrl, ind);
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
case AP_RESPONSE_OTHERWISE_CHANGED:
return 0;
case AP_RESPONSE_Q_NOT_AVAIL:
case AP_RESPONSE_DECONFIGURED:
case AP_RESPONSE_CHECKSTOPPED:
case AP_RESPONSE_INVALID_ADDRESS:
pr_err("Registering adapter interrupts for AP device %02x.%04x failed\n",
AP_QID_CARD(aq->qid),
AP_QID_QUEUE(aq->qid));
return -EOPNOTSUPP;
case AP_RESPONSE_RESET_IN_PROGRESS:
case AP_RESPONSE_BUSY:
default:
return -EBUSY;
}
}
/**
* __ap_send(): Send message to adjunct processor queue.
* @qid: The AP queue number
* @psmid: The program supplied message identifier
* @msg: The message text
* @length: The message length
* @special: Special Bit
*
* Returns AP queue status structure.
* Condition code 1 on NQAP can't happen because the L bit is 1.
* Condition code 2 on NQAP also means the send is incomplete,
* because a segment boundary was reached. The NQAP is repeated.
*/
static inline struct ap_queue_status
__ap_send(ap_qid_t qid, unsigned long long psmid, void *msg, size_t length,
int special)
{
if (special)
qid |= 0x400000UL;
return ap_nqap(qid, psmid, msg, length);
}
int ap_send(ap_qid_t qid, unsigned long long psmid, void *msg, size_t length)
{
struct ap_queue_status status;
status = __ap_send(qid, psmid, msg, length, 0);
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
return 0;
case AP_RESPONSE_Q_FULL:
case AP_RESPONSE_RESET_IN_PROGRESS:
return -EBUSY;
case AP_RESPONSE_REQ_FAC_NOT_INST:
return -EINVAL;
default: /* Device is gone. */
return -ENODEV;
}
}
EXPORT_SYMBOL(ap_send);
int ap_recv(ap_qid_t qid, unsigned long long *psmid, void *msg, size_t length)
{
struct ap_queue_status status;
if (msg == NULL)
return -EINVAL;
status = ap_dqap(qid, psmid, msg, length, NULL, NULL);
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
return 0;
case AP_RESPONSE_NO_PENDING_REPLY:
if (status.queue_empty)
return -ENOENT;
return -EBUSY;
case AP_RESPONSE_RESET_IN_PROGRESS:
return -EBUSY;
default:
return -ENODEV;
}
}
EXPORT_SYMBOL(ap_recv);
/* State machine definitions and helpers */
static enum ap_sm_wait ap_sm_nop(struct ap_queue *aq)
{
return AP_SM_WAIT_NONE;
}
/**
* ap_sm_recv(): Receive pending reply messages from an AP queue but do
* not change the state of the device.
* @aq: pointer to the AP queue
*
* Returns AP_SM_WAIT_NONE, AP_SM_WAIT_AGAIN, or AP_SM_WAIT_INTERRUPT
*/
static struct ap_queue_status ap_sm_recv(struct ap_queue *aq)
{
struct ap_queue_status status;
struct ap_message *ap_msg;
s390/ap: Fix hanging ioctl caused by wrong msg counter When a AP queue is switched to soft offline, all pending requests are purged out of the pending requests list and 'received' by the upper layer like zcrypt device drivers. This is also done for requests which are already enqueued into the firmware queue. A request in a firmware queue may eventually produce an response message, but there is no waiting process any more. However, the response was counted with the queue_counter and as this counter was reset to 0 with the offline switch, the pending response caused the queue_counter to get negative. The next request increased this counter to 0 (instead of 1) which caused the ap code to assume there is nothing to receive and so the response for this valid request was never tried to fetch from the firmware queue. This all caused a queue to not work properly after a switch offline/online and in the end processes to hang forever when trying to send a crypto request after an queue offline/online switch cicle. Fixed by a) making sure the counter does not drop below 0 and b) on a successful enqueue of a message has at least a value of 1. Additionally a warning is emitted, when a reply can't get assigned to a waiting process. This may be normal operation (process had timeout or has been killed) but may give a hint that something unexpected happened (like this odd behavior described above). Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-01 14:27:29 +08:00
bool found = false;
size_t reslen;
unsigned long resgr0 = 0;
int parts = 0;
/*
* DQAP loop until response code and resgr0 indicate that
* the msg is totally received. As we use the very same buffer
* the msg is overwritten with each invocation. That's intended
* and the receiver of the msg is informed with a msg rc code
* of EMSGSIZE in such a case.
*/
do {
status = ap_dqap(aq->qid, &aq->reply->psmid,
aq->reply->msg, aq->reply->bufsize,
&reslen, &resgr0);
parts++;
} while (status.response_code == 0xFF && resgr0 != 0);
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
s390/ap: Fix hanging ioctl caused by wrong msg counter When a AP queue is switched to soft offline, all pending requests are purged out of the pending requests list and 'received' by the upper layer like zcrypt device drivers. This is also done for requests which are already enqueued into the firmware queue. A request in a firmware queue may eventually produce an response message, but there is no waiting process any more. However, the response was counted with the queue_counter and as this counter was reset to 0 with the offline switch, the pending response caused the queue_counter to get negative. The next request increased this counter to 0 (instead of 1) which caused the ap code to assume there is nothing to receive and so the response for this valid request was never tried to fetch from the firmware queue. This all caused a queue to not work properly after a switch offline/online and in the end processes to hang forever when trying to send a crypto request after an queue offline/online switch cicle. Fixed by a) making sure the counter does not drop below 0 and b) on a successful enqueue of a message has at least a value of 1. Additionally a warning is emitted, when a reply can't get assigned to a waiting process. This may be normal operation (process had timeout or has been killed) but may give a hint that something unexpected happened (like this odd behavior described above). Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-01 14:27:29 +08:00
aq->queue_count = max_t(int, 0, aq->queue_count - 1);
if (aq->queue_count > 0)
mod_timer(&aq->timeout,
jiffies + aq->request_timeout);
list_for_each_entry(ap_msg, &aq->pendingq, list) {
if (ap_msg->psmid != aq->reply->psmid)
continue;
list_del_init(&ap_msg->list);
aq->pendingq_count--;
if (parts > 1) {
ap_msg->rc = -EMSGSIZE;
ap_msg->receive(aq, ap_msg, NULL);
} else {
ap_msg->receive(aq, ap_msg, aq->reply);
}
s390/ap: Fix hanging ioctl caused by wrong msg counter When a AP queue is switched to soft offline, all pending requests are purged out of the pending requests list and 'received' by the upper layer like zcrypt device drivers. This is also done for requests which are already enqueued into the firmware queue. A request in a firmware queue may eventually produce an response message, but there is no waiting process any more. However, the response was counted with the queue_counter and as this counter was reset to 0 with the offline switch, the pending response caused the queue_counter to get negative. The next request increased this counter to 0 (instead of 1) which caused the ap code to assume there is nothing to receive and so the response for this valid request was never tried to fetch from the firmware queue. This all caused a queue to not work properly after a switch offline/online and in the end processes to hang forever when trying to send a crypto request after an queue offline/online switch cicle. Fixed by a) making sure the counter does not drop below 0 and b) on a successful enqueue of a message has at least a value of 1. Additionally a warning is emitted, when a reply can't get assigned to a waiting process. This may be normal operation (process had timeout or has been killed) but may give a hint that something unexpected happened (like this odd behavior described above). Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-01 14:27:29 +08:00
found = true;
break;
}
s390/ap: Fix hanging ioctl caused by wrong msg counter When a AP queue is switched to soft offline, all pending requests are purged out of the pending requests list and 'received' by the upper layer like zcrypt device drivers. This is also done for requests which are already enqueued into the firmware queue. A request in a firmware queue may eventually produce an response message, but there is no waiting process any more. However, the response was counted with the queue_counter and as this counter was reset to 0 with the offline switch, the pending response caused the queue_counter to get negative. The next request increased this counter to 0 (instead of 1) which caused the ap code to assume there is nothing to receive and so the response for this valid request was never tried to fetch from the firmware queue. This all caused a queue to not work properly after a switch offline/online and in the end processes to hang forever when trying to send a crypto request after an queue offline/online switch cicle. Fixed by a) making sure the counter does not drop below 0 and b) on a successful enqueue of a message has at least a value of 1. Additionally a warning is emitted, when a reply can't get assigned to a waiting process. This may be normal operation (process had timeout or has been killed) but may give a hint that something unexpected happened (like this odd behavior described above). Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-01 14:27:29 +08:00
if (!found) {
AP_DBF_WARN("%s unassociated reply psmid=0x%016llx on 0x%02x.%04x\n",
__func__, aq->reply->psmid,
AP_QID_CARD(aq->qid), AP_QID_QUEUE(aq->qid));
}
fallthrough;
case AP_RESPONSE_NO_PENDING_REPLY:
if (!status.queue_empty || aq->queue_count <= 0)
break;
/* The card shouldn't forget requests but who knows. */
aq->queue_count = 0;
list_splice_init(&aq->pendingq, &aq->requestq);
aq->requestq_count += aq->pendingq_count;
aq->pendingq_count = 0;
break;
default:
break;
}
return status;
}
/**
* ap_sm_read(): Receive pending reply messages from an AP queue.
* @aq: pointer to the AP queue
*
* Returns AP_SM_WAIT_NONE, AP_SM_WAIT_AGAIN, or AP_SM_WAIT_INTERRUPT
*/
static enum ap_sm_wait ap_sm_read(struct ap_queue *aq)
{
struct ap_queue_status status;
if (!aq->reply)
return AP_SM_WAIT_NONE;
status = ap_sm_recv(aq);
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
if (aq->queue_count > 0) {
aq->sm_state = AP_SM_STATE_WORKING;
return AP_SM_WAIT_AGAIN;
}
aq->sm_state = AP_SM_STATE_IDLE;
return AP_SM_WAIT_NONE;
case AP_RESPONSE_NO_PENDING_REPLY:
if (aq->queue_count > 0)
return aq->interrupt ?
AP_SM_WAIT_INTERRUPT : AP_SM_WAIT_TIMEOUT;
aq->sm_state = AP_SM_STATE_IDLE;
return AP_SM_WAIT_NONE;
default:
2020-07-02 17:22:01 +08:00
aq->dev_state = AP_DEV_STATE_ERROR;
aq->last_err_rc = status.response_code;
AP_DBF_WARN("%s RC 0x%02x on 0x%02x.%04x -> AP_DEV_STATE_ERROR\n",
__func__, status.response_code,
AP_QID_CARD(aq->qid), AP_QID_QUEUE(aq->qid));
return AP_SM_WAIT_NONE;
}
}
/**
* ap_sm_write(): Send messages from the request queue to an AP queue.
* @aq: pointer to the AP queue
*
* Returns AP_SM_WAIT_NONE, AP_SM_WAIT_AGAIN, or AP_SM_WAIT_INTERRUPT
*/
static enum ap_sm_wait ap_sm_write(struct ap_queue *aq)
{
struct ap_queue_status status;
struct ap_message *ap_msg;
s390/zcrypt: Introduce Failure Injection feature Introduce a way to specify additional debug flags with an crpyto request to be able to trigger certain failures within the zcrypt device drivers and/or ap core code. This failure injection possibility is only enabled with a kernel debug build CONFIG_ZCRYPT_DEBUG) and should never be available on a regular kernel running in production environment. Details: * The ioctl(ICARSAMODEXPO) get's a struct ica_rsa_modexpo. If the leftmost bit of the 32 bit unsigned int inputdatalength field is set, the uppermost 16 bits are separated and used as debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. * The ioctl(ICARSACRT) get's a struct ica_rsa_modexpo_crt. If the leftmost bit of the 32 bit unsigned int inputdatalength field is set, the uppermost 16 bits are separated and used als debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. * The ioctl(ZSECSENDCPRB) used to send CCA CPRBs get's a struct ica_xcRB. If the leftmost bit of the 32 bit unsigned int status field is set, the uppermost 16 bits of this field are used as debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. * The ioctl(ZSENDEP11CPRB) used to send EP11 CPRBs get's a struct ep11_urb. If the leftmost bit of the 64 bit unsigned int req_len field is set, the uppermost 16 bits of this field are used as debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. So it is possible to send an additional 16 bit value to the zcrypt API to be used to carry a failure injection command which may trigger special behavior within the zcrypt API and layers below. This 16 bit value is for the rest of the test referred as 'fi command' for Failure Injection. The lower 8 bits of the fi command construct a numerical argument in the range of 1-255 and is the 'fi action' to be performed with the request or the resulting reply: * 0x00 (all requests): No failure injection action but flags may be provided which may affect the processing of the request or reply. * 0x01 (only CCA CPRBs): The CPRB's agent_ID field is set to 'FF'. This results in an reply code 0x90 (Transport-Protocol Failure). * 0x02 (only CCA CPRBs): After the APQN to send to has been chosen, the domain field within the CPRB is overwritten with value 99 to enforce an reply with RY 0x8A. * 0x03 (all requests): At NQAP invocation the invalid qid value 0xFF00 is used causing an response code of 0x01 (AP queue not valid). The upper 8 bits of the fi command may carry bit flags which may influence the processing of an request or response: * 0x01: No retry. If this bit is set, the usual loop in the zcrypt API which retries an CPRB up to 10 times when the lower layers return with EAGAIN is abandoned after the first attempt to send the CPRB. * 0x02: Toggle special. Toggles the special bit on this request. This should result in an reply code RY~0x41 and result in an ioctl failure with errno EINVAL. This failure injection possibilities may get some further extensions in the future. As of now this is a starting point for Continuous Test and Integration to trigger some failures and watch for the reaction of the ap bus and zcrypt device driver code. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-29 22:07:22 +08:00
ap_qid_t qid = aq->qid;
if (aq->requestq_count <= 0)
return AP_SM_WAIT_NONE;
/* Start the next request on the queue. */
ap_msg = list_entry(aq->requestq.next, struct ap_message, list);
s390/zcrypt: Introduce Failure Injection feature Introduce a way to specify additional debug flags with an crpyto request to be able to trigger certain failures within the zcrypt device drivers and/or ap core code. This failure injection possibility is only enabled with a kernel debug build CONFIG_ZCRYPT_DEBUG) and should never be available on a regular kernel running in production environment. Details: * The ioctl(ICARSAMODEXPO) get's a struct ica_rsa_modexpo. If the leftmost bit of the 32 bit unsigned int inputdatalength field is set, the uppermost 16 bits are separated and used as debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. * The ioctl(ICARSACRT) get's a struct ica_rsa_modexpo_crt. If the leftmost bit of the 32 bit unsigned int inputdatalength field is set, the uppermost 16 bits are separated and used als debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. * The ioctl(ZSECSENDCPRB) used to send CCA CPRBs get's a struct ica_xcRB. If the leftmost bit of the 32 bit unsigned int status field is set, the uppermost 16 bits of this field are used as debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. * The ioctl(ZSENDEP11CPRB) used to send EP11 CPRBs get's a struct ep11_urb. If the leftmost bit of the 64 bit unsigned int req_len field is set, the uppermost 16 bits of this field are used as debug flag value. The process is checked to have the CAP_SYS_ADMIN capability enabled or EPERM is returned. So it is possible to send an additional 16 bit value to the zcrypt API to be used to carry a failure injection command which may trigger special behavior within the zcrypt API and layers below. This 16 bit value is for the rest of the test referred as 'fi command' for Failure Injection. The lower 8 bits of the fi command construct a numerical argument in the range of 1-255 and is the 'fi action' to be performed with the request or the resulting reply: * 0x00 (all requests): No failure injection action but flags may be provided which may affect the processing of the request or reply. * 0x01 (only CCA CPRBs): The CPRB's agent_ID field is set to 'FF'. This results in an reply code 0x90 (Transport-Protocol Failure). * 0x02 (only CCA CPRBs): After the APQN to send to has been chosen, the domain field within the CPRB is overwritten with value 99 to enforce an reply with RY 0x8A. * 0x03 (all requests): At NQAP invocation the invalid qid value 0xFF00 is used causing an response code of 0x01 (AP queue not valid). The upper 8 bits of the fi command may carry bit flags which may influence the processing of an request or response: * 0x01: No retry. If this bit is set, the usual loop in the zcrypt API which retries an CPRB up to 10 times when the lower layers return with EAGAIN is abandoned after the first attempt to send the CPRB. * 0x02: Toggle special. Toggles the special bit on this request. This should result in an reply code RY~0x41 and result in an ioctl failure with errno EINVAL. This failure injection possibilities may get some further extensions in the future. As of now this is a starting point for Continuous Test and Integration to trigger some failures and watch for the reaction of the ap bus and zcrypt device driver code. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2020-09-29 22:07:22 +08:00
#ifdef CONFIG_ZCRYPT_DEBUG
if (ap_msg->fi.action == AP_FI_ACTION_NQAP_QID_INVAL) {
AP_DBF_WARN("%s fi cmd 0x%04x: forcing invalid qid 0xFF00\n",
__func__, ap_msg->fi.cmd);
qid = 0xFF00;
}
#endif
status = __ap_send(qid, ap_msg->psmid,
ap_msg->msg, ap_msg->len,
ap_msg->flags & AP_MSG_FLAG_SPECIAL);
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
s390/ap: Fix hanging ioctl caused by wrong msg counter When a AP queue is switched to soft offline, all pending requests are purged out of the pending requests list and 'received' by the upper layer like zcrypt device drivers. This is also done for requests which are already enqueued into the firmware queue. A request in a firmware queue may eventually produce an response message, but there is no waiting process any more. However, the response was counted with the queue_counter and as this counter was reset to 0 with the offline switch, the pending response caused the queue_counter to get negative. The next request increased this counter to 0 (instead of 1) which caused the ap code to assume there is nothing to receive and so the response for this valid request was never tried to fetch from the firmware queue. This all caused a queue to not work properly after a switch offline/online and in the end processes to hang forever when trying to send a crypto request after an queue offline/online switch cicle. Fixed by a) making sure the counter does not drop below 0 and b) on a successful enqueue of a message has at least a value of 1. Additionally a warning is emitted, when a reply can't get assigned to a waiting process. This may be normal operation (process had timeout or has been killed) but may give a hint that something unexpected happened (like this odd behavior described above). Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
2021-06-01 14:27:29 +08:00
aq->queue_count = max_t(int, 1, aq->queue_count + 1);
if (aq->queue_count == 1)
mod_timer(&aq->timeout, jiffies + aq->request_timeout);
list_move_tail(&ap_msg->list, &aq->pendingq);
aq->requestq_count--;
aq->pendingq_count++;
if (aq->queue_count < aq->card->queue_depth) {
aq->sm_state = AP_SM_STATE_WORKING;
return AP_SM_WAIT_AGAIN;
}
fallthrough;
case AP_RESPONSE_Q_FULL:
aq->sm_state = AP_SM_STATE_QUEUE_FULL;
return aq->interrupt ?
AP_SM_WAIT_INTERRUPT : AP_SM_WAIT_TIMEOUT;
case AP_RESPONSE_RESET_IN_PROGRESS:
aq->sm_state = AP_SM_STATE_RESET_WAIT;
return AP_SM_WAIT_TIMEOUT;
case AP_RESPONSE_INVALID_DOMAIN:
AP_DBF(DBF_WARN, "AP_RESPONSE_INVALID_DOMAIN on NQAP\n");
fallthrough;
case AP_RESPONSE_MESSAGE_TOO_BIG:
case AP_RESPONSE_REQ_FAC_NOT_INST:
list_del_init(&ap_msg->list);
aq->requestq_count--;
ap_msg->rc = -EINVAL;
ap_msg->receive(aq, ap_msg, NULL);
return AP_SM_WAIT_AGAIN;
default:
2020-07-02 17:22:01 +08:00
aq->dev_state = AP_DEV_STATE_ERROR;
aq->last_err_rc = status.response_code;
AP_DBF_WARN("%s RC 0x%02x on 0x%02x.%04x -> AP_DEV_STATE_ERROR\n",
__func__, status.response_code,
AP_QID_CARD(aq->qid), AP_QID_QUEUE(aq->qid));
return AP_SM_WAIT_NONE;
}
}
/**
* ap_sm_read_write(): Send and receive messages to/from an AP queue.
* @aq: pointer to the AP queue
*
* Returns AP_SM_WAIT_NONE, AP_SM_WAIT_AGAIN, or AP_SM_WAIT_INTERRUPT
*/
static enum ap_sm_wait ap_sm_read_write(struct ap_queue *aq)
{
return min(ap_sm_read(aq), ap_sm_write(aq));
}
/**
* ap_sm_reset(): Reset an AP queue.
* @qid: The AP queue number
*
* Submit the Reset command to an AP queue.
*/
static enum ap_sm_wait ap_sm_reset(struct ap_queue *aq)
{
struct ap_queue_status status;
status = ap_rapq(aq->qid);
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
case AP_RESPONSE_RESET_IN_PROGRESS:
aq->sm_state = AP_SM_STATE_RESET_WAIT;
aq->interrupt = false;
return AP_SM_WAIT_TIMEOUT;
default:
2020-07-02 17:22:01 +08:00
aq->dev_state = AP_DEV_STATE_ERROR;
aq->last_err_rc = status.response_code;
AP_DBF_WARN("%s RC 0x%02x on 0x%02x.%04x -> AP_DEV_STATE_ERROR\n",
__func__, status.response_code,
AP_QID_CARD(aq->qid), AP_QID_QUEUE(aq->qid));
return AP_SM_WAIT_NONE;
}
}
/**
* ap_sm_reset_wait(): Test queue for completion of the reset operation
* @aq: pointer to the AP queue
*
* Returns AP_POLL_IMMEDIATELY, AP_POLL_AFTER_TIMEROUT or 0.
*/
static enum ap_sm_wait ap_sm_reset_wait(struct ap_queue *aq)
{
struct ap_queue_status status;
void *lsi_ptr;
if (aq->queue_count > 0 && aq->reply)
/* Try to read a completed message and get the status */
status = ap_sm_recv(aq);
else
/* Get the status with TAPQ */
status = ap_tapq(aq->qid, NULL);
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
lsi_ptr = ap_airq_ptr();
if (lsi_ptr && ap_queue_enable_irq(aq, lsi_ptr) == 0)
aq->sm_state = AP_SM_STATE_SETIRQ_WAIT;
else
aq->sm_state = (aq->queue_count > 0) ?
AP_SM_STATE_WORKING : AP_SM_STATE_IDLE;
return AP_SM_WAIT_AGAIN;
case AP_RESPONSE_BUSY:
case AP_RESPONSE_RESET_IN_PROGRESS:
return AP_SM_WAIT_TIMEOUT;
case AP_RESPONSE_Q_NOT_AVAIL:
case AP_RESPONSE_DECONFIGURED:
case AP_RESPONSE_CHECKSTOPPED:
default:
2020-07-02 17:22:01 +08:00
aq->dev_state = AP_DEV_STATE_ERROR;
aq->last_err_rc = status.response_code;
AP_DBF_WARN("%s RC 0x%02x on 0x%02x.%04x -> AP_DEV_STATE_ERROR\n",
__func__, status.response_code,
AP_QID_CARD(aq->qid), AP_QID_QUEUE(aq->qid));
return AP_SM_WAIT_NONE;
}
}
/**
* ap_sm_setirq_wait(): Test queue for completion of the irq enablement
* @aq: pointer to the AP queue
*
* Returns AP_POLL_IMMEDIATELY, AP_POLL_AFTER_TIMEROUT or 0.
*/
static enum ap_sm_wait ap_sm_setirq_wait(struct ap_queue *aq)
{
struct ap_queue_status status;
if (aq->queue_count > 0 && aq->reply)
/* Try to read a completed message and get the status */
status = ap_sm_recv(aq);
else
/* Get the status with TAPQ */
status = ap_tapq(aq->qid, NULL);
if (status.irq_enabled == 1) {
/* Irqs are now enabled */
aq->interrupt = true;
aq->sm_state = (aq->queue_count > 0) ?
AP_SM_STATE_WORKING : AP_SM_STATE_IDLE;
}
switch (status.response_code) {
case AP_RESPONSE_NORMAL:
if (aq->queue_count > 0)
return AP_SM_WAIT_AGAIN;
fallthrough;
case AP_RESPONSE_NO_PENDING_REPLY:
return AP_SM_WAIT_TIMEOUT;
default:
2020-07-02 17:22:01 +08:00
aq->dev_state = AP_DEV_STATE_ERROR;
aq->last_err_rc = status.response_code;
AP_DBF_WARN("%s RC 0x%02x on 0x%02x.%04x -> AP_DEV_STATE_ERROR\n",
__func__, status.response_code,
AP_QID_CARD(aq->qid), AP_QID_QUEUE(aq->qid));
return AP_SM_WAIT_NONE;
}
}
/*
* AP state machine jump table
*/
static ap_func_t *ap_jumptable[NR_AP_SM_STATES][NR_AP_SM_EVENTS] = {
[AP_SM_STATE_RESET_START] = {
[AP_SM_EVENT_POLL] = ap_sm_reset,
[AP_SM_EVENT_TIMEOUT] = ap_sm_nop,
},
[AP_SM_STATE_RESET_WAIT] = {
[AP_SM_EVENT_POLL] = ap_sm_reset_wait,
[AP_SM_EVENT_TIMEOUT] = ap_sm_nop,
},
[AP_SM_STATE_SETIRQ_WAIT] = {
[AP_SM_EVENT_POLL] = ap_sm_setirq_wait,
[AP_SM_EVENT_TIMEOUT] = ap_sm_nop,
},
[AP_SM_STATE_IDLE] = {
[AP_SM_EVENT_POLL] = ap_sm_write,
[AP_SM_EVENT_TIMEOUT] = ap_sm_nop,
},
[AP_SM_STATE_WORKING] = {
[AP_SM_EVENT_POLL] = ap_sm_read_write,
[AP_SM_EVENT_TIMEOUT] = ap_sm_reset,
},
[AP_SM_STATE_QUEUE_FULL] = {
[AP_SM_EVENT_POLL] = ap_sm_read,
[AP_SM_EVENT_TIMEOUT] = ap_sm_reset,
},
};
enum ap_sm_wait ap_sm_event(struct ap_queue *aq, enum ap_sm_event event)
{
2020-07-02 17:22:01 +08:00
if (aq->dev_state > AP_DEV_STATE_UNINITIATED)
return ap_jumptable[aq->sm_state][event](aq);
else
return AP_SM_WAIT_NONE;
}
enum ap_sm_wait ap_sm_event_loop(struct ap_queue *aq, enum ap_sm_event event)
{
enum ap_sm_wait wait;
while ((wait = ap_sm_event(aq, event)) == AP_SM_WAIT_AGAIN)
;
return wait;
}
/*
* AP queue related attributes.
*/
static ssize_t request_count_show(struct device *dev,
struct device_attribute *attr,
char *buf)
{
struct ap_queue *aq = to_ap_queue(dev);
2020-07-02 17:22:01 +08:00
bool valid = false;
u64 req_cnt;
spin_lock_bh(&aq->lock);
2020-07-02 17:22:01 +08:00
if (aq->dev_state > AP_DEV_STATE_UNINITIATED) {
req_cnt = aq->total_request_count;
valid = true;
}
spin_unlock_bh(&aq->lock);
2020-07-02 17:22:01 +08:00
if (valid)
return scnprintf(buf, PAGE_SIZE, "%llu\n", req_cnt);
else
return scnprintf(buf, PAGE_SIZE, "-\n");
}
static ssize_t request_count_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
struct ap_queue *aq = to_ap_queue(dev);
spin_lock_bh(&aq->lock);
aq->total_request_count = 0;
spin_unlock_bh(&aq->lock);
return count;
}
static DEVICE_ATTR_RW(request_count);
static ssize_t requestq_count_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct ap_queue *aq = to_ap_queue(dev);
unsigned int reqq_cnt = 0;
spin_lock_bh(&aq->lock);
2020-07-02 17:22:01 +08:00
if (aq->dev_state > AP_DEV_STATE_UNINITIATED)
reqq_cnt = aq->requestq_count;
spin_unlock_bh(&aq->lock);
return scnprintf(buf, PAGE_SIZE, "%d\n", reqq_cnt);
}
static DEVICE_ATTR_RO(requestq_count);
static ssize_t pendingq_count_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct ap_queue *aq = to_ap_queue(dev);
unsigned int penq_cnt = 0;
spin_lock_bh(&aq->lock);
2020-07-02 17:22:01 +08:00
if (aq->dev_state > AP_DEV_STATE_UNINITIATED)
penq_cnt = aq->pendingq_count;
spin_unlock_bh(&aq->lock);
return scnprintf(buf, PAGE_SIZE, "%d\n", penq_cnt);
}
static DEVICE_ATTR_RO(pendingq_count);
static ssize_t reset_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct ap_queue *aq = to_ap_queue(dev);
int rc = 0;
spin_lock_bh(&aq->lock);
switch (aq->sm_state) {
case AP_SM_STATE_RESET_START:
case AP_SM_STATE_RESET_WAIT:
rc = scnprintf(buf, PAGE_SIZE, "Reset in progress.\n");
break;
case AP_SM_STATE_WORKING:
case AP_SM_STATE_QUEUE_FULL:
rc = scnprintf(buf, PAGE_SIZE, "Reset Timer armed.\n");
break;
default:
rc = scnprintf(buf, PAGE_SIZE, "No Reset Timer set.\n");
}
spin_unlock_bh(&aq->lock);
return rc;
}
static ssize_t reset_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
struct ap_queue *aq = to_ap_queue(dev);
spin_lock_bh(&aq->lock);
__ap_flush_queue(aq);
aq->sm_state = AP_SM_STATE_RESET_START;
ap_wait(ap_sm_event(aq, AP_SM_EVENT_POLL));
spin_unlock_bh(&aq->lock);
AP_DBF(DBF_INFO, "reset queue=%02x.%04x triggered by user\n",
AP_QID_CARD(aq->qid), AP_QID_QUEUE(aq->qid));
return count;
}
static DEVICE_ATTR_RW(reset);
static ssize_t interrupt_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct ap_queue *aq = to_ap_queue(dev);
int rc = 0;
spin_lock_bh(&aq->lock);
if (aq->sm_state == AP_SM_STATE_SETIRQ_WAIT)
rc = scnprintf(buf, PAGE_SIZE, "Enable Interrupt pending.\n");
else if (aq->interrupt)
rc = scnprintf(buf, PAGE_SIZE, "Interrupts enabled.\n");
else
rc = scnprintf(buf, PAGE_SIZE, "Interrupts disabled.\n");
spin_unlock_bh(&aq->lock);
return rc;
}
static DEVICE_ATTR_RO(interrupt);
static ssize_t config_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct ap_queue *aq = to_ap_queue(dev);
int rc;
spin_lock_bh(&aq->lock);
rc = scnprintf(buf, PAGE_SIZE, "%d\n", aq->config ? 1 : 0);
spin_unlock_bh(&aq->lock);
return rc;
}
static DEVICE_ATTR_RO(config);
2020-07-02 17:22:01 +08:00
#ifdef CONFIG_ZCRYPT_DEBUG
static ssize_t states_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct ap_queue *aq = to_ap_queue(dev);
int rc = 0;
spin_lock_bh(&aq->lock);
/* queue device state */
switch (aq->dev_state) {
case AP_DEV_STATE_UNINITIATED:
rc = scnprintf(buf, PAGE_SIZE, "UNINITIATED\n");
break;
case AP_DEV_STATE_OPERATING:
rc = scnprintf(buf, PAGE_SIZE, "OPERATING");
break;
case AP_DEV_STATE_SHUTDOWN:
rc = scnprintf(buf, PAGE_SIZE, "SHUTDOWN");
break;
case AP_DEV_STATE_ERROR:
rc = scnprintf(buf, PAGE_SIZE, "ERROR");
break;
default:
rc = scnprintf(buf, PAGE_SIZE, "UNKNOWN");
}
/* state machine state */
if (aq->dev_state) {
switch (aq->sm_state) {
case AP_SM_STATE_RESET_START:
rc += scnprintf(buf + rc, PAGE_SIZE - rc,
" [RESET_START]\n");
break;
case AP_SM_STATE_RESET_WAIT:
rc += scnprintf(buf + rc, PAGE_SIZE - rc,
" [RESET_WAIT]\n");
break;
case AP_SM_STATE_SETIRQ_WAIT:
rc += scnprintf(buf + rc, PAGE_SIZE - rc,
" [SETIRQ_WAIT]\n");
break;
case AP_SM_STATE_IDLE:
rc += scnprintf(buf + rc, PAGE_SIZE - rc,
" [IDLE]\n");
break;
case AP_SM_STATE_WORKING:
rc += scnprintf(buf + rc, PAGE_SIZE - rc,
" [WORKING]\n");
break;
case AP_SM_STATE_QUEUE_FULL:
rc += scnprintf(buf + rc, PAGE_SIZE - rc,
" [FULL]\n");
break;
default:
rc += scnprintf(buf + rc, PAGE_SIZE - rc,
" [UNKNOWN]\n");
}
}
spin_unlock_bh(&aq->lock);
return rc;
}
static DEVICE_ATTR_RO(states);
static ssize_t last_err_rc_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
struct ap_queue *aq = to_ap_queue(dev);
int rc;
spin_lock_bh(&aq->lock);
rc = aq->last_err_rc;
spin_unlock_bh(&aq->lock);
switch (rc) {
case AP_RESPONSE_NORMAL:
return scnprintf(buf, PAGE_SIZE, "NORMAL\n");
case AP_RESPONSE_Q_NOT_AVAIL:
return scnprintf(buf, PAGE_SIZE, "Q_NOT_AVAIL\n");
case AP_RESPONSE_RESET_IN_PROGRESS:
return scnprintf(buf, PAGE_SIZE, "RESET_IN_PROGRESS\n");
case AP_RESPONSE_DECONFIGURED:
return scnprintf(buf, PAGE_SIZE, "DECONFIGURED\n");
case AP_RESPONSE_CHECKSTOPPED:
return scnprintf(buf, PAGE_SIZE, "CHECKSTOPPED\n");
case AP_RESPONSE_BUSY:
return scnprintf(buf, PAGE_SIZE, "BUSY\n");
case AP_RESPONSE_INVALID_ADDRESS:
return scnprintf(buf, PAGE_SIZE, "INVALID_ADDRESS\n");
case AP_RESPONSE_OTHERWISE_CHANGED:
return scnprintf(buf, PAGE_SIZE, "OTHERWISE_CHANGED\n");
case AP_RESPONSE_Q_FULL:
return scnprintf(buf, PAGE_SIZE, "Q_FULL/NO_PENDING_REPLY\n");
case AP_RESPONSE_INDEX_TOO_BIG:
return scnprintf(buf, PAGE_SIZE, "INDEX_TOO_BIG\n");
case AP_RESPONSE_NO_FIRST_PART:
return scnprintf(buf, PAGE_SIZE, "NO_FIRST_PART\n");
case AP_RESPONSE_MESSAGE_TOO_BIG:
return scnprintf(buf, PAGE_SIZE, "MESSAGE_TOO_BIG\n");
case AP_RESPONSE_REQ_FAC_NOT_INST:
return scnprintf(buf, PAGE_SIZE, "REQ_FAC_NOT_INST\n");
default:
return scnprintf(buf, PAGE_SIZE, "response code %d\n", rc);
}
}
static DEVICE_ATTR_RO(last_err_rc);
2020-07-02 17:22:01 +08:00
#endif
static struct attribute *ap_queue_dev_attrs[] = {
&dev_attr_request_count.attr,
&dev_attr_requestq_count.attr,
&dev_attr_pendingq_count.attr,
&dev_attr_reset.attr,
&dev_attr_interrupt.attr,
&dev_attr_config.attr,
2020-07-02 17:22:01 +08:00
#ifdef CONFIG_ZCRYPT_DEBUG
&dev_attr_states.attr,
&dev_attr_last_err_rc.attr,
2020-07-02 17:22:01 +08:00
#endif
NULL
};
static struct attribute_group ap_queue_dev_attr_group = {
.attrs = ap_queue_dev_attrs
};
static const struct attribute_group *ap_queue_dev_attr_groups[] = {
&ap_queue_dev_attr_group,
NULL
};
static struct device_type ap_queue_type = {
.name = "ap_queue",
.groups = ap_queue_dev_attr_groups,
};
static void ap_queue_device_release(struct device *dev)
{
struct ap_queue *aq = to_ap_queue(dev);
spin_lock_bh(&ap_queues_lock);
hash_del(&aq->hnode);
spin_unlock_bh(&ap_queues_lock);
kfree(aq);
}
struct ap_queue *ap_queue_create(ap_qid_t qid, int device_type)
{
struct ap_queue *aq;
aq = kzalloc(sizeof(*aq), GFP_KERNEL);
if (!aq)
return NULL;
aq->ap_dev.device.release = ap_queue_device_release;
aq->ap_dev.device.type = &ap_queue_type;
aq->ap_dev.device_type = device_type;
aq->qid = qid;
aq->interrupt = false;
spin_lock_init(&aq->lock);
INIT_LIST_HEAD(&aq->pendingq);
INIT_LIST_HEAD(&aq->requestq);
timer_setup(&aq->timeout, ap_request_timeout, 0);
return aq;
}
void ap_queue_init_reply(struct ap_queue *aq, struct ap_message *reply)
{
aq->reply = reply;
spin_lock_bh(&aq->lock);
ap_wait(ap_sm_event(aq, AP_SM_EVENT_POLL));
spin_unlock_bh(&aq->lock);
}
EXPORT_SYMBOL(ap_queue_init_reply);
/**
* ap_queue_message(): Queue a request to an AP device.
* @aq: The AP device to queue the message to
* @ap_msg: The message that is to be added
*/
2020-07-02 17:22:01 +08:00
int ap_queue_message(struct ap_queue *aq, struct ap_message *ap_msg)
{
2020-07-02 17:22:01 +08:00
int rc = 0;
/* msg needs to have a valid receive-callback */
BUG_ON(!ap_msg->receive);
spin_lock_bh(&aq->lock);
2020-07-02 17:22:01 +08:00
/* only allow to queue new messages if device state is ok */
if (aq->dev_state == AP_DEV_STATE_OPERATING) {
list_add_tail(&ap_msg->list, &aq->requestq);
aq->requestq_count++;
aq->total_request_count++;
atomic64_inc(&aq->card->total_request_count);
} else
rc = -ENODEV;
/* Send/receive as many request from the queue as possible. */
ap_wait(ap_sm_event_loop(aq, AP_SM_EVENT_POLL));
2020-07-02 17:22:01 +08:00
spin_unlock_bh(&aq->lock);
2020-07-02 17:22:01 +08:00
return rc;
}
EXPORT_SYMBOL(ap_queue_message);
/**
* ap_cancel_message(): Cancel a crypto request.
* @aq: The AP device that has the message queued
* @ap_msg: The message that is to be removed
*
* Cancel a crypto request. This is done by removing the request
* from the device pending or request queue. Note that the
* request stays on the AP queue. When it finishes the message
* reply will be discarded because the psmid can't be found.
*/
void ap_cancel_message(struct ap_queue *aq, struct ap_message *ap_msg)
{
struct ap_message *tmp;
spin_lock_bh(&aq->lock);
if (!list_empty(&ap_msg->list)) {
list_for_each_entry(tmp, &aq->pendingq, list)
if (tmp->psmid == ap_msg->psmid) {
aq->pendingq_count--;
goto found;
}
aq->requestq_count--;
found:
list_del_init(&ap_msg->list);
}
spin_unlock_bh(&aq->lock);
}
EXPORT_SYMBOL(ap_cancel_message);
/**
* __ap_flush_queue(): Flush requests.
* @aq: Pointer to the AP queue
*
* Flush all requests from the request/pending queue of an AP device.
*/
static void __ap_flush_queue(struct ap_queue *aq)
{
struct ap_message *ap_msg, *next;
list_for_each_entry_safe(ap_msg, next, &aq->pendingq, list) {
list_del_init(&ap_msg->list);
aq->pendingq_count--;
ap_msg->rc = -EAGAIN;
ap_msg->receive(aq, ap_msg, NULL);
}
list_for_each_entry_safe(ap_msg, next, &aq->requestq, list) {
list_del_init(&ap_msg->list);
aq->requestq_count--;
ap_msg->rc = -EAGAIN;
ap_msg->receive(aq, ap_msg, NULL);
}
aq->queue_count = 0;
}
void ap_flush_queue(struct ap_queue *aq)
{
spin_lock_bh(&aq->lock);
__ap_flush_queue(aq);
spin_unlock_bh(&aq->lock);
}
EXPORT_SYMBOL(ap_flush_queue);
s390/zcrypt: revisit ap device remove procedure Working with the vfio-ap driver let to some revisit of the way how an ap (queue) device is removed from the driver. With the current implementation all the cleanup was done before the driver even got notified about the removal. Now the ap queue removal is done in 3 steps: 1) A preparation step, all ap messages within the queue are flushed and so the driver does 'receive' them. Also a new state AP_STATE_REMOVE assigned to the queue makes sure there are no new messages queued in. 2) Now the driver's remove function is invoked and the driver should do the job of cleaning up it's internal administration lists or whatever. After 2) is done it is guaranteed, that the driver is not invoked any more. On the other hand the driver has to make sure that the APQN is not accessed any more after step 2 is complete. 3) Now the ap bus code does the job of total cleanup of the APQN. A reset with zero is triggered and the state of the queue goes to AP_STATE_UNBOUND. After step 3) is complete, the ap queue has no pending messages and the APQN is cleared and so there are no requests and replies lingering around in the firmware queue for this APQN. Also the interrupts are disabled. After these remove steps the ap queue device may be assigned to another driver. Stress testing this remove/probe procedure showed a problem with the correct module reference counting. The actual receive of an reply in the driver is done asynchronous with completions. So with a driver change on an ap queue the message flush triggers completions but the threads waiting for the completions may run at a time where the queue already has the new driver assigned. So the module_put() at receive time needs to be done on the driver module which queued the ap message. This change is also part of this patch. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-02-23 00:24:11 +08:00
void ap_queue_prepare_remove(struct ap_queue *aq)
{
s390/zcrypt: revisit ap device remove procedure Working with the vfio-ap driver let to some revisit of the way how an ap (queue) device is removed from the driver. With the current implementation all the cleanup was done before the driver even got notified about the removal. Now the ap queue removal is done in 3 steps: 1) A preparation step, all ap messages within the queue are flushed and so the driver does 'receive' them. Also a new state AP_STATE_REMOVE assigned to the queue makes sure there are no new messages queued in. 2) Now the driver's remove function is invoked and the driver should do the job of cleaning up it's internal administration lists or whatever. After 2) is done it is guaranteed, that the driver is not invoked any more. On the other hand the driver has to make sure that the APQN is not accessed any more after step 2 is complete. 3) Now the ap bus code does the job of total cleanup of the APQN. A reset with zero is triggered and the state of the queue goes to AP_STATE_UNBOUND. After step 3) is complete, the ap queue has no pending messages and the APQN is cleared and so there are no requests and replies lingering around in the firmware queue for this APQN. Also the interrupts are disabled. After these remove steps the ap queue device may be assigned to another driver. Stress testing this remove/probe procedure showed a problem with the correct module reference counting. The actual receive of an reply in the driver is done asynchronous with completions. So with a driver change on an ap queue the message flush triggers completions but the threads waiting for the completions may run at a time where the queue already has the new driver assigned. So the module_put() at receive time needs to be done on the driver module which queued the ap message. This change is also part of this patch. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-02-23 00:24:11 +08:00
spin_lock_bh(&aq->lock);
/* flush queue */
__ap_flush_queue(aq);
2020-07-02 17:22:01 +08:00
/* move queue device state to SHUTDOWN in progress */
aq->dev_state = AP_DEV_STATE_SHUTDOWN;
s390/zcrypt: revisit ap device remove procedure Working with the vfio-ap driver let to some revisit of the way how an ap (queue) device is removed from the driver. With the current implementation all the cleanup was done before the driver even got notified about the removal. Now the ap queue removal is done in 3 steps: 1) A preparation step, all ap messages within the queue are flushed and so the driver does 'receive' them. Also a new state AP_STATE_REMOVE assigned to the queue makes sure there are no new messages queued in. 2) Now the driver's remove function is invoked and the driver should do the job of cleaning up it's internal administration lists or whatever. After 2) is done it is guaranteed, that the driver is not invoked any more. On the other hand the driver has to make sure that the APQN is not accessed any more after step 2 is complete. 3) Now the ap bus code does the job of total cleanup of the APQN. A reset with zero is triggered and the state of the queue goes to AP_STATE_UNBOUND. After step 3) is complete, the ap queue has no pending messages and the APQN is cleared and so there are no requests and replies lingering around in the firmware queue for this APQN. Also the interrupts are disabled. After these remove steps the ap queue device may be assigned to another driver. Stress testing this remove/probe procedure showed a problem with the correct module reference counting. The actual receive of an reply in the driver is done asynchronous with completions. So with a driver change on an ap queue the message flush triggers completions but the threads waiting for the completions may run at a time where the queue already has the new driver assigned. So the module_put() at receive time needs to be done on the driver module which queued the ap message. This change is also part of this patch. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-02-23 00:24:11 +08:00
spin_unlock_bh(&aq->lock);
del_timer_sync(&aq->timeout);
s390/zcrypt: revisit ap device remove procedure Working with the vfio-ap driver let to some revisit of the way how an ap (queue) device is removed from the driver. With the current implementation all the cleanup was done before the driver even got notified about the removal. Now the ap queue removal is done in 3 steps: 1) A preparation step, all ap messages within the queue are flushed and so the driver does 'receive' them. Also a new state AP_STATE_REMOVE assigned to the queue makes sure there are no new messages queued in. 2) Now the driver's remove function is invoked and the driver should do the job of cleaning up it's internal administration lists or whatever. After 2) is done it is guaranteed, that the driver is not invoked any more. On the other hand the driver has to make sure that the APQN is not accessed any more after step 2 is complete. 3) Now the ap bus code does the job of total cleanup of the APQN. A reset with zero is triggered and the state of the queue goes to AP_STATE_UNBOUND. After step 3) is complete, the ap queue has no pending messages and the APQN is cleared and so there are no requests and replies lingering around in the firmware queue for this APQN. Also the interrupts are disabled. After these remove steps the ap queue device may be assigned to another driver. Stress testing this remove/probe procedure showed a problem with the correct module reference counting. The actual receive of an reply in the driver is done asynchronous with completions. So with a driver change on an ap queue the message flush triggers completions but the threads waiting for the completions may run at a time where the queue already has the new driver assigned. So the module_put() at receive time needs to be done on the driver module which queued the ap message. This change is also part of this patch. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-02-23 00:24:11 +08:00
}
s390/zcrypt: reinit ap queue state machine during device probe Until the vfio-ap driver came into live there was a well known agreement about the way how ap devices are initialized and their states when the driver's probe function is called. However, the vfio device driver when receiving an ap queue device does additional resets thereby removing the registration for interrupts for the ap device done by the ap bus core code. So when later the vfio driver releases the device and one of the default zcrypt drivers takes care of the device the interrupt registration needs to get renewed. The current code does no renew and result is that requests send into such a queue will never see a reply processed - the application hangs. This patch adds a function which resets the aq queue state machine for the ap queue device and triggers the walk through the initial states (which are reset and registration for interrupts). This function is now called before the driver's probe function is invoked. When the association between driver and device is released, the driver's remove function is called. The current implementation calls a ap queue function ap_queue_remove(). This invokation has been moved to the ap bus function to make the probe / remove pair for ap bus and drivers more symmetric. Fixes: 7e0bdbe5c21c ("s390/zcrypt: AP bus support for alternate driver(s)") Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewd-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewd-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-11-09 21:59:24 +08:00
s390/zcrypt: revisit ap device remove procedure Working with the vfio-ap driver let to some revisit of the way how an ap (queue) device is removed from the driver. With the current implementation all the cleanup was done before the driver even got notified about the removal. Now the ap queue removal is done in 3 steps: 1) A preparation step, all ap messages within the queue are flushed and so the driver does 'receive' them. Also a new state AP_STATE_REMOVE assigned to the queue makes sure there are no new messages queued in. 2) Now the driver's remove function is invoked and the driver should do the job of cleaning up it's internal administration lists or whatever. After 2) is done it is guaranteed, that the driver is not invoked any more. On the other hand the driver has to make sure that the APQN is not accessed any more after step 2 is complete. 3) Now the ap bus code does the job of total cleanup of the APQN. A reset with zero is triggered and the state of the queue goes to AP_STATE_UNBOUND. After step 3) is complete, the ap queue has no pending messages and the APQN is cleared and so there are no requests and replies lingering around in the firmware queue for this APQN. Also the interrupts are disabled. After these remove steps the ap queue device may be assigned to another driver. Stress testing this remove/probe procedure showed a problem with the correct module reference counting. The actual receive of an reply in the driver is done asynchronous with completions. So with a driver change on an ap queue the message flush triggers completions but the threads waiting for the completions may run at a time where the queue already has the new driver assigned. So the module_put() at receive time needs to be done on the driver module which queued the ap message. This change is also part of this patch. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-02-23 00:24:11 +08:00
void ap_queue_remove(struct ap_queue *aq)
{
/*
2020-07-02 17:22:01 +08:00
* all messages have been flushed and the device state
* is SHUTDOWN. Now reset with zero which also clears
* the irq registration and move the device state
* to the initial value AP_DEV_STATE_UNINITIATED.
s390/zcrypt: revisit ap device remove procedure Working with the vfio-ap driver let to some revisit of the way how an ap (queue) device is removed from the driver. With the current implementation all the cleanup was done before the driver even got notified about the removal. Now the ap queue removal is done in 3 steps: 1) A preparation step, all ap messages within the queue are flushed and so the driver does 'receive' them. Also a new state AP_STATE_REMOVE assigned to the queue makes sure there are no new messages queued in. 2) Now the driver's remove function is invoked and the driver should do the job of cleaning up it's internal administration lists or whatever. After 2) is done it is guaranteed, that the driver is not invoked any more. On the other hand the driver has to make sure that the APQN is not accessed any more after step 2 is complete. 3) Now the ap bus code does the job of total cleanup of the APQN. A reset with zero is triggered and the state of the queue goes to AP_STATE_UNBOUND. After step 3) is complete, the ap queue has no pending messages and the APQN is cleared and so there are no requests and replies lingering around in the firmware queue for this APQN. Also the interrupts are disabled. After these remove steps the ap queue device may be assigned to another driver. Stress testing this remove/probe procedure showed a problem with the correct module reference counting. The actual receive of an reply in the driver is done asynchronous with completions. So with a driver change on an ap queue the message flush triggers completions but the threads waiting for the completions may run at a time where the queue already has the new driver assigned. So the module_put() at receive time needs to be done on the driver module which queued the ap message. This change is also part of this patch. Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Ingo Franzki <ifranzki@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2019-02-23 00:24:11 +08:00
*/
s390/zcrypt: reinit ap queue state machine during device probe Until the vfio-ap driver came into live there was a well known agreement about the way how ap devices are initialized and their states when the driver's probe function is called. However, the vfio device driver when receiving an ap queue device does additional resets thereby removing the registration for interrupts for the ap device done by the ap bus core code. So when later the vfio driver releases the device and one of the default zcrypt drivers takes care of the device the interrupt registration needs to get renewed. The current code does no renew and result is that requests send into such a queue will never see a reply processed - the application hangs. This patch adds a function which resets the aq queue state machine for the ap queue device and triggers the walk through the initial states (which are reset and registration for interrupts). This function is now called before the driver's probe function is invoked. When the association between driver and device is released, the driver's remove function is called. The current implementation calls a ap queue function ap_queue_remove(). This invokation has been moved to the ap bus function to make the probe / remove pair for ap bus and drivers more symmetric. Fixes: 7e0bdbe5c21c ("s390/zcrypt: AP bus support for alternate driver(s)") Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewd-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewd-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-11-09 21:59:24 +08:00
spin_lock_bh(&aq->lock);
ap_zapq(aq->qid);
2020-07-02 17:22:01 +08:00
aq->dev_state = AP_DEV_STATE_UNINITIATED;
s390/zcrypt: reinit ap queue state machine during device probe Until the vfio-ap driver came into live there was a well known agreement about the way how ap devices are initialized and their states when the driver's probe function is called. However, the vfio device driver when receiving an ap queue device does additional resets thereby removing the registration for interrupts for the ap device done by the ap bus core code. So when later the vfio driver releases the device and one of the default zcrypt drivers takes care of the device the interrupt registration needs to get renewed. The current code does no renew and result is that requests send into such a queue will never see a reply processed - the application hangs. This patch adds a function which resets the aq queue state machine for the ap queue device and triggers the walk through the initial states (which are reset and registration for interrupts). This function is now called before the driver's probe function is invoked. When the association between driver and device is released, the driver's remove function is called. The current implementation calls a ap queue function ap_queue_remove(). This invokation has been moved to the ap bus function to make the probe / remove pair for ap bus and drivers more symmetric. Fixes: 7e0bdbe5c21c ("s390/zcrypt: AP bus support for alternate driver(s)") Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewd-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewd-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-11-09 21:59:24 +08:00
spin_unlock_bh(&aq->lock);
}
s390/zcrypt: reinit ap queue state machine during device probe Until the vfio-ap driver came into live there was a well known agreement about the way how ap devices are initialized and their states when the driver's probe function is called. However, the vfio device driver when receiving an ap queue device does additional resets thereby removing the registration for interrupts for the ap device done by the ap bus core code. So when later the vfio driver releases the device and one of the default zcrypt drivers takes care of the device the interrupt registration needs to get renewed. The current code does no renew and result is that requests send into such a queue will never see a reply processed - the application hangs. This patch adds a function which resets the aq queue state machine for the ap queue device and triggers the walk through the initial states (which are reset and registration for interrupts). This function is now called before the driver's probe function is invoked. When the association between driver and device is released, the driver's remove function is called. The current implementation calls a ap queue function ap_queue_remove(). This invokation has been moved to the ap bus function to make the probe / remove pair for ap bus and drivers more symmetric. Fixes: 7e0bdbe5c21c ("s390/zcrypt: AP bus support for alternate driver(s)") Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewd-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewd-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-11-09 21:59:24 +08:00
void ap_queue_init_state(struct ap_queue *aq)
s390/zcrypt: reinit ap queue state machine during device probe Until the vfio-ap driver came into live there was a well known agreement about the way how ap devices are initialized and their states when the driver's probe function is called. However, the vfio device driver when receiving an ap queue device does additional resets thereby removing the registration for interrupts for the ap device done by the ap bus core code. So when later the vfio driver releases the device and one of the default zcrypt drivers takes care of the device the interrupt registration needs to get renewed. The current code does no renew and result is that requests send into such a queue will never see a reply processed - the application hangs. This patch adds a function which resets the aq queue state machine for the ap queue device and triggers the walk through the initial states (which are reset and registration for interrupts). This function is now called before the driver's probe function is invoked. When the association between driver and device is released, the driver's remove function is called. The current implementation calls a ap queue function ap_queue_remove(). This invokation has been moved to the ap bus function to make the probe / remove pair for ap bus and drivers more symmetric. Fixes: 7e0bdbe5c21c ("s390/zcrypt: AP bus support for alternate driver(s)") Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewd-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewd-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-11-09 21:59:24 +08:00
{
spin_lock_bh(&aq->lock);
2020-07-02 17:22:01 +08:00
aq->dev_state = AP_DEV_STATE_OPERATING;
aq->sm_state = AP_SM_STATE_RESET_START;
ap_wait(ap_sm_event(aq, AP_SM_EVENT_POLL));
s390/zcrypt: reinit ap queue state machine during device probe Until the vfio-ap driver came into live there was a well known agreement about the way how ap devices are initialized and their states when the driver's probe function is called. However, the vfio device driver when receiving an ap queue device does additional resets thereby removing the registration for interrupts for the ap device done by the ap bus core code. So when later the vfio driver releases the device and one of the default zcrypt drivers takes care of the device the interrupt registration needs to get renewed. The current code does no renew and result is that requests send into such a queue will never see a reply processed - the application hangs. This patch adds a function which resets the aq queue state machine for the ap queue device and triggers the walk through the initial states (which are reset and registration for interrupts). This function is now called before the driver's probe function is invoked. When the association between driver and device is released, the driver's remove function is called. The current implementation calls a ap queue function ap_queue_remove(). This invokation has been moved to the ap bus function to make the probe / remove pair for ap bus and drivers more symmetric. Fixes: 7e0bdbe5c21c ("s390/zcrypt: AP bus support for alternate driver(s)") Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Harald Freudenberger <freude@linux.ibm.com> Reviewd-by: Tony Krowiak <akrowiak@linux.ibm.com> Reviewd-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2018-11-09 21:59:24 +08:00
spin_unlock_bh(&aq->lock);
}
EXPORT_SYMBOL(ap_queue_init_state);