There is a long standing bug in linux tcp stack, about ACK messages sent
on behalf of TIME_WAIT sockets.
In the IP header of the ACK message, we choose to reflect TOS field of
incoming message, and this might break some setups.
Example of things that were broken :
- Routing using TOS as a selector
- Firewalls
- Trafic classification / shaping
We now remember in timewait structure the inet tos field and use it in
ACK generation, and route lookup.
Notes :
- We still reflect incoming TOS in RST messages.
- We could extend MuraliRaja Muniraju patch to report TOS value in
netlink messages for TIME_WAIT sockets.
- A patch is needed for IPv6
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Renato Westphal noticed that since commit a2835763e1
"rtnetlink: handle rtnl_link netlink notifications manually" was merged
we no longer send a netlink message when a networking device is moved
from one network namespace to another.
Fix this by adding the missing manual notification in dev_change_net_namespaces.
Since all network devices that are processed by dev_change_net_namspaces are
in the initialized state the complicated tests that guard the manual
rtmsg_ifinfo calls in rollback_registered and register_netdevice are
unnecessary and we can just perform a plain notification.
Cc: stable@kernel.org
Tested-by: Renato Westphal <renatowestphal@gmail.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There is bug in commit 5e2b61f(ipv4: Remove flowi from struct rtable).
It makes xfrm4_fill_dst() modify wrong data structure.
Signed-off-by: Zheng Yan <zheng.z.yan@intel.com>
Reported-by: Kim Phillips <kim.phillips@freescale.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If the device is down during suspend/resume, interrupts are enabled
without a registered interrupt handler, causing a storm of
unhandled interrupts until the IRQ is disabled because "nobody
cared".
Instead, check that the device is up before touching it in the
suspend/resume code.
Fixes https://bugzilla.kernel.org/show_bug.cgi?id=39112
Helped-by: Adrian Chadd <adrian@freebsd.org>
Helped-by: Mohammed Shafi <shafi.wireless@gmail.com>
Signed-off-by: Clemens Buchacher <drizzd@aon.at>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit f39925dbde
(ipv4: Cache learned redirect information in inetpeer.)
removed some ICMP packet validations which are required by
RFC 1122, section 3.2.2.2:
...
A Redirect message SHOULD be silently discarded if the new
gateway address it specifies is not on the same connected
(sub-) net through which the Redirect arrived [INTRO:2,
Appendix A], or if the source of the Redirect is not the
current first-hop gateway for the specified destination (see
Section 3.3.1).
Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The pair of functions,
* skb_clone_tx_timestamp()
* skb_complete_tx_timestamp()
were designed to allow timestamping in PHY devices. The first
function, called during the MAC driver's hard_xmit method, identifies
PTP protocol packets, clones them, and gives them to the PHY device
driver. The PHY driver may hold onto the packet and deliver it at a
later time using the second function, which adds the packet to the
socket's error queue.
As pointed out by Johannes, nothing prevents the socket from
disappearing while the cloned packet is sitting in the PHY driver
awaiting a timestamp. This patch fixes the issue by taking a reference
on the socket for each such packet. In addition, the comments
regarding the usage of these function are expanded to highlight the
rule that PHY drivers must use skb_complete_tx_timestamp() to release
the packet, in order to release the socket reference, too.
These functions first appeared in v2.6.36.
Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Richard Cochran <richard.cochran@omicron.at>
Cc: <stable@vger.kernel.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now tcp_md5_hash_header() has a const tcphdr argument, we can add more
const attributes to callers.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for reporting ring sizes via ethtool -g to the virtio_net
driver.
Signed-off-by: Rick Jones <rick.jones2@hp.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_md5_hash_header() writes into skb header a temporary zero value,
this might confuse other users of this area.
Since tcphdr is small (20 bytes), copy it in a temporary variable and
make the change in the copy.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.infradead.org/iommu-2.6:
intel-iommu: fix superpage support in pfn_to_dma_pte()
intel-iommu: set iommu_superpage on VM domains to lowest common denominator
intel-iommu: fix return value of iommu_unmap() API
MAINTAINERS: Update VT-d entry for drivers/pci -> drivers/iommu move
intel-iommu: Export a flag indicating that the IOMMU is used for iGFX.
intel-iommu: Workaround IOTLB hang on Ironlake GPU
intel-iommu: Fix AB-BA lockdep report
Commit 4b239f458 ("x86-64, mm: Put early page table high") causes a S4
regression since 2.6.39, namely the machine reboots occasionally at S4
resume. It doesn't happen always, overall rate is about 1/20. But,
like other bugs, once when this happens, it continues to happen.
This patch fixes the problem by essentially reverting the memory
assignment in the older way.
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Cc: <stable@kernel.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Yinghai Lu <yinghai.lu@oracle.com>
[ We'll hopefully find the real fix, but that's too late for 3.1 now ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes the legacy usage of se_task->task_timer and associated
infrastructure that originally was used as a way to help manage buggy backend
SCSI LLDs that in certain cases would never return back an outstanding task.
This includes the removal of target_complete_timeout_work(), timeout logic
from transport_complete_task(), transport_task_timeout_handler(),
transport_start_task_timer(), the per device task_timeout configfs attribute,
and all task_timeout associated structure members and defines in
target_core_base.h
This is being removed in preparation to make transport_complete_task() run
in lock-less mode.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch converts target-core to use se_cmd->t_transport_sent instead of
a duplicated se_cmd->transport_sent member in a handful of locations.
It also updates iscsi_target to properly use ->t_transport_sent instead of
it's own iscsi_cmd_t->transport_sent value that was not being assigned.
Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
If we only have a single task per command (which at least in my testing
is the by far most common case) we do not have to allocate a new per-task
S/G list but can reuse the one from the command.
(nab: Fix BIDI handling in transport_free_dev_tasks)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch fixes a bug for BIDI handling in transport_generic_new_cmd() where
cmd->t_task_cdbs_left and Co. where not taking into account the extra
task count generated during the first call to transport_allocate_data_tasks().
Cc: Christoph Hellwig <hch@lst.de>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
There were only two callers, and one of them always wants the call
to transport_allocate_data_tasks anyway. Also drop the constant
lba argument to transport_allocate_data_tasks and move the variables
inside it into the minimum required scope.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
These are two fairly small functions, and merging them gives a much
more readable control flow, and opportunities for more useful comments.
It also moves all code related to resources allocation closer together
and allows to remove a forward declaration for transport_allocate_tasks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This field is never used given that BIDI handling happens at the
command and not the task level. Remove it and the dead code in
pscsi that tries to work on it.
It also prevents pSCSI passthrough for the two currently enabled BIDI
commands now that task->task_sg_bidi support has been removed.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Remove the now unnecessary extra call to transport_subsystem_check_init() in
target_core_register_fabric(), and also merge transport_subsystem_reqmods()
directly into transport_subsystem_check_init().
Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Instead of abusing the target processing thread for offloading I/O
completion in the backends to user context add a new workqueue. This means
completions can be processed as fast as available CPU time allows it,
including in parallel with other completions and more importantly I/O
submission or QUEUE FULL retries. This should give much better performance
especially on loaded systems.
As a fallout we can merge all the completed states into a single
one.
On the downside this change complicates lun reset handling a bit by
requiring us to cancel a work item only for those states that have it
initialized. The alternative would be to either always initialize the work
item to a dummy handler, or always use the same handler and do a switch on
the state. The long term solution will be a flag that says that the command
has an initialized work item, but that's only going to be useful once we
have more users.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
We never check for this state, and it makes testing for a completed
state much harder given that it overrides the existing state.
Also remove the unused deferred_t_state which is related to it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
We never queue an command with this state, and only set it in a completely
bogus place in tcm_fc.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
We only need to decrement dev->depth_left if failing a command from
__transport_execute_tasks. Instead of doing it first thing in
transport_generic_request_failure and requiring a pseudo-flag argument
for it just opencode the decrement in the two callers (which should
be factored into a single one anyway)
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Currently we stop the timers for all tasks in a command fairly late during
I/O completion, which is fairly pointless and requires all kinds of safety
checks.
Instead delete pending timers early on in transport_complete_task, thus
ensuring no new timers firest after that. We take t_state_lock a bit later
in that function thus making sure currenly running timers are out of the
criticial section. To be completely sure the timer has finished we also
add another del_timer_sync call when freeing the task.
This also allows removing TF_TIMER_RUNNING as it would be equivalent
to TF_ACTIVE now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
TF_TIMER_STOP is useless as it only helps to mitigate a tiny race during
deleting the timer. But given that we have cleared TF_ACTIVE at this point
we already have another mitigation a few lines down the function.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
list_for_each_entry_safe only protects against deletions from the list,
but not against any concurrent modifications. Given that we drop
t_state_lock inside the loop it is not safe in transport_free_dev_tasks.
Instead of use a local dispose_list that we move all tasks that are
to be deleted to. This is safe because we never do list_emptry checks
on t_list to check if a command is on the list anywhere.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Change one remaining user of transport_cmd_check_stop(cmd, 2, 0) to the
transport_cmd_check_stop_to_fabric wrapper.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
We always operated on the same queue, so move finding it into the function,
just like we do for all other helpers operating on it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Add a new boolean at_head parameter to transport_add_cmd_to_queue and thus
obsolete the SCF_EMULATE_QUEUE_FULL flag.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Remove the need for the transport_qf_callback callback by making
sure we have specific states with specific handlers for the two
queue full cases.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Remove the dpo_emulated, fua_write_emulated, fua_read_emulated and
write_cache_emulated methods, and replace them with a simple bitfields in
se_subsystem_api in those cases where they ever returned one.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Do not block the submitting thread when handling a SYNCHRONIZE CACHE command,
but implement it asynchronously by sending the FLUSH command ourself and
calling transport_complete_sync_cache from the completion handler.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch fixes a bug with the handling of REPORT TARGET PORT GROUPS
containing a smaller allocation length than the payload requires causing
memory writes beyond the end of the buffer. This patch checks for the
minimum 4 byte length for the response payload length, and also checks
upon each loop of T10_ALUA(su_dev)->tg_pt_gps_list to ensure the Target
port group and Target port descriptor list is able to fit into the
remaining allocation length.
If the response payload exceeds the allocation length length, then rd_len
is still increments to indicate to the initiator that the payload has
been truncated.
Reported-by: Roland Dreier <roland@purestorage.com>
Cc: stable@kernel.org
Signed-off-by: Nicholas Bellinger <nab@risingtidesystems.com>
Add a switch statement implementing the CDB LBA/len update directly
in target_get_task_cdb and remove the old ->transport_split_cdb
callback and all its implementations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Instead of calling out to the backends from the core to get a per-task
CDB and then modify it for the LBA/len pair used for this CDB provide
a helper that writes the adjusted CDB into a provided buffer and call
this method from ->do_task in pscsi.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
The most commonly used file, iblock and rd backends have no use for
a per-task CDB and thus don't need a method to copy it into their
otherwise unused CDB fields.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Rearrange the fields in se_task to avoid holes. Also increase the
flags field to 16 bits as we have the space for it, and this makes
adding new flags safer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This is a squashed version of the following unnecessary se_task structure
member removal patches:
target: remove the task_execute_queue field in se_task
Instead of using a separate flag we can simply do list_emptry checks
on t_execute_list if we make sure to always use list_del_init to remove
a task from the list. Also factor some duplicate code into a new
__transport_remove_task_from_execute_queue helper.
target: remove the read-only task_no field in se_task
The task_no field never was initialized and only used in debug printks,
so kill it.
target: remove the task_padded_sg field in se_task
This field is only check in one place and not actually needed there.
Rationale:
- transport_do_task_sg_chain asserts that we have task_sg_chaining
set early on
- we only make use of the sg_prev_nents field we calculate based on it
if there is another sg list that gets chained onto this one, which
never happens for the last (or only) task.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Replace various atomic_t variables that were mostly under t_state_lock
with new flags in task_flags. Note that the execution error path
didn't take t_state_lock before, so add it there.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This is a squashed version of the following se_task cleanup patches:
target: remove the unused task_state_flags field in se_task
target: remove the unused se_obj_ptr field in se_task
target: remove the se_dev field in se_task
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This is a squashed version of the following target_core_base.h
cleanup patches:
target: remove the unused SHUTDOWN_SIGS defintion
target: remove unused se_mem leftovers
target: remove the unused map_func_t typedef
target: move TRANSPORT_IOV_DATA_BUFFER to the iscsi-specific code
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch removes the legacy device active I/O shutdown code that was
originally called from transport_processing_thread() context during shutdown
including transport_processing_shutdown() and transport_release_all_cmds().
This is due to the fact that in modern configfs control plane code by the
time shutdown of an se_device instance in transport_processing_thread()
is allowed to occur via:
rmdir /sys/kernel/config/target/core/$HBA/$DEV
all active I/O will already have been ceased while removing active configfs
fabric Port/LUN symlinks. Eg: the removal of an active se_device is protected
by inter-module VFS references from active Port/LUN symlinks.
Two WARN_ON() checks have been added in their place before exiting
transport_processing_thread() to watch out for any leaked descriptors.
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch merges transport_cmd_finish_abort_tmr() logic into a single
transport_cmd_finish_abort() function by adding a cmd->se_tmr_req check
around transport_lun_remove_cmd(), and updates the single caller within
core_tmr_drain_tmr_list().
Reported-by: Christoph Hellwig <hch@lst.de>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>