OpenCloudOS-Kernel

Commit Graph

Author	SHA1	Message	Date
Stephen Hemminger	f6b2db084b	vmbus: make sysfs names consistent with PCI In commit 9a56e5d6a0ba ("Drivers: hv: make VMBus bus ids persistent") the name of vmbus devices in sysfs changed to be (in 4.9-rc1): /sys/bus/vmbus/vmbus-6aebe374-9ba0-11e6-933c-00259086b36b The prefix ("vmbus-") is redundant and differs from how PCI is represented in sysfs. Therefore simplify to: /sys/bus/vmbus/6aebe374-9ba0-11e6-933c-00259086b36b Please merge this before 4.9 is released and the old format has to live forever. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-11-01 09:07:13 -06:00
Long Li	407a3aee6e	hv: do not lose pending heartbeat vmbus packets The host keeps sending heartbeat packets independent of the guest responding to them. Even though we respond to the heartbeat messages at interrupt level, we can have situations where there may be multiple heartbeat messages pending that have not been responded to. For instance this occurs when the VM is paused and the host continues to send the heartbeat messages. Address this issue by draining and responding to all the heartbeat messages that may be pending. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-10-25 08:52:10 +02:00
Vitaly Kuznetsov	e7fca5d860	Drivers: hv: get rid of id in struct vmbus_channel The auto incremented counter is not being used anymore, get rid of it. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-27 12:35:49 +02:00
Vitaly Kuznetsov	b294809dbf	Drivers: hv: make VMBus bus ids persistent Some tools use bus ids to identify devices and they count on the fact that these ids are persistent across reboot. This may be not true for VMBus as we use auto incremented counter from alloc_channel() as such id. Switch to using if_instance from channel offer, this id is supposed to be persistent. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-27 12:35:49 +02:00
Vivek yadav	3ba1eb17b6	Drivers: hv: hv_util: Avoid dynamic allocation in time synch Under stress, we have seen allocation failure in time synch code. Avoid this dynamic allocation. Signed-off-by: Vivek Yadav <vyadav@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-09 13:48:23 +02:00
Alex Ng	8e1d260738	Drivers: hv: utils: Support TimeSync version 4.0 protocol samples. This enables support for more accurate TimeSync v4 samples when hosted under Windows Server 2016 and newer hosts. The new time samples include a "vmreferencetime" field that represents the guest's TSC value when the host generated its time sample. This value lets the guest calculate the latency in receiving the time sample. The latency is added to the sample host time prior to updating the clock. Signed-off-by: Alex Ng <alexng@messages.microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-08 13:53:07 +02:00
Alex Ng	2e338f7e03	Drivers: hv: utils: Use TimeSync samples to adjust the clock after boot. Only the first 50 samples after boot were being used to discipline the clock. After the first 50 samples, any samples from the host were ignored and the guest clock would eventually drift from the host clock. This patch allows TimeSync-enabled guests to continuously synchronize the clock with the host clock, even after the first 50 samples. Signed-off-by: Alex Ng <alexng@messages.microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-08 13:53:07 +02:00
Alex Ng	abeda47ebb	Drivers: hv: utils: Rename version definitions to reflect protocol version. Different Windows host versions may reuse the same protocol version when negotiating the TimeSync, Shutdown, and Heartbeat protocols. We should only refer to the protocol version to avoid conflating the two concepts. Signed-off-by: Alex Ng <alexng@messages.microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-08 13:53:07 +02:00
Dexuan Cui	0f98829a99	Drivers: hv: vmbus: suppress some "hv_vmbus: Unknown GUID" warnings Some VMBus devices are not needed by Linux guest[1][2], and, VMBus channels of Hyper-V Sockets don't really mean usual synthetic devices, so let's suppress the warnings for them. [1] https://support.microsoft.com/en-us/kb/2925727 [2] https://msdn.microsoft.com/en-us/library/jj980180(v=winembedded.81).aspx Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-07 12:57:55 +02:00
Stephen Hemminger	e2e8084134	Driver: hv: vmbus: Make mmio resource local This fixes a sparse warning because hyperv_mmio resources are only used in this one file and should be static. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-07 12:57:55 +02:00
Alex Ng	db886e4d24	Drivers: hv: utils: Check VSS daemon is listening before a hot backup Hyper-V host will send a VSS_OP_HOT_BACKUP request to check if guest is ready for a live backup/snapshot. The driver should respond to the check only if the daemon is running and listening to requests. This allows the host to fallback to standard snapshots in case the VSS daemon is not running. Signed-off-by: Alex Ng <alexng@messages.microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-02 17:22:51 +02:00
Alex Ng	497af84b81	Drivers: hv: utils: Continue to poll VSS channel after handling requests. Multiple VSS_OP_HOT_BACKUP requests may arrive in quick succession, even though the host only signals once. The driver was handling the first request while ignoring the others in the ring buffer. We should poll the VSS channel after handling a request to continue processing other requests. Signed-off-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-02 17:22:51 +02:00
K. Y. Srinivasan	509879bdb3	Drivers: hv: Introduce a policy for controlling channel affinity Introduce a mechanism to control how channels will be affinitized. We will support two policies: 1. HV_BALANCED: All performance critical channels will be distributed evenly amongst all the available NUMA nodes. Once the Node is assigned, we will assign the CPU based on a simple round robin scheme. 2. HV_LOCALIZED: Only the primary channels are distributed across all NUMA nodes. Sub-channels will be in the same NUMA node as the primary channel. This is the current behaviour. The default policy will be the HV_BALANCED as it can minimize the remote memory access on NUMA machines with applications that span NUMA nodes. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-02 17:22:51 +02:00
Vitaly Kuznetsov	f24f0b495b	Drivers: hv: ring_buffer: use wrap around mappings in hv_copy{from, to}_ringbuffer() With wrap around mappings for ring buffers we can always use a single memcpy() to do the job. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-02 17:22:51 +02:00
Vitaly Kuznetsov	9988ce6856	Drivers: hv: ring_buffer: wrap around mappings for ring buffers Make it possible to always use a single memcpy() or to provide a direct link to a packet on the ring buffer by creating virtual mapping for two copies of the ring buffer with vmap(). Utilize currently empty hv_ringbuffer_cleanup() to do the unmap. While on it, replace sizeof(struct hv_ring_buffer) check in hv_ringbuffer_init() with BUILD_BUG_ON() as it is a compile time check. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-02 17:22:51 +02:00
Vitaly Kuznetsov	98f531b10d	Drivers: hv: cleanup vmbus_open() for wrap around mappings In preparation for doing wrap around mappings for ring buffers cleanup vmbus_open() function: - check that ring sizes are PAGE_SIZE aligned (they are for all in-kernel drivers now); - kfree(open_info) on error only after we kzalloc() it (not an issue as it is valid to call kfree(NULL)); - rename poorly named labels; - use alloc_pages() instead of __get_free_pages() as we need struct page pointer for future. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Tested-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-09-02 17:22:51 +02:00
Alex Ng	b605c2d913	Drivers: hv: balloon: Use available memory value in pressure report Reports for available memory should use the si_mem_available() value. The previous freeram value does not include available page cache memory. Signed-off-by: Alex Ng <alexng@messages.microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:42 +02:00
Vitaly Kuznetsov	eece30b9f0	Drivers: hv: balloon: replace ha_region_mutex with spinlock lockdep reports possible circular locking dependency when udev is used for memory onlining: systemd-udevd/3996 is trying to acquire lock: ((memory_chain).rwsem){++++.+}, at: [<ffffffff810d137e>] __blocking_notifier_call_chain+0x4e/0xc0 but task is already holding lock: (&dm_device.ha_region_mutex){+.+.+.}, at: [<ffffffffa015382e>] hv_memory_notifier+0x5e/0xc0 [hv_balloon] ... which is probably a false positive because we take and release ha_region_mutex from memory notifier chain depending on the arg. No real deadlocks were reported so far (though I'm not really sure about preemptible kernels...) but we don't really need to hold the mutex for so long. We use it to protect ha_region_list (and its members) and the num_pages_onlined counter. None of these operations require us to sleep and nothing is slow, switch to using spinlock with interrupts disabled. While on it, replace list_for_each -> list_for_each_entry as we actually need entries in all these cases, drop meaningless list_empty() checks. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:41 +02:00
Vitaly Kuznetsov	a132c54cbc	Drivers: hv: balloon: don't wait for ol_waitevent when memhp_auto_online is enabled With the recently introduced in-kernel memory onlining (MEMORY_HOTPLUG_DEFAULT_ONLINE) there is no point in waiting for pages to come online in the driver and we can get rid of the waiting. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:41 +02:00
Vitaly Kuznetsov	cb7a5724c7	Drivers: hv: balloon: account for gaps in hot add regions I'm observing the following hot add requests from the WS2012 host: hot_add_req: start_pfn = 0x108200 count = 330752 hot_add_req: start_pfn = 0x158e00 count = 193536 hot_add_req: start_pfn = 0x188400 count = 239616 As the host doesn't specify hot add regions we're trying to create 128Mb-aligned region covering the first request, we create the 0x108000 - 0x160000 region and we add 0x108000 - 0x158e00 memory. The second request passes the pfn_covered() check, we enlarge the region to 0x108000 - 0x190000 and add 0x158e00 - 0x188200 memory. The problem emerges with the third request as it starts at 0x188400 so there is a 0x200 gap which is not covered. As the end of our region is 0x190000 now it again passes the pfn_covered() check where we just adjust the covered_end_pfn and make it 0x188400 instead of 0x188200 which means that we'll try to online 0x188200-0x188400 pages but these pages were never assigned to us and we crash. We can't react to such requests by creating new hot add regions as it may happen that the whole suggested range falls into the previously identified 128Mb-aligned area so we'll end up adding nothing or create intersecting regions and our current logic doesn't allow that. Instead, create a list of such 'gaps' and check for them in the page online callback. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:41 +02:00
Vitaly Kuznetsov	7cf3b79ec8	Drivers: hv: balloon: keep track of where ha_region starts Windows 2012 (non-R2) does not specify hot add region in hot add requests and the logic in hot_add_req() is trying to find a 128Mb-aligned region covering the request. It may also happen that host's requests are not 128Mb aligned and the created ha_region will start before the first specified PFN. We can't online these non-present pages but we don't remember the real start of the region. This is a regression introduced by the commit `5abbbb75d7` ("Drivers: hv: hv_balloon: don't lose memory when onlining order is not natural"). While the idea of keeping the 'moving window' was wrong (as there is no guarantee that hot add requests come ordered) we should still keep track of covered_start_pfn. This is not a revert, the logic is different. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:41 +02:00
K. Y. Srinivasan	3724287c0e	Drivers: hv: vmbus: Implement a mechanism to tag the channel for low latency On Hyper-V, performance critical channels use the monitor mechanism to signal the host when the guest posts messages for the host. This mechanism minimizes the hypervisor intercepts and also makes the host more efficient in that each time the host is woken up, it processes a batch of messages as opposed to just one. The goal here is to improve the throughput and this is at the expense of increased latency. Implement a mechanism to let the client driver decide if latency is important. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:41 +02:00
K. Y. Srinivasan	8de0d7e951	Drivers: hv: vmbus: Reduce the delay between retries in vmbus_post_msg() The current delay between retries is unnecessarily high and is negatively affecting the time it takes to boot the system. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:41 +02:00
K. Y. Srinivasan	ccef9bcc02	Drivers: hv: vmbus: Enable explicit signaling policy for NIC channels For synthetic NIC channels, enable explicit signaling policy as netvsc wants to explicitly control when the host is to be signaled. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:41 +02:00
Dexuan Cui	638fea33ae	Drivers: hv: vmbus: fix the race when querying & updating the percpu list There is a rare race when we remove an entry from the global list hv_context.percpu_list[cpu] in hv_process_channel_removal() -> percpu_channel_deq() -> list_del(): at this time, if vmbus_on_event() -> process_chn_event() -> pcpu_relid2channel() is trying to query the list, we can get the kernel fault. Similarly, we also have the issue in the code path: vmbus_process_offer() -> percpu_channel_enq(). We can resolve the issue by disabling the tasklet when updating the list. The patch also moves vmbus_release_relid() to a later place where the channel has been removed from the per-cpu and the global lists. Reported-by: Rolf Neugebauer <rolf.neugebauer@docker.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:41 +02:00
Vitaly Kuznetsov	e0fa3e5e7d	Drivers: hv: utils: fix a race on userspace daemons registration Background: userspace daemons registration protocol for Hyper-V utilities drivers has two steps: 1) daemon writes its own version to kernel 2) kernel reads it and replies with module version at this point we consider the handshake procedure being completed and we do hv_poll_channel() transitioning the utility device to HVUTIL_READY state. At this point we're ready to handle messages from kernel. When hvutil_transport is in HVUTIL_TRANSPORT_CHARDEV mode we have a single buffer for outgoing message. hvutil_transport_send() puts to this buffer and till the buffer is cleared with hvt_op_read() returns -EFAULT to all consequent calls. Host<->guest protocol guarantees there is no more than one request at a time and we will not get new requests till we reply to the previous one so this single message buffer is enough. Now to the race. When we finish negotiation procedure and send kernel module version to userspace with hvutil_transport_send() it goes into the above mentioned buffer and if the daemon is slow enough to read it from there we can get a collision when a request from the host comes, we won't be able to put anything to the buffer so the request will be lost. To solve the issue we need to know when the negotiation is really done (when the version message is read by the daemon) and transition to HVUTIL_READY state after this happens. Implement a callback on read to support this. Old style netlink communication is not affected by the change, we don't really know when these messages are delivered but we don't have a single message buffer there. Reported-by: Barry Davis <barry_davis@stormagic.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:41 +02:00
Vitaly Kuznetsov	396e287fa2	Drivers: hv: get rid of timeout in vmbus_open() vmbus_teardown_gpadl() can result in infinite wait when it is called on 5 second timeout in vmbus_open(). The issue is caused by the fact that gpadl teardown operation won't ever succeed for an opened channel and the timeout isn't always enough. As a guest, we can always trust the host to respond to our request (and there is nothing we can do if it doesn't). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:41 +02:00
Vitaly Kuznetsov	7cc80c9807	Drivers: hv: don't leak memory in vmbus_establish_gpadl() In some cases create_gpadl_header() allocates submessages but we never free them. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:41 +02:00
Vitaly Kuznetsov	4d63763296	Drivers: hv: get rid of redundant messagecount in create_gpadl_header() We use messagecount only once in vmbus_establish_gpadl() to check if it is safe to iterate through the submsglist. We can just initialize the list header in all cases in create_gpadl_header() instead. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:40 +02:00
Vitaly Kuznetsov	a9f61ca793	Drivers: hv: avoid vfree() on crash When we crash from NMI context (e.g. after NMI injection from host when 'sysctl -w kernel.unknown_nmi_panic=1' is set) we hit kernel BUG at mm/vmalloc.c:1530! as vfree() is denied. While the issue could be solved with in_nmi() check instead I opted for skipping vfree on all sorts of crashes to reduce the amount of work which can cause consequent crashes. We don't really need to free anything on crash. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-08-31 13:05:40 +02:00
Stephan Mueller	4b44f2d18a	random: add interrupt callback to VMBus IRQ handler The Hyper-V Linux Integration Services use the VMBus implementation for communication with the Hypervisor. VMBus registers its own interrupt handler that completely bypasses the common Linux interrupt handling. This implies that the interrupt entropy collector is not triggered. This patch adds the interrupt entropy collection callback into the VMBus interrupt handler function. Cc: stable@kernel.org Signed-off-by: Stephan Mueller <stephan.mueller@atsec.com> Signed-off-by: Stephan Mueller <smueller@chronox.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2016-06-13 11:54:33 -04:00
Vitaly Kuznetsov	d19a55d6ed	Drivers: hv: balloon: reset host_specified_ha_region We set host_specified_ha_region = true on certain request but this is a global state which stays 'true' forever. We need to reset it when we receive a request where ha_region is not specified. I did not see any real issues, the bug was found by code inspection. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-05-01 09:23:14 -07:00
Vitaly Kuznetsov	77c0c9735b	Drivers: hv: balloon: don't crash when memory is added in non-sorted order When we iterate through all HA regions in handle_pg_range() we have an assumption that all these regions are sorted in the list and the 'start_pfn >= has->end_pfn' check is enough to find the proper region. Unfortunately it's not the case with WS2016 where host can hot-add regions in a different order. We end up modifying the wrong HA region and crashing later on pages online. Modify the check to make sure we found the region we were searching for while iterating. Fix the same check in pfn_covered() as well. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-05-01 09:23:14 -07:00
Vitaly Kuznetsov	cd95aad557	Drivers: hv: vmbus: handle various crash scenarios Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always delivered to the CPU which was used for initial contact or to CPU0 depending on host version. vmbus_wait_for_unload() doesn't account for the fact that in case we're crashing on some other CPU we won't get the CHANNELMSG_UNLOAD_RESPONSE message and our wait on the current CPU will never end. Do the following: 1) Check for completion_done() in the loop. In case interrupt handler is still alive we'll get the confirmation we need. 2) Read message pages for all CPUs message page as we're unsure where CHANNELMSG_UNLOAD_RESPONSE is going to be delivered to. We can race with still-alive interrupt handler doing the same, add cmpxchg() to vmbus_signal_eom() to not lose CHANNELMSG_UNLOAD_RESPONSE message. 3) Cleanup message pages on all CPUs. This is required (at least for the current CPU as we're clearing CPU0 messages now but we may want to bring up additional CPUs on crash) as new messages won't be delivered till we consume what's pending. On boot we'll place message pages somewhere else and we won't be able to read stale messages. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-05-01 09:23:14 -07:00
Vitaly Kuznetsov	4dbfc2e680	Drivers: hv: kvp: fix IP Failover Hyper-V VMs can be replicated to another hosts and there is a feature to set different IP for replicas, it is called 'Failover TCP/IP'. When such guest starts Hyper-V host sends it KVP_OP_SET_IP_INFO message as soon as we finish negotiation procedure. The problem is that it can happen (and it actually happens) before userspace daemon connects and we reply with HV_E_FAIL to the message. As there are no repetitions we fail to set the requested IP. Solve the issue by postponing our reply to the negotiation message till userspace daemon is connected. We can't wait too long as there is a host-side timeout (cca. 75 seconds) and if we fail to reply in this time frame the whole KVP service will become inactive. The solution is not ideal - if it takes userspace daemon more than 60 seconds to connect IP Failover will still fail but I don't see a solution with our current separation between kernel and userspace parts. Other two modules (VSS and FCOPY) don't require such delay, leave them untouched. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-05-01 09:23:14 -07:00
Jake Oshins	ea37a6b8a0	drivers:hv: Separate out frame buffer logic when picking MMIO range Simplify the logic that picks MMIO ranges by pulling out the logic related to trying to lay frame buffer claim on top of where the firmware placed the frame buffer. Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-30 14:01:37 -07:00
Jake Oshins	6d146aefba	drivers:hv: Record MMIO range in use by frame buffer Later in the boot sequence, we need to figure out which memory ranges can be given out to various paravirtual drivers. The hyperv_fb driver should, ideally, be placed right on top of the frame buffer, without some other device getting plopped on top of this range in the meantime. Recording this now allows that to be guaranteed. Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-30 14:01:37 -07:00
Jake Oshins	be000f93e5	drivers:hv: Track allocations of children of hv_vmbus in private resource tree This patch changes vmbus_allocate_mmio() and vmbus_free_mmio() so that when child paravirtual devices allocate memory-mapped I/O space, they allocate it privately from a resource tree pointed at by hyperv_mmio and also by the public resource tree iomem_resource. This allows the region to be marked as "busy" in the private tree, but a "bridge window" in the public tree, guaranteeing that no two bridge windows will overlap each other but while also allowing the PCI device children of the bridge windows to overlap that window. One might conclude that this belongs in the pnp layer, rather than in this driver. Rafael Wysocki, the maintainer of the pnp layer, has previously asked that we not modify the pnp layer as it is considered deprecated. This patch is thus essentially a workaround. Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-30 14:01:37 -07:00
Jake Oshins	23a0683186	drivers:hv: Reverse order of resources in hyperv_mmio A patch later in this series allocates child nodes in this resource tree. For that to work, this tree needs to be sorted in ascending order. Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-30 14:01:37 -07:00
Jake Oshins	97fb77dc87	drivers:hv: Make a function to free mmio regions through vmbus This patch introduces a function that reverses everything done by vmbus_allocate_mmio(). Existing code just called release_mem_region(). Future patches in this series require a more complex sequence of actions, so this function is introduced to wrap those actions. Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-30 14:01:37 -07:00
Jake Oshins	e16dad6bfe	drivers:hv: Lock access to hyperv_mmio resource tree In existing code, this tree of resources is created in single-threaded code and never modified after it is created, and thus needs no locking. This patch introduces a semaphore for tree access, as other patches in this series introduce run-time modifications of this resource tree which can happen on multiple threads. Signed-off-by: Jake Oshins <jakeo@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-30 14:01:37 -07:00
K. Y. Srinivasan	ab028db41c	Drivers: hv: vmbus: Implement APIs to support "in place" consumption of vmbus packets Implement APIs for in-place consumption of vmbus packets. Currently, each packet is copied and processed one at a time and as part of processing each packet we potentially may signal the host (if it is waiting for room to produce a packet). These APIs help batched in-place processing of vmbus packets. We also optimize host signaling by having a separate API to signal the end of in-place consumption. With netvsc using these APIs, on an iperf run on average I see about 20X reduction in checks to signal the host. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-30 14:00:19 -07:00
K. Y. Srinivasan	687f32e6d9	Drivers: hv: vmbus: Move some ring buffer functions to hyperv.h In preparation for implementing APIs for in-place consumption of VMBUS packets, move some ring buffer functionality into hyperv.h Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-30 14:00:19 -07:00
K. Y. Srinivasan	5cc472477f	Drivers: hv: vmbus: Export the vmbus_set_event() API In preparation for moving some ring buffer functionality out of the vmbus driver, export the API for signaling the host. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-30 14:00:19 -07:00
K. Y. Srinivasan	dcd0eeca44	Drivers: hv: vmbus: Use the new virt_xx barrier code Use the virt_xx barriers that have been defined for use in virtual machines. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-30 14:00:19 -07:00
K. Y. Srinivasan	d45faaeedb	Drivers: hv: vmbus: Use READ_ONCE() to read variables that are volatile Use the READ_ONCE macro to access variables that can change asynchronously. This is the recommended mechanism for dealing with "unsafe" compiler optimizations. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-30 14:00:19 -07:00
K. Y. Srinivasan	a6341f0000	Drivers: hv: vmbus: Introduce functions for estimating room in the ring buffer Introduce separate functions for estimating how much can be read from and written to the ring buffer. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-30 14:00:19 -07:00
K. Y. Srinivasan	a389fcfd2c	Drivers: hv: vmbus: Fix signaling logic in hv_need_to_signal_on_read() On the consumer side, we have interrupt driven flow management of the producer. It is sufficient to base the signaling decision on the amount of space that is available to write after the read is complete. The current code samples the previous available space and uses this in making the signaling decision. This state can be stale and is unnecessary. Since the state can be stale, we end up not signaling the host (when we should) and this can result in a hang. Fix this problem by removing the unnecessary check. I would like to thank Arseney Romanenko <arseneyr@microsoft.com> for pointing out this issue. Also, issue a full memory barrier before making the signaling decision to correctly deal with potential reordering of the write (read index) followed by the read of pending_sz. Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Tested-by: Dexuan Cui <decui@microsoft.com> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-04-30 14:00:16 -07:00
Linus Torvalds	8eee93e257	Char/Misc patches for 4.6-rc1 Here is the big char/misc driver update for 4.6-rc1. The majority of the patches here is hwtracing and some new mic drivers, but there's a lot of other driver updates as well. Full details in the shortlog. All have been in linux-next for a while with no reported issues. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iEYEABECAAYFAlbp9IcACgkQMUfUDdst+ykyJgCeLTC2QNGrh51kiJglkVJ0yD36 q4MAn0NkvSX2+iv5Jq8MaX6UQoRa4Nun =MNjR -----END PGP SIGNATURE----- Merge tag 'char-misc-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc updates from Greg KH: "Here is the big char/misc driver update for 4.6-rc1. The majority of the patches here is hwtracing and some new mic drivers, but there's a lot of other driver updates as well. Full details in the shortlog. All have been in linux-next for a while with no reported issues" * tag 'char-misc-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (238 commits) goldfish: Fix build error of missing ioremap on UM nvmem: mediatek: Fix later provider initialization nvmem: imx-ocotp: Fix return value of imx_ocotp_read nvmem: Fix dependencies for !HAS_IOMEM archs char: genrtc: replace blacklist with whitelist drivers/hwtracing: make coresight-etm-perf.c explicitly non-modular drivers: char: mem: fix IS_ERROR_VALUE usage char: xillybus: Fix internal data structure initialization pch_phub: return -ENODATA if ROM can't be mapped Drivers: hv: vmbus: Support kexec on ws2012 r2 and above Drivers: hv: vmbus: Support handling messages on multiple CPUs Drivers: hv: utils: Remove util transport handler from list if registration fails Drivers: hv: util: Pass the channel information during the init call Drivers: hv: vmbus: avoid unneeded compiler optimizations in vmbus_wait_for_unload() Drivers: hv: vmbus: remove code duplication in message handling Drivers: hv: vmbus: avoid wait_for_completion() on crash Drivers: hv: vmbus: don't loose HVMSG_TIMER_EXPIRED messages misc: at24: replace memory_accessor with nvmem_device_read eeprom: 93xx46: extend driver to plug into the NVMEM framework eeprom: at25: extend driver to plug into the NVMEM framework ...	2016-03-17 13:47:50 -07:00
Alex Ng	7268644734	Drivers: hv: vmbus: Support kexec on ws2012 r2 and above WS2012 R2 and above hosts can support kexec in that they can support reconnecting to the host (as would be needed in the kexec path) on any CPU. Enable this. Pre ws2012 r2 hosts don't have this ability and consequently cannot support kexec. Signed-off-by: Alex Ng <alexng@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2016-03-01 16:57:20 -08:00


410 Commits