commit 36c5bdc438 upstream.
------------------------------------------------------------------------
That flag is set unconditionally in sd_init(), and no one checks for it
anymore. Remove it.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200415210512.805-5-valentin.schneider@arm.com
Signed-off-by: Xue Sinian <tangyuan911@yeah.net>
commit e669ac8ab9 upstream.
------------------------------------------------------------------------
The SD_LOAD_BALANCE flag is set unconditionally for all domains in
sd_init(). By making the sched_domain->flags syctl interface read-only, we
have removed the last piece of code that could clear that flag - as such,
it will now be always present. Rather than to keep carrying it along, we
can work towards getting rid of it entirely.
cpusets don't need it because they can make CPUs be attached to the NULL
domain (e.g. cpuset with sched_load_balance=0), or to a partitioned
root_domain, i.e. a sched_domain hierarchy that doesn't span the entire
system (e.g. root cpuset with sched_load_balance=0 and sibling cpusets with
sched_load_balance=1).
isolcpus apply the same "trick": isolated CPUs are explicitly taken out of
the sched_domain rebuild (using housekeeping_cpumask()), so they get the
NULL domain treatment as well.
Remove the checks against SD_LOAD_BALANCE.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200415210512.805-4-valentin.schneider@arm.com
Signed-off-by: Xue Sinian <tangyuan911@yeah.net>
commit 0252aa08aafb4a40ea2d821f58e88e99a644b097 openeuler.
When I compile the kernel with CONFIG_PPC_WATCHDOG is disabled on
PowerPC, I got the following compile error:
In file included from kernel/hung_task.c:11:0:
./include/linux/nmi.h: In function ‘touch_nmi_watchdog’:
./include/linux/nmi.h:143:2: error: implicit declaration of function ‘arch_touch_nmi_watchdog’; did you mean ‘touch_nmi_watchdog’? [-Werror=implicit-function-declaration]
arch_touch_nmi_watchdog();
^~~~~~~~~~~~~~~~~~~~~~~
touch_nmi_watchdog
It is because CONFIG_HARDLOCKUP_DETECTOR_PERF is still enabled in my
situation. Fix it by excluding arch_touch_nmi_watchdog() only when
CONFIG_PPC_WATCHDOG is disabled.
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit 0fa83fd0f8f7267be1e31c824cedb9d112504785 openeuler.
Firmware may not trigger SDEI event as required frequency. SDEI event
may be triggered too soon, which cause false hardlockup in kernel. Check
the time stamp in sdei_watchdog_callbak and skip the hardlockup check if
it is invoked too soon.
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit bdda54cc39843589ee91a0176ca9a94adf307763 openeuler.
Functions called in sdei_handler are not allowed to be kprobed, so
marked them as NOKPROBE_SYMBOL. There are so many functions in
'watchdog_check_timestamp()'. Luckily, we don't need
'CONFIG_HARDLOCKUP_CHECK_TIMESTAMP' now. So just make
CONFIG_SDEI_WATCHDOG depends on !CONFIG_HARDLOCKUP_CHECK_TIMESTAMP
in case someone add 'CONFIG_HARDLOCKUP_CHECK_TIMESTAMP' in the future.
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit 13ddc12768ca98d36ec03bfa21a30b3ebc91673d openeuler.
The period of the secure timer is set to 3s by BIOS. That means the
secure timer interrupt will trigger every 3 seconds. To further decrease
the NMI watchdog's effect on performance, this patch set the period of
the secure timer base on 'watchdog_thresh'. This variable is initiallized
to 10s. We can also set the period at runtime by modifying
'/proc/sys/kernel/watchdog_thresh'
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit 75ac7be96da43f12bad247de69137500e02fd37f openeuler.
When we panic in hardlockup, the secure timer interrupt remains activate
because firmware clear eoi after dispatch is completed. This will cause
arm_arch_timer interrupt failed to trigger in the second kernel.
This patch add a new SMC helper to clear eoi of a certain interrupt and
clear eoi of the secure timer before booting the second kernel.
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit 5bc048a102ef9c3748464cacce443a0f1d9bed5b openeuler.
The trigger period of secure time is set by firmware. We need to check
the time_stamp every time the secure time fires to make sure the
hardlockup detection is not executed too soon. We need to refresh
'last_timestamp' to the current time when we enable the nmi_watchdog.
Otherwise, false hardlockup may be detected when the secure timer fires
the first time.
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit cc19c0b385e3bd423e20465b06eb232678ce5c16 openeuler.
Add nmi_watchdog support for arm64 based on SDEI.
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit bef7d8e1432400f3d78339ac269167e09c15dabd openeuler.
We call 'sdei_init' as 'subsys_initcall_sync'. lockup detector need to
be initialised after sdei_init. The influence of this patch is that we
can not detect the hard lockup in init_calls.
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit cfaccce945988392d70ad42924e76f330c25ab9a openeuler.
NMI Watchdog need to enable the event for each core individually. But the
existing public api 'sdei_event_enable' enable events for all cores when
the event type is private.
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit 860744b94a10a159562fc491fd7f3ea1388965c1 openeuler.
This patch add a interrupt binding api function which returns the binded
event number.
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit 4ffed7d5435d12be6762e6fdef92fd2c67fc27df openeuler.
In current code, the hardlockup detect code is contained by
CONFIG_HARDLOCKUP_DETECTOR_PERF. This patch makes this code public so
that other arch hardlockup detector can use it.
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit d8f6267f7c upstream.
Add required PMU interrupt operations for NMIs. Request interrupt lines as
NMIs when possible, otherwise fall back to normal interrupts.
NMIs are only supported on the arm64 architecture with a GICv3 irqchip.
[Alexandru E.: Added that NMIs only work on arm64 + GICv3, print message
when PMU is using NMIs]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200924110706.254996-8-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit f76b130bdb upstream.
Currently the PMU interrupt can either be a normal irq or a percpu irq.
Supporting NMI will introduce two cases for each existing one. It becomes
a mess of 'if's when managing the interrupt.
Define sets of callbacks for operations commonly done on the interrupt. The
appropriate set of callbacks is selected at interrupt request time and
simplifies interrupt enabling/disabling and freeing.
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20200924110706.254996-7-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit 95e92e45a4 upstream.
kvm_vcpu_kick() is not NMI safe. When the overflow handler is called from
NMI context, defer waking the vcpu to an irq_work queue.
A vcpu can be freed while it's not running by kvm_destroy_vm(). Prevent
running the irq_work for a non-existent vcpu by calling irq_work_sync() on
the PMU destroy path.
[Alexandru E.: Added irq_work_sync()]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Suzuki K Pouloze <suzuki.poulose@arm.com>
Cc: kvm@vger.kernel.org
Cc: kvmarm@lists.cs.columbia.edu
Link: https://lore.kernel.org/r/20200924110706.254996-6-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit 05ab728133 upstream.
When handling events, armv8pmu_handle_irq() calls perf_event_overflow(),
and subsequently calls irq_work_run() to handle any work queued by
perf_event_overflow(). As perf_event_overflow() raises IPI_IRQ_WORK when
queuing the work, this isn't strictly necessary and the work could be
handled as part of the IPI_IRQ_WORK handler.
In the common case the IPI handler will run immediately after the PMU IRQ
handler, and where the PE is heavily loaded with interrupts other handlers
may run first, widening the window where some counters are disabled.
In practice this window is unlikely to be a significant issue, and removing
the call to irq_work_run() would make the PMU IRQ handler NMI safe in
addition to making it simpler, so let's do that.
[Alexandru E.: Reworded commit message]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200924110706.254996-5-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit 2a0e2a02e4 upstream.
The PMU is disabled and enabled, and the counters are programmed from
contexts where interrupts or preemption is disabled.
The functions to toggle the PMU and to program the PMU counters access the
registers directly and don't access data modified by the interrupt handler.
That, and the fact that they're always called from non-preemptible
contexts, means that we don't need to disable interrupts or use a spinlock.
[Alexandru E.: Explained why locking is not needed, removed WARN_ONs]
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200924110706.254996-4-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit 0fdf1bb759 upstream.
Currently we access the counter registers and their respective type
registers indirectly. This requires us to write to PMSELR, issue an ISB,
then access the relevant PMXEV* registers.
This is unfortunate, because:
* Under virtualization, accessing one register requires two traps to
the hypervisor, even though we could access the register directly with
a single trap.
* We have to issue an ISB which we could otherwise avoid the cost of.
* When we use NMIs, the NMI handler will have to save/restore the select
register in case the code it preempted was attempting to access a
counter or its type register.
We can avoid these issues by directly accessing the relevant registers.
This patch adds helpers to do so.
In armv8pmu_enable_event() we still need the ISB to prevent the PE from
reordering the write to PMINTENSET_EL1 register. If the interrupt is
enabled before we disable the counter and the new event is configured,
we might get an interrupt triggered by the previously programmed event
overflowing, but which we wrongly attribute to the event that we are
enabling. Execute an ISB after we disable the counter.
In the process, remove the comment that refers to the ARMv7 PMU.
[Julien T.: Don't inline read/write functions to avoid big code-size
increase, remove unused read_pmevtypern function,
fix counter index issue.]
[Alexandru E.: Removed comment, removed trailing semicolons in macros,
added ISB]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200924110706.254996-3-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit 490d7b7c08 upstream.
Writes to the PMXEVTYPER_EL0 register are not self-synchronising. In
armv8pmu_enable_event(), the PE can reorder configuring the event type
after we have enabled the counter and the interrupt. This can lead to an
interrupt being asserted because of the previous event type that we were
counting using the same counter, not the one that we've just configured.
The same rationale applies to writes to the PMINTENSET_EL1 register. The PE
can reorder enabling the interrupt at any point in the future after we have
enabled the event.
Prevent both situations from happening by adding an ISB just before we
enable the event counter.
Fixes: 030896885a ("arm64: Performance counters support")
Reported-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
Tested-by: Sumit Garg <sumit.garg@linaro.org> (Developerbox)
Cc: Julien Thierry <julien.thierry.kdev@gmail.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Link: https://lore.kernel.org/r/20200924110706.254996-2-alexandru.elisei@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: huwentao <huwentao19@h-partners.com>
commit 7c73ddb7589fb8ddb1136b6306dfb72089c81511 upstream.
jbd2_transaction_committed() is used to check whether a transaction with
the given tid has already committed, it holds j_state_lock in read mode
and check the tid of current running transaction and committing
transaction, but holding the j_state_lock is expensive.
We have already stored the sequence number of the most recently
committed transaction in journal t->j_commit_sequence, we could do this
check by comparing it with the given tid instead. If the given tid isn't
smaller than j_commit_sequence, we can ensure that the given transaction
has been committed. That way we could drop the expensive lock and
achieve about 10% ~ 20% performance gains in concurrent DIOs on may
virtual machine with 100G ramdisk.
fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \
-bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \
-group_reporting
Before:
overwrite IOPS=88.2k, BW=344MiB/s
read IOPS=95.7k, BW=374MiB/s
rand overwrite IOPS=98.7k, BW=386MiB/s
randread IOPS=102k, BW=397MiB/s
After:
overwrite IOPS=105k, BW=410MiB/s
read IOPS=112k, BW=436MiB/s
rand overwrite IOPS=104k, BW=404MiB/s
randread IOPS=111k, BW=432MiB/s
CC: Dave Chinner <david@fromorbit.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Link: https://lore.kernel.org/linux-ext4/ZjILCPNZRHeazSqV@dread.disaster.area/
Signed-off-by: huwentao <huwentao19@h-partners.com>
\#MC is a NMI-like exception. But do not do any setup that NMIs need.
This will lead to console_owner_lock issue and HPET dead loop issue.
For example, The HPET dead loop issue:
CPU x CPU x
---- ----
read_hpet()
arch_spin_trylock(&hpet.lock)
[CPU x got the hpet.lock] #MCE happened
do_machine_check()
mce_panic()
panic()
kmsg_dump()
pstore_dump()
pstore_record_init()
ktime_get_real_fast_ns()
read_hpet()
[dead loops]
This may lead to read_hpet dead loops.
The console_owner_lock issue is similar.
To avoid these issues, add NMIs setup When Handling #MC Exceptions.
Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>
OverCurrent condition is not standardized in the UHCI spec.
Zhaoxin UHCI controllers report OverCurrent bit active off.
In order to handle OverCurrent condition correctly, the uhci-hcd
driver needs to be told to expect the active-off behavior.
Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Cc: stable@vger.kernel.org
Signed-off-by: Weitao Wang <WeitaoWang-oc@zhaoxin.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Link: https://lore.kernel.org/r/20230423105952.4526-1-WeitaoWang-oc@zhaoxin.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>
zhaoxin inclusion
category: feature
-------------------
Add the new PCI ID 0x1d17 0x9141/0x9142/0x9144 Zhaoxin NB HDAC
support. And add some special initialization for Zhaoxin NB HDAC.
Signed-off-by: LeoLiu-oc <leoliu-oc@zhaoxin.com>
When CONFIG="generic-release", %{rpm_name} is kernel, when
CONFIG="generic-debug", %{rpm_name} is kernel-debug.
Provides: kernel-debug-core in kernel-debug-core rpm
Provides: kernel-debug-modules in kernel-debug-modules rpm
Provides: kernel-debug-devel in kernel-debug-devel rpm
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
Reviewed-by: Yongliang Gao <leonylgao@tencent.com>
We intend to archive kernle-debug rpm in yum. Release kernel will
build perf/tools/bpf-tools rpm, to avoid kernle-debug build the same
rpm, disable them.
Signed-off-by: Jianping Liu <frankjpliu@tencent.com>
zhaoxin inclusion
category: feature
version: v3.0.6
-------------------
Detect the extended topology information of Zhaoxin CPUs if available.
Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
mainline inclusion
from mainline-v6.4-rc5
commit <d5e234ff08a45a7a08a52173ed793b3c125ab88d>
-------------------
Add U1/U2 feature support of xHCI for ZHAOXIN.
Since both INTEL and ZHAOXIN need to check the tier where the device is
located to determine whether to enabled U1/U2, remove the previous INTEL
U1/U2 tier policy and add common policy in xhci_check_tier_policy.
If vendor has specific U1/U2 enable policy,quirks can be add to declare.
Suggested-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Weitao Wang <WeitaoWang-oc@zhaoxin.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Message-ID: <20230602144009.1225632-12-mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>
mainline inclusion
from mainline-v6.4-rc5
commit <d9b0328d0b8b8298dfdc97cd8e0e2371d4bcc97b>
-------------------
Some ZHAOXIN xHCI controllers follow usb3.1 spec, but only support
gen1 speed 5Gbps. While in Linux kernel, if xHCI suspport usb3.1,
root hub speed will show on 10Gbps.
To fix this issue of ZHAOXIN xHCI platforms, read usb speed ID
supported by xHCI to determine root hub speed. And add a quirk
XHCI_ZHAOXIN_HOST for this issue.
[fix warning about uninitialized symbol -Mathias]
Suggested-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Weitao Wang <WeitaoWang-oc@zhaoxin.com>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Message-ID: <20230602144009.1225632-11-mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: leoliu-oc <leoliu-oc@zhaoxin.com>