[6.17]NVIDIA: SAUCE: r8127: fix NAPI warning on module removal #321

Open
nirmoy wants to merge 318 commits into NVIDIA:24.04_linux-nvidia-6.17-next from nirmoy:r8127_napi

@nirmoy nirmoy commented Feb 13, 2026

When the r8127 module is unloaded, __netif_napi_del_locked() can trigger a WARN because NAPI is removed while still enabled. unregister_netdev() calls ndo_stop, which disables NAPI; deleting NAPI before that runs violates the netdev/NAPI teardown order.

Move rtl8127_del_napi() to after unregister_netdev() so NAPI is disabled in ndo_stop before it is removed.
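
The ordering problem can be modelled in userspace. The following is a minimal sketch, not the driver code: every name in it (`model_dev`, `model_unregister()`, `model_del_napi()`) is hypothetical, and the only behaviour it assumes is what the description above states, namely that unregistering runs the stop path (which disables NAPI) and that deleting NAPI while it is still enabled triggers a warning.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are the driver's API. */
struct model_dev {
    bool registered;    /* netdev is still registered */
    bool napi_enabled;  /* NAPI polling is enabled */
    int warnings;       /* times NAPI was deleted while still enabled */
};

/* Models ndo_stop: disables NAPI. */
static void model_stop(struct model_dev *d)
{
    d->napi_enabled = false;
}

/* Models unregister_netdev(): runs the stop path before unregistering. */
static void model_unregister(struct model_dev *d)
{
    if (d->napi_enabled)
        model_stop(d);
    d->registered = false;
}

/* Models NAPI deletion: counts a warning if NAPI is still enabled. */
static void model_del_napi(struct model_dev *d)
{
    if (d->napi_enabled)
        d->warnings++;
    d->napi_enabled = false;
}

/* Old order: delete NAPI first, then unregister. */
static int remove_old_order(void)
{
    struct model_dev d = { .registered = true, .napi_enabled = true };

    model_del_napi(&d);
    model_unregister(&d);
    return d.warnings;
}

/* New order: unregister first, then delete NAPI. */
static int remove_new_order(void)
{
    struct model_dev d = { .registered = true, .napi_enabled = true };

    model_unregister(&d);
    model_del_napi(&d);
    return d.warnings;
}
```

In this model the old order counts one warning and the new order counts none, which is the effect the patch is after.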

Aligns with the upstream r8169 fix in commit 12b1bc7 ("r8169: improve rtl_remove_one").

https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.17/+bug/2141780

James Morse and others added 30 commits January 15, 2026 16:33
…lper

BugLink: https://bugs.launchpad.net/bugs/2122432

The PPTT has three functions that loop over the table looking at each
entry.
This adds a fair amount of visual distraction which isn't relevant
to what each of these functions does.
Add a for_each_acpi_pptt_entry() helper to do this work making the
users easier on the eye.
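
As an illustration of the general pattern (not the helper from this commit), a for-each macro over variable-length table entries might look like the sketch below. The entry layout, `entry_at()`, `for_each_entry()` and `count_entries_of_type()` are all hypothetical; the real PPTT structures and the signature of for_each_acpi_pptt_entry() are not shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry header: each entry starts with its type and length. */
struct entry_hdr {
    uint8_t type;
    uint8_t length; /* bytes, including this header */
};

/* Return the entry at 'off', or NULL if no complete entry fits there. */
static const struct entry_hdr *entry_at(const uint8_t *tbl, size_t size,
                                        size_t off)
{
    const struct entry_hdr *e;

    if (off > size || size - off < sizeof(*e))
        return NULL;
    e = (const struct entry_hdr *)(tbl + off);
    if (e->length < sizeof(*e) || e->length > size - off)
        return NULL;
    return e;
}

/* Walk every well-formed entry; the bounds checks live in entry_at(). */
#define for_each_entry(tbl, size, off, e)                      \
    for ((off) = 0, (e) = entry_at((tbl), (size), (off));      \
         (e) != NULL;                                          \
         (off) += (e)->length, (e) = entry_at((tbl), (size), (off)))

/* Example user: the loop body only contains what this function cares about. */
static int count_entries_of_type(const uint8_t *tbl, size_t size, uint8_t type)
{
    const struct entry_hdr *e;
    size_t off;
    int n = 0;

    for_each_entry(tbl, size, off, e)
        if (e->type == type)
            n++;
    return n;
}
```

Keeping the length validation inside one helper means each user of the macro cannot forget it, and a truncated entry simply ends the walk.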

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit f549ad6ffcb4b39411f0fd7674de365fe7a0d8f8 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

The bulk of the MPAM driver lives outside the arch code because it
largely manages MMIO devices that generate interrupts. The driver
needs a Kconfig symbol to enable it. As MPAM is only found on arm64
platforms, the arm64 tree is the most natural home for the Kconfig
option.
This Kconfig option will later be used by the arch code to enable
or disable the MPAM context-switch code, and to register properties
of CPUs with the MPAM driver.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
CC: Dave Martin <dave.martin@arm.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 1a0142e7fa9f7ca1e4209b03b00d15331b402a5d https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

Add code to parse the arm64 specific MPAM table, looking up the cache
level from the PPTT and feeding the end result into the MPAM driver.
This happens in two stages. Platform devices are created first for the
MSC devices. Once the driver probes it calls acpi_mpam_parse_resources()
to discover the RIS entries the MSC contains.
For now the MPAM hook mpam_ris_create() is stubbed out, but will update
the MPAM driver with optional discovered data about the RIS entries.
CC: Carl Worth <carl@os.amperecomputing.com>
Link: https://developer.arm.com/documentation/den0065/3-0bet/?lang=en
Reviewed-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 22f9e3b01379d6b820853d650a197ab76868172d https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

The binding is designed around the assumption that an MSC will be a
sub-block of something else such as a memory controller, cache controller,
or IOMMU. However, it's certainly possible a design does not have that
association or has a mixture of both, so the binding illustrates how we can
support that with RIS child nodes.
A key part of MPAM is we need to know about all of the MSCs in the system
before it can be enabled. This drives the need for the genericish
'arm,mpam-msc' compatible. Though we can't assume an MSC is accessible
until a h/w specific driver potentially enables the h/w.
Cc: James Morse <james.morse@arm.com>

Signed-off-by: Rob Herring <robh@kernel.org>
Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit b38bed339681b3d90ff7508f8a585127bd721d90 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
…ild boiler plate

BugLink: https://bugs.launchpad.net/bugs/2122432

Probing MPAM is convoluted. MSCs that are integrated with a CPU may
only be accessible from those CPUs, and they may not be online.
Touching the hardware early is pointless as MPAM can't be used until
the system-wide common values for num_partid and num_pmg have been
discovered.
Start with driver probe/remove and mapping the MSC.
CC: Carl Worth <carl@os.amperecomputing.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit ac46acae13756a05118b806ef6061f47e51d01c5 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

Parse resources from either MPAM ACPI table or device tree. The parsed
resources are stored in ris[] per msc.

The author is James. He didn't add Signed-off-by.

(backported from commit a6ab8b6c77cbc78a57015363abad5a68b2c7f18b https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
[fenghuay: Change subject and add commit message.]
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
… DT platforms

BugLink: https://bugs.launchpad.net/bugs/2122432

The device-tree binding has two examples for MSC associated with
memory controllers. Add the support to discover the component_id
from the device-tree and create 'memory' RIS.
[ morse: split out of a bigger patch, added affinity piece ]

Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit c1be40782ace54798333d5148ab8e82fc002fca5 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
…irmware described ris

BugLink: https://bugs.launchpad.net/bugs/2122432

An MSC is a container of resources, each identified by their RIS index.
Some RIS are described by firmware to provide their position in the system.
Others are discovered when the driver probes the hardware.
To configure a resource it needs to be found by its class, e.g. 'L2'.
There are two kinds of grouping: a class is a set of components, which
are visible to user-space as there are likely to be multiple instances
of the L2 cache (e.g. one per cluster or package).
Add support for creating and destroying structures to allow a hierarchy
of resources to be created.
CC: Ben Horgan <ben.horgan@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 30ed801f46cb339d91a4524503247c3fb36bc627 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

Memory Partitioning and Monitoring (MPAM) has memory mapped devices
(MSCs) with an identity/configuration page.
Add the definitions for these registers as offset within the page(s).
Link: https://developer.arm.com/documentation/ihi0099/latest/
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 5da0d5b259df91842a3df81f08e658a942913f78 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

Because an MSC can only be accessed from the CPUs in its cpu-affinity
set we need to be running on one of those CPUs to probe the MSC
hardware.
Do this work in the cpuhp callback. Probing the hardware will only
happen before MPAM is enabled: as each CPU's online call is made, walk
all the MSCs and probe those we can reach that haven't already been probed.
This adds the low-level MSC register accessors.
Once all MSCs reported by the firmware have been probed from a CPU in
their respective cpu-affinity set, the probe-time cpuhp callbacks are
replaced.  The replacement callbacks will ultimately need to handle
save/restore of the runtime MSC state across power transitions, but for
now there is nothing to do in them: so do nothing.
The architecture's context switch code will be enabled by a static-key,
this can be set by mpam_enable(), but must be done from process context,
not a cpuhp callback because both take the cpuhp lock.
Whenever a new MSC has been probed, the mpam_enable() work is scheduled
to test if all the MSCs have been probed. If probing fails, mpam_disable()
is scheduled to unregister the cpuhp callbacks and free memory.
CC: Lecopzer Chen <lecopzerc@nvidia.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Reviewed-by: Gavin Shan <gshan@redhat.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 8164f1cf4f4c50e2fe39e9091066176e6bcdf7f2 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
…pmg values

BugLink: https://bugs.launchpad.net/bugs/2122432

CPUs can generate traffic with a range of PARTID and PMG values,
but each MSC may also have its own maximum size for these fields.
Before MPAM can be used, the driver needs to probe each RIS on
each MSC, to find the system-wide smallest value that can be used.
The limits from requestors (e.g. CPUs) also need taking into account.
While doing this, RIS entries that firmware didn't describe are created
under MPAM_CLASS_UNKNOWN.
While we're here, implement the mpam_register_requestor() call
for the arch code to register the CPU limits. Future callers of this
will tell us about the SMMU and ITS.
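
The "system-wide smallest value" step amounts to taking a minimum over every RIS and every requestor. A minimal sketch, assuming a hypothetical `struct id_limits` and helper names that are not taken from the driver:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the largest PARTID and PMG one RIS or requestor supports. */
struct id_limits {
    uint16_t partid_max;
    uint8_t pmg_max;
};

/* Narrow 'sys' so it never exceeds what 'one' can handle. */
static void limits_narrow(struct id_limits *sys, const struct id_limits *one)
{
    if (one->partid_max < sys->partid_max)
        sys->partid_max = one->partid_max;
    if (one->pmg_max < sys->pmg_max)
        sys->pmg_max = one->pmg_max;
}

/* Start from the widest possible values and narrow by each participant. */
static struct id_limits limits_system_wide(const struct id_limits *v, int n)
{
    struct id_limits sys = { UINT16_MAX, UINT8_MAX };

    for (int i = 0; i < n; i++)
        limits_narrow(&sys, &v[i]);
    return sys;
}
```

The same narrowing function serves both probed RIS entries and registered requestors, which is why the limits can only be treated as final once every participant has been seen.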
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 517cf73f94c7101c07315c7086384b30e6889365 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
…the mon_sel registers

BugLink: https://bugs.launchpad.net/bugs/2122432

The MSC MON_SEL register needs to be accessed from hardirq for the overflow
interrupt, and when taking an IPI to access these registers on platforms
where MSC are not accessible from every CPU. This makes an irqsave
spinlock the obvious lock to protect these registers. On systems with SCMI
or PCC mailboxes it must be able to sleep, meaning a mutex must be used.
The SCMI or PCC platforms can't support an overflow interrupt, and
can't access the registers from hardirq context.
Clearly these two can't exist for one MSC at the same time.
Add helpers for the MON_SEL locking. For now, use an irqsave spinlock and
only support 'real' MMIO platforms.
In the future this lock will be split in two allowing SCMI/PCC platforms
to take a mutex. Because there are contexts where the SCMI/PCC platforms
can't make an access, mpam_mon_sel_lock() needs to be able to fail. Do
this now, so that all the error handling on these paths is present. This
allows the relevant paths to fail if they are needed on a platform where
this isn't possible, instead of having to make explicit checks of the
interface type.
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 65f75f143d41bf1d00ec2064ae63b0760a41c217 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

Expand the probing support with the control and monitor types
we can use with resctrl.
CC: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit f5a241b9b8cd085b492914b1ed0f9a71b55e9406 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
…) into mpam_class

BugLink: https://bugs.launchpad.net/bugs/2122432

To make a decision about whether to expose an mpam class as
a resctrl resource we need to know its overall supported
features and properties.
Once we've probed all the resources, we can walk the tree
and produce overall values by merging the bitmaps. This
eliminates features that are only supported by some MSC
that make up a component or class.
If bitmap properties are mismatched within a component we
cannot support the mismatched feature.
Care has to be taken as vMSC may hold mismatched RIS.
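
The merge described above can be sketched as a bitwise AND of feature bitmaps plus a check on the associated property. This is only an illustration: `struct msc_props`, the `FEAT_*` bits and `props_merge()` are hypothetical names, and the vMSC special case is not modelled.

```c
#include <assert.h>
#include <stdint.h>

#define FEAT_CACHE_PORTION (1u << 0) /* hypothetical feature bits */
#define FEAT_BANDWIDTH     (1u << 1)

struct msc_props {
    uint32_t features;     /* bitmap of supported features */
    uint16_t portion_bits; /* width of the cache-portion bitmap, if any */
};

/* Combine two sets of properties into what both can support. */
static struct msc_props props_merge(struct msc_props a, struct msc_props b)
{
    struct msc_props out;

    /* A feature survives only if both sides have it. */
    out.features = a.features & b.features;
    out.portion_bits = a.portion_bits;

    /* A shared feature with mismatched properties is dropped as well. */
    if ((out.features & FEAT_CACHE_PORTION) &&
        a.portion_bits != b.portion_bits)
        out.features &= ~FEAT_CACHE_PORTION;

    if (!(out.features & FEAT_CACHE_PORTION))
        out.portion_bits = 0;
    return out;
}
```

Folding this over every RIS in a component, then every component in a class, leaves only the features that the whole class can offer.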
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit e06f0b201b617255c7e02f3673380b78a7096bff https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

When a CPU comes online, it may bring a newly accessible MSC with
it. Only the default partid has its value reset by hardware, and
even then the MSC might not have been reset since its config was
previously dirtied, e.g. by kexec.
Any in-use partid must have its configuration restored, or reset.
In-use partids may be held in caches and evicted later.
MSC are also reset when CPUs are taken offline to cover cases where
firmware doesn't reset the MSC over reboot using UEFI, or kexec
where there is no firmware involvement.
If the configuration for a RIS has not been touched since it was
brought online, it does not need resetting again.
To reset, write the maximum values for all discovered controls.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 8ed3d7ab1b69b1501e6445688171d0315b6a7919 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

Resetting RIS entries from the cpuhp callback is easy as the
callback occurs on the correct CPU. This won't be true for any other
caller that wants to reset or configure an MSC.
Add a helper that schedules the provided function if necessary.
Callers should take the cpuhp lock to prevent the cpuhp callbacks from
changing the MSC state.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 2ac4287d617339c32c82e012b7548811994f802f https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
…set any time

BugLink: https://bugs.launchpad.net/bugs/2122432

cpuhp callbacks aren't the only time the MSC configuration may need to
be reset. Resctrl has an API call to reset a class.
If an MPAM error interrupt arrives it indicates the driver has
misprogrammed an MSC. The safest thing to do is reset all the MSCs
and disable MPAM.
Add a helper to reset RIS via their class. Call this from mpam_disable(),
which can be scheduled from the error interrupt handler.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 888b77167d8bd718e2733865285c3b53a0d4af56 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

Register and enable error IRQs. All the MPAM error interrupts indicate a
software bug, e.g. out of range partid. If the error interrupt is ever
signalled, attempt to disable MPAM.
Only the irq handler accesses the MPAMF_ESR register, so no locking is
needed. The work to disable MPAM after an error needs to happen at process
context as it takes a mutex. It also unregisters the interrupts, meaning
it can't be done from the threaded part of a threaded interrupt.
Instead, mpam_disable() gets scheduled.
Enabling the IRQs in the MSC may involve cross calling to a CPU that
can access the MSC.
Once the IRQ is requested, the mpam_disable() path can be called
asynchronously, which will walk structures sized by max_partid. Ensure
this size is fixed before the interrupt is requested.
CC: Rohit Mathew <rohit.mathew@arm.com>
Tested-by: Rohit Mathew <rohit.mathew@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 9cc38597ebd5423a1e85c773aca85c099a4c47e0 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
…abled

BugLink: https://bugs.launchpad.net/bugs/2122432

Once all the MSC have been probed, the system wide usable number of
PARTID is known and the configuration arrays can be allocated.
After this point, checking all the MSC have been probed is pointless,
and the cpuhp callbacks should restore the configuration, instead of
just resetting the MSC.
Add a static key to enable this behaviour. This will also allow MPAM
to be disabled in response to an error, and the architecture code to
enable/disable the context switch of the MPAM system registers.
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 34e36d87fe274396f66f0b73f9846e2036e1a4d8 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
…ed during cpu online

BugLink: https://bugs.launchpad.net/bugs/2122432

When CPUs come online the MSC's original configuration should be restored.
Add struct mpam_config to hold the configuration. This has a bitmap of
features that were modified. Once the maximum partid is known, allocate
a configuration array for each component, and reprogram each RIS
configuration from this.
CC: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 6af806f712f02b54c3e617686dc288f280ffc61b https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

MPAM supports more features than are going to be exposed to resctrl.
For partid other than 0, the reset values of these controls aren't
known.
Discover the rest of the features so they can be reset to avoid any
side effects when resctrl is in use.
PARTID narrowing allows MSC/RIS to support less configuration space than
is usable. If this feature is found on a class of device we are likely
to use, then reduce the partid_max to make it usable. This allows us
to map a PARTID to itself.
CC: Rohit Mathew <Rohit.Mathew@arm.com>
CC: Zeng Heng <zengheng4@huawei.com>
CC: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 342bfa69997131bebce86f273bc0b12014ffc519 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

MPAM's MSC support a number of monitors, each of which supports
bandwidth counters, or cache-storage-utilisation counters. To use
a counter, a monitor needs to be configured. Add helpers to allocate
and free CSU or MBWU monitors.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 7d8120abd471b76d6c6523b7f9807381e8df18d7 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

Reading a monitor involves configuring what you want to monitor, and
reading the value. Components made up of multiple MSC may need values
from each MSC. MSCs may take time to configure, returning 'not ready'.
The maximum 'not ready' time should have been provided by firmware.
Add mpam_msmon_read() to hide all this. If (one of) the MSC returns
not ready, then wait the full timeout value before trying again.
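
A rough sketch of that read-and-retry flow follows. It assumes a hypothetical per-MSC read callback that returns -EBUSY for 'not ready' and a hypothetical wait callback standing in for the firmware-provided timeout; it is not the signature of mpam_msmon_read(), and summing the per-MSC values is an assumption made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical callbacks: read one MSC's monitor, and wait the full timeout. */
typedef int (*msc_read_fn)(void *ctx, int msc, uint64_t *val);
typedef void (*wait_fn)(void *ctx);

/* Sum the monitor value across all MSCs, retrying once after 'not ready'. */
static int component_read(msc_read_fn rd, wait_fn wait, void *ctx,
                          int nr_msc, uint64_t *sum)
{
    for (int attempt = 0; attempt < 2; attempt++) {
        uint64_t total = 0;
        int err = 0;

        for (int i = 0; i < nr_msc && !err; i++) {
            uint64_t v = 0;

            err = rd(ctx, i, &v);
            if (!err)
                total += v;
        }
        if (!err) {
            *sum = total;
            return 0;
        }
        if (err != -EBUSY)
            return err;     /* a real error: do not retry */
        if (attempt == 0)
            wait(ctx);      /* 'not ready': wait, then try again */
    }
    return -EBUSY;
}

/* Fake hardware for illustration: MSC 1 is not ready until we have waited. */
struct fake_msc {
    int waits;
};

static int fake_read(void *ctx, int msc, uint64_t *val)
{
    struct fake_msc *f = ctx;

    if (msc == 1 && f->waits == 0)
        return -EBUSY;
    *val = 10u * (uint64_t)(msc + 1);
    return 0;
}

static void fake_wait(void *ctx)
{
    struct fake_msc *f = ctx;

    f->waits++;
}
```

With the fake hardware, the first pass fails on MSC 1, the caller waits once, and the second pass returns the combined value.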
CC: Shanker Donthineni <sdonthineni@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit c6bc912118f3aeceb03408d72f8fe66f643d9464 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

(cherry picked from commit 107021b4f39c086b64a0d7e7142cea0c6c6bdf7a https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
…nd power management

BugLink: https://bugs.launchpad.net/bugs/2122432

Bandwidth counters need to run continuously to correctly reflect the
bandwidth.
The value read may be lower than the previous value read in the case
of overflow and when the hardware is reset due to CPU hotplug.
Add struct mbwu_state to track the bandwidth counter to allow overflow
and power management to be handled.
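
One way to picture the bookkeeping is a small state structure holding the previous raw value and an accumulated correction. The sketch below is an assumption-laden illustration, not the layout of struct mbwu_state: `struct bw_state`, `bw_update()` and `bw_hw_reset()` are hypothetical, at most one wrap is assumed between reads, and a hardware reset is assumed to be reported explicitly rather than inferred from the value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical software state kept per bandwidth counter. */
struct bw_state {
    uint64_t prev;       /* last raw value read from hardware */
    uint64_t correction; /* amount lost to overflows and resets so far */
};

/* Fold a new raw reading in; 'width' is the counter width in bits (< 64). */
static uint64_t bw_update(struct bw_state *s, uint64_t raw, unsigned int width)
{
    if (raw < s->prev)
        s->correction += 1ull << width; /* the counter wrapped once */
    s->prev = raw;
    return s->correction + raw;
}

/* Called when the hardware counter is known to have been reset to zero. */
static void bw_hw_reset(struct bw_state *s)
{
    s->correction += s->prev; /* keep what had been counted so far */
    s->prev = 0;
}
```

The value handed to callers is always `correction + raw`, so it keeps increasing even though the raw register value does not.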
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit d2737c47cbdafc47494316a58c43c2f0c5d5bf5b https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

MPAM v0.1 and versions above v1.0 support an optional long counter for
memory bandwidth monitoring. The MPAMF_MBWUMON_IDR register has fields
indicating support for long counters.
Probe these feature bits.
The mpam_feat_msmon_mbwu feature is used to indicate that bandwidth
monitors are supported. Instead of muddling this with the size of the
bandwidth monitors, add an explicit 31 bit counter feature.
[ morse: Added 31bit counter feature to simplify later logic ]
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit f882fb5d0f37e8a8d13bd6fac173ee1fb6f9d7e5 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

Now that the larger counter sizes are probed, make use of them.
Callers of mpam_msmon_read() may not know (or care!) about the different
counter sizes. Allow them to specify mpam_feat_msmon_mbwu and have the
driver pick the counter to use.
Only 32bit accesses to the MSC are required to be supported by the
spec, but these registers are 64bits. The lower half may overflow
into the higher half between two 32bit reads. To avoid this, use
a helper that reads the top half multiple times to check for overflow.
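
The usual shape of such a helper is "read high, read low, read high again, and retry if the high half changed". The following sketch shows that pattern against a fake register; `read32_fn`, `read64_hi_lo_hi()` and the fake counter are hypothetical and stand in for the driver's real accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit accessor: word 0 is the low half, word 1 the high half. */
typedef uint32_t (*read32_fn)(void *ctx, int word);

/* Read high, low, then high again; retry until the high half is stable. */
static uint64_t read64_hi_lo_hi(read32_fn rd, void *ctx)
{
    uint32_t hi = rd(ctx, 1);

    for (;;) {
        uint32_t lo = rd(ctx, 0);
        uint32_t hi_again = rd(ctx, 1);

        if (hi_again == hi)
            return ((uint64_t)hi << 32) | lo;
        hi = hi_again; /* the low half carried into the high half: retry */
    }
}

/* Fake counter for illustration: increments once after 'bump_at' reads. */
struct fake_counter {
    uint64_t value;
    int reads;
    int bump_at; /* 0 means the value never changes */
};

static uint32_t fake_read32(void *ctx, int word)
{
    struct fake_counter *c = ctx;
    uint32_t v = word ? (uint32_t)(c->value >> 32) : (uint32_t)c->value;

    if (++c->reads == c->bump_at)
        c->value++;
    return v;
}
```

If the fake counter moves from 0xFFFFFFFF to 0x100000000 between the first two reads, a single high-then-low read would combine a stale high half with a fresh low half; the retry loop returns a consistent value instead.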
[morse: merged multiple patches from Rohit, added explicit counter selection ]
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: Rohit Mathew <rohit.mathew@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 9afed066bb775f9a7c5bac8d4f382b261bc20ca9 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

resctrl expects to reset the bandwidth counters when the filesystem
is mounted.
To allow this, add a helper that clears the saved mbwu state. Instead
of cross calling to each CPU that can access the component MSC to
write to the counter, set a flag that causes it to be zero'd on the
next read. This is easily done by forcing a configuration update.
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 6731e078c20836769f8b4bd9a2ebcac440586039 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

The bitmap reset code has been a source of bugs. Add a unit test.
This currently has to be built in, as the rest of the driver is
builtin.
Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit 8cc7f34b3bc33ca6ccdc1a0e827c3ac34b590fe2 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2122432

When features are mismatched between MSCs, the way features are combined
into the class determines whether resctrl can support this SoC.
Add some tests to illustrate the sort of thing that is expected to
work, and those that must be removed.
Reviewed-by: Ben Horgan <ben.horgan@arm.com>
Reviewed-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>

Signed-off-by: James Morse <james.morse@arm.com>
(cherry picked from commit c31ec1bb514c90ed5610ed67de44257a8e9a5748 https://git.kernel.org/pub/scm/linux/kernel/git/morse/linux.git)
Signed-off-by: Fenghua Yu <fenghuay@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Shameer Kolothum and others added 18 commits February 5, 2026 09:23
BugLink: https://bugs.launchpad.net/bugs/2140343

The function hugetlb_reserve_pages() returns the number of pages added
to the reservation map on success and a negative error code on failure
(e.g. -EINVAL, -ENOMEM). However, in some error paths, it may return -1
directly.

For example, a failure at:

    if (hugetlb_acct_memory(h, gbl_reserve) < 0)
        goto out_put_pages;

results in returning -1 (since add = -1), which may be misinterpreted
in userspace as -EPERM.

Fix this by explicitly capturing and propagating the return values from
helper functions, and using -EINVAL for all other failure cases.
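
The difference between the two patterns can be shown with a simplified model. This is a sketch only: `acct_memory()`, `reserve_old()` and `reserve_new()` are hypothetical stand-ins, not the real hugetlb functions, and the accounting failure is assumed to be -ENOMEM.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: fails with -ENOMEM if the request cannot be met. */
static int acct_memory(long pages, long available)
{
    return pages > available ? -ENOMEM : 0;
}

/* Old pattern: the helper's error code is dropped and a bare -1 escapes. */
static long reserve_old(long pages, long available)
{
    long add = -1;

    if (acct_memory(pages, available) < 0)
        goto out;
    add = pages;
out:
    return add;
}

/* New pattern: capture the helper's return value and propagate it. */
static long reserve_new(long pages, long available)
{
    int err = acct_memory(pages, available);

    if (err < 0)
        return err;
    return pages;
}
```

On Linux EPERM is 1, so the bare -1 from the old pattern is indistinguishable from -EPERM, while the new pattern reports the actual cause.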

Link: https://lkml.kernel.org/r/20251125171350.86441-1-skolothumtho@nvidia.com
Fixes: 986f5f2 ("mm/hugetlb: make hugetlb_reserve_pages() return nr of entries updated")
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
Reviewed-by: Joshua Hahn <joshua.hahnjy@gmail.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Cc: Matthew R. Ochs <mochs@nvidia.com>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Nicolin Chen <nicolinc@nvidia.com>
Cc: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
(backported from commit 9ee5d17)
Signed-off-by: Nathan Chen <nathanc@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2140343

The Enable bits in CMDQV/VINTF/VCMDQ_CONFIG registers do not actually reset
the HW registers. So, the driver explicitly clears all the registers when a
VINTF or VCMDQ is being initialized calling its hw_deinit() function.

However, a userspace VCMDQ is not properly reset, unlike an in-kernel VCMDQ
getting reset in tegra241_vcmdq_hw_init().

Meanwhile, tegra241_vintf_hw_init() calling tegra241_vintf_hw_deinit() will
not deinit any VCMDQ, since there is no userspace VCMDQ mapped to the VINTF
at that stage.

Then, this may result in dirty VCMDQ registers, which can fail the VM.

Like tegra241_vcmdq_hw_init(), reset a VCMDQ in tegra241_vcmdq_hw_init_user()
to fix this bug. This is required by a host kernel.

Fixes: 6717f26ab1e7 ("iommu/tegra241-cmdqv: Add user-space use support")
Cc: stable@vger.kernel.org
Reported-by: Bao Nguyen <ncqb@google.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
(backported from commit 80f1a2c)
Signed-off-by: Nathan Chen <nathanc@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
… transfer

BugLink: https://bugs.launchpad.net/bugs/2139640

When the ISR thread wakes up late and finds that the timeout handler
has already processed the transfer (curr_xfer is NULL), return
IRQ_HANDLED instead of IRQ_NONE.

Use a similar approach to tegra_qspi_handle_timeout() by reading
QSPI_TRANS_STATUS and checking the QSPI_RDY bit to determine if the
hardware actually completed the transfer. If QSPI_RDY is set, the
interrupt was legitimate and triggered by real hardware activity.
The fact that the timeout path handled it first doesn't make it
spurious. Returning IRQ_NONE incorrectly suggests the interrupt
wasn't for this device, which can cause issues with shared interrupt
lines and interrupt accounting.
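
The decision for the stale case can be sketched as a pure function. This is only an illustration under assumptions: `SKETCH_QSPI_RDY` uses a placeholder bit position that is not taken from the driver, and `stale_irq_result` is a hypothetical name. The enum values mirror the kernel's `IRQ_NONE` (0) and `IRQ_HANDLED` (1).

```c
#include <stdint.h>

/* Mirrors the two irqreturn_t values discussed in the commit message. */
enum irq_result_sketch {
	SKETCH_IRQ_NONE = 0,
	SKETCH_IRQ_HANDLED = 1,
};

/* Placeholder bit for the "ready" flag in the transfer status register;
 * the real position is defined by the driver, not by this sketch. */
#define SKETCH_QSPI_RDY (1u << 30)

/* Called when the ISR thread finds curr_xfer already cleared by the
 * timeout path. If the hardware reports ready, the interrupt was real. */
static enum irq_result_sketch stale_irq_result(uint32_t trans_status)
{
	if (trans_status & SKETCH_QSPI_RDY)
		return SKETCH_IRQ_HANDLED;

	return SKETCH_IRQ_NONE;
}
```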

Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling")
Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20260126-tegra_xfer-v2-1-6d2115e4f387@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit aabd8ea linux-next)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139640

Move the assignment of the transfer pointer from curr_xfer inside the
spinlock critical section in both handle_cpu_based_xfer() and
handle_dma_based_xfer().

Previously, curr_xfer was read before acquiring the lock, creating a
window where the timeout path could clear curr_xfer between reading it
and using it. By moving the read inside the lock, the handlers are
guaranteed to see a consistent value that cannot be modified by the
timeout path.

Fixes: 921fc18 ("spi: tegra210-quad: Add support for Tegra210 QSPI controller")
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Thierry Reding <treding@nvidia.com>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Link: https://patch.msgid.link/20260126-tegra_xfer-v2-2-6d2115e4f387@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit ef13ba3 linux-next)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…transfer_one

BugLink: https://bugs.launchpad.net/bugs/2139640

When the timeout handler processes a completed transfer and signals
completion, the transfer thread can immediately set up the next transfer
and assign curr_xfer to point to it.

If a delayed ISR from the previous transfer then runs, it checks if
(!tqspi->curr_xfer) (currently without the lock also -- to be fixed
soon) to detect stale interrupts, but this check passes because
curr_xfer now points to the new transfer. The ISR then incorrectly
processes the new transfer's context.

Protect the curr_xfer assignment with the spinlock to ensure the ISR
either sees NULL (and bails out) or sees the new value only after the
assignment is complete.

Fixes: 921fc18 ("spi: tegra210-quad: Add support for Tegra210 QSPI controller")
Signed-off-by: Breno Leitao <leitao@debian.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20260126-tegra_xfer-v2-3-6d2115e4f387@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit f5a4d7f linux-next)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139640

The curr_xfer field is read by the IRQ handler without holding the lock
to check if a transfer is in progress. When clearing curr_xfer in the
combined sequence transfer loop, protect it with the spinlock to prevent
a race with the interrupt handler.

Protect the curr_xfer clearing at the exit path of
tegra_qspi_combined_seq_xfer() with the spinlock to prevent a race
with the interrupt handler that reads this field.

Without this protection, the IRQ handler could read a partially updated
curr_xfer value, leading to NULL pointer dereference or use-after-free.

Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling")
Signed-off-by: Breno Leitao <leitao@debian.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20260126-tegra_xfer-v2-4-6d2115e4f387@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit bf4528a linux-next)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…ined_seq_xfer

BugLink: https://bugs.launchpad.net/bugs/2139640

Protect the curr_xfer clearing in tegra_qspi_non_combined_seq_xfer()
with the spinlock to prevent a race with the interrupt handler that
reads this field to check if a transfer is in progress.

Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling")
Signed-off-by: Breno Leitao <leitao@debian.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20260126-tegra_xfer-v2-5-6d2115e4f387@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 6d7723e linux-next)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139640

Now that all other accesses to curr_xfer are done under the lock,
protect the curr_xfer NULL check in tegra_qspi_isr_thread() with the
spinlock. Without this protection, the following race can occur:

  CPU0 (ISR thread)              CPU1 (timeout path)
  ----------------               -------------------
  if (!tqspi->curr_xfer)
    // sees non-NULL
                                 spin_lock()
                                 tqspi->curr_xfer = NULL
                                 spin_unlock()
  handle_*_xfer()
    spin_lock()
    t = tqspi->curr_xfer  // NULL!
    ... t->len ...        // NULL dereference!

With this patch, all curr_xfer accesses are now properly synchronized.

Although all accesses to curr_xfer are done under the lock, in
tegra_qspi_isr_thread() it checks for NULL, releases the lock and
reacquires it later in handle_cpu_based_xfer()/handle_dma_based_xfer().
There is a potential for an update in between, which could cause a NULL
pointer dereference.

To handle this, add a NULL check inside the handlers after acquiring
the lock. This ensures that if the timeout path has already cleared
curr_xfer, the handler will safely return without dereferencing the
NULL pointer.
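
A user-space model of the locking pattern, assuming a pthread mutex in place of the driver's spinlock. The types `xfer_sketch` and `qspi_sketch` and both functions are hypothetical. The point is that the pointer is read and NULL-checked inside the same critical section that uses it.

```c
#include <pthread.h>
#include <stddef.h>

struct xfer_sketch {
	unsigned int len;
};

struct qspi_sketch {
	pthread_mutex_t lock;
	struct xfer_sketch *curr_xfer;
};

/* Timeout path: clears the current transfer under the lock. */
static void timeout_clear(struct qspi_sketch *q)
{
	pthread_mutex_lock(&q->lock);
	q->curr_xfer = NULL;
	pthread_mutex_unlock(&q->lock);
}

/* Handler: reads curr_xfer only after taking the lock and bails out
 * if the timeout path got there first. Returns the transfer length,
 * or -1 when there is nothing left to handle. */
static int handle_xfer(struct qspi_sketch *q)
{
	struct xfer_sketch *t;
	int ret = -1;

	pthread_mutex_lock(&q->lock);
	t = q->curr_xfer;
	if (t)
		ret = (int)t->len;	/* safe: t cannot be cleared while locked */
	pthread_mutex_unlock(&q->lock);

	return ret;
}
```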

Fixes: b4e002d ("spi: tegra210-quad: Fix timeout handling")
Signed-off-by: Breno Leitao <leitao@debian.org>
Tested-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Jon Hunter <jonathanh@nvidia.com>
Acked-by: Thierry Reding <treding@nvidia.com>
Link: https://patch.msgid.link/20260126-tegra_xfer-v2-6-6d2115e4f387@debian.org
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit edf9088 linux-next)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2139648

Currently cpu-clock event always returns 0 count, e.g.,

perf stat -e cpu-clock -- sleep 1

 Performance counter stats for 'sleep 1':
                 0      cpu-clock                        #    0.000 CPUs utilized
       1.002308394 seconds time elapsed

The root cause is that commit bc4394e5e79c ("perf: Fix the throttle
error of some clock events") adds a PERF_EF_UPDATE flag check before
calling cpu_clock_event_update() to update the count. However, the
PERF_EF_UPDATE flag is never set when the cpu-clock event is stopped in
counting mode (pmu->del() -> cpu_clock_event_del() ->
cpu_clock_event_stop()). As a result, the cpu-clock event count is
never updated.

To fix this issue, force the PERF_EF_UPDATE flag to be set for the
cpu-clock event, just as task-clock does.
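
A small model of the behaviour, assuming the stop path only folds elapsed time into the count when the update flag is present. All names here are hypothetical. `SKETCH_EF_UPDATE` uses the same value as the kernel's `PERF_EF_UPDATE` (0x04), but nothing else is taken from the perf core.

```c
#include <stdint.h>

#define SKETCH_EF_UPDATE 0x04

struct clock_event_sketch {
	uint64_t count;	/* accumulated time */
	uint64_t prev;	/* timestamp of the last update */
};

static void event_update(struct clock_event_sketch *e, uint64_t now)
{
	e->count += now - e->prev;
	e->prev = now;
}

/* Stop only folds in the elapsed time when the update flag is set. */
static void event_stop(struct clock_event_sketch *e, uint64_t now, int flags)
{
	if (flags & SKETCH_EF_UPDATE)
		event_update(e, now);
}

/* Before the fix: the caller's flags are forwarded unchanged. */
static void event_del_before(struct clock_event_sketch *e, uint64_t now, int flags)
{
	event_stop(e, now, flags);
}

/* After the fix: the update flag is forced on. */
static void event_del_after(struct clock_event_sketch *e, uint64_t now, int flags)
{
	event_stop(e, now, flags | SKETCH_EF_UPDATE);
}
```

In counting mode the delete path is reached with no update flag, so the first variant leaves the count at zero while the second records the elapsed time.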

Fixes: bc4394e ("perf: Fix the throttle error of some clock events")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://patch.msgid.link/20251112080526.3971392-1-dapeng1.mi@linux.intel.com
(cherry picked from commit f1f9651)
Signed-off-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2093957

Signed-off-by: Jeremy Szu <jszu@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2138892

Remove this declaration which is now used within the file
after merging upstream "vfio/nvgrace-gpu: register device memory for
poison handling".

Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2136828

Add PCI_VENDOR_ID_ASPEED to the shared pci_ids.h header and remove the
duplicate local definition from ehci-pci.c.

This prepares for adding a PCI quirk for ASPEED devices.

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
(backported from https://lore.kernel.org/linux-iommu/20251217154529.377586-1-nirmoyd@nvidia.com/)
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2136828

ASPEED BMC controllers have VGA and USB functions behind a PCIe-to-PCI
bridge that causes them to share the same stream ID:

  [e0]---00.0-[e1-e2]----00.0-[e2]--+-00.0  ASPEED Graphics Family
                                    \-02.0  ASPEED USB Controller

Both devices get stream ID 0x5e200 due to bridge aliasing, causing the
USB controller to be rejected with 'Aliasing StreamID unsupported'.

Per ASPEED, the AST1150 doesn't use a real PCI bus and always forwards
the original requester ID from downstream devices rather than replacing
it with any alias.

Add a new PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES flag and apply it to the
AST1150.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
(backported from https://lore.kernel.org/linux-iommu/20251217154529.377586-2-nirmoyd@nvidia.com/)
[nirmoy: set PCI_DEV_FLAGS_PCI_BRIDGE_NO_ALIASES to (1 << 15) instead of (1 << 14)]
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Acked-by: Abdur Rahman <abdur.rahman@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
…ivation (LFA)

BugLink: https://bugs.launchpad.net/bugs/2138342

The Arm Live Firmware Activation (LFA) is a specification [1] to describe
activating firmware components without a reboot. Those components
(like TF-A's BL31, EDK-II, TF-RMM, secure payloads) would be updated the
usual way: via fwupd, FF-A or other secure storage methods, or via some
IMPDEF Out-Of-Band method. The user can then activate this new firmware,
at system runtime, without requiring a reboot.
The specification covers the SMCCC interface to list and query available
components and eventually trigger the activation.

Add a new directory under /sys/firmware to present firmware components
capable of live activation. Each of them is a directory under lfa/,
and is identified via its GUID. The activation will be triggered by echoing
"1" into the "activate" file:
==========================================
/sys/firmware/lfa # ls -l . 6c*
.:
total 0
drwxr-xr-x    2 0 0         0 Jan 19 11:33 47d4086d-4cfe-9846-9b95-2950cbbd5a00
drwxr-xr-x    2 0 0         0 Jan 19 11:33 6c0762a6-12f2-4b56-92cb-ba8f633606d9
drwxr-xr-x    2 0 0         0 Jan 19 11:33 d6d0eea7-fcea-d54b-9782-9934f234b6e4

6c0762a6-12f2-4b56-92cb-ba8f633606d9:
total 0
--w-------    1 0        0             4096 Jan 19 11:33 activate
-r--r--r--    1 0        0             4096 Jan 19 11:33 activation_capable
-r--r--r--    1 0        0             4096 Jan 19 11:33 activation_pending
--w-------    1 0        0             4096 Jan 19 11:33 cancel
-r--r--r--    1 0        0             4096 Jan 19 11:33 cpu_rendezvous
-r--r--r--    1 0        0             4096 Jan 19 11:33 current_version
-rw-r--r--    1 0        0             4096 Jan 19 11:33 force_cpu_rendezvous
-r--r--r--    1 0        0             4096 Jan 19 11:33 may_reset_cpu
-r--r--r--    1 0        0             4096 Jan 19 11:33 name
-r--r--r--    1 0        0             4096 Jan 19 11:33 pending_version
/sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 # grep . *
grep: activate: Permission denied
activation_capable:1
activation_pending:1
grep: cancel: Permission denied
cpu_rendezvous:1
current_version:0.0
force_cpu_rendezvous:1
may_reset_cpu:0
name:TF-RMM
pending_version:0.0
/sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 # echo 1 > activate
[ 2825.797871] Arm LFA: firmware activation succeeded.
/sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 #
==========================================
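
The `echo 1 > activate` step in the session above could also be driven from a program. A minimal sketch, assuming the caller already knows the full path of the component's `activate` file; `lfa_request_activation` is a hypothetical helper name, not part of the driver.

```c
#include <stdio.h>

/* Writes "1\n" to the given file, as `echo 1 > activate` would.
 * Returns 0 on success and -1 on any failure. */
static int lfa_request_activation(const char *activate_path)
{
	FILE *f = fopen(activate_path, "w");

	if (!f)
		return -1;

	if (fputs("1\n", f) == EOF) {
		fclose(f);
		return -1;
	}

	/* The buffered write reaches the file at fclose(), so its
	 * result decides whether the request was delivered. */
	return fclose(f) == 0 ? 0 : -1;
}
```

As the listing shows, the `activate` file is writable by root only, so a caller without sufficient privileges would get -1 here.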

[1] https://developer.arm.com/documentation/den0147/latest/

Signed-off-by: Salman Nabi <salman.nabi@arm.com>
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
(backported from https://lore.kernel.org/all/20260119122729.287522-2-salman.nabi@arm.com/)
[nirmoyd: Added image_name fallback to fw_uuid in update_fw_image_node()]
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2138342

Enhance PRIME/ACTIVATION functions to touch watchdog and implement
timeout mechanism. This update ensures that any potential hangs are
detected promptly and that the LFA process is allocated sufficient
execution time before the watchdog timer expires. These changes improve
overall system reliability by reducing the risk of undetected process
stalls and unexpected watchdog resets.

Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2138342

- Register the LFA driver as a platform driver corresponding to the
'arml0003' ACPI device. The driver will be invoked when the device is
detected on a platform. NOTE: the current functionality is only available
for ACPI configurations.
- Add functionality to register an ACPI notify handler for LFA in the
driver probe().
- When the notify handler is invoked, the driver will query the latest FW
component details and trigger activation of each capable and pending FW
component in a loop until all FWs are activated.
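
The loop in the last item can be sketched as follows. This is a simplified model, not the driver: `fw_comp_sketch`, `activate_sketch` and `activate_all_pending` are hypothetical names, and activation is assumed to always succeed and clear the pending state.

```c
#include <stdbool.h>
#include <stddef.h>

struct fw_comp_sketch {
	bool activation_capable;
	bool activation_pending;
};

/* Hypothetical activation step: in this model it always succeeds. */
static void activate_sketch(struct fw_comp_sketch *c)
{
	c->activation_pending = false;
}

/* Repeatedly scans the component list and activates one capable and
 * pending component per pass, until a pass finds none. Returns the
 * number of components activated. */
static unsigned int activate_all_pending(struct fw_comp_sketch *comps, size_t n)
{
	unsigned int activated = 0;
	bool progress;

	do {
		progress = false;
		for (size_t i = 0; i < n; i++) {
			if (comps[i].activation_capable &&
			    comps[i].activation_pending) {
				activate_sketch(&comps[i]);
				activated++;
				progress = true;
				break;	/* rescan, as if details were queried again */
			}
		}
	} while (progress);

	return activated;
}
```

Components that are pending but not capable are left untouched.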

ACPI node snippet from LFA spec[1]:
Device (LFA0) {
   Name (_HID, "ARML0003")
   Name (_UID, 0)
}

[1] https://developer.arm.com/documentation/den0147/latest/

Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2138342

Enable Arm Live Firmware Activation support by setting CONFIG_ARM_LFA=y.

Signed-off-by: Jamie Nguyen <jamien@nvidia.com>
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2140997

The atlantic driver can receive packets with more than MAX_SKB_FRAGS (17)
fragments when handling large multi-descriptor packets. This causes an
out-of-bounds write in skb_add_rx_frag_netmem() leading to kernel panic.

The issue occurs because the driver doesn't check the total number of
fragments before calling skb_add_rx_frag(). When a packet requires more
than MAX_SKB_FRAGS fragments, the fragment index exceeds the array bounds.

Fix this by assuming there will be an extra frag if
buff->len > AQ_CFG_RX_HDR_SIZE, so that all fragments are accounted for,
and by reusing the existing check to prevent the overflow earlier in the
code path.
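
The accounting can be sketched like this. It is an illustration only: `SKETCH_RX_HDR_SIZE` is a placeholder because the text does not give the value of AQ_CFG_RX_HDR_SIZE, `rx_packet_fits` is a hypothetical name, and 17 is the MAX_SKB_FRAGS value quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_FRAGS   17	/* MAX_SKB_FRAGS as quoted in the message */
#define SKETCH_RX_HDR_SIZE 256	/* placeholder for AQ_CFG_RX_HDR_SIZE */

/* Counts the fragments a packet would need, including the extra one
 * used when the first buffer is larger than the header area, and
 * reports whether the packet fits in the skb fragment array. */
static bool rx_packet_fits(size_t first_buf_len, unsigned int chained_descs)
{
	unsigned int frag_cnt = chained_descs;

	if (first_buf_len > SKETCH_RX_HDR_SIZE)
		frag_cnt++;	/* remainder of the first buffer becomes a frag */

	return frag_cnt <= SKETCH_MAX_FRAGS;
}
```

A packet that fails this check would be rejected before any fragment is added, rather than overflowing the array partway through.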

This crash occurred in production with an Aquantia AQC113 10G NIC.

Stack trace from production environment:
```
RIP: 0010:skb_add_rx_frag_netmem+0x29/0xd0
Code: 90 f3 0f 1e fa 0f 1f 44 00 00 48 89 f8 41 89 ca 48 89 d7 48 63 ce 8b 90 c0 00 00 00 48 c1 e1 04 48 01 ca 48 03 90 c8 00 00 00 <48> 89 7a 30 44 89 52 3c 44 89 42 38 40 f6 c7 01 75 74 48 89 fa 83
RSP: 0018:ffffa9bec02a8d50 EFLAGS: 00010287
RAX: ffff925b22e80a00 RBX: ffff925ad38d2700 RCX: fffffffe0a0c8000
RDX: ffff9258ea95bac0 RSI: ffff925ae0a0c800 RDI: 0000000000037a40
RBP: 0000000000000024 R08: 0000000000000000 R09: 0000000000000021
R10: 0000000000000848 R11: 0000000000000000 R12: ffffa9bec02a8e24
R13: ffff925ad8615570 R14: 0000000000000000 R15: ffff925b22e80a00
FS: 0000000000000000(0000) GS:ffff925e47880000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff9258ea95baf0 CR3: 0000000166022004 CR4: 0000000000f72ef0
PKRU: 55555554
Call Trace:
<IRQ>
aq_ring_rx_clean+0x175/0xe60 [atlantic]
? aq_ring_rx_clean+0x14d/0xe60 [atlantic]
? aq_ring_tx_clean+0xdf/0x190 [atlantic]
? kmem_cache_free+0x348/0x450
? aq_vec_poll+0x81/0x1d0 [atlantic]
? __napi_poll+0x28/0x1c0
? net_rx_action+0x337/0x420
```

Fixes: 6aecbba ("net: atlantic: add check for MAX_SKB_FRAGS")
Changes in v4:
- Add Fixes: tag to satisfy patch validation requirements.

Changes in v3:
- Fix by assuming there will be an extra frag if buff->len > AQ_CFG_RX_HDR_SIZE,
  then all fragments are accounted for.

Signed-off-by: Jiefeng Zhang <jiefeng.z.zhang@gmail.com>
Link: https://patch.msgid.link/20251126032249.69358-1-jiefeng.z.zhang@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 5ffcb7b)
Signed-off-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Acked-by: Jacob Martin <jacob.martin@canonical.com>
Acked-by: Noah Wager <noah.wager@canonical.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
@nirmoy nirmoy changed the title NVIDIA: SAUCE: r8127: fix NAPI warning on module removal [6.17]NVIDIA: SAUCE: r8127: fix NAPI warning on module removal Feb 13, 2026
When the r8127 module is unloaded, __netif_napi_del_locked() can trigger
a WARN because NAPI is removed while still enabled. unregister_netdev()
calls ndo_stop, which disables NAPI; deleting NAPI before that runs
violates the netdev/NAPI teardown order.

Move rtl8127_del_napi() to after unregister_netdev() so NAPI is disabled
in ndo_stop before it is removed.

Aligns with the upstream r8169 fix in commit 12b1bc7
("r8169: improve rtl_remove_one").
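
The ordering issue can be modelled in user space. This is a toy model, not the r8127 driver: `nic_sketch` and every function below are hypothetical, and the `warnings` counter stands in for the kernel WARN.

```c
#include <stdbool.h>

struct nic_sketch {
	bool running;		/* interface is up */
	bool napi_enabled;
	bool napi_added;
	int warnings;		/* stands in for the kernel WARN */
};

/* Models ndo_stop: disables NAPI and brings the interface down. */
static void sketch_ndo_stop(struct nic_sketch *n)
{
	n->napi_enabled = false;
	n->running = false;
}

/* Models unregister_netdev(): stops the device if it is still up. */
static void sketch_unregister_netdev(struct nic_sketch *n)
{
	if (n->running)
		sketch_ndo_stop(n);
}

/* Models NAPI deletion: complains if NAPI is still enabled. */
static void sketch_del_napi(struct nic_sketch *n)
{
	if (n->napi_enabled)
		n->warnings++;
	n->napi_added = false;
}

/* Old order: NAPI is deleted while still enabled. */
static int remove_before_fix(struct nic_sketch *n)
{
	sketch_del_napi(n);
	sketch_unregister_netdev(n);
	return n->warnings;
}

/* New order: the device is unregistered first, so NAPI is disabled. */
static int remove_after_fix(struct nic_sketch *n)
{
	sketch_unregister_netdev(n);
	sketch_del_napi(n);
	return n->warnings;
}
```

Starting from an interface that is up, the old order records one warning and the new order records none.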

Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>

@clsotog clsotog left a comment

Acked-by: Carol L Soto <csoto@nvidia.com>


@nvmochs nvmochs left a comment

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

nvmochs commented Feb 13, 2026

PR sent to Canonical

@nvidia-bfigg nvidia-bfigg force-pushed the 24.04_linux-nvidia-6.17-next branch 2 times, most recently from 11f5434 to 9c65dde Compare February 15, 2026 13:01