@kubalewski
Owner

No description provided.

kubalewski and others added 30 commits June 6, 2023 23:31
Add a protocol spec for DPLL.
Add code generated from the spec.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
DPLL framework is used to represent and configure DPLL devices
in systems. Each device that has DPLL and can configure sources
and outputs can use this framework. Netlink interface is used to
provide configuration data and to receive notification messages
about changes in the configuration or status of DPLL device.
Inputs and outputs of the DPLL device are represented as special
objects which could be dynamically added to and removed from DPLL
device.

Co-developed-by: Milena Olech <milena.olech@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Co-developed-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Co-developed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Add documentation explaining the common netlink interface for configuring
DPLL devices and monitoring events. A common way to implement a DPLL
device in a driver is also covered.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Add a firmware admin command to access the clock generation unit
configuration. It is required to enable Extended PTP and SyncE features
in the driver.
Add definitions of possible hardware variations of input and output pins
related to clock generation unit and functions to access the data.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Control over clock generation unit is required for further development
of Synchronous Ethernet feature. Interface provides ability to obtain
current state of a dpll, its sources and outputs which are pins, and
allows their configuration.

Co-developed-by: Milena Olech <milena.olech@intel.com>
Signed-off-by: Milena Olech <milena.olech@intel.com>
Co-developed-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Implement basic DPLL operations in ptp_ocp driver as the
simplest example of using new subsystem.

Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
In case a netdevice represents a SyncE port, the user needs to understand
the connection between the netdevice and the associated DPLL pin. There
might be multiple netdevices pointing to the same pin, in case of a VF/SF
implementation.

Add an IFLA Netlink attribute to nest the DPLL pin handle, similar to
how it is implemented for devlink port. Add a struct dpll_pin pointer
to netdev and protect access to it by RTNL. Expose netdev_dpll_pin_set()
and netdev_dpll_pin_clear() helpers to the drivers so they can set/clear
the DPLL pin relationship to netdev.

Note that during the lifetime of struct dpll_pin the handle fields do not
change. Therefore it is safe to access them locklessly. It is the driver's
responsibility to call netdev_dpll_pin_clear() before dpll_pin_put().

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Implement SyncE support using the newly introduced DPLL support.
Make sure that each PF/VF/SF probed with the appropriate capability
will spawn a dpll auxiliary device and register the appropriate dpll
device and pin instances.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Attributes are not used, remove them and start with a value given.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Remove unspec attributes after removing from dpll netlink spec.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Remove unspec attributes after removing from dpll netlink spec.

Fixes: 6348a82 ("ice: add admin commands to access cgu configuration")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Remove unspec attributes after removing from dpll netlink spec.

Fixes: bff85d9 ("ice: implement dpll interface to control cgu")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Add missing frequencies to the dpll ynl spec, so their defines are
properly generated.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
- remove "no holdover available" from freerun mode description
- improve description of holdover lock-status
- fix temperature typo

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
ice_dpll_pin flags field is intended to store flags that were received
after getting pin info from firmware, stop using it on set commands,
as the flags of set commands can have different meaning and values.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Use ice_dpll_pin's 'state' field to provide state to the caller,
as well as to check whether a requested state change should proceed.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Remove attribute values of subset attributes, they are no longer needed
as ynl lib was fixed.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Move device 'nest' to the end of the dpll netlink attributes.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
The 'device' nested attribute shall be used only for receiving data
from pin-get do/dump commands. Use it this way and define the list
of expected attributes for device-get, which are not nested.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Previously the docs referred to DPLL_MODE_FORCED mode, but its name was
changed to DPLL_MODE_MANUAL. Fix it.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Previously temperature could be supplied with one digit float precision,
use divider value of 1000 instead of 10 and allow three digit float
precision for users.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
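
A minimal sketch of what the divider change above means for a consumer, assuming the raw temperature value is an integer expressed in thousandths of a degree. The helper name `format_temp` and the `TEMP_DIVIDER` macro are hypothetical and are not taken from the dpll code.

```c
#include <stdio.h>
#include <stdlib.h>

/* Assumed divider: the raw value carries 1/1000 of a degree per unit. */
#define TEMP_DIVIDER 1000

/*
 * Hypothetical helper: render a raw temperature as "whole.frac" with
 * three fractional digits. Returns the snprintf() result.
 */
static int format_temp(long raw, char *buf, size_t len)
{
	long whole = labs(raw) / TEMP_DIVIDER;
	long frac = labs(raw) % TEMP_DIVIDER;

	return snprintf(buf, len, "%s%ld.%03ld",
			raw < 0 ? "-" : "", whole, frac);
}
```

With a divider of 10 the same raw integer could only carry one fractional digit; with 1000 it carries three.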
Enum in subset attribute definition is redundant and not used for
anything in YNL spec, remove it.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Previously it was possible to have different pin-direction returned to
the user depending on which dpll was given as argument for pin-get cmd.
If pin was registered with multiple dplls each could have reported
different direction. I.e. driver exposes chained pins, where one pin is
an input for one dpll and an output for second dpll.
Fix it by enclosing pin-direction in the 'device' nested attribute.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Previously it was possible to have different pin-direction returned to
the user depending on which dpll was given as argument for pin-get cmd.
If pin was registered with multiple dplls each could have reported
different direction. I.e. driver exposes chained pins, where one pin is
an input for one dpll and an output for second dpll.
Fix it by enclosing pin-direction in the 'device' nested attribute.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
The header is already included in linux/dpll.h. Remove it.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Caller shall take care of checking if arguments are valid, remove it.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Propagate error return value of netlink callback mutex lock function to
the caller, a dpll subsystem.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Replace if statements on enum values with switch-case statements.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
In case of runtime errors the dev_err macro shall be used for printing
the traces.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Add pf structure pointer to ice_dpll and ice_dpll_pin structures.
Use ice_dpll and ice_dpll_pin structure pointers as private data when
registering within dpll subsystem.
New private data does not need to perform any lookups to perform
callback requests.
Remove ice_pind_pin as it is unused.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
kubalewski added 25 commits June 6, 2023 23:31
Adapt the ptp_ocp/dpll part after the change to the dpll spec/core: use
board_label instead of the label field of pin properties.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Adapt the mlx5/dpll part after the change to the dpll spec/core:
pin-label was previously required but no longer is, so remove it.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Allow user to get id of a device or a pin with dpll netlink interface.
It requires to provide arguments which result in a single match,
otherwise the -EINVAL is returned.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Allow user to get id of a device or a pin with dpll netlink interface.
It requires to provide arguments which result in a single match,
otherwise the -EINVAL is returned.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
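
The "single match, otherwise -EINVAL" rule from the id-get commits above can be sketched in plain C. This is an illustration only: the `pin_entry` structure, the `pin_id_get` function and the choice of `module_name` plus `clock_id` as match arguments are assumptions, not the dpll netlink implementation. Returning -EINVAL for zero matches as well as for multiple matches is how the sketch reads "otherwise".

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record standing in for a registered pin. */
struct pin_entry {
	unsigned int id;
	const char *module_name;
	unsigned long long clock_id;
};

/*
 * Store the id of the only entry matching both arguments in *id.
 * Returns 0 on exactly one match and -EINVAL otherwise.
 */
static int pin_id_get(const struct pin_entry *pins, size_t n,
		      const char *module_name, unsigned long long clock_id,
		      unsigned int *id)
{
	const struct pin_entry *match = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (pins[i].clock_id != clock_id ||
		    strcmp(pins[i].module_name, module_name) != 0)
			continue;
		if (match)
			return -EINVAL;	/* ambiguous: second match found */
		match = &pins[i];
	}
	if (!match)
		return -EINVAL;		/* nothing matched */

	*id = match->id;
	return 0;
}
```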
Rename the mutex to better reflect its intent: the lock serializes
access to dpll subsystem internal data structures and prevents
concurrent access of netlink callbacks and dpll core functions.
There is one global lock for any access to the dpll subsystem.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Improve doxygen description of functions and their return values.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Remove unused struct fields, adapt dpll_device_register.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Adapt to dpll_device_register: the `owner` argument was removed.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Adapt to dpll_device_register: the `owner` argument was removed.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Adapt to dpll_device_register: the `owner` argument was removed.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
The argument shall be u32 as in struct or in caller dpll_pin_get.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Instead of copying data from driver provided pin property use provided
pointer during pin lifetime.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Verify if pin being registered belongs to the same instance of dpll
by checking if they share clock_id and owner module.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Do not use const modifier for arguments of helper get private data
functions. Xarray macro doesn't allow to find elements which are const,
remove the modifier and a cast which is no longer needed.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
The function was removed in previous commits, remove the declaration.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Use current year, use same format for all the headers.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Instead of parsing attributes as a stream of attributes, parse them as
an nlattr array, while using dedicated functions.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
The attribute is no longer needed, remove it.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
The attribute is no longer needed, remove it.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
The attribute is no longer needed, remove it.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
The attribute is no longer needed, remove it.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
The attribute is no longer needed, remove it.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Leave only one description of dpll subsystem interface functions in
the drivers/dpll/dpll_core.c

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Previously module->name was used, but it is only available when
CONFIG_MODULES is defined. Instead use module_name(), which allows
the dpll subsystem to compile without CONFIG_MODULES defined.

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Fix docs of dpll core files after checking with helper script:
./scripts/kernel-doc -none $DPLL_FILES

Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
kubalewski pushed a commit that referenced this pull request Jun 19, 2023
Currently, the per cpu upcall counters are allocated after the vport is
created and inserted into the system. This could lead to the datapath
accessing the counters before they are allocated resulting in a kernel
Oops.

Here is an example:

  PID: 59693    TASK: ffff0005f4f51500  CPU: 0    COMMAND: "ovs-vswitchd"
   #0 [ffff80000a39b5b0] __switch_to at ffffb70f0629f2f4
   #1 [ffff80000a39b5d0] __schedule at ffffb70f0629f5cc
   #2 [ffff80000a39b650] preempt_schedule_common at ffffb70f0629fa60
   #3 [ffff80000a39b670] dynamic_might_resched at ffffb70f0629fb58
   #4 [ffff80000a39b680] mutex_lock_killable at ffffb70f062a1388
   #5 [ffff80000a39b6a0] pcpu_alloc at ffffb70f0594460c
   #6 [ffff80000a39b750] __alloc_percpu_gfp at ffffb70f05944e68
   #7 [ffff80000a39b760] ovs_vport_cmd_new at ffffb70ee6961b90 [openvswitch]
   ...

  PID: 58682    TASK: ffff0005b2f0bf00  CPU: 0    COMMAND: "kworker/0:3"
   #0 [ffff80000a5d2f40] machine_kexec at ffffb70f056a0758
   #1 [ffff80000a5d2f70] __crash_kexec at ffffb70f057e2994
   #2 [ffff80000a5d3100] crash_kexec at ffffb70f057e2ad8
   #3 [ffff80000a5d3120] die at ffffb70f0628234c
   #4 [ffff80000a5d31e0] die_kernel_fault at ffffb70f062828a8
   #5 [ffff80000a5d3210] __do_kernel_fault at ffffb70f056a31f4
   #6 [ffff80000a5d3240] do_bad_area at ffffb70f056a32a4
   #7 [ffff80000a5d3260] do_translation_fault at ffffb70f062a9710
   #8 [ffff80000a5d3270] do_mem_abort at ffffb70f056a2f74
   #9 [ffff80000a5d32a0] el1_abort at ffffb70f06297dac
  #10 [ffff80000a5d32d0] el1h_64_sync_handler at ffffb70f06299b24
  #11 [ffff80000a5d3410] el1h_64_sync at ffffb70f056812dc
  #12 [ffff80000a5d3430] ovs_dp_upcall at ffffb70ee6963c84 [openvswitch]
  #13 [ffff80000a5d3470] ovs_dp_process_packet at ffffb70ee6963fdc [openvswitch]
  #14 [ffff80000a5d34f0] ovs_vport_receive at ffffb70ee6972c78 [openvswitch]
  #15 [ffff80000a5d36f0] netdev_port_receive at ffffb70ee6973948 [openvswitch]
  #16 [ffff80000a5d3720] netdev_frame_hook at ffffb70ee6973a28 [openvswitch]
  #17 [ffff80000a5d3730] __netif_receive_skb_core.constprop.0 at ffffb70f06079f90

We moved the per cpu upcall counter allocation to the existing vport
alloc and free functions to solve this.
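
The ordering behind this fix can be sketched in userspace C: allocate everything the object needs inside the alloc function, and make the object visible to readers only afterwards. All names here (`vport_sketch`, `port_table`, `vport_new`, `upcall_count`) are hypothetical stand-ins, and plain `calloc()` replaces the kernel's per-cpu allocator.

```c
#include <stdlib.h>

#define MAX_PORTS 8

struct upcall_stats {
	unsigned long n_success;
	unsigned long n_fail;
};

struct vport_sketch {
	int port_no;
	struct upcall_stats *upcall_stats;
};

/* Hypothetical table standing in for the datapath's list of ports. */
static struct vport_sketch *port_table[MAX_PORTS];

/* Allocate the port together with its counters, or nothing at all. */
static struct vport_sketch *vport_alloc(int port_no)
{
	struct vport_sketch *vp = calloc(1, sizeof(*vp));

	if (!vp)
		return NULL;
	vp->upcall_stats = calloc(1, sizeof(*vp->upcall_stats));
	if (!vp->upcall_stats) {
		free(vp);
		return NULL;
	}
	vp->port_no = port_no;
	return vp;
}

/* Counterpart of vport_alloc(): release the counters with the port. */
static void vport_free(struct vport_sketch *vp)
{
	if (!vp)
		return;
	free(vp->upcall_stats);
	free(vp);
}

/* Publish the port only after it is fully initialised. */
static struct vport_sketch *vport_new(int port_no)
{
	struct vport_sketch *vp;

	if (port_no < 0 || port_no >= MAX_PORTS || port_table[port_no])
		return NULL;
	vp = vport_alloc(port_no);
	if (!vp)
		return NULL;
	port_table[port_no] = vp;
	return vp;
}

/* Reader side: any port found in the table already has counters. */
static int upcall_count(int port_no, int ok)
{
	struct vport_sketch *vp;

	if (port_no < 0 || port_no >= MAX_PORTS)
		return -1;
	vp = port_table[port_no];
	if (!vp)
		return -1;
	if (ok)
		vp->upcall_stats->n_success++;
	else
		vp->upcall_stats->n_fail++;
	return 0;
}
```

The sketch is single-threaded; it only shows the allocation order, not the locking or RCU rules the real datapath relies on.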

Fixes: 95637d9 ("net: openvswitch: release vport resources on failure")
Fixes: 1933ea3 ("net: openvswitch: Add support to count upcall packets")
Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Acked-by: Aaron Conole <aconole@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
kubalewski pushed a commit that referenced this pull request Jun 27, 2023
…plate

When commit 19343b5 ("mm/page-writeback: introduce tracepoint for
wait_on_page_writeback()") repurposed the writeback_dirty_page trace event
as a template to create its new wait_on_page_writeback trace event, it
ended up opening a window to NULL pointer dereference crashes due to the
(infrequent) occurrence of a race where an access to a page in the
swap-cache happens concurrently with the moment this page is being written
to disk and the tracepoint is enabled:

    BUG: kernel NULL pointer dereference, address: 0000000000000040
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 800000010ec0a067 P4D 800000010ec0a067 PUD 102353067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP PTI
    CPU: 1 PID: 1320 Comm: shmem-worker Kdump: loaded Not tainted 6.4.0-rc5+ #13
    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-1.fc37 03/01/2023
    RIP: 0010:trace_event_raw_event_writeback_folio_template+0x76/0xf0
    Code: 4d 85 e4 74 5c 49 8b 3c 24 e8 06 98 ee ff 48 89 c7 e8 9e 8b ee ff ba 20 00 00 00 48 89 ef 48 89 c6 e8 fe d4 1a 00 49 8b 04 24 <48> 8b 40 40 48 89 43 28 49 8b 45 20 48 89 e7 48 89 43 30 e8 a2 4d
    RSP: 0000:ffffaad580b6fb60 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: ffff90e38035c01c RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff90e38035c044
    RBP: ffff90e38035c024 R08: 0000000000000002 R09: 0000000000000006
    R10: ffff90e38035c02e R11: 0000000000000020 R12: ffff90e380bac000
    R13: ffffe3a7456d9200 R14: 0000000000001b81 R15: ffffe3a7456d9200
    FS:  00007f2e4e8a15c0(0000) GS:ffff90e3fbc80000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000040 CR3: 00000001150c6003 CR4: 0000000000170ee0
    Call Trace:
     <TASK>
     ? __die+0x20/0x70
     ? page_fault_oops+0x76/0x170
     ? kernelmode_fixup_or_oops+0x84/0x110
     ? exc_page_fault+0x65/0x150
     ? asm_exc_page_fault+0x22/0x30
     ? trace_event_raw_event_writeback_folio_template+0x76/0xf0
     folio_wait_writeback+0x6b/0x80
     shmem_swapin_folio+0x24a/0x500
     ? filemap_get_entry+0xe3/0x140
     shmem_get_folio_gfp+0x36e/0x7c0
     ? find_busiest_group+0x43/0x1a0
     shmem_fault+0x76/0x2a0
     ? __update_load_avg_cfs_rq+0x281/0x2f0
     __do_fault+0x33/0x130
     do_read_fault+0x118/0x160
     do_pte_missing+0x1ed/0x2a0
     __handle_mm_fault+0x566/0x630
     handle_mm_fault+0x91/0x210
     do_user_addr_fault+0x22c/0x740
     exc_page_fault+0x65/0x150
     asm_exc_page_fault+0x22/0x30

This problem arises from the fact that the repurposed writeback_dirty_page
trace event code was written assuming that every pointer to mapping
(struct address_space) would come from a file-mapped page-cache object,
thus mapping->host would always be populated, and that was a valid case
before commit 19343b5.  The swap-cache address space
(swapper_spaces), however, doesn't populate its ->host (struct inode)
pointer, thus leading to the crashes in the corner-case aforementioned.

commit 19343b5 ended up breaking the assignment of __entry->name and
__entry->ino for the wait_on_page_writeback tracepoint -- both dependent
on mapping->host carrying a pointer to a valid inode.  The assignment of
__entry->name was fixed by commit 68f23b8 ("memcg: fix a crash in
wb_workfn when a device disappears"), and this commit fixes the remaining
case, for __entry->ino.
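
A minimal sketch of the guard this description implies, written as standalone C with hypothetical stand-in types (`inode_sketch`, `mapping_sketch`) rather than the kernel's struct inode and struct address_space. The fallback value 0 for a missing host is an assumption of the sketch.

```c
#include <stddef.h>

/* Stand-ins for the kernel structures; only the fields used here. */
struct inode_sketch {
	unsigned long i_ino;
};

struct mapping_sketch {
	struct inode_sketch *host;	/* NULL for swap-cache style mappings */
};

/*
 * Return the inode number to record in the trace entry, falling back
 * to 0 when there is no mapping or the mapping has no host inode.
 */
static unsigned long trace_ino(const struct mapping_sketch *mapping)
{
	return (mapping && mapping->host) ? mapping->host->i_ino : 0;
}
```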

Link: https://lkml.kernel.org/r/20230606233613.1290819-1-aquini@redhat.com
Fixes: 19343b5 ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()")
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Reviewed-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
kubalewski pushed a commit that referenced this pull request Jul 28, 2023
Petr Machata says:

====================
mlxsw: Permit enslavement to netdevices with uppers

The mlxsw driver currently makes the assumption that the user applies
configuration in a bottom-up manner. Thus netdevices need to be added to
the bridge before IP addresses are configured on that bridge or SVI added
on top of it. Enslaving a netdevice to another netdevice that already has
uppers is in fact forbidden by mlxsw for this reason. Despite this safety,
it is rather easy to get into situations where the offloaded configuration
is just plain wrong.

As an example, take a front panel port, configure an IP address: it gets a
RIF. Now enslave the port to the bridge, and the RIF is gone. Remove the
port from the bridge again, but the RIF never comes back. There are a
number of similar situations, where changing the configuration there and
back utterly breaks the offload.

Similarly, detaching a front panel port from a configured topology means
unoffloading of this whole topology -- VLAN uppers, next hops, etc.
Attaching the port back is then not permitted at all. If it were, it would
not result in a working configuration, because much of mlxsw is written to
react to changes in immediate configuration. There is nothing that would go
visit netdevices in the attached-to topology and offload existing routes
and VLAN memberships, for example.

In this patchset, introduce a number of replays to be invoked so that this
sort of post-hoc offload is supported. Then remove the vetoes that
disallowed enslavement of front panel ports to other netdevices with
uppers.

The patchset progresses as follows:

- In patch #1, fix an issue in the bridge driver. To my knowledge, the
  issue could not have resulted in a buggy behavior previously, and thus is
  packaged with this patchset instead of being sent separately to net.

- In patch #2, add a new helper to the switchdev code.

- In patch #3, drop mlxsw selftests that will not be relevant after this
  patchset anymore.

- Patches #4, #5, #6, #7 and #8 prepare the codebase for smoother
  introduction of the rest of the code.

- Patches #9, #10, #11, #12, #13 and #14 replay various aspects of upper
  configuration when a front panel port is introduced into a topology.
  Individual patches take care of bridge and LAG RIF memberships, switchdev
  replay, nexthop and neighbors replay, and MACVLAN offload.

- Patches #15 and #16 introduce RIFs for newly-relevant netdevices when a
  front panel port is enslaved (in which case all uppers are newly
  relevant), or, respectively, deslaved (in which case the newly-relevant
  netdevice is the one being deslaved).

- Up until this point, the introduced scaffolding was not really used,
  because mlxsw still forbids enslavement of mlxsw netdevices to uppers
  with uppers. In patch #17, this condition is finally relaxed.

A sizable selftest suite is available to test all this new code. That will
be sent in a separate patchset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
kubalewski pushed a commit that referenced this pull request Aug 4, 2023
The cited commit holds encap tbl lock unconditionally when setting
up dests. But it may cause the following deadlock:

 PID: 1063722  TASK: ffffa062ca5d0000  CPU: 13   COMMAND: "handler8"
  #0 [ffffb14de05b7368] __schedule at ffffffffa1d5aa91
  #1 [ffffb14de05b7410] schedule at ffffffffa1d5afdb
  #2 [ffffb14de05b7430] schedule_preempt_disabled at ffffffffa1d5b528
  #3 [ffffb14de05b7440] __mutex_lock at ffffffffa1d5d6cb
  #4 [ffffb14de05b74e8] mutex_lock_nested at ffffffffa1d5ddeb
  #5 [ffffb14de05b74f8] mlx5e_tc_tun_encap_dests_set at ffffffffc12f2096 [mlx5_core]
  #6 [ffffb14de05b7568] post_process_attr at ffffffffc12d9fc5 [mlx5_core]
  #7 [ffffb14de05b75a0] mlx5e_tc_add_fdb_flow at ffffffffc12de877 [mlx5_core]
  #8 [ffffb14de05b75f0] __mlx5e_add_fdb_flow at ffffffffc12e0eef [mlx5_core]
  #9 [ffffb14de05b7660] mlx5e_tc_add_flow at ffffffffc12e12f7 [mlx5_core]
 #10 [ffffb14de05b76b8] mlx5e_configure_flower at ffffffffc12e1686 [mlx5_core]
 #11 [ffffb14de05b7720] mlx5e_rep_indr_offload at ffffffffc12e3817 [mlx5_core]
 #12 [ffffb14de05b7730] mlx5e_rep_indr_setup_tc_cb at ffffffffc12e388a [mlx5_core]
 #13 [ffffb14de05b7740] tc_setup_cb_add at ffffffffa1ab2ba8
 #14 [ffffb14de05b77a0] fl_hw_replace_filter at ffffffffc0bdec2f [cls_flower]
 #15 [ffffb14de05b7868] fl_change at ffffffffc0be6caa [cls_flower]
 #16 [ffffb14de05b7908] tc_new_tfilter at ffffffffa1ab71f0

[1031218.028143]  wait_for_completion+0x24/0x30
[1031218.028589]  mlx5e_update_route_decap_flows+0x9a/0x1e0 [mlx5_core]
[1031218.029256]  mlx5e_tc_fib_event_work+0x1ad/0x300 [mlx5_core]
[1031218.029885]  process_one_work+0x24e/0x510

There is actually no need to hold the encap tbl lock if there is no
encap action. Fix it by checking whether an encap action exists before
taking the encap tbl lock.
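
The shape of the fix can be sketched as follows. This is not the mlx5 code: `encap_tbl_sketch`, `flow_attr_sketch` and `encap_dests_set` are hypothetical, and a pair of counters stands in for the real mutex so the behaviour can be observed without threads.

```c
#include <stdbool.h>

/* Counter-based stand-in for the encap table mutex. */
struct encap_tbl_sketch {
	int lock_depth;		/* 1 while held, 0 otherwise */
	int lock_acquisitions;	/* how many times the lock was taken */
};

struct flow_attr_sketch {
	bool has_encap;		/* does the flow carry an encap action? */
	bool dests_set;
};

static void tbl_lock(struct encap_tbl_sketch *tbl)
{
	tbl->lock_depth++;
	tbl->lock_acquisitions++;
}

static void tbl_unlock(struct encap_tbl_sketch *tbl)
{
	tbl->lock_depth--;
}

/* Take the table lock only when there is encap state to protect. */
static int encap_dests_set(struct encap_tbl_sketch *tbl,
			   struct flow_attr_sketch *attr)
{
	if (!attr->has_encap)
		return 0;	/* no encap action: never touch the lock */

	tbl_lock(tbl);
	attr->dests_set = true;
	tbl_unlock(tbl);
	return 0;
}
```

A flow without an encap action then never waits on the lock, which is the path the traces above show blocking.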

Fixes: 37c3b9f ("net/mlx5e: Prevent encap offload when neigh update is running")
Signed-off-by: Chris Mi <cmi@nvidia.com>
Reviewed-by: Vlad Buslov <vladbu@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
kubalewski pushed a commit that referenced this pull request Aug 22, 2023
Jiri Pirko says:

====================
devlink: introduce selective dumps

Motivation:

For SFs, one devlink instance per SF is created. There might be
thousands of these on a single host. When a user needs to know the port
handle for a specific SF, they need to dump all devlink ports on the
host, which does not scale well.

Solution:

Allow user to pass devlink handle (and possibly other attributes)
alongside the dump command and dump only objects which are matching
the selection.

Use split ops to generate policies for dump callbacks according to
the attributes used for selection.

The userspace can use ctrl genetlink GET_POLICY command to find out if
the selective dumps are supported by kernel for particular command.
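
The selection idea can be illustrated with a small filter in plain C. This is a sketch only, not devlink code: `port_sketch` and `dump_ports` are hypothetical, and a NULL selector is assumed to mean "match everything", which mirrors a dump issued without a handle.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record for one devlink port. */
struct port_sketch {
	const char *bus;
	const char *dev;
	unsigned int index;
};

/*
 * Collect pointers to the ports matching the optional bus/dev selectors
 * into out[], which must have room for n entries. Returns the count.
 */
static size_t dump_ports(const struct port_sketch *ports, size_t n,
			 const char *bus, const char *dev,
			 const struct port_sketch **out)
{
	size_t count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (bus && strcmp(ports[i].bus, bus) != 0)
			continue;
		if (dev && strcmp(ports[i].dev, dev) != 0)
			continue;
		out[count++] = &ports[i];
	}
	return count;
}
```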

Example:
$ devlink port show
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false

$ devlink port show auxiliary/mlx5_core.eth.0
auxiliary/mlx5_core.eth.0/65535: type eth netdev eth2 flavour physical port 0 splittable false

$ devlink port show auxiliary/mlx5_core.eth.1
auxiliary/mlx5_core.eth.1/131071: type eth netdev eth3 flavour physical port 1 splittable false

Extension:

Patches #12 and #13 extend the selection attributes by port index
for health reporter dumping.
====================

Link: https://lore.kernel.org/r/20230811155714.1736405-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>