Skip to content

[26.04_linux-nvidia-bos] BPMP ACPI + SoC Hub MBWT sysfs for T410/NVL72#421

Open
kobak2026 wants to merge 3 commits into
NVIDIA:26.04_linux-nvidia-bosfrom
kobak2026:bug-DGX-16100/bpmp-acpi-mbwt-v4-26.04-bos
Open

[26.04_linux-nvidia-bos] BPMP ACPI + SoC Hub MBWT sysfs for T410/NVL72#421
kobak2026 wants to merge 3 commits into
NVIDIA:26.04_linux-nvidia-bosfrom
kobak2026:bug-DGX-16100/bpmp-acpi-mbwt-v4-26.04-bos

Conversation

@kobak2026
Copy link
Copy Markdown
Collaborator

@kobak2026 kobak2026 commented May 14, 2026

Summary

Backport Aniruddha's V4 internal-review 3-patch BPMP ACPI + SoC Hub MBWT sysfs series to the 7.0-HWE / 26.04 BOS kernel branch. This is the V4 internal-mail design, not PR352's older export/unexport sysfs layout.

This is based on Aniruddha's V4 internal-review series; V5 is under review, so material upstream deltas will be handled as a follow-up refresh if needed.

V4 vs PR352

  • Sysfs path is now /sys/bus/platform/devices/NVDA3001:NN/mbwt_control/{pcie_instance_id,vc_type,bandwidth}.
  • bandwidth reads back a 3-tuple: <pcie_instance_id> <vc_type> <bandwidth>.
  • mbwt_control is guarded by BPMP QUERY_ABI support for both GET_BW and SET_BW.
  • No bpmp-tegra410.c is introduced; the BPMP ACPI path is integrated into the existing driver structure.
  • The old PR352 /sys/class/bpmp_mbwt/.../{export,unexport,mbwtN} model is intentionally not carried forward.

Commits

  • 8a4662e7423f2b1c4250d773c6e25a4b136418b7 - NVIDIA: VR: SAUCE: tegra: bpmp: Move channel, resource init to helper
  • fd9097f38b08fe890012e3e97120935dc174047f - NVIDIA: VR: SAUCE: tegra: bpmp: Add ACPI support
  • 758f258afbfa39891b95de08226702d7bcb89391 - NVIDIA: VR: SAUCE: tegra: bpmp: Add sysfs for memory bandwidth QoS

Validation

LTS runtime validation on comp016 was completed on the paired LTS branch:

  • Booted 7.0.0-sudeep-test-dgx16100-v4.

  • mbwt_control present under both NVDA3001:00 and NVDA3001:01 platform devices.

  • V4 sysfs unit test: PASS=16 FAIL=0 SKIP=1.

  • fio throttle sweep on /dev/nvme0n1, socket 00:

    • 100 GB/s baseline: READ: bw=13.3GiB/s (14.3GB/s).
    • 50 GB/s: READ: bw=13.8GiB/s (14.8GB/s).
    • 10 GB/s: READ: bw=9344MiB/s (9798MB/s).
    • 1 GB/s: READ: bw=972MiB/s (1019MB/s).
    • 100 GB/s recovery: READ: bw=13.8GiB/s (14.8GB/s).
  • These values match the PR352 throttle behavior: high limits do not cap read bandwidth, 10 GB/s caps near 10 GB/s, 1 GB/s caps near 1 GB/s, and recovery returns to uncapped bandwidth.

  • make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- O=/tmp/build-dgx16100-bos olddefconfig

  • make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- O=/tmp/build-dgx16100-bos -j$(nproc) drivers/firmware/tegra/

  • Result: exit 0.

  • Warning observed: security/apparmor/Kconfig:120:warning: multi-line strings not supported during olddefconfig.

  • The previous known tegra_bpmp_transfer_acpi frame-size warning did not appear in the final targeted build output.

Debugfs note: the first LTS runtime run observed the known duplicate BPMP debugfs warning (debugfs: 'bpmp' already exists in '/', tegra-bpmp NVDA3001:01: debugfs initialization failed: -12). The current commits include the per-device debugfs fix, build-verified.

@kobak2026 kobak2026 requested review from clsotog, fyu1 and nvmochs May 14, 2026 17:03
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 14, 2026

PR Validation Report

Patchscan ✅ No Missing Fixes

All cherry-picked commits checked — no missing upstream fixes found.

PR Lint ❌ Errors found

Details
Checking 3 commits...

Cherry-pick digest:
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject                              │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 6cfd7c422b5b │ [SAUCE] tegra: bpmp: add sysfs for memory bandwidth qos          │ N/A        │ N/A     │ anrao, kobak              │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 84f4a187fc6c │ [SAUCE] tegra: bpmp: add acpi support                            │ N/A        │ N/A     │ anrao, kobak              │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 92f1cd122c98 │ [SAUCE] tegra: bpmp: move channel, resource init to helper       │ N/A        │ N/A     │ anrao, kobak              │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘

Lint: all checks passed.

PR metadata:
E: PR targets 26.04_linux-nvidia-bos but body has no https://bugs.launchpad.net/... link

@clsotog
Copy link
Copy Markdown
Collaborator

clsotog commented May 14, 2026

hey koba can you check this compile warnings
drivers/firmware/tegra/bpmp.c: In function ‘tegra_bpmp_transfer_acpi’:
drivers/firmware/tegra/bpmp.c:324:13: warning: unused variable ‘i’ [-Wunused-variable]
324 | u32 i;
| ^
drivers/firmware/tegra/bpmp.c:389:1: warning: the frame size of 4096 bytes is larger than 1024 bytes [-Wframe-larger-than=]
389 | }

@nirmoy
Copy link
Copy Markdown
Collaborator

nirmoy commented May 14, 2026

Boro review

Latest watcher review: open review

Head: 6cfd7c422b5b

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.

@nvmochs
Copy link
Copy Markdown
Collaborator

nvmochs commented May 15, 2026

@kobak2026

Some findings from review with Codex:

  1. ACPI bpmp->soc NULL derefs
    In ACPI probe, device_get_match_data() returns NULL because the ACPI match entry uses .driver_data = 0 at drivers/firmware/tegra/bpmp.c:1038. Some init paths were guarded with !ACPI_HANDLE(), but not all later SoC-op
    uses were.

    Risky paths:

    • Probe failure jumps to deinit: and calls bpmp->soc->ops->deinit at drivers/firmware/tegra/bpmp.c:945.
    • Resume calls bpmp->soc->ops->resume at drivers/firmware/tegra/bpmp.c:966.

    Suggested fix: guard these like the init path, for example if (bpmp->soc && bpmp->soc->ops->deinit) and same for resume, or branch ACPI resume/deinit explicitly.

  2. Short ACPI BMRQ response can leak/corrupt data
    tegra_bpmp_transfer_acpi() validates that the firmware response is not too large, but it does not validate that it is large enough for the caller’s requested RX size.

    Problem sequence:

    • ACPI buffer length is read into rdata_len at drivers/firmware/tegra/bpmp.c:363.
    • The code checks only rbuf_len > sizeof(pkg) at drivers/firmware/tegra/bpmp.c:365.
    • It then copies msg->rx.size bytes from pkg.data at drivers/firmware/tegra/bpmp.c:383.

    If AML returns fewer bytes than msg->rx.size, the tail of pkg.data was never initialized by acpi_extract_package(). Suggested fix: before copying, reject rdata_len < msg->rx.size.

  3. MBWT sysfs write ignores BPMP command failure
    tegra_sochub_set_mbwt() only checks whether tegra_bpmp_transfer() itself failed at drivers/firmware/tegra/bpmp-tegra-sysfs.c:94. It never checks msg.rx.ret.

    That means firmware can reject CMD_SOCHUB_MBWT_SET_BW, but the sysfs write still returns count, so userspace sees success. This is especially relevant because ABI text says firmware validates instance, vc_type, and
    bandwidth. Suggested fix: mirror the GET path and return an error if msg.rx.ret < 0.

  4. Large stack frame in ACPI transfer
    tegra_bpmp_transfer_acpi() declares struct tegra_bpmp_acpi_message pkg on the stack at drivers/firmware/tegra/bpmp.c:321. That struct contains a 3960-byte data buffer at drivers/firmware/tegra/bpmp-private.h:29.

    This will likely trigger frame-size warnings on configs with CONFIG_FRAME_WARN=1024 or 2048, and it is also undesirable in a common transfer path. Suggested fix: avoid acpi_extract_package() into that large fixed
    struct, or allocate the temporary extraction buffer with kmalloc() sized from rbuf_len.


As a side note, didn't we address some of these same issues last time in the version we took into the v6.17 kernel? We need to understand why Aniruddha is not carrying these fixes forward.

kobak2026 added 3 commits May 18, 2026 16:05
Refactor the BPMP driver by moving channel initialization and Device
Tree resource parsing into separate helper functions. This prepares the
driver for ACPI support, where these helpers will be skipped because
channel initialization is handled by ACPI AML methods on ACPI-based
systems.

Signed-off-by: Aniruddha Rao <anrao@nvidia.com>

(backported from V4 internal mail <20260423140823.2848045-2-anrao@nvidia.com>)
[kobak: Preserve threaded channel count/semaphore initialization after
 the helper split and align rx_channel allocation continuation.]

Signed-off-by: Koba Ko <kobak@nvidia.com>
This patch adds required changes in the Tegra BPMP driver to make it
compatible with ACPI based platforms.
On ACPI systems, IPC is handled through the AML method instead of the
core kernel framework using Mailboxes and IVC.
Bypass clock, reset and powergate init calls as these are not controlled
by the Linux drivers on ACPI based systems.

Signed-off-by: Aniruddha Rao <anrao@nvidia.com>

(backported from V4 internal mail <20260423140823.2848045-3-anrao@nvidia.com>)
[kobak: Add !ACPI_HANDLE(bpmp->dev) guard around bpmp->soc->ops->init
 because ACPI match driver_data=0 makes bpmp->soc NULL; make BPMP
 debugfs directory per-device on ACPI systems to avoid duplicate
 /sys/kernel/debug/bpmp collision on dual NVDA3001 instances; remove
 unused i and heap-allocate the ACPI BMRQ package to avoid the
 frame-size warning; guard ACPI deinit/resume paths when bpmp->soc is
 NULL; reject short BMRQ replies before copying response data.]

Signed-off-by: Koba Ko <kobak@nvidia.com>
Tegra410 exposes memory bandwidth QoS for PCIe and GPU UPHY traffic on the
path to DRAM. Each bandwidth group can cap PCIe read, PCIe write, or
combined GPU UPHY read and write traffic, with target limits.

The memory bandwidth QoS is not exposed as ordinary host MMIO and
cannot be controlled from the kernel. The bandwidth limits can be programmed
by sending the corresponding requests (MBWT MRQ) to the BPMP.

On Tegra410, an ACPI-based platform, Linux BPMP driver does not use the
device-tree mailbox path for communicating with the BPMP firmware.
As a result, there is no existing client driver or interface that can be
used to send the memory bandwidth requests to the BPMP.

This patch exposes a sysfs directory mbwt_control on the tegra-bpmp platform
device with pcie_instance_id, vc_type, and bandwidth.
Writing bandwidth issues an MBWT_SET for the selected group (pcie_instance_id)
and traffic class (vc_type).
A read issues MBWT_GET and returns the bandwidth value reported by firmware.

These attributes are exposed only if MBWT QUERY probe reports both
MBWT_SET and MBWT_GET commands as supported.

ABI documented in Documentation/ABI/testing/sys-platform-tegra-bpmp

Signed-off-by: Aniruddha Rao <anrao@nvidia.com>

(backported from V4 internal mail <20260423140823.2848045-4-anrao@nvidia.com>)
[kobak: Keep functional MRQ_SOCHUB_MBWT ABI definitions and sysfs
 interface from V4; condense verbose per-field ABI comments while
 preserving enum/struct layout and Documentation/ABI coverage; use
 refcounted kobject allocation for mbwt_control; validate
 pcie_instance_id/vc_type before staging; return only the bandwidth
 value from bandwidth reads; report BPMP SET rejections to userspace.]

Signed-off-by: Koba Ko <kobak@nvidia.com>
@kobak2026 kobak2026 force-pushed the bug-DGX-16100/bpmp-acpi-mbwt-v4-26.04-bos branch from 758f258 to 6cfd7c4 Compare May 18, 2026 08:39
@kobak2026
Copy link
Copy Markdown
Collaborator Author

@clsotog thanks, fixed.

@nirmoy thanks I found there's a issue of your review mechanism. it reviews one commit by one commit. not whole changes.
this leads some comments are reported but addressed in next commit.

@nvmochs thanks, fixed.
I will find Aniruddha with these comments after compare his v5.

Copy link
Copy Markdown
Collaborator

@clsotog clsotog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes.

Acked-by: Carol L Soto <csoto@nvidia.com>

@nvmochs
Copy link
Copy Markdown
Collaborator

nvmochs commented May 18, 2026

@kobak2026 Thanks for addressing the other items.

Two more findings when reviewing with Codex...

  1. Finding: CONFIG_ACPI=n Build Risk

TEGRA_BPMP still supports non-ACPI Tegra builds:

drivers/firmware/tegra/Kconfig:14

      depends on !CPU_BIG_ENDIAN

So CONFIG_TEGRA_BPMP=y with CONFIG_ACPI=n is still a valid DT-only configuration.

However, drivers/firmware/tegra/bpmp.c:313 now always compiles tegra_bpmp_transfer_acpi(), and that helper calls ACPI-only routines:

Most other ACPI-looking uses in this series are okay with CONFIG_ACPI=n because they only use ACPI_HANDLE(), which has a stub that evaluates false. bpmp-tegra-sysfs.c and the debugfs naming change should therefore become inert in non-ACPI
builds. The actual build risk is the body of tegra_bpmp_transfer_acpi().

Suggested fix: compile the real helper only when ACPI is enabled and provide a stub otherwise:

#if IS_ENABLED(CONFIG_ACPI)
static int tegra_bpmp_transfer_acpi(struct tegra_bpmp *bpmp,
struct tegra_bpmp_message *msg)
{
...
}
#else
static int tegra_bpmp_transfer_acpi(struct tegra_bpmp *bpmp,
struct tegra_bpmp_message *msg)
{
return -EOPNOTSUPP;
}
#endif

Optionally, the ACPI match table can also be guarded for cleanliness, but the required fix is guarding or stubbing tegra_bpmp_transfer_acpi().


  1. Missing irqs_disabled() Guard
    Before the ACPI split, the exported tegra_bpmp_transfer() rejected callers with local IRQs disabled. That matters because non-atomic BPMP transfers can sleep.

After the split, the guard is only in the DT/mailbox path:

drivers/firmware/tegra/bpmp.c:456

static int tegra_bpmp_transfer_channel(...)
{
if (WARN_ON(irqs_disabled()))
return -EPERM;
...
}

But the exported wrapper routes ACPI devices around that check:

drivers/firmware/tegra/bpmp.c:496

if (ACPI_HANDLE(bpmp->dev))
return tegra_bpmp_transfer_acpi(bpmp, msg);

The ACPI path can definitely sleep: acpi_evaluate_object() may execute AML and allocate memory, and the new code also does kzalloc(..., GFP_KERNEL). So an IRQ-disabled caller would now hit sleepable code instead of getting the old -EPERM.

Suggested fix: put the guard back in the public wrapper, before choosing transport:

if (WARN_ON(irqs_disabled()))
return -EPERM;

Then the rule applies consistently to both DT and ACPI. The atomic API already rejects ACPI with -EOPNOTSUPP, so this preserves the existing contract cleanly.

@clsotog
Copy link
Copy Markdown
Collaborator

clsotog commented May 18, 2026

Thanks for the changes.

Acked-by: Carol L Soto <csoto@nvidia.com>

Sorry one question at line drivers/firmware/tegra/bpmp.c:951 do we need a goto or let it probe even if sysfs_register failed?

@clsotog clsotog self-requested a review May 18, 2026 20:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants