
VR: update LFA support on linux-nvidia-6.18#427

Open
nirmoy wants to merge 16 commits into NVIDIA:linux-nvidia-6.18 from nirmoy:codex/lfa-v2-6.18

@nirmoy
Collaborator

@nirmoy nirmoy commented May 18, 2026

Summary

  • Revert the old LFA support stack on linux-nvidia-6.18
  • Bring in the newer LFA series from the BOS branch
  • Apply the Tegra SMCCC macro-parentheses fix
  • Add follow-up LFA cleanup for image names and init failure unwinding

Status

Draft PR. Remote build, install, and boot smoke completed on nvidia@10.105.57.17; full LFA functional testing is still pending.

Validation

  • git diff --check upstream/linux-nvidia-6.18..HEAD
  • Remote arm64 package build on nvidia@10.105.57.17
    • PR head: 36f52a70c249df084dc13bb874d558d6ccbb1b2d
    • Test build head: 26a0e4dccc29 (build-only: tag LFA detection log, not part of this PR)
    • Kernel release: 6.18.25-lfa-v2-test
    • Build result: BUILD_STATUS=0
  • Remote install and one-shot boot smoke on nvidia@10.105.57.17
    • Booted kernel: 6.18.25-lfa-v2-test
    • /sys/firmware/lfa enumerated UUIDs:
      • 0509b633-5734-422f-a681-6096e932d93a
      • 3ab71f81-32b9-496d-841b-e3d0e9fd1a48
      • 65922703-2f74-e644-8dff-579ac1ff0610
      • 6c0762a6-12f2-4b56-92cb-ba8f633606d9
    • Kernel log:
      • Arm LFA: Live Firmware Activation: detected v1.0 [NVIDIA VR SAUCE lfa_v2]
      • Arm LFA: registered LFA ACPI notification

Notes

  • The linux-image-6.18.25-lfa-v2-test package is installed and booted, but dpkg reports its state as iF (half-configured) because unrelated DKMS modules failed during postinst (iser, isert, mlnx-ofed-kernel, mods, srp). The boot smoke test and the LFA sysfs/log checks still passed.
  • Full LFA functional testing remains to be done.

nirmoy and others added 15 commits May 13, 2026 03:17
…improve SMC retry pacing"

This reverts commit 1137d9f.

Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>

…tialization race"

This reverts commit 1e028a6.

Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>

This reverts commit 7227499.

Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>

This reverts commit e2b0a6e.

Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>

…ware Activation (LFA)"

This reverts commit 43325a3.

Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>

…ivation (LFA)

BugLink: https://bugs.launchpad.net/bugs/2150652

The Arm Live Firmware Activation (LFA) specification [1] describes
activating firmware components without a reboot. Those components
(like TF-A's BL31, EDK-II, TF-RMM, secure payloads) would be updated the
usual way: via fwupd, FF-A or other secure storage methods, or via some
IMPDEF out-of-band method. The user can then activate this new firmware
at system runtime, without requiring a reboot.
The specification covers the SMCCC interface to list and query available
components and eventually trigger the activation.

Add a new directory under /sys/firmware to present firmware components
capable of live activation. Each of them is a directory under lfa/,
and is identified via its GUID. The activation will be triggered by echoing
"1" into the "activate" file:
==========================================
/sys/firmware/lfa # ls -l . 6c*
.:
total 0
drwxr-xr-x    2 0 0         0 Jan 19 11:33 47d4086d-4cfe-9846-9b95-2950cbbd5a00
drwxr-xr-x    2 0 0         0 Jan 19 11:33 6c0762a6-12f2-4b56-92cb-ba8f633606d9
drwxr-xr-x    2 0 0         0 Jan 19 11:33 d6d0eea7-fcea-d54b-9782-9934f234b6e4

6c0762a6-12f2-4b56-92cb-ba8f633606d9:
total 0
--w-------    1 0        0             4096 Jan 19 11:33 activate
-r--r--r--    1 0        0             4096 Jan 19 11:33 activation_capable
-r--r--r--    1 0        0             4096 Jan 19 11:33 activation_pending
--w-------    1 0        0             4096 Jan 19 11:33 cancel
-r--r--r--    1 0        0             4096 Jan 19 11:33 cpu_rendezvous
-r--r--r--    1 0        0             4096 Jan 19 11:33 current_version
-rw-r--r--    1 0        0             4096 Jan 19 11:33 force_cpu_rendezvous
-r--r--r--    1 0        0             4096 Jan 19 11:33 may_reset_cpu
-r--r--r--    1 0        0             4096 Jan 19 11:33 name
-r--r--r--    1 0        0             4096 Jan 19 11:33 pending_version
/sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 # grep . *
grep: activate: Permission denied
activation_capable:1
activation_pending:1
grep: cancel: Permission denied
cpu_rendezvous:1
current_version:0.0
force_cpu_rendezvous:1
may_reset_cpu:0
name:TF-RMM
pending_version:0.0
/sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 # echo 1 > activate
[ 2825.797871] Arm LFA: firmware activation succeeded.
/sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9 #
==========================================
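For illustration, the same check-then-activate flow can be driven from user space in C. This is a sketch that relies only on the attribute names in the listing above (activation_capable, activation_pending, activate); the helpers read_attr() and lfa_activate_if_pending() are hypothetical and are not part of the driver or of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read a one-line sysfs attribute "<dir>/<name>" into buf, newline stripped. */
static int read_attr(const char *dir, const char *name, char *buf, size_t len)
{
	char path[512];
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s", dir, name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';
	return 0;
}

/*
 * Hypothetical helper: activate one LFA image directory if it reports
 * activation_capable=1 and activation_pending=1. Returns 1 if "1" was
 * written to "activate", 0 if there was nothing to do, -1 on I/O error.
 */
static int lfa_activate_if_pending(const char *image_dir)
{
	char capable[16], pending[16], path[512];
	FILE *f;

	if (read_attr(image_dir, "activation_capable", capable, sizeof(capable)) ||
	    read_attr(image_dir, "activation_pending", pending, sizeof(pending)))
		return -1;
	if (strcmp(capable, "1") != 0 || strcmp(pending, "1") != 0)
		return 0;

	snprintf(path, sizeof(path), "%s/activate", image_dir);
	f = fopen(path, "w");
	if (!f)
		return -1;
	/* Same as "echo 1 > activate"; a sysfs store error surfaces at fclose(). */
	if (fputs("1\n", f) == EOF) {
		fclose(f);
		return -1;
	}
	return fclose(f) == 0 ? 1 : -1;
}
```

Run as root against /sys/firmware/lfa/6c0762a6-12f2-4b56-92cb-ba8f633606d9, lfa_activate_if_pending() would perform the same write as the echo above, but only when the image reports a pending, activatable update.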

[1] https://developer.arm.com/documentation/den0147/latest/

Signed-off-by: Salman Nabi <salman.nabi@arm.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
(backported from https://lore.kernel.org/all/20260317103336.1273582-1-andre.przywara@arm.com/)
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 08bef19)

BugLink: https://bugs.launchpad.net/bugs/2150652

After an image activation, the list of firmware images might change, so
we have to re-iterate them through the SMC interface.
Move the corresponding code from the activate_fw_image() function into
update_fw_images_tree(), where it could be reused more easily, for
instance when triggered by an interrupt.

Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
[Andre: split off from another patch, rebased]
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
(backported from https://lore.kernel.org/all/20260317103336.1273582-1-andre.przywara@arm.com/)
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit cedf5ce)

…hdog

BugLink: https://bugs.launchpad.net/bugs/2150652

Make the PRIME and ACTIVATION functions touch the watchdog and enforce a
timeout. This ensures that a hung firmware call is detected promptly,
while a legitimately long-running LFA operation still gets enough
execution time before the watchdog timer expires. Together, these
changes reduce the risk of both undetected stalls and unexpected
watchdog resets.
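A minimal user-space model of a watchdog-plus-timeout loop might look like the following. It assumes a firmware step that can ask to be called again; the step_status enum, the lfa_loop_ops callbacks and the millisecond budget are hypothetical, and the kernel counterpart for the watchdog callback (for example touch_nmi_watchdog()) is only suggested in a comment, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-call outcomes for this sketch; not DEN0147 encodings. */
enum step_status { STEP_DONE, STEP_CALL_AGAIN, STEP_ERROR };

struct lfa_loop_ops {
	enum step_status (*step)(void *ctx);   /* one PRIME or ACTIVATE SMC */
	void (*touch_watchdog)(void *ctx);     /* kernel: e.g. touch_nmi_watchdog() */
	uint64_t (*now_ms)(void *ctx);         /* monotonic clock in milliseconds */
};

/* Returns 0 on success, -1 on a firmware error, -2 on timeout. */
static int lfa_run_with_timeout(const struct lfa_loop_ops *ops, void *ctx,
				uint64_t timeout_ms)
{
	uint64_t deadline = ops->now_ms(ctx) + timeout_ms;

	for (;;) {
		enum step_status st;

		/* Pet the watchdog before each potentially long firmware call. */
		ops->touch_watchdog(ctx);
		st = ops->step(ctx);
		if (st == STEP_DONE)
			return 0;
		if (st == STEP_ERROR)
			return -1;
		/* CALL_AGAIN: retry, but treat an exhausted budget as a hang. */
		if (ops->now_ms(ctx) >= deadline)
			return -2;
	}
}

/* Simulated firmware and clock so the sketch can run in user space. */
struct fake_fw {
	int calls_until_done;
	uint64_t clock_ms;
	uint64_t ms_per_call;
	int wdog_touches;
};

static enum step_status fake_step(void *ctx)
{
	struct fake_fw *fw = ctx;

	fw->clock_ms += fw->ms_per_call;
	return --fw->calls_until_done <= 0 ? STEP_DONE : STEP_CALL_AGAIN;
}

static void fake_touch(void *ctx) { ((struct fake_fw *)ctx)->wdog_touches++; }
static uint64_t fake_now(void *ctx) { return ((struct fake_fw *)ctx)->clock_ms; }

static const struct lfa_loop_ops fake_ops = { fake_step, fake_touch, fake_now };
```

Injecting the clock keeps the sketch deterministic. Touching the watchdog before every call, rather than once up front, is what lets a long but progressing sequence survive while a stuck one still runs into the deadline.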

Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
(backported from https://lore.kernel.org/all/20260317103336.1273582-1-andre.przywara@arm.com/)
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 17cfbbd)

BugLink: https://bugs.launchpad.net/bugs/2150652

The Arm LFA spec describes an ACPI notification mechanism, where the
platform (firmware) can notify an LFA client about newly available
firmware image updates ("pending images" in LFA terms).

Add a faux device after discovering the existence of an LFA agent via
the SMCCC discovery mechanism, and use that device to check for the ACPI
notification description. Register for the notification when one is
provided.

The notification only conveys that at least one firmware image now has
a pending update; it does not say which one, and more than one may be
pending. Loop through all images to find every image that needs to be
activated, and trigger the activation. This has to be done in a loop,
since an activation might change the number and the status of available
images.
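The rescan loop described above can be sketched in plain C. The fw_inventory type, lfa_process_pending() and its max_rounds guard are hypothetical: in the driver the per-image status would come from SMC queries rather than an array, and the bound on the number of rounds is an assumption of this sketch to avoid looping forever.

```c
#include <assert.h>

#define MAX_IMAGES 8

struct fw_image { int pending; };
struct fw_inventory { struct fw_image img[MAX_IMAGES]; int count; };

/*
 * Handle one "something is pending" notification. The notification does not
 * say which image, so rescan the whole inventory after every activation,
 * because an activation may add, remove or reorder images.
 * Returns the number of activations, or -1 on error or a runaway loop.
 */
static int lfa_process_pending(struct fw_inventory *inv,
			       int (*activate)(struct fw_inventory *, int idx),
			       int max_rounds)
{
	int activated = 0;

	for (int round = 0; round < max_rounds; round++) {
		int found = -1;

		for (int i = 0; i < inv->count; i++) {
			if (inv->img[i].pending) {
				found = i;
				break;
			}
		}
		if (found < 0)
			return activated;      /* nothing left to activate */
		if (activate(inv, found) != 0)
			return -1;
		activated++;                   /* inventory may have changed */
	}
	return -1;
}

/* Simulated activation: activating image 0 makes firmware expose a new image. */
static int fake_activate(struct fw_inventory *inv, int idx)
{
	inv->img[idx].pending = 0;
	if (idx == 0 && inv->count < MAX_IMAGES)
		inv->img[inv->count++].pending = 1;
	return 0;
}
```

With two images where only the first is pending, the first activation exposes a third pending image, which the next rescan picks up; the loop then ends after two activations.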

Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
[Andre: convert from platform driver to faux device]
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
(backported from https://lore.kernel.org/all/20260317103336.1273582-1-andre.przywara@arm.com/)
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 195ce64)

BugLink: https://bugs.launchpad.net/bugs/2150652

The Arm LFA spec places control over the actual activation process in
the hands of the non-secure host OS. A platform-initiated interrupt or
notification signals the availability of an updatable firmware image,
but need not trigger the activation automatically.

Add a sysfs control file that guards such automatic activation. If an
administrator wants to allow automatic platform-initiated updates, they
can enable it by echoing "1" into the auto_activate file in the
respective sysfs directory. Any incoming notification will then trigger
the activation.
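Assuming, as the wording suggests, one auto_activate file per image directory, the gate reduces to a per-image flag checked in the notification path. The struct and function below are hypothetical and only model the policy: images without the flag stay pending until an administrator writes to activate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct lfa_img_sketch {
	bool pending;        /* activation_pending as reported by firmware */
	bool auto_activate;  /* set by the admin via: echo 1 > auto_activate */
	int activations;     /* how often this sketch "activated" the image */
};

/* On a platform notification, activate only images whose admin opted in. */
static int lfa_handle_notification(struct lfa_img_sketch *imgs, size_t n)
{
	int done = 0;

	for (size_t i = 0; i < n; i++) {
		if (!imgs[i].pending)
			continue;
		if (!imgs[i].auto_activate)
			continue;  /* leave it pending for a manual activation */
		imgs[i].activations++;  /* stands in for the PRIME + ACTIVATE calls */
		imgs[i].pending = false;
		done++;
	}
	return done;
}
```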

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
(backported from https://lore.kernel.org/all/20260317103336.1273582-1-andre.przywara@arm.com/)
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit bc3733f)

BugLink: https://bugs.launchpad.net/bugs/2150652

The Arm Live Firmware Activation spec describes an asynchronous
notification mechanism, where the platform can notify the host OS about
newly pending image updates.
In the absence of the ACPI notification mechanism, a simple devicetree
node can also describe an interrupt.

Add code to find the respective DT node and register the specified
interrupt, to trigger the activation if needed.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
(backported from https://lore.kernel.org/all/20260317103336.1273582-1-andre.przywara@arm.com/)
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 07cbad7)

BugLink: https://bugs.launchpad.net/bugs/2150652

After a successful live activation, the list of firmware images might
change, which also affects the sequence IDs. We store the sequence
ID in a data structure and connect it to its GUID, which is the
identifier used to access certain image properties from userland.
When an activation is happening, the sequence ID associations might
change at any point, so we must be sure to not use any previously
learned sequence ID during this time.

Protect the association between a sequence ID and a firmware image
(its GUID, really) with a reader/writer lock. An R/W semaphore is used
here, so holders can sleep and keep it held across long operations.
Concurrent SMC calls that only read the mapping do not block each other;
only an activation, which takes the lock for writing, blocks them.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
(backported from https://lore.kernel.org/all/20260317103336.1273582-1-andre.przywara@arm.com/)
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 6724bd7)

… ACTIVATE

BugLink: https://bugs.launchpad.net/bugs/2150652

DEN0147 §2.6: LFA_ACTIVATE can return LFA_BUSY when the firmware
postpones the activation. Although the rwsem in this driver prevents
concurrent ACTIVATE calls from kernel space, an external agent or
internal firmware state may still produce LFA_BUSY. Add an explicit
retry loop (same budget and delay as CALL_AGAIN) so the code does not
silently treat a retriable condition as a terminal failure. Catching
LFA_BUSY explicitly also surfaces potential firmware or driver bugs.

DEN0147 §2.5: LFA_PRIME returning LFA_BUSY means another CPU is
running LFA_PRIME concurrently. This driver never issues parallel PRIME,
so this is unexpected; log pr_warn and return so the caller can surface
the anomaly rather than swallowing it in the generic error path.
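The two policies can be contrasted in a small sketch. The status names, their values and the retry budget below are placeholders, not the DEN0147 encodings or the driver's constants; fprintf() stands in for pr_warn(), and the fake firmware exists only to exercise the code.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder status names/values for this sketch, not DEN0147 encodings. */
enum sk_status { SK_SUCCESS, SK_BUSY, SK_CALL_AGAIN, SK_FAILURE };

#define SK_RETRY_BUDGET 5  /* hypothetical; the commit reuses the CALL_AGAIN budget */

/* ACTIVATE: BUSY and CALL_AGAIN are retriable, anything else is terminal. */
static int sk_activate(enum sk_status (*smc)(void *ctx),
		       void (*delay)(void *ctx), void *ctx)
{
	for (int attempt = 0; attempt < SK_RETRY_BUDGET; attempt++) {
		switch (smc(ctx)) {
		case SK_SUCCESS:
			return 0;
		case SK_BUSY:          /* firmware postponed the activation */
		case SK_CALL_AGAIN:
			delay(ctx);
			break;         /* leave the switch, try again */
		default:
			return -1;     /* terminal failure */
		}
	}
	return -2;                     /* retry budget exhausted */
}

/*
 * PRIME: BUSY means a concurrent PRIME, which this driver never issues, so
 * warn and return a distinct error. CALL_AGAIN handling is omitted here.
 */
static int sk_prime(enum sk_status (*smc)(void *ctx), void *ctx)
{
	enum sk_status st = smc(ctx);

	if (st == SK_BUSY) {
		fprintf(stderr, "LFA: unexpected BUSY from PRIME\n");
		return -3;
	}
	return st == SK_SUCCESS ? 0 : -1;
}

/* Simulated firmware: report BUSY a fixed number of times, then succeed. */
struct fake_busy { int busy_left; int delays; };

static enum sk_status fake_smc(void *ctx)
{
	struct fake_busy *f = ctx;

	if (f->busy_left > 0) {
		f->busy_left--;
		return SK_BUSY;
	}
	return SK_SUCCESS;
}

static void fake_delay(void *ctx) { ((struct fake_busy *)ctx)->delays++; }
```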

Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 63cbc12)

…pdates

BugLink: https://bugs.launchpad.net/bugs/2150652

Firmware image directories are plain kobjects under /sys/firmware.
udev coldplug does not enumerate them as devices, so rules matching
the per-image LFA kobjects do not run reliably at boot.

LFA already creates the arm-lfa faux device. Emit KOBJ_CHANGE from
that device after the firmware image tree is refreshed, so user space
can use the existing driver-core device as the notification anchor
for runtime inventory updates. The same udev rule then also covers
coldplug via the device add event, e.g.:

  ACTION=="add|change", SUBSYSTEM=="faux", KERNEL=="arm-lfa", \
          RUN+="/usr/local/sbin/lfa-auto-activate"

Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 74a636e)

Add missing parentheses around macro parameter 'x' in TEGRA_SMCCC_*
macros to prevent operator precedence issues if invoked with expressions.
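A minimal illustration of the precedence problem the fix addresses, using hypothetical macros rather than the real TEGRA_SMCCC_* definitions:

```c
#include <assert.h>

/*
 * Hypothetical macros in the style of the TEGRA_SMCCC_* helpers; the real
 * definitions are not reproduced here.
 */
#define EXAMPLE_SMC_FIELD_BAD(x)   (x << 8)    /* parameter not parenthesized */
#define EXAMPLE_SMC_FIELD_GOOD(x)  ((x) << 8)  /* parameter parenthesized */
```

With a plain argument such as 3 both forms agree. With the argument 1 | 2, the unparenthesized form expands to 1 | 2 << 8, which evaluates to 513 because << binds tighter than |, whereas the parenthesized form yields (1 | 2) << 8 = 768.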

Fixes: 579bc50 ("NVIDIA: VR: SAUCE: soc/tegra: misc: Use SMCCC to get chipid")
Signed-off-by: Saurav Sachidanand <sauravsc@amazon.com>

Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>

@nirmoy nirmoy force-pushed the codex/lfa-v2-6.18 branch from ac518c6 to 37f5dc2 May 18, 2026 12:37

…anup

Unknown firmware image UUIDs should still produce a useful name in sysfs.
Leave the stored image name empty for unknown UUIDs and use the kobject name
as the fallback when reporting the image name.
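A sketch of the name fallback, with hypothetical struct and function names. The single UUID-to-name entry is taken from the TF-RMM example in the sysfs listing of the first LFA commit; the real driver's table is not reproduced.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct fw_image_sketch {
	const char *kobj_name;  /* GUID string, also the sysfs directory name */
	char name[32];          /* left empty for unknown UUIDs */
};

/* Only one known mapping, from the sysfs listing in the first LFA commit. */
static const struct { const char *uuid; const char *name; } known_images[] = {
	{ "6c0762a6-12f2-4b56-92cb-ba8f633606d9", "TF-RMM" },
};

static void fw_image_set_name(struct fw_image_sketch *img)
{
	img->name[0] = '\0';    /* unknown UUID: keep the stored name empty */
	for (size_t i = 0; i < sizeof(known_images) / sizeof(known_images[0]); i++) {
		if (strcmp(img->kobj_name, known_images[i].uuid) == 0) {
			snprintf(img->name, sizeof(img->name), "%s", known_images[i].name);
			break;
		}
	}
}

/* What a "name" attribute would report: stored name, else the kobject name. */
static const char *fw_image_display_name(const struct fw_image_sketch *img)
{
	return img->name[0] ? img->name : img->kobj_name;
}
```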

Also unwind resources allocated during init if a later step fails. Destroy the
workqueue when kset creation fails, destroy the faux device when inventory
initialization fails after the faux device was created, and guard the exit path
against a missing faux device.

Suggested-by: Saurav Sachidanand <sauravsc@amazon.com>
Signed-off-by: Nirmoy Das <nirmoyd@nvidia.com>

@nirmoy nirmoy force-pushed the codex/lfa-v2-6.18 branch from 37f5dc2 to 36f52a7 May 18, 2026 12:53

@nirmoy
Collaborator Author

nirmoy commented May 18, 2026

Boro review

Latest watcher review: open review

Head: 8f082f5dea1c

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.

@nirmoy nirmoy marked this pull request as ready for review May 19, 2026 11:40
@nirmoy
Collaborator Author

nirmoy commented May 19, 2026

Chris validated the 6.18 LFA debs successfully. BPMP FW, RAS FW, and RMM were updated, and the kernel detected NVIDIA VR SAUCE lfa_v2. Logs: http://tegra-bmc-sol.nvidia.com:4002/shared/10_103_171_145_20260519_2106_edfc8682

@clsotog
Collaborator

clsotog commented May 19, 2026

@nirmoy some commits are missing your sign-off, and you may need to remove the last cherry-pick line, for example in 7fdc5c6.

@nvmochs
Collaborator

nvmochs commented May 19, 2026

It looks like these were picked from the 7.0 branch. That is fine, but needs to be accounted for in the pick tag (just need to include the branch name after the SHA).

The reverts are also missing a sign-off and a note that they are being replaced by a newer version of the series.

@jamieNguyenNVIDIA
Collaborator

@nirmoy nirmoy force-pushed the codex/lfa-v2-6.18 branch from 36f52a7 to 8f082f5 May 20, 2026 12:43