
[26.04_linux-nvidia-bos] Backport CXL reset save/restore support #430

Closed
kobak2026 wants to merge 38 commits into NVIDIA:26.04_linux-nvidia-bos from kobak2026:bug-DGX-16137/cxl-backport-26.04-bos-nvpr

Conversation

@kobak2026
Collaborator

@kobak2026 kobak2026 commented May 19, 2026

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-7.0/+bug/2153819

Summary

Backport Srirangan's CXL reset plus DVSEC/HDM save-restore series for the CXL Type-2 stack.

This series:

  • Reverts the older monolithic CXL reset implementation.
  • Adds PCI CXL DVSEC and HDM decoder state save/restore across reset.
  • Adds CXL reset orchestration, sibling coordination, memory quiesce/cache flush handling, and the cxl_reset PCI sysfs interface.
  • Adds ABI documentation for /sys/bus/pci/devices/.../cxl_reset.
  • Preserves PCI_DVSEC_CXL_CACHE_CAPABLE for the existing CXL.cache ATS dependency.

Dependency Note

This PR depends on the CXL Type-2 base branch and is intended to be reviewed/applied after that branch lands.

The branch is currently stacked locally on that dependency; the dependency branch is not pushed to NVIDIA/NV-Kernels.

Validation

  • Full kernel build passed on the remote validation host.
  • Built and booted validation kernel: 7.0.0-dgx16137-cxlreset.
  • Read-only CXL reset baseline passed for all four target BDFs.
  • Destructive cxl_reset validation passed on all four target BDFs.
  • CXLCtl and captured CXL range fields were preserved across reset.
  • CXLSta2 reported ResetComplete+ ResetError-.
  • dmesg deltas were clean; no host/device recovery was required.
  • After rebasing onto the refreshed dependency branch, git diff --check is clean.

@github-actions
Contributor

github-actions Bot commented May 19, 2026

PR Validation Report

Patchscan ✅ No Missing Fixes

All cherry-picked commits checked — no missing upstream fixes found.

PR Lint ❌ Errors found

Details
Checking 38 commits...

Cherry-pick digest:
E: 52975ec822b9 ("NVIDIA: VR: SAUCE: PCI: Add CXL DVSEC re"): backport trailer order: ORDER: move [Name: note] before the backporter Signed-off-by and after (backported from ...)
E: 82adedeffcd7 ("NVIDIA: VR: SAUCE: cxl: Move HDM decoder"): backport trailer order: ORDER: move [Name: note] before the backporter Signed-off-by and after (backported from ...)
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local        │ Referenced upstream / Patch subject                              │ Patch-ID   │ Subject │ SoB chain                 │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 9bf30cf9cb99 │ [SAUCE] documentation: abi: add cxl pci cxl_reset sysfs attribut │ N/A        │ N/A     │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ c6db6ce8e793 │ [SAUCE] cxl: add cxl_reset sysfs interface for pci devices       │ N/A        │ N/A     │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 6e9e72b4b2a2 │ [SAUCE] cxl: add cxl dvsec reset sequence and flow orchestration │ N/A        │ N/A     │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 3a245b9b22c3 │ [SAUCE] cxl: add multi-function sibling coordination for cxl res │ N/A        │ N/A     │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ baa527d0e4a3 │ [SAUCE] cxl: add memory offlining and cache flush helpers        │ N/A        │ N/A     │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d1c9d1b51460 │ [SAUCE] pci: export pci_dev_save_and_disable() and pci_dev_resto │ N/A        │ N/A     │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 52975ec822b9 │ pci: add cxl dvsec reset and capability register definitions     │ noted      │ found   │ ORDER: move [Name: note]  │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 42ade8ec191c │ [SAUCE] pci: add hdm decoder state save/restore                  │ N/A        │ N/A     │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 969356c44c97 │ [SAUCE] pci: add cxl dvsec state save/restore across resets      │ N/A        │ N/A     │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 9a09ee09f074 │ [SAUCE] pci: add virtual extended cap save buffer for cxl state  │ N/A        │ N/A     │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 82adedeffcd7 │ cxl: move hdm decoder and register map definitions to include/cx │ no-match   │ not fou │ ORDER: move [Name: note]  │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 07705ee1b952 │ [SAUCE] pci: add cxl dvsec control, lock, and range register def │ N/A        │ N/A     │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ e90b1ee9e1bf │ [SAUCE] revert "nvidia: vr: sauce: cxl: add support for cxl rese │ N/A        │ N/A     │ kobak                     │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 5268ad904265 │ [SAUCE] [config] add pci_cxl annotation for cxl state save/resto │ N/A        │ N/A     │ jan, bfigg, kobak         │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ e3aadf637883 │ [SAUCE] [config] enable cxl dax and kmem built-in for cxl memory │ N/A        │ N/A     │ jan, bfigg, kobak         │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ a2eba9dce2b6 │ [SAUCE] [config] cxl config annotations for type-2 device and ra │ N/A        │ N/A     │ jan, bfigg, kobak         │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 7a6aebb18177 │ cxl/region: support multi-level interleaving with smaller granul │ noted      │ found   │ ok, backporter: kobak     │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 8cb4feab45af │ [SAUCE] dax/hmem: reintroduce soft reserved ranges back into the │ N/A        │ N/A     │ schofiel, lizhijia, Koral │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ c0db8c8c8fda │ [SAUCE] dax/hmem, cxl: defer and resolve ownership of soft reser │ N/A        │ N/A     │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 2e433461671f │ [SAUCE] dax: add deferred-work helpers for dax_hmem and dax_cxl  │ N/A        │ N/A     │ Koralaha, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 3a6c6f56dc2c │ cxl/region: add helper to check soft reserved containment by cxl │ noted      │ found   │ ok, backporter: kobak     │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 3a52fc67323f │ [SAUCE] dax: track all dax_region allocations under a global res │ N/A        │ N/A     │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 79356d9fd78d │ [SAUCE] dax/cxl, hmem: initialize hmem early and defer dax_cxl b │ N/A        │ N/A     │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d8ce89e1116e │ [SAUCE] cxl/region: skip decoder reset on detach for autodiscove │ N/A        │ N/A     │ Koralaha, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 77ac7f9272ba │ [SAUCE] dax/hmem: gate soft reserved deferral on dev_dax_cxl     │ N/A        │ N/A     │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 1d651dbee5bc │ [SAUCE] dax/hmem: request cxl_acpi and cxl_pci before walking so │ N/A        │ N/A     │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 0fdb2010fcfc │ sfc: support pio mapping based on cxl                            │ noted      │ found   │ ok, backporter: kobak     │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d6acd1ddf9f6 │ [SAUCE] cxl: avoid dax creation for accelerators                 │ N/A        │ N/A     │ alucerop, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 388dc8c0c90a │ cxl: attach region to an accelerator/type2 memdev                │ noted      │ found   │ ok, backporter: kobak     │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 9c9123639c35 │ [SAUCE] sfc: create type2 cxl memdev                             │ N/A        │ N/A     │ alucerop, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 79654508fdaf │ [SAUCE] cxl: prepare memdev creation for type2                   │ N/A        │ N/A     │ alucerop, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ cb8c41246ed1 │ [SAUCE] cxl/sfc: initialize dpa without a mailbox                │ N/A        │ N/A     │ alucerop, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ ba9972ccf965 │ cxl/sfc: map cxl regs                                            │ noted      │ found   │ ok, backporter: kobak     │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 513a2a51e219 │ [SAUCE] sfc: add cxl support                                     │ N/A        │ N/A     │ alucerop, kobak           │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d3059e8eab00 │ d537d953c478 cxl/pci: Remove redundant cxl_pci_find_port() call  │ match      │ match   │ preserved + kobak added   │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 8b61bd238bbb │ 58f28930c7fb cxl: Move pci generic code from cxl_pci to core/cxl │ match      │ match   │ preserved + kobak added   │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 2bc7fd2fd558 │ 005869886d1d cxl: export internal structs for external Type2 dri │ match      │ match   │ preserved + kobak added   │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 645dd5c6be9f │ 9a775c07bb04 cxl: support Type2 when initializing cxl_dev_state  │ match      │ match   │ preserved + kobak added   │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘

Lint: all checks passed.

@nirmoy
Collaborator

nirmoy commented May 19, 2026

Boro review

Latest watcher review: open review

Head: 9bf30cf9cb99

This comment is maintained by nv-pr-bot. It is updated when the GitHub watcher publishes a newer review.

@nvmochs
Collaborator

nvmochs commented May 19, 2026

@kobak2026 Two comments...

0d3aece NVIDIA: VR: SAUCE: cxl: Add CXL DVSEC reset sequence and flow orchestration

cxl_reset_prepare_memdev() only checks !endpoint before walking &endpoint->dev, but cxlmd->endpoint is initialized to ERR_PTR(-ENXIO) until endpoint attach succeeds. A memdev can exist with endpoint still an error pointer, so writing 1 to cxl_reset can dereference an encoded error pointer. Please match cxl_reset_flush_cpu_caches() and use if (!endpoint || IS_ERR(endpoint)) return 0;.
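The point about encoded error pointers can be illustrated outside the kernel. The following is a minimal userspace sketch, assuming stand-in helpers (`err_ptr()`, `is_err()`, `struct endpoint_sketch`, `endpoint_usable()`, and `ENXIO_SKETCH` are all hypothetical names modeled on the kernel's `ERR_PTR()`/`IS_ERR()` convention); it is not the code from this series.

```c
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR() convention:
 * small negative errno values are encoded at the very top of the
 * address space, so they are non-NULL but must never be dereferenced. */
#define MAX_ERRNO 4095
#define ENXIO_SKETCH 6 /* hypothetical stand-in for ENXIO */

static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct endpoint_sketch {
	int id;
};

/* Returns 1 only when the pointer is safe to dereference. A bare
 * "!endpoint" test would wrongly accept an encoded error pointer. */
static int endpoint_usable(const struct endpoint_sketch *endpoint)
{
	if (!endpoint || is_err(endpoint))
		return 0;
	return 1;
}
```

Calling `endpoint_usable(err_ptr(-ENXIO_SKETCH))` returns 0 even though the pointer is not NULL, which is the case the review asks the reset path to handle.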


fb0ce0b NVIDIA: SAUCE: Revert "NVIDIA: VR: SAUCE: cxl: add support for cxl reset\”

Is there a reason why the original commit’s SHA is absent from the commit message?

Nit: Why are there backslashes in the title?

@clsotog
Collaborator

clsotog commented May 19, 2026

Same comment as Matt for fb0ce0b

And extra comment
drivers/cxl/core/pci.c:1126-1137 / drivers/cxl/core/pci.c:1154-1155
If krealloc() fails while collecting sibling CXL functions, the callback returns 1, but pci_walk_bus() returns void and discards that failure. The reset then proceeds at drivers/cxl/core/pci.c:1336-1339 with only a partial sibling list quiesced. For a reset that affects all CXL cache/mem functions, that can leave an active sibling running through the reset.
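One common way to avoid losing a callback failure behind a void-returning walker is to record the error in the walk context and check it afterwards. The sketch below is a userspace model under that assumption; `walk_sketch()` stands in for a walker like pci_walk_bus(), and `struct sibling_ctx`, `collect_sibling()`, `collect_siblings()`, and the `fail_at` test hook are hypothetical names, not the actual fix in this branch.

```c
#include <errno.h>
#include <stdlib.h>

struct sibling_ctx {
	int *ids;       /* collected sibling identifiers */
	size_t count;
	int rc;         /* first error seen by the callback, 0 if none */
	size_t fail_at; /* test hook: simulate allocation failure at this index */
};

/* Stand-in for a void-returning walker: a non-zero callback return
 * stops the walk, but the value itself never reaches the caller. */
static void walk_sketch(const int *devs, size_t n,
			int (*cb)(int dev, void *data), void *data)
{
	for (size_t i = 0; i < n; i++)
		if (cb(devs[i], data))
			break;
}

static int collect_sibling(int dev, void *data)
{
	struct sibling_ctx *ctx = data;
	int *grown;

	if (ctx->count == ctx->fail_at)
		grown = NULL; /* simulated allocation failure */
	else
		grown = realloc(ctx->ids, (ctx->count + 1) * sizeof(*grown));
	if (!grown) {
		ctx->rc = -ENOMEM; /* remember the failure in the context */
		return 1;          /* and stop the walk */
	}
	ctx->ids = grown;
	ctx->ids[ctx->count++] = dev;
	return 0;
}

/* Returns 0 with a complete list, or a negative errno with no list,
 * so the caller can abort instead of using a partial sibling set. */
static int collect_siblings(struct sibling_ctx *ctx, const int *devs, size_t n)
{
	walk_sketch(devs, n, collect_sibling, ctx);
	if (ctx->rc) {
		free(ctx->ids);
		ctx->ids = NULL;
		ctx->count = 0;
	}
	return ctx->rc;
}
```

A caller would check the return value of `collect_siblings()` and bail out before quiescing anything when it is non-zero.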

Comment thread drivers/cxl/core/pci.c Outdated
Comment thread drivers/cxl/core/region.c Outdated
Comment thread drivers/cxl/core/pci.c Outdated
@kobak2026 kobak2026 force-pushed the bug-DGX-16137/cxl-backport-26.04-bos-nvpr branch 3 times, most recently from 1e17142 to 0d087a2 Compare May 20, 2026 03:27
@kobak2026
Collaborator Author

kobak2026 commented May 20, 2026

@nirmoy
Pushback items

  • drivers/cxl/mem.c: to_cxl_memdev_state() already checks cxlds->type != CXL_DEVTYPE_CLASSMEM and returns NULL before container_of(). The poison debugfs/sysfs guards are effective for DEVMEM devices.
  • drivers/net/ethernet/sfc/efx_cxl.c: devm_cxl_dev_state_create() wraps _devm_cxl_dev_state_create(), which uses devm_kzalloc() and returns NULL, not ERR_PTR; if (!cxl) is correct.
  • drivers/net/ethernet/sfc/efx_cxl.c: the same allocation is devm-managed, so early returns after allocation do not leak cxl.
  • drivers/net/ethernet/sfc/efx_cxl.c: I did not find a runtime efx_ef10_dimension_resources() path racing CXL region removal; current unmap clears queue piobuf under TX locks. If there is a concrete reset/resource-redimension caller, point me at it and we can lock that path.
  • drivers/cxl/core/region.c: devm_cxl_add_region() is only called with CXL_DECODER_HOSTONLYMEM in the current tree, so the non-HOSTONLYMEM no-attach leak is not reachable. The real attach failure cleanup path was fixed.

@kobak2026
Collaborator Author

@jamieNguyenNVIDIA @clsotog @nvmochs
Updated the branch for the latest review feedback.

Changes:

  • Propagated sibling collection allocation failures so reset aborts instead of proceeding with a partial sibling list.
  • Added the missing ERR_PTR-safe endpoint handling and kept the endpoint lock held across reset preparation.
  • Fixed the NULL callback path in CXL memory offlining.
  • Adjusted the region attach cleanup path to avoid taking the region write lock while the read lock is held.
  • Updated the revert commit message with the original reverted SHA and cleaned up the subject.

Validation:

  • Full remote kernel build passed after these fixes.
  • kernelrelease: 7.0.0-dgx16137-cxlreset
  • No install/reboot/reset rerun was done for this update.

@jamieNguyenNVIDIA
Collaborator

@kobak2026: The code changes are looking good now.

The only remaining issues I can spot have to do with these 8 commit messages:

8 commits missing provenance trailers
                                                                                                                        
  777eae3dfe6e  Dan Williams         dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved
  abe65b93d89d  Dan Williams         dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
  cacbb42691fc  Smita Koralahalli    cxl/region: Skip decoder reset on detach for autodiscovered regions
  300526c8b348  Dan Williams         dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding                     
  6b90d4fa5cdd  Smita Koralahalli    dax: Track all dax_region allocations under a global resource tree                 
  6001a57f1ad5  Smita Koralahalli    dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination
  2dcd553bf5df  Smita Koralahalli    dax/hmem, cxl: Defer and resolve ownership of Soft Reserved
  827f5acfea4f  Smita Koralahalli    dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree

These need a (backported/cherry-picked from ) trailer added.

@kobak2026 kobak2026 force-pushed the bug-DGX-16137/cxl-backport-26.04-bos-nvpr branch from 0d087a2 to b9745d6 Compare May 20, 2026 05:49
@kobak2026
Collaborator Author

@jamieNguyenNVIDIA thanks. fixed.

alucerop and others added 9 commits May 20, 2026 14:21
In preparation for type2 drivers add function and macro for
differentiating CXL memory expanders (type 3) from CXL device
accelerators (type 2) helping drivers built from public headers
to embed struct cxl_dev_state inside a private struct.

Update type3 driver for using this same initialization.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260306164741.3796372-2-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 9a775c0)
Signed-off-by: Koba Ko <kobak@nvidia.com>
In preparation for type2 support, move structs and functions a type2
driver will need to access to into a new shared header file.

Differentiate between public and private data to be preserved by type2
drivers.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260306164741.3796372-3-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 0058698)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Inside cxl/core/pci.c there are helpers for CXL PCIe initialization
meanwhile cxl/pci_drv.c implements the functionality for a Type3 device
initialization.

In preparation for type2 support, move helper functions from cxl/pci.c to
cxl/core/pci.c in order to be exported and used by type2 drivers.

[ dj: Clarified subject. ]

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://patch.msgid.link/20260306164741.3796372-4-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 58f2893)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Remove the redundant port lookup from cxl_rcrb_get_comp_regs() and use the
dport parameter directly. The caller has already validated the port is
non-NULL before invoking this function, and dport is given as a param.
This is simpler than getting dport in the callee and return the pointer
to the caller what would require more changes.

Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://patch.msgid.link/20260306164741.3796372-5-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit d537d95)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Add CXL initialization based on new CXL API for accel drivers and make
it dependent on kernel CXL configuration.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-2-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Export cxl core functions for a Type2 driver being able to discover and
map the device registers.

Use it in sfc driver cxl initialization.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
(backported from https://lore.kernel.org/r/20260423180528.17166-3-alejandro.lucero-palau@amd.com)
[kobak: Kept cxl_pci_setup_regs() in the core/pci provider added by the full Type2 prerequisite series and dropped the duplicate provider hunk from drivers/cxl/pci.c.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing
memdev state params which end up being used for DPA initialization.

Allow a Type2 driver to initialize DPA simply by giving the size of its
volatile hardware partition.

Move related functions to memdev.

Add sfc driver as the client.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-4-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Current cxl core is relying on a CXL_DEVTYPE_CLASSMEM type device when
creating a memdev leading to problems when obtaining cxl_memdev_state
references from a CXL_DEVTYPE_DEVMEM type.

Modify check for obtaining cxl_memdev_state adding CXL_DEVTYPE_DEVMEM
support.

Make devm_cxl_add_memdev accessible from an accel driver.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-5-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
Use cxl API for creating a cxl memory device using the type2
cxl_dev_state struct.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-6-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
@jamieNguyenNVIDIA
Collaborator

jamieNguyenNVIDIA commented May 20, 2026

Acked-by: Jamie Nguyen <jamien@nvidia.com>

@clsotog clsotog self-requested a review May 20, 2026 15:08
Collaborator

@clsotog clsotog left a comment

Acked-by: Carol L Soto <csoto@nvidia.com>

@nvmochs
Collaborator

nvmochs commented May 20, 2026

@kobak2026 Thanks for addressing my feedback from previous review.

I re-reviewed the latest changes with codex, and it flagged the locking changes made in response to Carol's and Jamie's feedback: they can now trigger a deadlock.

Here are the details:

There are two separate changes in drivers/cxl/core/pci.c. Here is the expanded before/after.

Before

Old cxl_do_reset() in 1877f75 looked like this:

  memdev = bus_find_device(...);
  if (memdev) {
          cxlmd = to_cxl_memdev(memdev);
          guard(device)(&cxlmd->dev);
  } /* cxlmd->dev lock is released here */

  mutex_lock(&cxl_reset_mutex);
  pci_dev_lock(pdev);

  if (cxlmd) {
          rc = cxl_reset_prepare_memdev(cxlmd);
          if (rc)
                  goto out_unlock;

          cxl_reset_flush_cpu_caches(cxlmd);
  }

  pci_dev_save_and_disable(pdev);
  cxl_pci_functions_reset_prepare(&ctx);

  rc = cxl_dev_reset(pdev, dvsec);

  cxl_pci_functions_reset_done(&ctx);
  pci_dev_restore(pdev);

Two key points:

  • The guard(device)(&cxlmd->dev) was scoped only to the if (memdev) { ... } block, so it was released before reset began.
  • Memory offlining / cache flush happened before pci_dev_save_and_disable(pdev).

After

New code in b9745d6 split this into cxl_do_reset() and __cxl_do_reset():

  static int cxl_do_reset(struct pci_dev *pdev)
  {
          ...
          struct cxl_memdev *cxlmd = to_cxl_memdev(memdev);

          guard(device)(&cxlmd->dev);
          return __cxl_do_reset(pdev, cxlmd, dvsec);
  }

Because this guard is in function scope, cxlmd->dev stays locked until __cxl_do_reset() returns.

  mutex_lock(&cxl_reset_mutex);
  pci_dev_lock(pdev);

  pci_dev_save_and_disable(pdev);
  rc = cxl_pci_functions_reset_prepare(&ctx);
  if (rc)
          goto out_restore;

  if (cxlmd) {
          rc = cxl_reset_prepare_memdev(cxlmd);
          if (rc)
                  goto out_reset_done;

          cxl_reset_flush_cpu_caches(cxlmd);
  }

  rc = cxl_dev_reset(pdev, dvsec);

  out_reset_done:
          cxl_pci_functions_reset_done(&ctx);

  out_restore:
          pci_dev_restore(pdev);

Issue 1: self-deadlock

The new outer guard(device)(&cxlmd->dev) is still held when pci_dev_restore(pdev) runs.

pci_dev_restore() calls the PCI driver’s reset_done() callback:

  if (err_handler && err_handler->reset_done)
          err_handler->reset_done(dev);

For the CXL PCI driver, that is cxl_reset_done(), which does:

guard(device)(&cxlmd->dev);

So the same thread already holds cxlmd->dev from cxl_do_reset(), then tries to lock it again inside cxl_reset_done(). That deadlocks.
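The relock pattern can be modeled in a few lines of userspace C. This is only a sketch: `struct toy_lock` is a hypothetical single-threaded stand-in for a non-recursive mutex that reports `-EDEADLK` where a real mutex would block forever, and the function names are illustrative rather than taken from the driver.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy non-recursive lock: reports a relock instead of hanging. */
struct toy_lock {
	bool held;
};

static int toy_lock_acquire(struct toy_lock *lock)
{
	if (lock->held)
		return -EDEADLK; /* a real mutex would block forever here */
	lock->held = true;
	return 0;
}

static void toy_lock_release(struct toy_lock *lock)
{
	lock->held = false;
}

/* Models a reset_done() callback that takes the memdev lock itself. */
static int reset_done_sketch(struct toy_lock *memdev_lock)
{
	int rc = toy_lock_acquire(memdev_lock);

	if (rc)
		return rc;
	toy_lock_release(memdev_lock);
	return 0;
}

/* Outer lock still held while the restore callback runs. */
static int reset_lock_held_across_restore(struct toy_lock *memdev_lock)
{
	int rc = toy_lock_acquire(memdev_lock);

	if (rc)
		return rc;
	rc = reset_done_sketch(memdev_lock); /* relock by the same caller */
	toy_lock_release(memdev_lock);
	return rc;
}

/* Outer lock released before the restore callback runs. */
static int reset_lock_scoped_to_prepare(struct toy_lock *memdev_lock)
{
	int rc = toy_lock_acquire(memdev_lock);

	if (rc)
		return rc;
	/* endpoint-dependent preparation would run here */
	toy_lock_release(memdev_lock);
	return reset_done_sketch(memdev_lock);
}
```

With a fresh lock, `reset_lock_held_across_restore()` reports `-EDEADLK`, while `reset_lock_scoped_to_prepare()` completes, which mirrors the difference between the function-scoped guard and a guard limited to the preparation step.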

Issue 2: reset ordering

The old flow was:

offline memory
flush CPU caches
disable PCI function
disable siblings
reset
restore

The new flow is:

disable PCI function
disable siblings
offline memory
flush CPU caches
reset
restore

The problem is that pci_dev_save_and_disable(pdev) clears PCI Command, including memory decode and bus master. That now happens before cxl_reset_prepare_memdev() tries to offline CXL-backed memory. If that memory is still online, this is backwards: the CXL memory path should be quiesced before disabling/resetting the device that backs it.

So the issue resides in the new refactor around cxl_do_reset() / __cxl_do_reset() in drivers/cxl/core/pci.c: Koba fixed the endpoint race by extending the memdev lock, but extended it too far, and also moved pci_dev_save_and_disable() ahead of memory preparation.

Instead, the memdev lock should be kept only around the endpoint-dependent preparation, not around the entire reset/restore flow.

A better shape would be:

  mutex_lock(&cxl_reset_mutex);
  pci_dev_lock(pdev);

  if (cxlmd) {
          guard(device)(&cxlmd->dev);

          rc = cxl_reset_prepare_memdev(cxlmd);
          if (rc)
                  goto out_unlock;

          cxl_reset_flush_cpu_caches(cxlmd);
  } /* cxlmd->dev lock releases here */

  pci_dev_save_and_disable(pdev);

  rc = cxl_pci_functions_reset_prepare(&ctx);
  if (rc)
          goto out_restore;

  rc = cxl_dev_reset(pdev, dvsec);

  cxl_pci_functions_reset_done(&ctx);

  out_restore:
          pci_dev_restore(pdev);

  out_unlock:
          pci_dev_unlock(pdev);
          mutex_unlock(&cxl_reset_mutex);

But one more detail matters: if sibling prepare succeeds, cxl_pci_functions_reset_done() must run. So the final structure should track whether siblings were prepared:

  bool siblings_prepared = false;

  mutex_lock(&cxl_reset_mutex);
  pci_dev_lock(pdev);

  if (cxlmd) {
          guard(device)(&cxlmd->dev);

          rc = cxl_reset_prepare_memdev(cxlmd);
          if (rc)
                  goto out_unlock;

          cxl_reset_flush_cpu_caches(cxlmd);
  }

  pci_dev_save_and_disable(pdev);

  rc = cxl_pci_functions_reset_prepare(&ctx);
  if (rc)
          goto out_restore;

  siblings_prepared = true;

  rc = cxl_dev_reset(pdev, dvsec);

  if (siblings_prepared)
          cxl_pci_functions_reset_done(&ctx);

  out_restore:
          pci_dev_restore(pdev);

  out_unlock:
          pci_dev_unlock(pdev);
          mutex_unlock(&cxl_reset_mutex);

Conceptually:

  • Keep cxlmd->dev locked while reading/walking cxlmd->endpoint.
  • Release cxlmd->dev before pci_dev_restore(), because reset_done() may lock it.
  • Keep memory offline/cache flush before pci_dev_save_and_disable().
  • Still propagate sibling collection failure and abort before reset.
  • Restore the target PCI function on any path after pci_dev_save_and_disable().

So the fix is not “don’t hold the memdev lock”; it is “hold it only across cxl_reset_prepare_memdev() and cxl_reset_flush_cpu_caches(), then release it before PCI restore callbacks run.”
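The ordering and unwinding rules listed above can be checked with a small userspace model. This is a sketch under stated assumptions: each step only appends a letter to a log, failure injection via `fail_at` is a test hook, all names (`struct flow_sketch`, `step()`, `reset_flow_sketch()`) are hypothetical, and the reset mutex and PCI device lock are left out for brevity.

```c
#include <stddef.h>

struct flow_sketch {
	char log[16]; /* one letter per executed step, NUL-terminated */
	size_t len;
	char fail_at; /* step letter that should fail, or 0 for none */
};

/* Records the step and reports failure if it was selected to fail. */
static int step(struct flow_sketch *f, char name)
{
	if (f->len < sizeof(f->log) - 1)
		f->log[f->len++] = name;
	return f->fail_at == name ? -1 : 0;
}

static int reset_flow_sketch(struct flow_sketch *f)
{
	int rc;

	/* memdev lock would be taken here */
	rc = step(f, 'O'); /* offline CXL-backed memory */
	if (rc)
		return rc; /* device not yet disabled: nothing to restore */
	step(f, 'F');      /* flush CPU caches */
	/* memdev lock would be released here, before any restore callback */

	step(f, 'D');      /* save and disable the PCI function */
	rc = step(f, 'S'); /* prepare sibling functions */
	if (rc)
		goto out_restore; /* siblings not prepared: skip their cleanup */

	rc = step(f, 'R'); /* issue the reset */
	step(f, 'U');      /* sibling reset-done, runs even if the reset failed */

out_restore:
	step(f, 'P');      /* restore the PCI function on every path after 'D' */
	return rc;
}
```

A clean run logs `OFDSRUP`; a sibling-prepare failure logs `OFDSP`, showing that the target function is still restored while sibling cleanup is skipped.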

@kobak2026 kobak2026 force-pushed the bug-DGX-16137/cxl-backport-26.04-bos-nvpr branch from b9745d6 to 94ac72f Compare May 21, 2026 02:29
@kobak2026
Collaborator Author

@nvmochs thanks
Fixed in the latest push. I folded the change into the CXL reset flow commit.
Summary:

  • Scoped the cxlmd->dev lock only around endpoint-dependent memory preparation
    and CPU cache flush.
  • Released that lock before pci_dev_restore() can invoke the CXL
    reset_done() callback.
  • Restored the reset ordering so CXL memory offlining/cache flush happens before
    pci_dev_save_and_disable().
  • Kept sibling reset cleanup conditional on successful sibling prepare.

Validation:

  • Pre-push arm64 nvidia-bos whole-kernel build passed on the pushed head.

@jamieNguyenNVIDIA
Collaborator

My ACK still stands:

Acked-by: Jamie Nguyen <jamien@nvidia.com>

@clsotog
Collaborator

clsotog commented May 21, 2026

Acked-by: Carol L Soto <csoto@nvidia.com>

skoralah and others added 5 commits May 21, 2026 23:30
…to the iomem tree

Reworked from a patch by Alison Schofield <alison.schofield@intel.com>

Reintroduce Soft Reserved range into the iomem_resource tree for HMEM
to consume.

This restores visibility in /proc/iomem for ranges actively in use, while
avoiding the early-boot conflicts that occurred when Soft Reserved was
published into iomem before CXL window and region discovery.

Link: https://lore.kernel.org/linux-cxl/29312c0765224ae76862d59a17748c8188fb95f1.1692638817.git.alison.schofield@intel.com/
Co-developed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Co-developed-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-10-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-10-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
…smaller granularities for lower levels

The CXL specification supports multi-level interleaving "as long as
all the levels use different, but consecutive, HPA bits to select the
target and no Interleave Set has more than 8 devices" (from 3.2).

Currently the kernel expects that a decoder's "interleave granularity
is a multiple of @parent_port granularity". That is, the granularity
of a lower level is bigger than that of its parent, and the lower level
uses the outer HPA bits as selector. This works, for example, for the
following 8-way config:

 * cross-link (cross-hostbridge config in CFMWS):
   * 4-way
   * 256 granularity
   * Selector: HPA[8:9]
 * sub-link (CXL Host bridge config of the HDM):
   * 2-way
   * 1024 granularity
   * Selector: HPA[10]

Now, if the outer HPA bits are used for the cross-hostbridge, an 8-way
config could look like this:

 * cross-link (cross-hostbridge config in CFMWS):
   * 4-way
   * 512 granularity
   * Selector: HPA[9:10]
 * sub-link (CXL Host bridge config of the HDM):
   * 2-way
   * 256 granularity
   * Selector: HPA[8]

The enumeration of decoders for this configuration then fails with the
following error:

 cxl region0: pci0000:00:port1 cxl_port_setup_targets expected iw: 2 ig: 1024 [mem 0x10000000000-0x1ffffffffff flags 0x200]
 cxl region0: pci0000:00:port1 cxl_port_setup_targets got iw: 2 ig: 256 state: enabled 0x10000000000:0x1ffffffffff
 cxl_port endpoint12: failed to attach decoder12.0 to region0: -6

Note that this happens only if firmware is setting up the decoders
(CXL_REGION_F_AUTO). For userspace region assembly the granularities
are chosen to increase from root down to the lower levels. That is,
outer HPA bits are always used for lower interleaving levels.

Rework the implementation to also support multi-level interleaving
with smaller granularities for lower levels. Determine the interleave
set of autodetected decoders. Check that it is a subset of the root
interleave.

The HPA selector bits are extracted for all decoders of the set and
checked to ensure that they do not overlap and are consecutive. All
decoders can now be programmed to use any bit range within the
region's target selector.
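
As a rough illustration of the selector-bit check described above (not the patch's actual implementation), the sketch below derives a selector mask from an interleave level's ways and granularity, then tests that a set of levels uses non-overlapping, consecutive HPA bits. The function names are hypothetical, and only power-of-two ways and granularities are assumed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Integer log2 for power-of-two inputs (assumption of this sketch). */
static unsigned int ilog2_u32(unsigned int v)
{
	unsigned int n = 0;

	while (v >>= 1)
		n++;
	return n;
}

/* log2(ways) selector bits, starting at bit log2(granularity). */
uint64_t selector_mask(unsigned int ways, unsigned int granularity)
{
	return ((UINT64_C(1) << ilog2_u32(ways)) - 1) << ilog2_u32(granularity);
}

/* True if no two masks overlap and their union is one contiguous bit run. */
bool selectors_consecutive(const uint64_t *masks, size_t n)
{
	uint64_t all = 0;

	for (size_t i = 0; i < n; i++) {
		if (all & masks[i])
			return false;
		all |= masks[i];
	}
	if (!all)
		return false;

	/* Drop trailing zeros; a contiguous run x then has (x & (x + 1)) == 0. */
	while (!(all & 1))
		all >>= 1;
	return (all & (all + 1)) == 0;
}
```

With the two 8-way configurations from the commit message, 4-way at 256 granularity yields a mask for HPA[8:9] and 2-way at 1024 yields HPA[10]; the swapped layout (4-way at 512, 2-way at 256) yields HPA[9:10] and HPA[8]. Both pass this check, since the order of the levels does not matter to it.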

Signed-off-by: Robert Richter <rrichter@amd.com>
(backported from https://lore.kernel.org/all/20251028094754.72816-1-rrichter@amd.com/)
[kobak: resolved conflicts with cxlr->cxlrd and spa_maps_hpa()]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…and RAS support

BugLink: https://bugs.launchpad.net/bugs/2143032

Source: NVIDIA@f80636d

Add Ubuntu kernel config annotations for CXL-related configs introduced
or changed by the CXL Type-2, RAS, and autodiscovered-region support
backports.

CONFIG_CXL_BUS, CONFIG_CXL_PCI, CONFIG_CXL_MEM, and CONFIG_CXL_PORT are
built in for Type-2 device support. CONFIG_CXL_RAS and the EINJ symbols
cover CXL RAS/error-injection support. CONFIG_SFC_CXL remains disabled
for NVIDIA platforms.

Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit f80636d nv-kernels/24.04_linux-nvidia-6.17-next)
[kobak: Backported annotation overrides from debian.nvidia-6.17 to debian.nvidia-bos; PCIEAER_CXL is overridden as removed instead of editing debian.master.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…memory access

BugLink: https://bugs.launchpad.net/bugs/2143032

Source: NVIDIA@c5c11cf

Override debian.master policy for DEV_DAX, DEV_DAX_CXL, and
DEV_DAX_KMEM so CXL memory regions are available as raw DAX devices and
as hotplugged System-RAM without relying on module load ordering.

Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit c5c11cf nv-kernels/24.04_linux-nvidia-6.17-next)
[kobak: Backported annotation overrides from debian.nvidia-6.17 to debian.nvidia-bos.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…/restore

BugLink: https://bugs.launchpad.net/bugs/2143032

Source: NVIDIA@a5544cb

Add Ubuntu kernel config annotation for CONFIG_PCI_CXL introduced by
the CXL DVSEC and HDM state save/restore series.

CONFIG_PCI_CXL is a hidden bool auto-enabled when CXL_BUS=y. It gates
compilation of drivers/pci/cxl.o, which saves and restores CXL DVSEC
control/range registers and HDM decoder state across PCI resets and
link transitions.

Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit a5544cb nv-kernels/24.04_linux-nvidia-6.17-next)
[kobak: Backported annotation override from debian.nvidia-6.17 to debian.nvidia-bos.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
@kobak2026 kobak2026 force-pushed the bug-DGX-16137/cxl-backport-26.04-bos-nvpr branch from 94ac72f to 2462338 Compare May 21, 2026 17:30
@kobak2026
Collaborator Author

@nvmochs I rebased to PR1

Verified your finding. cxlmd->endpoint can still be
ERR_PTR(-ENXIO), and cxl_reset_done() was only checking for
non-NULL before calling cxl_endpoint_decoder_reset_detected().
That made the restore/reset_done path still vulnerable even
after the reset prepare/cache flush paths were fixed.

Fixed in the reset orchestration commit by:

• reading cxlmd->endpoint under guard(device)(&cxlmd->dev)
• returning early on !endpoint || IS_ERR(endpoint)
• only calling cxl_endpoint_decoder_reset_detected() for a
valid endpoint
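
A minimal userspace sketch of that guard, assuming the usual kernel convention that error pointers occupy the top 4095 addresses. The `err_ptr()`/`is_err()` helpers below are local reimplementations, and `struct memdev`, `struct endpoint`, and `sim_reset_done()` are hypothetical stand-ins for the real types and callback.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer, as the kernel's ERR_PTR() does. */
static inline void *err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if ptr lies in the error range, i.e. the last MAX_ERRNO addresses. */
static inline bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct endpoint {
	int decoders_reset;	/* counts simulated reset-detection calls */
};

struct memdev {
	struct endpoint *endpoint;	/* may be NULL, an error pointer, or valid */
};

/* Returns true only if decoder reset detection ran on a valid endpoint. */
bool sim_reset_done(struct memdev *md)
{
	/* The real code reads this field under the memdev device lock. */
	struct endpoint *ep = md->endpoint;

	if (!ep || is_err(ep))
		return false;

	ep->decoders_reset++;
	return true;
}
```

A non-NULL check alone would let `err_ptr(-ENXIO)` through, which is the case the fix closes.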

I also amended the related commit description with:

[koba: Guard reset_done() against NULL/ERR_PTR memdev
endpoints before decoder reset detection.]

Validation:

• focused static guard/order check: PASS
• git diff --check: PASS
• scripts/checkpatch.pl --strict --terse: PASS
• remote arm64 nvidia-bos full build: PASS

Pushed in 2462338.

@jamieNguyenNVIDIA
Collaborator

Re-adding my ACK. Matt's comment looks to be addressed in v4.

Acked-by: Jamie Nguyen <jamien@nvidia.com>

@nvmochs
Collaborator

nvmochs commented May 21, 2026

@kobak2026

Looks like these were mistakenly left in during the rebase?

7d436d86bf5a NVIDIA: SAUCE: dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
bd27785772fa NVIDIA: SAUCE: dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination

kobak2026 and others added 13 commits May 22, 2026 02:24
…set"

This reverts commit 4f089d9.

Drop the older monolithic CXL reset method before applying the refreshed
save/restore and reset series.

Signed-off-by: Koba Ko <kobak@nvidia.com>
…er definitions

BugLink: https://bugs.launchpad.net/bugs/2143032

PCI: Add CXL DVSEC control, lock, and range register definitions

Add register offset and field definitions for CXL DVSEC registers needed
by CXL state save/restore across resets:

  - CTRL2 (offset 0x10) and LOCK (offset 0x14) registers
  - CONFIG_LOCK bit in the LOCK register
  - RWL (read-write-when-locked) field masks for CTRL and range base
    registers.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 07ad5f1 nv-kernels/24.04_linux-nvidia-6.17-next)
Signed-off-by: Koba Ko <kobak@nvidia.com>
… to include/cxl/cxl.h

BugLink: https://bugs.launchpad.net/bugs/2143032

Move CXL HDM decoder register defines, register map structs
(cxl_reg_map, cxl_component_reg_map, cxl_device_reg_map,
cxl_pmu_reg_map, cxl_register_map), cxl_hdm_decoder_count(),
enum cxl_regloc_type, and cxl_find_regblock()/cxl_setup_regs()
declarations from internal CXL headers to include/cxl/cxl.h.

This makes them accessible to code outside the CXL subsystem, in
particular the PCI core CXL state save/restore support added in a
subsequent patch.

No functional change.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit b5e166c nv-kernels/24.04_linux-nvidia-6.17-next)
[koba: Also move CXL_CM_CAP_CAP_ID_RAS, CXL_CM_CAP_CAP_ID_HDM, and CXL_CM_CAP_CAP_HDM_VERSION into public include/cxl/cxl.h to keep the public CXL header layout consistent.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…state

BugLink: https://bugs.launchpad.net/bugs/2143032

Add pci_add_virtual_ext_cap_save_buffer() to allocate save buffers
using virtual cap IDs (above PCI_EXT_CAP_ID_MAX) that don't require
a real capability in config space.

The existing pci_add_ext_cap_save_buffer() cannot be used for
CXL DVSEC state because it calls pci_find_saved_ext_cap()
which searches for a matching capability in PCI config space.
The CXL state saved here is a synthetic snapshot (DVSEC+HDM)
and should not be tied to a real extended-cap instance. A
virtual extended-cap save buffer API (cap IDs above
PCI_EXT_CAP_ID_MAX) allows PCI to track this state without
a backing config space capability.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit b3a768f nv-kernels/24.04_linux-nvidia-6.17-next)
Signed-off-by: Koba Ko <kobak@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2143032

Save and restore CXL DVSEC control registers (CTRL, CTRL2), range
base registers, and lock state across PCI resets.

When the DVSEC CONFIG_LOCK bit is set, certain DVSEC fields
become read-only and hardware may have updated them. Blindly
restoring saved values would be silently ignored or conflict
with hardware state. Instead, a read-merge-write approach is
used: current hardware values are read for the RWL
(read-write-when-locked) fields and merged with saved state,
so only writable bits are restored while locked bits retain
their hardware values.

Hooked into pci_save_state()/pci_restore_state() so all PCI reset
paths automatically preserve CXL DVSEC configuration.
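
The read-merge-write step can be sketched as a single bit operation. This is an illustration only: `merge_rwl()` is a hypothetical helper, and the mask passed to it stands in for whichever RWL field mask applies to the register being restored.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the value to write back to a 16-bit DVSEC register.
 * When the config lock is set, bits covered by rwl_mask keep the value
 * currently in hardware; all other bits come from the saved snapshot.
 * When unlocked, the saved snapshot is restored as-is.
 */
uint16_t merge_rwl(uint16_t saved, uint16_t current, uint16_t rwl_mask,
		   bool locked)
{
	if (!locked)
		return saved;

	return (uint16_t)((saved & ~rwl_mask) | (current & rwl_mask));
}
```

For example, with a saved value of 0xABCD, a current hardware value of 0x1234, and a mask of 0x00FF, the locked case writes 0xAB34: the low byte stays as hardware has it and the high byte is restored.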

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
[jan: Resolve minor conflict in drivers/pci/Makefile due to code line shifts]
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 0bb0dc0 nv-kernels/24.04_linux-nvidia-6.17-next)
Signed-off-by: Koba Ko <kobak@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2143032

Save and restore CXL HDM decoder registers (global control,
per-decoder base/size/target-list, and commit state) across PCI
resets. On restore, decoders that were committed are reprogrammed
and recommitted with a 10ms timeout. Locked decoders that are
already committed are skipped, since their state is protected by
hardware and reprogramming them would fail.
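
The per-decoder restore decision and the bounded commit wait might look roughly like the sketch below. All names are hypothetical; the poll callback stands in for an MMIO read of the decoder's committed bit, and the poll budget stands in for the 10ms timeout rather than measuring real time.

```c
#include <errno.h>
#include <stdbool.h>

enum restore_action {
	RESTORE_SKIP_UNCOMMITTED,	/* decoder was not committed when saved */
	RESTORE_SKIP_LOCKED,		/* locked and already committed in hardware */
	RESTORE_REPROGRAM,		/* rewrite registers and recommit */
};

/* Decide what to do with one decoder on restore. */
enum restore_action hdm_restore_action(bool was_committed, bool locked_now,
				       bool committed_now)
{
	if (!was_committed)
		return RESTORE_SKIP_UNCOMMITTED;
	if (locked_now && committed_now)
		return RESTORE_SKIP_LOCKED;
	return RESTORE_REPROGRAM;
}

/* Poll until the committed bit is seen or the budget runs out. */
int wait_for_commit(bool (*poll)(void *ctx), void *ctx,
		    unsigned int budget_polls)
{
	for (unsigned int i = 0; i < budget_polls; i++) {
		if (poll(ctx))
			return 0;
	}
	return -ETIMEDOUT;
}

/* Simulated hardware: reports committed once *ctx polls have elapsed. */
bool sim_commit_bit(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return true;
	(*remaining)--;
	return false;
}
```

Keeping the decision separate from the wait makes the skip cases easy to test without any simulated hardware.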

The Register Locator DVSEC is parsed directly via PCI config space
reads rather than calling cxl_find_regblock()/cxl_setup_regs(),
since this code lives in the PCI core and must not depend on CXL
module symbols.

MSE is temporarily enabled during save/restore to allow MMIO
access to the HDM decoder register block.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
[jan: Include <cxl/cxl.h> in drivers/pci/cxl.c due to conflict resolution in "4acbc27592b8 NVIDIA: VR: SAUCE: cxl: Move HDM decoder and register map definitions to include/cxl/cxl.h"]
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 578f916 nv-kernels/24.04_linux-nvidia-6.17-next)
Signed-off-by: Koba Ko <kobak@nvidia.com>
…efinitions

BugLink: https://bugs.launchpad.net/bugs/2143032

Add CXL DVSEC register definitions needed for CXL device reset per
CXL r3.2 section 8.1.3.1:
- Capability bits: RST_CAPABLE, CACHE_CAPABLE, CACHE_WBI_CAPABLE,
  RST_TIMEOUT, RST_MEM_CLR_CAPABLE
- Control2 register: DISABLE_CACHING, INIT_CACHE_WBI, INIT_CXL_RST,
  RST_MEM_CLR_EN
- Status2 register: CACHE_INV, RST_DONE, RST_ERR
- Non-CXL Function Map DVSEC register offset

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
[jan: Resolve conflicts where PCI_DVSEC_CXL_CACHE_CAPABLE is already added by "72bd823fb4f1 NVIDIA: VR: SAUCE: PCI: Allow ATS to be always on for CXL.cache capable devices"]
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit 8f77fbe nv-kernels/24.04_linux-nvidia-6.17-next)
[koba: Preserve PCI_DVSEC_CXL_CACHE_CAPABLE because drivers/pci/ats.c still uses it for CXL.cache ATS dependency from commit 3765488 (6.17 source commit 72bd823).]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…_restore()

BugLink: https://bugs.launchpad.net/bugs/2143032

Export pci_dev_save_and_disable() and pci_dev_restore() so that
subsystems performing non-standard reset sequences (e.g. CXL)
can reuse the PCI core standard pre/post reset lifecycle:
driver reset_prepare/reset_done callbacks, PCI config space
save/restore, and device disable/re-enable.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit a14a427 nv-kernels/24.04_linux-nvidia-6.17-next)
Signed-off-by: Koba Ko <kobak@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2143032

Add infrastructure for quiescing the CXL data path before reset:

- Memory offlining: check if CXL-backed memory is online and offline
  it via offline_and_remove_memory() before reset, per CXL
  spec requirement to quiesce all CXL.mem transactions before issuing
  CXL Reset.
- CPU cache flush: invalidate cache lines before reset
  as a safety measure after memory offline.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit 98bfbf9 nv-kernels/24.04_linux-nvidia-6.17-next)
[koba: Use a real System RAM walker callback so resource walks never invoke a NULL function pointer.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…XL reset

BugLink: https://bugs.launchpad.net/bugs/2143032

Add sibling PCI function save/disable/restore coordination for CXL
reset. Before reset, all CXL.cachemem sibling functions are locked,
saved, and disabled; after reset they are restored. The Non-CXL Function
Map DVSEC and per-function DVSEC capability register are consulted to
skip non-CXL and CXL.io-only functions. A global mutex serializes
concurrent resets to prevent deadlocks between sibling functions.
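
The sibling filter can be pictured as a bitmask computation. This is a simplified sketch with assumptions: `sibling_mask()` is hypothetical, it handles only functions 0 to 31 in a single 32-bit word, and it assumes a set bit in the Non-CXL Function Map marks a function that does not take part in CXL.

```c
#include <stdint.h>

/*
 * Return a bitmask of sibling functions to save/disable before reset.
 *   present:          bit N set if function N exists on the device
 *   non_cxl_map:      bit N set if function N is marked non-CXL (assumed)
 *   cachemem_capable: bit N set if function N advertises CXL.cache or CXL.mem
 *   target_fn:        the function being reset, excluded from the result
 */
uint32_t sibling_mask(uint32_t present, uint32_t non_cxl_map,
		      uint32_t cachemem_capable, unsigned int target_fn)
{
	uint32_t mask = present & ~non_cxl_map & cachemem_capable;

	if (target_fn < 32)
		mask &= ~(UINT32_C(1) << target_fn);
	return mask;
}
```

With functions 0 to 3 present, function 3 marked non-CXL, and functions 0, 2, and 3 advertising CXL.cachemem, a reset of function 0 would prepare only function 2.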

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit 9a08c02 nv-kernels/24.04_linux-nvidia-6.17-next)
[koba: Propagate sibling collection allocation failures after pci_walk_bus() so reset aborts instead of proceeding with a partial sibling list.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…ration

BugLink: https://bugs.launchpad.net/bugs/2143032

cxl_dev_reset() implements the hardware reset sequence:
optionally enable memory clear, initiate reset via
CTRL2, wait for completion, and re-enable caching.

cxl_do_reset() orchestrates the full reset flow:
  1. CXL pre-reset: mem offlining and cache flush (when memdev present)
  2. PCI save/disable: pci_dev_save_and_disable() automatically saves
     CXL DVSEC and HDM decoder state via PCI core hooks
  3. Sibling coordination: save/disable CXL.cachemem sibling functions
  4. Execute CXL DVSEC reset
  5. Sibling restore: always runs to re-enable sibling functions
  6. PCI restore: pci_dev_restore() automatically restores CXL state

The CXL-specific DVSEC and HDM save/restore is handled
by the PCI core's CXL save/restore infrastructure (drivers/pci/cxl.c).

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit 92fb807 nv-kernels/24.04_linux-nvidia-6.17-next)
[koba: Treat error-valued cxlmd->endpoint as no endpoint to avoid dereferencing ERR_PTR before endpoint attach.]
[koba: Check sibling collection failure before starting the CXL reset so allocation failure restores the target and aborts.]
[koba: Limit the memdev device lock to endpoint-dependent memory preparation and cache flush, restore memory quiesce before PCI disable, and track sibling reset preparation so reset_done cleanup only runs after successful sibling prepare.]
[koba: Guard reset_done() against NULL/ERR_PTR memdev endpoints before decoder reset detection.]

Signed-off-by: Koba Ko <kobak@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2143032

Add a "cxl_reset" sysfs attribute to PCI devices that support CXL
Reset (CXL r3.2 section 8.1.3.1). The attribute is visible only on
devices with both CXL.cache and CXL.mem capabilities and the CXL
Reset Capable bit set in the DVSEC.

Writing "1" to the attribute triggers the full CXL reset flow via
cxl_do_reset(). The interface is decoupled from memdev creation:
when a CXL memdev exists, memory offlining and cache flush are
performed; otherwise reset proceeds without the memory management.
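
From userspace, triggering the reset amounts to writing "1" to the attribute file. The helper below is a sketch, not part of the series: `trigger_cxl_reset()` is a hypothetical name, and the caller supplies the full path (the PR summary gives it as /sys/bus/pci/devices/.../cxl_reset). The write is destructive on real hardware and would need suitable privileges.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write "1" to a cxl_reset-style sysfs attribute. Returns 0 or -errno. */
int trigger_cxl_reset(const char *attr_path)
{
	int fd = open(attr_path, O_WRONLY);
	int rc = 0;
	ssize_t n;

	if (fd < 0)
		return -errno;

	n = write(fd, "1", 1);
	if (n < 0)
		rc = -errno;
	else if (n != 1)
		rc = -EIO;	/* short write: treat as an I/O error */

	if (close(fd) < 0 && rc == 0)
		rc = -errno;
	return rc;
}
```

On a device where the attribute is not visible, the open fails and the helper returns -ENOENT without touching the hardware.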

The sysfs attribute is managed entirely by the CXL module using
sysfs_create_group() / sysfs_remove_group() rather than the PCI
core's static attribute groups. This avoids cross-module symbol
dependencies between the PCI core (always built-in) and CXL_BUS
(potentially modular).

At module init, existing PCI devices are scanned and a PCI bus
notifier handles hot-plug/unplug. kernfs_drain() makes sure that
any in-flight store() completes before sysfs_remove_group() returns,
preventing use-after-free during module unload.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 6e96f7e nv-kernels/24.04_linux-nvidia-6.17-next)
Signed-off-by: Koba Ko <kobak@nvidia.com>
…tribute

BugLink: https://bugs.launchpad.net/bugs/2143032

Document the cxl_reset sysfs attribute added to PCI devices that
support CXL Reset.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 33b53e1 nv-kernels/24.04_linux-nvidia-6.17-next)
Signed-off-by: Koba Ko <kobak@nvidia.com>
@jamieNguyenNVIDIA
Collaborator

@kobak2026

Looks like these were mistakenly left in during the rebase?

7d436d86bf5a NVIDIA: SAUCE: dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
bd27785772fa NVIDIA: SAUCE: dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination

I believe that was on purpose. Both of these are included in pr#426 (the dependency series).

@kobak2026 kobak2026 force-pushed the bug-DGX-16137/cxl-backport-26.04-bos-nvpr branch from 2462338 to 9bf30cf Compare May 21, 2026 18:41
@nvmochs
Collaborator

nvmochs commented May 21, 2026

@kobak2026
Looks like these were mistakenly left in during the rebase?

7d436d86bf5a NVIDIA: SAUCE: dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
bd27785772fa NVIDIA: SAUCE: dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination

I believe that was on purpose. Both of these are included in pr#426 (the dependency series).

Here is what I see...

2462338e3d1e (kobanvk/bug-DGX-16137/cxl-backport-26.04-bos-nvpr) NVIDIA: VR: SAUCE: Documentation: ABI: Add CXL PCI cxl_reset sysfs attribute
7c745e181a30 NVIDIA: VR: SAUCE: cxl: Add cxl_reset sysfs interface for PCI devices
31a7fb4259aa NVIDIA: VR: SAUCE: cxl: Add CXL DVSEC reset sequence and flow orchestration
2ce02b595597 NVIDIA: VR: SAUCE: cxl: Add multi-function sibling coordination for CXL reset
d6ae2502e342 NVIDIA: VR: SAUCE: cxl: Add memory offlining and cache flush helpers
72d3d93f4c9f NVIDIA: VR: SAUCE: PCI: Export pci_dev_save_and_disable() and pci_dev_restore()
d62f7889f5d6 NVIDIA: VR: SAUCE: PCI: Add CXL DVSEC reset and capability register definitions
154db0c73a8e NVIDIA: VR: SAUCE: PCI: Add HDM decoder state save/restore
ff45db1361fb NVIDIA: VR: SAUCE: PCI: Add cxl DVSEC state save/restore across resets
5540bcc62acb NVIDIA: VR: SAUCE: PCI: Add virtual extended cap save buffer for CXL state
b482b6992a86 NVIDIA: VR: SAUCE: cxl: Move HDM decoder and register map definitions to include/cxl/cxl.h
b89497a0d17a NVIDIA: VR: SAUCE: PCI: Add CXL DVSEC control, lock, and range register definitions
1fef6831b64d NVIDIA: SAUCE: Revert "NVIDIA: VR: SAUCE: cxl: add support for cxl reset"


7d436d86bf5a NVIDIA: SAUCE: dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
bd27785772fa NVIDIA: SAUCE: dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination

^^ These two commits are already present in the dependency series below (i.e. they exist twice on this branch)


5268ad904265 (kobanvk/bug-DGX-16136/cxl-backport-26.04-bos) NVIDIA: VR: SAUCE: [Config] Add PCI_CXL annotation for CXL state save/restore
e3aadf637883 NVIDIA: VR: SAUCE: [Config] Enable CXL DAX and KMEM built-in for CXL memory access
a2eba9dce2b6 NVIDIA: VR: SAUCE: [Config] CXL config annotations for Type-2 device and RAS support
7a6aebb18177 NVIDIA: VR: SAUCE: cxl/region: Support multi-level interleaving with smaller granularities for lower levels
8cb4feab45af NVIDIA: VR: SAUCE: dax/hmem: Reintroduce Soft Reserved ranges back into the iomem tree
c0db8c8c8fda NVIDIA: VR: SAUCE: dax/hmem, cxl: Defer and resolve ownership of Soft Reserved memory ranges
2e433461671f NVIDIA: VR: SAUCE: dax: Add deferred-work helpers for dax_hmem and dax_cxl coordination
3a6c6f56dc2c NVIDIA: VR: SAUCE: cxl/region: Add helper to check Soft Reserved containment by CXL regions
3a52fc67323f NVIDIA: VR: SAUCE: dax: Track all dax_region allocations under a global resource tree
79356d9fd78d NVIDIA: VR: SAUCE: dax/cxl, hmem: Initialize hmem early and defer dax_cxl binding
d8ce89e1116e NVIDIA: VR: SAUCE: cxl/region: Skip decoder reset on detach for autodiscovered regions
77ac7f9272ba NVIDIA: VR: SAUCE: dax/hmem: Gate Soft Reserved deferral on DEV_DAX_CXL
1d651dbee5bc NVIDIA: VR: SAUCE: dax/hmem: Request cxl_acpi and cxl_pci before walking Soft Reserved ranges
0fdb2010fcfc NVIDIA: VR: SAUCE: sfc: support pio mapping based on cxl
d6acd1ddf9f6 NVIDIA: VR: SAUCE: cxl: Avoid dax creation for accelerators
388dc8c0c90a NVIDIA: VR: SAUCE: cxl: attach region to an accelerator/type2 memdev
9c9123639c35 NVIDIA: VR: SAUCE: sfc: create type2 cxl memdev
79654508fdaf NVIDIA: VR: SAUCE: cxl: Prepare memdev creation for type2
cb8c41246ed1 NVIDIA: VR: SAUCE: cxl/sfc: Initialize dpa without a mailbox
ba9972ccf965 NVIDIA: VR: SAUCE: cxl/sfc: Map cxl regs
513a2a51e219 NVIDIA: VR: SAUCE: sfc: add cxl support
d3059e8eab00 cxl/pci: Remove redundant cxl_pci_find_port() call
8b61bd238bbb cxl: Move pci generic code from cxl_pci to core/cxl_pci
2bc7fd2fd558 cxl: export internal structs for external Type2 drivers
645dd5c6be9f cxl: support Type2 when initializing cxl_dev_state

It's not that big of a deal, but we'll need to be clear to Brad which commits to pick from this PR (he can't just blindly apply all the commits).

No issues with the recent updates to the reset and save/restore patches. I verified that it resolves my last finding.

Acked-by: Matthew R. Ochs <mochs@nvidia.com>

@nvmochs nvmochs changed the title Backport CXL reset save/restore support [26.04_linux-nvidia-bos] Backport CXL reset save/restore support May 21, 2026
@nirmoy nirmoy added the "help wanted" and "question" labels May 21, 2026
@nvmochs
Collaborator

nvmochs commented May 21, 2026

Merged, closing PR.

28b628e1e58c (nresolute/nvidia-bos-next) NVIDIA: VR: SAUCE: Documentation: ABI: Add CXL PCI cxl_reset sysfs attribute
c46d624f2a73 NVIDIA: VR: SAUCE: cxl: Add cxl_reset sysfs interface for PCI devices
0b863ed53738 NVIDIA: VR: SAUCE: cxl: Add CXL DVSEC reset sequence and flow orchestration
d1153b9e307c NVIDIA: VR: SAUCE: cxl: Add multi-function sibling coordination for CXL reset
14b5b2f40916 NVIDIA: VR: SAUCE: cxl: Add memory offlining and cache flush helpers
dec4b0067323 NVIDIA: VR: SAUCE: PCI: Export pci_dev_save_and_disable() and pci_dev_restore()
e52f4e02aa42 NVIDIA: VR: SAUCE: PCI: Add CXL DVSEC reset and capability register definitions
177b561c35aa NVIDIA: VR: SAUCE: PCI: Add HDM decoder state save/restore
dfd1a8f1e61e NVIDIA: VR: SAUCE: PCI: Add cxl DVSEC state save/restore across resets
5be20ab050a3 NVIDIA: VR: SAUCE: PCI: Add virtual extended cap save buffer for CXL state
017c49292ab0 NVIDIA: VR: SAUCE: cxl: Move HDM decoder and register map definitions to include/cxl/cxl.h
743681b4b192 NVIDIA: VR: SAUCE: PCI: Add CXL DVSEC control, lock, and range register definitions
eccf13bac1d5 NVIDIA: SAUCE: Revert "NVIDIA: VR: SAUCE: cxl: add support for cxl reset"

@nvmochs nvmochs closed this May 21, 2026