[26.04_linux-nvidia-bos] Backport CXL reset save/restore support#430
kobak2026 wants to merge 38 commits into
PR Validation Report
Patchscan ✅ No Missing Fixes
All cherry-picked commits checked; no missing upstream fixes found.
PR Lint ❌ Errors found
Checking 38 commits...
Cherry-pick digest:
E: 52975ec822b9 ("NVIDIA: VR: SAUCE: PCI: Add CXL DVSEC re"): backport trailer order: ORDER: move [Name: note] before the backporter Signed-off-by and after (backported from ...)
E: 82adedeffcd7 ("NVIDIA: VR: SAUCE: cxl: Move HDM decoder"): backport trailer order: ORDER: move [Name: note] before the backporter Signed-off-by and after (backported from ...)
┌──────────────┬──────────────────────────────────────────────────────────────────┬────────────┬─────────┬───────────────────────────┐
│ Local │ Referenced upstream / Patch subject │ Patch-ID │ Subject │ SoB chain │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 9bf30cf9cb99 │ [SAUCE] documentation: abi: add cxl pci cxl_reset sysfs attribut │ N/A │ N/A │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ c6db6ce8e793 │ [SAUCE] cxl: add cxl_reset sysfs interface for pci devices │ N/A │ N/A │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 6e9e72b4b2a2 │ [SAUCE] cxl: add cxl dvsec reset sequence and flow orchestration │ N/A │ N/A │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 3a245b9b22c3 │ [SAUCE] cxl: add multi-function sibling coordination for cxl res │ N/A │ N/A │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ baa527d0e4a3 │ [SAUCE] cxl: add memory offlining and cache flush helpers │ N/A │ N/A │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d1c9d1b51460 │ [SAUCE] pci: export pci_dev_save_and_disable() and pci_dev_resto │ N/A │ N/A │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 52975ec822b9 │ pci: add cxl dvsec reset and capability register definitions │ noted │ found │ ORDER: move [Name: note] │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 42ade8ec191c │ [SAUCE] pci: add hdm decoder state save/restore │ N/A │ N/A │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 969356c44c97 │ [SAUCE] pci: add cxl dvsec state save/restore across resets │ N/A │ N/A │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 9a09ee09f074 │ [SAUCE] pci: add virtual extended cap save buffer for cxl state │ N/A │ N/A │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 82adedeffcd7 │ cxl: move hdm decoder and register map definitions to include/cx │ no-match │ not fou │ ORDER: move [Name: note] │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 07705ee1b952 │ [SAUCE] pci: add cxl dvsec control, lock, and range register def │ N/A │ N/A │ smadhava, jan, bfigg, kob │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ e90b1ee9e1bf │ [SAUCE] revert "nvidia: vr: sauce: cxl: add support for cxl rese │ N/A │ N/A │ kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 5268ad904265 │ [SAUCE] [config] add pci_cxl annotation for cxl state save/resto │ N/A │ N/A │ jan, bfigg, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ e3aadf637883 │ [SAUCE] [config] enable cxl dax and kmem built-in for cxl memory │ N/A │ N/A │ jan, bfigg, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ a2eba9dce2b6 │ [SAUCE] [config] cxl config annotations for type-2 device and ra │ N/A │ N/A │ jan, bfigg, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 7a6aebb18177 │ cxl/region: support multi-level interleaving with smaller granul │ noted │ found │ ok, backporter: kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 8cb4feab45af │ [SAUCE] dax/hmem: reintroduce soft reserved ranges back into the │ N/A │ N/A │ schofiel, lizhijia, Koral │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ c0db8c8c8fda │ [SAUCE] dax/hmem, cxl: defer and resolve ownership of soft reser │ N/A │ N/A │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 2e433461671f │ [SAUCE] dax: add deferred-work helpers for dax_hmem and dax_cxl │ N/A │ N/A │ Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 3a6c6f56dc2c │ cxl/region: add helper to check soft reserved containment by cxl │ noted │ found │ ok, backporter: kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 3a52fc67323f │ [SAUCE] dax: track all dax_region allocations under a global res │ N/A │ N/A │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 79356d9fd78d │ [SAUCE] dax/cxl, hmem: initialize hmem early and defer dax_cxl b │ N/A │ N/A │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d8ce89e1116e │ [SAUCE] cxl/region: skip decoder reset on detach for autodiscove │ N/A │ N/A │ Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 77ac7f9272ba │ [SAUCE] dax/hmem: gate soft reserved deferral on dev_dax_cxl │ N/A │ N/A │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 1d651dbee5bc │ [SAUCE] dax/hmem: request cxl_acpi and cxl_pci before walking so │ N/A │ N/A │ williams, Koralaha, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 0fdb2010fcfc │ sfc: support pio mapping based on cxl │ noted │ found │ ok, backporter: kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d6acd1ddf9f6 │ [SAUCE] cxl: avoid dax creation for accelerators │ N/A │ N/A │ alucerop, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 388dc8c0c90a │ cxl: attach region to an accelerator/type2 memdev │ noted │ found │ ok, backporter: kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 9c9123639c35 │ [SAUCE] sfc: create type2 cxl memdev │ N/A │ N/A │ alucerop, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 79654508fdaf │ [SAUCE] cxl: prepare memdev creation for type2 │ N/A │ N/A │ alucerop, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ cb8c41246ed1 │ [SAUCE] cxl/sfc: initialize dpa without a mailbox │ N/A │ N/A │ alucerop, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ ba9972ccf965 │ cxl/sfc: map cxl regs │ noted │ found │ ok, backporter: kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 513a2a51e219 │ [SAUCE] sfc: add cxl support │ N/A │ N/A │ alucerop, kobak │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ d3059e8eab00 │ d537d953c478 cxl/pci: Remove redundant cxl_pci_find_port() call │ match │ match │ preserved + kobak added │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 8b61bd238bbb │ 58f28930c7fb cxl: Move pci generic code from cxl_pci to core/cxl │ match │ match │ preserved + kobak added │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 2bc7fd2fd558 │ 005869886d1d cxl: export internal structs for external Type2 dri │ match │ match │ preserved + kobak added │
├──────────────┼──────────────────────────────────────────────────────────────────┼────────────┼─────────┼───────────────────────────┤
│ 645dd5c6be9f │ 9a775c07bb04 cxl: support Type2 when initializing cxl_dev_state │ match │ match │ preserved + kobak added │
└──────────────┴──────────────────────────────────────────────────────────────────┴────────────┴─────────┴───────────────────────────┘
Lint: all checks passed.
|
@kobak2026 Two comments...

0d3aece NVIDIA: VR: SAUCE: cxl: Add CXL DVSEC reset sequence and flow orchestration
cxl_reset_prepare_memdev() only checks !endpoint before walking &endpoint->dev, but cxlmd->endpoint is initialized to ERR_PTR(-ENXIO) until endpoint attach succeeds. A memdev can exist with endpoint still an error pointer, so writing 1 to cxl_reset can dereference an encoded error pointer. Please match cxl_reset_flush_cpu_caches() and use if (!endpoint || IS_ERR(endpoint)) return 0;.

fb0ce0b NVIDIA: SAUCE: Revert "NVIDIA: VR: SAUCE: cxl: add support for cxl reset\”
Is there a reason why the original commit’s SHA is absent from the commit message?
Nit: Why are there backslashes in the title?
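The endpoint check requested above can be illustrated with a minimal userspace sketch. The `ERR_PTR()`/`IS_ERR()` helpers here are simplified stand-ins for the kernel's, and `struct cxl_port`, `struct cxl_memdev`, `nr_regions`, and `reset_prepare_memdev()` are hypothetical, trimmed-down shapes used only for illustration; this is not the code from the PR.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR() helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline bool IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical, trimmed-down shapes; the real structs carry far more state. */
struct cxl_port {
	int nr_regions;
};

struct cxl_memdev {
	struct cxl_port *endpoint;	/* ERR_PTR(-ENXIO) until attach succeeds */
};

/*
 * Returns 0 when there is no usable endpoint (nothing to prepare),
 * otherwise the number of regions that would need to be quiesced.
 */
static int reset_prepare_memdev(struct cxl_memdev *cxlmd)
{
	struct cxl_port *endpoint = cxlmd->endpoint;

	/* A NULL check alone would let ERR_PTR(-ENXIO) through. */
	if (!endpoint || IS_ERR(endpoint))
		return 0;

	return endpoint->nr_regions;
}
```

With only the `!endpoint` test, a memdev whose endpoint is still `ERR_PTR(-ENXIO)` would reach the `endpoint->nr_regions` dereference; the combined test returns early for both the NULL and the error-pointer case.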
|
Same comment as Matt for fb0ce0b And extra comment |
1e17142 to
0d087a2
Compare
|
@nirmoy
|
|
@jamieNguyenNVIDIA @clsotog @nvmochs Changes:
Validation:
|
|
@kobak2026: The code changes are looking good now. The only remaining issues I can spot have to do with these 8 commit messages: These need a (backported/cherry-picked from ) trailer added. |
0d087a2 to
b9745d6
Compare
|
@jamieNguyenNVIDIA thanks. fixed. |
In preparation for type2 drivers add function and macro for differentiating CXL memory expanders (type 3) from CXL device accelerators (type 2) helping drivers built from public headers to embed struct cxl_dev_state inside a private struct.

Update type3 driver for using this same initialization.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260306164741.3796372-2-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 9a775c0)
Signed-off-by: Koba Ko <kobak@nvidia.com>

In preparation for type2 support, move structs and functions a type2 driver will need to access to into a new shared header file.

Differentiate between public and private data to be preserved by type2 drivers.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20260306164741.3796372-3-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 0058698)
Signed-off-by: Koba Ko <kobak@nvidia.com>

Inside cxl/core/pci.c there are helpers for CXL PCIe initialization meanwhile cxl/pci_drv.c implements the functionality for a Type3 device initialization.

In preparation for type2 support, move helper functions from cxl/pci.c to cxl/core/pci.c in order to be exported and used by type2 drivers.

[ dj: Clarified subject. ]

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://patch.msgid.link/20260306164741.3796372-4-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit 58f2893)
Signed-off-by: Koba Ko <kobak@nvidia.com>

Remove the redundant port lookup from cxl_rcrb_get_comp_regs() and use the dport parameter directly. The caller has already validated the port is non-NULL before invoking this function, and dport is given as a param.

This is simpler than getting dport in the callee and return the pointer to the caller what would require more changes.

Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Link: https://patch.msgid.link/20260306164741.3796372-5-alejandro.lucero-palau@amd.com
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from commit d537d95)
Signed-off-by: Koba Ko <kobak@nvidia.com>

Add CXL initialization based on new CXL API for accel drivers and make it dependent on kernel CXL configuration.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-2-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>

Export cxl core functions for a Type2 driver being able to discover and map the device registers.

Use it in sfc driver cxl initialization.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
(backported from https://lore.kernel.org/r/20260423180528.17166-3-alejandro.lucero-palau@amd.com)
[kobak: Kept cxl_pci_setup_regs() in the core/pci provider added by the full Type2 prerequisite series and dropped the duplicate provider hunk from drivers/cxl/pci.c.]
Signed-off-by: Koba Ko <kobak@nvidia.com>

Type3 relies on mailbox CXL_MBOX_OP_IDENTIFY command for initializing memdev state params which end up being used for DPA initialization.

Allow a Type2 driver to initialize DPA simply by giving the size of its volatile hardware partition.

Move related functions to memdev.

Add sfc driver as the client.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-4-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>

Current cxl core is relying on a CXL_DEVTYPE_CLASSMEM type device when creating a memdev leading to problems when obtaining cxl_memdev_state references from a CXL_DEVTYPE_DEVMEM type.

Modify check for obtaining cxl_memdev_state adding CXL_DEVTYPE_DEVMEM support.

Make devm_cxl_add_memdev accessible from an accel driver.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Ben Cheatham <benjamin.cheatham@amd.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Alison Schofield <alison.schofield@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-5-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>

Use cxl API for creating a cxl memory device using the type2 cxl_dev_state struct.

Signed-off-by: Alejandro Lucero <alucerop@amd.com>
Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com>
Reviewed-by: Fan Ni <fan.ni@samsung.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
(cherry picked from https://lore.kernel.org/r/20260423180528.17166-6-alejandro.lucero-palau@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>
|
|
clsotog
left a comment
Acked-by: Carol L Soto <csoto@nvidia.com>
|
@kobak2026 Thanks for addressing my feedback from the previous review. I re-reviewed the latest changes with codex, and it flags that the locking changes made in response to Carol's and Jamie's feedback can now trigger a deadlock. Here are the details:

There are two separate changes in drivers/cxl/core/pci.c. Here is the expanded before/after.

Before
Old cxl_do_reset() in 1877f75 looked like this:
Two key points:

After
New code in b9745d6 split this into cxl_do_reset() and __cxl_do_reset():
Because this guard is in function scope, cxlmd->dev stays locked until __cxl_do_reset() returns.

Issue 1: self-deadlock
The new outer guard(device)(&cxlmd->dev) is still held when pci_dev_restore(pdev) runs. pci_dev_restore() calls the PCI driver’s reset_done() callback:
if (err_handler && err_handler->reset_done)
For the CXL PCI driver, that is cxl_reset_done(), which does:
guard(device)(&cxlmd->dev);
So the same thread already holds cxlmd->dev from cxl_do_reset(), then tries to lock it again inside cxl_reset_done(). That deadlocks.

Issue 2: reset ordering
The old flow was:
offline memory
The new flow is:
disable PCI function
The problem is that pci_dev_save_and_disable(pdev) clears PCI Command, including memory decode and bus master. That now happens before cxl_reset_prepare_memdev() tries to offline CXL-backed memory. If that memory is still online, this is backwards: the CXL memory path should be quiesced before disabling/resetting the device that backs it.

So the issue resides in the new refactor around cxl_do_reset() / __cxl_do_reset() in drivers/cxl/core/pci.c: Koba fixed the endpoint race by extending the memdev lock, but extended it too far, and also moved pci_dev_save_and_disable() ahead of memory preparation. Instead, the memdev lock should be kept only around the endpoint-dependent preparation, not around the entire reset/restore flow.

A better shape would be:
But one more detail matters: if sibling prepare succeeds, cxl_pci_functions_reset_done() must run. So the final structure should track whether siblings were prepared:
Conceptually:

So the fix is not “don’t hold the memdev lock”; it is “hold it only across cxl_reset_prepare_memdev() and cxl_reset_flush_cpu_caches(), then release it before PCI restore callbacks run.”
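The lock scoping and ordering described in this review can be modelled with a small userspace sketch. Everything here is hypothetical and for illustration only: `struct memdev`, the single-letter trace, and all function names are stand-ins rather than the PR's code, and running sibling preparation first is an assumption about the flow. The non-recursive lock is modelled with a flag so that a second acquisition trips an assertion instead of hanging.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the memdev device lock; not recursive. */
struct memdev {
	bool locked;
};

static void memdev_lock(struct memdev *md)
{
	assert(!md->locked);	/* a second acquisition would deadlock */
	md->locked = true;
}

static void memdev_unlock(struct memdev *md)
{
	assert(md->locked);
	md->locked = false;
}

/* Records the order in which the steps ran. */
static char trace[16];
static int trace_len;

static void step(char c)
{
	if (trace_len < (int)sizeof(trace) - 1)
		trace[trace_len++] = c;
}

static int prepare_siblings(void) { step('S'); return 0; }
static void siblings_reset_done(void) { step('s'); }
static void save_and_disable(void) { step('D'); }
static int dvsec_reset(void) { step('R'); return 0; }

/* The endpoint-dependent steps require the memdev lock. */
static int prepare_memdev(struct memdev *md)
{
	assert(md->locked);
	step('P');
	return 0;
}

static int flush_cpu_caches(struct memdev *md)
{
	assert(md->locked);
	step('F');
	return 0;
}

/* Models a restore path whose reset_done() callback takes the lock itself. */
static void restore(struct memdev *md)
{
	memdev_lock(md);
	step('r');
	memdev_unlock(md);
}

static int do_reset(struct memdev *md)
{
	bool siblings_prepared = false;
	int rc;

	rc = prepare_siblings();
	if (rc)
		goto out;
	siblings_prepared = true;

	/* Hold the lock only across the endpoint-dependent preparation. */
	memdev_lock(md);
	rc = prepare_memdev(md);
	if (!rc)
		rc = flush_cpu_caches(md);
	memdev_unlock(md);
	if (rc)
		goto out;

	/* Memory is quiesced; only now disable, reset, and restore. */
	save_and_disable();
	rc = dvsec_reset();
	restore(md);
out:
	if (siblings_prepared)
		siblings_reset_done();
	return rc;
}
```

Calling `do_reset()` on a zero-initialized `struct memdev` leaves the trace as `SPFDRrs`: preparation and cache flush happen before the function is disabled, and the lock is free again by the time `restore()` takes it. Moving `memdev_unlock()` after `restore()` would trip the assertion in `memdev_lock()`, which is the self-deadlock described in Issue 1.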
b9745d6 to
94ac72f
Compare
|
@nvmochs thanks
|
|
My ACK still stands:
|
|
|
…to the iomem tree

Reworked from a patch by Alison Schofield <alison.schofield@intel.com>

Reintroduce Soft Reserved range into the iomem_resource tree for HMEM to consume.

This restores visibility in /proc/iomem for ranges actively in use, while avoiding the early-boot conflicts that occurred when Soft Reserved was published into iomem before CXL window and region discovery.

Link: https://lore.kernel.org/linux-cxl/29312c0765224ae76862d59a17748c8188fb95f1.1692638817.git.alison.schofield@intel.com/
Co-developed-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
Co-developed-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Zhijian Li <lizhijian@fujitsu.com>
Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Tested-by: Tomasz Wolski <tomasz.wolski@fujitsu.com>
Link: https://lore.kernel.org/r/20260210064501.157591-10-Smita.KoralahalliChannabasappa@amd.com
(cherry picked from https://lore.kernel.org/r/20260210064501.157591-10-Smita.KoralahalliChannabasappa@amd.com)
Signed-off-by: Koba Ko <kobak@nvidia.com>

…smaller granularities for lower levels

The CXL specification supports multi-level interleaving "as long as all the levels use different, but consecutive, HPA bits to select the target and no Interleave Set has more than 8 devices" (from 3.2).

Currently the kernel expects that a decoder's "interleave granularity is a multiple of @parent_port granularity". That is, the granularity of a lower level is bigger than those of the parent and uses the outer HPA bits as selector. It works e.g. for the following 8-way config:

 * cross-link (cross-hostbridge config in CFMWS):
   * 4-way
   * 256 granularity
   * Selector: HPA[8:9]
 * sub-link (CXL Host bridge config of the HDM):
   * 2-way
   * 1024 granularity
   * Selector: HPA[10]

Now, if the outer HPA bits are used for the cross-hostbridge, an 8-way config could look like this:

 * cross-link (cross-hostbridge config in CFMWS):
   * 4-way
   * 512 granularity
   * Selector: HPA[9:10]
 * sub-link (CXL Host bridge config of the HDM):
   * 2-way
   * 256 granularity
   * Selector: HPA[8]

The enumeration of decoders for this configuration fails then with following error:

 cxl region0: pci0000:00:port1 cxl_port_setup_targets expected iw: 2 ig: 1024 [mem 0x10000000000-0x1ffffffffff flags 0x200]
 cxl region0: pci0000:00:port1 cxl_port_setup_targets got iw: 2 ig: 256 state: enabled 0x10000000000:0x1ffffffffff
 cxl_port endpoint12: failed to attach decoder12.0 to region0: -6

Note that this happens only if firmware is setting up the decoders (CXL_REGION_F_AUTO). For userspace region assembly the granularities are chosen to increase from root down to the lower levels. That is, outer HPA bits are always used for lower interleaving levels.

Rework the implementation to also support multi-level interleaving with smaller granularities for lower levels. Determine the interleave set of autodetected decoders. Check that it is a subset of the root interleave.

The HPA selector bits are extracted for all decoders of the set and checked that there is no overlap and bits are consecutive. All decoders can be programmed now to use any bit range within the region's target selector.

Signed-off-by: Robert Richter <rrichter@amd.com>
(backported from https://lore.kernel.org/all/20251028094754.72816-1-rrichter@amd.com/)
[kobak: resolved conflicts with cxlr->cxlrd and spa_maps_hpa()]
Signed-off-by: Koba Ko <kobak@nvidia.com>

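The selector-bit check described in this commit message (no overlap, bits consecutive) can be sketched in a few lines. This is an illustrative userspace model, not the kernel implementation: `struct interleave_level`, `selector_mask()`, and `selectors_valid()` are hypothetical names, and the sketch assumes power-of-two ways and granularities only.

```c
#include <stdbool.h>
#include <stdint.h>

/* One interleave level: ways and granularity (bytes), both powers of two. */
struct interleave_level {
	unsigned int ways;
	unsigned int granularity;
};

static bool is_pow2(unsigned int v)
{
	return v && !(v & (v - 1));
}

/* Mask of HPA bits selecting the target at this level; 0 if unsupported. */
static uint64_t selector_mask(struct interleave_level l)
{
	if (l.ways < 2 || !is_pow2(l.ways) || !is_pow2(l.granularity))
		return 0;
	/* log2(ways) bits starting at bit log2(granularity). */
	return (uint64_t)(l.ways - 1) * l.granularity;
}

/* True if all levels use disjoint selector bits forming one contiguous run. */
static bool selectors_valid(const struct interleave_level *levels, int n)
{
	uint64_t all = 0;
	uint64_t low;

	for (int i = 0; i < n; i++) {
		uint64_t m = selector_mask(levels[i]);

		if (!m || (all & m))
			return false;	/* unsupported level or overlapping bits */
		all |= m;
	}
	if (!all)
		return false;

	/* Adding the lowest set bit clears a contiguous run entirely. */
	low = all & -all;
	return ((all + low) & all) == 0;
}
```

Both configurations from the commit message pass this check: 4-way at 256 bytes with 2-way at 1024 bytes yields masks 0x300 and 0x400, and 4-way at 512 bytes with 2-way at 256 bytes yields 0x600 and 0x100. In each case the union is HPA[8:10], regardless of which level uses the outer bits.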
…and RAS support

BugLink: https://bugs.launchpad.net/bugs/2143032
Source: NVIDIA@f80636d

Add Ubuntu kernel config annotations for CXL-related configs introduced or changed by the CXL Type-2, RAS, and autodiscovered-region support backports.

CONFIG_CXL_BUS, CONFIG_CXL_PCI, CONFIG_CXL_MEM, and CONFIG_CXL_PORT are built in for Type-2 device support. CONFIG_CXL_RAS and the EINJ symbols cover CXL RAS/error-injection support. CONFIG_SFC_CXL remains disabled for NVIDIA platforms.

Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit f80636d nv-kernels/24.04_linux-nvidia-6.17-next)
[kobak: Backported annotation overrides from debian.nvidia-6.17 to debian.nvidia-bos; PCIEAER_CXL is overridden as removed instead of editing debian.master.]
Signed-off-by: Koba Ko <kobak@nvidia.com>

…memory access

BugLink: https://bugs.launchpad.net/bugs/2143032
Source: NVIDIA@c5c11cf

Override debian.master policy for DEV_DAX, DEV_DAX_CXL, and DEV_DAX_KMEM so CXL memory regions are available as raw DAX devices and as hotplugged System-RAM without relying on module load ordering.

Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit c5c11cf nv-kernels/24.04_linux-nvidia-6.17-next)
[kobak: Backported annotation overrides from debian.nvidia-6.17 to debian.nvidia-bos.]
Signed-off-by: Koba Ko <kobak@nvidia.com>

…/restore

BugLink: https://bugs.launchpad.net/bugs/2143032
Source: NVIDIA@a5544cb

Add Ubuntu kernel config annotation for CONFIG_PCI_CXL introduced by the CXL DVSEC and HDM state save/restore series.

CONFIG_PCI_CXL is a hidden bool auto-enabled when CXL_BUS=y. It gates compilation of drivers/pci/cxl.o, which saves and restores CXL DVSEC control/range registers and HDM decoder state across PCI resets and link transitions.

Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit a5544cb nv-kernels/24.04_linux-nvidia-6.17-next)
[kobak: Backported annotation override from debian.nvidia-6.17 to debian.nvidia-bos.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
94ac72f to
2462338
Compare
|
@nvmochs I rebased to PR1

Verified your finding. cxlmd->endpoint can still be

Fixed in the reset orchestration commit by:
• reading cxlmd->endpoint under guard(device)(&cxlmd->dev)

I also amended the related commit description with:
[koba: Guard reset_done() against NULL/ERR_PTR memdev

Validation:
• focused static guard/order check: PASS

Pushed in 2462338.
|
Re-adding my ACK. Matt's comment looks to be addressed in v4.
|
|
Looks like these were mistakenly left in during the rebase? |
…set"

This reverts commit 4f089d9.

Drop the older monolithic CXL reset method before applying the refreshed save/restore and reset series.

Signed-off-by: Koba Ko <kobak@nvidia.com>

…er definitions

BugLink: https://bugs.launchpad.net/bugs/2143032

PCI: Add CXL DVSEC control, lock, and range register definitions

Add register offset and field definitions for CXL DVSEC registers needed by CXL state save/restore across resets:

 - CTRL2 (offset 0x10) and LOCK (offset 0x14) registers
 - CONFIG_LOCK bit in the LOCK register
 - RWL (read-write-when-locked) field masks for CTRL and range base registers.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 07ad5f1 nv-kernels/24.04_linux-nvidia-6.17-next)
Signed-off-by: Koba Ko <kobak@nvidia.com>

… to include/cxl/cxl.h

BugLink: https://bugs.launchpad.net/bugs/2143032

Move CXL HDM decoder register defines, register map structs (cxl_reg_map, cxl_component_reg_map, cxl_device_reg_map, cxl_pmu_reg_map, cxl_register_map), cxl_hdm_decoder_count(), enum cxl_regloc_type, and cxl_find_regblock()/cxl_setup_regs() declarations from internal CXL headers to include/cxl/pci.h.

This makes them accessible to code outside the CXL subsystem, in particular the PCI core CXL state save/restore support added in a subsequent patch.

No functional change.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit b5e166c nv-kernels/24.04_linux-nvidia-6.17-next)
[koba: Also move CXL_CM_CAP_CAP_ID_RAS, CXL_CM_CAP_CAP_ID_HDM, and CXL_CM_CAP_CAP_HDM_VERSION into public include/cxl/cxl.h to keep the public CXL header layout consistent.]
Signed-off-by: Koba Ko <kobak@nvidia.com>

…state

BugLink: https://bugs.launchpad.net/bugs/2143032

Add pci_add_virtual_ext_cap_save_buffer() to allocate save buffers using virtual cap IDs (above PCI_EXT_CAP_ID_MAX) that don't require a real capability in config space.

The existing pci_add_ext_cap_save_buffer() cannot be used for CXL DVSEC state because it calls pci_find_saved_ext_cap() which searches for a matching capability in PCI config space. The CXL state saved here is a synthetic snapshot (DVSEC+HDM) and should not be tied to a real extended-cap instance. A virtual extended-cap save buffer API (cap IDs above PCI_EXT_CAP_ID_MAX) allows PCI to track this state without a backing config space capability.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit b3a768f nv-kernels/24.04_linux-nvidia-6.17-next)
Signed-off-by: Koba Ko <kobak@nvidia.com>

BugLink: https://bugs.launchpad.net/bugs/2143032

Save and restore CXL DVSEC control registers (CTRL, CTRL2), range base registers, and lock state across PCI resets.

When the DVSEC CONFIG_LOCK bit is set, certain DVSEC fields become read-only and hardware may have updated them. Blindly restoring saved values would be silently ignored or conflict with hardware state. Instead, a read-merge-write approach is used: current hardware values are read for the RWL (read-write-when-locked) fields and merged with saved state, so only writable bits are restored while locked bits retain their hardware values.

Hooked into pci_save_state()/pci_restore_state() so all PCI reset paths automatically preserve CXL DVSEC configuration.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/)
[jan: Resolve minor conflict in drivers/pci/Makefile due to code line shifts ]
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(cherry picked from commit 0bb0dc0 nv-kernels/24.04_linux-nvidia-6.17-next)
Signed-off-by: Koba Ko <kobak@nvidia.com>

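The read-merge-write approach described in this commit message reduces to a single masking step. The following is a minimal sketch under assumptions: `merge_locked()` and its parameters are hypothetical names, the 16-bit register width is chosen for illustration, and the mask value in the usage note is arbitrary rather than a real RWL field layout.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Compute the value to write back on restore.
 *
 * saved:    register value captured before the reset
 * current:  register value read from hardware after the reset
 * rwl_mask: bits that become read-only once CONFIG_LOCK is set
 * locked:   whether CONFIG_LOCK is currently set
 */
static uint16_t merge_locked(uint16_t saved, uint16_t current,
			     uint16_t rwl_mask, bool locked)
{
	if (!locked)
		return saved;	/* everything is writable: plain restore */

	/* Keep hardware values for locked bits, restore only the rest. */
	return (uint16_t)((saved & ~rwl_mask) | (current & rwl_mask));
}
```

For example, with `saved = 0xABCD`, `current = 0x1234`, and an illustrative `rwl_mask = 0x00FF`, the locked case writes back `0xAB34`: the upper byte comes from the saved state and the lower byte keeps what hardware currently reports.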
BugLink: https://bugs.launchpad.net/bugs/2143032 Save and restore CXL HDM decoder registers (global control, per-decoder base/size/target-list, and commit state) across PCI resets. On restore, decoders that were committed are reprogrammed and recommitted with a 10ms timeout. Locked decoders that are already committed are skipped, since their state is protected by hardware and reprogramming them would fail. The Register Locator DVSEC is parsed directly via PCI config space reads rather than calling cxl_find_regblock()/cxl_setup_regs(), since this code lives in the PCI core and must not depend on CXL module symbols. MSE is temporarily enabled during save/restore to allow MMIO access to the HDM decoder register block. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (cherry picked from https://lore.kernel.org/linux-cxl/20260306080026.116789-1-smadhavan@nvidia.com/) [jan: Include <cxl/cxl.h> in drivers/pci/cxl.c due to conflict resolution in "4acbc27592b8 NVIDIA: VR: SAUCE: cxl: Move HDM decoder and register map definitions to include/cxl/cxl.h"] Signed-off-by: Jiandi An <jan@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Signed-off-by: Brad Figg <bfigg@nvidia.com> (cherry picked from commit 578f916 nv-kernels/24.04_linux-nvidia-6.17-next) Signed-off-by: Koba Ko <kobak@nvidia.com>
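The skip-if-locked and commit-with-timeout behaviour described above can be sketched against a simulated decoder. Everything here is an assumption made for illustration: the `hyp_` names, the bit positions, and the fake decoder are not taken from the patch, and the sketch bounds the wait by a poll count where the commit message describes a 10ms timeout. Reprogramming of base, size, and target-list registers is omitted.

```c
#include <stdint.h>

/* Illustrative control bits; the positions are assumptions for this sketch. */
#define HYP_CTRL_LOCK       (1u << 8)
#define HYP_CTRL_COMMIT     (1u << 9)
#define HYP_CTRL_COMMITTED  (1u << 10)
#define HYP_CTRL_ERROR      (1u << 11)

enum hyp_result { HYP_OK = 0, HYP_SKIPPED = 1, HYP_FAILED = -1, HYP_TIMEOUT = -2 };

/* Simulated decoder: reports COMMITTED after a fixed number of reads. */
struct hyp_decoder {
    uint32_t ctrl;
    unsigned reads_until_committed;
};

static uint32_t hyp_read_ctrl(struct hyp_decoder *d)
{
    if ((d->ctrl & HYP_CTRL_COMMIT) && !(d->ctrl & HYP_CTRL_COMMITTED)) {
        if (d->reads_until_committed == 0)
            d->ctrl |= HYP_CTRL_COMMITTED;
        else
            d->reads_until_committed--;
    }
    return d->ctrl;
}

/* Recommit one decoder, giving up after max_polls status reads. */
static enum hyp_result hyp_recommit(struct hyp_decoder *d, unsigned max_polls)
{
    uint32_t ctrl = hyp_read_ctrl(d);

    /* Locked and already committed: hardware protects it, so leave it alone. */
    if ((ctrl & HYP_CTRL_LOCK) && (ctrl & HYP_CTRL_COMMITTED))
        return HYP_SKIPPED;

    d->ctrl = ctrl | HYP_CTRL_COMMIT;   /* request the commit */

    for (unsigned i = 0; i < max_polls; i++) {
        ctrl = hyp_read_ctrl(d);
        if (ctrl & HYP_CTRL_ERROR)
            return HYP_FAILED;
        if (ctrl & HYP_CTRL_COMMITTED)
            return HYP_OK;
    }
    return HYP_TIMEOUT;
}
```

Bounding the loop keeps a decoder that never reports completion from stalling the restore path; the caller can then decide whether a timeout is fatal.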
…efinitions

BugLink: https://bugs.launchpad.net/bugs/2143032

Add CXL DVSEC register definitions needed for CXL device reset per CXL r3.2 section 8.1.3.1:
- Capability bits: RST_CAPABLE, CACHE_CAPABLE, CACHE_WBI_CAPABLE, RST_TIMEOUT, RST_MEM_CLR_CAPABLE
- Control2 register: DISABLE_CACHING, INIT_CACHE_WBI, INIT_CXL_RST, RST_MEM_CLR_EN
- Status2 register: CACHE_INV, RST_DONE, RST_ERR
- Non-CXL Function Map DVSEC register offset

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(backported from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
[jan: Resolve conflicts where PCI_DVSEC_CXL_CACHE_CAPABLE is already added by "72bd823fb4f1 NVIDIA: VR: SAUCE: PCI: Allow ATS to be always on for CXL.cache capable devices"]
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit 8f77fbe nv-kernels/24.04_linux-nvidia-6.17-next)
[koba: Preserve PCI_DVSEC_CXL_CACHE_CAPABLE because drivers/pci/ats.c still uses it for CXL.cache ATS dependency from commit 3765488 (6.17 source commit 72bd823).]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…_restore() BugLink: https://bugs.launchpad.net/bugs/2143032 Export pci_dev_save_and_disable() and pci_dev_restore() so that subsystems performing non-standard reset sequences (e.g. CXL) can reuse the PCI core standard pre/post reset lifecycle: driver reset_prepare/reset_done callbacks, PCI config space save/restore, and device disable/re-enable. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (cherry picked from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Signed-off-by: Brad Figg <bfigg@nvidia.com> (cherry picked from commit a14a427 nv-kernels/24.04_linux-nvidia-6.17-next) Signed-off-by: Koba Ko <kobak@nvidia.com>
BugLink: https://bugs.launchpad.net/bugs/2143032

Add infrastructure for quiescing the CXL data path before reset:
- Memory offlining: check if CXL-backed memory is online and offline it via offline_and_remove_memory() before reset, per CXL spec requirement to quiesce all CXL.mem transactions before issuing CXL Reset.
- CPU cache flush: invalidate cache lines before reset as a safety measure after memory offline.

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit 98bfbf9 nv-kernels/24.04_linux-nvidia-6.17-next)
[koba: Use a real System RAM walker callback so resource walks never invoke a NULL function pointer.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
…XL reset BugLink: https://bugs.launchpad.net/bugs/2143032 Add sibling PCI function save/disable/restore coordination for CXL reset. Before reset, all CXL.cachemem sibling functions are locked, saved, and disabled; after reset they are restored. The Non-CXL Function Map DVSEC and per-function DVSEC capability register are consulted to skip non-CXL and CXL.io-only functions. A global mutex serializes concurrent resets to prevent deadlocks between sibling functions. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (cherry picked from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Signed-off-by: Brad Figg <bfigg@nvidia.com> (backported from commit 9a08c02 nv-kernels/24.04_linux-nvidia-6.17-next) [koba: Propagate sibling collection allocation failures after pci_walk_bus() so reset aborts instead of proceeding with a partial sibling list.] Signed-off-by: Koba Ko <kobak@nvidia.com>
…ration

BugLink: https://bugs.launchpad.net/bugs/2143032

cxl_dev_reset() implements the hardware reset sequence: optionally enable memory clear, initiate reset via CTRL2, wait for completion, and re-enable caching.

cxl_do_reset() orchestrates the full reset flow:
1. CXL pre-reset: mem offlining and cache flush (when memdev present)
2. PCI save/disable: pci_dev_save_and_disable() automatically saves CXL DVSEC and HDM decoder state via PCI core hooks
3. Sibling coordination: save/disable CXL.cachemem sibling functions
4. Execute CXL DVSEC reset
5. Sibling restore: always runs to re-enable sibling functions
6. PCI restore: pci_dev_restore() automatically restores CXL state

The CXL-specific DVSEC and HDM save/restore is handled by the PCI core's CXL save/restore infrastructure (drivers/pci/cxl.c).

Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com>
(cherry picked from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/)
Signed-off-by: Jiandi An <jan@nvidia.com>
Acked-by: Jamie Nguyen <jamien@nvidia.com>
Acked-by: Nirmoy Das <nirmoyd@nvidia.com>
Acked-by: Carol L Soto <csoto@nvidia.com>
Acked-by: Matthew R. Ochs <mochs@nvidia.com>
Signed-off-by: Brad Figg <bfigg@nvidia.com>
(backported from commit 92fb807 nv-kernels/24.04_linux-nvidia-6.17-next)
[koba: Treat error-valued cxlmd->endpoint as no endpoint to avoid dereferencing ERR_PTR before endpoint attach.]
[koba: Check sibling collection failure before starting the CXL reset so allocation failure restores the target and aborts.]
[koba: Limit the memdev device lock to endpoint-dependent memory preparation and cache flush, restore memory quiesce before PCI disable, and track sibling reset preparation so reset_done cleanup only runs after successful sibling prepare.]
[koba: Guard reset_done() against NULL/ERR_PTR memdev endpoints before decoder reset detection.]
Signed-off-by: Koba Ko <kobak@nvidia.com>
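The six-step ordering in the commit message above can be modelled with a small userspace trace. This is a sketch under stated assumptions, not the kernel implementation: the `hyp_` names are hypothetical, each step is reduced to a trace entry, and the sketch assumes that the PCI restore also runs when the reset step fails (the commit message states this explicitly only for the sibling restore).

```c
#include <string.h>

/* Hypothetical context: records step order and can simulate a failed reset. */
struct hyp_reset_ctx {
    char trace[64];   /* comma-separated names of the steps that ran */
    int fail_reset;   /* non-zero makes the simulated reset step fail */
};

static void hyp_step(struct hyp_reset_ctx *c, const char *name)
{
    /* append without overflowing the fixed-size trace buffer */
    strncat(c->trace, name, sizeof(c->trace) - strlen(c->trace) - 1);
}

static int hyp_do_reset(struct hyp_reset_ctx *c)
{
    int rc;

    hyp_step(c, "quiesce,");      /* 1. offline memory, flush CPU caches */
    hyp_step(c, "pci_save,");     /* 2. save state and disable the target */
    hyp_step(c, "sib_save,");     /* 3. save/disable sibling functions */

    hyp_step(c, "reset,");        /* 4. issue the CXL DVSEC reset */
    rc = c->fail_reset ? -1 : 0;

    hyp_step(c, "sib_restore,");  /* 5. runs whether or not the reset failed */
    hyp_step(c, "pci_restore");   /* 6. assumed to run on failure as well */

    return rc;
}
```

Running the restore steps unconditionally mirrors the stated goal of re-enabling sibling functions even when the reset itself reports an error; the error is still returned to the caller.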
BugLink: https://bugs.launchpad.net/bugs/2143032 Add a "cxl_reset" sysfs attribute to PCI devices that support CXL Reset (CXL r3.2 section 8.1.3.1). The attribute is visible only on devices with both CXL.cache and CXL.mem capabilities and the CXL Reset Capable bit set in the DVSEC. Writing "1" to the attribute triggers the full CXL reset flow via cxl_do_reset(). The interface is decoupled from memdev creation: when a CXL memdev exists, memory offlining and cache flush are performed; otherwise reset proceeds without the memory management. The sysfs attribute is managed entirely by the CXL module using sysfs_create_group() / sysfs_remove_group() rather than the PCI core's static attribute groups. This avoids cross-module symbol dependencies between the PCI core (always built-in) and CXL_BUS (potentially modular). At module init, existing PCI devices are scanned and a PCI bus notifier handles hot-plug/unplug. kernfs_drain() makes sure that any in-flight store() completes before sysfs_remove_group() returns, preventing use-after-free during module unload. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (cherry picked from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Signed-off-by: Brad Figg <bfigg@nvidia.com> (cherry picked from commit 6e96f7e nv-kernels/24.04_linux-nvidia-6.17-next) Signed-off-by: Koba Ko <kobak@nvidia.com>
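From userspace, triggering the reset amounts to writing "1" to the attribute. The helper below is a minimal sketch; the function name `trigger_cxl_reset` is hypothetical, and the caller supplies the full attribute path for the device in question. It is an assumption of this sketch that the write needs sufficient privileges and that the attribute is present only on devices that meet the visibility conditions described above.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write "1" to a cxl_reset attribute.
 * Returns 0 on success or a negative errno value on failure.
 */
static int trigger_cxl_reset(const char *attr_path)
{
    int fd = open(attr_path, O_WRONLY);
    if (fd < 0)
        return -errno;           /* e.g. attribute absent on this device */

    ssize_t n = write(fd, "1", 1);
    int err = 0;
    if (n < 0)
        err = -errno;            /* failure reported by the write itself */
    else if (n != 1)
        err = -EIO;              /* short write: treat as an I/O error */

    close(fd);
    return err;
}
```

A missing attribute shows up as `-ENOENT` from `open()`, which lets a caller distinguish "device does not expose cxl_reset" from a reset that was attempted and failed.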
…tribute BugLink: https://bugs.launchpad.net/bugs/2143032 Document the cxl_reset sysfs attribute added to PCI devices that support CXL Reset. Signed-off-by: Srirangan Madhavan <smadhavan@nvidia.com> (cherry picked from https://lore.kernel.org/linux-cxl/20260306092322.148765-1-smadhavan@nvidia.com/) Signed-off-by: Jiandi An <jan@nvidia.com> Acked-by: Jamie Nguyen <jamien@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Carol L Soto <csoto@nvidia.com> Acked-by: Matthew R. Ochs <mochs@nvidia.com> Signed-off-by: Brad Figg <bfigg@nvidia.com> (cherry picked from commit 33b53e1 nv-kernels/24.04_linux-nvidia-6.17-next) Signed-off-by: Koba Ko <kobak@nvidia.com>
I believe that was on purpose. Both of these are included in pr#426 (the dependency series).
Here is what I see... It's not that big of a deal, but we'll need to be clear to Brad which commits to pick from this PR (he can't just blindly apply all the commits). No issues with the recent updates to the reset and save/restore patches. I verified that it resolves my last finding.
Merged, closing PR.
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-7.0/+bug/2153819
Summary
Backport Srirangan's CXL reset plus DVSEC/HDM save-restore series for the CXL Type-2 stack.
This series:
- cxl_reset PCI sysfs interface.
- /sys/bus/pci/devices/.../cxl_reset.
- PCI_DVSEC_CXL_CACHE_CAPABLE for the existing CXL.cache ATS dependency.

Dependency Note
This PR depends on the CXL Type-2 base branch and is intended to be reviewed/applied after that branch lands.
The branch is currently stacked locally on that dependency; the dependency branch is not pushed to NVIDIA/NV-Kernels.

Validation
- 7.0.0-dgx16137-cxlreset.
- cxl_reset validation passed on all four target BDFs.
- CXLCtl and captured CXL range fields were preserved across reset.
- CXLSta2 reported ResetComplete+ ResetError-.
- git diff --check is clean.