Conversation
|
Not tested yet. Ideally it would print both and highlight the differences. |
2d36dd5 to
b18d926
Compare
There was a problem hiding this comment.
Pull request overview
This PR adds a post-write validation step for the dGPU EEPROM “GPU descriptor” by reading back the data after programming and comparing it to the input bytes.
Changes:
- Add
CrosEc::validate_gpu_descriptor()to read the descriptor back and compare against a reference buffer. - Update the CLI flow for
--flash-gpu-descriptor-fileto perform readback validation after a successful write.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
framework_lib/src/commandline/mod.rs |
Adds CLI readback/validation output after writing the GPU descriptor file. |
framework_lib/src/chromium_ec/mod.rs |
Introduces validate_gpu_descriptor() to compare EEPROM contents against a provided reference. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
framework_lib/src/commandline/mod.rs
Outdated
| println!("GPU Descriptor successfully written"); | ||
| println!("Reading Back GPU Descriptor"); | ||
| let valid = ec.validate_gpu_descriptor(&data); | ||
| println!(" Read returns same bytes: {:?}", valid); |
There was a problem hiding this comment.
When --dry-run is used, set_gpu_descriptor returns Ok(()) without writing anything, but this branch still prints "GPU Descriptor successfully written" and performs a readback/validation. This produces misleading output in dry-run mode; consider skipping the success message + validation when args.dry_run is true, or explicitly labeling the result as a simulated write.
| println!("GPU Descriptor successfully written"); | |
| println!("Reading Back GPU Descriptor"); | |
| let valid = ec.validate_gpu_descriptor(&data); | |
| println!(" Read returns same bytes: {:?}", valid); | |
| if args.dry_run { | |
| println!("GPU Descriptor write simulated (--dry-run)"); | |
| } else { | |
| println!("GPU Descriptor successfully written"); | |
| println!("Reading Back GPU Descriptor"); | |
| let valid = ec.validate_gpu_descriptor(&data); | |
| println!(" Read returns same bytes: {:?}", valid); | |
| } |
framework_lib/src/commandline/mod.rs
Outdated
| let valid = ec.validate_gpu_descriptor(&data); | ||
| println!(" Read returns same bytes: {:?}", valid); |
There was a problem hiding this comment.
validate_gpu_descriptor returns an EcResult<bool>, but the CLI prints it with {:?} under a message that implies a boolean result ("Read returns same bytes"). Consider matching on the result: print a clear pass/fail for Ok(true/false) and handle Err(_) separately (and optionally return a non-zero exit code on validation failure) so validation errors aren't silently treated as just debug output.
| let valid = ec.validate_gpu_descriptor(&data); | |
| println!(" Read returns same bytes: {:?}", valid); | |
| match ec.validate_gpu_descriptor(&data) { | |
| Ok(true) => println!(" Read returns same bytes: true"), | |
| Ok(false) => println!(" Read returns same bytes: false"), | |
| Err(err) => { | |
| println!(" GPU Descriptor validation failed with error: {:?}", err) | |
| } | |
| } |
framework_lib/src/commandline/mod.rs
Outdated
| println!("Reading Back GPU Descriptor"); | ||
| let valid = ec.validate_gpu_descriptor(&data); | ||
| println!(" Read returns same bytes: {:?}", valid); |
There was a problem hiding this comment.
Readback validation is performed after set_gpu_descriptor returns, but set_gpu_descriptor may have temporarily enabled bay power and then turned it back off before returning (when force_power is true). In that case, the subsequent I2C read in validate_gpu_descriptor can fail even if the write succeeded. Consider moving validation into set_gpu_descriptor (before disabling power), or provide a way to keep power enabled through validation.
| println!("Reading Back GPU Descriptor"); | |
| let valid = ec.validate_gpu_descriptor(&data); | |
| println!(" Read returns same bytes: {:?}", valid); | |
| println!( | |
| "Skipping post-write readback validation here because bay power may \ | |
| have been disabled before this function regained control" | |
| ); |
framework_lib/src/chromium_ec/mod.rs
Outdated
| let header = self.read_gpu_desc_header()?; | ||
| if header.magic != [0x32, 0xAC, 0x00, 0x00] { | ||
| return Err(EcError::DeviceError( | ||
| "Invalid descriptor hdr magic".to_string(), | ||
| )); | ||
| } | ||
| let current_desc = self.read_ec_gpu_chunk(0x00, header.descriptor_length as u16)?; |
There was a problem hiding this comment.
validate_gpu_descriptor duplicates the header magic check + read logic already present in read_gpu_descriptor. Consider implementing this as self.read_gpu_descriptor()? == reference_desc (or calling read_gpu_desc_header once and reusing a shared helper) to avoid divergence between the two code paths over time.
| let header = self.read_gpu_desc_header()?; | |
| if header.magic != [0x32, 0xAC, 0x00, 0x00] { | |
| return Err(EcError::DeviceError( | |
| "Invalid descriptor hdr magic".to_string(), | |
| )); | |
| } | |
| let current_desc = self.read_ec_gpu_chunk(0x00, header.descriptor_length as u16)?; | |
| let current_desc = self.read_gpu_descriptor()?; |
framework_lib/src/chromium_ec/mod.rs
Outdated
| let current_desc = self.read_ec_gpu_chunk(0x00, header.descriptor_length as u16)?; | ||
| Ok(current_desc == reference_desc) |
There was a problem hiding this comment.
descriptor_length is a u32, but it’s cast to u16 when calling read_ec_gpu_chunk. If the EEPROM descriptor ever exceeds u16::MAX, this will truncate the length and read only a prefix. Consider explicitly checking header.descriptor_length <= u16::MAX and returning an error if it doesn’t fit (and/or adjusting read_ec_gpu_chunk to support larger reads).
| return Err(EcError::DeviceError( | ||
| "Invalid descriptor hdr magic".to_string(), | ||
| )); | ||
| } |
There was a problem hiding this comment.
As written, validation reads header.descriptor_length bytes and compares them to reference_desc. If the reference buffer length differs from the length encoded in the header (e.g., extra trailing bytes in the file), this will always report false even when the EEPROM content matches what the header says is valid. Consider asserting reference_desc.len() == header.descriptor_length as usize (and erroring if not) or otherwise defining which length should be compared.
| } | |
| } | |
| let descriptor_length = header.descriptor_length as usize; | |
| if reference_desc.len() != descriptor_length { | |
| return Err(EcError::DeviceError(format!( | |
| "Reference descriptor length mismatch: expected {}, got {}", | |
| descriptor_length, | |
| reference_desc.len() | |
| ))); | |
| } |
b18d926 to
e1a4c5d
Compare
Signed-off-by: Daniel Schaefer <dhs@frame.work>
To purely validate that the current EEPROM matches a local file and print the diff if not. Signed-off-by: Daniel Schaefer <dhs@frame.work>
e1a4c5d to
c46e7e4
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| validate_gpu_descriptor_file: args | ||
| .validate_gpu_descriptor_file | ||
| .map(|x| x.into_os_string().into_string().unwrap()), |
There was a problem hiding this comment.
Converting PathBuf to String with .into_string().unwrap() will panic on non-UTF8 paths. Since this is user-provided input, consider keeping PathBuf in Cli or using a lossy conversion (and emitting a friendly error) to avoid crashing.
| println!("GPU Descriptor successfully written"); | ||
| println!("Validating GPU Descriptor..."); | ||
| match ec.validate_gpu_descriptor(&data) { | ||
| Ok(true) => println!(" Validation passed"), | ||
| Ok(false) => println!(" Validation FAILED: read-back mismatch"), | ||
| Err(err) => println!(" Validation error: {:?}", err), |
There was a problem hiding this comment.
--validate-gpu-descriptor-file validation runs even when --dry-run is set. Since set_gpu_descriptor(..., dry_run=true) doesn't write anything, the immediate read-back validation will usually report a mismatch and/or error, which is misleading. Skip validation when args.dry_run is true (or clearly label it as validating the current EEPROM vs the file).
| println!("GPU Descriptor successfully written"); | |
| println!("Validating GPU Descriptor..."); | |
| match ec.validate_gpu_descriptor(&data) { | |
| Ok(true) => println!(" Validation passed"), | |
| Ok(false) => println!(" Validation FAILED: read-back mismatch"), | |
| Err(err) => println!(" Validation error: {:?}", err), | |
| if args.dry_run { | |
| println!( | |
| "GPU Descriptor dry run successful; skipping validation because no changes were written" | |
| ); | |
| } else { | |
| println!("GPU Descriptor successfully written"); | |
| println!("Validating GPU Descriptor..."); | |
| match ec.validate_gpu_descriptor(&data) { | |
| Ok(true) => println!(" Validation passed"), | |
| Ok(false) => println!(" Validation FAILED: read-back mismatch"), | |
| Err(err) => println!(" Validation error: {:?}", err), | |
| } |
| println!("Validating GPU Descriptor..."); | ||
| match ec.validate_gpu_descriptor(&data) { | ||
| Ok(true) => println!(" Validation passed"), | ||
| Ok(false) => println!(" Validation FAILED: read-back mismatch"), | ||
| Err(err) => println!(" Validation error: {:?}", err), | ||
| } |
There was a problem hiding this comment.
Validation is triggered after set_gpu_descriptor returns, but set_gpu_descriptor may temporarily enable gpu_3v_5v_en and then disable it again before returning. In that case validate_gpu_descriptor() will try to read the EEPROM with power off and likely fail. Consider moving validation into set_gpu_descriptor (before power is toggled off), or ensure validation also handles the same power-enable/disable flow.
| println!("Validating GPU Descriptor..."); | |
| match ec.validate_gpu_descriptor(&data) { | |
| Ok(true) => println!(" Validation passed"), | |
| Ok(false) => println!(" Validation FAILED: read-back mismatch"), | |
| Err(err) => println!(" Validation error: {:?}", err), | |
| } | |
| println!( | |
| "Skipping immediate validation after write because the EEPROM may have been powered down before set_gpu_descriptor returned" | |
| ); |
| Ok(true) => println!(" Validation passed"), | ||
| Ok(false) => println!(" Validation FAILED: read-back mismatch"), | ||
| Err(err) => println!(" Validation error: {:?}", err), | ||
| } |
There was a problem hiding this comment.
--validate-gpu-descriptor-file prints pass/fail but run_with_args still exits with status 0 either way, which makes it hard to use in scripts/CI. Consider returning a non-zero exit code when validation fails or errors (similar to how compare_version() returns 1 on mismatch).
| Ok(true) => println!(" Validation passed"), | |
| Ok(false) => println!(" Validation FAILED: read-back mismatch"), | |
| Err(err) => println!(" Validation error: {:?}", err), | |
| } | |
| Ok(true) => { | |
| println!(" Validation passed"); | |
| return 0; | |
| } | |
| Ok(false) => { | |
| println!(" Validation FAILED: read-back mismatch"); | |
| return 1; | |
| } | |
| Err(err) => { | |
| println!(" Validation error: {:?}", err); | |
| return 1; | |
| } | |
| } | |
| } else { | |
| return 1; |
| cli.validate_gpu_descriptor_file = if args.len() > i + 1 { | ||
| Some(args[i + 1].clone()) | ||
| } else { | ||
| println!("Need to provide a value for --validate_gpu_descriptor_file. PATH"); |
There was a problem hiding this comment.
The error message for a missing value uses --validate_gpu_descriptor_file, but the actual flag is --validate-gpu-descriptor-file. This will confuse users copy/pasting the flag from the error output; update the message to match the real CLI option spelling.
| println!("Need to provide a value for --validate_gpu_descriptor_file. PATH"); | |
| println!("Need to provide a value for --validate-gpu-descriptor-file. PATH"); |
| let current_desc = self.read_gpu_descriptor()?; | ||
| let max_len = core::cmp::max(current_desc.len(), reference_desc.len()); | ||
| let mut valid = true; | ||
| for i in 0..max_len { | ||
| let expected = reference_desc.get(i); | ||
| let actual = current_desc.get(i); | ||
| if expected != actual { | ||
| println!( | ||
| " Offset 0x{:04X}: expected {:02X?}, got {:02X?}", | ||
| i, expected, actual | ||
| ); | ||
| valid = false; | ||
| } | ||
| } | ||
| Ok(valid) |
There was a problem hiding this comment.
validate_gpu_descriptor() reads the EEPROM via read_gpu_descriptor() but doesn't ensure the expansion bay power rail (gpu_3v_5v_en) is enabled first. Since set_gpu_descriptor() may power-cycle the bay during writes, callers that validate immediately after writing can hit read failures. Consider either (a) handling the same power-on/power-off logic inside validation, or (b) performing validation as part of the write flow before power is turned back off.
| let current_desc = self.read_gpu_descriptor()?; | |
| let max_len = core::cmp::max(current_desc.len(), reference_desc.len()); | |
| let mut valid = true; | |
| for i in 0..max_len { | |
| let expected = reference_desc.get(i); | |
| let actual = current_desc.get(i); | |
| if expected != actual { | |
| println!( | |
| " Offset 0x{:04X}: expected {:02X?}, got {:02X?}", | |
| i, expected, actual | |
| ); | |
| valid = false; | |
| } | |
| } | |
| Ok(valid) | |
| let gpu_power_enabled = self.get_gpio("gpu_3v_5v_en")?; | |
| let force_power = !gpu_power_enabled; | |
| if force_power { | |
| self.set_gpio("gpu_3v_5v_en", true)?; | |
| } | |
| let return_val = (|| -> EcResult<bool> { | |
| let current_desc = self.read_gpu_descriptor()?; | |
| let max_len = core::cmp::max(current_desc.len(), reference_desc.len()); | |
| let mut valid = true; | |
| for i in 0..max_len { | |
| let expected = reference_desc.get(i); | |
| let actual = current_desc.get(i); | |
| if expected != actual { | |
| println!( | |
| " Offset 0x{:04X}: expected {:02X?}, got {:02X?}", | |
| i, expected, actual | |
| ); | |
| valid = false; | |
| } | |
| } | |
| Ok(valid) | |
| })(); | |
| if force_power { | |
| let res = self.set_gpio("gpu_3v_5v_en", false); | |
| if let Err(err) = res { | |
| error!("Failed to set ALW power to GPU off {:?}", err); | |
| return if return_val.is_err() { | |
| return_val | |
| } else { | |
| Err(err) | |
| }; | |
| } | |
| } | |
| return_val |
An example to test this: