Skip to content

Validate EEPROM after writing it#316

Open
JohnAZoidberg wants to merge 2 commits intomainfrom
eeprom-read-after-write
Open

Validate EEPROM after writing it#316
JohnAZoidberg wants to merge 2 commits intomainfrom
eeprom-read-after-write

Conversation

@JohnAZoidberg
Copy link
Copy Markdown
Member

@JohnAZoidberg JohnAZoidberg commented Apr 14, 2026

An example to test this:

framework_tool --dump-gpu-descriptor-file dump.bin
framework_tool --flash-gpu-descriptor-file dump.bin

@JohnAZoidberg JohnAZoidberg requested a review from kiram9 April 14, 2026 03:35
@JohnAZoidberg
Copy link
Copy Markdown
Member Author

Not tested yet. Ideally it would print both and highlight the differences.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds a post-write validation step for the dGPU EEPROM “GPU descriptor” by reading back the data after programming and comparing it to the input bytes.

Changes:

  • Add CrosEc::validate_gpu_descriptor() to read the descriptor back and compare against a reference buffer.
  • Update the CLI flow for --flash-gpu-descriptor-file to perform readback validation after a successful write.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
framework_lib/src/commandline/mod.rs Adds CLI readback/validation output after writing the GPU descriptor file.
framework_lib/src/chromium_ec/mod.rs Introduces validate_gpu_descriptor() to compare EEPROM contents against a provided reference.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1811 to +1814
println!("GPU Descriptor successfully written");
println!("Reading Back GPU Descriptor");
let valid = ec.validate_gpu_descriptor(&data);
println!(" Read returns same bytes: {:?}", valid);
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When --dry-run is used, set_gpu_descriptor returns Ok(()) without writing anything, but this branch still prints "GPU Descriptor successfully written" and performs a readback/validation. This produces misleading output in dry-run mode; consider skipping the success message + validation when args.dry_run is true, or explicitly labeling the result as a simulated write.

Suggested change
println!("GPU Descriptor successfully written");
println!("Reading Back GPU Descriptor");
let valid = ec.validate_gpu_descriptor(&data);
println!(" Read returns same bytes: {:?}", valid);
if args.dry_run {
println!("GPU Descriptor write simulated (--dry-run)");
} else {
println!("GPU Descriptor successfully written");
println!("Reading Back GPU Descriptor");
let valid = ec.validate_gpu_descriptor(&data);
println!(" Read returns same bytes: {:?}", valid);
}

Copilot uses AI. Check for mistakes.
Comment on lines +1813 to +1814
let valid = ec.validate_gpu_descriptor(&data);
println!(" Read returns same bytes: {:?}", valid);
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_gpu_descriptor returns an EcResult<bool>, but the CLI prints it with {:?} under a message that implies a boolean result ("Read returns same bytes"). Consider matching on the result: print a clear pass/fail for Ok(true/false) and handle Err(_) separately (and optionally return a non-zero exit code on validation failure) so validation errors aren't silently treated as just debug output.

Suggested change
let valid = ec.validate_gpu_descriptor(&data);
println!(" Read returns same bytes: {:?}", valid);
match ec.validate_gpu_descriptor(&data) {
Ok(true) => println!(" Read returns same bytes: true"),
Ok(false) => println!(" Read returns same bytes: false"),
Err(err) => {
println!(" GPU Descriptor validation failed with error: {:?}", err)
}
}

Copilot uses AI. Check for mistakes.
Comment on lines +1812 to +1814
println!("Reading Back GPU Descriptor");
let valid = ec.validate_gpu_descriptor(&data);
println!(" Read returns same bytes: {:?}", valid);
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readback validation is performed after set_gpu_descriptor returns, but set_gpu_descriptor may have temporarily enabled bay power and then turned it back off before returning (when force_power is true). In that case, the subsequent I2C read in validate_gpu_descriptor can fail even if the write succeeded. Consider moving validation into set_gpu_descriptor (before disabling power), or provide a way to keep power enabled through validation.

Suggested change
println!("Reading Back GPU Descriptor");
let valid = ec.validate_gpu_descriptor(&data);
println!(" Read returns same bytes: {:?}", valid);
println!(
"Skipping post-write readback validation here because bay power may \
have been disabled before this function regained control"
);

Copilot uses AI. Check for mistakes.
Comment on lines +1530 to +1536
let header = self.read_gpu_desc_header()?;
if header.magic != [0x32, 0xAC, 0x00, 0x00] {
return Err(EcError::DeviceError(
"Invalid descriptor hdr magic".to_string(),
));
}
let current_desc = self.read_ec_gpu_chunk(0x00, header.descriptor_length as u16)?;
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_gpu_descriptor duplicates the header magic check + read logic already present in read_gpu_descriptor. Consider implementing this as self.read_gpu_descriptor()? == reference_desc (or calling read_gpu_desc_header once and reusing a shared helper) to avoid divergence between the two code paths over time.

Suggested change
let header = self.read_gpu_desc_header()?;
if header.magic != [0x32, 0xAC, 0x00, 0x00] {
return Err(EcError::DeviceError(
"Invalid descriptor hdr magic".to_string(),
));
}
let current_desc = self.read_ec_gpu_chunk(0x00, header.descriptor_length as u16)?;
let current_desc = self.read_gpu_descriptor()?;

Copilot uses AI. Check for mistakes.
Comment on lines +1536 to +1537
let current_desc = self.read_ec_gpu_chunk(0x00, header.descriptor_length as u16)?;
Ok(current_desc == reference_desc)
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

descriptor_length is a u32, but it’s cast to u16 when calling read_ec_gpu_chunk. If the EEPROM descriptor ever exceeds u16::MAX, this will truncate the length and read only a prefix. Consider explicitly checking header.descriptor_length <= u16::MAX and returning an error if it doesn’t fit (and/or adjusting read_ec_gpu_chunk to support larger reads).

Copilot uses AI. Check for mistakes.
return Err(EcError::DeviceError(
"Invalid descriptor hdr magic".to_string(),
));
}
Copy link

Copilot AI Apr 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As written, validation reads header.descriptor_length bytes and compares them to reference_desc. If the reference buffer length differs from the length encoded in the header (e.g., extra trailing bytes in the file), this will always report false even when the EEPROM content matches what the header says is valid. Consider asserting reference_desc.len() == header.descriptor_length as usize (and erroring if not) or otherwise defining which length should be compared.

Suggested change
}
}
let descriptor_length = header.descriptor_length as usize;
if reference_desc.len() != descriptor_length {
return Err(EcError::DeviceError(format!(
"Reference descriptor length mismatch: expected {}, got {}",
descriptor_length,
reference_desc.len()
)));
}

Copilot uses AI. Check for mistakes.
@JohnAZoidberg JohnAZoidberg force-pushed the eeprom-read-after-write branch from b18d926 to e1a4c5d Compare April 14, 2026 04:05
Signed-off-by: Daniel Schaefer <dhs@frame.work>
To purely validate that the current EEPROM matches a local file and
print the diff if not.

Signed-off-by: Daniel Schaefer <dhs@frame.work>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +571 to +573
validate_gpu_descriptor_file: args
.validate_gpu_descriptor_file
.map(|x| x.into_os_string().into_string().unwrap()),
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting PathBuf to String with .into_string().unwrap() will panic on non-UTF8 paths. Since this is user-provided input, consider keeping PathBuf in Cli or using a lossy conversion (and emitting a friendly error) to avoid crashing.

Copilot uses AI. Check for mistakes.
Comment on lines +1812 to +1817
println!("GPU Descriptor successfully written");
println!("Validating GPU Descriptor...");
match ec.validate_gpu_descriptor(&data) {
Ok(true) => println!(" Validation passed"),
Ok(false) => println!(" Validation FAILED: read-back mismatch"),
Err(err) => println!(" Validation error: {:?}", err),
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--validate-gpu-descriptor-file validation runs even when --dry-run is set. Since set_gpu_descriptor(..., dry_run=true) doesn't write anything, the immediate read-back validation will usually report a mismatch and/or error, which is misleading. Skip validation when args.dry_run is true (or clearly label it as validating the current EEPROM vs the file).

Suggested change
println!("GPU Descriptor successfully written");
println!("Validating GPU Descriptor...");
match ec.validate_gpu_descriptor(&data) {
Ok(true) => println!(" Validation passed"),
Ok(false) => println!(" Validation FAILED: read-back mismatch"),
Err(err) => println!(" Validation error: {:?}", err),
if args.dry_run {
println!(
"GPU Descriptor dry run successful; skipping validation because no changes were written"
);
} else {
println!("GPU Descriptor successfully written");
println!("Validating GPU Descriptor...");
match ec.validate_gpu_descriptor(&data) {
Ok(true) => println!(" Validation passed"),
Ok(false) => println!(" Validation FAILED: read-back mismatch"),
Err(err) => println!(" Validation error: {:?}", err),
}

Copilot uses AI. Check for mistakes.
Comment on lines +1813 to +1818
println!("Validating GPU Descriptor...");
match ec.validate_gpu_descriptor(&data) {
Ok(true) => println!(" Validation passed"),
Ok(false) => println!(" Validation FAILED: read-back mismatch"),
Err(err) => println!(" Validation error: {:?}", err),
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation is triggered after set_gpu_descriptor returns, but set_gpu_descriptor may temporarily enable gpu_3v_5v_en and then disable it again before returning. In that case validate_gpu_descriptor() will try to read the EEPROM with power off and likely fail. Consider moving validation into set_gpu_descriptor (before power is toggled off), or ensure validation also handles the same power-enable/disable flow.

Suggested change
println!("Validating GPU Descriptor...");
match ec.validate_gpu_descriptor(&data) {
Ok(true) => println!(" Validation passed"),
Ok(false) => println!(" Validation FAILED: read-back mismatch"),
Err(err) => println!(" Validation error: {:?}", err),
}
println!(
"Skipping immediate validation after write because the EEPROM may have been powered down before set_gpu_descriptor returned"
);

Copilot uses AI. Check for mistakes.
Comment on lines +1845 to +1848
Ok(true) => println!(" Validation passed"),
Ok(false) => println!(" Validation FAILED: read-back mismatch"),
Err(err) => println!(" Validation error: {:?}", err),
}
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--validate-gpu-descriptor-file prints pass/fail but run_with_args still exits with status 0 either way, which makes it hard to use in scripts/CI. Consider returning a non-zero exit code when validation fails or errors (similar to how compare_version() returns 1 on mismatch).

Suggested change
Ok(true) => println!(" Validation passed"),
Ok(false) => println!(" Validation FAILED: read-back mismatch"),
Err(err) => println!(" Validation error: {:?}", err),
}
Ok(true) => {
println!(" Validation passed");
return 0;
}
Ok(false) => {
println!(" Validation FAILED: read-back mismatch");
return 1;
}
Err(err) => {
println!(" Validation error: {:?}", err);
return 1;
}
}
} else {
return 1;

Copilot uses AI. Check for mistakes.
cli.validate_gpu_descriptor_file = if args.len() > i + 1 {
Some(args[i + 1].clone())
} else {
println!("Need to provide a value for --validate_gpu_descriptor_file. PATH");
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message for a missing value uses --validate_gpu_descriptor_file, but the actual flag is --validate-gpu-descriptor-file. This will confuse users copy/pasting the flag from the error output; update the message to match the real CLI option spelling.

Suggested change
println!("Need to provide a value for --validate_gpu_descriptor_file. PATH");
println!("Need to provide a value for --validate-gpu-descriptor-file. PATH");

Copilot uses AI. Check for mistakes.
Comment on lines +1530 to +1544
let current_desc = self.read_gpu_descriptor()?;
let max_len = core::cmp::max(current_desc.len(), reference_desc.len());
let mut valid = true;
for i in 0..max_len {
let expected = reference_desc.get(i);
let actual = current_desc.get(i);
if expected != actual {
println!(
" Offset 0x{:04X}: expected {:02X?}, got {:02X?}",
i, expected, actual
);
valid = false;
}
}
Ok(valid)
Copy link

Copilot AI Apr 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate_gpu_descriptor() reads the EEPROM via read_gpu_descriptor() but doesn't ensure the expansion bay power rail (gpu_3v_5v_en) is enabled first. Since set_gpu_descriptor() may power-cycle the bay during writes, callers that validate immediately after writing can hit read failures. Consider either (a) handling the same power-on/power-off logic inside validation, or (b) performing validation as part of the write flow before power is turned back off.

Suggested change
let current_desc = self.read_gpu_descriptor()?;
let max_len = core::cmp::max(current_desc.len(), reference_desc.len());
let mut valid = true;
for i in 0..max_len {
let expected = reference_desc.get(i);
let actual = current_desc.get(i);
if expected != actual {
println!(
" Offset 0x{:04X}: expected {:02X?}, got {:02X?}",
i, expected, actual
);
valid = false;
}
}
Ok(valid)
let gpu_power_enabled = self.get_gpio("gpu_3v_5v_en")?;
let force_power = !gpu_power_enabled;
if force_power {
self.set_gpio("gpu_3v_5v_en", true)?;
}
let return_val = (|| -> EcResult<bool> {
let current_desc = self.read_gpu_descriptor()?;
let max_len = core::cmp::max(current_desc.len(), reference_desc.len());
let mut valid = true;
for i in 0..max_len {
let expected = reference_desc.get(i);
let actual = current_desc.get(i);
if expected != actual {
println!(
" Offset 0x{:04X}: expected {:02X?}, got {:02X?}",
i, expected, actual
);
valid = false;
}
}
Ok(valid)
})();
if force_power {
let res = self.set_gpio("gpu_3v_5v_en", false);
if let Err(err) = res {
error!("Failed to set ALW power to GPU off {:?}", err);
return if return_val.is_err() {
return_val
} else {
Err(err)
};
}
}
return_val

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants