
Commit d667813

SS-JIA authored and committed
[ET-VK] Fix force_fp16 texture bias being silently rejected for CONTIGUOUS_ANY ops
Pull Request resolved: #18770

The `force_fp16` path in `TagMemoryMetaPass` applies `ANY_TEXTURE` to bias ops toward texture storage. However, `try_constrain_with_arg_repset` has a packed-dim compatibility check that requires ALL of the source repset's PDIs (packed-dim infos) to exist in the output repset. `ANY_TEXTURE` has 3 texture layouts (WP, HP, CP), but `CONTIGUOUS_ANY` outputs only support WP, so the check fails and the texture bias is silently dropped.

Without the bias, buffer storage cascades from ops that must use buffer (e.g. embedding with a vocab exceeding texture limits) into downstream ops that could use texture, causing unnecessary buffer↔texture transitions.

Fix: check PDI compatibility against the intersection of the arg and source repsets (what would actually be applied) rather than the raw source. The intersection `ANY_TEXTURE ∩ CONTIGUOUS_ANY` is `WIDTH_PACKED_TEXTURE`, which is compatible with the output.

Authored by Claude.

ghstack-source-id: 364280901
@exported-using-ghexport

Differential Revision: [D100004702](https://our.internmc.facebook.com/intern/diff/D100004702/)
1 parent d76be20 commit d667813

1 file changed: 9 additions & 4 deletions

backends/vulkan/utils.py

@@ -1511,14 +1511,19 @@ def try_constrain_with_arg_repset(
         if not arg_current_repset.any_in_common(source_repset):
             return False

+        # Compute the narrowed repset (intersection of current arg and source).
+        narrowed = arg_current_repset.make_intersect(source_repset)
+
         if self.sync_primary_io_repr:
-            if not self.get_out_repset(0).has_compatible_packed_dim_info_set(
-                source_repset
-            ):
+            # Check that the narrowed result is compatible with the output.
+            # Using the intersection rather than the raw source_repset avoids
+            # rejecting valid constraints where the source has extra layouts
+            # (e.g. ANY_TEXTURE includes HP/CP) that don't exist in the output
+            # but also don't appear in the intersection.
+            if not self.get_out_repset(0).has_compatible_packed_dim_info_set(narrowed):
                 return False

         # If this point is reached, then it is possible to constrain
-        narrowed = arg_current_repset.make_intersect(source_repset)
         self.args_repset_list[arg_i] = narrowed

         # Propagate to other synced args via packed-dim compatibility
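
The failure described in the commit message can be reproduced with a small toy model. This is only a sketch: `ToyRepSet`, `can_constrain`, and the `(storage, packed_dim)` encoding are hypothetical stand-ins, not the real classes in `backends/vulkan/utils.py`, and the contents assumed for `CONTIGUOUS_ANY` (a width-packed buffer entry plus a width-packed texture entry) are a guess made purely for illustration. The compatibility check is modeled as "every packed dim of the candidate must exist in the output", following the wording of the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyRepSet:
    """Hypothetical stand-in: a set of (storage, packed_dim) pairs."""

    layouts: frozenset

    def any_in_common(self, other):
        return bool(self.layouts & other.layouts)

    def make_intersect(self, other):
        return ToyRepSet(self.layouts & other.layouts)

    def packed_dims(self):
        return {dim for _, dim in self.layouts}

    def has_compatible_packed_dim_info_set(self, other):
        # Assumed semantics: every packed dim in `other` must also be
        # available in `self`.
        return other.packed_dims() <= self.packed_dims()


# Assumed contents, for illustration only.
ANY_TEXTURE = ToyRepSet(
    frozenset({("texture", "W"), ("texture", "H"), ("texture", "C")})
)
CONTIGUOUS_ANY = ToyRepSet(frozenset({("buffer", "W"), ("texture", "W")}))


def can_constrain(arg_repset, out_repset, source_repset, use_intersection):
    """Mimic the guard before and after the fix."""
    if not arg_repset.any_in_common(source_repset):
        return False
    narrowed = arg_repset.make_intersect(source_repset)
    # Old behavior checks the raw source; new behavior checks what would
    # actually be applied to the arg.
    candidate = narrowed if use_intersection else source_repset
    return out_repset.has_compatible_packed_dim_info_set(candidate)


old_result = can_constrain(
    CONTIGUOUS_ANY, CONTIGUOUS_ANY, ANY_TEXTURE, use_intersection=False
)
new_result = can_constrain(
    CONTIGUOUS_ANY, CONTIGUOUS_ANY, ANY_TEXTURE, use_intersection=True
)
narrowed = CONTIGUOUS_ANY.make_intersect(ANY_TEXTURE)

print(old_result, new_result, sorted(narrowed.layouts))
# False True [('texture', 'W')]
```

In this model the old check rejects the constraint because the raw source carries H- and C-packed layouts that the output lacks, while the narrowed set contains only the width-packed texture layout and passes.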
