shader_recompiler: Implement reg type tracking #4247
Open
raphaelthegreat wants to merge 6 commits into shadps4-emu:main
Basically fast way to check if two bools are not equal
…d CMP_U64 properly
Contributor
Author
PS: I've also been wondering whether it would be better to unify the reg space, or to start by emitting IR that is faithful to the guest code and then run a post-pass optimization that converts to booleans. That would take a lot more work, though, and I'm not sure it would be strictly better than this.
e7d3cf1 to 5f4c77a
Collaborator
Regresses Shadow of the Colossus: the game now crashes my GPU driver before it can start rendering its short intro cutscene.
Collaborator
Also regresses Marvel's Spider-Man: the game now crashes with a device lost error on the loading screen for a new game.
Member
This PR solves the problem in Elden Ring where several blocks flash on the screen. Pre-release e16a59b
Collaborator
God of War is now back to crashing from missing DS_ORDERED_COUNT.
Contributor
Makes 'No Straight Roads' display video again...

In the GCN architecture, general-purpose registers are mostly 32 bits wide, with the exception of VCC, which is 64-bit (though its lower and upper 32-bit halves can also be used as registers). EXEC is also 64-bit and is used to modify the control flow of instructions, but it's technically not general purpose, even though most instructions can write to or read from it. Since it is a 64-bit register, it is usually manipulated with 64-bit arithmetic instructions.
To keep the generated code simple, the recompiler has the concept of a thread-bit type. In the guest code this is a subgroup-shared 64-bit bitmask meant to be stored into EXEC at some point, but in IR it's represented as a boolean condition local to each thread. This makes the resulting code more sane.
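As a rough illustration of the thread-bit idea described above, here is a minimal sketch of how a guest-side 64-bit mask relates to per-thread booleans. All function names are hypothetical and are not taken from the recompiler; the point is only that a 64-bit mask operation in guest code corresponds to a plain boolean operation per thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper names, for illustration only.

// Guest view: one 64-bit mask shared by the subgroup, one bit per lane.
// IR view: each thread only sees its own bit as a boolean.
bool ThreadBitFromMask(std::uint64_t mask, unsigned lane) {
    assert(lane < 64);
    return ((mask >> lane) & 1u) != 0;
}

// Rebuild the guest-style mask from the per-lane booleans.
std::uint64_t MaskFromThreadBits(const bool (&bits)[64]) {
    std::uint64_t mask = 0;
    for (unsigned lane = 0; lane < 64; ++lane) {
        if (bits[lane]) {
            mask |= std::uint64_t{1} << lane;
        }
    }
    return mask;
}

// A 64-bit AND of two masks becomes a logical AND per thread.
bool AndThreadBits(bool a, bool b) { return a && b; }

// A 64-bit XOR of two masks becomes "the two bools are not equal" per thread.
bool XorThreadBits(bool a, bool b) { return a != b; }
```

For any lane, applying the per-thread operation to the extracted bits gives the same answer as extracting the bit from the result of the 64-bit mask operation.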
To avoid fighting the IR type system, scalar and thread-bit SGPRs/VCC are treated as separate register spaces in SSA, even though they are the same physical registers. That works because guest shaders do not normally mix and match them: a thread-bit mask is generated by specific instructions like V_CMP and consumed, and then the SGPRs are overwritten by scalar operations.
However, cases have started to appear where this separation is breached, along with certain instructions that are ambiguous in nature, where it's not certain which register space should be used. Among these are the MBCNT instructions and the CMP_U64 family of instructions. Until now the former wasn't actually implemented either, but substituted with a heuristic suited to most of its practical uses. This PR also replaces the heuristic with an actual implementation and adjusts DataAppend/DataConsume to work with it, as it mimics how the HW instruction works. The previous heuristic generated much cleaner code, but I believe the cost is negligible.
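For reference, the hardware behaviour of the MBCNT pair mentioned above can be sketched as below. This follows the semantics as I understand them from the public GCN ISA documentation (count the set bits of the source that lie below the current lane, then add the second operand); the function names and the standalone `lane` parameter are assumptions for illustration, not the recompiler's actual code.

```cpp
#include <cassert>
#include <cstdint>

// Portable population count (hypothetical helper).
std::uint32_t CountOneBits(std::uint32_t v) {
    std::uint32_t n = 0;
    for (; v != 0; v &= v - 1) {
        ++n;
    }
    return n;
}

// V_MBCNT_LO_U32_B32: bits of src0 below `lane` within lanes 0..31, plus src1.
std::uint32_t MbcntLo(std::uint32_t src0, std::uint32_t src1, unsigned lane) {
    assert(lane < 64);
    const std::uint64_t below = (std::uint64_t{1} << lane) - 1;
    return CountOneBits(src0 & static_cast<std::uint32_t>(below)) + src1;
}

// V_MBCNT_HI_U32_B32: bits of src0 below `lane` within lanes 32..63, plus src1.
std::uint32_t MbcntHi(std::uint32_t src0, std::uint32_t src1, unsigned lane) {
    assert(lane < 64);
    const std::uint64_t below = (std::uint64_t{1} << lane) - 1;
    return CountOneBits(src0 & static_cast<std::uint32_t>(below >> 32)) + src1;
}

// Typical pairing: position of this lane among the set bits of a 64-bit mask.
std::uint32_t ActiveLaneIndex(std::uint64_t mask, unsigned lane) {
    const auto lo = static_cast<std::uint32_t>(mask);
    const auto hi = static_cast<std::uint32_t>(mask >> 32);
    return MbcntHi(hi, MbcntLo(lo, 0, lane), lane);
}
```

The ambiguity the description refers to is visible here: the first operand is a 64-bit lane mask split into halves, yet the instruction reads it through ordinary 32-bit scalar operands.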
Type tracking is done as simply as possible: reg state is kept in a per-CFG-block structure. For each new CFG block, all processed predecessors are checked to "inherit" the state. If there are multiple predecessors, their states are compared; if the state of a reg mismatches, it's set as undefined (it's expected that the new block either does not touch it or overwrites the value with a defined type).
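The merge rule described above could look roughly like the following. This is a minimal sketch under stated assumptions: the `RegType` enum, the tiny `kNumRegs`, and `MergePredecessors` are hypothetical names chosen for illustration, and the real structure in the PR may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical register types tracked per CFG block.
enum class RegType { Undefined, Scalar, ThreadBit };

// Kept tiny for illustration; the real register file is larger.
constexpr std::size_t kNumRegs = 8;

using RegState = std::array<RegType, kNumRegs>;

// Build the entry state of a block from its already processed predecessors.
// A reg keeps its type only if every predecessor agrees on it.
RegState MergePredecessors(const std::vector<const RegState*>& preds) {
    RegState out;
    out.fill(RegType::Undefined);
    if (preds.empty()) {
        return out;
    }
    assert(preds.front() != nullptr);
    out = *preds.front();  // a single predecessor is inherited as-is
    for (std::size_t p = 1; p < preds.size(); ++p) {
        assert(preds[p] != nullptr);
        for (std::size_t r = 0; r < kNumRegs; ++r) {
            if ((*preds[p])[r] != out[r]) {
                out[r] = RegType::Undefined;  // mismatch: block must redefine it
            }
        }
    }
    return out;
}
```

With this rule, a reg that is thread-bit along one incoming edge and scalar along another enters the block as undefined, matching the expectation that the block either leaves it alone or overwrites it before use.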