gh-148285: Allow recording uops after specializing uops#148373
gh-148285: Allow recording uops after specializing uops#148373adityakrmishra wants to merge 4 commits intopython:mainfrom
Conversation
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Tools/cases_generator/analyzer.py
Outdated
| # runtime, so this ordering is safe.) | ||
| preceding_is_specializing = ( | ||
| uop_index == 1 | ||
| and isinstance(parts[-1], Uop) |
There was a problem hiding this comment.
I think parts[-1] is not equal to the previous UOP: When there is an unused/N between the specialized UOP and the recorded UOP, parts[-1] represents the skip rather than the uop.
Tools/cases_generator/analyzer.py
Outdated
| # Counts only real OpName entries (not CacheEffect/flush) so we | ||
| # know the exact position of each concrete uop inside the macro. | ||
| # CacheEffect → becomes Skip; flush → becomes Flush. | ||
| # Neither increments uop_index because neither is a "real" uop. | ||
| uop_index = 0 |
There was a problem hiding this comment.
| # Counts only real OpName entries (not CacheEffect/flush) so we | |
| # know the exact position of each concrete uop inside the macro. | |
| # CacheEffect → becomes Skip; flush → becomes Flush. | |
| # Neither increments uop_index because neither is a "real" uop. | |
| uop_index = 0 | |
| prev_uop: Uop | None = None |
Tools/cases_generator/analyzer.py
Outdated
| if uop.properties.records_value: | ||
| # A recording uop is legal in exactly two positions: | ||
| # 1. It is the very first real uop (uop_index == 0). | ||
| # 2. It is at index 1 AND the immediately preceding | ||
| # real uop is a specializing uop, identified by | ||
| # the "_SPECIALIZE_" name prefix. | ||
| # (Specializing uops are Tier-1-only; recording | ||
| # uops are Tier-2-only — they are orthogonal at | ||
| # runtime, so this ordering is safe.) | ||
| preceding_is_specializing = ( | ||
| uop_index == 1 | ||
| and isinstance(parts[-1], Uop) | ||
| and parts[-1].name.startswith("_SPECIALIZE_") | ||
| ) | ||
| if uop_index != 0 and not preceding_is_specializing: | ||
| raise analysis_error( | ||
| f"Recording uop {part.name} must be first in macro " | ||
| f"or immediately follow a specializing uop", | ||
| macro.tokens[0]) | ||
| parts.append(uop) | ||
| first = False | ||
| uop_index += 1 |
There was a problem hiding this comment.
| if uop.properties.records_value: | |
| # A recording uop is legal in exactly two positions: | |
| # 1. It is the very first real uop (uop_index == 0). | |
| # 2. It is at index 1 AND the immediately preceding | |
| # real uop is a specializing uop, identified by | |
| # the "_SPECIALIZE_" name prefix. | |
| # (Specializing uops are Tier-1-only; recording | |
| # uops are Tier-2-only — they are orthogonal at | |
| # runtime, so this ordering is safe.) | |
| preceding_is_specializing = ( | |
| uop_index == 1 | |
| and isinstance(parts[-1], Uop) | |
| and parts[-1].name.startswith("_SPECIALIZE_") | |
| ) | |
| if uop_index != 0 and not preceding_is_specializing: | |
| raise analysis_error( | |
| f"Recording uop {part.name} must be first in macro " | |
| f"or immediately follow a specializing uop", | |
| macro.tokens[0]) | |
| parts.append(uop) | |
| first = False | |
| uop_index += 1 | |
| if (uop.properties.records_value | |
| and prev_uop is not None | |
| and "specializing" not in prev_uop.annotations): | |
| raise analysis_error( | |
| f"Recording uop {part.name} must be first in macro " | |
| f"or immediately follow a specializing uop", | |
| macro.tokens[0]) | |
| parts.append(uop) | |
| prev_uop = uop |
|
And we need to add some tests for this pr. |
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
@cocolato Thanks for catching that edge case with the cache skips! I've updated the logic to use a prev_uop tracker as you suggested, which correctly ignores CacheEffect and flush. I also added unit tests in Tools/cases_generator/test_analyzer.py to cover both valid positions and invalid ones (like a recording uop following a non-specializing Tier 1 uop). Finally, I regenerated the C files locally to ensure the CI check passes. Thanks for the guidance |
d35f75f to
440b823
Compare
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
@cocolato It looks like regenerating the .c.h files locally on my Windows machine caused some CRLF line-ending or JIT build failures in the CI. I'll leave the C-file regeneration to the automated bots/maintainers from here to avoid messing up the Windows CI runners! |
|
@adityakrmishra there's no need to regen. Also please fix the CLRF stuff on your local machine. You can fix it using pre-commit or git https://stackoverflow.com/questions/2517190/how-do-i-force-git-to-use-lf-instead-of-crlf-under-windows |
|
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
|
@Fidget-Spinner Ah, understood! I've updated my global core.autocrlf setting to prevent Windows from messing with the line endings in the future. I also reverted the two generated C files back to the upstream/main state so they are pristine. Thanks for the tip! |
The Issue:
Currently,
analyzer.pyforces any uop withrecords_value == Trueto be at index 0. This prevents Tier 2 recording uops from safely trailing Tier 1 specializing uops.The Fix (V2):
Following feedback from @Sacul0457 on the previous PR, simply checking if the preceding uop was
tier == 1was too permissive (it allowed recording uops to follow any Tier 1 uop, not just specializing ones).This updated patch uses a strict positional
uop_indextracker inadd_macro().A recording uop is now only permitted if:
uop_index == 0(the very first uop).uop_index == 1AND the preceding uop's name explicitly starts with_SPECIALIZE_.CacheEffect(e.g.,unused/1) andflushparts do not increment theuop_index, ensuring they remain completely transparent.This strictly enforces the architecture while allowing structures like:
macro(X) = _SPECIALIZE_X + _RECORD_TOS_TYPE + unused/1 + _X;