Skip to content

Speed up and clean up acceptance tests#3248

Open
st3penta wants to merge 12 commits intoconforma:mainfrom
st3penta:EC-1710-impl-v3
Open

Speed up and clean up acceptance tests#3248
st3penta wants to merge 12 commits intoconforma:mainfrom
st3penta:EC-1710-impl-v3

Conversation

@st3penta
Copy link
Copy Markdown
Contributor

The acceptance tests were slow in CI and hard to debug when
they failed. Most of the wall-clock time went into sequential
image builds and a setup phase that blocked on each step
finishing before the next could start. On top of that, test
output was noisy and interleaved across parallel goroutines,
making failure diagnosis a chore.

The setup phase now overlaps image builds with the Tekton
deployment, builds task bundles in parallel, and skips the
CLI image build entirely when source hasn't changed. The old
multi-stage Docker build has been replaced with a host-side
binary injection into a minimal base image, and CI downloads
pre-built tool binaries instead of compiling them from source.
Kind scenario scheduling no longer bottlenecks on CPU resource
requests that serialized pod creation.

Logging has been reworked so each scenario writes to its own
file, eliminating cross-goroutine interleaving. Passing runs
are now quiet by default, and failed scenario summaries with
log file paths are printed at the end. Test containers are
properly terminated after each scenario instead of
accumulating for the entire run.

The increased parallelism exposed a couple of latent race
conditions: two validate_input scenarios were writing to the
same files in the repo root, and ConfigMap RBAC setup depended
on scenario execution order. Both have been fixed so tests
run reliably regardless of scheduling.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

Ref: https://issues.redhat.com/browse/EC-1710

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 17, 2026

Warning

Rate limit exceeded

@st3penta has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 40 minutes and 20 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 40 minutes and 20 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: c3608e0d-bf15-44f1-81ef-ed22991ef0a7

📥 Commits

Reviewing files that changed from the base of the PR and between 4daaa9f and b758c00.

⛔ Files ignored due to path filters (1)
  • features/__snapshots__/validate_input.snap is excluded by !**/*.snap
📒 Files selected for processing (18)
  • .github/workflows/checks-codecov.yaml
  • Makefile
  • acceptance/acceptance_test.go
  • acceptance/cli/cli.go
  • acceptance/conftest/conftest.go
  • acceptance/git/git.go
  • acceptance/go.mod
  • acceptance/kubernetes/kind/acceptance.Dockerfile
  • acceptance/kubernetes/kind/image.go
  • acceptance/kubernetes/kind/kind.go
  • acceptance/kubernetes/kind/kubernetes.go
  • acceptance/log/log.go
  • acceptance/log/log_test.go
  • acceptance/registry/registry.go
  • acceptance/testenv/testenv.go
  • acceptance/wiremock/wiremock.go
  • features/validate_input.feature
  • hack/ubi-base-image-bump.sh
📝 Walkthrough

Walkthrough

This PR enhances the acceptance testing infrastructure by introducing cached Go dependency restoration in CI workflows, adding explicit CLI tool installation (tkn, kubectl), refactoring container-based test dependencies with proper lifecycle management, implementing file-based per-scenario logging with configurable output formats, and optimizing image building with caching and concurrent operations.

Changes

Cohort / File(s) Summary
CI/Workflow Infrastructure
.github/workflows/checks-codecov.yaml, hack/ubi-base-image-bump.sh
Workflow now caches Go build and module directories using versioned keys derived from go.sum files and installs external CLIs (tkn, kubectl) by parsing versions from tool go.mod files. Script updated to include new acceptance.Dockerfile in base-image digest bump operations.
Build Configuration
Makefile, acceptance/go.mod
Acceptance target adds E2E_INSTRUMENTATION=true flag, elapsed time logging via SECONDS, and improved test failure propagation. Task-bundle target introduces TKN variable for fallback tooling. Dependencies updated to directly require golang.org/x/sync and k8s.io/klog/v2.
Test Framework Core
acceptance/acceptance_test.go, acceptance/testenv/testenv.go
Added -verbose and -format flags for configurable output; switched runner format from hardcoded pretty to dynamic selection. Refactored failure tracking to store per-scenario log file paths. Added klog stderr suppression. Introduced VerboseOutput context key for verbose output control.
CLI & Command Execution
acceptance/cli/cli.go, acceptance/conftest/conftest.go
Verbose execution details now printed only when testenv.VerboseOutput is true. Error comparison in CLI now uses structured diff output. Conftest output logging is conditional on command failure. Step execution logging only triggers on non-nil errors.
Test Dependency Container Management
acceptance/git/git.go, acceptance/registry/registry.go, acceptance/wiremock/wiremock.go
All three added explicit container instance storage and lifecycle management: containers are now properly terminated in scenario teardown hooks when environment is not persisted, with warning logs on termination failures.
Logging System
acceptance/log/log.go, acceptance/log/log_test.go
Replaced testing.T-backed delegation with per-scenario temporary file logging. New Logger interface methods Close() and LogFile() enable file-based output. Unit tests refactored to validate file I/O and scenario isolation instead of mocking delegates.
Image Building & Kubernetes Setup
acceptance/kubernetes/kind/acceptance.Dockerfile, acceptance/kubernetes/kind/image.go, acceptance/kubernetes/kind/kind.go, acceptance/kubernetes/kind/kubernetes.go
New Dockerfile defines minimal UBI9-based acceptance test image. Image building refactored with SHA-256 hash caching, fallback kubectl discovery, and podman build integration. Concurrent execution added for image/bundle builds and deployment waits via errgroup. Namespace YAML application now uses refactored applyResources with concurrent namespace-scoped resource deployment.
Feature Tests
features/validate_input.feature
Two scenarios updated to reference test files via ${TMPDIR} paths instead of workspace root, matching updated command invocations.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant Kind as Kind Cluster
    participant Registry as Image Registry
    participant Podman
    participant Docker
    
    Caller->>Kind: Start()
    Kind->>Kind: ensureConfigMapRBAC()
    Kind->>Kind: waitForDeployment(image-registry)
    
    par Concurrent Operations
        Kind->>Kind: buildCliImage()
        activate Kind
        Kind->>Kind: computeSourceHash()
        Kind->>Kind: check cache marker
        opt Cache miss
            Kind->>Docker: go build ec binary
            Kind->>Kind: locate/build kubectl
            Kind->>Podman: build with acceptance.Dockerfile
            Podman->>Registry: push image
            Kind->>Kind: write cache marker
        end
        deactivate Kind
        
        and
        Kind->>Kind: buildTaskBundleImage()
        activate Kind
        par Bundle Tasks
            Kind->>Docker: make task-bundle (concurrent)
            Kind->>Docker: make task-bundle (concurrent)
            Kind->>Docker: make task-bundle (concurrent)
        end
        deactivate Kind
        
        and
        Kind->>Kind: waitForDeployment(tekton-pipelines)
    end
    
    Kind->>Caller: return
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 67.74% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: optimization and cleanup of acceptance tests through parallelization and improved logging.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, detailing performance improvements, logging changes, container lifecycle management, and bug fixes.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
acceptance/conftest/conftest.go (1)

95-115: ⚠️ Potential issue | 🟡 Minor

Debug output is now suppressed for non-execution failures.

Previously the deferred block emitted stdout/stderr whenever the function returned an error; it now only fires when cmd.Run() itself failed. Failures in the subsequent ReadFile/diff path (e.g., conftest ran but produced unexpected content) will no longer surface the command output, which can make such failures harder to debug. If that tradeoff is intentional (quieter passing runs, per-scenario log files elsewhere), ignore; otherwise, consider gating on the outer err or on testenv.VerboseOutput instead of cmdErr.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acceptance/conftest/conftest.go` around lines 95 - 115, The deferred
debug-print block only runs when cmdErr is non-nil and cmdErr is set by
cmd.Run(), so errors that occur later (e.g., file reads/diff checks after
cmd.Run()) no longer trigger stdout/stderr printing; change the gating so the
defer prints when the function returns any non-nil err or when verbose output is
enabled: capture the named return error (err) or check
testenv.VerboseOutput(ctx) inside the defer (instead of only cmdErr) and use
that to decide whether to print; refer to the defer func, cmdErr, cmd.Run(), and
testenv.VerboseOutput to locate and modify the condition.
acceptance/wiremock/wiremock.go (1)

288-336: ⚠️ Potential issue | 🟠 Major

Consolidate After hooks: termination currently runs before unmatched-requests check.

Two sc.After hooks are registered at lines 289 and 318. In godog v0.15.0, After hooks execute in LIFO order (last registered first), so the termination hook runs before the unmatched-requests check. This terminates the WireMock container before UnmatchedRequests() can call its admin API, causing the check to fail.

Consolidate both into a single hook, running the unmatched-requests check before termination.

Suggested consolidation
-	sc.After(func(ctx context.Context, finished *godog.Scenario, scenarioErr error) (context.Context, error) {
-		if !IsRunning(ctx) {
-			return ctx, nil
-		}
-
-		w, err := wiremockFrom(ctx)
-		if err != nil {
-			// wiremock wasn't launched, we don't need to proceed
-			return ctx, err
-		}
-
-		unmatched, err := w.UnmatchedRequests()
-		if err != nil {
-			return ctx, err
-		}
-
-		if len(unmatched) == 0 {
-			return ctx, nil
-		}
-
-		logger, ctx := log.LoggerFor(ctx)
-		logger.Log("Found unmatched WireMock requests:")
-		for i, u := range unmatched {
-			logger.Logf("[%d]: %s", i, u)
-		}
-
-		return ctx, nil
-	})
-
-	sc.After(func(ctx context.Context, finished *godog.Scenario, scenarioErr error) (context.Context, error) {
-		if testenv.Persisted(ctx) {
-			return ctx, nil
-		}
-
-		if !testenv.HasState[wiremockState](ctx) {
-			return ctx, nil
-		}
-
-		state := testenv.FetchState[wiremockState](ctx)
-		if state.Container != nil {
-			if err := state.Container.Terminate(ctx); err != nil {
-				logger, _ := log.LoggerFor(ctx)
-				logger.Warnf("failed to terminate wiremock container: %v", err)
-			}
-		}
-
-		return ctx, nil
-	})
+	sc.After(func(ctx context.Context, finished *godog.Scenario, scenarioErr error) (context.Context, error) {
+		if !IsRunning(ctx) {
+			return ctx, nil
+		}
+
+		if w, err := wiremockFrom(ctx); err == nil {
+			if unmatched, err := w.UnmatchedRequests(); err == nil && len(unmatched) > 0 {
+				logger, _ := log.LoggerFor(ctx)
+				logger.Log("Found unmatched WireMock requests:")
+				for i, u := range unmatched {
+					logger.Logf("[%d]: %s", i, u)
+				}
+			}
+		}
+
+		if testenv.Persisted(ctx) {
+			return ctx, nil
+		}
+
+		state := testenv.FetchState[wiremockState](ctx)
+		if state.Container != nil {
+			if err := state.Container.Terminate(ctx); err != nil {
+				logger, _ := log.LoggerFor(ctx)
+				logger.Warnf("failed to terminate wiremock container: %v", err)
+			}
+		}
+
+		return ctx, nil
+	})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acceptance/wiremock/wiremock.go` around lines 288 - 336, The two sc.After
hooks in AddStepsTo run LIFO so the termination hook (which uses
testenv.FetchState[wiremockState] and state.Container.Terminate) currently runs
before the UnmatchedRequests check (wiremockFrom(ctx).UnmatchedRequests()),
causing the admin API to be unreachable; consolidate them into a single sc.After
hook that first calls wiremockFrom(ctx) and w.UnmatchedRequests() (logging as
currently done) and only after processing unmatched requests performs the
testenv.HasState[wiremockState] / testenv.FetchState[wiremockState] termination
logic (including the existing logger.Warnf on Terminate error), preserving the
same early returns and error handling.
🧹 Nitpick comments (5)
acceptance/registry/registry.go (1)

330-348: Terminate uses the scenario's context, which may already be cancelled.

If godog cancels ctx on scenario failure/timeout, state.Container.Terminate(ctx) can fail immediately and the container may leak (Ryuk still reaps it, but termination becomes best-effort). Consider using context.WithTimeout(context.Background(), ...) for teardown to ensure clean shutdown independent of scenario cancellation. Same pattern likely applies to the analogous After hooks added in acceptance/git/git.go and acceptance/wiremock/wiremock.go.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acceptance/registry/registry.go` around lines 330 - 348, The After hook uses
the scenario's ctx which may be cancelled; change the teardown to call
state.Container.Terminate with a fresh context that has a timeout (e.g.,
context.WithTimeout(context.Background(), <reasonableDuration>)) so termination
runs independent of scenario cancellation; create the cancel func and defer
cancel, call state.Container.Terminate(newCtx) and log errors as before. Apply
the same pattern to the equivalent After hooks referencing
registryState/state.Container.Terminate in acceptance/git/git.go and
acceptance/wiremock/wiremock.go.
acceptance/kubernetes/kind/acceptance.Dockerfile (1)

22-22: Consider separating microdnf upgrade from install and cleaning caches.

Chaining upgrade and install works, but running them as separate steps (and ending with microdnf clean all) is the idiomatic UBI pattern and produces slightly smaller images. Minor, non-blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acceptance/kubernetes/kind/acceptance.Dockerfile` at line 22, Split the
combined microdnf command into two RUN steps: first run "microdnf upgrade
--assumeyes --nodocs --setopt=keepcache=0 --refresh" and then run "microdnf -y
--nodocs --setopt=keepcache=0 install gzip jq ca-certificates", and finally add
a "microdnf clean all" (or equivalent) at the end of the install step to clear
caches; update the Dockerfile commands around the existing microdnf upgrade and
install invocations so they run in separate layers and include the clean step to
reduce image size.
acceptance/log/log.go (1)

165-168: Panic on temp-file creation failure is harsh but acceptable; consider returning an error instead for testability.

panic terminates the entire test binary on a transient filesystem issue. Returning an error (and changing the signature) would let callers degrade gracefully or retry.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@acceptance/log/log.go` around lines 165 - 168, Replace the panic on
os.CreateTemp failure with error propagation: change the enclosing function (the
one that calls os.CreateTemp(fmt.Sprintf("scenario-%010d-*.log", id))) to return
an error (and the temp file or nil) instead of panicking, return the wrapped
error when os.CreateTemp fails, and update all callers to handle/propagate the
error; specifically modify the function signature to include error return,
replace panic(fmt.Sprintf(..., err)) with returning fmt.Errorf("failed to create
scenario log file: %w", err) (or equivalent), and adjust callers/tests
accordingly.
.github/workflows/checks-codecov.yaml (1)

128-139: Fragile grep parsing may pick the wrong version if multiple matching lines exist.

grep 'tektoncd/cli' tools/go.mod and grep 'k8s.io/kubernetes' tools/kubectl/go.mod match any line (including transitive/indirect entries) containing the substring. If go.mod ever grows additional matching entries, awk '{print $2}' on multi-line output will break silently. Prefer go list -m -modfile=... <module> or a more precise anchored regex.

-          TKN_VERSION=$(grep 'tektoncd/cli' tools/go.mod | awk '{print $2}' | sed 's/^v//')
+          TKN_VERSION=$(go list -m -modfile=tools/go.mod -f '{{.Version}}' github.com/tektoncd/cli | sed 's/^v//')
...
-          KUBECTL_VERSION=$(grep 'k8s.io/kubernetes' tools/kubectl/go.mod | grep -oE 'v[0-9]+\.[0-9]+\.[0-9]+')
+          KUBECTL_VERSION=$(go list -m -modfile=tools/kubectl/go.mod -f '{{.Version}}' k8s.io/kubernetes)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/checks-codecov.yaml around lines 128 - 139, The current
TKN_VERSION and KUBECTL_VERSION extraction using grep (the lines that set
TKN_VERSION=$(grep 'tektoncd/cli' tools/go.mod ...) and KUBECTL_VERSION=$(grep
'k8s.io/kubernetes' tools/kubectl/go.mod ...) ) is fragile; replace those
grep/awk commands with precise module resolution using go list -m -modfile=...
(e.g., run go list -m -modfile=tools/go.mod tektoncd/cli and go list -m
-modfile=tools/kubectl/go.mod k8s.io/kubernetes) to get exact versions, assign
their stdout to TKN_VERSION and KUBECTL_VERSION, strip the leading "v" from
TKN_VERSION if needed, and keep the existing curl/tar and curl/chmod steps
unchanged; add a simple check to fail early if the go list command returns
empty.
Makefile (1)

137-140: Minor: test_passed preserves exit status, but a failure in go tool covdata textfmt on line 139 will short-circuit before the final [ ... ] check.

With .SHELLFLAGS=-e -c, if go tool covdata textfmt exits non-zero (e.g., no coverage files produced after test failure), the recipe will fail with that error and you'll lose the test_passed signal. Consider making the covdata step tolerant of missing data or placing the final check before it.

-	cd acceptance && go test -timeout $(ACCEPTANCE_TIMEOUT) ./... && test_passed=1 || test_passed=0; \
-	echo "[`date '+%H:%M:%S'`] Tests finished in $$((SECONDS/60))m$$((SECONDS%60))s"; \
-	go tool covdata textfmt -i=$${GOCOVERDIR} -o="$(ROOT_DIR)/coverage-acceptance.out"; \
-	[ "$$test_passed" = "1" ]
+	cd acceptance && go test -timeout $(ACCEPTANCE_TIMEOUT) ./... && test_passed=1 || test_passed=0; \
+	echo "[`date '+%H:%M:%S'`] Tests finished in $$((SECONDS/60))m$$((SECONDS%60))s"; \
+	go tool covdata textfmt -i=$${GOCOVERDIR} -o="$(ROOT_DIR)/coverage-acceptance.out" || true; \
+	[ "$$test_passed" = "1" ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 137 - 140, The recipe currently sets test_passed based
on the `go test` result but then runs `go tool covdata textfmt` which can abort
the shell (because of `.SHELLFLAGS=-e -c`) and lose the `test_passed` signal;
update the Makefile so the covdata step cannot fail the recipe (e.g., make the
`go tool covdata textfmt -i=${GOCOVERDIR}
-o="${ROOT_DIR}/coverage-acceptance.out"` invocation tolerant of missing data by
appending a conditional fallback such as `|| true` or otherwise swallowing
non-zero exits) or move the final `[ "$$test_passed" = "1" ]` check to run
before the covdata line; reference the `test_passed` variable, the `go tool
covdata textfmt` invocation, and the `GOCOVERDIR`/`ROOT_DIR` variables when
applying the fix.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@acceptance/acceptance_test.go`:
- Around line 235-241: The current TestMain uses klog.LogToStderr(false) and
klog.SetOutput(io.Discard) unconditionally which hides all k8s client warnings;
change TestMain so klog is silenced only when tests are not verbose (use
testing.Verbose() or a -verbose flag): if not verbose, call
klog.LogToStderr(false) and klog.SetOutput(io.Discard); otherwise restore normal
output (e.g., klog.LogToStderr(true) and klog.SetOutput(os.Stderr)) or wire klog
into the per-scenario logger used by your tests; update imports accordingly and
keep references to TestMain, klog.LogToStderr, klog.SetOutput, io.Discard (and
os.Stderr/testing.Verbose) so the behavior is reversible for debugging.

In `@acceptance/kubernetes/kind/image.go`:
- Around line 98-112: The code currently copies any kubectl found via
exec.LookPath into kubectlBinary which can lead to client/server version drift;
change the logic in this block (symbols: kubectlBinary, exec.LookPath, copyFile,
kubectlBuildCmd, tools/kubectl/go.mod) to prefer building the pinned kubectl
from tools/kubectl/go.mod by default and only use the PATH binary when an
explicit opt-in (env var or config flag like USE_HOST_KUBECTL) is set; if opt-in
is allowed, detect the host kubectl version (invoke kubectl version --client
--short) and log that version before copying, or compare it to the pinned
version and warn/error on mismatch to make cache/CI differences traceable.
- Around line 54-66: In buildCliImage, the cache file path currently uses a
fixed /tmp/ec-cli-image-cache-<port>.hash which can collide across users and
repos; change the cacheFile construction in computeSourceHash logic to use a
user-scoped cache directory (os.UserCacheDir(), falling back to os.TempDir())
and include a repo-specific and user-unique component (e.g., the current working
directory or repo root hash and UID) plus k.registryPort to form the filename so
cache files are isolated per user/checkout; update references to cacheFile in
buildCliImage to use that new path.

---

Outside diff comments:
In `@acceptance/conftest/conftest.go`:
- Around line 95-115: The deferred debug-print block only runs when cmdErr is
non-nil and cmdErr is set by cmd.Run(), so errors that occur later (e.g., file
reads/diff checks after cmd.Run()) no longer trigger stdout/stderr printing;
change the gating so the defer prints when the function returns any non-nil err
or when verbose output is enabled: capture the named return error (err) or check
testenv.VerboseOutput(ctx) inside the defer (instead of only cmdErr) and use
that to decide whether to print; refer to the defer func, cmdErr, cmd.Run(), and
testenv.VerboseOutput to locate and modify the condition.

In `@acceptance/wiremock/wiremock.go`:
- Around line 288-336: The two sc.After hooks in AddStepsTo run LIFO so the
termination hook (which uses testenv.FetchState[wiremockState] and
state.Container.Terminate) currently runs before the UnmatchedRequests check
(wiremockFrom(ctx).UnmatchedRequests()), causing the admin API to be
unreachable; consolidate them into a single sc.After hook that first calls
wiremockFrom(ctx) and w.UnmatchedRequests() (logging as currently done) and only
after processing unmatched requests performs the testenv.HasState[wiremockState]
/ testenv.FetchState[wiremockState] termination logic (including the existing
logger.Warnf on Terminate error), preserving the same early returns and error
handling.

---

Nitpick comments:
In @.github/workflows/checks-codecov.yaml:
- Around line 128-139: The current TKN_VERSION and KUBECTL_VERSION extraction
using grep (the lines that set TKN_VERSION=$(grep 'tektoncd/cli' tools/go.mod
...) and KUBECTL_VERSION=$(grep 'k8s.io/kubernetes' tools/kubectl/go.mod ...) )
is fragile; replace those grep/awk commands with precise module resolution using
go list -m -modfile=... (e.g., run go list -m -modfile=tools/go.mod tektoncd/cli
and go list -m -modfile=tools/kubectl/go.mod k8s.io/kubernetes) to get exact
versions, assign their stdout to TKN_VERSION and KUBECTL_VERSION, strip the
leading "v" from TKN_VERSION if needed, and keep the existing curl/tar and
curl/chmod steps unchanged; add a simple check to fail early if the go list
command returns empty.

In `@acceptance/kubernetes/kind/acceptance.Dockerfile`:
- Line 22: Split the combined microdnf command into two RUN steps: first run
"microdnf upgrade --assumeyes --nodocs --setopt=keepcache=0 --refresh" and then
run "microdnf -y --nodocs --setopt=keepcache=0 install gzip jq ca-certificates",
and finally add a "microdnf clean all" (or equivalent) at the end of the install
step to clear caches; update the Dockerfile commands around the existing
microdnf upgrade and install invocations so they run in separate layers and
include the clean step to reduce image size.

In `@acceptance/log/log.go`:
- Around line 165-168: Replace the panic on os.CreateTemp failure with error
propagation: change the enclosing function (the one that calls
os.CreateTemp(fmt.Sprintf("scenario-%010d-*.log", id))) to return an error (and
the temp file or nil) instead of panicking, return the wrapped error when
os.CreateTemp fails, and update all callers to handle/propagate the error;
specifically modify the function signature to include error return, replace
panic(fmt.Sprintf(..., err)) with returning fmt.Errorf("failed to create
scenario log file: %w", err) (or equivalent), and adjust callers/tests
accordingly.

In `@acceptance/registry/registry.go`:
- Around line 330-348: The After hook uses the scenario's ctx which may be
cancelled; change the teardown to call state.Container.Terminate with a fresh
context that has a timeout (e.g., context.WithTimeout(context.Background(),
<reasonableDuration>)) so termination runs independent of scenario cancellation;
create the cancel func and defer cancel, call state.Container.Terminate(newCtx)
and log errors as before. Apply the same pattern to the equivalent After hooks
referencing registryState/state.Container.Terminate in acceptance/git/git.go and
acceptance/wiremock/wiremock.go.

In `@Makefile`:
- Around line 137-140: The recipe currently sets test_passed based on the `go
test` result but then runs `go tool covdata textfmt` which can abort the shell
(because of `.SHELLFLAGS=-e -c`) and lose the `test_passed` signal; update the
Makefile so the covdata step cannot fail the recipe (e.g., make the `go tool
covdata textfmt -i=${GOCOVERDIR} -o="${ROOT_DIR}/coverage-acceptance.out"`
invocation tolerant of missing data by appending a conditional fallback such as
`|| true` or otherwise swallowing non-zero exits) or move the final `[
"$$test_passed" = "1" ]` check to run before the covdata line; reference the
`test_passed` variable, the `go tool covdata textfmt` invocation, and the
`GOCOVERDIR`/`ROOT_DIR` variables when applying the fix.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro Plus

Run ID: 20300c2d-e7e2-4dbb-85de-0d9a7a817b75

📥 Commits

Reviewing files that changed from the base of the PR and between 64c9678 and 4daaa9f.

⛔ Files ignored due to path filters (1)
  • features/__snapshots__/validate_input.snap is excluded by !**/*.snap
📒 Files selected for processing (18)
  • .github/workflows/checks-codecov.yaml
  • Makefile
  • acceptance/acceptance_test.go
  • acceptance/cli/cli.go
  • acceptance/conftest/conftest.go
  • acceptance/git/git.go
  • acceptance/go.mod
  • acceptance/kubernetes/kind/acceptance.Dockerfile
  • acceptance/kubernetes/kind/image.go
  • acceptance/kubernetes/kind/kind.go
  • acceptance/kubernetes/kind/kubernetes.go
  • acceptance/log/log.go
  • acceptance/log/log_test.go
  • acceptance/registry/registry.go
  • acceptance/testenv/testenv.go
  • acceptance/wiremock/wiremock.go
  • features/validate_input.feature
  • hack/ubi-base-image-bump.sh

Comment thread acceptance/acceptance_test.go
Comment thread acceptance/kubernetes/kind/image.go
Comment thread acceptance/kubernetes/kind/image.go
st3penta added 12 commits April 17, 2026 16:17
All 285 containers accumulated during the full test run with
zero cleanup, wasting memory and adding Docker daemon overhead.
Each container type (registry, git, wiremock) now stores the
testcontainers.Container reference and calls Terminate(ctx) in
the sc.After hook, respecting the persist flag.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ref: https://issues.redhat.com/browse/EC-1710
Each scenario now writes to its own temp file, which eliminates
log interleaving across parallel goroutines. Remove
shouldSuppress(), which incorrectly filtered Error-level
messages, and remove /dev/tty writes that failed silently in CI.
Replace os.Exit(1) with t.Fatalf() so TestMain cleanup runs.
Print failed scenario log file paths in the test summary.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ref: https://issues.redhat.com/browse/EC-1710
Instead of running make push-image (full multi-stage Dockerfile
that compiles Go inside the container), build the ec and kubectl
binaries locally with the host Go cache, then inject them into a
minimal ubi-minimal base image via acceptance.Dockerfile.

The old approach compiled Go twice: once on the host, once in the
container. Skipping the in-container build significantly reduces
CI build time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ref: https://issues.redhat.com/browse/EC-1710
The make push-image step was one of the slowest steps in CI even
when source had not changed. Compute a SHA-256 hash of all build
inputs (Go source, go.mod, go.sum, Dockerfile, build.sh,
Makefile, hack/reduce-snapshot.sh) and compare it against a
per-registry-port cache marker file. When the hash matches,
skip the build entirely.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ref: https://issues.redhat.com/browse/EC-1710
The sequential per-version task bundle loop was the slowest step
in CI, much slower than the same step locally. Each version
produces an independent bundle image, so the builds are safe to
run concurrently. Use errgroup to propagate the first error and
cancel remaining builds via context.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ref: https://issues.redhat.com/browse/EC-1710
After applying cluster resources, wait only for the in-cluster
registry before starting image builds. The CLI image build and
task bundle build now run concurrently with each other and with
the Tekton Pipelines deployment, since they only need the
registry to push to. Significantly reduces CI setup time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ref: https://issues.redhat.com/browse/EC-1710
The acceptance job's Go build cache was broken: it used
actions/cache/restore (read-only) with path '**' and key 'main',
which did not match the real cache paths (~/.cache/go-build,
~/go/pkg/mod). Switch to read-write actions/cache targeting the
correct directories with a content-based key from go.sum.

Download pre-built tkn and kubectl binaries (versions extracted
from tools/go.mod and tools/kubectl/go.mod) instead of compiling
from source every run. The Makefile and image.go prefer binaries
from PATH when available, falling back to go build locally.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ref: https://issues.redhat.com/browse/EC-1710
Print the test suite start time so the user can estimate when
it will finish, and output total elapsed time at the end. The
shell runs with -e, so a go test failure would abort before the
timing echo. Capture the exit code and re-propagate it after
printing duration and collecting coverage so timing info
survives test failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ref: https://issues.redhat.com/browse/EC-1710
Switch godog formatter from "pretty" to "progress" (overridable
via -format flag or EC_ACCEPTANCE_FORMAT env var), with output
suppressed by default via progress:/dev/null. Gate logExecution
and conftest debug output on scenario or command failure so
passing scenarios stay silent.

Route snapshot artifact debug prints through the per-scenario
file logger instead of stdout. Suppress k8s client-side
throttling warnings by disabling klog logtostderr. Move failed
scenario summaries and the profiling report to TestMain so they
appear after all go test output. Gate verbose execution details
behind a -verbose flag and use diff for stderr assertions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ref: https://issues.redhat.com/browse/EC-1710
Remove CPU resource requests from task bundle steps in the
acceptance tests to eliminate the Tekton scheduling waterfall.
Each TaskRun pod requested 2600m CPU, limiting concurrent pods
to 1-2 on CI (3.5 allocatable CPUs), which effectively
serialized 26 Kind scenarios. Without the requests all pods
schedule immediately.

Set imagePullPolicy: IfNotPresent on CLI image steps as a
defensive measure. Parallelize namespaced resource application
in applyResources by applying cluster-scoped resources first,
then namespaced resources concurrently via errgroup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ref: https://issues.redhat.com/browse/EC-1710
The ensureConfigMapRBAC() call lived only in CreateConfigMap, so
scenarios that never create a ConfigMap (like "Collect keyless
signing parameters when the namespace does not exist") relied on
another scenario running first to set up the ClusterRole and
ClusterRoleBinding. Parallel execution broke this assumption.

Move the call into Start() right after applyResources, so RBAC
is in place before any scenario runs. The call in CreateConfigMap
stays as an idempotent guard.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ref: https://issues.redhat.com/browse/EC-1710
Two scenarios in validate_input.feature both write policy.yaml
and input.json to the shared repo root. When godog schedules
them concurrently, one overwrites the other's files before ec
reads them, causing spurious exit code mismatches.

Write per-scenario files to ${TMPDIR} instead, which is unique
per scenario and avoids the collision.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ref: https://issues.redhat.com/browse/EC-1710
@st3penta
Copy link
Copy Markdown
Contributor Author

Went through the CodeRabbit review and addressed two findings:

  • Wiremock After hook ordering (wiremock.go:288-336): consolidated into a single hook so unmatched requests are checked before container termination.
  • Makefile covdata (Makefile:137-140): added || true so a failing go tool covdata textfmt doesn't swallow the test result.

For the rest of the CodeRabbit suggestions, here's why we're leaving them as-is:

  • Conftest debug suppression (conftest.go:95-115): the per-scenario log files capture all the output we need. The old behavior was overly noisy.
  • Terminate uses scenario ctx (registry.go:330-348, same pattern in git.go and wiremock.go): if the context is cancelled, Ryuk still reaps the container. Not worth the extra context.Background() plumbing.
  • Dockerfile microdnf split (acceptance.Dockerfile:22): splitting into separate RUN steps would actually increase image size by adding layers. The combined form is standard.
  • log.go panic (log.go:165-168): if the OS can't create a temp file, the entire test run is unrecoverable anyway. Panic is the right call here.
  • grep version parsing (checks-codecov.yaml:128-139): go list -m would be slightly cleaner but the go.mod files each have exactly one matching line, so the current grep works reliably. Not worth the churn.

@codecov
Copy link
Copy Markdown

codecov bot commented Apr 17, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag Coverage Δ
acceptance 55.20% <ø> (-0.01%) ⬇️
generative 17.90% <ø> (ø)
integration 26.65% <ø> (ø)
unit 69.01% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@joejstuart
Copy link
Copy Markdown
Contributor

There's a cve

[Violation] cve.cve_blockers
    ImageRef: quay.io/redhat-user-workloads/rhtap-contract-tenant/ec-main/cli-main@sha256:1a62ccbf1933dfbe1dbdaea8b586afa587a63396545437b6777841b184b3904d
    Reason: Found "CVE-2026-4424" vulnerability of high security level
    Term: CVE-2026-4424
    Title: Blocking CVE check
    Description: The SLSA Provenance attestation for the image is inspected to ensure CVEs that have a known fix and meet a certain
    security level have not been detected. If detected, this policy rule will fail. By default, only CVEs of critical and high
    security level cause a failure. This is configurable by the rule data key `restrict_cve_security_levels`. The available levels
    are critical, high, medium, low, and unknown. In addition to that leeway can be granted per severity using the `cve_leeway` rule
    data key containing days of allowed leeway, measured as time between found vulnerability's public disclosure date and current
    effective time, per severity level. To exclude this rule add "cve.cve_blockers:CVE-2026-4424" to the `exclude` section of the
    policy configuration.
    Solution: Make sure to address any CVE's related to the image.
    ```

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants