fix(crafter): use deterministic filename for AI config collector#2943
fix(crafter): use deterministic filename for AI config collector#2943waveywaves wants to merge 2 commits intochainloop-dev:mainfrom
Conversation
486591a to
cb22854
Compare
There was a problem hiding this comment.
1 issue found across 2 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="pkg/attestation/crafter/collector_aiagentconfig_test.go">
<violation number="1" location="pkg/attestation/crafter/collector_aiagentconfig_test.go:59">
P3: Test hardcodes a Unix-style path and full-path string equality, but aiConfigTempFilePath uses filepath.Join, which is OS-dependent. This assertion will fail on Windows even when the implementation is correct.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
| path := aiConfigTempFilePath(testTempDir, testAgent, testConfigHash) | ||
|
|
||
| require.Contains(t, path, testConfigHash, "path must contain the full config hash (not truncated)") | ||
| assert.Equal(t, testTempDir+"/ai-agent-config-"+testAgent+"-"+testConfigHash+".json", path) |
There was a problem hiding this comment.
P3: Test hardcodes a Unix-style path and full-path string equality, but aiConfigTempFilePath uses filepath.Join, which is OS-dependent. This assertion will fail on Windows even when the implementation is correct.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At pkg/attestation/crafter/collector_aiagentconfig_test.go, line 59:
<comment>Test hardcodes a Unix-style path and full-path string equality, but aiConfigTempFilePath uses filepath.Join, which is OS-dependent. This assertion will fail on Windows even when the implementation is correct.</comment>
<file context>
@@ -0,0 +1,61 @@
+ path := aiConfigTempFilePath(testTempDir, testAgent, testConfigHash)
+
+ require.Contains(t, path, testConfigHash, "path must contain the full config hash (not truncated)")
+ assert.Equal(t, testTempDir+"/ai-agent-config-"+testAgent+"-"+testConfigHash+".json", path)
+ })
+}
</file context>
cb22854 to
449088d
Compare
449088d to
c7e366e
Compare
|
@jiparis @migmartri gentle ping — this is ready for review. CI is green except for the known flaky |
jiparis
left a comment
There was a problem hiding this comment.
Thanks for the contribution @waveywaves.
Please remove those PR-metadata changes since they are not related to the main issue.
Regarding the AI config deduplication in the CAS, note that it was already solved at #2917 (filename doesn't affect to CAS uploads), however, I agree that using the digest is better than a random rumber. Just check my comment about the filename length.
| return fmt.Errorf("marshaling PR/MR metadata: %w", err) | ||
| } | ||
|
|
||
| materialName, tmpFile, err := createPRMetadataTempFile(metadata.Number, jsonData) |
There was a problem hiding this comment.
PR metadata uses the PR number intentionally. It's already deduplicated across different PRs, but they share the name within the same PR since basically the latest one is the one that matters.
There was a problem hiding this comment.
Done. Reverted all PR metadata changes — the PR number naming is intentional. Restored the original file and test.
|
|
||
| if _, err := tmpFile.Write(jsonData); err != nil { | ||
| tmpFile.Close() | ||
| // Use a deterministic filename derived from the config hash so that retries |
There was a problem hiding this comment.
This sentence is not accurate. Deduplication is done through the digest, which is already addressed #2917
There was a problem hiding this comment.
Fixed. Removed the misleading dedup claim — updated the comment to just note the constant filename for consistent Artifact.Name.
| // produce the same Artifact.Name (via fileStats -> os.Stat().Name()) and | ||
| // avoid duplicate CAS uploads. PR #2917 fixed the primary root cause | ||
| // (non-deterministic content from captured_at); this is additional hardening. | ||
| tmpPath := filepath.Join(os.TempDir(), fmt.Sprintf("ai-agent-config-%s-%s.json", agentName, data.ConfigHash)) |
There was a problem hiding this comment.
Note that ConfigHash has a length of 64 characters, too long for a filename. I'd rather take the first 6 characters, or better, just use a constant name like ai-agent-config.json. Name conflicts are ok here, assuming that the project/workflow combo is always constant in the attestation (they refer to the same repository).
There was a problem hiding this comment.
Good call. Switched to a constant filename: ai-agent-config-{agentName}.json in os.TempDir(). No hash in the filename at all — name conflicts within the same attestation are expected.
c7e366e to
14ef4c0
Compare
jiparis
left a comment
There was a problem hiding this comment.
@waveywaves none of the commits are signed. This PR cannot be accepted for that reason.
Replace os.CreateTemp (which generates a random suffix) with a deterministic filename derived from the config's content hash. Root cause: although AddMaterialContractFree uses a deterministic map key (Materials[m.Name]), the Artifact struct embedded in each material gets its Name and Digest from fileStats(), which calls os.Stat(filepath).Name(). A random temp filename therefore produces a different Artifact.Name on every call — even when the JSON payload is byte-for-byte identical — causing the CAS to treat each retry as a new, distinct resource. Changes: - Extract aiConfigTempFilePath() helper for deterministic path generation - Use full ConfigHash (no truncation) to avoid collision risk - Write into a private os.MkdirTemp directory (symlink-safe) instead of the shared os.TempDir() - Add unit tests proving same-input-same-output and different-input- different-output invariants Fixes: chainloop-dev#2907 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Signed-off-by: Vibhav Bobade <vibhav.bobde@gmail.com>
…AI config filename
- Revert collector_prmetadata.go: PR metadata intentionally uses PR
number for naming; deduplication within the same PR is expected
- Restore collector_prmetadata_test.go deleted by mistake
- Simplify AI config temp filename to a constant name per agent
(ai-agent-config-{agentName}.json) instead of appending the full
64-character ConfigHash
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vibhav Bobade <vibhav.bobde@gmail.com>
14ef4c0 to
5dc7342
Compare
|
All commits are now GPG-signed. Also addressed all the review feedback in the second commit — reverted PR metadata changes and simplified the AI config filename to a constant name. Ready for re-review! |
Summary
os.CreateTemp(random suffix) with deterministic filenames in bothcollector_aiagentconfig.goandcollector_prmetadata.go, eliminating duplicate CAS uploads on attestation retriesos.TempDir()with deterministic names anddefer os.RemovecreatePRMetadataTempFilehelper — inlined at the call site with explanatory commentRoot Cause Analysis
The reviewer questioned whether the filename is actually the cause of duplicates since
Materials[m.Name]already uses a deterministic key. The answer is yes — the map key is deterministic, but the Artifact struct inside the material is not.Here is the chain of causation:
uploadAgentConfig()callsos.CreateTemp("", "ai-agent-config-claude-*.json"), which produces a random filename like/tmp/ai-agent-config-claude-3847291.jsonAddMaterialContractFree()callsaddMaterial(), which callsuploadAndCraft()uploadAndCraft()callsfileStats(artifactPath), which doesos.Stat(filepath).Name()to get the basename of the temp fileArtifact.Namein the material proto (line 136 ofmaterials.go)fileStats()also computessha256(file_contents)which becomesArtifact.DigestBecause the JSON payload is identical across retries,
Artifact.Digeststays the same. ButArtifact.Namechanges every time (random temp suffix), so the CAS sees a different resource name on each retry. The map keyMaterials["ai-agent-config-claude"]is overwritten (good), but each retry still triggers a new CAS upload because the artifact metadata differs.The fix makes the filename deterministic by deriving it from a content hash, so
Artifact.Nameis stable across retries.Changes
pkg/attestation/crafter/collector_aiagentconfig.go:os.CreateTemp+ manual write/close withos.WriteFileto a deterministic path inos.TempDir()ConfigHash(no truncation)aiConfigTempFilePath()helper andMkdirTempwrapperpkg/attestation/crafter/collector_prmetadata.go:createPRMetadataTempFile()helper (which usedos.CreateTempwith random suffix) with inline deterministic path usingsha256(content)os.WriteFile+defer os.Removeinos.TempDir()Removed test files:
collector_aiagentconfig_test.go— tested the removedaiConfigTempFilePathhelpercollector_prmetadata_test.go— tested the removedcreatePRMetadataTempFilehelperTest Plan
go vet ./pkg/attestation/crafter/...passesgo test ./pkg/attestation/crafter/...— all sub-packages passFixes #2907