Add skipDuplicates() scope guard to createDocuments#852

Open
premtsd-code wants to merge 27 commits into main from csv-import-upsert-v2

Conversation

Contributor

@premtsd-code premtsd-code commented Apr 8, 2026

Summary

Adds a Database::skipDuplicates() scope guard so callers can wrap a createDocuments() call and have duplicate-by-id rows silently skipped instead of throwing DuplicateException. Used by CSV import / re-import flows and database migration workers where re-running the same batch should be idempotent rather than fatal.

The shape follows the existing silent() / skipRelationships() pattern — instance-level scope guard, not a flag parameter — so callers compose naturally:

$database->skipDuplicates(fn () => $database->createDocuments($collection, $batch));

Design

Duplicate handling happens at the adapter layer via dialect-specific no-op inserts. Database::createDocuments sets a scope-guarded flag on the adapter before each chunk's INSERT; the adapter emits the right SQL (or Mongo equivalent) and returns the batch. No pre-filter, no intra-batch dedup — adapter-level row-level dedup is sufficient and avoids redundant queries.
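In code, the guard is just a save-set-restore around the callback. A minimal sketch, assuming the same property/method pairing as silent() (the isSkippingDuplicates() accessor is hypothetical, added only for the demo; this is the shape, not the exact diff):

```php
<?php

// Illustrative scope guard: a property and a method sharing one name,
// following the silent() / skipRelationships() pattern described above.
class Adapter
{
    protected bool $skipDuplicates = false;

    public function skipDuplicates(callable $callback): mixed
    {
        $previous = $this->skipDuplicates; // save for nesting
        $this->skipDuplicates = true;
        try {
            return $callback();
        } finally {
            $this->skipDuplicates = $previous; // restore even if $callback throws
        }
    }

    // Hypothetical accessor, only for demonstrating the scope below.
    public function isSkippingDuplicates(): bool
    {
        return $this->skipDuplicates;
    }
}

$adapter = new Adapter();
$inside = $adapter->skipDuplicates(fn () => $adapter->isSkippingDuplicates());
// $inside is true; outside the closure the flag is back to false.
```

The try/finally restore is what makes nested scopes safe: an inner skipDuplicates() call cannot leak the flag past its own closure, which is the behavior the nested-scope test exercises.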

Per-adapter implementation

Adapter            Strategy
MariaDB / MySQL    INSERT IGNORE INTO
Postgres           INSERT INTO ... ON CONFLICT (...) DO NOTHING
SQLite             INSERT OR IGNORE INTO
Mongo              upsert + $setOnInsert, bypassing the transaction wrap to avoid txn-abort-on-duplicate
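The SQL rows of this table map onto the dialect hooks named in the review walkthrough (getInsertKeyword() / getInsertSuffix()). A rough sketch of how those hooks could divide the work; the hooks are protected in the real adapters but public here for the demo, and the constructor and ON CONFLICT target are illustrative, not the exact diff:

```php
<?php

// Base SQL adapter: MariaDB/MySQL put duplicate handling in the keyword itself.
class SQL
{
    public function __construct(protected bool $skipDuplicates = false)
    {
    }

    public function getInsertKeyword(): string
    {
        return $this->skipDuplicates ? 'INSERT IGNORE INTO' : 'INSERT INTO';
    }

    // Trailing clause appended after VALUES; empty by default.
    public function getInsertSuffix(string $table): string
    {
        return '';
    }
}

// Postgres has no INSERT IGNORE, so the suffix carries the skip semantics.
class Postgres extends SQL
{
    public function getInsertKeyword(): string
    {
        return 'INSERT INTO';
    }

    public function getInsertSuffix(string $table): string
    {
        // Conflict target shown for the simple (non-shared-table) case.
        return $this->skipDuplicates ? 'ON CONFLICT (_uid) DO NOTHING' : '';
    }
}

// SQLite uses its own keyword form.
class SQLite extends SQL
{
    public function getInsertKeyword(): string
    {
        return $this->skipDuplicates ? 'INSERT OR IGNORE INTO' : 'INSERT INTO';
    }
}
```

The shared bulk-INSERT builder then only has to concatenate keyword, table, VALUES, and suffix; each dialect's difference stays inside its own hook.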

Hot-path cost

One property read + one boolean branch per chunk. Callers that don't use skipDuplicates pay no closure allocation and no scope-guard setup — the feature cost is paid only when the feature is actually used. This addresses review comment #1 (don't penalize the common case with an $enable flag pattern).

Mirror dual-write correctness

Mirror::createDocuments pre-filters against source to identify which input ids already exist, then, after the source insert, forwards only the genuinely-new docs to destination. Pre-existing ids are silently no-op'd by the adapter's INSERT IGNORE: they represent no user write, so nothing propagates to destination.

This replaces an earlier captureOnNext-based approach flagged by Greptile (#3084293974): that approach wrapped $onNext to capture "what source inserted", but SQL adapters return the full input batch from createDocuments regardless of rowCount(), causing $onNext to fire for every doc — including skipped duplicates — so Mirror forwarded would-be values to destination, diverging from source. Pre-filter-then-insert matches the idempotent-write semantics: a skipped duplicate is not a user write, so it does not mirror.

Cost: one extra SELECT against source per batch when skipDuplicates is active and Mirror has an upgraded destination. Zero cost on the non-skip path. Skip and non-skip paths are unified into a single method body with branches only at the three points where behavior actually differs (source call, forward filter, destination call).
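The pre-filter-then-forward rule can be shown with a runnable toy in which plain arrays stand in for the source and destination stores (the dup/Original/WouldBe names echo the Mirror test described under Tests; everything else is illustrative):

```php
<?php

$source      = ['dup' => 'Original']; // id => value already present at source
$destination = [];

$input = [
    ['id' => 'dup',   'value' => 'WouldBe'], // duplicate: adapter no-ops it
    ['id' => 'fresh', 'value' => 'New'],     // genuinely new
];

// 1. Pre-filter: which input ids already exist at source? (one SELECT per batch)
$existing = \array_intersect_key(
    \array_flip(\array_column($input, 'id')),
    $source
);

// 2. Source insert with duplicates skipped (INSERT IGNORE semantics).
foreach ($input as $doc) {
    $source[$doc['id']] ??= $doc['value'];
}

// 3. Forward only genuinely-new docs: a skipped duplicate is not a user write,
//    so it must not reach destination.
foreach ($input as $doc) {
    if (!isset($existing[$doc['id']])) {
        $destination[$doc['id']] = $doc['value'];
    }
}

// source keeps dup => Original and gains fresh => New;
// destination receives only fresh => New, so dup never diverges.
```

This is exactly why the captureOnNext approach failed: step 3 must be driven by the pre-filter, not by what the adapter returns, because SQL adapters return the full input batch regardless of rowCount().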

upsertDocumentsWithIncrease tenant grouping (regression fix)

Restored per-tenant grouping for the existing-doc lookup in sharedTables + tenantPerDocument mode. An earlier review round (#2 — inline find() instead of a helper) was correct for single-tenant use cases but ran under the session tenant context (null for platform workers), silently missing rows from other tenants. This broke appwrite's StatsUsage worker, which accumulates stats across many projects and flushes via $dbForLogs->setTenant(null)->setTenantPerDocument(true) + upsertDocumentsWithIncrease() — every input doc looked "new" and went down the create path instead of the update path.

The fix: branch on tenantPerDocument mode and run one find() per unique tenant wrapped in withTenant($tenant, ...). K queries total (K = unique tenants in the batch). Single-tenant callers (the common case, including CSV import and database migration) still hit the fast one-query batched path.

Helper: tenantKey(Document) — used only within upsertDocumentsWithIncrease to keep the in-memory lookup map and the intra-batch dedup check consistent. Returns "<tenant>:<id>" in tenantPerDocument mode, just <id> otherwise. Without it, a legitimate cross-tenant batch like [{id: x, tenant: 1}, {id: x, tenant: 2}] would falsely fail with DuplicateException.
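A runnable toy of the composite key; the array shape and the thrown exception are illustrative, but the cross-tenant batch is the exact example from the paragraph above:

```php
<?php

// Illustrative stand-in for the tenantKey(Document) helper: "<tenant>:<id>"
// in tenantPerDocument mode, plain "<id>" otherwise.
function tenantKey(array $doc, bool $tenantPerDocument): string
{
    return $tenantPerDocument
        ? $doc['tenant'] . ':' . $doc['id']
        : $doc['id'];
}

// A legitimate cross-tenant batch: the same id under two different tenants.
$batch = [
    ['id' => 'x', 'tenant' => 1],
    ['id' => 'x', 'tenant' => 2],
];

// Intra-batch dedup keyed by the composite key: no false DuplicateException.
$seen = [];
foreach ($batch as $doc) {
    $key = tenantKey($doc, true);
    if (isset($seen[$key])) {
        throw new RuntimeException('Duplicate in batch: ' . $key);
    }
    $seen[$key] = true;
}
// $seen holds '1:x' and '2:x'; keyed by id alone, the loop would have thrown.
```

Using the same key function for both the existing-doc lookup map and this dedup check is what keeps the two views of "same document" consistent.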

Tests

End-to-end coverage across all adapter variants (MariaDB / MySQL / Postgres / SQLite / MongoDB × Pool / Mirror / SharedTables):

  • testCreateDocumentsIgnoreDuplicates — mix of existing + new docs, non-existing inserted, existing no-op'd
  • testCreateDocumentsIgnoreAllDuplicates — every doc already exists, nothing newly inserted
  • testCreateDocumentsSkipDuplicatesEmptyBatch — empty batch is a no-op
  • testCreateDocumentsSkipDuplicatesNestedScope — scope guard saves & restores through nested calls
  • testCreateDocumentsSkipDuplicatesLargeBatch — 300 docs / 50 pre-existing, exercises the chunk loop
  • testCreateDocumentsSkipDuplicatesSecondCallSkipsAll — idempotent re-invocation preserves first-write values
  • testCreateDocumentsSkipDuplicatesRelationships — relationship children aren't duplicated on re-run
  • testCreateDocumentsSkipDuplicatesDoesNotDivergeDestination (Mirror) — source-seeded dup=Original with a dup=WouldBe caller input, asserts destination.dup stays empty and destination.fresh gets the newly-inserted row (Greptile's exact assertion from #3084293974)

Test plan

  • skipDuplicates tests pass on every adapter variant
  • Full Mirror suite: 635 tests / 11,035 assertions green
  • Greptile divergence assertion holds
  • composer analyze (PHPStan) clean at the project's configured level
  • No regressions in existing testCreateDocuments* and testUpsert* suites

Summary by CodeRabbit

  • New Features

    • Scoped "skip duplicates" mode to ignore duplicate records during batch creates without overwriting existing data.
  • Refactor

    • Duplicate-skipping integrated across adapters, transaction paths, pooling and mirror flows; batch lookups and tenant-aware identity handling improved for large batches and upserts.
  • Tests

    • Added comprehensive end-to-end tests: nested scopes, large/empty batches, mirror backfill, relationship propagation, and multiple-scope behavior.

Contributor

coderabbitai bot commented Apr 8, 2026

📝 Walkthrough


Adds a scoped skipDuplicates(callable $callback) mode across Database, Pool, Mirror, and the adapters. It changes create/insert flows to avoid overwriting existing documents, using adapter-specific INSERT IGNORE / ON CONFLICT clauses or conditional upserts, and adds e2e tests covering duplicate-skip semantics.

Changes

Cohort / File(s) Summary
Core adapter & Database state
src/Database/Adapter.php, src/Database/Database.php
Introduce protected skipDuplicates flag and public skipDuplicates(callable $callback): mixed. Database adds tenantKey(...), batches existing-document lookups for upserts, and conditionally routes create flows through adapter skip scope.
Pool delegation
src/Database/Adapter/Pool.php
Pool delegates adapter calls through the adapter's skipDuplicates(...) wrapper when pool skip flag is set (both pinned and borrowed adapters); transactional pinned-adapter withTransaction is also wrapped when skipping is enabled.
Mongo adapter
src/Database/Adapter/Mongo.php
withTransaction short-circuits when skipping duplicates. createDocuments() gets a skip-duplicates path using conditional upserts ($setOnInsert with _uid/_tenant filters) instead of insertMany.
SQL / Postgres / SQLite adapters
src/Database/Adapter/SQL.php, src/Database/Adapter/Postgres.php, src/Database/Adapter/SQLite.php
Added protected hooks getInsertKeyword(), getInsertSuffix(string $table), getInsertPermissionsSuffix(); SQL generation now emits INSERT IGNORE / ON CONFLICT DO NOTHING when skip mode is enabled and uses these hooks in bulk INSERTs.
Mirror behavior
src/Database/Mirror.php
Mirror write paths now invoke source/destination skipDuplicates(...) when enabled and branch destination forwarding accordingly to avoid overwriting source-existing docs.
Tests (e2e)
tests/e2e/Adapter/Scopes/DocumentTests.php, tests/e2e/Adapter/MirrorTest.php
Added comprehensive e2e tests covering duplicate-skipping: mixed/all-duplicate/empty batches, nested scopes, large chunked batches, repeated calls, relationships, and mirrored backfill scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Database
    participant Adapter
    participant Storage as DB

    rect rgba(100,180,120,0.5)
    Note over Client,Storage: Flow with skipDuplicates enabled
    Client->>Database: createDocuments(docs) within skipDuplicates scope
    Database->>Database: compute tenantKey, dedupe ids, chunk
    Database->>Adapter: find(existing ids) [chunked]
    Adapter->>DB: SELECT / find by _uid/_id
    DB-->>Adapter: existing records
    Adapter-->>Database: existing records
    Database->>Adapter: adapter->skipDuplicates(fn => createDocuments(filtered))
    Adapter->>DB: INSERT IGNORE / ON CONFLICT DO NOTHING or conditional upserts (Mongo)
    DB-->>Adapter: success (duplicates skipped)
    Adapter-->>Database: inserted result
    Database->>Client: return inserted docs/count
    end
sequenceDiagram
    participant Client
    participant Database
    participant Adapter
    participant Storage as DB

    rect rgba(200,100,150,0.5)
    Note over Client,Storage: Regular flow without skipDuplicates
    Client->>Database: createDocuments(docs)
    Database->>Adapter: createDocuments(docs)
    Adapter->>DB: INSERT INTO ...
    DB-->>Adapter: may error on duplicates
    Adapter-->>Database: propagate exception or normal result
    Database-->>Client: result or exception
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Feat mongo tmp #647: Overlaps Mongo adapter changes (withTransaction/createDocuments) and upsert/skip-duplicate behavior.

Suggested reviewers

  • ArnabChatterjee20k
  • abnegate
  • fogelito

Poem

🐇 I hopped through rows and checked each id,
I nudged the duplicates softly to the side,
Scoped my skip so nested hops restore,
Fresh carrots stay, old roots keep core,
A tiny rabbit guard — no overwrites tonight.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Docstring Coverage ⚠️ Warning: docstring coverage is 47.06%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Description Check ✅ Passed: check skipped; CodeRabbit's high-level summary is enabled.
Title Check ✅ Passed: the title 'Add skipDuplicates() scope guard to createDocuments' clearly and concisely summarizes the primary change: introducing a new skipDuplicates scope guard method for handling duplicate documents during creation.




greptile-apps bot commented Apr 8, 2026

Greptile Summary

This PR adds a skipDuplicates() scope guard to createDocuments, following the existing silent()/skipRelationships() pattern, and wires it through all adapters (SQL via INSERT IGNORE/ON CONFLICT DO NOTHING, MongoDB via upsert+$setOnInsert), Pool delegation, and Mirror backfill. The per-tenant batch fetch for upsertDocumentsWithIncrease is a useful correctness fix for the StatsUsage worker.

  • P1 — MANY_TO_MANY junction DuplicateException: createDocumentRelationships runs for every document in the input batch (including duplicates) before the adapter INSERT. For RELATION_MANY_TO_MANY, relateDocuments unconditionally calls $this->createDocument($junction, ...) with no guard for an existing junction record. Database::createDocument (singular) has no skipDuplicates awareness, so replaying a batch where the parent and its MANY_TO_MANY children are already linked will throw DuplicateException. The test suite covers only RELATION_ONE_TO_MANY.

Confidence Score: 4/5

Safe to merge for non-relational and ONE_TO_MANY use cases; MANY_TO_MANY with skipDuplicates will throw until fixed.

One confirmed P1 runtime bug: MANY_TO_MANY junction createDocument is not guarded by skipDuplicates, causing DuplicateException on re-run. All other prior concerns have been addressed or acknowledged. Score of 4 reflects a single blocking code path that needs a fix and a corresponding test.

src/Database/Database.php (relateDocuments junction creation) and tests/e2e/Adapter/Scopes/DocumentTests.php (missing MANY_TO_MANY coverage)

Important Files Changed

Filename Overview
src/Database/Adapter.php Adds $skipDuplicates flag and nestable scope-guard method — clean implementation that saves/restores previous state correctly.
src/Database/Adapter/Mongo.php Adds upsert+$setOnInsert path for skipDuplicates; several correctness concerns addressed in prior review rounds; transaction bypass and empty-setOnInsert guard are reasonable.
src/Database/Adapter/Pool.php Correctly propagates skipDuplicates flag through delegate() and withTransaction(); nesting is idempotent and harmless.
src/Database/Adapter/Postgres.php Postgres-specific ON CONFLICT DO NOTHING suffix for documents and permissions tables — correct conflict targets for shared and non-shared table modes.
src/Database/Adapter/SQL.php INSERT IGNORE/ON CONFLICT SQL generation via dialect hooks; several edge cases (tenantPerDocument, reconciliation) flagged in prior rounds remain acknowledged trade-offs.
src/Database/Adapter/SQLite.php Overrides getInsertKeyword() with INSERT OR IGNORE INTO — minimal, correct.
src/Database/Database.php Core skipDuplicates scope guard, adapter delegation, and upsertDocumentsWithIncrease tenant-grouping fix look solid; MANY_TO_MANY junction creation is not guarded by skipDuplicates.
src/Database/Mirror.php Forwards all input to destination with skipDuplicates — intentional backfill semantics documented in comments; destination may get values that differ from source for rows absent at destination.
tests/e2e/Adapter/MirrorTest.php Adds backfill-destination test verifying skipDuplicates behaviour in mirror mode; clearly documents intentional source/destination divergence semantics.
tests/e2e/Adapter/Scopes/DocumentTests.php Comprehensive skipDuplicates end-to-end tests across empty batch, nested scope, large batch, and ONE_TO_MANY relationships; MANY_TO_MANY junction case not covered.

Comments Outside Diff (1)

  1. src/Database/Database.php, line 5720-5722 (link)

    P1 skipDuplicates not propagated to MANY_TO_MANY junction creation

    createDocumentRelationships is called for every document — including those that are about to be silently skipped — before the adapter INSERT. For MANY_TO_MANY relationships, relateDocuments (line 6000) always calls $this->createDocument($junction, ...) unconditionally, with no check for an existing junction record. Database::createDocument (singular) has no skipDuplicates awareness, so it will throw DuplicateException whenever the parent is a duplicate that already has MANY_TO_MANY links to existing children.

    The test testCreateDocumentsSkipDuplicatesRelationships only exercises RELATION_ONE_TO_MANY (which stores the link in the child's twoWayKey column via updateDocument, not a junction insert), so this path is not covered. A caller replaying a batch of documents with MANY_TO_MANY children that are already linked will get an unexpected DuplicateException.

    The fix is to guard the junction createDocument by either:

    1. Wrapping this call in $this->skipDuplicates(fn () => ...) when the outer scope has $this->skipDuplicates = true, or
    2. Pre-checking whether the junction record exists before inserting it.
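Option 2 might look roughly like this inside relateDocuments. Variable names ($junction, $key, $twoWayKey, $documentId, $relatedId) and the findOne()/silent() calls follow the library's existing style; this is a sketch of the idea, not a tested patch:

```php
// Pre-check: only insert the junction row when the link does not exist yet.
$existing = $this->silent(fn () => $this->findOne($junction, [
    Query::equal($key, [$documentId]),
    Query::equal($twoWayKey, [$relatedId]),
]));

if ($existing->isEmpty()) {
    $this->createDocument($junction, new Document([
        '$id' => ID::unique(),
        $key => $documentId,
        $twoWayKey => $relatedId,
    ]));
}
```

The trade-off versus option 1 (wrapping the call in skipDuplicates) is one extra read per link, in exchange for not widening the skip scope around an unrelated write path.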

Reviews (30). Last reviewed commit: "Address Jake's follow-up review: tighten..."

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (1)
tests/e2e/Adapter/Scopes/DocumentTests.php (1)

7815-7818: Prefer __FUNCTION__ for collection naming consistency.

Using __FUNCTION__ here (as in adjacent tests) improves isolation and avoids accidental name reuse across reruns.

♻️ Suggested tweak
-        $col = 'createDocsIgnoreIntraBatch';
+        $col = __FUNCTION__;
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/Mongo.php`:
- Around line 1501-1524: The current dedupe prefetch uses only _uid and a single
tenant filter, causing cross-tenant collisions for shared tables; update the
logic that builds existingUids (and the prefetch loop around
$this->client->find/$this->client->getMore) to be tenant-aware by using a
composite key (e.g., "$tenant:$uid") when populating $existingUids, include the
tenant in the $filters returned by $this->getTenantFilters($collection->getId())
per chunk, and prefetch in per-tenant chunks using array_chunk($uidValues,
max(1, $this->maxQueryValues)) (use the chunk size when calling getMore instead
of count($uidValues)); then when filtering $records check the composite key
($record['_tenant'] . ':' . ($record['_uid'] ?? '')) against $existingUids so
duplicate detection respects tenant boundaries.
- Around line 1504-1516: The prefetch block calls $this->client->find() and
$this->client->getMore() without exception mapping; wrap each driver call in a
try/catch that catches \Throwable and rethrows the normalized exception via
processException($e) (same pattern used elsewhere in this class), i.e. surround
the $this->client->find(...) call and the $this->client->getMore(...) call with
try { ... } catch (\Throwable $e) { throw $this->processException($e); } so any
raw driver errors from the $response/$more handling are mapped consistently.

In `@src/Database/Database.php`:
- Around line 7210-7255: The upsertDocumentsWithIncrease() flow builds
$existingDocs keyed as tenant:id when $upsertTenantPerDocument is true, but
later duplicate-input detection still uses only $document->getId(), causing
false DuplicateException across tenants; update the duplicate-input check and
any intermediate maps (e.g., $seenIds/$inputKeys) to use the same composite
lookup key (use the existing $lookupKey logic: $tenant . ':' .
$document->getId() when $upsertTenantPerDocument is true) so the tenant-aware
key is reused consistently across the whole upsert path.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c9f1924e-3ed0-43a8-8859-6e6c299aadb5

📥 Commits

Reviewing files that changed from the base of the PR and between 72ee161 and b9c7f4d.

📒 Files selected for processing (9)
  • src/Database/Adapter.php
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/Pool.php
  • src/Database/Adapter/Postgres.php
  • src/Database/Adapter/SQL.php
  • src/Database/Adapter/SQLite.php
  • src/Database/Database.php
  • src/Database/Mirror.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (2)
src/Database/Database.php (2)

5682-5726: ⚠️ Potential issue | 🟠 Major

Don’t let the outer prefetch snapshot suppress writes.

The adapters already do the authoritative duplicate filtering inside the transaction/session, but this code still drops $chunk entries from a pre-transaction find(). If a matching row is deleted after that snapshot but before the insert transaction starts, ignore: true will incorrectly skip a document that should now insert, and $modified / $onNext will be low.

Also applies to: 5805-5815


5788-5790: ⚠️ Potential issue | 🟠 Major

Use the same tenant-aware key for deferred relationship writes.

In tenant-per-document mode, two inserted documents can legitimately share the same $id. This map is keyed only by $id, so the later payload overwrites the earlier one, and unset() after the first insert prevents the second tenant’s relationship write entirely.

♻️ Proposed fix
                 if (!empty($relationshipData)) {
-                    $deferredRelationships[$document->getId()] = $relationshipData;
+                    $relationshipKey = $tenantPerDocument
+                        ? $document->getTenant() . ':' . $document->getId()
+                        : $document->getId();
+                    $deferredRelationships[$relationshipKey] = $relationshipData;
                 }
@@
                 foreach ($batch as $insertedDoc) {
-                    $docId = $insertedDoc->getId();
-                    if (\array_key_exists($docId, $deferredRelationships)) {
+                    $relationshipKey = $tenantPerDocument
+                        ? $insertedDoc->getTenant() . ':' . $insertedDoc->getId()
+                        : $insertedDoc->getId();
+                    if (\array_key_exists($relationshipKey, $deferredRelationships)) {
                         $relDoc = clone $insertedDoc;
-                        foreach ($deferredRelationships[$docId] as $key => $value) {
+                        foreach ($deferredRelationships[$relationshipKey] as $key => $value) {
                             $relDoc->setAttribute($key, $value);
                         }
                         $this->silent(fn () => $this->createDocumentRelationships($collection, $relDoc));
-                        unset($deferredRelationships[$docId]);
+                        unset($deferredRelationships[$relationshipKey]);
                     }
                 }

Also applies to: 5823-5832

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 2482-2527: The pre-filter SELECT that checks for existing _uid
rows (the block building $sql, preparing via $this->getPDO()->prepare, binding
values and calling $stmt->execute()/fetchAll()) can throw a raw PDOException
because it runs outside the method's main try/catch; wrap the entire
prepare/execute/fetch block in the adapter's processException(...) wrapper so
exceptions are mapped consistently (i.e., call $this->processException(...)
around the closure that prepares, binds, executes and fetches, returning $rows),
keeping the logic inside createDocuments() and referencing the existing $sql,
$binds, $stmt and $rows variables.
- Around line 2665-2713: The reconciliation currently keys timestamps only by
_uid and uses a single :_vfy_tenant bind when $this->sharedTables is true, which
corrupts mixed-tenant batches; update the verification to be tenant-aware by
binding the tenant per document and keying by a composite tenant+uid.
Concretely, in the block that builds $verifyPlaceholders/$verifyBinds and
$verifySql (the code using $verifyPlaceholders, $verifyBinds, $verifySql and
$verifyStmt), include the tenant value for each document (use the document's
tenant when tenant-per-document applies), change the WHERE to match pairs (e.g.
use row-value tuples or explicit (_uid = :uX AND _tenant = :tX) clauses), and
then build $actualTimestamps keyed by "$tenant|$uid" (and compare against
$expectedTimestamps keyed the same way) so the subsequent loop that populates
$insertedDocs/$insertedUids only matches rows with the correct tenant+uid.
- Around line 2650-2652: Replace the use of "INSERT IGNORE" (returned by
getInsertKeyword) with a plain "INSERT" and remove any IGNORE-specific behavior
in getInsertSuffix so that constraint violations are not globally suppressed;
instead let the DB raise errors and update the insert/execute path that catches
SQL exceptions to inspect the database error/constraint name (e.g., index or key
name such as _uid) and map errors to DuplicateException only when the violated
constraint corresponds to the _uid duplicate, otherwise map to UniqueException
(or other specific exceptions) for PRIMARY/UNIQUE/NOT NULL/truncation
violations—use the existing functions getInsertKeyword and getInsertSuffix to
locate changes and adjust the exception-mapping in the code that performs the
insert and error handling.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d2ab610b-bb64-4dcb-979c-f3880d9e99df

📥 Commits

Reviewing files that changed from the base of the PR and between 20846d3 and 242c9a8.

📒 Files selected for processing (3)
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/SQL.php
  • src/Database/Database.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Database/Adapter/Mongo.php

Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (3)
src/Database/Adapter/SQL.php (3)

2482-2527: ⚠️ Potential issue | 🟠 Major

Wrap the ignore-mode pre-filter in processException() handling.

This query runs before the method’s main try/catch, so SQL or connection failures here still escape as raw PDOException while the rest of createDocuments() is adapter-mapped.


2782-2785: ⚠️ Potential issue | 🟠 Major

Don’t use INSERT IGNORE as the base implementation for ignore=true.

For adapters that inherit this default, ignore=true stops meaning “skip duplicate document IDs” and starts suppressing a broader class of insert failures on both the document and permission insert paths. The base hook should stay plain INSERT INTO, and adapter-specific duplicate handling should be expressed with targeted conflict clauses or constraint-aware exception mapping.

Does MariaDB/MySQL `INSERT IGNORE` suppress constraint violations beyond duplicate-key conflicts, such as PRIMARY KEY, other UNIQUE, NOT NULL, truncation, or conversion errors?

Based on learnings, in src/Database/Adapter/MariaDB.php only duplicate _uid violations should throw DuplicateException; other unique or PRIMARY collisions should still throw UniqueException.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/SQL.php` around lines 2782 - 2785, The base method
getInsertKeyword in class SQL should not return 'INSERT IGNORE INTO' for
ignore=true; change it to always return the plain 'INSERT INTO' and move any
ignore semantics into adapter-specific code (e.g., Adapter/MariaDB.php) by using
targeted conflict clauses or by mapping constraint/SQLSTATE errors to the
correct exceptions; specifically, in MariaDB adapter ensure only duplicate
violations on the document _uid map to DuplicateException while other
UNIQUE/PRIMARY/constraint failures map to UniqueException (and do not rely on
INSERT IGNORE in SQL::getInsertKeyword).
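The reviewer's suggested shape — a base hook that stays plain `INSERT INTO`, with duplicate handling expressed only in adapters that document the trade-off — can be sketched language-agnostically (Python here; the class and method names mirror the discussion but are illustrative, not the project's PHP API):

```python
class SqlAdapter:
    """Base adapter: never widens ignore semantics beyond plain INSERT."""
    skip_duplicates = False

    def insert_keyword(self) -> str:
        # Base stays plain; adapters that inherit this default never
        # silently suppress NOT NULL, truncation, or conversion errors.
        return "INSERT INTO"

class MariaDBAdapter(SqlAdapter):
    def insert_keyword(self) -> str:
        # MariaDB/MySQL-specific: IGNORE also downgrades some non-duplicate
        # errors to warnings, so it belongs only in the adapter that owns
        # that trade-off, paired with constraint-aware exception mapping.
        return "INSERT IGNORE INTO" if self.skip_duplicates else "INSERT INTO"

class PostgresAdapter(SqlAdapter):
    def insert_suffix(self) -> str:
        # Postgres expresses the same intent with a targeted conflict
        # clause scoped to the duplicate-id constraint only.
        return " ON CONFLICT (_uid) DO NOTHING" if self.skip_duplicates else ""
```

The point of the split is that `ON CONFLICT (_uid)` names the exact constraint, while a blanket `IGNORE` does not.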

2664-2728: ⚠️ Potential issue | 🔴 Critical

Keep expectedTimestamps tenant-aware too.

$actualTimestamps is keyed by tenant:uid now, but $expectedTimestamps is still keyed only by _uid. In tenantPerDocument mode, two documents with the same _uid in different tenants can overwrite each other here, causing an inserted document to be dropped from $documents and miss its permission insert.

💡 Proposed fix
-                $expectedTimestamps = [];
-                foreach ($documents as $doc) {
-                    $expectedTimestamps[$doc->getId()] = $doc->getCreatedAt();
-                }
+                $expectedTimestamps = [];
+                foreach ($documents as $doc) {
+                    $key = ($this->sharedTables && $this->tenantPerDocument)
+                        ? $doc->getTenant() . ':' . $doc->getId()
+                        : $doc->getId();
+                    $expectedTimestamps[$key] = $doc->getCreatedAt();
+                }
@@
-                    if (isset($actualTimestamps[$key]) && $actualTimestamps[$key] === $expectedTimestamps[$doc->getId()]) {
+                    if (isset($actualTimestamps[$key]) && $actualTimestamps[$key] === $expectedTimestamps[$key]) {
                         $insertedDocs[] = $doc;
                     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/SQL.php` around lines 2664 - 2728, The
expectedTimestamps map is not tenant-aware while actualTimestamps is, so when
sharedTables && tenantPerDocument you must key expectedTimestamps the same way
as actualTimestamps (tenant + ':' + uid) during the initial foreach over
$documents and also use that key when comparing later (instead of
$expectedTimestamps[$doc->getId()]); update the loop that builds
$expectedTimestamps and the comparison that checks equality to construct/look up
keys as ($doc->getTenant() . ':' . $doc->getId()) when $this->sharedTables &&
$this->tenantPerDocument so entries don't collide across tenants.
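The composite-key fix the comment describes can be sketched as a small helper (Python; field names are illustrative stand-ins for the PHP accessors):

```python
def timestamp_key(doc: dict, shared_tables: bool, tenant_per_document: bool) -> str:
    """Key used for BOTH the expected and actual timestamp maps.

    When tenants share tables and each document carries its own tenant,
    the same _uid can legally appear once per tenant, so the key must
    include the tenant or entries collide across tenants.
    """
    if shared_tables and tenant_per_document:
        return f"{doc['tenant']}:{doc['uid']}"
    return doc["uid"]

def build_expected(docs, shared_tables, tenant_per_document):
    expected = {}
    for doc in docs:
        key = timestamp_key(doc, shared_tables, tenant_per_document)
        expected[key] = doc["created_at"]
    return expected
```

With plain `_uid` keys, two same-id documents in different tenants overwrite each other in the map — exactly the collision the finding flags.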

ℹ️ Review info
Run ID: 892638bd-ac20-414b-9059-7f7e20295028

📥 Commits

Reviewing files that changed from the base of the PR and between 242c9a8 and 455da70.

📒 Files selected for processing (2)
  • src/Database/Adapter/SQL.php
  • src/Database/Database.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Database/Database.php

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Adapter/SQL.php`:
- Around line 2676-2682: The verification query is binding composite keys from
$expectedTimestamps into $verifyBinds/placeholders and then comparing them
against the _uid column, but when tenantPerDocument is enabled _uid was set to
the raw document ID ($attributes['_uid'] = $document->getId()), so the WHERE
_uid IN (...) must use the document IDs not the composite keys. Fix by
extracting the document ID portion from each composite key before creating $ph
and adding to $verifyBinds (or if keys are already plain IDs, leave unchanged);
update the loop that builds $verifyPlaceholders/$verifyBinds (the foreach over
array_values(array_unique(array_keys($expectedTimestamps)))) to derive $uid =
<documentIdFromComposite($uid)> and bind that instead so the verification query
matches rows by actual _uid values.

ℹ️ Review info
Run ID: 43af96a2-d479-4354-86c9-7d68abad7b56

📥 Commits

Reviewing files that changed from the base of the PR and between 455da70 and deedfda.

📒 Files selected for processing (1)
  • src/Database/Adapter/SQL.php

Adapters:
- MariaDB/MySQL: INSERT IGNORE INTO
- PostgreSQL: ON CONFLICT DO NOTHING
- SQLite: INSERT OR IGNORE INTO
- MongoDB: session-scoped pre-filter before insertMany

Database.php:
- Intra-batch dedup by ID (tenant-aware, first occurrence wins)
- Pre-fetch existing IDs to skip known duplicates
- Deferred relationship creation for ignore mode (no orphans)
- Race-condition reconciliation via _createdAt timestamp verification

upsertDocuments:
- Batch-fetch existing docs with find() instead of per-row getDocument()
- Tenant-aware composite keys for seenIds duplicate check

All paths are tenant-per-document aware with null-safe array_filter.
- testCreateDocumentsIgnoreDuplicates: mixed batch with onNext assertions
- testCreateDocumentsIgnoreIntraBatchDuplicates: first-wins, no ACL drift
- testCreateDocumentsIgnoreAllDuplicates: zero inserts, empty onNext
@premtsd-code premtsd-code force-pushed the csv-import-upsert-v2 branch from f2b08dc to bee42d4 Compare April 9, 2026 10:28
…x _uid mapping

- Pass ignoreDuplicates option to insertMany for race-condition safety
- Extract _uid from raw array before replaceChars (excluded from transformation)
- Map inserted records back to original Document objects by _uid
- Add skipDuplicates(callable) scope guard on Database, following
  existing pattern (skipRelationships, skipValidation, etc.)
- Remove bool $ignore parameter from createDocuments signature
- Mirror propagates skipDuplicates state to source and destination
- Update tests to use $database->skipDuplicates(function() { ... })
- Add $skipDuplicates property + skipDuplicates() scope guard to Adapter
- Remove bool $ignore parameter from all adapter createDocuments signatures
  (Adapter, Pool, Mongo, SQL)
- Remove bool $ignore from SQL helper methods (getInsertKeyword,
  getInsertSuffix, getInsertPermissionsSuffix) and Postgres/SQLite overrides
- Pool delegate() and withTransaction() propagate skipDuplicates state to
  pooled/pinned adapter via the scope guard
- Database::createDocuments() wraps adapter call in adapter->skipDuplicates()
  when the flag is set, drops the local $ignore variable
- skipDuplicates() takes optional $enable so Pool/Mirror/Database can
  forward state in a single call instead of branching
- Database: tenantKey(Document) helper replaces ~8 inline ternaries
- Database: fetchExistingByIds() helper replaces the tenant-grouped
  batched find() in createDocuments and upsertDocuments pre-fetch
- Database: drop dead guard on non-defer relationship path
- Mongo: findExistingUids() helper replaces the duplicated cursor walk
  in the skipDuplicates pre-filter (tenant and non-tenant variants)
- SQL: buildUidTenantLookup() helper replaces the duplicated
  placeholder-and-binds builder shared by pre-filter and reconciliation
When a batch has only null-ID documents but non-null tenants, the
previous code emitted WHERE _uid IN () AND _tenant IN (...) — invalid
SQL that the !empty(binds) call-site guard failed to catch because the
tenant binds kept the binds array non-empty.

Return 'where' => '1=0' with empty binds when there are no UIDs, so
the query matches nothing and leaves no dangling placeholders.
- testCreateDocumentsSkipDuplicatesEmptyBatch: no-op with empty array
- testCreateDocumentsSkipDuplicatesNestedScope: scope guard save/restore
  through nested skipDuplicates() calls
- testCreateDocumentsSkipDuplicatesLargeBatch: 300-doc batch with 50
  pre-existing, exercising the chunk loop
- testCreateDocumentsSkipDuplicatesSecondCallSkipsAll: second identical
  batch should preserve First values, not overwrite with Second
- testCreateDocumentsSkipDuplicatesRelationships: skipDuplicates with a
  one-to-many relationship — verifies deferred relationship logic does
  not create orphan child rows for ignored parents
INSERT IGNORE / ON CONFLICT DO NOTHING / INSERT OR IGNORE handle
duplicates at the DB level, and the existing reconciliation block
identifies actually-inserted rows via the _createdAt timestamp. The
pre-filter was a wire-data optimization that ran a SELECT before the
INSERT, then trimmed the batch — but the same SELECT (with timestamp
comparison) already happens in the reconciliation path on demand.

Net: ~45 lines removed, single code path for skipDuplicates instead
of pre-filter + fallback. Race-condition handling is unchanged
(reconciliation path covers it).
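The reconciliation idea — `INSERT IGNORE` gives no per-row report, so actually-inserted rows are identified afterwards by comparing each row's stored `_createdAt` against the value this batch tried to write — can be sketched as follows (Python; data shapes are illustrative):

```python
def reconcile_inserted(batch, stored_created_at):
    """Return the documents from `batch` that this call actually inserted.

    `stored_created_at` maps uid -> the _createdAt now persisted for that
    row. A pre-existing row keeps its original _createdAt, while a freshly
    inserted row carries the timestamp this batch assigned, so equality
    identifies our inserts without per-row reporting from the database.
    """
    expected = {doc["uid"]: doc["created_at"] for doc in batch}
    return [doc for doc in batch
            if stored_created_at.get(doc["uid"]) == expected[doc["uid"]]]
```

This assumes batch timestamps are distinct from any pre-existing row's `_createdAt`, which holds when the batch stamps its own creation time.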
Previously the Database orchestrator carried a pre-fetch + chunk-filter
pass to compensate for Mongo's plain upsert() returning matched +
upserted combined ('n') — without that compensation, $modified would
over-count the number of pre-existing docs in the batch.

This pushes the responsibility into the adapter, where it belongs:
- composer bump to utopia-php/mongo dev-upsert-return-upserted-count
  (PR #37, adds Client::upsertWithCounts() returning matched/modified/
  upserted[] separately)
- Mongo adapter calls upsertWithCounts() and maps the response's
  upserted[].index entries back to the originating Document objects,
  returning ONLY the actually-inserted docs (race-safe)
- Database::createDocuments drops $preExistingIds pre-fetch and the
  chunk filter; the adapter is now responsible for accurate counts

Net effect on Database.php: ~17 lines removed. Net effect on Mongo.php:
~80 lines net deletion (replaced the inline pre-filter with a single
upsertWithCounts call + index mapping).
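The index mapping described above — Mongo-style bulk results report each upserted entry with the position of its originating request document, and matched (pre-existing) documents produce no entry — can be sketched like this (Python; the response shape follows MongoDB's bulk-result convention, the function name is illustrative):

```python
def map_upserted(batch, upserted):
    """Map a bulk result's upserted entries back to the input documents.

    `upserted` is a list of {"index": i, "_id": ...} entries, where `index`
    is the position of the originating document in the request batch. The
    returned list therefore contains ONLY the actually-inserted documents,
    which is what makes the adapter's counts race-safe.
    """
    return [batch[entry["index"]] for entry in upserted]
```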
 * @param bool $enable
 * @return T
 */
public function skipDuplicates(callable $callback, bool $enable = true): mixed
Member


Let's remove the $enable flag and check if skipDuplicates instead, otherwise we add the overhead of calling an extra callable when it's false
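The flag-free scope-guard shape the review asks for — set the flag only for the callback's duration, restore it afterwards even on exceptions or nesting, and let callers branch instead of passing `$enable` — looks like this in a minimal Python sketch (names illustrative, not the project's PHP API):

```python
class Database:
    def __init__(self):
        self._skip_duplicates = False

    def skip_duplicates(self, callback):
        """Scope guard: save/restore semantics survive exceptions and
        nested calls, matching the silent()/skipRelationships() pattern."""
        previous = self._skip_duplicates
        self._skip_duplicates = True
        try:
            return callback()
        finally:
            self._skip_duplicates = previous

    def create_documents(self, docs):
        # Caller-side branch: the common path reads one boolean and pays
        # no closure allocation or extra callable invocation.
        mode = "ignore" if self._skip_duplicates else "strict"
        return (mode, list(docs))
```

With `$enable = true` in the signature, every plain `createDocuments()` call would still allocate and invoke a closure just to pass `false`; branching at the call site keeps the hot path flat.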

 * @param bool $idsOnly When true, uses Query::select(['$id']) for a lighter fetch
 * @return array<string, Document>
 */
private function fetchExistingByIds(Document $collection, array $documents, bool $idsOnly = false): array
Member


We can use find

Comment on lines +5751 to +5776
// skipDuplicates: dedupe intra-batch (first wins) then drop already-existing rows.
// Adapter INSERT IGNORE / ON CONFLICT / upsert is the race-safety net.
if ($this->skipDuplicates) {
    $seenIds = [];
    $deduplicated = [];
    foreach ($documents as $document) {
        if ($document->getId() !== '') {
            $dedupeKey = $this->tenantKey($document);
            if (isset($seenIds[$dedupeKey])) {
                continue;
            }
            $seenIds[$dedupeKey] = true;
        }
        $deduplicated[] = $document;
    }
    $documents = $deduplicated;

    $preExistingIds = $this->fetchExistingByIds($collection, $documents, idsOnly: true);
    if (!empty($preExistingIds)) {
        $documents = \array_values(\array_filter(
            $documents,
            fn (Document $doc) => !isset($preExistingIds[$this->tenantKey($doc)])
        ));
    }
}

Member


I don't think we need to do this if using insert ignore

Contributor Author


@abnegate The child collection gets orphan rows when a batch has multiple rows with the same parent but different children: the parent insert is ignored, but the child inserts are not.

if ($this->resolveRelationships) {
    $document = $this->silent(fn () => $this->createDocumentRelationships($collection, $document));
}

Comment on lines +607 to +613
$insertedFromSource = [];
$captureOnNext = function (Document $doc) use (&$insertedFromSource, $onNext): void {
    $insertedFromSource[] = $doc;
    if ($onNext !== null) {
        $onNext($doc);
    }
};
Member


We can never capture like this internally, if there were many docs, this loop could easily OOM

Comment on lines +2614 to +2640
/**
 * Returns the INSERT keyword, optionally with IGNORE for duplicate handling.
 * Override in adapter subclasses for DB-specific syntax.
 */
protected function getInsertKeyword(): string
{
    return $this->skipDuplicates ? 'INSERT IGNORE INTO' : 'INSERT INTO';
}

/**
 * Returns a suffix appended after VALUES clause for duplicate handling.
 * Override in adapter subclasses (e.g., Postgres uses ON CONFLICT DO NOTHING).
 */
protected function getInsertSuffix(string $table): string
{
    return '';
}

/**
 * Returns a suffix for the permissions INSERT statement when ignoring duplicates.
 * Override in adapter subclasses for DB-specific syntax.
 */
protected function getInsertPermissionsSuffix(): string
{
    return '';
}

Member


Let's group logically with other getter methods

#1 Drop $enable flag on skipDuplicates() scope guard
The $enable param made every non-skipDuplicates createDocuments call pay
for a closure allocation + extra function call. Branch at the call site
instead so the cost only applies when the flag is actually set.

  - Adapter::skipDuplicates(callable, bool) → skipDuplicates(callable)
  - Database::skipDuplicates(callable, bool) → skipDuplicates(callable)
  - Database::createDocuments, Mirror::createDocuments, Pool::delegate,
    Pool::withTransaction now branch inline.

#2 Drop fetchExistingByIds helper, inline find()
The helper's per-tenant grouping defended a hypothetical multi-tenant
batching scenario that no caller exercises (relationships are intra-
tenant, callers always batch per tenant). Existing patterns in the same
file (refetchDocuments, relationship loading) just call find() directly.
Match that idiom and drop ~70 lines.

#4 Mirror: only capture inserted docs in skipDuplicates mode
The captureOnNext accumulator paid the cost (closure + per-doc array
push) on every createDocuments call, including the common non-skip path.
Branch at the entry of Mirror::createDocuments so the capture only
happens when skipDuplicates is set; the non-skip path passes through
to source/destination unchanged.

#5 Move getInsertKeyword/Suffix/PermissionsSuffix to getters cluster
Were sitting next to createDocuments(); moved to the getSupport*
cluster around line 1030 where other adapter-capability shims live.

Not addressed:
- #2 partial: the existing patterns (refetchDocuments etc.) don't handle
  tenant-per-document multi-tenant batches either, so this is consistent.
- #3 (drop the pre-filter): rejected. createDocumentRelationships runs
  in the encoding loop BEFORE the adapter's INSERT IGNORE no-ops the
  parent, so dropping the pre-filter would deterministically duplicate
  child rows on every CSV re-import of a collection with relationships
  (not a race window — every call). The relationships test verifies
  this. Reverting would require reintroducing the deferred-relationships
  scaffolding we just removed, and the adapter still couldn't tell us
  which parents were actually inserted (SQL INSERT IGNORE has no per-row
  reporting). Pre-filter stays.
Per Jake's #4 review comment, the captureOnNext accumulator could grow
unbounded with the full inserted set. Replace it with a bounded buffer
that flushes to the destination every $batchSize docs as the source
streams them through onNext, plus a tail flush after the source returns.

Memory peak in skipDuplicates mode is now O($batchSize) (~1000 docs)
regardless of how many docs the source inserts. Same destination batching
efficiency — destination still gets multi-doc createDocuments calls,
just one per chunk instead of one giant call at the end.

Eligibility checks (SOURCE_ONLY_COLLECTIONS, upgrade status) moved
upfront so the flush closure can capture the decision and short-circuit
cleanly. Non-skip path is unchanged: direct passthrough, no buffer.

The flush-on-fill is load-bearing for the memory bound — see the
in-method comment.
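The bounded-buffer forwarding described in this commit can be sketched generically (Python; `flush` stands in for the destination `createDocuments` call):

```python
def forward_bounded(source_stream, flush, batch_size=1000):
    """Forward docs from a streaming source to `flush` in bounded chunks.

    Peak memory stays O(batch_size) regardless of stream length: the
    buffer flushes whenever it fills (this flush-on-fill is what enforces
    the bound), plus one tail flush for the remainder after the source
    finishes streaming.
    """
    buffer = []
    for doc in source_stream:
        buffer.append(doc)
        if len(buffer) >= batch_size:
            flush(list(buffer))  # copy out, then reuse the buffer
            buffer.clear()
    if buffer:
        flush(buffer)  # tail flush for the final partial chunk
```

The destination still receives multi-document batches, so batching efficiency is preserved; only the single giant end-of-stream call is gone.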
…erDocument

When inlining find() per Jake's review, the per-tenant grouping the
deleted fetchExistingByIds helper used to do was lost. In tenantPerDocument
mode with a multi-tenant batch, the inlined find() runs under the session
tenant context (often null for platform workers) and silently misses rows
belonging to other tenants — every input doc looks "new" and goes down the
create path instead of the update path.

This actively breaks appwrite's StatsUsage worker, which accumulates stats
across many projects (each tagged with its own $tenant) and flushes them
via $dbForLogs->setTenant(null)->setTenantPerDocument(true) followed by
upsertDocumentsWithIncrease(...). With the old per-doc getDocument loop,
each lookup ran under withTenant($doc->getTenant()) and was correct. The
batch find() inlining lost that scoping.

Fix per Greptile's suggestion: branch on tenantPerDocument mode and run
one find() per unique tenant value under withTenant, merging results into
the same lookup hash. K queries (K = unique tenants in the batch) instead
of N (per-doc) or 1 (broken). Common single-tenant case still hits the
fast batched path.

Applied at both call sites:
- Database::createDocuments skipDuplicates pre-filter
- Database::upsertDocumentsWithIncrease existing-docs lookup

Fixes greptile flag at PR #852 discussion r3077481595.
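The K-queries-per-unique-tenant lookup described above can be sketched as follows (Python; `find_for_tenant` stands in for a tenant-scoped `withTenant(...)->find(...)`, and the composite-key shape is illustrative):

```python
def fetch_existing_by_tenant(docs, find_for_tenant):
    """Run one lookup per unique tenant instead of one global query.

    A single unscoped query under the session tenant (often null for
    platform workers) silently misses other tenants' rows; a per-doc
    lookup costs N queries. Grouping by tenant costs K queries
    (K = unique tenants in the batch) and merges results into one hash
    keyed 'tenant:id'. The single-tenant case stays one batched query.
    """
    by_tenant = {}
    for doc in docs:
        by_tenant.setdefault(doc["tenant"], []).append(doc["id"])
    existing = {}
    for tenant, ids in by_tenant.items():
        for found_id in find_for_tenant(tenant, ids):
            existing[f"{tenant}:{found_id}"] = True
    return existing
```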
Per Jake's review feedback, the pre-filter in Database::createDocuments
was redundant with the adapter's INSERT IGNORE / ON CONFLICT DO NOTHING
/ upsert + $setOnInsert path. The adapter already handles duplicates
atomically; the upfront SELECT duplicated that work.

The semantic is now row-level dedup:
- pre-existing rows are silently no-op'd by the adapter
- new rows go through
- retries are idempotent
- source data is never overwritten

Tests updated to match:
- 5 skipDuplicates tests adjusted to expect full-chunk counts, with
  data-level assertions unchanged (pre-existing values preserved, new
  rows inserted).
- testCreateDocumentsSkipDuplicatesRelationships encodes the retry
  scenario: missing nested child gets created via relateDocuments'
  idempotent path when the parent already exists.
- testCreateDocumentsSkipDuplicatesDoesNotDivergeDestination relaxed
  to source-authoritative semantics only. Destination may transiently
  hold would-be values during the migration window; backfill reconciles.

All 105 skipDuplicates tests across 11 adapter variants pass, full
Mirror suite (636 tests) passes, PHPStan level 7 clean.
Follow-up to Jake's #5 review comment — the fix was only applied to
SQL.php. Propagate the same grouping to the other adapters and fix one
cosmetic byproduct from the Mirror refactor.

- Postgres.php: move getInsertKeyword/Suffix/PermissionsSuffix from
  between createDocuments() and getUpsertStatement() down to the
  getSupportFor* cluster (matches SQL.php:1036 layout).
- SQLite.php: move getInsertKeyword override from the top of the class
  to after the getSupportFor* cluster.
- Adapter.php: skipDuplicates() method was inserted between protected
  property declarations, breaking the property cluster. Move the method
  down next to withTransaction() — the other callable-taking scope
  guard. $skipDuplicates property stays with its peers.
- Mirror.php: revert the incidental fn () => collapse on the non-skip
  destination createDocuments call back to the original two-line form,
  minimizing unrelated diff noise.

Pure relocation / style. No behavior change.
Addresses Greptile #3084293974. The captureOnNext-based forwarding
was structurally broken post-pre-filter-removal: SQL adapters return
the full input batch from createDocuments regardless of how many rows
INSERT IGNORE actually inserted, so onNext fired for every document
including skipped duplicates. captureOnNext buffered all of them and
flushed the full buffer (carrying the caller's would-be values) to
destination, diverging destination from source whenever source had a
pre-existing row the caller tried to overwrite.

Fix: instead of trying to distinguish inserted vs skipped at insert
time, re-fetch source's authoritative state via find() after the
adapter write settles, then forward that to destination. Race-free
regardless of concurrent writers — destination always receives a
faithful snapshot of whatever source holds, because the query runs
after source's write has resolved.

Cost: one extra SELECT per batch against source when skipDuplicates
is active AND Mirror has an upgraded destination. Zero cost on the
non-skip path. Memory remains O(batch_size).

Net ~47 lines smaller than the previous capture-based approach:
- captureOnNext closure removed
- flushBuffer closure removed
- bounded-buffer machinery removed
- shouldMirror upfront-snapshot removed (inlined as eligibility check)

Test: testCreateDocumentsSkipDuplicatesDoesNotDivergeDestination now
asserts destination holds source's 'Original' value rather than the
caller's 'WouldBe' input, per Greptile's suggested assertion.
…p paths

Addresses Greptile #3084293974. The captureOnNext-based forwarding
was structurally broken: SQL adapters return the full input batch from
createDocuments regardless of what INSERT IGNORE actually inserted, so
onNext fired for every doc including skipped duplicates. captureOnNext
then forwarded the caller's would-be values to destination, diverging
destination from source whenever source had a pre-existing row the
caller tried to re-insert.

Fix: pre-filter against source before the insert to identify which
input ids already exist, then forward only the genuinely-new docs to
destination. A source-skipped duplicate is a no-op (no user write
happened), so nothing should propagate. This matches Greptile's
semantics: destination.dup.isEmpty() after skipping a duplicate.

While the logic was touched, the skip and non-skip paths (which shared
eligibility check, upgrade check, clone loop, destination write, and
after-filter loop) are unified into a single flow with $this->
skipDuplicates branches only at the points where behavior actually
differs:

  - source call: wrapped in source->skipDuplicates when active
  - $toForward: filter out pre-existing ids when active
  - destination call: wrapped in destination->skipDuplicates when active

Non-skip hot path pays zero new closure allocations; the pre-filter
block is gated behind if ($this->skipDuplicates) and short-circuits
to an empty $existingIds map.

Net: -46 lines in createDocuments (~170 → ~95), same test coverage,
Greptile's assertion now holds.
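The forwarding rule this commit lands on — a source-skipped duplicate represents no user write, so only ids absent from the source before the insert propagate to the destination — reduces to a simple filter (Python sketch; field names illustrative):

```python
def forward_after_prefilter(batch, existing_ids):
    """Mirror forwarding under skipDuplicates: documents whose id already
    existed in the source are no-ops there, so they must not carry the
    caller's would-be values to the destination. Only genuinely-new
    documents propagate."""
    return [doc for doc in batch if doc["id"] not in existing_ids]
```

This is what makes the divergence test hold: after a skipped duplicate, the destination receives nothing for that id, so it never picks up the caller's rejected values.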
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/Database/Mirror.php (1)

604-656: ⚠️ Potential issue | 🟠 Major

Pre-filter lookup is not tenant-aware for tenantPerDocument mode.

When tenantPerDocument is enabled, documents with the same $id but different tenants are distinct. The current implementation:

  1. Calls find() with all IDs at once without grouping by tenant (line 616)
  2. Keys $existingIds by $id alone (line 622)
  3. Filters $toForward by $id alone (line 650)

If the batch contains documents for multiple tenants, this incorrectly excludes new documents whose ID exists for a different tenant, or fails to detect true duplicates for the correct tenant.

Use the same tenant-aware pattern as Database::upsertDocumentsWithIncrease(): when tenantPerDocument is enabled, group documents by tenant and call find() per-tenant using withTenant(), then use composite keys ($tenant:$id) for tracking existing IDs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Mirror.php` around lines 604 - 656, The pre-filter that builds
$existingIds and filters $toForward must be made tenant-aware when
tenantPerDocument is enabled: instead of calling $this->source->find(...) once
for all IDs and keying by $id, group the $documents by tenant (use the same
pattern as Database::upsertDocumentsWithIncrease()), and for each tenant call
$this->source->withTenant($tenant)->silent(fn() => ->find(...)) to fetch only
that tenant's existing IDs; store keys in $existingIds as composite
"$tenant:$id" and update the filtering lambda for $toForward (and the earlier
array_map/array_filter that builds $ids) to use the same composite key so only
documents with matching tenant+id are considered duplicates.
♻️ Duplicate comments (1)
src/Database/Database.php (1)

5728-5732: ⚠️ Potential issue | 🔴 Critical

skipDuplicates() still leaves relationship side effects behind.

By the time execution reaches Line 5728, every input document has already gone through createDocumentRelationships(). If the adapter then ignores a duplicate parent row, those child/junction writes are still committed. That can orphan rows or mutate relationships on an existing document even though the parent create was skipped. Relationship writes need to be driven from the successfully inserted $batch, not from the pre-insert input list.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5728 - 5732, The current flow runs
createDocumentRelationships() before calling the adapter and then uses
adapter->skipDuplicates() which can ignore parent inserts while relationship
writes have already been applied; change the flow so the adapter call (the
$insert closure -> $this->withTransaction(fn () =>
$this->adapter->createDocuments($collection, $chunk))) executes first to produce
the actual inserted $batch (ensure adapter->skipDuplicates(...) returns the real
inserted batch when used), and only after $batch is obtained call
createDocumentRelationships() and any child/junction writes against $batch (not
the original input). Adjust skipDuplicates() usage so it returns the inserted
documents batch and ensure relationship-writing code references $batch instead
of the pre-insert input list.
🧹 Nitpick comments (2)
src/Database/Adapter/Postgres.php (1)

2358-2358: Silence the unused override parameter warning.

$table is intentionally unused here. If the parent signature must stay as-is, rename it to $_table or add a local suppression so PHPMD stops flagging this override.

♻️ Small cleanup
-    protected function getInsertSuffix(string $table): string
+    protected function getInsertSuffix(string $_table): string
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Adapter/Postgres.php` at line 2358, The override of
getInsertSuffix(string $table) triggers an unused-parameter warning; rename the
unused parameter to $_table (or prefix it with an underscore) in the
getInsertSuffix method signature or add a local PHPMD suppression comment inside
that method so the linter stops flagging it; update the method signature in the
Postgres class (getInsertSuffix) and keep the parent signature unchanged.
tests/e2e/Adapter/Scopes/DocumentTests.php (1)

7930-7933: Clarify whether these assertions validate “processed” vs “created” semantics.

At Line 7930, Line 7984, Line 8110, and Line 8155, the tests assert counts and onNext emissions include duplicate IDs. That encodes a “processed inputs” contract, not “newly created rows.” Please make this explicit (e.g., rename to $processedCount / add a short comment) or add before/after count assertions to separately validate created-row deltas.

Also applies to: 7984-7986, 8110-8112, 8155-8157
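A short sketch of the suggested before/after delta check (PHPUnit style; the collection name, batch variable, and the `count()` helper call are illustrative):

```php
// Capture the row count before the duplicate-tolerant insert.
$beforeRows = $database->count('books');

$processed = $database->skipDuplicates(
    fn () => $database->createDocuments('books', $batchWithDuplicates)
);

$afterRows = $database->count('books');

// "Processed inputs" contract: duplicate ids still flow through onNext.
$this->assertCount(\count($batchWithDuplicates), $processed);

// "Created rows" contract: only genuinely new ids change the table.
$this->assertSame($expectedNewCreations, $afterRows - $beforeRows);
```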

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/Adapter/Scopes/DocumentTests.php` around lines 7930 - 7933, The
assertions using $count and $emittedIds currently validate “processed inputs”
semantics (they expect duplicate IDs for repeated processing) rather than newly
created rows; update the test to make this explicit by renaming variables (e.g.,
$processedCount, $processedEmittedIds) or adding a short comment above the block
clarifying “processed inputs may include duplicates,” and if you also need to
assert created-row deltas add before/after row-count checks (capture $beforeRows
= ... then run operation then assert $afterRows - $beforeRows equals expected
new creations) around the assertions that reference $count and $emittedIds (the
same pattern appears for the other occurrences currently asserting duplicates).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Database.php`:
- Around line 7153-7184: The pre-read ID queries in the branches that build
$idsByTenant/$docIds must be chunked to respect the Query::equal('$id', ...)
limit controlled by $this->maxQueryValues; update the tenant loop (where you
build $idsByTenant and call $this->find(... Query::equal('$id', $tenantIds)
...)) and the else branch (where $docIds is used in $this->find(...
Query::equal('$id', $docIds) ...)) to split each array with array_chunk(...,
max(1, $this->maxQueryValues)) and call find for each chunk (using the same
authorization->skip/withTenant/silent wrappers), then merge all returned
documents into $existingDocs (use the same key format $tenant . ':' .
$doc->getId() in the tenant branch) or $existing as before so downstream code
sees the combined results. Ensure you preserve unique IDs before chunking and
maintain limit counts per chunk.
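The chunked tenant-branch pre-read described above might look like this (a sketch assuming the wrapper shapes named in the comment; the `$existingDocs` accumulation follows the stated `$tenant . ':' . $doc->getId()` key format):

```php
$existingDocs = [];
foreach ($idsByTenant as $tenant => $tenantIds) {
    $tenantIds = \array_values(\array_unique($tenantIds));
    // Chunk by the validator ceiling so Query::equal never exceeds it.
    foreach (\array_chunk($tenantIds, \max(1, $this->maxQueryValues)) as $chunk) {
        $found = $this->authorization->skip(fn () => $this->withTenant(
            $tenant,
            fn () => $this->silent(fn () => $this->find($collection->getId(), [
                Query::equal('$id', $chunk),
                Query::limit(\count($chunk)),
            ]))
        ));
        foreach ($found as $doc) {
            $existingDocs[$tenant . ':' . $doc->getId()] = $doc;
        }
    }
}
```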

In `@src/Database/Mirror.php`:
- Around line 614-624: The prefetch call in Mirror.php uses Query::equal('$id',
$ids) with an unchunked $ids array which can exceed DocumentsValidator limits;
change the logic that builds $existing by chunking $ids with array_chunk($ids,
max(1, $this->maxQueryValues)) (same pattern used in createDocuments() and
upsertDocumentsWithIncrease()), call $this->source->silent(fn() =>
$this->source->find($collection, [Query::equal('$id', $chunk),
Query::limit(count($chunk))])) for each chunk, and merge results into
$existingIds[$doc->getId()] = true to preserve current behavior. Ensure you
reference $this->maxQueryValues and use Query::equal('$id', $chunk) and
Query::limit(count($chunk)) per chunk.

---

Outside diff comments:
In `@src/Database/Mirror.php`:
- Around line 604-656: The pre-filter that builds $existingIds and filters
$toForward must be made tenant-aware when tenantPerDocument is enabled: instead
of calling $this->source->find(...) once for all IDs and keying by $id, group
the $documents by tenant (use the same pattern as
Database::upsertDocumentsWithIncrease()), and for each tenant call
$this->source->withTenant($tenant)->silent(fn() => ->find(...)) to fetch only
that tenant's existing IDs; store keys in $existingIds as composite
"$tenant:$id" and update the filtering lambda for $toForward (and the earlier
array_map/array_filter that builds $ids) to use the same composite key so only
documents with matching tenant+id are considered duplicates.
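Sketch of the tenant-aware pre-filter with composite keys (the grouping variable and filter shape are illustrative; the "$tenant:$id" key format is the one proposed above):

```php
// Group input ids by tenant, then look up existence per tenant.
$existingIds = [];
foreach ($idsByTenant as $tenant => $ids) {
    $found = $this->source->withTenant($tenant, fn () => $this->source->silent(
        fn () => $this->source->find($collection, [
            Query::equal('$id', $ids),
            Query::limit(\count($ids)),
        ])
    ));
    foreach ($found as $doc) {
        $existingIds[$tenant . ':' . $doc->getId()] = true;
    }
}

// Only tenant+id matches count as duplicates when filtering the forward set.
$toForward = \array_filter($documents, fn (Document $doc) =>
    !isset($existingIds[$doc->getTenant() . ':' . $doc->getId()]));
```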

---

Duplicate comments:
In `@src/Database/Database.php`:
- Around line 5728-5732: The current flow runs createDocumentRelationships()
before calling the adapter and then uses adapter->skipDuplicates() which can
ignore parent inserts while relationship writes have already been applied;
change the flow so the adapter call (the $insert closure ->
$this->withTransaction(fn () => $this->adapter->createDocuments($collection,
$chunk))) executes first to produce the actual inserted $batch (ensure
adapter->skipDuplicates(...) returns the real inserted batch when used), and
only after $batch is obtained call createDocumentRelationships() and any
child/junction writes against $batch (not the original input). Adjust
skipDuplicates() usage so it returns the inserted documents batch and ensure
relationship-writing code references $batch instead of the pre-insert input
list.

---

Nitpick comments:
In `@src/Database/Adapter/Postgres.php`:
- Line 2358: The override of getInsertSuffix(string $table) triggers an
unused-parameter warning; rename the unused parameter to $_table (or prefix it
with an underscore) in the getInsertSuffix method signature or add a local PHPMD
suppression comment inside that method so the linter stops flagging it; update
the method signature in the Postgres class (getInsertSuffix) and keep the parent
signature unchanged.

In `@tests/e2e/Adapter/Scopes/DocumentTests.php`:
- Around line 7930-7933: The assertions using $count and $emittedIds currently
validate “processed inputs” semantics (they expect duplicate IDs for repeated
processing) rather than newly created rows; update the test to make this
explicit by renaming variables (e.g., $processedCount, $processedEmittedIds) or
adding a short comment above the block clarifying “processed inputs may include
duplicates,” and if you also need to assert created-row deltas add before/after
row-count checks (capture $beforeRows = ... then run operation then assert
$afterRows - $beforeRows equals expected new creations) around the assertions
that reference $count and $emittedIds (the same pattern appears for the other
occurrences currently asserting duplicates).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 06d46ba1-a7eb-478b-ad90-a27f372a0327

📥 Commits

Reviewing files that changed from the base of the PR and between 455da70 and 934ec04.

📒 Files selected for processing (10)
  • src/Database/Adapter.php
  • src/Database/Adapter/Mongo.php
  • src/Database/Adapter/Pool.php
  • src/Database/Adapter/Postgres.php
  • src/Database/Adapter/SQL.php
  • src/Database/Adapter/SQLite.php
  • src/Database/Database.php
  • src/Database/Mirror.php
  • tests/e2e/Adapter/MirrorTest.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/Database/Adapter.php
  • src/Database/Adapter/SQLite.php
  • src/Database/Adapter/Mongo.php

Comment thread src/Database/Database.php Outdated
Comment thread src/Database/Mirror.php Outdated
Addresses CodeRabbit #3084994641 and #3084994657 (+ Greptile #3084466265).
Three call sites were passing unchunked id arrays into Query::equal('$id',
...), which throws QueryException from DocumentsValidator once the array
exceeds 5000 values.

Mirror::createDocuments (skipDuplicates pre-filter): defensive — Mirror's
public API accepts arbitrary-size input, and while current callers pass
batches below the limit, the code shouldn't throw on a theoretical
direct caller with >5000 input ids.

Database::upsertDocumentsWithIncrease (both tenantPerDocument and single-
tenant branches): real regression. The batched pre-read was introduced
in ae929db to restore per-tenant grouping for StatsUsage. It replaced
a per-doc getDocument() loop on main (which is safe from validator
limits because each call queries one id). Passing the full input array
to Query::equal now breaks at >5000 input docs — a path that previously
worked slowly on main.

All three sites use self::RELATION_QUERY_CHUNK_SIZE (5000) and
Query::limit(PHP_INT_MAX), matching the existing pattern used by the
five populateDocumentsRelationshipsBatch chunking sites. Accumulator
shape is adapted to our map-only use case: we build the existence set /
lookup map directly inside the chunk loop instead of carrying a flat
list we don't need.
Contributor

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
src/Database/Database.php (2)

7160-7187: ⚠️ Potential issue | 🟠 Major

Respect maxQueryValues when chunking the pre-read lookups.

These find() calls still validate Query::equal('$id', $chunk) against $this->maxQueryValues. If a caller lowers that limit below 5000, chunking by RELATION_QUERY_CHUNK_SIZE will still fail before the upsert starts.

✂️ Suggested change
-                foreach (\array_chunk($tenantIds, self::RELATION_QUERY_CHUNK_SIZE) as $chunk) {
+                foreach (\array_chunk($tenantIds, \max(1, $this->maxQueryValues)) as $chunk) {
@@
-                foreach (\array_chunk($docIds, self::RELATION_QUERY_CHUNK_SIZE) as $chunk) {
+                foreach (\array_chunk($docIds, \max(1, $this->maxQueryValues)) as $chunk) {

Based on learnings, both createDocuments() and upsertDocumentsWithIncrease() prefetch existing document IDs in chunks using array_chunk(..., max(1, $this->maxQueryValues)) to satisfy DocumentsValidator limits on Query::equal('$id', ...).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 7160 - 7187, The pre-read find()
calls chunk IDs using self::RELATION_QUERY_CHUNK_SIZE but Query::equal('$id',
$chunk) is validated against $this->maxQueryValues, so change the chunking in
both loops (the tenant branch that iterates over $idsByTenant and the else
branch that iterates over $docIds) to use array_chunk(..., max(1,
$this->maxQueryValues)) instead of self::RELATION_QUERY_CHUNK_SIZE so the chunk
size never exceeds the DocumentsValidator limit when calling find().

5728-5732: ⚠️ Potential issue | 🔴 Critical

skipDuplicates() still lets skipped rows mutate relationship state.

By the time this guard runs, every input document has already gone through createDocumentRelationships() at Lines 5720-5722. A row ignored here can still create/update children or junction rows, and duplicate child/junction inserts triggered during that earlier relationship pass still go through the normal createDocument() path. The insert result needs to be authoritative for relationship work, or those writes need to be deferred until after you know which rows were actually inserted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5728 - 5732, Summary:
duplicate-skipping allows skipped rows to still mutate relationships because
createDocumentRelationships runs before the adapter decides which rows are
inserted. Fix: change the control flow so relationship work is driven by the
authoritative insert result — either defer createDocumentRelationships until
after $batch is computed or make adapter->skipDuplicates return per-row
insertion outcomes and use that to gate relationship writes; specifically update
the sequence around the $insert closure, withTransaction,
adapter->skipDuplicates and createDocumentRelationships (and any calls to
createDocument) so relationship/junction creation only runs for rows confirmed
inserted by createDocuments (or is skipped per the adapter-provided insertion
map).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@src/Database/Database.php`:
- Around line 7160-7187: The pre-read find() calls chunk IDs using
self::RELATION_QUERY_CHUNK_SIZE but Query::equal('$id', $chunk) is validated
against $this->maxQueryValues, so change the chunking in both loops (the tenant
branch that iterates over $idsByTenant and the else branch that iterates over
$docIds) to use array_chunk(..., max(1, $this->maxQueryValues)) instead of
self::RELATION_QUERY_CHUNK_SIZE so the chunk size never exceeds the
DocumentsValidator limit when calling find().
- Around line 5728-5732: Summary: duplicate-skipping allows skipped rows to
still mutate relationships because createDocumentRelationships runs before the
adapter decides which rows are inserted. Fix: change the control flow so
relationship work is driven by the authoritative insert result — either defer
createDocumentRelationships until after $batch is computed or make
adapter->skipDuplicates return per-row insertion outcomes and use that to gate
relationship writes; specifically update the sequence around the $insert
closure, withTransaction, adapter->skipDuplicates and
createDocumentRelationships (and any calls to createDocument) so
relationship/junction creation only runs for rows confirmed inserted by
createDocuments (or is skipped per the adapter-provided insertion map).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c7a16723-e265-42e8-8fd4-3a629176caf0

📥 Commits

Reviewing files that changed from the base of the PR and between 934ec04 and fa0e373.

📒 Files selected for processing (2)
  • src/Database/Database.php
  • src/Database/Mirror.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Database/Mirror.php

…SIZE

CodeRabbit #3084994641 + #3084994657 follow-up: the earlier chunking
fix used self::RELATION_QUERY_CHUNK_SIZE (5000), which matches the
existing populateDocumentsRelationshipsBatch pattern but doesn't
respect the user-configurable validator ceiling.

Query::equal('$id', $chunk) is validated by DocumentsValidator
against $this->maxQueryValues, not RELATION_QUERY_CHUNK_SIZE. A
caller doing $db->setMaxQueryValues(100) followed by any of the
three chunked pre-reads would still throw QueryException because
chunks of 5000 exceed the configured 100.

Switch to array_chunk(..., max(1, $this->maxQueryValues)) at all
three sites so chunk size always respects the actual validator
limit. In the default case (maxQueryValues = 5000) this produces
identical chunks to the old code; under customization it correctly
scales down.

Note: the 5 existing relationship-population chunking sites still
use RELATION_QUERY_CHUNK_SIZE and have the same latent bug under
customized maxQueryValues. Leaving that for a follow-up since
fixing the pattern globally is out of scope for this PR.
premtsd-code added a commit to utopia-php/migration that referenced this pull request Apr 15, 2026
CSV / JSON / appwrite-to-appwrite imports that re-run on the same
batch (e.g. user re-uploads the same file, or a worker retries a
failed chunk) currently throw DuplicateException and abort the whole
batch. Wrap the row-buffer flush in the new skipDuplicates() scope
guard so duplicate-by-id rows are silently no-op'd at the adapter
layer (INSERT IGNORE / ON CONFLICT DO NOTHING / $setOnInsert), letting
the rest of the batch proceed.

The existing skipRelationshipsExistCheck() wrapper is preserved
(FK-target guard); skipDuplicates composes around it.

Feature-branch note: depends on utopia-php/database's skipDuplicates()
scope guard from PR utopia-php/database#852. composer.json is
temporarily pinned to dev-csv-import-upsert-v2 with a 5.99.0 alias so
composer can resolve the 5.* constraint transitively. Must be reset
to the proper release version (^5.X.Y) once PR #852 merges and
utopia-php/database ships.
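The composed guards in the flush path might look like this (a sketch; the buffer variable and the surrounding flush context are illustrative, while the two guard names come from the commit message):

```php
// skipDuplicates composes around the existing FK-target guard:
// duplicate-by-id rows no-op at the adapter, and relationship
// existence checks stay skipped as before.
$database->skipDuplicates(
    fn () => $database->skipRelationshipsExistCheck(
        fn () => $database->createDocuments($collectionId, $rowBuffer)
    )
);
```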
premtsd-code added a commit to appwrite/appwrite that referenced this pull request Apr 15, 2026
Exposes two new optional boolean params on the three migration
creation endpoints so CSV / JSON / appwrite-to-appwrite imports can
choose how to handle rows whose IDs already exist at the destination.

Endpoints updated (app/controllers/api/migrations.php):
- POST /v1/migrations/appwrite
- POST /v1/migrations/csv/imports
- POST /v1/migrations/json/imports

Parameter semantics:
- overwrite=true  -> destination uses upsertDocuments instead of
                     createDocuments; existing rows are replaced
                     with imported values
- skip=true       -> destination wraps createDocuments in
                     skipDuplicates; existing rows are preserved
                     unchanged, duplicate-id rows silently no-op
- both false      -> default; fails fast on DuplicateException
                     (original behavior, unchanged)
- both true       -> overwrite wins (upsert subsumes skip)

Both params are stored in the migration Document's options array
(matches the existing pattern for destination behavior config like
path, size, delimiter, bucketId, etc.) and read back in the worker's
processDestination() to instantiate DestinationAppwrite with the
new constructor params.

Feature-branch note: depends on utopia-php/migration#feat/skip-duplicates
(DestinationAppwrite constructor params) which in turn depends on
utopia-php/database#852 (skipDuplicates scope guard). composer.json is
temporarily pinned to dev-feat/skip-duplicates and
dev-csv-import-upsert-v2 respectively; both must be reset to proper
release versions once the upstream PRs merge.
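Under the stated semantics, the parameter resolution in the worker's processDestination() reduces to a three-way branch (a sketch; the variable names are illustrative):

```php
if ($overwrite) {
    // overwrite=true (wins when both flags are set): replace existing rows.
    $database->upsertDocuments($collectionId, $documents);
} elseif ($skip) {
    // skip=true: existing rows preserved, duplicate-id rows silently no-op.
    $database->skipDuplicates(
        fn () => $database->createDocuments($collectionId, $documents)
    );
} else {
    // Default: fail fast on DuplicateException (original behavior).
    $database->createDocuments($collectionId, $documents);
}
```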
Comment thread src/Database/Database.php Outdated
$found = $this->authorization->skip(fn () => $this->withTenant($tenant, fn () => $this->silent(
    fn () => $this->find($collection->getId(), [
        Query::equal('$id', $chunk),
        Query::limit(PHP_INT_MAX),
Member


If the chunk size is always maxQueryValues, might as well use that here too

Comment thread src/Database/Database.php Outdated
    ])
)));
foreach ($found as $doc) {
    $existingDocs[$tenant . ':' . $doc->getId()] = $doc;
Member


We can use the tenantKey function here

Comment thread src/Database/Database.php Outdated
$existing = $this->authorization->skip(fn () => $this->silent(
    fn () => $this->find($collection->getId(), [
        Query::equal('$id', $chunk),
        Query::limit(PHP_INT_MAX),
Member


Same here

Comment thread src/Database/Mirror.php Outdated
Comment on lines +604 to +606
// In skipDuplicates mode, identify which input ids already exist on source.
// These will be silently no-oped by the adapter's INSERT IGNORE and must
// not propagate to destination — a skipped duplicate is not a user write.
Member


Same as earlier - we should still propagate duplicates to the destination; they might not be duplicates there

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (1)
src/Database/Mirror.php (1)

622-656: Destination forwarding behavior change is intentional — consider simplifying the branch.

The updated comment correctly captures the design decision addressed in prior review feedback: forward every source-written document to the destination and let it decide per-row via its own INSERT IGNORE/ON CONFLICT path when skipDuplicates is active. This avoids the earlier pre-filter SELECT that could cause source/destination divergence when a row exists on source but not destination (or vice versa during an in-flight upgrade).

Minor readability nit: the two branches only differ by the outer skipDuplicates(...) wrapper, so this can collapse to a single composed closure to reduce duplication.

♻️ Optional refactor
-            if ($this->skipDuplicates) {
-                $this->destination->skipDuplicates(
-                    fn () => $this->destination->withPreserveDates(
-                        fn () => $this->destination->createDocuments($collection, $clones, $batchSize)
-                    )
-                );
-            } else {
-                $this->destination->withPreserveDates(
-                    fn () =>
-                    $this->destination->createDocuments(
-                        $collection,
-                        $clones,
-                        $batchSize,
-                    )
-                );
-            }
+            $write = fn () => $this->destination->withPreserveDates(
+                fn () => $this->destination->createDocuments($collection, $clones, $batchSize)
+            );
+
+            $this->skipDuplicates
+                ? $this->destination->skipDuplicates($write)
+                : $write();

One thing worth double-checking: in the non-skipDuplicates branch, if a row was written to source but already exists on destination (e.g., a partially-backfilled upgrade), the destination createDocuments will throw DuplicateException and be swallowed by the surrounding catch (\Throwable) on Line 668, which will abort the remaining $clones in the batch and skip all subsequent afterCreateDocument filter callbacks. Please confirm that is acceptable, or consider whether the destination write should always propagate the source's intent here.
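If the answer is that a pre-existing destination row must not abort the batch, one possible per-row shape is the following (illustrative only; it trades the batch insert for per-document writes):

```php
foreach ($clones as $clone) {
    try {
        $this->destination->withPreserveDates(
            fn () => $this->destination->createDocument($collection, $clone)
        );
    } catch (DuplicateException $e) {
        // Row already exists on destination (e.g. a partially backfilled
        // upgrade): skip this clone and keep writing the rest of the batch.
        continue;
    }
}
```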

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Mirror.php` around lines 622 - 656, The forwarding branch can be
simplified by composing a single closure that runs
$this->destination->withPreserveDates(fn()=>
$this->destination->createDocuments($collection, $clones, $batchSize)) and then
conditionally wrapping that closure with $this->destination->skipDuplicates(...)
when $this->skipDuplicates is true, removing the duplicated createDocuments
calls; update the block that builds $clones and the
writeFilters->beforeCreateDocument usage accordingly. Also verify the
error-handling behavior around createDocuments: confirm whether allowing a
DuplicateException (thrown by createDocuments when not using skipDuplicates) to
be caught by the surrounding catch and thereby abort remaining $clones and skip
subsequent writeFilters->afterCreateDocument calls is acceptable, and if not,
change the error handling so duplicate-row errors don’t abort the whole batch
(e.g., let them propagate differently or handle per-row).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/Database/Database.php`:
- Around line 7153-7193: The unconditional prefetch removed the no-auth
"skipPreRead" behavior used by upsertDocumentsWithIncrease(), causing
caller-supplied $permissions to be persisted incorrectly; restore the original
branch: detect $skipPreRead = !empty($attribute) &&
!$this->authorization->getStatus() in upsertDocumentsWithIncrease(), and when
$skipPreRead is true skip the pre-read of existing documents (do not hydrate
$old/$existingDocs), ensure creates get default permissions [] and updates
preserve existing permissions (i.e. ignore incoming $permissions), and apply the
same conditional (no prefetch when skipPreRead) to the other similar block
referenced (around the later section 7213-7314) so both code paths behave
consistently.
- Around line 5728-5732: The relationship writes (createDocumentRelationships /
any createDocument calls) must be wrapped by the same skipDuplicates guard as
the batch insert so duplicates are skipped atomically; move or refactor so the
closure passed to adapter->skipDuplicates (or the $insert closure) contains both
the relationship expansion and the adapter->createDocuments call (or call
createDocumentRelationships inside the withTransaction closure) so that
adapter->skipDuplicates can opt out/skip all child/junction writes, not just
createDocuments. Ensure you reference and update the existing $insert closure,
createDocumentRelationships, createDocument, and adapter->skipDuplicates usage
so relationship writes occur inside the guarded/transactional block.

---

Nitpick comments:
In `@src/Database/Mirror.php`:
- Around line 622-656: The forwarding branch can be simplified by composing a
single closure that runs $this->destination->withPreserveDates(fn()=>
$this->destination->createDocuments($collection, $clones, $batchSize)) and then
conditionally wrapping that closure with $this->destination->skipDuplicates(...)
when $this->skipDuplicates is true, removing the duplicated createDocuments
calls; update the block that builds $clones and the
writeFilters->beforeCreateDocument usage accordingly. Also verify the
error-handling behavior around createDocuments: confirm whether allowing a
DuplicateException (thrown by createDocuments when not using skipDuplicates) to
be caught by the surrounding catch and thereby abort remaining $clones and skip
subsequent writeFilters->afterCreateDocument calls is acceptable, and if not,
change the error handling so duplicate-row errors don’t abort the whole batch
(e.g., let them propagate differently or handle per-row).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 32ae3c16-14a5-4d26-9d26-7bdefe39ae73

📥 Commits

Reviewing files that changed from the base of the PR and between fa0e373 and fbe5117.

📒 Files selected for processing (3)
  • src/Database/Database.php
  • src/Database/Mirror.php
  • tests/e2e/Adapter/MirrorTest.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/e2e/Adapter/MirrorTest.php

Comment thread src/Database/Database.php
Comment on lines +5728 to +5732
$insert = fn () => $this->withTransaction(fn () => $this->adapter->createDocuments($collection, $chunk));
// Set adapter flag before withTransaction so Mongo can opt out of a real txn.
$batch = $this->skipDuplicates
? $this->adapter->skipDuplicates($insert)
: $insert();
Contributor


⚠️ Potential issue | 🔴 Critical

Guard the relationship writes, not just the final batch insert.

skipDuplicates() only wraps adapter->createDocuments(), but createDocumentRelationships() already ran at Line 5721. If one of these parents is skipped as a duplicate, any child/junction writes done during relationship expansion are still applied, and many-to-many junction inserts can still throw DuplicateException because createDocument() does not honor this guard. That breaks the advertised “silently skip duplicate rows” behavior and can leave orphan relationship data.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 5728 - 5732, The relationship writes
(createDocumentRelationships / any createDocument calls) must be wrapped by the
same skipDuplicates guard as the batch insert so duplicates are skipped
atomically; move or refactor so the closure passed to adapter->skipDuplicates
(or the $insert closure) contains both the relationship expansion and the
adapter->createDocuments call (or call createDocumentRelationships inside the
withTransaction closure) so that adapter->skipDuplicates can opt out/skip all
child/junction writes, not just createDocuments. Ensure you reference and update
the existing $insert closure, createDocumentRelationships, createDocument, and
adapter->skipDuplicates usage so relationship writes occur inside the
guarded/transactional block.

Comment thread src/Database/Database.php
Comment on lines +7153 to +7193
if ($this->getSharedTables() && $this->getTenantPerDocument()) {
    $idsByTenant = [];
    foreach ($documents as $doc) {
        if ($doc->getId() !== '') {
            $idsByTenant[$doc->getTenant()][] = $doc->getId();
        }
    }
    foreach ($idsByTenant as $tenant => $tenantIds) {
        $tenantIds = \array_values(\array_unique($tenantIds));
        foreach (\array_chunk($tenantIds, \max(1, $this->maxQueryValues)) as $chunk) {
            $found = $this->authorization->skip(fn () => $this->withTenant($tenant, fn () => $this->silent(
                fn () => $this->find($collection->getId(), [
                    Query::equal('$id', $chunk),
                    Query::limit($this->maxQueryValues),
                ])
            )));
            foreach ($found as $doc) {
                $existingDocs[$this->tenantKey($doc)] = $doc;
            }
        }
    }
} else {
    $docIds = \array_values(\array_unique(\array_filter(
        \array_map(fn (Document $doc) => $doc->getId(), $documents),
        fn ($id) => $id !== ''
    )));

    if (!empty($docIds)) {
        foreach (\array_chunk($docIds, \max(1, $this->maxQueryValues)) as $chunk) {
            $existing = $this->authorization->skip(fn () => $this->silent(
                fn () => $this->find($collection->getId(), [
                    Query::equal('$id', $chunk),
                    Query::limit($this->maxQueryValues),
                ])
            ));
            foreach ($existing as $doc) {
                $existingDocs[$this->tenantKey($doc)] = $doc;
            }
        }
    }
}
Contributor

@coderabbitai coderabbitai bot Apr 17, 2026


⚠️ Potential issue | 🟠 Major

Restore the no-auth skipPreRead branch for increment upserts.

This unconditional prefetch changes the !empty($attribute) && !$this->authorization->getStatus() path. In that mode, incoming $permissions are supposed to be ignored; with $old always hydrated, the current skipPermissionsUpdate logic can now persist caller-supplied permissions on creates/updates. That is a behavior regression in upsertDocumentsWithIncrease().

Based on learnings, Database::upsertDocumentsWithIncrease() uses $skipPreRead = !empty($attribute) && !$this->authorization->getStatus(), and in that path incoming $permissions must be ignored while creates default to [] and updates preserve existing permissions.

Also applies to: 7213-7314

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Database/Database.php` around lines 7153 - 7193, The unconditional
prefetch removed the no-auth "skipPreRead" behavior used by
upsertDocumentsWithIncrease(), causing caller-supplied $permissions to be
persisted incorrectly; restore the original branch: detect $skipPreRead =
!empty($attribute) && !$this->authorization->getStatus() in
upsertDocumentsWithIncrease(), and when $skipPreRead is true skip the pre-read
of existing documents (do not hydrate $old/$existingDocs), ensure creates get
default permissions [] and updates preserve existing permissions (i.e. ignore
incoming $permissions), and apply the same conditional (no prefetch when
skipPreRead) to the other similar block referenced (around the later section
7213-7314) so both code paths behave consistently.

Contributor Author


skipPreRead is not merged into the main branch yet

Contributor


@premtsd-code Thanks for the clarification — acknowledged. Since skipPreRead hasn’t landed in main yet, my earlier regression note tied to that path doesn’t apply to this PR. The batch pre-read you added in upsertDocumentsWithIncrease() is fine as-is.

If you’d like, I can open a follow-up issue to re-check permission semantics once skipPreRead merges (so we ensure “updates preserve existing permissions; creates default to []; ignore incoming $permissions” in the no-auth + increment case). Happy to proceed if you want that tracked.


🧠 Learnings used
Learnt from: premtsd-code
Repo: utopia-php/database PR: 850
File: src/Database/Database.php:0-0
Timestamp: 2026-04-08T16:07:15.177Z
Learning: In src/Database/Database.php, both createDocuments() and upsertDocumentsWithIncrease() prefetch existing document IDs in chunks using array_chunk(..., max(1, $this->maxQueryValues)) for tenant-per-document and non-tenant modes to satisfy DocumentsValidator limits on Query::equal('$id', ...).

Learnt from: premtsd-code
Repo: utopia-php/database PR: 830
File: src/Database/Database.php:7086-7092
Timestamp: 2026-03-08T22:55:28.782Z
Learning: In src/Database/Database.php, Database::upsertDocumentsWithIncrease() sets $skipPreRead = !empty($attribute) && !$this->authorization->getStatus(), so the pre-read-skip path only runs when authorization is disabled. In this path, explicit $permissions on incoming documents are intentionally ignored; updates preserve existing permissions, and creates default to empty permissions ([]), since no auth layer enforces them. Stated by maintainer premtsd-code on PR `#830`.

Learnt from: fogelito
Repo: utopia-php/database PR: 733
File: src/Database/Adapter/MariaDB.php:1801-1806
Timestamp: 2025-10-16T09:37:33.531Z
Learning: In the MariaDB adapter (src/Database/Adapter/MariaDB.php), only duplicate `_uid` violations should throw `DuplicateException`. All other unique constraint violations, including `PRIMARY` key collisions on the internal `_id` field, should throw `UniqueException`. This is the intended design to distinguish between user-facing document duplicates and internal/user-defined unique constraint violations.

Learnt from: abnegate
Repo: utopia-php/database PR: 721
File: tests/e2e/Adapter/Scopes/DocumentTests.php:6418-6439
Timestamp: 2025-10-03T02:04:17.803Z
Learning: In tests/e2e/Adapter/Scopes/DocumentTests::testSchemalessDocumentInvalidInteralAttributeValidation (PHP), when the adapter reports getSupportForAttributes() === false (schemaless), the test should not expect exceptions from createDocuments for “invalid” internal attributes; remove try/catch and ensure the test passes without exceptions, keeping at least one assertion.
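The chunked-prefetch pattern from the first learning above can be sketched in isolation. This is a minimal illustration, not the real `Database.php` code: `chunkIds()` and `$maxQueryValues` are stand-ins mirroring the `array_chunk(..., max(1, $this->maxQueryValues))` pattern used so each `Query::equal('$id', ...)` stays under the DocumentsValidator value limit.

```php
<?php

/**
 * Hypothetical sketch: split a batch of document IDs into chunks no
 * larger than the query-value limit, guarding against a zero limit.
 */
function chunkIds(array $ids, int $maxQueryValues): array
{
    return array_chunk($ids, max(1, $maxQueryValues));
}

// 5 ids with a limit of 2 become 3 chunks: ['a','b'], ['c','d'], ['e'],
// i.e. three Query::equal('$id', ...) prefetch queries instead of one.
$chunks = chunkIds(['a', 'b', 'c', 'd', 'e'], 2);
```

Each chunk then drives one existence-check query, so neither tenant-per-document nor non-tenant mode ever exceeds the validator's per-query value count.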
