12 changes: 12 additions & 0 deletions docs/configuration/pgdog.toml/general.md
@@ -457,6 +457,18 @@ Available options:

`text` format is required when migrating from `INTEGER` to `BIGINT` primary keys during resharding.

### `resharding_copy_retry_max_attempts`

How many times PgDog retries a single table copy after a failure (e.g., a shard goes down mid-copy). The delay starts at [`resharding_copy_retry_min_delay`](#resharding_copy_retry_min_delay) and doubles each attempt, capped at 32× that value.

Default: **`5`**

### `resharding_copy_retry_min_delay`

Starting delay between retry attempts, in milliseconds. Doubles on each attempt, capped at 32× this value (`32_000` ms at the default).

Default: **`1_000`** (1s)
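
For example, to allow more attempts with a shorter starting delay (the values below are hypothetical, shown only to illustrate how the two settings work together):

```toml
[general]
resharding_copy_retry_max_attempts = 8  # retry each table copy up to 8 times
resharding_copy_retry_min_delay = 500   # first delay 500ms; caps at 16s (32 × 500ms)
```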

### `reload_schema_on_ddl`

!!! warning
2 changes: 1 addition & 1 deletion docs/features/sharding/resharding/.pages
@@ -2,5 +2,5 @@ nav:
- 'index.md'
- 'databases.md'
- 'schema.md'
- 'hash.md'
- 'move.md'
- 'cutover.md'
2 changes: 1 addition & 1 deletion docs/features/sharding/resharding/cutover.md
@@ -8,7 +8,7 @@ icon: material/set-right
This is a new and experimental feature. Please make sure to test it before deploying to production
and report any issues you find.

Traffic cutover involves moving application traffic (read and write queries) to the destination database. This happens when the source and destination databases are in sync: all source data is copied, and resharded with [logical replication](hash.md).
Traffic cutover involves moving application traffic (read and write queries) to the destination database. This happens when the source and destination databases are in sync: all source data is copied, and resharded with [logical replication](move.md).

## Performing the cutover

2 changes: 1 addition & 1 deletion docs/features/sharding/resharding/databases.md
@@ -8,7 +8,7 @@ PgDog's strategy for resharding Postgres databases is to create a new, independe

## Requirements

New databases should be **empty**: don't migrate your [table definitions](schema.md) or [data](hash.md). These will be taken care of automatically by PgDog.
New databases should be **empty**: don't migrate your [table definitions](schema.md) or [data](move.md). These will be taken care of automatically by PgDog.

### Database users

4 changes: 2 additions & 2 deletions docs/features/sharding/resharding/index.md
@@ -24,7 +24,7 @@ The resharding process is composed of four independent operations. The first one
|-|-|
| [Create new cluster](databases.md) | Create a new set of empty databases that will be used for storing data in the new, sharded cluster. |
| [Schema synchronization](schema.md) | Replicate table and index definitions to the new shards, making sure the new cluster has the same schema as the old one. |
| [Move & reshard data](hash.md) | Copy data using logical replication, while redistributing rows in-flight between new shards. |
| [Move & reshard data](move.md) | Copy data using logical replication, while redistributing rows in-flight between new shards. |
| [Cutover traffic](cutover.md) | Make the new cluster service both reads and writes from the application, without taking downtime. |

While each step can be executed separately by the operator, PgDog provides an [admin database](../../../administration/index.md) command to perform online resharding and traffic cutover steps in a completely automated fashion:
@@ -50,5 +50,5 @@ The `<source>` and `<destination>` parameters accept the name of the source and

{{ next_steps_links([
("Schema sync", "schema.md", "Synchronize table, index and other schema entities between the source and destination databases."),
("Move data", "hash.md", "Redistribute data between shards using the configured sharding function. This happens without downtime and keeps the shards up-to-date with the source database until traffic cutover."),
("Move data", "move.md", "Redistribute data between shards using the configured sharding function. This happens without downtime and keeps the shards up-to-date with the source database until traffic cutover."),
]) }}
@@ -121,9 +121,9 @@ PgDog will distribute the table copy load evenly between all replicas in the con
To prevent the resharding process from impacting production queries, you can create a separate set of replicas just for resharding.

Managed clouds (e.g., AWS RDS) make this especially easy, but the new replicas need a warm-up period to fetch data from the backup snapshot before they can read at the full speed of their storage volumes.

To make sure dedicated replicas are not used for read queries in production, you can configure PgDog to use them for resharding only:

```toml
[[databases]]
name = "prod"
@@ -215,6 +215,60 @@ FROM pg_replication_slots;
```
The replication delay between the two database clusters is measured in bytes. When that number reaches zero, the two databases are byte-for-byte identical, and traffic can be [cut over](cutover.md) to the destination database.

## Troubleshooting

### Shard failure during copy

If a shard goes down mid-copy, PgDog retries that table with exponential backoff: up to **5 attempts** by default, starting at a **1 second** delay and doubling each time.

The two cases behave differently:

- **Destination shard down**: PgDog opens a fresh connection on the next attempt.
- **Source shard down**: the `TEMPORARY` replication slot for that table is re-created from scratch. No cleanup is needed, since `TEMPORARY` slots vanish when the connection closes (unlike the [permanent slot](#replication-slot) created at the start of the overall copy).

To change the defaults:

```toml
[general]
resharding_copy_retry_max_attempts = 5 # per-table retry attempts
resharding_copy_retry_min_delay = 1000 # base backoff in ms; doubles each attempt, max 32×
```
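
As an illustration only (this is a sketch of the schedule described above, not PgDog's actual code), the resulting delays in milliseconds look like this:

```python
def backoff_delays(min_delay_ms: int, max_attempts: int) -> list[int]:
    """Delay doubles on every attempt, capped at 32x the starting delay."""
    cap = 32 * min_delay_ms
    return [min(min_delay_ms * 2 ** attempt, cap) for attempt in range(max_attempts)]

# With the defaults (1000 ms starting delay, 5 attempts):
print(backoff_delays(1000, 5))  # [1000, 2000, 4000, 8000, 16000]
```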

### Rows remaining in destination after failure

Most drops are clean: table copies run inside an implicit transaction, so Postgres rolls back on disconnect and the destination is left empty.

The awkward case: the connection drops *after* `COPY` commits on some or all shards but before PgDog records it as done. The rows survive, and the next attempt hits primary key violations immediately.

PgDog catches this and tells you exactly what to run:

```
data sync for "public"."orders" failed with rows remaining in destination;
truncate manually before retrying: TRUNCATE "public"."orders_new";
```

Run the `TRUNCATE` statement on the destination, copying the table name verbatim from the error message, then re-run `COPY_DATA`.

### Binary format mismatch

Different major Postgres versions can produce incompatible binary `COPY` data. PgDog surfaces this as a `BinaryFormatMismatch` error. Switch to text:

```toml
[general]
resharding_copy_format = "text"
```

See [Integer primary keys](#integer-primary-keys) for the other common reason to use text format.

### Replication slot not dropped after abort

Aborting `COPY_DATA` without restarting it leaves the **permanent** replication slot sitting on the source. Postgres won't recycle WAL until it's gone. Drop it manually:

```postgresql
SELECT pg_drop_replication_slot('slot_name');
```
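
If the slot name isn't handy, you can also list leftover slots directly on the source with a generic catalog query (not PgDog-specific; check the names before dropping anything):

```postgresql
SELECT slot_name,
       slot_type,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_wal_bytes
FROM pg_replication_slots
WHERE NOT active;
```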

The slot name appears in the `COPY_DATA` startup log and in `SHOW REPLICATION_SLOTS`.

## Next steps

{{ next_steps_links([
6 changes: 3 additions & 3 deletions docs/features/sharding/resharding/schema.md
@@ -12,7 +12,7 @@ The schema synchronization process is composed of 4 distinct steps, all of which
| Phase | Description |
|-|-|
| [Pre-data](#pre-data-phase) | Create identical tables on all shards along with the primary key constraint (and index). Secondary indexes are _not_ created yet. |
| [Post-data](#post-data-phase) | Create secondary indexes on all tables and shards. This is done after [moving data](hash.md), as a separate step, because it's considerably faster to create indexes on whole tables than while inserting individual rows. |
| [Post-data](#post-data-phase) | Create secondary indexes on all tables and shards. This is done after [moving data](move.md), as a separate step, because it's considerably faster to create indexes on whole tables than while inserting individual rows. |
| [Cutover](#cutover) | This step is executed during traffic cutover, while application queries are blocked from executing on the database. |
| Post-cutover | This step makes sure the rollback database cluster can handle reverse logical replication. |

@@ -110,7 +110,7 @@ This will make sure all tables and schemas in your database are copied and resha

## Post-data phase

The post-data phase is performed after the [data copy](hash.md) is complete and tables have been synchronized with logical replication. Its job is to create all secondary indexes (e.g., `CREATE INDEX`).
The post-data phase is performed after the [data copy](move.md) is complete and tables have been synchronized with logical replication. Its job is to create all secondary indexes (e.g., `CREATE INDEX`).

This step is performed after copying data because it makes the copy process considerably faster: Postgres doesn't need to update several indexes while writing rows into the tables.

@@ -189,5 +189,5 @@ pg_dump_path = "/path/to/pg_dump"
## Next steps

{{ next_steps_links([
("Move data", "hash.md", "Redistribute data between shards using the configured sharding function. This happens without downtime and keeps the shards up-to-date with the source database until traffic cutover."),
("Move data", "move.md", "Redistribute data between shards using the configured sharding function. This happens without downtime and keeps the shards up-to-date with the source database until traffic cutover."),
]) }}
2 changes: 1 addition & 1 deletion docs/roadmap.md
@@ -89,7 +89,7 @@ Support for sorting rows in [cross-shard](features/sharding/cross-shard-queries/

| Feature | Status | Notes |
|-|-|-|
| [Data sync](features/sharding/resharding/hash.md) | :material-wrench: | Sync table data with logical replication. |
| [Data sync](features/sharding/resharding/move.md) | :material-wrench: | Sync table data with logical replication. |
| [Schema sync](features/sharding/resharding/schema.md) | :material-wrench: | Sync table, index and constraint definitions. |
| Online rebalancing | :material-calendar-check: | Not automated yet, requires manual orchestration. |
