Add JSON handling for search/replace in custom tables and nested JSON #226

Open
ratneshjais wants to merge 2 commits into wp-cli:main from ratneshjais:fix/json-aware-search-replace

Conversation

@ratneshjais

No description provided.

@ratneshjais ratneshjais requested a review from a team as a code owner April 9, 2026 11:04

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for searching and replacing strings within JSON-encoded data, including nested structures and escaped URLs. It adds automatic detection of JSON columns and updates the recursive processing logic to handle JSON decoding and encoding. The review feedback identifies a malformed regular expression in the JSON detection logic and suggests a performance optimization to avoid unnecessary json_decode calls by performing a preliminary character check.

					// URLs that a simple SQL REPLACE cannot reach.
					if ( null === $serial_row ) {
						// phpcs:ignore WordPress.DB.PreparedSQL.InterpolatedNotPrepared -- escaped through self::esc_sql_ident
						$serial_row = $wpdb->get_row( "SELECT * FROM $table_sql WHERE $col_sql REGEXP '^[\\\\[{]' LIMIT 1" );

high

The regular expression ^[\\[{] appears to be malformed as it opens a character class [ but does not close it with a corresponding ]. This will likely result in a database error or fail to match any rows, preventing the automatic detection of JSON columns. A more reliable and readable approach for detecting strings starting with { or [ is to use an alternation.

						$serial_row = $wpdb->get_row( "SELECT * FROM $table_sql WHERE $col_sql REGEXP '^(\\\\[|{)' LIMIT 1" );

Comment on lines +204 to +216
				// Try to decode as a JSON object or array and recurse into the
				// decoded structure. This properly handles URLs stored inside
				// JSON-encoded columns (e.g. Gravity Forms confirmations, block
				// editor font data), including nested JSON where slashes are
				// double-escaped.
				$json_decoded = json_decode( $data, true );
				if ( null !== $json_decoded && is_array( $json_decoded ) ) {
					$json_decoded = $this->run_recursively( $json_decoded, false, $recursion_level + 1, $visited_data );
					$json_result  = json_encode( $json_decoded );
					if ( false !== $json_result ) {
						$data = $json_result;
					}
				} elseif ( $this->regex ) {

medium

Attempting to json_decode() every string in a column detected as JSON can lead to significant performance degradation, especially for large text fields like post_content. Since valid JSON objects and arrays must start with { or [, adding a quick character check before calling json_decode() will avoid unnecessary parsing for the majority of non-JSON strings.

				// Try to decode as a JSON object or array and recurse into the
				// decoded structure. This properly handles URLs stored inside
				// JSON-encoded columns (e.g. Gravity Forms confirmations, block
				// editor font data), including nested JSON where slashes are
				// double-escaped.
				if ( '' !== $data && ( '{' === $data[0] || '[' === $data[0] ) ) {
					$json_decoded = json_decode( $data, true );
					if ( null !== $json_decoded && is_array( $json_decoded ) ) {
						$json_decoded = $this->run_recursively( $json_decoded, false, $recursion_level + 1, $visited_data );
						$json_result  = json_encode( $json_decoded );
						if ( false !== $json_result ) {
							$data = $json_result;
						}
					}
				} elseif ( $this->regex ) {
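
One thing to check in the gated form: as excerpted, a string that starts with { or [ but fails to decode still enters the outer if, so it would appear to skip the elseif branch and never reach the plain replacement. The sketch below shows one way to keep the fall-through. It is a standalone illustration, not the PR's code: replace_with_json_gate is a hypothetical function name, and it uses str_replace only (no regex branch).

```php
<?php
/**
 * Hypothetical standalone helper (not SearchReplacer::run_recursively).
 * Gates json_decode() on the first character, but still falls through to a
 * plain string replacement when the value turns out not to be valid JSON.
 */
function replace_with_json_gate( string $data, string $from, string $to ): string {
	$decoded = null;

	// Cheap pre-check: only strings starting with { or [ can be a JSON object/array.
	if ( '' !== $data && ( '{' === $data[0] || '[' === $data[0] ) ) {
		$decoded = json_decode( $data, true );
	}

	if ( is_array( $decoded ) ) {
		// Replace inside every string leaf of the decoded structure.
		array_walk_recursive(
			$decoded,
			function ( &$value ) use ( $from, $to ) {
				if ( is_string( $value ) ) {
					$value = str_replace( $from, $to, $value );
				}
			}
		);
		$encoded = json_encode( $decoded );
		return false !== $encoded ? $encoded : $data;
	}

	// Not JSON (or the gate did not fire): ordinary replacement still runs.
	return str_replace( $from, $to, $data );
}

$json_value     = replace_with_json_gate( '{"u":"https:\/\/oldsite.com"}', 'https://oldsite.com', 'https://newsite.com' );
$not_json_value = replace_with_json_gate( '{not json} https://oldsite.com', 'https://oldsite.com', 'https://newsite.com' );

echo $json_value, PHP_EOL;
echo $not_json_value, PHP_EOL;
```

The mutually exclusive shape is kept (a value goes through either the JSON path or the plain path, never both), which is what avoids double replacement.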

…refix for custom tables

- Update JSON post content test to expect PHP type (JSON detection now routes to PHP mode)
- Add --all-tables-with-prefix flag for custom table tests so WP-CLI can find them
- Fix double replacement by making JSON decode path and str_replace/regex mutually exclusive

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@swissspidy
Member

Which issue is this for? Some additional context would be helpful.

@github-actions
Contributor

github-actions bot commented Apr 9, 2026

Hello! 👋

Thanks for opening this pull request! Please check out our contributing guidelines. We appreciate you taking the initiative to contribute to this project.

Contributing isn't limited to just code. We encourage you to contribute in the way that best fits your abilities, by writing tutorials, giving a demo at your local meetup, helping other users with their support questions, or revising our documentation.

Here are some useful Composer commands to get you started:

  • composer install: Install dependencies.
  • composer test: Run the full test suite.
  • composer phpcs: Check for code style violations.
  • composer phpcbf: Automatically fix code style violations.
  • composer phpunit: Run unit tests.
  • composer behat: Run behavior-driven tests.

To run a single Behat test, you can use the following command:

# Run all tests in a single file
composer behat features/some-feature.feature

# Run only a specific scenario (where 123 is the line number of the "Scenario:" title)
composer behat features/some-feature.feature:123

You can find a list of all available Behat steps in our handbook.

@github-actions github-actions bot added the command:search-replace, scope:distribution, scope:testing, and scope:documentation labels Apr 9, 2026
@ratneshjais
Author

Context & motivation

Thanks for the review @swissspidy! Expanding on the "why" behind this change:

The problem

wp search-replace has three internal code paths for replacing a string inside a column value:

  1. SQL path — a plain UPDATE ... SET col = REPLACE(col, from, to). Fast, but byte-literal: it can't see into encoded structures.
  2. PHP path with serialized handling — triggered when a column contains PHP-serialized data (detected via the ERROR 1139 / leading a:/s: heuristic). Decodes, recurses, re-serializes, fixing string-length prefixes.
  3. PHP regex path — triggered by --regex.
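
To illustrate why path 2 has to decode and re-serialize rather than replace bytes, here is a minimal sketch in plain PHP. The option shape and the URLs are made up for the example; only serialize(), unserialize() and str_replace() are used.

```php
<?php
// Hypothetical stored option value.
$stored = serialize( array( 'home' => 'https://oldsite.com' ) );
// $stored is a:1:{s:4:"home";s:19:"https://oldsite.com";}

// Byte-literal replacement with a longer URL: the s:19 length prefix is now wrong.
$naive        = str_replace( 'https://oldsite.com', 'https://new-site.example', $stored );
$naive_result = @unserialize( $naive ); // false, the payload no longer parses

// Decode, replace on the decoded value, re-serialize: the prefix is recomputed.
$data         = unserialize( $stored );
$data['home'] = str_replace( 'https://oldsite.com', 'https://new-site.example', $data['home'] );
$fixed        = serialize( $data );

echo $naive, PHP_EOL;
echo $fixed, PHP_EOL;
```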

The gap: JSON-encoded data is not handled by any of these. When a plugin stores JSON in a text column — e.g. Gravity Forms confirmations, block editor font/pattern data, custom form builders, headless CMS blocks — URLs inside the JSON are written as https:\/\/oldsite.com\/page (with escaped slashes). The SQL REPLACE path searches for the literal unescaped string https://oldsite.com and misses every occurrence. Users then see wp search-replace report "0 replacements" on a migration and have to hand-write SQL/PHP to fix it.
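
A minimal reproduction of the mismatch in plain PHP, assuming the plugin writes its value with json_encode() and default flags (the array shape and URLs are invented for the example). str_replace() stands in for the byte-literal SQL REPLACE.

```php
<?php
// Hypothetical stored value: a redirect URL persisted as JSON with default flags.
$stored = json_encode( array( 'redirect' => 'https://oldsite.com/page' ) );
// $stored is {"redirect":"https:\/\/oldsite.com\/page"}

// Byte-literal replacement, analogous to SQL REPLACE(): the unescaped needle is not present.
$literal = str_replace( 'https://oldsite.com', 'https://newsite.com', $stored, $literal_count );

// Decode first, replace on the decoded string, then re-encode.
$decoded             = json_decode( $stored, true );
$decoded['redirect'] = str_replace( 'https://oldsite.com', 'https://newsite.com', $decoded['redirect'], $decoded_count );
$reencoded           = json_encode( $decoded );

echo $stored, PHP_EOL;        // original bytes, slashes escaped
echo $literal_count, PHP_EOL; // replacements made by the literal pass
echo $reencoded, PHP_EOL;     // value after the decode/replace/encode round trip
```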

Real-world cases that motivated this

  • Gravity Forms — the confirmations column on wp_gf_form is a JSON blob with redirect URLs that contain \/-escaped slashes.
  • Block editor wp_global_styles — font face src URLs are stored as JSON and escape slashes.
  • Custom plugin tables (the case I hit) — a plugin writing to its own table with a JSON meta column, so the existing serialized-detection heuristic doesn't trigger.

In each case the column often isn't PHP-serialized, so path #2 never fires, and path #1 can't match through the escape.

What this PR changes

Two small extensions that keep the default SQL fast path intact:

  1. Column-level JSON detection (Search_Replace_Command.php) — after the existing ERROR 1139 / serialized check, if no serialized row was found, run a cheap REGEXP '^[\[{]' probe for one row that looks like a JSON object/array. If found, promote the column to the PHP path. This is the same "detect once per column, then recurse per row" shape as the serialized detection — no per-row cost on the SQL path.

  2. JSON-aware recursion (SearchReplacer::run_recursively) — at the string leaf, try json_decode on the value. If it decodes to an array, recurse into the decoded structure (so a replacement inside reaches escaped URLs naturally) and re-encode. This also transparently handles the nested case: JSON stored inside a serialized option, or JSON-in-JSON.
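
The recursion in item 2 can be sketched roughly as follows. This is a simplified standalone function, not the SearchReplacer code from the PR: replace_json_aware is a hypothetical name, and it omits serialized data, regex mode, recursion limits and visited-object tracking.

```php
<?php
/**
 * Hypothetical helper illustrating JSON-aware recursion at the string leaf.
 *
 * @param mixed  $data Value to process (array, string, or any other scalar).
 * @param string $from Search string.
 * @param string $to   Replacement string.
 * @return mixed Same shape as $data, with replacements applied.
 */
function replace_json_aware( $data, string $from, string $to ) {
	if ( is_array( $data ) ) {
		foreach ( $data as $key => $value ) {
			$data[ $key ] = replace_json_aware( $value, $from, $to );
		}
		return $data;
	}

	if ( ! is_string( $data ) ) {
		return $data;
	}

	// String leaf: if it decodes to an array, recurse into it and re-encode.
	$decoded = json_decode( $data, true );
	if ( is_array( $decoded ) ) {
		$encoded = json_encode( replace_json_aware( $decoded, $from, $to ) );
		return false !== $encoded ? $encoded : $data;
	}

	return str_replace( $from, $to, $data );
}

// JSON-in-JSON: the inner document is stored as a string inside the outer one.
$inner  = json_encode( array( 'url' => 'https://oldsite.com/page' ) );
$outer  = json_encode( array( 'settings' => $inner ) );
$result = replace_json_aware( $outer, 'https://oldsite.com', 'https://newsite.com' );

$outer_decoded = json_decode( $result, true );
$inner_decoded = json_decode( $outer_decoded['settings'], true );
echo $inner_decoded['url'], PHP_EOL;
```

Because each level is decoded before the replacement runs, the search string never has to account for how many times the slashes were escaped.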

Trade-offs / things I'd like maintainer input on

  • Performance. Right now json_decode is attempted on every leaf string on the PHP path. I can gate it behind a $data[0] === '{' || $data[0] === '[' check (gemini-code-assist suggested this) — happy to add, just wanted confirmation that adding the check is preferred over leaving the simpler form.
  • Re-encode fidelity. json_encode defaults preserve \/ escapes (matches how WP-core and most plugins write JSON), but will re-escape non-ASCII to \uXXXX. I can scope the re-encode to only run when a replacement actually occurred in the sub-tree to avoid drive-by mutations on rows that didn't match the search string. Want me to add that?
  • JSON detection REGEXP. gemini flagged ^[\\\\[{]; at minimum it is overbroad (it can also match a leading \). Switching to the ^(\\\\[|\\{) alternation, will update.
  • Scope. I intentionally kept this to an automatic detection rather than adding a --json flag, on the theory that the existing --precise/PHP path is already the right place to make search-replace smarter. If you'd prefer an opt-in flag instead, I'm happy to restructure.
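
For the re-encode fidelity point, a minimal sketch of "only re-encode when something was replaced". The function name replace_in_json_if_changed and the sample values are hypothetical; the escaping behaviour shown is that of json_encode() with default flags and json_decode() with associative arrays.

```php
<?php
/**
 * Hypothetical helper: returns the stored bytes untouched unless at least
 * one replacement happened somewhere in the decoded structure.
 */
function replace_in_json_if_changed( string $json, string $from, string $to ): string {
	$decoded = json_decode( $json, true );
	if ( ! is_array( $decoded ) ) {
		return $json;
	}

	$count = 0;
	array_walk_recursive(
		$decoded,
		function ( &$value ) use ( $from, $to, &$count ) {
			if ( is_string( $value ) ) {
				$value  = str_replace( $from, $to, $value, $replaced );
				$count += $replaced;
			}
		}
	);

	if ( 0 === $count ) {
		return $json; // No match: keep the row byte-for-byte.
	}

	$encoded = json_encode( $decoded );
	return false !== $encoded ? $encoded : $json;
}

$unrelated = '{"title":"Café","meta":{}}';
$matching  = '{"title":"Café","url":"https:\/\/oldsite.com\/"}';

// What an unconditional round trip would do to a row that never matched.
$drive_by = json_encode( json_decode( $unrelated, true ) );

$kept    = replace_in_json_if_changed( $unrelated, 'https://oldsite.com', 'https://newsite.com' );
$changed = replace_in_json_if_changed( $matching, 'https://oldsite.com', 'https://newsite.com' );

echo $drive_by, PHP_EOL;
echo $kept, PHP_EOL;
echo $changed, PHP_EOL;
```

Note that the unconditional round trip changes more than the non-ASCII escaping: with associative decoding an empty object {} comes back as [], which is another reason to leave non-matching rows alone.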

Tests

Added three Behat scenarios covering: (a) custom table with JSON column + escaped URLs, (b) JSON nested inside serialized option, (c) automatic column detection. The MySQL-matrix failure currently on the PR is a test-assertion bug on my side — MySQL's db query --skip-column-names output double-escapes backslashes differently than MariaDB does, so the literal expected string is too strict. Will fix by asserting on the round-trip through wp option get / JSON_UNQUOTE instead of raw stored bytes.

@swissspidy
Member

Mind opening an issue first with the problem statement so that we can discuss it more in-depth there? The PR can then be linked to that issue. Thanks.
