
ext/uri: fast-path canonical URIs in get_normalized_uri #21726

Open
iliaal wants to merge 1 commit into php:master from iliaal:perf/uri-rfc3986-reads

Conversation

@iliaal
Contributor

@iliaal iliaal commented Apr 11, 2026

Summary

Fast-path get_normalized_uri() in ext/uri/uri_parser_rfc3986.c when the parsed URI is already in canonical form. A single call to uriNormalizeSyntaxMaskRequiredExA returns the dirty mask; a zero mask means we alias the raw uri instead of running uriCopyUriMmA plus a full uriNormalizeSyntaxExMmA(..., -1, ...) pass. The struct caches the dirty mask so multiple non-raw reads on the same instance only run the scan once.
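
The fast path described above can be modeled in a few lines of pure PHP. This is a conceptual sketch only, not the C implementation: `LazyNormalizedUri` and its toy dirty-mask scan are hypothetical stand-ins for the struct field and uriNormalizeSyntaxMaskRequiredExA, and "normalization" here is reduced to lowercasing the authority.

```php
<?php
// Illustrative model of the dirty-mask fast path. Everything here is a
// stand-in: the real code lives in C and calls
// uriNormalizeSyntaxMaskRequiredExA to compute the mask.
final class LazyNormalizedUri
{
    private ?int $dirtyMask = null; // cached after the first non-raw read

    public function __construct(private readonly string $raw) {}

    public function getNormalized(): string
    {
        // Scan once per instance; later reads reuse the cached mask.
        $this->dirtyMask ??= $this->scanDirtyMask();

        if ($this->dirtyMask === 0) {
            // Fast path: already canonical, alias the raw URI
            // (no deep copy, no full normalization pass).
            return $this->raw;
        }

        // Slow path: normalize only the flagged component
        // (here: lowercase the authority).
        return preg_replace_callback(
            '~^([a-z+.-]+://)([^/]+)~',
            fn (array $m): string => $m[1] . strtolower($m[2]),
            $this->raw
        );
    }

    private function scanDirtyMask(): int
    {
        // Toy scan: flag only an uppercase character in the authority.
        return preg_match('~^[a-z+.-]+://[^/]*[A-Z]~', $this->raw) ? 1 : 0;
    }
}
```

A canonical input like `http://example.com/a` takes the alias path; `http://EXAMPLE.com/a` takes the partial-normalize path.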

This commit replaces an earlier version that bundled four additional changes (port cache, inline digit accumulator in port_str_to_zend_long_checked, emalloc in uriparser_create_uris, guarded normalized_uri destroy). Per review feedback from @TimWolla, the PR now carries only the one change that stands on its own merits.

Benchmark

17 URL shapes: plain http/https, deep paths, query/fragment, userinfo, IPv4, IPv6, mailto, URN, data, file, relative, long paths with query. 100K iterations per run × 17 URLs = 1.7M parses per measurement, 10 runs per scenario, CPU pinned via taskset -c 0, same-session stash-pop A/B.

| scenario | baseline mean | optimized mean | delta |
| --- | --- | --- | --- |
| parse only | 0.3992s (4.26M/s) | 0.4083s (4.16M/s) | noise (parse path unchanged) |
| parse + 1 read | 0.6687s (2.54M/s) | 0.5464s (3.11M/s) | −18.3% / +22.4% throughput |
| parse + 7 reads | 0.8510s (2.00M/s) | 0.7305s (2.33M/s) | −14.2% / +16.5% throughput |

parse + 1 read isolates the first-read cost where this optimization lands. parse + 7 reads shows the realistic user pattern: the first getter pays the reduced normalization cost, the remaining six hit the cached normalized uri and cost the same as before.

hyperfine cross-check

15 runs of the full benchmark script per direction, CPU pinned:

Benchmark 1: baseline
  Time (mean ± σ):     20.471 s ±  1.052 s    [19.535 s … 22.985 s]
Benchmark 2: optimized
  Time (mean ± σ):     17.240 s ±  0.540 s    [16.556 s … 18.190 s]

Summary: optimized runs 1.19 ± 0.07 times faster than baseline

Reproducing

Save the script below as bench.php. Build a baseline and an optimized sapi/cli/php, then:

taskset -c 0 /path/to/php-baseline  -n bench.php
taskset -c 0 /path/to/php-optimized -n bench.php

Or with hyperfine:

hyperfine --warmup 3 --runs 15 \
  -n baseline  'taskset -c 0 /path/to/php-baseline  -n bench.php' \
  -n optimized 'taskset -c 0 /path/to/php-optimized -n bench.php'
bench.php
<?php
// Measures Uri\Rfc3986\Uri::parse() and parse-then-read under two
// access patterns: a single first-read (isolates the normalization
// cost) and a realistic seven-getter sequence.

$urls = [
    'simple_http'    => 'http://example.com/',
    'https_host'     => 'https://www.example.com',
    'deep_path'      => 'https://example.com/a/b/c/d/e/f/index.html',
    'with_query'     => 'https://example.com/search?q=test&lang=en&category=web',
    'with_fragment'  => 'https://example.com/docs/guide#section-3.2',
    'with_userinfo'  => 'https://user:pass@example.com:8080/secure/area',
    'long_query'     => 'https://api.example.com/v2/users?filter=name&sort=desc&page=1&per_page=100&include=profile,settings&fields=id,name,email,created_at',
    'ipv4'           => 'http://192.168.1.1:8080/admin/panel',
    'ipv6'           => 'http://[2001:db8::1]:8080/index',
    'unicode_path'   => 'https://example.com/%E6%97%A5%E6%9C%AC/path',
    'ftp'            => 'ftp://files.example.com/pub/release/latest.tar.gz',
    'mailto'         => 'mailto:user@example.com',
    'urn'            => 'urn:isbn:0451450523',
    'data'           => 'data:text/plain;base64,SGVsbG8gV29ybGQ=',
    'only_path'      => '/path/to/resource?x=1',
    'relative'       => '../relative/path.html',
    'long_url'       => 'https://www.example.com/very/long/path/with/many/segments/that/go/on/and/on/file.html?first=1&second=2&third=3&fourth=4&fifth=5#anchor-point',
];

$iterations = 100000;
$runs = 10;

function bench_parse_only(array $urls, int $iter): float {
    $t = hrtime(true);
    for ($i = 0; $i < $iter; $i++) {
        foreach ($urls as $u) {
            \Uri\Rfc3986\Uri::parse($u);
        }
    }
    return (hrtime(true) - $t) / 1e9;
}

function bench_parse_one_read(array $urls, int $iter): float {
    $t = hrtime(true);
    for ($i = 0; $i < $iter; $i++) {
        foreach ($urls as $u) {
            $o = \Uri\Rfc3986\Uri::parse($u);
            if ($o === null) continue;
            $o->getScheme();
        }
    }
    return (hrtime(true) - $t) / 1e9;
}

function bench_parse_full(array $urls, int $iter): float {
    $t = hrtime(true);
    for ($i = 0; $i < $iter; $i++) {
        foreach ($urls as $u) {
            $o = \Uri\Rfc3986\Uri::parse($u);
            if ($o === null) continue;
            $o->getScheme(); $o->getHost(); $o->getPort();
            $o->getPath(); $o->getQuery(); $o->getFragment();
            $o->getUserInfo();
        }
    }
    return (hrtime(true) - $t) / 1e9;
}

function stats(array $samples): array {
    sort($samples);
    $n = count($samples);
    $mean = array_sum($samples) / $n;
    $variance = 0.0;
    foreach ($samples as $s) {
        $variance += ($s - $mean) ** 2;
    }
    $stddev = $n > 1 ? sqrt($variance / ($n - 1)) : 0.0;
    $median = $n % 2 === 1
        ? $samples[(int)(($n - 1) / 2)]
        : ($samples[(int)($n / 2) - 1] + $samples[(int)($n / 2)]) / 2;
    return ['min' => $samples[0], 'mean' => $mean, 'median' => $median, 'stddev' => $stddev];
}

bench_parse_only($urls, 2000);
bench_parse_one_read($urls, 2000);
bench_parse_full($urls, 2000);

$benches = [
    'parse only     ' => 'bench_parse_only',
    'parse + 1 read ' => 'bench_parse_one_read',
    'parse + 7 reads' => 'bench_parse_full',
];

$results = [];
foreach ($benches as $label => $fn) {
    $samples = [];
    for ($r = 0; $r < $runs; $r++) {
        $samples[] = $fn($urls, $iterations);
    }
    $results[$label] = stats($samples);
}

$total = $iterations * count($urls);
printf("runs=%d  iterations=%d  urls=%d  calls/run=%d\n\n", $runs, $iterations, count($urls), $total);
printf("%-17s %9s %9s %9s %9s   %s\n", '', 'min', 'mean', 'median', 'stddev', 'mean ops/s');
foreach ($results as $label => $s) {
    printf("%-17s %8.4fs %8.4fs %8.4fs %8.4fs   %.2fM\n",
        $label, $s['min'], $s['mean'], $s['median'], $s['stddev'],
        ($total / $s['mean']) / 1e6);
}

Tests

All 309 tests in ext/uri/tests pass. The full normalize path still runs for URIs that need it (checked with http://EXAMPLE.com/A/%2e%2e/c resolving to /c) via the nonzero dirty mask.

@TimWolla
Member

> Five targeted changes in ext/uri/uri_parser_rfc3986.c; the full per-change breakdown is in the commit message.

Putting five changes into a single commit is not "targeted".

Benchmark

Please provide the benchmarking script for independent verification.

@iliaal
Contributor Author

iliaal commented Apr 11, 2026

Re: benchmark script. Save the script below as bench.php, run it from the build root:

taskset -c 0 sapi/cli/php -n -d memory_limit=2G bench.php

Repeat 5 or 6 times per direction and use a same-session A/B (stash the patch, rebuild, rerun in the same shell session) to keep CPU cache and scheduler state consistent between baseline and optimized runs. The warmup is 1000 iterations; the timed loop is 100K × 17 URLs = 1.7M parses per measurement.

bench.php
<?php
// Measures Uri\Rfc3986\Uri::parse(), Uri\WhatWg\Url::parse(), and
// full parse-then-read for both backends.

$urls = [
    'simple_http'    => 'http://example.com/',
    'https_host'     => 'https://www.example.com',
    'deep_path'      => 'https://example.com/a/b/c/d/e/f/index.html',
    'with_query'     => 'https://example.com/search?q=test&lang=en&category=web',
    'with_fragment'  => 'https://example.com/docs/guide#section-3.2',
    'with_userinfo'  => 'https://user:pass@example.com:8080/secure/area',
    'long_query'     => 'https://api.example.com/v2/users?filter=name&sort=desc&page=1&per_page=100&include=profile,settings&fields=id,name,email,created_at',
    'ipv4'           => 'http://192.168.1.1:8080/admin/panel',
    'ipv6'           => 'http://[2001:db8::1]:8080/index',
    'unicode_path'   => 'https://example.com/%E6%97%A5%E6%9C%AC/path',
    'ftp'            => 'ftp://files.example.com/pub/release/latest.tar.gz',
    'mailto'         => 'mailto:user@example.com',
    'urn'            => 'urn:isbn:0451450523',
    'data'           => 'data:text/plain;base64,SGVsbG8gV29ybGQ=',
    'only_path'      => '/path/to/resource?x=1',
    'relative'       => '../relative/path.html',
    'long_url'       => 'https://www.example.com/very/long/path/with/many/segments/that/go/on/and/on/file.html?first=1&second=2&third=3&fourth=4&fifth=5#anchor-point',
];

$iterations = 100000;

function bench_rfc3986(array $urls, int $iter): float {
    $t = hrtime(true);
    for ($i = 0; $i < $iter; $i++) {
        foreach ($urls as $u) {
            \Uri\Rfc3986\Uri::parse($u);
        }
    }
    return (hrtime(true) - $t) / 1e9;
}

function bench_whatwg(array $urls, int $iter): float {
    $t = hrtime(true);
    for ($i = 0; $i < $iter; $i++) {
        foreach ($urls as $u) {
            \Uri\WhatWg\Url::parse($u);
        }
    }
    return (hrtime(true) - $t) / 1e9;
}

function bench_rfc3986_full(array $urls, int $iter): float {
    $t = hrtime(true);
    for ($i = 0; $i < $iter; $i++) {
        foreach ($urls as $u) {
            $o = \Uri\Rfc3986\Uri::parse($u);
            if ($o === null) continue;
            $o->getScheme(); $o->getHost(); $o->getPort();
            $o->getPath(); $o->getQuery(); $o->getFragment();
            $o->getUserInfo();
        }
    }
    return (hrtime(true) - $t) / 1e9;
}

function bench_whatwg_full(array $urls, int $iter): float {
    $t = hrtime(true);
    for ($i = 0; $i < $iter; $i++) {
        foreach ($urls as $u) {
            $o = \Uri\WhatWg\Url::parse($u);
            if ($o === null) continue;
            $o->getScheme(); $o->getAsciiHost(); $o->getPort();
            $o->getPath(); $o->getQuery(); $o->getFragment();
            $o->getUsername(); $o->getPassword();
        }
    }
    return (hrtime(true) - $t) / 1e9;
}

bench_rfc3986($urls, 1000); // warmup
bench_whatwg($urls, 1000);

$runs = 5;
$best_rfc = INF; $best_wg = INF; $best_rfc_full = INF; $best_wg_full = INF;
for ($r = 0; $r < $runs; $r++) {
    $t = bench_rfc3986($urls, $iterations);      if ($t < $best_rfc)      $best_rfc      = $t;
    $t = bench_whatwg($urls, $iterations);       if ($t < $best_wg)       $best_wg       = $t;
    $t = bench_rfc3986_full($urls, $iterations); if ($t < $best_rfc_full) $best_rfc_full = $t;
    $t = bench_whatwg_full($urls, $iterations);  if ($t < $best_wg_full)  $best_wg_full  = $t;
}

$total = $iterations * count($urls);
printf("rfc3986 parse only:      %.4fs  (%.0f/s)\n", $best_rfc,      $total / $best_rfc);
printf("whatwg  parse only:      %.4fs  (%.0f/s)\n", $best_wg,       $total / $best_wg);
printf("rfc3986 parse + 7 reads: %.4fs  (%.0f/s)\n", $best_rfc_full, $total / $best_rfc_full);
printf("whatwg  parse + 8 reads: %.4fs  (%.0f/s)\n", $best_wg_full,  $total / $best_wg_full);

Re: "not targeted". The word choice isn't an argument for splitting. On the actual question: three of the five are separable from each other, but two pairs cannot be split without breaking the tree at intermediate commits.

  1. emalloc in uriparser_create_uris (4) and the guarded uriFreeUriMembersMmA on normalized_uri in _destroy (5) have to land in the same commit. The emalloc leaves normalized_uri as uninitialized bytes. Without the destroy guard in the same commit, destroy reads garbage and segfaults. With the guard added first and the emalloc later, the guard is dead code.

  2. The port cache (2) is one invariant spread across three sites: parse-time stash, read-time short-circuit, and write-time invalidate. Pull any one of them out and the others leave state inconsistent: a stash nothing reads, or reads that return stale values after a write.

@TimWolla
Member

> On the actual question: three of the five are separable from each other, but two pairs cannot be split without breaking the tree at intermediate commits.

Why did you initially list them as five individual "targeted" changes when some of them cannot stand on their own?

> but two pairs cannot be split

You list only one pair and one individual patch in the following list.


Either way, this PR needs to be split into actually targeted changes that each stand on their own and come with their own justification as to why the added complexity to the code is worth it. That includes a targeted benchmark for each change, ideally measured with Hyperfine. It is not useful to benchmark code that clearly is unaffected by the change (e.g. the WHATWG implementation, or calling multiple getters after the normalized URL is already cached).

When Uri\Rfc3986\Uri::parse() produces a URI already in canonical form
(the common case: http/https URLs with no uppercase host, no
percent-encoding in unreserved ranges, no ".." path segments),
get_normalized_uri() no longer deep-copies the parsed struct and runs
a full normalization pass. It calls uriNormalizeSyntaxMaskRequiredExA
once to compute the dirty mask; a zero mask means we alias the raw
uri. The struct caches the dirty mask, so multiple non-raw reads on
the same instance only run the scan once.

Fallback: when the mask is nonzero, we copy and normalize as before,
but only for the flagged components (uriNormalizeSyntaxExMmA(...,
dirty_mask, ...) instead of (..., -1, ...)).

Measurements on a 17-URL mix with a realistic parse-and-read workload
(10 runs of 1.7M parses each, CPU pinned via taskset, same-session
stash-pop A/B so both builds share machine state):

                        baseline mean    optimized mean    delta
    parse only         0.3992s (4.26M/s)  0.4083s (4.16M/s)  noise
    parse + 1 read     0.6687s (2.54M/s)  0.5464s (3.11M/s)  -18.3%
    parse + 7 reads    0.8510s (2.00M/s)  0.7305s (2.33M/s)  -14.2%

The "parse + 1 read" row isolates the first-read cost where this
change lands. The "parse + 7 reads" row shows the amortized effect
under a realistic user pattern: the first getter pays the reduced
normalization cost, and the remaining six getters hit the cached
normalized uri and cost the same as before.

hyperfine cross-check on the whole benchmark script, 15 runs each:

    baseline   20.471 s +/- 1.052 s  [19.535 .. 22.985]
    optimized  17.240 s +/- 0.540 s  [16.556 .. 18.190]
    optimized runs 1.19 +/- 0.07 times faster.

All 309 tests in ext/uri/tests pass. I checked that URIs needing
normalization (http://EXAMPLE.com/A/%2e%2e/c resolving to /c) still
hit the full normalize path through the nonzero dirty mask.
@iliaal iliaal force-pushed the perf/uri-rfc3986-reads branch from b5d6508 to 6c8cc4c Compare April 11, 2026 23:13
@iliaal iliaal changed the title from "ext/uri: speed up Uri\Rfc3986\Uri component reads (~22%)" to "ext/uri: fast-path canonical URIs in get_normalized_uri" on Apr 11, 2026
@iliaal
Contributor Author

iliaal commented Apr 11, 2026

> It is not useful to benchmark code that clearly is unaffected by the change... calling multiple getters after the normalized URL is already cached

Every iteration of the benchmark parses a fresh URI, so the normalization cache starts cold on each call. The first getter is where this optimization lands, and the subsequent six amortize that saving. That's a realistic read pattern. I've added a parse + 1 read variant to the new benchmark that isolates the first-read cost on its own, which is the clearest single number for this change (−18.3%).

I've also rewritten the commit to carry only the normalize-alias fast path. That's the single biggest contributor of the five original changes. The other four (port cache, inline digit accumulator in port_str_to_zend_long_checked, emalloc in uriparser_create_uris, guarded normalized_uri destroy) are marginal on their own and I'm keeping them in my own fork. The per-change review overhead doesn't make sense.

Updated PR description has the per-scenario numbers (three access patterns, min/mean/median/stddev across 10 runs) plus a hyperfine cross-check (1.19 ± 0.07x faster, 15 runs per direction, process-level). Bench script is in the body under <details> so you can reproduce.

@LamentXU123
Contributor

LamentXU123 commented Apr 12, 2026

> Bench script is in the body under <details> so you can reproduce.

It's in your PR description under the 'Reproducing' section I presume. It seems like your benchmark already has a timer function? No need to add it if you are running hyperfine.

Every iteration of the benchmark parses a fresh URI, so the normalization cache starts cold on each call. The first getter is where this optimization lands, and the subsequent six amortize that saving. That's a realistic read pattern.

It is actually cached.


I'll try to reproduce your changes later.

@iliaal
Contributor Author

iliaal commented Apr 12, 2026

> It's in your PR description under the 'Reproducing' section I presume. It seems like your benchmark already has a timer function? No need to add it if you are running hyperfine.

Admittedly a duplication; I prefer timers inside the code itself. hyperfine (a new tool I learned about) is fine (no pun intended 😆), but it measures the entire PHP stack, which in theory is the same across runs, while hrtime() (used here) and microtime(true) time specifically the code being evaluated and don't include any PHP init/shutdown overhead.

> Every iteration of the benchmark parses a fresh URI, so the normalization cache starts cold on each call. The first getter is where this optimization lands, and the subsequent six amortize that saving. That's a realistic read pattern.

> It is actually cached.

Uri\Rfc3986\Uri::parse() returns a new object each iteration. The normalized cache lives on the object instance (normalized_uri_initialized = false at construction). So yes, within one iteration getters 2-7 hit the warm cache, but the first getter on each freshly-parsed URI pays the full normalization cost. That's what parse + 1 read isolates. The "parse + 7 reads" variant is there to show the amortized pattern, not to claim all seven benefit from the optimization.
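
The cold-cache point can be shown with a self-contained sketch (`LazyValue` and its counter are illustrative, not part of ext/uri):

```php
<?php
// The lazily-computed value lives on the instance, and each parse()
// returns a fresh instance, so the cache starts cold every iteration.
final class LazyValue
{
    public static int $computeCount = 0; // instrumentation for the example
    private ?string $value = null;

    public function get(): string
    {
        if ($this->value === null) {
            self::$computeCount++;       // the first read pays the cost
            $this->value = 'normalized';
        }
        return $this->value;             // reads 2..n hit the warm cache
    }
}

for ($i = 0; $i < 3; $i++) {
    $o = new LazyValue(); // fresh instance per iteration: cold cache
    $o->get();            // cold read: computes
    $o->get();            // warm read: cached
    $o->get();            // warm read: cached
}
// One compute per fresh instance, regardless of how many reads follow:
// LazyValue::$computeCount is now 3.
```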

Comment on lines 25 to 32
struct php_uri_parser_rfc3986_uris {
UriUriA uri;
UriUriA normalized_uri;
unsigned int dirty_mask;
bool normalized_uri_initialized;
bool normalized_uri_is_alias;
bool dirty_mask_valid;
};
Member


I don't think we need all these additional fields. dirty_mask_valid is guaranteed to be identical to normalized_uri_initialized, and normalized_uri_is_alias is identical to dirty_mask == URI_NORMALIZED.

struct php_uri_parser_rfc3986_uris {
UriUriA uri;
UriUriA normalized_uri;
unsigned int dirty_mask;
Member


dirty_mask is a misleading name; it implies some fields need to be stored after changing them.

uriparser_uris->dirty_mask_valid = true;
}

if (uriparser_uris->dirty_mask == 0) {
Member


Should check against URI_NORMALIZED.
