ext/uri: fast-path canonical URIs in get_normalized_uri#21726
ext/uri: fast-path canonical URIs in get_normalized_uri#21726iliaal wants to merge 1 commit intophp:masterfrom
Conversation
Putting five changes into a single commit is not "targeted".
Please provide the benchmarking script for independent verification. |
|
Re: benchmark script. Save the script below as Repeat 5 or 6 times per direction and use a same-session A/B (stash the patch, rebuild, rerun in the same shell session) to keep CPU cache and scheduler state consistent between baseline and optimized runs. The warmup is 1000 iterations; the timed loop is 100K × 17 URLs = 1.7M parses per measurement. bench.php<?php
// Measures Uri\Rfc3986\Uri::parse(), Uri\WhatWg\Url::parse(), and
// full parse-then-read for both backends.
$urls = [
'simple_http' => 'http://example.com/',
'https_host' => 'https://www.example.com',
'deep_path' => 'https://example.com/a/b/c/d/e/f/index.html',
'with_query' => 'https://example.com/search?q=test&lang=en&category=web',
'with_fragment' => 'https://example.com/docs/guide#section-3.2',
'with_userinfo' => 'https://user:pass@example.com:8080/secure/area',
'long_query' => 'https://api.example.com/v2/users?filter=name&sort=desc&page=1&per_page=100&include=profile,settings&fields=id,name,email,created_at',
'ipv4' => 'http://192.168.1.1:8080/admin/panel',
'ipv6' => 'http://[2001:db8::1]:8080/index',
'unicode_path' => 'https://example.com/%E6%97%A5%E6%9C%AC/path',
'ftp' => 'ftp://files.example.com/pub/release/latest.tar.gz',
'mailto' => 'mailto:user@example.com',
'urn' => 'urn:isbn:0451450523',
'data' => 'data:text/plain;base64,SGVsbG8gV29ybGQ=',
'only_path' => '/path/to/resource?x=1',
'relative' => '../relative/path.html',
'long_url' => 'https://www.example.com/very/long/path/with/many/segments/that/go/on/and/on/file.html?first=1&second=2&third=3&fourth=4&fifth=5#anchor-point',
];
$iterations = 100000;
function bench_rfc3986(array $urls, int $iter): float {
$t = hrtime(true);
for ($i = 0; $i < $iter; $i++) {
foreach ($urls as $u) {
\Uri\Rfc3986\Uri::parse($u);
}
}
return (hrtime(true) - $t) / 1e9;
}
function bench_whatwg(array $urls, int $iter): float {
$t = hrtime(true);
for ($i = 0; $i < $iter; $i++) {
foreach ($urls as $u) {
\Uri\WhatWg\Url::parse($u);
}
}
return (hrtime(true) - $t) / 1e9;
}
function bench_rfc3986_full(array $urls, int $iter): float {
$t = hrtime(true);
for ($i = 0; $i < $iter; $i++) {
foreach ($urls as $u) {
$o = \Uri\Rfc3986\Uri::parse($u);
if ($o === null) continue;
$o->getScheme(); $o->getHost(); $o->getPort();
$o->getPath(); $o->getQuery(); $o->getFragment();
$o->getUserInfo();
}
}
return (hrtime(true) - $t) / 1e9;
}
function bench_whatwg_full(array $urls, int $iter): float {
$t = hrtime(true);
for ($i = 0; $i < $iter; $i++) {
foreach ($urls as $u) {
$o = \Uri\WhatWg\Url::parse($u);
if ($o === null) continue;
$o->getScheme(); $o->getAsciiHost(); $o->getPort();
$o->getPath(); $o->getQuery(); $o->getFragment();
$o->getUsername(); $o->getPassword();
}
}
return (hrtime(true) - $t) / 1e9;
}
bench_rfc3986($urls, 1000); // warmup
bench_whatwg($urls, 1000);
$runs = 5;
$best_rfc = INF; $best_wg = INF; $best_rfc_full = INF; $best_wg_full = INF;
for ($r = 0; $r < $runs; $r++) {
$t = bench_rfc3986($urls, $iterations); if ($t < $best_rfc) $best_rfc = $t;
$t = bench_whatwg($urls, $iterations); if ($t < $best_wg) $best_wg = $t;
$t = bench_rfc3986_full($urls, $iterations); if ($t < $best_rfc_full) $best_rfc_full = $t;
$t = bench_whatwg_full($urls, $iterations); if ($t < $best_wg_full) $best_wg_full = $t;
}
$total = $iterations * count($urls);
printf("rfc3986 parse only: %.4fs (%.0f/s)\n", $best_rfc, $total / $best_rfc);
printf("whatwg parse only: %.4fs (%.0f/s)\n", $best_wg, $total / $best_wg);
printf("rfc3986 parse + 7 reads: %.4fs (%.0f/s)\n", $best_rfc_full, $total / $best_rfc_full);
printf("whatwg parse + 8 reads: %.4fs (%.0f/s)\n", $best_wg_full, $total / $best_wg_full);Re: "not targeted". The word choice isn't an argument for splitting. On the actual question: three of the five are separable from each other, but two pairs cannot be split without breaking the tree at intermediate commits.
|
Why did you initially list them as five individual "targeted" changes when some of them cannot stand on their own?
You list only one pair and one individual patch in the following list. Either way, this PR needs to be split into actually targeted changes that each stand on their own and come with their own justification as to why the added complexity to the code is worth it. That includes a targeted benchmark for each change, ideally measured with Hyperfine. It is not useful to benchmark code that clearly is unaffected by the change (e.g. the WHATWG implementation, or calling multiple getters after the normalized URL is already cached). |
When Uri\Rfc3986\Uri::parse() produces a URI already in canonical form
(the common case: http/https URLs with no uppercase host, no
percent-encoding in unreserved ranges, no ".." path segments),
get_normalized_uri() no longer deep-copies the parsed struct and runs
a full normalization pass. It calls uriNormalizeSyntaxMaskRequiredExA
once to compute the dirty mask; a zero mask means we alias the raw
uri. The struct caches the dirty mask, so multiple non-raw reads on
the same instance only run the scan once.
Fallback: when the mask is nonzero, we copy and normalize as before,
but only for the flagged components (uriNormalizeSyntaxExMmA(...,
dirty_mask, ...) instead of (..., -1, ...)).
Measurements on a 17-URL mix with a realistic parse-and-read workload
(10 runs of 1.7M parses each, CPU pinned via taskset, same-session
stash-pop A/B so both builds share machine state):
baseline mean optimized mean delta
parse only 0.3992s (4.26M/s) 0.4083s (4.16M/s) noise
parse + 1 read 0.6687s (2.54M/s) 0.5464s (3.11M/s) -18.3%
parse + 7 reads 0.8510s (2.00M/s) 0.7305s (2.33M/s) -14.2%
The "parse + 1 read" row isolates the first-read cost where this
change lands. The "parse + 7 reads" row shows the amortized effect
under a realistic user pattern: the first getter pays the reduced
normalization cost, and the remaining six getters hit the cached
normalized uri and cost the same as before.
hyperfine cross-check on the whole benchmark script, 15 runs each:
baseline 20.471 s +/- 1.052 s [19.535 .. 22.985]
optimized 17.240 s +/- 0.540 s [16.556 .. 18.190]
optimized runs 1.19 +/- 0.07 times faster.
All 309 tests in ext/uri/tests pass. I checked that URIs needing
normalization (http://EXAMPLE.com/A/%2e%2e/c resolving to /c) still
hit the full normalize path through the nonzero dirty mask.
b5d6508 to
6c8cc4c
Compare
Every iteration of the benchmark parses a fresh URI, so the normalization cache starts cold on each call. The first getter is where this optimization lands, and the subsequent six amortize that saving. That's a realistic read pattern. I've added a I've also rewritten the commit to carry only the normalize-alias fast path. That's the single biggest contributor of the five original changes. The other four (port cache, inline digit accumulator in Updated PR description has the per-scenario numbers (three access patterns, min/mean/median/stddev across 10 runs) plus a hyperfine cross-check (1.19 ± 0.07x faster, 15 runs per direction, process-level). Bench script is in the body under |
It's in your PR description under the 'Reproducing' section I presume. It seems like your benchmark already has a timer function? No need to add it if you are running hyperfine.
It is actually cached. I would try reproduce your changes later. |
Admittedly a duplication, I prefer timers inside the code itself, hyperfine (new tool I learned about) is fine (no pun intended 😆 ) but it measures entire PHP stack, which in theory is the same across runs, while
|
| struct php_uri_parser_rfc3986_uris { | ||
| UriUriA uri; | ||
| UriUriA normalized_uri; | ||
| unsigned int dirty_mask; | ||
| bool normalized_uri_initialized; | ||
| bool normalized_uri_is_alias; | ||
| bool dirty_mask_valid; | ||
| }; |
There was a problem hiding this comment.
I don't think we need all these additional fields. dirty_mask_valid is guaranteed to be identical to normalized_uri_initialized and normalized_uri_is_alias is identical to dirty_mask == URI_NORMALIZED.
| struct php_uri_parser_rfc3986_uris { | ||
| UriUriA uri; | ||
| UriUriA normalized_uri; | ||
| unsigned int dirty_mask; |
There was a problem hiding this comment.
dirty_mask is a misleading name, it implies some fields need to be stored after changing them.
| uriparser_uris->dirty_mask_valid = true; | ||
| } | ||
|
|
||
| if (uriparser_uris->dirty_mask == 0) { |
There was a problem hiding this comment.
Should check against URI_NORMALIZED.
Summary
Fast-path
get_normalized_uri()inext/uri/uri_parser_rfc3986.cwhen the parsed URI is already in canonical form. A single call touriNormalizeSyntaxMaskRequiredExAreturns the dirty mask; a zero mask means we alias the raw uri instead of runninguriCopyUriMmAplus a fulluriNormalizeSyntaxExMmA(..., -1, ...)pass. The struct caches the dirty mask so multiple non-raw reads on the same instance only run the scan once.This commit replaces an earlier version that bundled four additional changes (port cache, inline digit accumulator in
port_str_to_zend_long_checked,emallocinuriparser_create_uris, guardednormalized_uridestroy). Per review feedback from @TimWolla, the PR now carries only the one change that stands on its own merits.Benchmark
17 URL shapes: plain http/https, deep paths, query/fragment, userinfo, IPv4, IPv6, mailto, URN, data, file, relative, long paths with query. 100K iterations per run × 17 URLs = 1.7M parses per measurement, 10 runs per scenario, CPU pinned via
taskset -c 0, same-session stash-pop A/B.parse + 1 readisolates the first-read cost where this optimization lands.parse + 7 readsshows the realistic user pattern: the first getter pays the reduced normalization cost, the remaining six hit the cached normalized uri and cost the same as before.hyperfine cross-check
15 runs of the full benchmark script per direction, CPU pinned:
Reproducing
Save the script below as
bench.php. Build a baseline and an optimizedsapi/cli/php, then:Or with hyperfine:
bench.php
Tests
All 309 tests in
ext/uri/testspass. The full normalize path still runs for URIs that need it (checked withhttp://EXAMPLE.com/A/%2e%2e/cresolving to/c) via the nonzero dirty mask.