Skip to content

mpremote: Add smart encoding selection for fs_writefile.#11

Open
andrewleech wants to merge 1 commit intomasterfrom
feature/smart-encoding-fs-writefile
Open

mpremote: Add smart encoding selection for fs_writefile.#11
andrewleech wants to merge 1 commit intomasterfrom
feature/smart-encoding-fs-writefile

Conversation

@andrewleech
Copy link
Copy Markdown
Owner

Summary

mpremote fs cp file transfers to device are slow because fs_writefile() uses repr() encoding, expanding each byte to \xNN (~4x wire overhead for binary data).

This adds automatic encoding selection with a three-tier fallback:

  1. deflate+base64 — device has deflate module and data compresses >20%
  2. base64 — device has binascii.a2b_base64 but data doesn't compress well
  3. repr — universal fallback (existing behaviour, unchanged)

The ROMFS deploy path is updated to share the new compression utilities and capability detection, replacing its inline zlib.compressobj(wbits=-9), hardcoded wbits value, and 14-line try/except capability detection. Also fixes a missing .strip() on the ROMFS base64-only encoding path.

Testing

64 transfer+readback integrity tests on STM32WB55 over 115200 baud UART with SPI flash. All verified via SHA-256 readback.

Random binary (incompressible, ratio ~1.0 — auto selects base64):

Size repr base64 deflate auto best/repr
1 KB 1.56 KB/s 2.87 KB/s 2.28 KB/s 2.81 KB/s 1.8x
5 KB 1.87 KB/s 4.19 KB/s 3.88 KB/s 4.18 KB/s 2.2x
10 KB 1.91 KB/s 4.52 KB/s 4.46 KB/s 4.49 KB/s 2.4x
50 KB 1.96 KB/s 4.77 KB/s 4.91 KB/s 4.76 KB/s 2.5x

Python source (ratio ~0.4 — auto selects deflate):

Size repr base64 deflate auto best/repr
1 KB 2.26 KB/s 2.88 KB/s 3.06 KB/s 2.75 KB/s 1.4x
5 KB 2.91 KB/s 4.13 KB/s 6.28 KB/s 5.98 KB/s 2.2x
10 KB 2.71 KB/s 4.52 KB/s 8.03 KB/s 7.58 KB/s 3.0x
50 KB 3.18 KB/s 4.79 KB/s 9.15 KB/s 9.41 KB/s 3.0x

Log data (ratio ~0.5 — auto selects deflate):

Size repr base64 deflate auto best/repr
1 KB 2.37 KB/s 2.81 KB/s 2.97 KB/s 2.68 KB/s 1.3x
5 KB 3.03 KB/s 4.23 KB/s 5.55 KB/s 5.47 KB/s 1.8x
10 KB 3.08 KB/s 4.56 KB/s 6.76 KB/s 6.84 KB/s 2.2x
50 KB 3.52 KB/s 4.82 KB/s 7.83 KB/s 7.54 KB/s 2.2x

All zeros (ratio ~0.005 — auto selects deflate):

Size repr base64 deflate auto best/repr
1 KB 1.23 KB/s 2.43 KB/s 3.67 KB/s 3.90 KB/s 3.2x
5 KB 1.50 KB/s 2.86 KB/s 13.68 KB/s 13.25 KB/s 9.1x
10 KB 1.53 KB/s 4.54 KB/s 18.55 KB/s 18.41 KB/s 12.2x
50 KB 1.54 KB/s 4.76 KB/s 23.98 KB/s 23.92 KB/s 15.6x

Auto-selection picks the fastest encoding for each data type in all cases.

Not tested on other ports or boards.

Trade-offs and Alternatives

  • chunk_size default changes from 256 to None (auto-sized per encoding). Callers omitting chunk_size get 256 for repr (matching prior behaviour). Explicit values are respected.
  • Devices without binascii.a2b_base64 fall back to repr() with no behaviour change.
  • Device capabilities are probed once via hasattr() and cached for the session.
  • An alternative would be to always use base64 without deflate, which would be simpler but miss the 2-3x additional speedup on compressible data (typical firmware files).

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Feb 25, 2026

Code size report:

Reference:  zephyr/mpconfigport: Remove duplicate builtins.open definition. [1ab9b66]
Comparison: mpremote: Add smart encoding selection for fs_writefile. [merge of 7f0d6a6]
  mpy-cross:   +80 +0.021% 
   bare-arm:   +28 +0.050% 
minimal x86:   +42 +0.022% 
   unix x64:   +56 +0.007% standard
      stm32:   +96 +0.024% PYBV10
      esp32:  +508 +0.029% ESP32_GENERIC
     mimxrt:   +88 +0.023% TEENSY40
        rp2: +1232 +0.134% RPI_PICO_W[incl +8(bss)]
       samd:   +88 +0.032% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:   +30 +0.007% VIRT_RV32

@andrewleech
Copy link
Copy Markdown
Owner Author

/review

Copy link
Copy Markdown

@mpy-reviewer mpy-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a critical bug in _detect_encoding_capabilities: the single dict eval includes __import__('deflate') first, so on any device without the deflate module, ImportError is raised mid-expression and the whole eval fails. The except Exception path then sets both deflate and base64 to False, meaning devices without deflate permanently lose base64 encoding and fall back to repr — a regression. The deflate and base64 capabilities need to be probed independently. Also, test_compression_ratio is defined twice in compression_utils.py.

Comment thread tools/mpremote/mpremote/transport.py Outdated
try:
caps = self.eval(
"{"
"'deflate':hasattr(__import__('deflate'),'DeflateIO'),"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If deflate is not available on the device, __import__('deflate') raises ImportError, aborting the entire dict construction. The except Exception at the outer level then sets both deflate and base64 to False — even though binascii.a2b_base64 is almost certainly present. On any device without deflate this PR is a regression vs repr.

The capabilities must be probed independently. Either do two separate eval calls, or use a device-side try/except around the deflate import:

caps = self.eval(
    "{'base64':hasattr(__import__('binascii'),'a2b_base64'),"
    "'bytesio':hasattr(__import__('io'),'BytesIO')}"
)
try:
    has_dfl = self.eval("hasattr(__import__('deflate'),'DeflateIO')")
except Exception:
    has_dfl = False
caps['deflate'] = has_dfl and caps.get('base64') and caps.get('bytesio')

if not sample:
return 1.0
compressed = compress_chunk(sample)
return len(compressed) / len(sample)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_compression_ratio is already defined at line 69. Remove the duplicate.

@@ -0,0 +1,86 @@
#!/usr/bin/env python3
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the shebang; this is a library module, not a standalone script.

Comment thread tools/mpremote/mpremote/transport.py Outdated
# Setup imports and file handle on device
if encoding == "deflate":
self.exec(
"from binascii import a2b_base64 as _a\n"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ROMFS path in commands.py imports under the full names (a2b_base64, BytesIO, DeflateIO, RAW). Using single-letter aliases (_a, _B, _D, _R) here is inconsistent. Please use the same names in both paths.

@andrewleech
Copy link
Copy Markdown
Owner Author

/review

Copy link
Copy Markdown

@mpy-reviewer mpy-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .strip() fix in the ROMFS base64 path is correct and the performance gains are well documented. However there is a correctness bug in _detect_encoding_capabilities that will regress base64 support on devices without deflate, and a few smaller issues below.

Comment thread tools/mpremote/mpremote/transport.py Outdated
try:
caps = self.eval(
"{"
"'deflate':hasattr(__import__('deflate'),'DeflateIO'),"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will raise ImportError on devices that don't have the deflate module, causing the entire eval() to fail. The except Exception then sets base64 to False too, which is a regression — devices without deflate but with binascii.a2b_base64 will silently lose base64 support and fall back to repr.

The old code probed each capability with a separate try/exec block precisely to avoid this. Either do the same here, or guard the import in the eval string, e.g. with a helper that catches ImportError on the device side.

Comment thread tools/mpremote/mpremote/transport.py Outdated
"deflate": caps.get("deflate") and caps.get("bytesio") and caps.get("base64"),
"base64": caps.get("base64"),
}
except Exception:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Catching bare Exception here silently swallows errors that are not ImportError / transport errors (e.g. OOM on the device, serialisation bugs). Should be except (Exception,): at minimum with a comment, or ideally just except TransportExecError.

Comment thread tools/mpremote/mpremote/commands.py Outdated
chunk_size = max(chunk_size, rom_min_write)

# Detect capabilities of the device to use the fastest method of transfer.
caps = transport._detect_encoding_capabilities()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_detect_encoding_capabilities is a private method (underscore prefix). Calling it directly from commands.py breaks the encapsulation the underscore signals. Either make it public or expose the capabilities through a higher-level API.

@@ -0,0 +1,86 @@
#!/usr/bin/env python3
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the shebang — this is a library module, not a standalone script.

def compress_chunk(data, wbits=DEFAULT_WBITS):
"""Compress a single chunk using raw deflate.

Each chunk is independently compressed/decompressable, which is required
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"decompressable" → "decompressible"

Returns:
Ratio of compressed/original size (0.0-1.0+). Lower = better compression.
"""
sample = data[:sample_size] if len(data) > sample_size else data
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data[:sample_size] if len(data) > sample_size else data can just be data[:sample_size] — Python slicing past the end is safe.

@andrewleech andrewleech force-pushed the feature/smart-encoding-fs-writefile branch 2 times, most recently from a46e4ae to 7f0d6a6 Compare March 2, 2026 11:16
Automatically detect device capabilities (deflate, base64, bytes.fromhex)
and select the best encoding for file transfers.  Deflate+base64 is used
when the device supports it and data compresses well, base64 alone as a
fallback, and repr as the universal fallback.  Each capability is probed
independently so a missing deflate module does not suppress base64
detection.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
@andrewleech andrewleech force-pushed the feature/smart-encoding-fs-writefile branch from 7f0d6a6 to 47d6725 Compare March 27, 2026 19:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants