
extmod/zephyr_ble: Add L2CAPChannel stream type and TX pipeline flush. #27

Draft
andrewleech wants to merge 31 commits into pr/zephyr-ble-nrf from pr/zephyr-ble-l2cap


Owner

@andrewleech andrewleech commented Mar 16, 2026

Summary

This adds a stream API (the L2CAPChannel type) for L2CAP CoC, with automatic chunking to optimise transfer speed. It supports blocking, non-blocking, and timed transfers via setblocking()/settimeout(), and works as a standard MicroPython stream with read()/write()/close() plus context manager support.

The existing raw API (l2cap_send/l2cap_recvinto) requires Python to manually chunk data to the peer's MTU size and handle stall/flow control via IRQ callbacks. Each l2cap_send call submits a single SDU, waits for a SEND_READY event, then sends the next — one round-trip per chunk. For a 10KB transfer at 512-byte MTU that's ~20 round-trips through the Python event loop.
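
The chunking burden described above can be sketched in plain Python. This is only an illustration of the arithmetic, not code from the PR: `chunk_sdu` is a hypothetical helper, and the real raw API would additionally wait for a SEND_READY event between chunks.

```python
def chunk_sdu(data, mtu):
    """Yield successive slices of data, each at most mtu bytes long."""
    view = memoryview(data)  # avoid copying the payload while slicing
    for offset in range(0, len(view), mtu):
        yield view[offset:offset + mtu]


payload = bytes(10 * 1024)              # 10KB transfer
chunks = list(chunk_sdu(payload, 512))  # peer MTU of 512 bytes
round_trips = len(chunks)               # one SEND_READY wait per chunk
```

With these numbers `round_trips` comes out at 20, which matches the estimate in the description.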

The stream write() path calls send_bulk() in C which batches multiple SDU chunks per iteration, pre-allocates net_bufs for the batch, and drains in-flight chunks between batches using mp_event_wait_ms(). This eliminates the Python round-trip overhead and pipelines chunks through the controller. On Zephyr, batching up to L2CAP_BATCH_SIZE chunks at a time gives ~2.3x throughput vs the per-SDU raw API (31K vs 13K B/s on nRF52840 Z2Z).
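
As a rough model of the batching idea, the sketch below groups per-MTU chunks into batches. The batch size of 4 is a placeholder assumption (the actual value of L2CAP_BATCH_SIZE is not stated here), and `plan_batches` is a hypothetical name; the real send_bulk() runs in C and also manages net_buf allocation and draining.

```python
def plan_batches(total_len, mtu, batch_size):
    """Return a list of batches; each batch is a list of (offset, length) chunks."""
    chunks = [(off, min(mtu, total_len - off)) for off in range(0, total_len, mtu)]
    return [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]


# Assumed batch size of 4, purely for illustration.
batches = plan_batches(10 * 1024, 512, 4)
drain_points = len(batches)  # places where in-flight chunks would be drained
```

In this model the 20 chunks of a 10KB transfer are submitted in 5 batches, so the sender waits at 5 drain points instead of 20.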

NimBLE's send_bulk uses a simpler per-MTU loop with mem_stalled drain. BTstack is stubbed (EOPNOTSUPP).

This PR also includes several TX pipeline fixes that are prerequisites for the stream API to perform well:

  • TX flush: after bt_l2cap_chan_send(), explicitly flush by calling work_process() twice and then poll_now(). The same flush runs after returning credits in recvinto(). Without it, PDU fragments sit in the TX queue until the next poll timer tick, which kills throughput on Z2Z links where both sides use cooperative polling.
  • Adaptive poll reschedule: poll() and work_process() now return bool indicating work done. Port run_task implementations reschedule immediately when work was processed, and at the idle interval otherwise.
  • DLE fix: set BT_HCI_QUIRK_NO_AUTO_DLE so Zephyr doesn't skip the DLE exchange (was silently capping PDUs at 27 bytes on some controllers).
  • L2CAP credit flow: proportional credit return per MPS consumed in recvinto() instead of batch return on full drain. RX buffer increased to 16KB for 65 initial credits at MPS=251.
  • STM32WB55 rfcore: use the IPCC channel flag for ACL flow control instead of waiting for NUM_COMPLETED_PACKETS events.
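
The adaptive reschedule rule in the list above can be sketched as a small decision function. `next_poll_delay_ms` is a hypothetical helper written for illustration; the 128 ms idle interval is the ZEPHYR_BLE_POLL_INTERVAL_MS default quoted in the commit messages.

```python
# Default idle interval, as quoted for ZEPHYR_BLE_POLL_INTERVAL_MS.
IDLE_INTERVAL_MS = 128


def next_poll_delay_ms(poll_did_work, work_did_work, idle_ms=IDLE_INTERVAL_MS):
    """Pick the delay before the next run_task call (illustrative only)."""
    if poll_did_work or work_did_work:
        return 0  # more work is likely queued: reschedule immediately
    return idle_ms  # nothing happened: fall back to the idle interval
```

The point of returning bool from poll() and work_process() is that the port can make this decision without guessing whether the pipeline is busy.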

Testing

Tested on nRF52840 dongle + WB55 (Z2Z) and Pico 2 W + PYBD (NimBLE central):

Stream API tests:

  • ble_l2cap_stream.py — 4KB blocking write + raw recv, data integrity verified. 3/3 stability runs on both device pairs.
  • perf_l2cap_bulk.py — 10KB stream write, measures throughput.

Throughput comparison (nRF52840 dongle + WB55 Z2Z):

  • Per-SDU raw API (perf_l2cap.py): 13,368 B/s
  • Stream bulk write (perf_l2cap_bulk.py): 31,030 B/s — 2.3x improvement

Throughput (Pico 2 W + PYBD):

  • Stream bulk write: 16,650 B/s

Regression: ble_l2cap.py and perf_l2cap.py pass unchanged on all device pairs.

Builds verified: PCA10059 zephyr_ble, RPI_PICO_W zephyr_ble, RPI_PICO2_W zephyr_ble, NUCLEO_WB55 zephyr_ble.

Trade-offs and Alternatives

L2CAP_SDU_BUF_COUNT increased from 5 to 10 for bulk batching headroom (~2.8KB additional RAM). This could be made conditional later if RAM pressure becomes an issue, but the throughput gain justifies it for now.

readline is deliberately excluded from L2CAPChannel: L2CAP is message-oriented, and an unbuffered readline across SDU boundaries gives confusing results.

Generative AI

I used generative AI tools when creating this PR, but a human has checked the code and is responsible for the description above.

pi-anl added 7 commits March 16, 2026 15:51
Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Switch L2CAP CoC from recv+alloc_buf to the seg_recv API, which gives the
application per-PDU callbacks with manual credit control.  The old path
issued one credit per SDU, forcing the peer to wait for the full SDU to
be delivered before sending the next one.  With seg_recv, credits are
issued one per non-last PDU (allowing the peer to pipeline all K-frames
of a single SDU) and one credit per SDU from recvinto() (for the first
PDU of the next SDU), keeping at most one assembled SDU buffered.
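
The credit policy described above can be modelled with a small bookkeeping class. This is a sketch of my reading of the paragraph, not the C implementation: `SegRecvCredits` and its method names are hypothetical.

```python
class SegRecvCredits:
    """Toy model: one credit per non-last PDU, one per SDU read by recvinto()."""

    def __init__(self):
        self.credits_given = 0
        self.sdu_ready = False

    def on_pdu(self, is_last):
        if is_last:
            self.sdu_ready = True    # SDU fully assembled, hold the last credit
        else:
            self.credits_given += 1  # let the peer send the next K-frame now

    def on_recvinto(self):
        if self.sdu_ready:
            self.sdu_ready = False
            self.credits_given += 1  # credit for the first PDU of the next SDU


acct = SegRecvCredits()
for is_last in (False, False, True):  # an SDU that arrives as three PDUs
    acct.on_pdu(is_last)
credits_before_read = acct.credits_given
acct.on_recvinto()
credits_after_read = acct.credits_given
```

In this model the peer gets two credits back while the SDU is still arriving and the third only once Python has read it, so the total returned equals the number of PDUs consumed.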

Work around a Zephyr bug in l2cap_chan_seg_recv_rx_init() which leaves
rx.mps at zero for seg_recv channels (unlike l2cap_chan_rx_init for the
normal path), causing immediate channel disconnect on the first received
PDU.  Set rx.mps = BT_L2CAP_RX_MTU in l2cap_create_channel() and use
bt_l2cap_chan_give_credits() in accept/connect paths, matching the
pattern from Zephyr's credits_seg_recv test.

Also enable Data Length Extension (DLE) so the controller can negotiate
251-byte PDU payloads, reducing per-PDU overhead.

TX pipeline: allow up to L2CAP_SDU_BUF_COUNT-1 SDUs in flight concurrently
(tracked via tx_in_flight counter) rather than stalling after every send.

On nRF52840 dongle (PCA10059) with PYBD (NimBLE) as central:
  perf_l2cap.py before: ~2,184 B/s
  perf_l2cap.py after:  ~11,518 B/s  (5.3x improvement)
  All 11 BLE multitests pass.

Signed-off-by: Andrew Leech <andrew@alelec.net>
Replace single-SDU L2CAP accumulation buffer with a FIFO that
holds multiple SDUs.  Deep initial credit window (fills rx_buf)
allows the peer to pipeline SDUs without per-SDU credit
round-trips, which is critical for Z2Z throughput where each
credit round-trip costs 2+ connection intervals.

Add deferred L2CAP recv notification (rx_notify_pending) to
avoid re-entrancy between seg_recv_cb and Python IRQ handlers.
Each port's port_run_task must call flush_recv_notify() after
work_process completes.
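
The deferred notification pattern can be sketched as a flag that the receive callback sets and the port task later flushes. The class below is a hypothetical Python model; only the names rx_notify_pending, seg_recv_cb and flush_recv_notify come from the commit message.

```python
class DeferredRecvNotify:
    """Toy model of a deferred recv notification."""

    def __init__(self, irq_handler):
        self._irq_handler = irq_handler
        self.rx_notify_pending = False

    def seg_recv_cb(self):
        # Called from inside work processing: only record that data arrived.
        self.rx_notify_pending = True

    def flush_recv_notify(self):
        # Called by the port task after work_process has returned.
        if self.rx_notify_pending:
            self.rx_notify_pending = False
            self._irq_handler()


events = []
notifier = DeferredRecvNotify(lambda: events.append("recv"))
notifier.seg_recv_cb()
notifier.seg_recv_cb()            # two PDUs arrive in one work pass
events_during_work = list(events)
notifier.flush_recv_notify()
notifier.flush_recv_notify()      # second flush finds nothing pending
```

Because the handler only runs from the flush, it never executes while seg_recv_cb is still on the call stack.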

Disable DLE auto-negotiation (CONFIG_BT_AUTO_DATA_LEN_UPDATE 0)
for CYW43 compatibility — CYW43 disconnects with "Instant
Passed" (0x16) when DLE is negotiated.

Add l2cap_status_cb TX kick via bt_tx_irq_raise() to unblock
queued SDUs when credits arrive.

Run codeformat.py on extmod files.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Add ZEPHYR_BLE_POLL_INTERVAL_MS define (default 128ms)
matching NimBLE convention. IRQ-driven ports use poll_now()
for immediate processing; this is a fallback for timer
housekeeping.

Change CONFIG_BT_AUTO_DATA_LEN_UPDATE to #ifndef guard so
ports with capable controllers can override via CFLAGS.

Move random data generation out of the timed window in
perf_l2cap.py and use getrandbits(8) for faster generation.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
pi-anl added 4 commits March 16, 2026 20:26
Clean up unused functions, macros and debug helpers that were never called
or only conditionally compiled behind disabled feature flags. Removes the dead
registry system, PSA crypto stubs, LIFO operations, and various unused
helper functions and inlines from kernel/device/config headers. Also
deletes gatt_pragma.h, which is unreferenced.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Add Zephyr BLE variant configuration for PYBD_SF6 board, with HCI UART
readpacket support for bulk reading from the BT coprocessor.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Update mp_handle_pending() calls to use the
mp_handle_pending_behaviour_t enum instead of bool, matching
the current signature in py/scheduler.c.

Add mp_bluetooth_zephyr_l2cap_flush_recv_notify() call to
port_run_task, consistent with RP2 and nRF ports. Without this,
deferred L2CAP recv notifications were never delivered to Python.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>

github-actions Bot commented Mar 16, 2026

Code size report:

Reference:  nrf: Use shared poll interval define and enable DLE. [633fa4b]
Comparison: stm32: Use IPCC flag for ACL flow control in rfcore. [merge of 271e41b]
  mpy-cross:    +0 +0.000% 
   bare-arm:    +0 +0.000% 
minimal x86:    +0 +0.000% 
   unix x64:    +0 +0.000% standard
      stm32:    +0 +0.000% PYBV10
      esp32:    +0 +0.000% ESP32_GENERIC
     mimxrt:    +0 +0.000% TEENSY40
        rp2:    +0 +0.000% RPI_PICO_W
       samd:    +0 +0.000% ADAFRUIT_ITSYBITSY_M4_EXPRESS
  qemu rv32:    +0 +0.000% VIRT_RV32

pi-anl added 7 commits March 16, 2026 21:12
Use ZEPHYR_BLE_POLL_INTERVAL_MS for poll timer values.
Enable CONFIG_BT_AUTO_DATA_LEN_UPDATE for NUCLEO_WB55.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Replace the native Zephyr port's custom BLE bindings with the shared
extmod/zephyr_ble integration layer. This unifies the BLE API across
all ports using Zephyr BLE, with the native Zephyr port using Zephyr's
own kernel primitives instead of the HAL shim stubs.

Includes machine.idle() fix to yield to Zephyr threads, and test
fixes for nRF52840 DK BLE multitests.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Enable synchronous BLE events and increase UART RX buffer to 512 bytes
for reliable raw-paste operation on the nRF52840 DK.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
The nRF52840 DK's USB CDC ACM console was unreliable — device enumeration
failures and stalls during raw-paste mode. Switch to UART via JLink OB
(uart0 with hw-flow-control) which is always available.

Move USB device stack init from mp_task (after console init) to
zephyr_start.c main() (before console init) so CDC ACM UART is ready
when the console subsystem opens the device. Add DTR wait for CDC ACM
console boards so output isn't lost before a host connects.

Reduce MICROPY_REPL_STDIN_BUFFER_MAX to 64 (raw-paste window=32 bytes)
to avoid overflowing USB-UART bridge buffers at 115200 baud.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Extends the Zephyr BLE HAL layer to support running the BLE controller
on-core alongside the host stack. All changes are guarded by
MICROPY_BLUETOOTH_ZEPHYR_CONTROLLER so host-only ports are unaffected.

New files provide IRQ management (NVIC wrappers), ISR dispatch table,
clock control (HFCLK/LFCLK), and controller kernel stubs (k_poll,
k_thread_create). Existing shims are updated to use real PRIMASK-based
interrupt control and IPSR-based ISR detection when the controller is
active.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Adds Zephyr BLE controller source files (ULL, LLL, ticker, HAL) to
zephyr_ble.mk under MICROPY_BLUETOOTH_ZEPHYR_CONTROLLER=1 guard. Adds
controller configuration defines (CONFIG_BT_CTLR_*, CONFIG_SOC_*,
ticker, LLCP) and header stubs needed by controller code (devicetree,
IRQ, entropy, version).

Also enables L2CAP dynamic channels (CoC) in the shared build flags
and adds LTO type-mismatch warning suppression for stub declarations
that intentionally differ from Zephyr internals.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Add 1kHz SysTick handler with millisecond counter (uwTick), PendSV
dispatch mechanism for deferred soft timer processing, and SysTick
init function. Enable SEVONPEND so WFE wakes on SysTick interrupts.

Also unconditionally enable the MicroPython scheduler (previously
gated behind MICROPY_HW_ENABLE_USBDEV) since BLE event processing
requires mp_sched_schedule_node().

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
pi-anl added 12 commits March 16, 2026 21:12
Add Makefile integration, linker script extensions, and board variant
configurations (PCA10056, PCA10059) for building the nRF port with
Zephyr BLE. Includes PCA10059 DAPLink variant and hci_driver_poll_rx
support for the on-core controller.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Add mpzephyrport_nrf.c with on-core BLE controller initialization,
LFXO startup, cooperative HCI polling, and scheduler node callback
handling. Includes NULL callback guard for safe deinit during scheduler
node draining.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Extend the bt_disable deinit path to the nRF port's on-core controller.
The controller is shut down via ll_deinit() called from bt_disable(),
with proper LFXO re-start on next init.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Add work processing interleaving to the nRF port's controller polling
loop. Each HCI packet is followed by a work_process() call to ensure
connection events are handled before subsequent packets.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Move the remaining two Zephyr submodule patches (HCI driver, quirk
reset) into wrapper files, eliminating all custom patches from the
lib/zephyr submodule.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Add mp_bluetooth_zephyr_l2cap_flush_recv_notify() call to
port_run_task and re-entrancy guard to prevent recursive
port_run_task execution.

Update lib/zephyr submodule with LLL preempt ticker fix —
ticker_stop(TICKER_ID_LLL_PREEMPT) in init_reset() prevents
ll_deinit assertion on second bt_disable/bt_enable cycle.

Run codeformat.py on nRF port files and hci_driver_wrapper.c.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Use ZEPHYR_BLE_POLL_INTERVAL_MS for poll timer values.
Enable CONFIG_BT_AUTO_DATA_LEN_UPDATE for PCA10056 and
PCA10059 on-core controller variants.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Add mp_bluetooth_zephyr_l2cap_flush_recv_notify() call to
port_run_task and re-entrancy guard to prevent recursive
execution from nested mp_handle_pending() calls.
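
A re-entrancy guard of this kind can be sketched as follows. The wrapper is a hypothetical Python model of the idea, not the port's C code: a nested call made while the task body is still running is skipped rather than executed recursively.

```python
class RunTask:
    """Toy wrapper that refuses to run its body recursively."""

    def __init__(self, body):
        self._body = body
        self._running = False
        self.skipped = 0

    def __call__(self):
        if self._running:
            self.skipped += 1  # nested call: bail out instead of recursing
            return False
        self._running = True
        try:
            self._body(self)
        finally:
            self._running = False  # always release the guard
        return True


log = []


def body(task):
    log.append("enter")
    task()  # stands in for a nested mp_handle_pending() reaching the task again
    log.append("exit")


task = RunTask(body)
ran = task()
```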

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
The BT_DT_HCI_QUIRKS_GET stub was returning 0, so the Zephyr host
never sent LE_Set_Data_Length or LE_Write_Default_Data_Length even
with CONFIG_BT_AUTO_DATA_LEN_UPDATE=1.  The on-core controller
initialises default_tx_octets to 27 (PDU_DC_PAYLOAD_SIZE_MIN) and
does not auto-negotiate DLE — this matches Nordic's upstream
devicetree default of bt-hci-quirks: ["no-auto-dle"] for
zephyr,bt-hci-ll-sw-split.

Set the quirk globally and default CONFIG_BT_AUTO_DATA_LEN_UPDATE
to 1 so all ports get DLE.  Remove per-port CFLAGS overrides that
are now redundant.

Tested on nRF52840 dongle + WB55 Z2Z (11/11 BLE tests pass, DLE
confirmed via LE_Data_Length_Change event: tx=251/2120us).
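
The 2120us figure can be checked with a short calculation. This assumes the LE 1M PHY and my reading of the link-layer packet layout (1-octet preamble, 4-octet access address, 2-octet header, 4-octet MIC, 3-octet CRC); `le_1m_tx_time_us` is a hypothetical helper.

```python
def le_1m_tx_time_us(payload_octets, mic_octets=4):
    """Air time of one data PDU on the LE 1M PHY (8 us per octet)."""
    overhead = 1 + 4 + 2 + mic_octets + 3  # preamble, access addr, header, MIC, CRC
    return (payload_octets + overhead) * 8


max_time_with_dle = le_1m_tx_time_us(251)    # DLE negotiated
max_time_without_dle = le_1m_tx_time_us(27)  # default 27-octet payload
```

Under these assumptions a 251-octet payload takes 2120us on air and the 27-octet default takes 328us.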

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Increase L2CAP_RX_BUF_SIZE from 4KB to 16KB, giving 65 initial credits
at MPS=251.  This allows the sender to pipeline ~21 SDUs of 512 bytes
without any credit round-trips during transfer.
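
The credit figures above can be reproduced with integer arithmetic. The 2-byte SDU length field carried in the first K-frame is my assumption from the L2CAP credit-based flow control format; it is what makes a 512-byte SDU span three PDUs rather than two full ones plus a remainder of 10 bytes.

```python
RX_BUF_SIZE = 16 * 1024  # L2CAP_RX_BUF_SIZE after this change
MPS = 251                # maximum PDU payload size
SDU_LEN = 512            # SDU size used in the perf test
SDU_HEADER = 2           # assumed SDU length field in the first K-frame

initial_credits = RX_BUF_SIZE // MPS              # one credit per MPS of buffer
pdus_per_sdu = -(-(SDU_LEN + SDU_HEADER) // MPS)  # ceiling division
sdus_in_window = initial_credits // pdus_per_sdu
```

This gives 65 initial credits, 3 PDUs per SDU, and 21 SDUs in the initial window.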

Replace the batch credit return scheme (rx_credits_pending/returned,
gated on all data consumed) with position-based proportional credits.
Credits are returned per MPS-worth of buffer consumed in recvinto()
rather than waiting for all complete SDU data to drain.
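
One way to picture position-based proportional credits is a running byte counter: each time the total consumed crosses another MPS boundary, one more credit is due. The class below is a sketch under that interpretation, with hypothetical names, and is not taken from the C code.

```python
class ProportionalCredits:
    """Toy model: return one credit per MPS-worth of buffer consumed."""

    def __init__(self, mps):
        self.mps = mps
        self.consumed = 0  # total bytes read out by recvinto()
        self.returned = 0  # total credits handed back so far

    def on_consumed(self, nbytes):
        self.consumed += nbytes
        due = self.consumed // self.mps - self.returned
        self.returned += due
        return due


credits = ProportionalCredits(mps=251)
first = credits.on_consumed(100)   # 100 bytes total: no boundary crossed
second = credits.on_consumed(200)  # 300 bytes total: one boundary crossed
third = credits.on_consumed(512)   # 812 bytes total: two more boundaries
```

Compared with returning credits only after a full drain, the peer here regains send capacity as soon as each MPS-worth of space is freed.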

Pre-allocate the receive buffer in perf_l2cap.py and use slice
assignment instead of bytearray.extend() to remove reallocation
overhead from the timed measurement window.

Tested on Pico 2 W + nRF52840 dongle Z2Z: 46,334 B/s (was 13,617 B/s).

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Add L2CAPChannel type implementing the stream protocol for L2CAP CoC
channels with blocking, non-blocking, and timed modes.  Add send_bulk()
to all backends for streaming large payloads with batched chunking.

After bt_l2cap_chan_send(), run work_process twice then schedule async
transport processing via poll_now.  Similarly flush after returning
credits in recvinto().

Change poll() and work_process() to return bool indicating work done.
Use this in port_run_task for adaptive reschedule (immediate when work
was processed, idle interval otherwise).

Increase L2CAP_SDU_BUF_COUNT from 5 to 10 for bulk batching headroom.
Add tx_chunks_per_send tracking in sent_cb to gate SEND_READY events.
Validate negative settimeout() values per CPython semantics.  Clear
l2cap_channel_obj on BLE deactivation.
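
The CPython timeout semantics referred to above can be sketched as a small normalisation function. The seconds-to-milliseconds conversion and the function name are assumptions for illustration; only the rule itself (None blocks, zero is non-blocking, negative values raise ValueError) is standard CPython socket behaviour.

```python
def normalise_timeout(value):
    """Map a settimeout() argument to (mode, timeout_ms)."""
    if value is None:
        return ("blocking", None)
    if value < 0:
        # CPython's socket.settimeout() rejects negative values the same way.
        raise ValueError("Timeout value out of range")
    if value == 0:
        return ("non-blocking", 0)
    return ("timed", int(value * 1000))  # assumes the argument is in seconds
```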

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>
Replace NUM_COMPLETED_PACKETS-based ACL flow control with direct IPCC
channel flag polling.  The IPCC flag clears when M0+ consumes the shared
SRAM2B buffer, which is much faster than waiting for the radio round-trip
needed by NUM_COMPLETED_PACKETS.

Debug instrumentation confirmed the IPCC flag always clears before the
next ACL send attempt, so the timeout path is defensive only.

Signed-off-by: Andrew Leech <andrew.leech@planetinnovation.com.au>