xml.sax expatreader swallows KeyboardInterrupt during external entity parsing #148427

@WYSIATI

Description

When parsing XML with external entity resolution enabled, a KeyboardInterrupt or SystemExit raised while an external entity is being parsed (by pressing Ctrl+C, or by raising it inside a content handler) is silently swallowed and converted into a generic SAXParseException.

The root cause is a bare except: in ExpatParser.external_entity_ref() at Lib/xml/sax/expatreader.py line 427:

try:
    xmlreader.IncrementalParser.parse(self, source)
except:
    return 0  # FIXME: save error info here?

The bare except: catches everything — KeyboardInterrupt, SystemExit, MemoryError — and returns 0 to expat, which then raises a generic "error in processing external entity reference". The inline FIXME notes the error info is lost, but the broader issue is that BaseException subclasses like KeyboardInterrupt should never be caught here at all.
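To make the distinction concrete, here is a small standalone check of which of the three exceptions named above an `except Exception:` clause would still intercept. The helper name `caught_by_except_exception` is made up for this illustration and is not part of xml.sax.

```python
def caught_by_except_exception(exc_type):
    """Return True if an `except Exception:` clause intercepts exc_type."""
    try:
        try:
            raise exc_type("simulated")
        except Exception:
            # Only reached for subclasses of Exception.
            return True
    except BaseException:
        # Reached for KeyboardInterrupt, SystemExit and other
        # BaseException subclasses that bypass `except Exception:`.
        return False


for exc in (KeyboardInterrupt, SystemExit, MemoryError):
    print(exc.__name__, caught_by_except_exception(exc))
```

Note that MemoryError derives from Exception, so narrowing the clause to `except Exception:` lets KeyboardInterrupt and SystemExit through but would still turn a MemoryError into a return value of 0.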

There's also a secondary issue: the _entity_stack cleanup (lines 430–431) only runs on success, so the parser's internal state is corrupted after any error during entity parsing.

Reproduction

import xml.sax
from xml.sax.handler import feature_external_ges
from xml.sax import ContentHandler
from xml.sax.xmlreader import InputSource
from io import BytesIO

class KBHandler(ContentHandler):
    def startElement(self, name, attrs):
        if name == 'entity':
            raise KeyboardInterrupt('simulated Ctrl+C')

class Resolver:
    def resolveEntity(self, pubId, sysId):
        src = InputSource()
        src.setByteStream(BytesIO(b'<entity/>'))
        return src

parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)
parser.setEntityResolver(Resolver())
parser.setContentHandler(KBHandler())

try:
    parser.feed('<!DOCTYPE d [<!ENTITY e SYSTEM "x">]><d>&e;</d>')
    parser.close()
except KeyboardInterrupt:
    print('GOOD: KeyboardInterrupt propagated')
except xml.sax.SAXParseException as e:
    print(f'BUG: KeyboardInterrupt became SAXParseException: {e}')

Output: BUG: KeyboardInterrupt became SAXParseException: <unknown>:1:6: error in processing external entity reference

Suggested fix

Change except: to except Exception: and move the _entity_stack cleanup into a finally block. I checked pyexpat.c — the C layer handles Python exception propagation correctly through call_with_frame() / XML_StopParser() / get_parse_result(), so letting KeyboardInterrupt pass through is safe.
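A minimal sketch of the shape of that change, using a toy class rather than the real ExpatParser. `ToyEntityParser`, `entity_stack`, `current` and `parse_entity` are hypothetical stand-ins for the parser, its `_entity_stack`, and `external_entity_ref()`; the actual patch would need to follow the real method body.

```python
class ToyEntityParser:
    """Toy model of the suggested pattern, not the real ExpatParser."""

    def __init__(self):
        self.entity_stack = []   # plays the role of _entity_stack
        self.current = "outer"   # plays the role of the active parser/source

    def parse_entity(self, parse_callback):
        # Save the outer state before switching to the entity.
        self.entity_stack.append(self.current)
        self.current = "entity"
        try:
            parse_callback()
        except Exception:
            # Ordinary errors are still reported as failure (0).
            # KeyboardInterrupt and SystemExit are not caught here.
            return 0
        finally:
            # Runs on success, on failure, and while an uncaught
            # exception propagates, so the saved state is always restored.
            self.current = self.entity_stack.pop()
        return 1
```

With this shape, a KeyboardInterrupt raised inside `parse_callback` propagates to the caller, and the stack is restored in every case.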

The other bare except: in the same file (line 104 in parse()) correctly re-raises after cleanup, so this is not a deliberate pattern.
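For comparison, the cleanup-then-re-raise pattern described for parse() looks roughly like the following. This is a generic illustration with hypothetical names (`parse_with_cleanup`, `do_parse`, `cleanup`), not a copy of the code in expatreader.py.

```python
def parse_with_cleanup(do_parse, cleanup):
    """Run do_parse(); on any exception, run cleanup() and re-raise."""
    try:
        do_parse()
    except:  # bare except is harmless here because of the `raise` below
        cleanup()
        raise  # the original exception, including KeyboardInterrupt, propagates
```

Because the handler ends in `raise`, nothing is swallowed: the bare `except:` only adds a cleanup step on the error path.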

Labels: stdlib, topic-XML, type-bug