
Spark SQL: Add SparkSqlDialect #2305

Open
andygrove wants to merge 1 commit into apache:main from andygrove:main

andygrove (Member) commented Apr 15, 2026

Summary

  • Adds `SparkSqlDialect` (`src/dialect/spark.rs`) with support for the key Spark SQL syntax features needed to parse real-world Spark SQL workloads
  • Adds `tests/sqlparser_spark.rs` with tests based on those in the Apache DataFusion Comet project

New `Dialect` trait methods (all default to `false`)

| Method | Purpose |
|---|---|
| `supports_create_table_using` | `CREATE TABLE t (...) USING parquet` |
| `supports_long_type_as_bigint` | `LONG` as an alias for `BIGINT` |
| `supports_map_literal_with_angle_brackets` | `MAP<K, V>` type syntax |
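Each feature in this table follows the same pattern: a trait method that defaults to `false`, so parser behavior changes only for dialects that override it. The following is a minimal sketch of that pattern, assuming a cut-down trait. `PlainDialect` and `long_type_name` are hypothetical names used for illustration, and the real `sqlparser` `Dialect` trait has many more methods.

```rust
/// Cut-down, hypothetical stand-in for the parser's dialect trait: each new
/// capability is a method whose default is `false`.
trait Dialect {
    fn supports_create_table_using(&self) -> bool {
        false
    }
    fn supports_long_type_as_bigint(&self) -> bool {
        false
    }
    fn supports_map_literal_with_angle_brackets(&self) -> bool {
        false
    }
}

/// Hypothetical dialect that keeps every default, i.e. opts in to nothing.
struct PlainDialect;
impl Dialect for PlainDialect {}

/// Spark SQL opts in by overriding the defaults.
struct SparkSqlDialect;
impl Dialect for SparkSqlDialect {
    fn supports_create_table_using(&self) -> bool {
        true
    }
    fn supports_long_type_as_bigint(&self) -> bool {
        true
    }
    fn supports_map_literal_with_angle_brackets(&self) -> bool {
        true
    }
}

/// Hypothetical parser fragment: it asks the dialect about the feature
/// instead of matching on the concrete dialect type.
fn long_type_name(dialect: &dyn Dialect) -> &'static str {
    if dialect.supports_long_type_as_bigint() {
        "BIGINT"
    } else {
        "LONG" // left as an ordinary, unrecognized type name
    }
}
```

Compared with `dialect_is!(...)` checks, supporting a new dialect then means overriding methods in a single `impl` block instead of editing each call site in the parser.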

Parser/AST changes

  • New `HiveIOFormat::Using { format: Ident }` variant, which renders as `USING <format>` and is distinct from Hive's `STORED AS <format>`
  • `parse_hive_formats` handles `USING <format>` when the dialect opts in
  • `STRUCT` type parsing now checks the `supports_struct_literal()` trait method instead of `dialect_is!(BigQueryDialect | DatabricksDialect | GenericDialect)`, so a dialect can opt in without changes to the parser
  • New `MAP<K, V>` angle-bracket parsing path in `parse_data_type_helper`
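The `USING` change touches both the AST and the parser. The sketch below illustrates only the opt-in branch and the two renderings. It is not the crate's actual code: it assumes a simplified, hypothetical `HiveIOFormat` with plain `String` payloads, in which a `StoredAs` variant stands in for the existing Hive form, and a hypothetical `parse_format_clause` helper that works on pre-split words.

```rust
use std::fmt;

/// Simplified, hypothetical stand-in for the AST enum: `String` payloads
/// replace the real identifier/file-format types.
#[derive(Debug, PartialEq)]
enum HiveIOFormat {
    /// Hive: `STORED AS <format>`
    StoredAs { format: String },
    /// Spark: `USING <format>`
    Using { format: String },
}

impl fmt::Display for HiveIOFormat {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HiveIOFormat::StoredAs { format } => write!(f, "STORED AS {format}"),
            HiveIOFormat::Using { format } => write!(f, "USING {format}"),
        }
    }
}

/// Hypothetical helper standing in for the new branch in `parse_hive_formats`:
/// `words` are the words after the column list, and `USING` is recognized
/// only when the dialect opts in.
fn parse_format_clause(words: &[&str], supports_using: bool) -> Option<HiveIOFormat> {
    match words {
        [kw, format, ..] if supports_using && kw.eq_ignore_ascii_case("USING") => {
            Some(HiveIOFormat::Using { format: format.to_string() })
        }
        [stored, as_kw, format, ..]
            if stored.eq_ignore_ascii_case("STORED") && as_kw.eq_ignore_ascii_case("AS") =>
        {
            Some(HiveIOFormat::StoredAs { format: format.to_string() })
        }
        _ => None,
    }
}
```

A separate variant keeps round-tripping faithful: a statement parsed from `USING parquet` prints back as `USING parquet` rather than being rewritten into Hive's `STORED AS` form.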

SparkSqlDialect capabilities

`USING <format>` in `CREATE TABLE`, lambda functions, `DIV` integer division, aggregate `FILTER`, `GROUP BY` expressions and modifiers, `SELECT * EXCEPT`, `STRUCT<>` and `MAP<>` types, the `LONG` type alias, nested comments, `!` as `NOT`, CTEs without `AS`, multi-column aliases, and `IGNORE NULLS` in window functions (already supported by the generic parser).
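Several of these capabilities are type-syntax features. The following is a rough sketch of flag-gated parsing for `LONG`, `MAP<K, V>` and `STRUCT<field: type>`. It assumes a toy whitespace tokenizer, a hypothetical `DataType` enum far smaller than the real one, and a hypothetical `Flags` struct standing in for the three trait methods. It handles only the `field: type` struct form shown in the commit message.

```rust
/// Toy data-type AST (hypothetical; far smaller than the real `DataType`).
#[derive(Debug, PartialEq)]
enum DataType {
    BigInt,
    Named(String),
    Map(Box<DataType>, Box<DataType>),
    Struct(Vec<(String, DataType)>),
}

/// Hypothetical stand-ins for the three dialect flags.
#[derive(Clone, Copy)]
struct Flags {
    long_as_bigint: bool,
    map_angle_brackets: bool,
    struct_literal: bool,
}

/// Pads `<`, `>`, `,` and `:` with spaces, then splits on whitespace, so
/// every `>` becomes its own token (nested `>>` needs no special case).
fn tokenize(input: &str) -> Vec<String> {
    let spaced: String = input
        .chars()
        .map(|c| if "<>,:".contains(c) { format!(" {c} ") } else { c.to_string() })
        .collect();
    spaced.split_whitespace().map(str::to_string).collect()
}

struct TypeParser {
    tokens: Vec<String>,
    pos: usize,
    flags: Flags,
}

impl TypeParser {
    fn advance(&mut self) -> Result<String, String> {
        let tok = self.tokens.get(self.pos).cloned().ok_or("unexpected end of input")?;
        self.pos += 1;
        Ok(tok)
    }

    fn expect(&mut self, expected: &str) -> Result<(), String> {
        match self.advance()? {
            tok if tok == expected => Ok(()),
            tok => Err(format!("expected `{expected}`, found `{tok}`")),
        }
    }

    fn peek_is(&self, s: &str) -> bool {
        self.tokens.get(self.pos).map_or(false, |t| t == s)
    }

    fn parse_data_type(&mut self) -> Result<DataType, String> {
        let name = self.advance()?;
        match name.to_ascii_uppercase().as_str() {
            // `LONG` becomes `BIGINT` only when the dialect opts in.
            "LONG" if self.flags.long_as_bigint => Ok(DataType::BigInt),
            "BIGINT" => Ok(DataType::BigInt),
            // MAP<key, value>, recursing so key and value may be nested types.
            "MAP" if self.flags.map_angle_brackets && self.peek_is("<") => {
                self.expect("<")?;
                let key = self.parse_data_type()?;
                self.expect(",")?;
                let value = self.parse_data_type()?;
                self.expect(">")?;
                Ok(DataType::Map(Box::new(key), Box::new(value)))
            }
            // STRUCT<name: type, ...> with at least one field.
            "STRUCT" if self.flags.struct_literal && self.peek_is("<") => {
                self.expect("<")?;
                let mut fields = Vec::new();
                loop {
                    let field = self.advance()?;
                    self.expect(":")?;
                    fields.push((field, self.parse_data_type()?));
                    if !self.peek_is(",") {
                        break;
                    }
                    self.pos += 1; // consume the `,` between fields
                }
                self.expect(">")?;
                Ok(DataType::Struct(fields))
            }
            // Anything else (including MAP/STRUCT when disabled) is a plain name.
            _ => Ok(DataType::Named(name)),
        }
    }
}

/// Parses a complete type string, rejecting leftover tokens.
fn parse_type(input: &str, flags: Flags) -> Result<DataType, String> {
    let mut parser = TypeParser { tokens: tokenize(input), pos: 0, flags };
    let ty = parser.parse_data_type()?;
    if parser.pos != parser.tokens.len() {
        return Err(format!("unexpected trailing tokens in `{input}`"));
    }
    Ok(ty)
}
```

With all three flags set, `parse_type("MAP<STRING, MAP<INT, LONG>>", flags)` yields a nested `Map` whose innermost value is `BigInt`. With the flags cleared, the same input is rejected: `MAP` is read as a plain type name and the `<...>` tokens are left over. Padding each `>` separately avoids the `>>` token that a real SQL lexer may produce for nested types.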

🤖 Generated with Claude Code

andygrove force-pushed the main branch 2 times, most recently from bcb12d5 to d23053c on April 15, 2026 at 17:59
…TRUCT types

Adds a new `SparkSqlDialect` with the following features:
- `CREATE TABLE ... USING <format>` via new `HiveIOFormat::Using` AST variant
- `MAP<K, V>` angle-bracket type syntax (`supports_map_literal_with_angle_brackets`)
- `STRUCT<field: type>` type parsing now driven by `supports_struct_literal()` trait method
- `LONG` as an alias for `BIGINT` (`supports_long_type_as_bigint`)
- Lambda functions, `DIV` integer division, aggregate `FILTER`, `SELECT * EXCEPT`,
  struct literals, nested comments, `!` as NOT, CTE without AS, multi-column aliases

Also adds `tests/sqlparser_spark.rs` with 16 tests including integration with
the Apache DataFusion Comet SQL test files (1,152 statements, all passing).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>