Add time filtering to SDK + extra model fields #278
Conversation
Pull request overview
This PR adds end-to-end temporal filtering for DigitalRF (rf@*.h5) file listings/downloads, and expands dataset/capture models and APIs to expose richer relationship/ownership metadata (including a new dataset detail endpoint).
Changes:
- SDK: Add `start_time`/`end_time` support to `list_files` and `download`, plus pagination propagation and warning forwarding.
- Gateway: Add temporal query params to file listing with a `warnings` array included in paginated responses; add dataset detail `retrieve` returning captures + artifact files.
- Models/serializers/tests: Introduce shared capture enums, expand dataset/capture fields, and add unit/integration coverage for temporal filtering and composite capture serialization.
Reviewed changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| sdk/tests/ops/test_paginator.py | Adds unit tests for preserving temporal kwargs across pages and logging API warnings. |
| sdk/tests/ops/test_files.py | Adds unit tests for temporal param validation and datetime-to-ISO query formatting. |
| sdk/tests/integration/test_file_ops.py | Adds integration tests for temporal listing/download behavior and warning logging. |
| sdk/src/spectrumx/ops/pagination.py | Forwards first-page warnings from API via log_user_warning. |
| sdk/src/spectrumx/models/datasets.py | Adds captures/files to Dataset and introduces nested dataset-side models. |
| sdk/src/spectrumx/models/captures.py | Moves enums to shared module; adds additional indexed time display fields and sharing flags. |
| sdk/src/spectrumx/models/capture_enums.py | New shared CaptureType / CaptureOrigin enums module. |
| sdk/src/spectrumx/gateway.py | Adds start_time/end_time query params to list_files; adds get_dataset. |
| sdk/src/spectrumx/client.py | Adds temporal params to download/list_files; adds dataset convenience methods. |
| sdk/src/spectrumx/api/sds_files.py | Validates paired temporal params and formats datetimes as UTC ISO for gateway. |
| sdk/src/spectrumx/api/datasets.py | Adds dataset detail fetch + helpers to list captures/artifact files from that payload. |
| sdk/pyproject.toml | Updates Ruff per-file ignores for monorepo path layouts. |
| sdk/docs/mkdocs/changelog.md | Documents new temporal filtering and model expansions. |
| gateway/sds_gateway/api_methods/views/file_endpoints.py | Adds temporal query params, RF filtering, and always returns warnings in paginated responses. |
| gateway/sds_gateway/api_methods/views/dataset_endpoints.py | Adds dataset retrieve endpoint returning dataset metadata with captures + artifact files. |
| gateway/sds_gateway/api_methods/utils/swagger_example_schema.py | Adds warnings key to example paginated file list response. |
| gateway/sds_gateway/api_methods/tests/test_file_endpoints.py | Adds tests for warnings key presence and temporal filtering behavior. |
| gateway/sds_gateway/api_methods/tests/test_dataset_endpoints.py | Adds test for dataset detail retrieval containing captures/files. |
| gateway/sds_gateway/api_methods/tests/test_composite_capture_serialization.py | Adds serializer-level tests for multi-channel composite capture output. |
| gateway/sds_gateway/api_methods/tests/test_celery_tasks.py | Updates temporal filtering task docstring references. |
| gateway/sds_gateway/api_methods/serializers/file_serializers.py | Adds a nested artifact file summary serializer for dataset payloads. |
| gateway/sds_gateway/api_methods/serializers/dataset_serializers.py | Extends dataset serializer to embed captures/artifact files and break serializer cycles. |
| gateway/sds_gateway/api_methods/serializers/capture_serializers.py | Adds new derived time fields and enriches composite channel rows with OpenSearch bounds. |
| gateway/sds_gateway/api_methods/helpers/temporal_filtering.py | Refactors temporal filtering to share filter_files_by_temporal_bounds. |
| gateway/pyproject.toml | Adds Ruff ignore for composite serialization test magic numbers. |
```python
def _datetime_string_to_milliseconds(self, datetime_string: str) -> int:
    """Converts a datetime string to milliseconds since start of capture."""
    parsed = datetime.fromisoformat(datetime_string)
    return int(parsed.timestamp() * 1000)
```
`_datetime_string_to_milliseconds` uses `datetime.fromisoformat()` without validation or error handling. If a client passes an invalid datetime, this raises `ValueError` and bubbles up as a 500; a naive datetime will be interpreted in the server's local timezone when calling `.timestamp()`, which is likely not what you want for temporal filtering. Consider catching `ValueError` and returning a 400 with a clear message, and either requiring timezone-aware inputs or treating naive inputs as UTC (to match SDK behavior). Also, the docstring says "milliseconds since start of capture," but this function computes epoch milliseconds.
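A minimal sketch of the suggested fix (the name `parse_datetime_to_epoch_ms` is illustrative, not from this PR; the calling view would translate the re-raised `ValueError` into a 400 response):

```python
from datetime import datetime, timezone


def parse_datetime_to_epoch_ms(datetime_string: str) -> int:
    """Parse an ISO-8601 string to epoch milliseconds.

    Raises ValueError with a clear message on invalid input, and treats
    naive datetimes as UTC instead of the server's local timezone.
    """
    try:
        parsed = datetime.fromisoformat(datetime_string)
    except ValueError as err:
        msg = f"Invalid ISO-8601 datetime: {datetime_string!r}"
        raise ValueError(msg) from err
    if parsed.tzinfo is None:
        # Assume UTC for naive inputs (matching the SDK's UTC formatting).
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp() * 1000)
```

With this shape, both `"1970-01-01T00:00:01+00:00"` and the naive `"1970-01-01T00:00:01"` resolve to the same epoch value regardless of server timezone.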
```python
    )
    log.warning(msg)
    warnings.append(msg)
elif start_time or end_time:
```
In the non-RF warning branch you use `elif start_time or end_time:`, but `start_time`/`end_time` are converted to integers earlier. A valid bound at the Unix epoch converts to 0, which is falsy, so this can incorrectly skip the warning (and makes the check inconsistent with the RF branch, which uses `is not None`). Prefer checking `start_time is not None or end_time is not None` after conversion.
```diff
- elif start_time or end_time:
+ elif start_time is not None or end_time is not None:
```
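To make the pitfall concrete, a bound exactly at the Unix epoch is a perfectly valid filter value yet falsy as an integer, so only the explicit `None` check behaves correctly:

```python
from datetime import datetime, timezone

# A bound exactly at the Unix epoch converts to integer 0:
start_time = int(datetime(1970, 1, 1, tzinfo=timezone.utc).timestamp())

truthiness_check = bool(start_time)        # False -> warning wrongly skipped
explicit_check = start_time is not None    # True  -> warning correctly emitted
```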
```python
def _enriched_channels(self, obj: dict[str, Any]) -> list[dict[str, Any]]:
    """Per-channel rows with OpenSearch bounds (each channel may differ)."""
    key = str(obj.get("uuid", ""))
    if not hasattr(self, "_enriched_channels_cache"):
        self._enriched_channels_cache: dict[str, list[dict[str, Any]]] = {}
    if key not in self._enriched_channels_cache:
        out: list[dict[str, Any]] = []
        for ch in obj.get("channels") or []:
            entry: dict[str, Any] = {
                "channel": ch["channel"],
                "uuid": ch["uuid"],
                "channel_metadata": ch.get("channel_metadata", {}),
            }
            try:
                capture = Capture.objects.get(uuid=ch["uuid"])
            except Capture.DoesNotExist:
                entry["capture_start_epoch_sec"] = None
                entry["capture_end_epoch_sec"] = None
                entry["capture_start_iso_utc"] = None
                entry["capture_end_iso_utc"] = None
                entry["capture_start_display"] = None
                entry["capture_end_display"] = None
                entry["length_of_capture_ms"] = None
                entry["file_cadence_ms"] = None
            else:
                # Per-channel bounds/cadence (Capture.get_opensearch_metadata).
                start_sec = capture.start_time
                end_sec = capture.end_time
                entry["capture_start_epoch_sec"] = start_sec
                entry["capture_end_epoch_sec"] = end_sec
                entry["capture_start_iso_utc"] = (
                    _epoch_sec_to_iso_utc_z(start_sec)
                    if start_sec is not None
                    else None
                )
                entry["capture_end_iso_utc"] = (
                    _epoch_sec_to_iso_utc_z(end_sec)
                    if end_sec is not None
                    else None
                )
                entry["capture_start_display"] = (
                    _epoch_sec_to_local_display(start_sec)
                    if start_sec is not None
                    else None
                )
                entry["capture_end_display"] = (
                    _epoch_sec_to_local_display(end_sec)
                    if end_sec is not None
                    else None
                )
                if start_sec is None or end_sec is None:
                    entry["length_of_capture_ms"] = None
                else:
                    entry["length_of_capture_ms"] = (end_sec - start_sec) * 1000
                entry["file_cadence_ms"] = capture.file_cadence
            out.append(entry)
```
`CompositeCaptureSerializer._enriched_channels()` introduces a per-channel `Capture.objects.get(...)` lookup and then reads `capture.start_time` / `end_time` / `file_cadence`, each of which calls `get_opensearch_metadata()`. This creates an N+1 pattern (DB + OpenSearch) for composite serialization and can become very expensive for multi-channel captures and list endpoints. Consider passing the `Capture` instances (or their already-fetched OpenSearch metadata) into the composite payload from `build_composite_capture_data`, or bulk-fetching captures with `filter(uuid__in=...)` and caching `get_opensearch_metadata()` results per UUID so each channel is resolved once.
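A framework-agnostic sketch of the bulk-resolution pattern being suggested. The helper names here are hypothetical: `fetch_metadata_bulk` stands in for one `Capture.objects.filter(uuid__in=...)` query plus a single `get_opensearch_metadata()` call per capture, rather than one lookup per channel:

```python
from typing import Any, Callable, Iterable, Optional


def resolve_channel_bounds(
    channel_uuids: Iterable[str],
    fetch_metadata_bulk: Callable[[list[str]], dict[str, dict[str, Any]]],
) -> dict[str, Optional[dict[str, Any]]]:
    """Resolve OpenSearch bounds for all channels in one bulk round-trip.

    Each UUID is sent to the backend at most once; UUIDs whose capture is
    missing map to None so callers can emit null fields instead of raising.
    """
    unique = list(dict.fromkeys(channel_uuids))  # dedupe, preserve order
    found = fetch_metadata_bulk(unique)          # one fetch, not N
    return {uuid: found.get(uuid) for uuid in unique}
```

The serializer would then index this dict while building each channel entry, so duplicate channel UUIDs and missing captures cost nothing extra.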
No description provided.