Feature/upgrade to 0.4.0 by costezki · Pull Request #7 · OP-TED/entity-resolution-engine-basic

costezki · 2026-04-02T20:21:23Z

No description provided.

…to avoid shadowing Co-authored-by: gkostkowski <12532923+gkostkowski@users.noreply.github.com>

Fix variable shadowing in pytest_configure hook

ci(sonarcloud): make SonarCloud scan conditional on SONAR_TOKEN

…ation-hardening ops: containerization hardening

…ytest config Merge of PR#10 (feature/ERE1-123): fixes variable shadowing in pytest_configure. Conflict resolved by preserving HEAD's TEST_RESOURCES_DIR path and encoding="utf-8".

Merge of PR#21 (feature/ERS1-124/code-formatting). Conflicts resolved: kept HEAD config paths (config/ not infra/config/) which were updated by the infra layout refactor after PR#21 branched.

Addresses the SonarQube quality gate failure from PR#21 (69% coverage on new code). New test modules: - test/unit/adapters/test_utils.py: message parsing (get_request/response_from_message) - test/unit/adapters/test_adapter_factories.py: build_rdf_mapper factory - test/unit/entrypoints/test_queue_worker.py: RedisQueueWorker with mocked Redis - test/unit/utils/test_logging.py: configure_logging and TRACE level - test/unit/services/test_services_factories.py: build_entity_resolver (in-memory + persistent DuckDB) - test/unit/test_models.py: MentionLink/ResolutionResult edge cases + app.main() failure paths Extended test files: - stubs.py: add StubRDFMapper and find_by_id to InMemoryMentionRepository - test_entity_resolution_service.py: EntityResolutionService process_request paths - test_duckdb_adapters.py: load_all and save_all([]) coverage Coverage: 61% → 85% (unit + BDD combined)

Renames LOG_LEVEL → ERE_LOG_LEVEL to match the ERSys unified .env.example, allowing integration tests to run against the shared ERSys infrastructure without any compose-level variable mapping. Changes: - src/ere/utils/logging.py: read ERE_LOG_LEVEL instead of LOG_LEVEL - src/ere/entrypoints/app.py: update env var name in docstring - demo/demo.py: read ERE_LOG_LEVEL instead of LOG_LEVEL - test/unit/utils/test_logging.py: update env var references - infra/.env.example: new file, ERE-relevant subset of ERSys .env.example - infra/compose.dev.yaml: remove LOG_LEVEL mapping (no longer needed)

AGENTS.md (and its CLAUDE.md mirror) now contains ERE-specific guidance: commits/PR rules, dev workflow, make targets reference, architecture rules, memory conventions, and gotchas. Replaces GitNexus-only boilerplate. Also aligns dev tooling: - Makefile: test-integration depends on check-env; test target sources .env - infra/.env.example: REDIS_HOST defaults to localhost for standalone dev - test/e2e/test_app.py: replace walrus operator with explicit env default

Copilot

Pull request overview

This PR upgrades the project to v0.4.0 and aligns configuration, infrastructure, and tests with the new layout and runtime expectations (notably moving configs to config/ and standardizing ERE_LOG_LEVEL).

Changes:

Move configuration from infra/config/ to top-level config/ and update factories/docs accordingly.
Replace Docker Compose setup (infra/docker-compose.yml + .env.local) with infra/compose.dev.yaml + infra/.env.example and expand Makefile infra targets.
Add/expand unit tests around logging, factories, queue worker/message parsing, and service/model edge-cases; update existing tests for new behavior and formatting.

Reviewed changes

Copilot reviewed 56 out of 64 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
test/unit/utils/test_logging.py	New tests for `configure_logging()` and TRACE support.
test/unit/utils/init.py	Package marker for utils unit tests.
test/unit/test_models.py	New unit tests for model edge-cases and `app.main()` failure paths.
test/unit/services/test_services_factories.py	New tests for `services.factories` builder functions.
test/unit/services/test_entity_resolution_service.py	Extends resolver/service tests and adds request-processing coverage.
test/unit/entrypoints/test_queue_worker.py	New unit tests for Redis queue worker behavior and error handling.
test/unit/entrypoints/init.py	Package marker for entrypoints unit tests.
test/unit/adapters/test_utils.py	New tests for adapter message parsing utilities.
test/unit/adapters/test_duckdb_adapters.py	Extends DuckDB adapter coverage (incl. load_all/save_all edge cases).
test/unit/adapters/test_adapter_factories.py	New tests for RDF mapper factory.
test/unit/adapters/stubs.py	Adds stub RDF mapper and expands repository stubs for tests.
test/stress/stress_test.py	Updates stress runner default config path and formatting.
test/stress/README.md	Updates stress docs to new config path.
test/integration/test_redis_integration.py	Formatting improvements in integration test assertions/decoding.
test/integration/test_entity_resolver.py	Formatting improvements and readability tweaks.
test/features/steps/test_entity_resolution_algorithm_steps.py	Step formatting changes for readability.
test/features/steps/test_direct_service_resolution_steps.py	Step formatting changes; clarifies multi-line parsers/steps.
test/e2e/test_ere.py	Minor assertion formatting change.
test/e2e/test_app.py	Adjusts env setup for app e2e run (notably REDIS_PASSWORD).
test/conftest.py	Renames local logging config var; minor formatting.
src/ere/utils/logging.py	Switches env var to `ERE_LOG_LEVEL` for log-level configuration.
src/ere/services/resolver_config.py	Formatting for DuckDBConfig path field.
src/ere/services/factories.py	Default resolver config path updated to `config/resolver.yaml`.
src/ere/services/entity_resolution_service.py	Formatting and minor structural readability improvements.
src/ere/models/resolver/mention.py	Formatting for `_from_flat_dict` guard.
src/ere/models/exceptions.py	Formatting for ConflictError init signature.
src/ere/entrypoints/queue_worker.py	Formatting for Redis calls and error response builder signature.
src/ere/entrypoints/app.py	Updates docs/env var to `ERE_LOG_LEVEL`; minor formatting.
src/ere/adapters/utils.py	Formatting for supported request/response class registries.
src/ere/adapters/splink_linker_impl.py	Formatting/quoting consistency and readability changes.
src/ere/adapters/repositories.py	Formatting for resolver model imports.
src/ere/adapters/rdf_mapper.py	Simplifies ValueError formatting.
src/ere/adapters/rdf_mapper_impl.py	Default mapping path moved to `config/`; minor formatting.
src/ere/adapters/duckdb_repositories.py	Formatting for resolver model imports.
README.md	Updates paths/docs for new config/infra layout and Make targets.
pyproject.toml	Bumps version to 0.4.0, renames project, updates ers-spec dependency.
Makefile	Adds compose/env vars wiring and expands infra targets.
infra/README.md	New infra documentation for compose.dev workflow and env setup.
infra/Dockerfile	Switches to multi-stage build (builder/runtime) and copies `config/`.
infra/docker-compose.yml	Removed (replaced by `infra/compose.dev.yaml`).
infra/compose.dev.yaml	New dev compose stack (redis/redisinsight/ere) with compose watch.
infra/.env.local	Removed (replaced by `infra/.env.example` + `infra/.env`).
infra/.env.example	New env template for local dev.
docs/algorithm.md	Updates configuration link to `config/README.md`.
demo/README.md	Updates demo instructions for new infra workflow (`make infra-*`).
demo/demo.py	Updates env/logging variable usage and infra instructions.
config/resolver.yaml	New default resolver config under top-level `config/`.
config/resolver_multirule.yaml	Formatting normalization for YAML file.
config/resolver_compound.yaml	Formatting normalization for YAML file.
config/README.md	New configuration documentation (resolver + rdf mapping).
config/rdf_mapping.yaml	Formatting normalization for YAML file.
CLAUDE.md	Adds agent operating instructions / repo guidance.
CHANGELOG.md	Updates infra compose filename reference.
AGENTS.md	Adds agent operating instructions (duplicated guidance for agents).
.gitignore	Ignores `infra/.env` and adds `.idea`.
.github/workflows/code-quality.yaml	Adjusts SonarCloud step expectations and env cleanup.
.gitattributes	Adds LF line-ending enforcement for some file types.
.dockerignore	Adds docker build context exclusions.
.claude/skills/gitnexus/gitnexus-refactoring/SKILL.md	New GitNexus guidance docs.
.claude/skills/gitnexus/gitnexus-impact-analysis/SKILL.md	New GitNexus guidance docs.
.claude/skills/gitnexus/gitnexus-guide/SKILL.md	New GitNexus guidance docs.
.claude/skills/gitnexus/gitnexus-exploring/SKILL.md	New GitNexus guidance docs.
.claude/skills/gitnexus/gitnexus-debugging/SKILL.md	New GitNexus guidance docs.
.claude/skills/gitnexus/gitnexus-cli/SKILL.md	New GitNexus guidance docs.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-02T20:25:54Z

+import logging
+from unittest.mock import call, patch
+
+import pytest


call and pytest are imported but never used in this test module. This will trigger lint failures (e.g., pylint unused-import) and should be removed (or used) to keep CI green.

Copilot · 2026-04-02T20:25:54Z

    monkeypatch.setenv("REDIS_HOST", os.environ.get("REDIS_HOST", "localhost"))
    monkeypatch.setenv("REDIS_PORT", os.environ.get("REDIS_PORT", "6379"))
    monkeypatch.setenv("REDIS_DB", os.environ.get("REDIS_DB", "0"))
-    if redis_password := os.environ.get("REDIS_PASSWORD"):
-        monkeypatch.setenv("REDIS_PASSWORD", redis_password)
+    monkeypatch.setenv("REDIS_PASSWORD", os.environ.get("REDIS_PASSWORD", "changeme"))
    monkeypatch.setenv("REQUEST_QUEUE", req_queue)


Setting REDIS_PASSWORD to a default value ("changeme") will break against Redis instances that do not require authentication (e.g., the GitHub Actions redis service in this repo). Prefer only setting REDIS_PASSWORD when it is present in the environment, or set it to an empty string/omit it so unauthenticated Redis works.

Copilot · 2026-04-02T20:25:55Z

+DUCKDB_PATH=/data/app.duckdb
+
+# --- Logging ---
+# ERSys uses ERE_LOG_LEVEL; compose.dev.yaml maps it to LOG_LEVEL internally.


This comment says compose.dev.yaml maps ERE_LOG_LEVEL to LOG_LEVEL, but the compose file passes env vars through as-is and the app reads ERE_LOG_LEVEL directly. Please update/remove this note to avoid misleading users.

Suggested change

# ERSys uses ERE_LOG_LEVEL; compose.dev.yaml maps it to LOG_LEVEL internally.

# ERSys uses ERE_LOG_LEVEL for logging configuration.

Copilot · 2026-04-02T20:25:55Z

 test: ## Run all tests
 	@ echo -e "$(BUILD_PRINT)$(ICON_PROGRESS) Running all tests$(END_BUILD_PRINT)"
-	@ poetry run pytest $(TEST_PATH)
+	@ set -a && . $(ENV_FILE) && set +a && poetry run pytest $(TEST_PATH)
 	@ echo -e "$(BUILD_PRINT)$(ICON_DONE) All tests passed$(END_BUILD_PRINT)"


The test / test-integration / test-coverage targets source infra/.env unconditionally. This will make make test fail on a fresh clone unless the user has copied the env file, even though unit tests don’t require infrastructure. Consider either (a) making test depend on check-env, or (b) sourcing infra/.env only when it exists (and keeping unit tests runnable without extra setup).

Copilot · 2026-04-02T20:25:55Z

@@ -0,0 +1,4 @@
+# Enforce Unix line endings
+*.sh text eol=lf
+Dockerfile text eol=lf


.gitattributes enforces LF for Dockerfile, but your Dockerfile lives at infra/Dockerfile and won’t match this pattern. If the intent is to enforce LF for Dockerfiles in subdirectories too, update the pattern (e.g., *Dockerfile / **/Dockerfile depending on git attribute matching needs).

Suggested change

Dockerfile text eol=lf

*Dockerfile text eol=lf

Copilot · 2026-04-02T20:25:56Z


 # TODO: should we have a registry?
-ers-spec = { git = "https://github.com/OP-TED/entity-resolution-spec.git", branch = "0.3.0-rc.1" }
+ers-spec = { git = "https://github.com/OP-TED/entity-resolution-spec.git", branch = "develop" }


Using a moving git branch (branch = "develop") for ers-spec makes builds non-reproducible and can introduce breaking changes unexpectedly. Prefer pinning to a released tag/version or a specific commit SHA (and bump intentionally when needed).

Suggested change

ers-spec = { git = "https://github.com/OP-TED/entity-resolution-spec.git", branch = "develop" }

ers-spec = { git = "https://github.com/OP-TED/entity-resolution-spec.git", rev = "<pin-to-commit-sha>" }

Copilot AI and others added 19 commits February 26, 2026 08:07

Initial plan

386fa71

refactor(test): rename config var to logging_cfg in pytest_configure …

9b1748e

…to avoid shadowing Co-authored-by: gkostkowski <12532923+gkostkowski@users.noreply.github.com>

Merge pull request #8 from meaningfy-ws/copilot/sub-pr-4-again

2600188

Fix variable shadowing in pytest_configure hook

chore: reformat code with Ruff

cd3e1d1

Merge pull request #22 from meaningfy-ws/ci/optional-sonarcloud

3af5fcf

ci(sonarcloud): make SonarCloud scan conditional on SONAR_TOKEN

refactor(docker): multi-stage build, .dockerignore, non-root user

e2f5af3

refactor(infra): modernize compose, env, Makefile and config layout

1cf319c

fix(docs): update references for new infra layout

ee8e8cb

chore: update line endings in the demo script

9332da6

Merge pull request #23 from meaningfy-ws/feature/ERS1-148/containeriz…

850137f

…ation-hardening ops: containerization hardening

using develop branch of ers-spec for now

def5ab0

refactor(test): rename config var to logging_cfg to avoid shadowing p…

aeacbec

…ytest config Merge of PR#10 (feature/ERE1-123): fixes variable shadowing in pytest_configure. Conflict resolved by preserving HEAD's TEST_RESOURCES_DIR path and encoding="utf-8".

chore: reformat code with Ruff

d7a6630

Merge of PR#21 (feature/ERS1-124/code-formatting). Conflicts resolved: kept HEAD config paths (config/ not infra/config/) which were updated by the infra layout refactor after PR#21 branched.

updated project setup

c63a431

updated project setup

0daca1d

updated project setup

e5b1440

costezki requested a review from Copilot April 2, 2026 20:21

costezki self-assigned this Apr 2, 2026

costezki closed this Apr 2, 2026

Copilot started reviewing on behalf of costezki April 2, 2026 20:22 View session

Copilot AI reviewed Apr 2, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/upgrade to 0.4.0#7

Feature/upgrade to 0.4.0#7
costezki wants to merge 19 commits intoOP-TED:developfrom
meaningfy-ws:feature/upgrade-to-0.3.0

costezki commented Apr 2, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

Copilot AI Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

	# ERSys uses ERE_LOG_LEVEL; compose.dev.yaml maps it to LOG_LEVEL internally.
	# ERSys uses ERE_LOG_LEVEL for logging configuration.

	ers-spec = { git = "https://github.com/OP-TED/entity-resolution-spec.git", branch = "develop" }
	ers-spec = { git = "https://github.com/OP-TED/entity-resolution-spec.git", rev = "<pin-to-commit-sha>" }

Conversation

costezki commented Apr 2, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants