Skip to content

Fix static IP subnet mismatch when adding network to nic_group#2716

Open
vlast3k wants to merge 2 commits intocloudfoundry:mainfrom
vlast3k:fix/static-ip-subnet-match-in-nic-group
Open

Fix static IP subnet mismatch when adding network to nic_group#2716
vlast3k wants to merge 2 commits intocloudfoundry:mainfrom
vlast3k:fix/static-ip-subnet-match-in-nic-group

Conversation

@vlast3k
Copy link
Copy Markdown

@vlast3k vlast3k commented Apr 20, 2026

Summary

  • When adding a new network to existing instances in the same nic_group, the placement planner picked the first unclaimed static IP in the AZ without checking which cloud subnet it belonged to. This caused CPI errors (Networks in nic_group have different subnet_ids) when an AZ has multiple cloud subnets with interleaved static IP ranges.
  • Store cloud_properties from the containing subnet on each StaticIpToAzs entry. When selecting an IP for a network that shares a nic_group with an already-assigned sibling, constrain the lookup to IPs on a subnet with matching cloud_properties.
  • Fail with a clear error when no matching IP is available, instead of silently assigning a mismatched one.

What changed

networks_to_static_ips.rb

  • StaticIpToAzs struct now includes cloud_properties
  • create() records the subnet's cloud_properties for each IP
  • Added find_by_network_az_and_cloud_properties and next_ip_for_network_with_cloud_properties

static_ips_availability_zone_picker.rb

  • create_network_plan_with_az looks up the cloud_properties of an already-assigned sibling in the same nic_group, then constrains IP selection to matching subnets
  • Added nic_group_sibling_cloud_properties helper

Tests

  • networks_to_static_ips_spec.rb: cloud_properties storage and filtered lookup
  • static_ips_availability_zone_picker_spec.rb: multi-subnet nic_group pairing, mismatch error, no-nic_group unchanged behavior

Context

On AWS, all networks in a nic_group must resolve to the same ENI subnet. Landscapes with multiple /24 subnets per AZ and interleaved static IP ranges hit this when enabling IPv6 (adding a second network to the nic_group). BOSH assigned IPv6 IPs on a different subnet than the existing IPv4 IPs because it only filtered by AZ, not by cloud subnet.

Verified against a production cloud config: paired IPv4/IPv6 subnets sharing a nic_group have identical cloud_properties (same AWS subnet ID, same security groups), because AWS subnets are dual-stack and a nic_group maps to a single ENI. Full cloud_properties hash equality is the correct comparison — narrowing to just the subnet key would be fragile and CPI-specific.

Test plan

  • Existing placement planner specs pass (no behavioral change when nic_group is not used)
  • New spec: two networks in nic_group, two subnets per AZ with different cloud_properties — IPs are paired on matching subnets
  • New spec: mismatched static IPs (no IP on sibling subnet) raises clear error
  • New spec: without nic_group, behavior unchanged
  • Integration test on a landscape with multi-subnet AZ + dual-stack nic_group

When adding a new network to existing instances in the same nic_group,
the placement planner picked the first unclaimed static IP in the AZ
without checking which cloud subnet it belonged to. This caused CPI
errors ("Networks in nic_group have different subnet_ids") when an AZ
has multiple cloud subnets with interleaved static IP ranges.

Store cloud_properties from the containing subnet on each StaticIpToAzs
entry. When selecting an IP for a network that shares a nic_group with
an already-assigned sibling, constrain the lookup to IPs on a subnet
with matching cloud_properties. Fail with a clear error when no
matching IP is available.

Fixes: BOSH-1922
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 99ef08c5-99c0-4f7c-9fb2-886e44279d1b

📥 Commits

Reviewing files that changed from the base of the PR and between 2c66bf3 and 4bdaa97.

📒 Files selected for processing (1)
  • src/bosh-director/lib/bosh/director/deployment_plan/placement_planner/static_ips_availability_zone_picker.rb

Walkthrough

The PR extends static IP placement to track and match subnet cloud_properties. StaticIpToAzs now includes a cloud_properties field. NetworksToStaticIps.create stores subnet cloud_properties for each IP and adds selection queries: next_ip_for_network_with_cloud_properties and find_by_network_az_and_cloud_properties. StaticIpsAvailabilityZonePicker obtains sibling-subnet cloud_properties for nic-grouped networks and routes IP lookups through the cloud-property-aware queries, raising NetworkReservationError when no matching IP is found. Unit tests were added to verify cloud_properties storage and sibling-subnet matching and failure behavior.

Suggested reviewers

  • rkoster
  • s4heid
  • fmoehler
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically identifies the main change: fixing static IP subnet mismatch when adding a network to a nic_group, which aligns with the core problem and solution described throughout the PR.
Description check ✅ Passed The PR description addresses all required sections: explains what changed (summary, code changes, tests), provides context (AWS nic_group requirements, multi-subnet scenario), includes test plan, and covers breaking changes. All critical information is present and well-structured.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@src/bosh-director/lib/bosh/director/deployment_plan/placement_planner/static_ips_availability_zone_picker.rb`:
- Around line 288-300: The helper nic_group_sibling_cloud_properties may
accidentally match the current network as its own sibling; add a defensive early
check inside the loop to skip when sibling_network == network so it only
considers other networks, keeping the rest of the logic (matching nic_group and
checking plan.reservation.ip and subnet) unchanged; update the loop in
nic_group_sibling_cloud_properties to continue to next when sibling_network is
the same object as the provided network.
- Around line 118-129: The error message raised when static_ip_to_azs is nil
uses a nic_group-specific message even when sibling_cloud_props is nil; update
the branch inside the desired_instance.az.nil? block to raise a different
message depending on whether sibling_cloud_props is present: if
sibling_cloud_props is truthy, raise Bosh::Director::NetworkReservationError
with the subnet/nic_group message using network.nic_group (matching the existing
nic_group wording), otherwise raise the generic "Failed to distribute static
IPs..." style message used in the else-branch; adjust the code paths around
sibling_cloud_props,
`@networks_to_static_ips.next_ip_for_network_with_cloud_properties` and
`@networks_to_static_ips.next_ip_for_network` so the error raised mirrors the
else-branch behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0e44e9cf-74b2-4434-8c77-eb01b375d94e

📥 Commits

Reviewing files that changed from the base of the PR and between 0dbe149 and 2c66bf3.

📒 Files selected for processing (4)
  • src/bosh-director/lib/bosh/director/deployment_plan/placement_planner/networks_to_static_ips.rb
  • src/bosh-director/lib/bosh/director/deployment_plan/placement_planner/static_ips_availability_zone_picker.rb
  • src/bosh-director/spec/unit/bosh/director/deployment_plan/placement_planner/networks_to_static_ips_spec.rb
  • src/bosh-director/spec/unit/bosh/director/deployment_plan/placement_planner/static_ips_availability_zone_picker_spec.rb

@vlast3k
Copy link
Copy Markdown
Author

vlast3k commented Apr 20, 2026

Validation: cloud_properties hash equality is safe

A potential concern was raised during review: is full cloud_properties == cloud_properties hash equality too strict? Could paired IPv4/IPv6 subnets in the same nic_group have slightly different cloud_properties?

Verified against a production AWS cloud config. When two BOSH networks share a nic_group (e.g., IPv4-ext and IPv6-single-ext), their subnets in each AZ reference the same AWS subnet — because AWS subnets are dual-stack by default and a nic_group maps to a single ENI. The cloud_properties hashes are identical:

# IPv4 network, z1, subnet 1
cloud_properties:
  security_groups: [sg-aaa, sg-bbb, sg-ccc, sg-ddd]
  subnet: subnet-12345

# IPv6 network, z1, subnet 1
cloud_properties:
  security_groups: [sg-aaa, sg-bbb, sg-ccc, sg-ddd]
  subnet: subnet-12345

All 8 subnets across both networks were 1:1 matched — same subnet ID, same security_groups list. The only differences between the two BOSH subnets are in non-cloud_properties fields (range, gateway, dns, static).

Conclusion: Full hash equality is correct. Narrowing the comparison to just the subnet key would be fragile — it assumes the cloud_properties schema, which varies by CPI (AWS vs GCP vs Azure vs vSphere).

- Differentiate error when static IP lookup fails: nic_group-specific
  message when sibling_cloud_props is present, generic message otherwise.
  Mirrors the else-branch error handling.
- Skip self-match in nic_group_sibling_cloud_properties to prevent
  the current network from matching itself as a sibling.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Pending Merge | Prioritized

Development

Successfully merging this pull request may close these issues.

1 participant