Skip to content

Conversation

@jbonilla-tao
Copy link
Collaborator

Taoshi Pull Request

Description

[Provide a brief description of the changes introduced by this pull request.]

Related Issues (JIRA)

[Reference any related issues or tasks that this pull request addresses or closes.]

Checklist

  • I have tested my changes on testnet.
  • I have updated any necessary documentation.
  • I have added unit tests for my changes (if applicable).
  • If there are breaking changes for validators, I have (or will) notify the community in Discord of the release.

Reviewer Instructions

[Provide any specific instructions or areas you would like the reviewer to focus on.]

Definition of Done

  • Code has been reviewed.
  • All checks and tests pass.
  • Documentation is up to date.
  • Approved by at least one reviewer.

Checklist (for the reviewer)

  • Code follows project conventions.
  • Code is well-documented.
  • Changes are necessary and align with the project's goals.
  • No breaking changes introduced.

Optional: Deploy Notes

[Any instructions or notes related to deployment, if applicable.]

/cc @mention_reviewer

@github-actions
Copy link

github-actions bot commented Oct 29, 2025

🤖 Claude AI Code Review

Last reviewed on: 14:23:47

Summary

This PR implements a re-registration prevention system that tracks miners who de-register from the subnet and prevents them from re-registering. The implementation includes:

  • Departed hotkey tracking with persistence
  • Anomaly detection to avoid false positives from network issues
  • Validator-level rejection of re-registered miners
  • Comprehensive test coverage

✅ Strengths

  1. Robust Anomaly Detection - The shared utility function is_anomalous_hotkey_loss() with configurable thresholds (>10 hotkeys AND ≥25%) prevents false positives from network connectivity issues.

  2. Excellent Test Coverage - 15 comprehensive integration tests covering edge cases, persistence, boundaries, and the full lifecycle of de-registration → re-registration → rejection.

  3. Backward Compatibility - Graceful handling of legacy list format and fallback to default file when runtime data is missing.

  4. Efficient Lookups - Uses dict/set operations (O(1)) for hotkey checks rather than list scans.

  5. Clear Separation of Concerns - Anomaly detection logic is properly extracted to a shared utility module.

⚠️ Concerns

Critical Issues

  1. Race Condition in Validator Check (neurons/validator.py:955-968)

    if self.elimination_manager.is_hotkey_re_registered(synapse.dendrite.hotkey):
    • This check happens AFTER the miner has already sent a signal
    • If departed_hotkeys dict is updated between the check and processing, inconsistent behavior could occur
    • Recommendation: Add lock protection around the departed hotkeys access or use atomic operations
  2. Missing Default File Validation (data/default_departed_hotkeys.json)

    • The 1-line JSON file is committed but contains no schema validation
    • Malformed JSON could break the validator on startup
    • Recommendation: Add JSON schema validation or at minimum a sanity check on load
  3. IPC Dictionary Consistency (vali_objects/utils/elimination_manager.py:56)

    self.departed_hotkeys = ipc_manager.dict()
    • Multiprocessing proxy dicts don't guarantee ACID properties
    • Concurrent updates from multiple processes could cause data corruption
    • Recommendation: Use a proper locking mechanism or single-writer pattern

Major Issues

  1. Unbounded Growth (vali_objects/utils/elimination_manager.py:381)

    for hotkey in new_departures:
        self.departed_hotkeys[hotkey] = {"detected_ms": current_time_ms}
    • The departed_hotkeys dict will grow indefinitely
    • No archival or cleanup mechanism for old entries
    • Recommendation: Add periodic cleanup for entries older than X months/years, or implement a size cap
  2. Error Handling Gap (vali_objects/utils/elimination_manager.py:218)

    except Exception as e:
        bt.logging.warning(f"Could not load departed hotkeys from default file: {e}. Starting with empty dict.")
        return {}
    • Broad exception catching masks specific errors (file permissions, corrupted JSON, etc.)
    • Starting with empty dict after failure could allow re-registrations that should be blocked
    • Recommendation: Catch specific exceptions and fail-safe to read-only mode rather than allowing all re-registrations
  3. Inconsistent Timestamp Format (vali_objects/utils/elimination_manager.py:965)

    dereg_date = TimeUtil.millis_to_formatted_date_str(detected_ms) if detected_ms else "unknown"
    • The validator shows human-readable date but stores milliseconds
    • No timezone indication in the error message (assumes UTC but not explicit)
    • Recommendation: Make UTC explicit in error message: f"{dereg_date} UTC"

💡 Suggestions

Architecture & Design

  1. Consider Event Sourcing Pattern (vali_objects/utils/elimination_manager.py:373-407)

    • Currently tracking only current state (departed vs active)
    • Consider storing full history: de-registration time, re-registration time, rejection count
    • Would help with debugging and future policy changes
  2. Add Observability Metrics

    # Track these metrics:
    - Total departed hotkeys count
    - Re-registration attempts per time period
    - Anomaly detection trigger rate
    - False positive rate (if determinable)
  3. Configuration Management

    • Constants ANOMALY_DETECTION_MIN_LOST and ANOMALY_DETECTION_PERCENT_THRESHOLD should be in config file
    • Allows runtime tuning without code changes
    • Consider making them validator startup parameters

Code Quality

  1. Type Hints (shared_objects/metagraph_utils.py:13)

    def is_anomalous_hotkey_loss(lost_hotkeys: set, total_hotkeys_before: int) -> tuple[bool, float]:
    • Use Set[str] instead of set for clarity
    • Consider using typing.TypedDict for the metadata dict structure
  2. Magic Numbers (neurons/validator.py:964)

    dereg_date = TimeUtil.millis_to_formatted_date_str(detected_ms) if detected_ms else "unknown"
    • The check if detected_ms treats 0 as "unknown" but 0 is a valid timestamp (Jan 1, 1970)
    • Recommendation: Use if detected_ms is not None or sentinel value like -1
  3. Duplicate Code (tests/vali_tests/test_reregistration.py:63-66)

    def _setup_polygon_mocks(self, mock_candle_fetcher, mock_get_candles, mock_market_close):
        """Helper to set up Polygon API mocks"""
    • This helper is repeated across multiple test methods
    • Recommendation: Make it a test fixture or base class method

Testing

  1. Missing Test Cases

    • Test what happens when departed_hotkeys.json is corrupted
    • Test behavior when file permissions prevent writing
    • Test concurrent access scenarios (multiple validators)
    • Test migration from old format to new format
  2. Test Data Management (tests/vali_tests/test_reregistration.py:128-137)

    • Consider using factory pattern for test data creation
    • Reduces duplication and makes tests more maintainable

🔒 Security Notes

  1. File System Security

    • departed_hotkeys.json should have restricted permissions (600)
    • No validation that file hasn't been tampered with
    • Recommendation: Add file integrity check (hash/signature) or use encrypted storage
  2. Denial of Service Vector

    • Malicious actor could repeatedly de-register/re-register to fill departed_hotkeys dict
    • No rate limiting on departed hotkey tracking
    • Recommendation: Implement max entries per time window
  3. Information Disclosure (neurons/validator.py:962-968)

    msg = (f"This miner hotkey {synapse.dendrite.hotkey} was previously de-registered..."
           f"De-registered on: {dereg_date} UTC. "
    • Error message reveals internal state (when miner was de-registered)
    • Could be used for reconnaissance
    • Recommendation: Consider generic message: "Registration not permitted" without details
  4. Missing Input Validation

    • No validation that hotkeys in departed_hotkeys.json are valid SS58 addresses
    • Could accept arbitrary strings
    • Recommendation: Add SS58 format validation on load

📝 Additional Notes

  1. Version Bump (meta/meta.json)

    • Appropriately bumped from 7.0.1 → 7.0.2 for a minor feature
    • ✅ Correct semantic versioning
  2. PR Description

    • The PR description template is empty - should be filled out with:
      • Why re-registration prevention is needed
      • Security/economic rationale
      • Migration plan for existing validators
      • Discord community notification plan
  3. Documentation Gap

    • No updates to README or docs explaining the new feature
    • No operator guide for validators on what to expect
    • Recommendation: Add section explaining re-registration policy

Conclusion

This is a solid implementation with excellent test coverage and thoughtful design. The main concerns are around concurrent access safety, unbounded growth, and error handling. The anomaly detection is particularly well-designed and prevents a major class of false positives.

Recommendation: Approve with changes - Address the critical race condition and IPC dictionary consistency issues before merging. The other suggestions can be follow-up improvements.

jbonilla-tao and others added 6 commits November 4, 2025 01:55
Implements automatic tracking of miners that de-register and attempt to
re-register, permanently blocking their orders from being processed.

Changes:
- Track all departed hotkeys with anomaly detection (>10 hotkeys AND >=25%)
- Persist departed hotkeys to JSON for recovery after restarts
- Validator rejects orders from re-registered miners
- Add comprehensive test suite (15 tests, all passing)

Resolves issue where miners could circumvent elimination by re-registering.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Changes departed_hotkeys from list to dict to enable:
- Fast O(1) lookups instead of O(n) with set conversion
- Metadata storage (detected_ms, block number) for debugging
- Future extensibility (can add more metadata fields)

Structure:
{
  "hotkey": {
    "detected_ms": timestamp,
    "block": block_number
  }
}

Includes backwards compatibility for legacy list format.

All 15 tests pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Created shared utility module for detecting anomalous hotkey loss to eliminate
code duplication between elimination_manager.py and metagraph_updater.py.

Changes:
- Added shared_objects/metagraph_utils.py with is_anomalous_hotkey_loss()
- Moved anomaly detection constants to shared module (>10 hotkeys AND >=25%)
- Updated elimination_manager.py to use shared function
- Updated metagraph_updater.py to use shared function
- Updated test imports to reference new module location

All 15 re-registration tests pass successfully.
Simplified departed hotkeys metadata to only store detection timestamp.
Block number tracking was unnecessary for the re-registration detection
feature.

Changes:
- Removed "block" field from departed_hotkeys metadata dict
- Updated backwards compatibility code to not include block
- Updated docstring to reflect single metadata field
- Updated test to only verify detected_ms field

All 15 re-registration tests pass successfully.
…essage

Enhanced the error message shown to re-registered miners to include the specific
date when they were de-registered, making it more informative.

Changes:
- Retrieve detected_ms from departed_hotkeys metadata
- Convert timestamp to UTC date format using TimeUtil.millis_to_formatted_date_str()
- Update error message to include: "De-registered on: YYYY-MM-DD HH:MM:SS UTC"

Example message:
"This miner hotkey 5ABC... was previously de-registered and is not allowed
to re-register. De-registered on: 2025-01-15 14:32:45 UTC. Re-registration
is not permitted on this subnet."

All 15 re-registration tests pass successfully.
@jbonilla-tao jbonilla-tao force-pushed the feat/enforce_no_re_reg branch from d19ea53 to 36213cb Compare November 4, 2025 06:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants