Fix out of data feature gate schedule #879

Merged
Woody4618 merged 9 commits into solana-foundation:master from Woody4618:fix-feature-gate-schedule
Mar 16, 2026

Conversation

@Woody4618 Woody4618 commented Mar 16, 2026

Description

The feature gate schedule was six months out of date.
In the action I triggered, I saw rate limits, so I added some delays and ran it locally.
Also, the PR docs said the GitHub token parameter should be token, not branch-token, so I changed that.
These scripts should run every night.
Let's monitor whether the run succeeds tomorrow. In the meantime, this PR brings the data up to date.

vercel bot commented Mar 16, 2026

@Woody4618 is attempting to deploy a commit to the Solana Foundation Team on Vercel.

A member of the Team first needs to authorize it.

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR refreshes the stale featureGates.json data (6 months out of date) and overhauls the two Python scripts that populate it — adding rate-limit retry logic, fixing AsyncClient connection leaks, and improving resilience when on-chain RPC calls fail.

Key changes:

  • featureGates.json: Six feature gate testnet/devnet activation epochs incremented by 1 to reflect current on-chain state.
  • parse_feature_gates.py: Refactored from one-connection-per-feature to a single shared AsyncClient per cluster; added exponential-backoff retry on HTTP 429; correctly falls back to the existing stored epoch (not the wiki value) on RPC failure; added epoch_schedule is None guard; cluster_name extraction made reliable via substring check.
  • fetch_mainnet_activations.py: Same retry/backoff pattern added; connection leak fixed via async with AsyncClient(...).
  • .github/workflows/update-feature-gates.yml: Cosmetic quote style normalisation only; no functional changes.

One notable gap: the epoch_schedule is None guard added to parse_feature_gates.py was not applied to the equivalent line in fetch_mainnet_activations.py, leaving that script vulnerable to an AttributeError crash on a transient mainnet RPC error.

Confidence Score: 3/5

  • Safe to merge with minor fix needed in fetch_mainnet_activations.py before the next scheduled run.
  • The data refresh and the parse_feature_gates.py improvements are solid. However, fetch_mainnet_activations.py is missing the epoch_schedule None guard that was correctly added to parse_feature_gates.py — a transient mainnet RPC hiccup could cause the script to crash with an unhandled AttributeError and silently skip writing mainnet activation epochs. The fragile '429' string-matching for rate-limit detection (noted in prior review threads) also remains in both scripts.
  • scripts/fetch_mainnet_activations.py — missing epoch_schedule None guard on line 46.

Important Files Changed

| Filename | Overview |
| --- | --- |
| scripts/fetch_mainnet_activations.py | Added constants, retry logic with exponential backoff, and fixed connection leak via async context manager. Missing epoch_schedule is None guard (same fix applied in parse_feature_gates.py was not carried over). |
| scripts/parse_feature_gates.py | Major refactor: per-cluster connection reuse, retry/backoff logic, epoch_schedule None guard added, and backup epoch correctly falls back to existing stored value. Minor: new_feature is unpacked but never used in fetch_cluster_activations loop. |
| .github/workflows/update-feature-gates.yml | Cosmetic-only changes: single quotes replaced with double quotes throughout. No functional behaviour changed; branch-token parameter remains in place. |
| app/utils/feature-gate/featureGates.json | Six feature gate testnet/devnet activation epochs incremented by 1 each, refreshing data that was approximately 6 months stale. No structural changes. |

Sequence Diagram

sequenceDiagram
    participant WF as GitHub Actions Workflow
    participant PFG as parse_feature_gates.py
    participant FMA as fetch_mainnet_activations.py
    participant Wiki as Agave Wiki (GitHub)
    participant DEV as Devnet RPC
    participant TEST as Testnet RPC
    participant MAIN as Mainnet RPC
    participant JSON as featureGates.json

    WF->>PFG: python scripts/parse_feature_gates.py
    PFG->>Wiki: GET Feature-Gate-Tracker-Schedule.md
    Wiki-->>PFG: Markdown tables (tables 1, 2, 3)
    PFG->>JSON: Load existing features
    JSON-->>PFG: existing_features[]

    PFG->>DEV: AsyncClient (single shared connection)
    DEV-->>PFG: epoch_schedule
    loop Each feature (with retry + backoff)
        PFG->>DEV: get_account_info(key)
        DEV-->>PFG: account data
    end

    PFG->>TEST: AsyncClient (single shared connection)
    TEST-->>PFG: epoch_schedule
    loop Each feature (with retry + backoff)
        PFG->>TEST: get_account_info(key)
        TEST-->>PFG: account data
    end

    PFG->>JSON: Write updated features

    WF->>FMA: python scripts/fetch_mainnet_activations.py
    FMA->>JSON: Load features (only devnet+testnet activated, mainnet pending)
    FMA->>MAIN: AsyncClient (single shared connection)
    MAIN-->>FMA: epoch_schedule
    loop Each candidate feature (with retry + backoff)
        FMA->>MAIN: get_account_info(key)
        MAIN-->>FMA: account data
    end
    FMA->>JSON: Write mainnet_activation_epoch

    WF->>WF: peter-evans/create-pull-request

Last reviewed commit: ebc540e


connection = AsyncClient(cluster_url)
epoch_schedule = (await connection.get_epoch_schedule()).value
cluster_name = cluster_url.split('.')[-2]

Cluster name extraction is fragile

cluster_url.split('.')[-2] extracts "solana" from "https://api.devnet.solana.com", not "devnet" or "testnet" as likely intended. This means the log message on line 217 will print [solana] Checked ... for both clusters, making it hard to distinguish which cluster is being queried.

Consider using a more reliable extraction:

Suggested change
- cluster_name = cluster_url.split('.')[-2]
+ cluster_name = [part for part in cluster_url.split('.') if part in ('devnet', 'testnet', 'mainnet-beta')][0] if any(x in cluster_url for x in ('devnet', 'testnet', 'mainnet-beta')) else cluster_url

Or more simply:

cluster_name = "devnet" if "devnet" in cluster_url else "testnet" if "testnet" in cluster_url else cluster_url
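
A slightly more general option is to parse the hostname rather than search the whole URL. The sketch below assumes the public endpoint shape `api.<cluster>.solana.com`; the helper name `cluster_name_from_url` and the `KNOWN_CLUSTERS` tuple are illustrative and are not part of this PR.

```python
from urllib.parse import urlparse

# Assumed set of cluster labels; adjust if other endpoints are used.
KNOWN_CLUSTERS = ("devnet", "testnet", "mainnet-beta")


def cluster_name_from_url(cluster_url: str) -> str:
    """Return the cluster label found in the URL's hostname, else the URL itself."""
    hostname = urlparse(cluster_url).hostname or ""
    for label in hostname.split("."):
        if label in KNOWN_CLUSTERS:
            return label
    # Unknown endpoint (for example a local validator): fall back to the raw URL.
    return cluster_url
```

Matching whole hostname labels avoids a false hit when a cluster name appears elsewhere in the URL, such as in a path or query string.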

  uses: peter-evans/create-pull-request@v7
  with:
-   branch-token: ${{ secrets.GITHUB_TOKEN }}
+   token: ${{ secrets.GITHUB_TOKEN }}

branch-token was correct for v7

According to the peter-evans/create-pull-request v7 release notes, git-token was renamed to branch-token in v7. The token input controls which token is used for API calls (creating/updating the PR), while branch-token controls git push operations on the branch. When branch-token is omitted, it defaults to token, so this change won't break anything — but the original branch-token was the correct and more explicit v7 parameter name, not a deprecated one.

If the intent was to fix "the deprecated PR token", the original branch-token was already correct for v7. Consider reverting this change if you want explicit control over the git operations token.

@Woody4618
Collaborator Author

@greptile-apps please update your review

Comment on lines +204 to +205
connection = AsyncClient(cluster_url)
epoch_schedule = (await connection.get_epoch_schedule()).value

AsyncClient connection never closed

The AsyncClient (which holds an underlying HTTP client session) is created but never explicitly closed. If get_epoch_schedule() raises an exception, the connection leaks. Python may emit a ResourceWarning for the unclosed resources and, on repeated CI runs, the accumulated open handles can cause failures.

Use an async context manager or an explicit finally block:

connection = AsyncClient(cluster_url)
try:
    epoch_schedule = (await connection.get_epoch_schedule()).value
    ...
finally:
    await connection.close()
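
The async context manager form can be illustrated without a network connection. The sketch below uses `StubClient`, a hypothetical stand-in written for this example (it is not the real AsyncClient), to show that the `__aexit__` hook runs and closes the client even when the RPC call raises.

```python
import asyncio


class StubClient:
    """Hypothetical stand-in for an RPC client; not the real AsyncClient."""

    def __init__(self) -> None:
        self.closed = False

    async def __aenter__(self) -> "StubClient":
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.close()

    async def close(self) -> None:
        self.closed = True

    async def get_epoch_schedule(self):
        raise RuntimeError("simulated RPC failure")


async def fetch_schedule(client: StubClient):
    # __aexit__ runs on both normal exit and exceptions, so close() is always awaited.
    async with client as connection:
        return await connection.get_epoch_schedule()


async def demo() -> bool:
    client = StubClient()
    try:
        await fetch_schedule(client)
    except RuntimeError:
        pass  # the failure is expected; only the cleanup matters here
    return client.closed
```

Calling `asyncio.run(demo())` reports whether the stub was closed after the simulated failure.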

        account = await connection.get_account_info(Pubkey.from_string(key))
        break
    except Exception as e:
        if '429' in str(e) and attempt < MAX_RETRIES - 1:

Rate-limit detection relies on string matching

'429' in str(e) is fragile — it will also match any exception whose message happens to contain the substring "429" (e.g. a public key that happens to include those digits, or a different HTTP error like 4290). The same pattern is repeated in fetch_mainnet_activations.py at line 58.

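Consider checking the actual exception type or HTTP status code. If the Solana RPC client wraps HTTP errors in a typed exception, match on that instead. As a minimal improvement, also matching the reason phrase catches rate-limit errors that omit the numeric code, although it does not remove the false positives described above: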

if ('429' in str(e) or 'Too Many Requests' in str(e)) and attempt < MAX_RETRIES - 1:

At minimum, document the assumption about the exception string format with a comment so future readers know why this string check is used.
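
A typed check combined with exponential backoff could look like the sketch below. `RpcHttpError` is a hypothetical exception class invented for this example, since the exception type the RPC client actually raises is not shown in this thread; the values of `MAX_RETRIES` and `BASE_DELAY_SECONDS` are likewise assumed.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

MAX_RETRIES = 5            # assumed value; the PR's constant is not shown here
BASE_DELAY_SECONDS = 1.0   # assumed value


class RpcHttpError(Exception):
    """Hypothetical typed error carrying the HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def is_rate_limited(error: Exception) -> bool:
    # Compare the status code exactly instead of searching the message text.
    return isinstance(error, RpcHttpError) and error.status_code == 429


async def call_with_backoff(
    call: Callable[[], Awaitable[T]],
    max_retries: int = MAX_RETRIES,
    base_delay: float = BASE_DELAY_SECONDS,
) -> T:
    for attempt in range(max_retries):
        try:
            return await call()
        except Exception as error:
            # Re-raise anything that is not a rate limit, and the final failed attempt.
            if not is_rate_limited(error) or attempt == max_retries - 1:
                raise
            # Delay doubles on each attempt: base, 2*base, 4*base, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("max_retries must be at least 1")
```

With an exact status comparison, a message that merely contains the digits 429 (or a status such as 4290) no longer triggers a retry.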

print(f"Failed to fetch {key}: {e}")
return backup_epoch

if account.value and account.value.data:

Maybe it makes sense to add a guard if account is None here as well


I think it makes sense to move the RPC address to a constant so that it is easier to change for #844.



async def fetch_cluster_activations(cluster_url: str, features_to_check: list[tuple[StoredFeature, Feature]]) -> None:
    """Fetch activation epochs for all features from a single cluster, reusing one connection."""

I'm not an expert in Python, but would the connection be closed after the fn finishes its work? I mean we create a connection with AsyncClient to the endpoint, and at the fn's end we have to close it, right?

@Woody4618
Collaborator Author

@greptile-apps please update your review

Comment on lines +215 to +223
for existing, new_feature in features_to_check:
    if 'devnet' in cluster_url:
        existing.devnet_activation_epoch = await fetch_activation_epoch(
            connection, epoch_schedule, existing.key, new_feature.devnet_activation_epoch
        )
    elif 'testnet' in cluster_url:
        existing.testnet_activation_epoch = await fetch_activation_epoch(
            connection, epoch_schedule, existing.key, new_feature.testnet_activation_epoch
        )

On-chain value overwritten by wiki fallback on failure

When fetch_activation_epoch fails (all retries exhausted), it returns backup_epoch which is new_feature.devnet_activation_epoch / new_feature.testnet_activation_epoch — the value scraped from the Agave wiki markdown. However, existing.devnet_activation_epoch / existing.testnet_activation_epoch already holds the previously-stored, on-chain–verified epoch from a prior successful run. Overwriting it with the wiki value on failure means a transient RPC outage can silently regress accurate on-chain data to the less-authoritative wiki value.

The backup should be the existing stored value, not the wiki value:

if 'devnet' in cluster_url:
    existing.devnet_activation_epoch = await fetch_activation_epoch(
        connection, epoch_schedule, existing.key, existing.devnet_activation_epoch
    )
elif 'testnet' in cluster_url:
    existing.testnet_activation_epoch = await fetch_activation_epoch(
        connection, epoch_schedule, existing.key, existing.testnet_activation_epoch
    )

Note that fetch_mainnet_activations.py correctly handles this with if account is None: continue, preserving the stored value on failure. The same pattern should be applied here.
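
The "preserve the stored value on failure" pattern can be sketched in isolation. The `StoredFeature` dataclass below is a minimal stand-in holding only the fields this example needs, and `apply_devnet_result` is a hypothetical helper; neither reflects the actual definitions in the scripts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredFeature:
    """Minimal stand-in with only the fields this sketch needs."""
    key: str
    devnet_activation_epoch: Optional[int] = None


def apply_devnet_result(existing: StoredFeature, fetched_epoch: Optional[int]) -> None:
    # None models "all retries failed"; keep the previously stored value in that case.
    if fetched_epoch is None:
        return
    existing.devnet_activation_epoch = fetched_epoch
```

Treating a failed fetch as "no update" means a transient outage leaves the stored epoch unchanged.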

cluster_name = "devnet" if "devnet" in cluster_url else "testnet" if "testnet" in cluster_url else cluster_url

async with AsyncClient(cluster_url) as connection:
    epoch_schedule = (await connection.get_epoch_schedule()).value

No guard against get_epoch_schedule() returning None

(await connection.get_epoch_schedule()).value may return None if the RPC node returns an error response instead of raising an exception. This would cause an AttributeError later when get_epoch_for_slot accesses epoch_schedule.first_normal_slot, silently skipping all features in that cluster without any useful error message.

Consider adding a guard:

epoch_schedule = (await connection.get_epoch_schedule()).value
if epoch_schedule is None:
    print(f"[{cluster_name}] Failed to fetch epoch schedule, skipping cluster.")
    return
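
To show why the guard matters, here is a sketch of a slot-to-epoch conversion that reads `first_normal_slot`. `EpochScheduleStub` and `epoch_for_slot` are hypothetical names for this example, and the arithmetic assumes the usual post-warmup formula; warmup-period slots are deliberately not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpochScheduleStub:
    """Hypothetical mirror of the schedule fields used below."""
    slots_per_epoch: int
    first_normal_epoch: int
    first_normal_slot: int


def epoch_for_slot(schedule: Optional[EpochScheduleStub], slot: int) -> Optional[int]:
    # Guard first: without it, schedule.first_normal_slot raises AttributeError on None.
    if schedule is None:
        return None
    if slot < schedule.first_normal_slot:
        raise ValueError("warmup-period slots are not handled in this sketch")
    elapsed = slot - schedule.first_normal_slot
    return schedule.first_normal_epoch + elapsed // schedule.slots_per_epoch
```

Returning early on a missing schedule gives the caller a clear signal instead of a failure deep inside the conversion.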

@Woody4618 Woody4618 changed the title Added rate limits and fixed the deprecated PR token Fix out of data feature gate schedule Mar 16, 2026
vercel bot commented Mar 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| explorer | Ready | Preview, Comment | Mar 16, 2026 4:30pm |

@Woody4618
Collaborator Author

@greptile-apps please update your review summary

else:
print(feature['key'], "initialized, but not activated")
async with AsyncClient(MAINNET_RPC_URL) as connection:
epoch_schedule = (await connection.get_epoch_schedule()).value

Missing None guard for epoch_schedule

(await connection.get_epoch_schedule()).value can return None if the RPC node returns an error response without raising an exception. When epoch_schedule is None, any subsequent call to get_epoch_for_slot(epoch_schedule, activation_slot) at line 79 will raise AttributeError: 'NoneType' object has no attribute 'first_normal_slot', causing the entire mainnet activation fetch to crash silently.

The equivalent fix was correctly applied in parse_feature_gates.py (lines 213–216) but was not carried over to this file.

Suggested change
- epoch_schedule = (await connection.get_epoch_schedule()).value
+ epoch_schedule = (await connection.get_epoch_schedule()).value
+ if epoch_schedule is None:
+     print("Failed to fetch epoch schedule from mainnet, skipping.")
+     return

@Woody4618 Woody4618 merged commit 21e11e8 into solana-foundation:master Mar 16, 2026
8 checks passed