Fix out of data feature gate schedule #879
Woody4618 merged 9 commits into solana-foundation:master from
Conversation
@Woody4618 is attempting to deploy a commit to the Solana Foundation Team on Vercel. A member of the Team first needs to authorize it.
Greptile Summary

This PR refreshes the stale feature gate schedule. Key changes:

One notable gap: the

Confidence Score: 3/5
Important Files Changed
Sequence Diagram

```mermaid
sequenceDiagram
    participant WF as GitHub Actions Workflow
    participant PFG as parse_feature_gates.py
    participant FMA as fetch_mainnet_activations.py
    participant Wiki as Agave Wiki (GitHub)
    participant DEV as Devnet RPC
    participant TEST as Testnet RPC
    participant MAIN as Mainnet RPC
    participant JSON as featureGates.json
    WF->>PFG: python scripts/parse_feature_gates.py
    PFG->>Wiki: GET Feature-Gate-Tracker-Schedule.md
    Wiki-->>PFG: Markdown tables (tables 1, 2, 3)
    PFG->>JSON: Load existing features
    JSON-->>PFG: existing_features[]
    PFG->>DEV: AsyncClient (single shared connection)
    DEV-->>PFG: epoch_schedule
    loop Each feature (with retry + backoff)
        PFG->>DEV: get_account_info(key)
        DEV-->>PFG: account data
    end
    PFG->>TEST: AsyncClient (single shared connection)
    TEST-->>PFG: epoch_schedule
    loop Each feature (with retry + backoff)
        PFG->>TEST: get_account_info(key)
        TEST-->>PFG: account data
    end
    PFG->>JSON: Write updated features
    WF->>FMA: python scripts/fetch_mainnet_activations.py
    FMA->>JSON: Load features (only devnet+testnet activated, mainnet pending)
    FMA->>MAIN: AsyncClient (single shared connection)
    MAIN-->>FMA: epoch_schedule
    loop Each candidate feature (with retry + backoff)
        FMA->>MAIN: get_account_info(key)
        MAIN-->>FMA: account data
    end
    FMA->>JSON: Write mainnet_activation_epoch
    WF->>WF: peter-evans/create-pull-request
```

Last reviewed commit: ebc540e
scripts/parse_feature_gates.py
Outdated
```python
connection = AsyncClient(cluster_url)
epoch_schedule = (await connection.get_epoch_schedule()).value
cluster_name = cluster_url.split('.')[-2]
```
Cluster name extraction is fragile
cluster_url.split('.')[-2] extracts "solana" from "https://api.devnet.solana.com", not "devnet" or "testnet" as likely intended. This means the log message on line 217 will print [solana] Checked ... for both clusters, making it hard to distinguish which cluster is being queried.
Consider using a more reliable extraction:
```python
# current
cluster_name = cluster_url.split('.')[-2]
# suggested
cluster_name = [part for part in cluster_url.split('.') if part in ('devnet', 'testnet', 'mainnet-beta')][0] if any(x in cluster_url for x in ('devnet', 'testnet', 'mainnet-beta')) else cluster_url
```

Or more simply:

```python
cluster_name = "devnet" if "devnet" in cluster_url else "testnet" if "testnet" in cluster_url else cluster_url
```
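As a quick sanity check, the simpler one-liner can be sketched as a standalone helper (the function name is illustrative, not the script's actual code):

```python
def cluster_name_from_url(url: str) -> str:
    # Simpler extraction suggested in the review: match on a known cluster
    # substring and fall back to the raw URL for anything unexpected.
    if "devnet" in url:
        return "devnet"
    if "testnet" in url:
        return "testnet"
    return url

print(cluster_name_from_url("https://api.devnet.solana.com"))   # → devnet
print(cluster_name_from_url("https://api.testnet.solana.com"))  # → testnet
```

Note that a mainnet URL such as `https://api.mainnet-beta.solana.com` contains neither substring, so it falls through to the raw URL, which still makes the log lines distinguishable.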
```yaml
uses: peter-evans/create-pull-request@v7
with:
  branch-token: ${{ secrets.GITHUB_TOKEN }}
  token: ${{ secrets.GITHUB_TOKEN }}
```
branch-token was correct for v7
According to the peter-evans/create-pull-request v7 release notes, git-token was renamed to branch-token in v7. The token input controls which token is used for API calls (creating/updating the PR), while branch-token controls git push operations on the branch. When branch-token is omitted, it defaults to token, so this change won't break anything — but the original branch-token was the correct and more explicit v7 parameter name, not a deprecated one.
If the intent was to fix "the deprecated PR token", the original branch-token was already correct for v7. Consider reverting this change if you want explicit control over the git operations token.
@greptile-apps please update your review
scripts/parse_feature_gates.py
Outdated
```python
connection = AsyncClient(cluster_url)
epoch_schedule = (await connection.get_epoch_schedule()).value
```
AsyncClient connection never closed
The AsyncClient (backed by an aiohttp session) is created but never explicitly closed. If get_epoch_schedule() raises an exception, the connection leaks. Python will log ResourceWarning: Unclosed client session and, on repeated CI runs, the accumulated open handles can cause failures.
Use an async context manager or an explicit finally block:
```python
connection = AsyncClient(cluster_url)
try:
    epoch_schedule = (await connection.get_epoch_schedule()).value
    ...
finally:
    await connection.close()
```

```python
account = await connection.get_account_info(Pubkey.from_string(key))
break
except Exception as e:
    if '429' in str(e) and attempt < MAX_RETRIES - 1:
```
Rate-limit detection relies on string matching
'429' in str(e) is fragile — it will also match any exception whose message happens to contain the substring "429" (e.g. a public key that happens to include those digits, or a different HTTP error like 4290). The same pattern is repeated in fetch_mainnet_activations.py at line 58.
Consider checking the actual exception type or HTTP status code. If the Solana RPC client wraps HTTP errors in a typed exception, match on that instead. As a minimal improvement, anchoring the string is safer:
```python
if ('429' in str(e) or 'Too Many Requests' in str(e)) and attempt < MAX_RETRIES - 1:
```

At minimum, document the assumption about the exception string format with a comment so future readers know why this string check is used.
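A slightly more robust variant can be sketched as a standalone helper (the name and regex are assumptions, not the script's actual code): anchoring "429" on word boundaries avoids matching a key or an unrelated code like 4290.

```python
import re

def is_rate_limited(exc: Exception) -> bool:
    # Match "429" only as a standalone token (e.g. "HTTP status 429"),
    # plus the textual reason phrase, instead of a bare substring search.
    msg = str(exc)
    return bool(re.search(r"\b429\b", msg)) or "Too Many Requests" in msg

print(is_rate_limited(Exception("429 Client Error")))        # → True
print(is_rate_limited(Exception("error code 4290")))         # → False
print(is_rate_limited(Exception("key abc429def not found"))) # → False
```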
```python
print(f"Failed to fetch {key}: {e}")
return backup_epoch
```

```python
if account.value and account.value.data:
```
Maybe it makes sense to add a guard if account is None here as well
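A minimal sketch of that guard, using a stand-in response object (the real RPC response type differs):

```python
from types import SimpleNamespace
from typing import Any, Optional

def has_account_data(account: Optional[Any]) -> bool:
    # Treat a missing response (None) the same as an empty account,
    # instead of only checking account.value and account.value.data.
    return account is not None and account.value is not None and bool(account.value.data)

print(has_account_data(None))                                                  # → False
print(has_account_data(SimpleNamespace(value=None)))                           # → False
print(has_account_data(SimpleNamespace(value=SimpleNamespace(data=b"\x01"))))  # → True
```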
scripts/fetch_mainnet_activations.py
Outdated
I think it makes sense to move the RPC address to a constant, to make it easier to change for #844
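A sketch of what that could look like (the constant and environment variable names are assumptions, not the script's actual code):

```python
import os

# Hoist the endpoint into one module-level constant so a later change (such as
# the one proposed for #844) only has to touch one place; allow an env-var
# override for private RPC providers.
MAINNET_RPC_URL = os.environ.get("MAINNET_RPC_URL", "https://api.mainnet-beta.solana.com")
```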
```python
async def fetch_cluster_activations(cluster_url: str, features_to_check: list[tuple[StoredFeature, Feature]]) -> None:
    """Fetch activation epochs for all features from a single cluster, reusing one connection."""
```
I'm not an expert in Python, but would the connection be closed after the function finishes its work? We create a connection to the endpoint with AsyncClient, so at the end of the function we have to close it, right?
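To the question above: with a bare `AsyncClient(...)` you do have to close it yourself, but `async with` does it automatically, even when the body raises. A self-contained sketch, with a dummy client standing in for the real one:

```python
import asyncio

class DummyClient:
    """Stand-in for an RPC client; __aexit__ plays the role of close()."""
    def __init__(self) -> None:
        self.closed = False
    async def __aenter__(self) -> "DummyClient":
        return self
    async def __aexit__(self, *exc) -> bool:
        self.closed = True      # connection cleanup happens here
        return False            # don't swallow the exception

async def main() -> bool:
    client = DummyClient()
    try:
        async with client:
            raise RuntimeError("simulated RPC failure")
    except RuntimeError:
        pass
    return client.closed        # cleanup ran despite the exception

print(asyncio.run(main()))  # → True
```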
…618/explorer into fix-feature-gate-schedule
@greptile-apps please update your review
```python
for existing, new_feature in features_to_check:
    if 'devnet' in cluster_url:
        existing.devnet_activation_epoch = await fetch_activation_epoch(
            connection, epoch_schedule, existing.key, new_feature.devnet_activation_epoch
        )
    elif 'testnet' in cluster_url:
        existing.testnet_activation_epoch = await fetch_activation_epoch(
            connection, epoch_schedule, existing.key, new_feature.testnet_activation_epoch
        )
```
On-chain value overwritten by wiki fallback on failure
When fetch_activation_epoch fails (all retries exhausted), it returns backup_epoch which is new_feature.devnet_activation_epoch / new_feature.testnet_activation_epoch — the value scraped from the Agave wiki markdown. However, existing.devnet_activation_epoch / existing.testnet_activation_epoch already holds the previously-stored, on-chain–verified epoch from a prior successful run. Overwriting it with the wiki value on failure means a transient RPC outage can silently regress accurate on-chain data to the less-authoritative wiki value.
The backup should be the existing stored value, not the wiki value:
```python
if 'devnet' in cluster_url:
    existing.devnet_activation_epoch = await fetch_activation_epoch(
        connection, epoch_schedule, existing.key, existing.devnet_activation_epoch
    )
elif 'testnet' in cluster_url:
    existing.testnet_activation_epoch = await fetch_activation_epoch(
        connection, epoch_schedule, existing.key, existing.testnet_activation_epoch
    )
```

Note that fetch_mainnet_activations.py correctly handles this with `if account is None: continue`, preserving the stored value on failure. The same pattern should be applied here.
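The intended fallback semantics can be sketched independently of the RPC client (names are illustrative, not the script's actual code):

```python
import asyncio
from typing import Awaitable, Callable, Optional

async def epoch_or_existing(fetch: Callable[[], Awaitable[Optional[int]]],
                            existing_epoch: Optional[int]) -> Optional[int]:
    # On any failure, keep the previously stored (on-chain verified) epoch
    # rather than regressing to the wiki-scraped value.
    try:
        return await fetch()
    except Exception:
        return existing_epoch

async def flaky() -> int:
    raise ConnectionError("transient RPC outage")

print(asyncio.run(epoch_or_existing(flaky, 512)))  # → 512
```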
```python
cluster_name = "devnet" if "devnet" in cluster_url else "testnet" if "testnet" in cluster_url else cluster_url

async with AsyncClient(cluster_url) as connection:
    epoch_schedule = (await connection.get_epoch_schedule()).value
```
No guard against get_epoch_schedule() returning None
(await connection.get_epoch_schedule()).value may return None if the RPC node returns an error response instead of raising an exception. This would cause an AttributeError later when get_epoch_for_slot accesses epoch_schedule.first_normal_slot, silently skipping all features in that cluster without any useful error message.
Consider adding a guard:
```python
epoch_schedule = (await connection.get_epoch_schedule()).value
if epoch_schedule is None:
    print(f"[{cluster_name}] Failed to fetch epoch schedule, skipping cluster.")
    return
```
The latest updates on your projects. Learn more about Vercel for GitHub.
@greptile-apps please update your review summary
```python
else:
    print(feature['key'], "initialized, but not activated")

async with AsyncClient(MAINNET_RPC_URL) as connection:
    epoch_schedule = (await connection.get_epoch_schedule()).value
```
Missing None guard for epoch_schedule
(await connection.get_epoch_schedule()).value can return None if the RPC node returns an error response without raising an exception. When epoch_schedule is None, any subsequent call to get_epoch_for_slot(epoch_schedule, activation_slot) at line 79 will raise AttributeError: 'NoneType' object has no attribute 'first_normal_slot', causing the entire mainnet activation fetch to crash silently.
The equivalent fix was correctly applied in parse_feature_gates.py (lines 213–216) but was not carried over to this file.
```python
epoch_schedule = (await connection.get_epoch_schedule()).value
if epoch_schedule is None:
    print("Failed to fetch epoch schedule from mainnet, skipping.")
    return
```
Description

The feature gate schedule was six months out of date.

In the action I triggered I saw rate limits, so I added some delays and ran it locally.

Also, the PR docs said the GitHub token parameter should be `token`, not `branch-token`, so I changed that.

These should run every night. Let's monitor whether it succeeds tomorrow, but we have up-to-date data right now.
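The "delays" mentioned above are essentially retry-with-backoff; a generic sketch of the pattern (names and parameters are illustrative, not the workflow's actual code):

```python
import asyncio

async def with_backoff(fn, max_retries: int = 5, base_delay: float = 0.5):
    # Retry fn() with exponential backoff; re-raise after the last attempt.
    for attempt in range(max_retries):
        try:
            return await fn()
        except Exception:
            if attempt == max_retries - 1:
                raise
            await asyncio.sleep(base_delay * (2 ** attempt))

# Example: a call that succeeds on the third attempt.
calls = {"n": 0}
async def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("429 Too Many Requests")
    return "ok"

print(asyncio.run(with_backoff(flaky, base_delay=0.01)))  # → ok
```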