Conversation
Reviewed latest changes (5a09a0a). All previously flagged issues remain resolved. No new issues found.
Mention @roomote in a comment to request specific changes to this pull request or fix all unresolved issues.
Force-pushed from d05f61c to 1798915.
Pull request overview
This PR integrates the audit module from the Lumera blockchain (v1.11.0-rc) into the Supernode, introducing epoch-based storage challenge functionality and host reporting capabilities. The changes replace the legacy metrics reporting system with a new audit-driven architecture that submits evidence and epoch reports on-chain.
Changes:
- Upgraded Lumera dependency from v1.10.0 to v1.11.0-rc across all modules
- Added audit module integration (query and message modules) to the Lumera client
- Implemented storage challenge service with deterministic selection, gRPC handlers, and evidence submission
- Added host reporter service for epoch-scoped reporting (replacing legacy metrics collector)
- Introduced cascade client failure evidence submission in the SDK
Reviewed changes
Copilot reviewed 46 out of 50 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| go.mod, tests/system/go.mod, sn-manager/go.mod, cmd/sncli/go.mod | Bumped lumera dependency to v1.11.0-rc |
| go.sum, tests/system/go.sum, sn-manager/go.sum, cmd/sncli/go.sum | Updated checksums for new lumera version |
| pkg/lumera/interface.go, pkg/lumera/client.go | Added Audit() and AuditMsg() module accessors |
| pkg/lumera/modules/audit/*.go | New audit query module implementation |
| pkg/lumera/modules/audit_msg/*.go | New audit transaction module with evidence/report submission |
| supernode/storage_challenge/service.go | Core storage challenge service with epoch-driven execution |
| supernode/storage_challenge/README.md | Documentation for storage challenge protocol |
| supernode/transport/grpc/storage_challenge/handler.go | gRPC handlers for GetSliceProof and VerifySliceProof |
| supernode/host_reporter/service.go | Epoch report submission with port probing |
| pkg/storagechallenge/deterministic/*.go | Deterministic selection algorithms for challengers/files |
| supernode/config/*.go, supernode/config.yml | Added StorageChallengeConfig with defaults |
| supernode/cmd/start.go | Replaced metrics collector with host reporter and storage challenge services |
| sdk/task/evidence.go, sdk/adapters/lumera/adapter.go | Cascade failure evidence submission |
| proto/supernode/storage_challenge.proto | Protobuf definitions for storage challenge service |
| gen/supernode/storage_challenge*.go | Generated gRPC and protobuf code |
| pkg/testutil/lumera.go | Added mock audit modules to test utilities |
| tests/integration/securegrpc/secure_connection_test.go | Fixed test isolation by moving keyring creation into loop |
| .github/workflows/*.yml, .github/actions/setup-env/action.yml | Commented out bust_lumera_retag parameter |
| Makefile | Added storage_challenge.proto to generation targets |
// parseHostAndPort parses a "host" or "host:port" string and returns a host and port.
// If a port is not present, defaultPort is returned. If a port is present but invalid,
func parseHostAndPort(address string, defaultPort int) (host string, port int, ok bool) {
	address = strings.TrimSpace(address)
	if address == "" {
		return "", 0, false
	}

	// If it looks like a URL, parse and use the host[:port] portion.
	if u, err := url.Parse(address); err == nil && u.Host != "" {
		address = u.Host
	}

	if h, p, err := net.SplitHostPort(address); err == nil {
		h = strings.TrimSpace(h)
		if h == "" {
			return "", 0, false
		}
		if n, err := strconv.Atoi(p); err == nil && n > 0 && n <= 65535 {
			return h, n, true
		}
		return h, defaultPort, true
	}

	// No port present; return default.
	return address, defaultPort, true
}
In parseHostAndPort, when net.SplitHostPort returns an error (line 508), the function still uses address as the host (line 520). However, if the address contains invalid characters or is otherwise malformed, this could lead to unexpected behavior. While the caller performs additional validation, consider adding a check that the address is a valid host before returning it.
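A minimal sketch of the kind of pre-return check this comment suggests, using only the standard library. The helper name `isPlausibleHost`, the package name, and the accepted character set are illustrative assumptions, not the project's existing validation rules:

```go
package hostreporter // hypothetical package name, for illustration only

import "net"

// isPlausibleHost accepts literal IPs and simple DNS-style names and
// rejects strings containing characters a dialer would not accept as a
// host. This is a sketch of a validity check, not the project's policy.
func isPlausibleHost(h string) bool {
	if net.ParseIP(h) != nil {
		return true
	}
	if h == "" {
		return false
	}
	for _, r := range h {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
		case r == '.' || r == '-':
		default:
			return false
		}
	}
	return true
}
```

parseHostAndPort could call this on the host before each `return ..., true` path and fall back to `return "", 0, false` when it fails.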
@@ -33,17 +33,17 @@ runs:
         sudo apt-get update
         sudo apt-get install -y libwebp-dev make

-    - name: One-time reset retagged lumera checksums
-      if: ${{ inputs.bust_lumera_retag == 'true' }}
-      shell: bash
-      run: |
-        echo "Busting go.sum entries for github.com/LumeraProtocol/lumera v1.8.0 (one-time)"
-        # Remove stale checksums in all local modules
-        find . -name 'go.sum' -maxdepth 3 -print0 | xargs -0 -I{} sed -i \
-          '/github.com\/LumeraProtocol\/lumera v1.8.0/d' {}
-        # Clear module/build caches to avoid cached zips
-        go clean -modcache || true
-        rm -rf "$(go env GOCACHE)" || true
+    # - name: One-time reset retagged lumera checksums
+    #   if: ${{ inputs.bust_lumera_retag == 'true' }}
+    #   shell: bash
+    #   run: |
+    #     echo "Busting go.sum entries for github.com/LumeraProtocol/lumera v1.11.0-rc (one-time)"
+    #     # Remove stale checksums in all local modules
+    #     find . -name 'go.sum' -maxdepth 3 -print0 | xargs -0 -I{} sed -i \
+    #       '/github.com\/LumeraProtocol\/lumera v1.11.0-rc/d' {}
+    #     # Clear module/build caches to avoid cached zips
+    #     go clean -modcache || true
+    #     rm -rf "$(go env GOCACHE)" || true
The bust_lumera_retag input parameter has been commented out but not removed entirely. This suggests it may still be needed for future use. Consider either fully removing it if it's no longer needed, or adding a comment explaining why it's temporarily disabled and under what conditions it should be re-enabled.
func (t *BaseTask) submitCascadeClientFailureEvidence(
	ctx context.Context,
	subjectAddress string,
	targetSupernodeAccounts []string,
	details map[string]string,
) {
	subjectAddress = strings.TrimSpace(subjectAddress)
	if subjectAddress == "" {
		return
	}

	submitter, ok := any(t.client).(cascadeClientFailureEvidenceSubmitter)
	if !ok {
		t.logger.Debug(ctx, "Cascade client failure evidence submitter not configured")
		return
	}

	if details == nil {
		details = map[string]string{}
	}
	if _, exists := details["task_id"]; !exists {
		details["task_id"] = t.TaskID
	}
	if _, exists := details["action_id"]; !exists {
		details["action_id"] = t.ActionID
	}

	targetsCopy := append([]string(nil), targetSupernodeAccounts...)
	detailsCopy := make(map[string]string, len(details))
	for k, v := range details {
		detailsCopy[k] = v
	}

	// Evidence submission should not block retry loops.
	go func(parent context.Context, subject string, actionID string, targets []string, metadata map[string]string) {
		submitCtx, cancel := context.WithTimeout(context.WithoutCancel(parent), cascadeEvidenceSubmitTimeout)
		defer cancel()

		if err := submitter.SubmitCascadeClientFailureEvidence(
			submitCtx,
			subject,
			actionID,
			targets,
			metadata,
		); err != nil {
			t.logger.Warn(submitCtx, "Failed to submit cascade client failure evidence",
				"subject_address", subject,
				"targets", targets,
				"error", err,
			)
		}
	}(ctx, subjectAddress, t.ActionID, targetsCopy, detailsCopy)
}
The submitCascadeClientFailureEvidence function launches a goroutine with context.WithoutCancel(parent) to ensure evidence submission completes even if the parent context is cancelled. However, this could lead to goroutine leaks if the submission context never completes (e.g., network issues). The 10-second timeout provides some protection, but consider adding monitoring or limiting the number of concurrent evidence submissions to prevent resource exhaustion in high-failure scenarios.
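One hedged way to bound the fire-and-forget goroutines is a buffered-channel semaphore. This is a standalone sketch; `maxInFlightEvidence`, `evidenceSem`, and `submitBounded` are illustrative names that do not exist in the SDK, and the real code would log through t.logger rather than the standard library logger used here for self-containment:

```go
package evidence // illustrative standalone sketch

import (
	"context"
	"log"
	"time"
)

// maxInFlightEvidence caps concurrent submissions; the value is an assumption.
const maxInFlightEvidence = 8

var evidenceSem = make(chan struct{}, maxInFlightEvidence)

// submitBounded starts submit in a goroutine only if a semaphore slot is
// free, so a storm of failures cannot accumulate unbounded goroutines.
// Excess submissions are dropped (and logged) rather than queued.
func submitBounded(parent context.Context, timeout time.Duration, submit func(context.Context) error) {
	select {
	case evidenceSem <- struct{}{}:
	default:
		log.Println("cascade evidence submission skipped: too many in flight")
		return
	}
	go func() {
		defer func() { <-evidenceSem }()
		ctx, cancel := context.WithTimeout(context.WithoutCancel(parent), timeout)
		defer cancel()
		if err := submit(ctx); err != nil {
			log.Println("cascade evidence submission failed:", err)
		}
	}()
}
```

submitCascadeClientFailureEvidence could wrap its existing goroutine body in such a helper; the 10-second timeout still applies, and the semaphore only limits how many of those timeouts can be pending at once.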
func (s *Service) Run(ctx context.Context) error {
	if !s.cfg.Enabled {
		<-ctx.Done()
		return nil
	}

	if err := s.initClients(); err != nil {
		return err
	}

	// Effective knobs (production defaults). Jitter is bounded by the epoch length
	// to avoid sleeping past the epoch window on short epochs.
	lookbackEpochs := scCandidateKeysLookbackEpochs
	respTimeout := scResponseTimeout
	affirmTimeout := scAffirmationTimeout
	logtrace.Debug(ctx, "storage challenge runtime knobs", logtrace.Fields{
		"start_jitter_ms":         scStartJitterMs,
		"response_timeout_ms":     respTimeout.Milliseconds(),
		"affirmation_timeout_ms":  affirmTimeout.Milliseconds(),
		"submit_evidence_config":  s.cfg.SubmitEvidence,
		"poll_interval_ms":        s.cfg.PollInterval.Milliseconds(),
		"sc_files_per_challenger": scFilesPerChallenger,
		"sc_replica_count":        scReplicaCount,
		"sc_observer_threshold":   scObserverThreshold,
		"sc_keys_lookback_epochs": lookbackEpochs,
	})

	ticker := time.NewTicker(s.cfg.PollInterval)
	defer ticker.Stop()

	var lastRunEpoch uint64
	var lastRunOK bool
	var loggedAlreadyRanEpoch uint64
	var loggedNotSelectedEpoch uint64
	var loggedDisabledEpoch uint64

	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			height, ok := s.latestHeight(ctx)
			if !ok {
				continue
			}

			params, ok := s.auditParams(ctx)
			if !ok {
				continue
			}

			epochID, ok := deterministic.EpochID(height, params.EpochZeroHeight, params.EpochLengthBlocks)
			if !ok {
				continue
			}
			if !params.ScEnabled {
				if loggedDisabledEpoch != epochID {
					logtrace.Debug(ctx, "storage challenge disabled by on-chain params", logtrace.Fields{"epoch_id": epochID})
					loggedDisabledEpoch = epochID
				}
				lastRunEpoch = epochID
				lastRunOK = true
				continue
			}
			if lastRunOK && lastRunEpoch == epochID {
				if loggedAlreadyRanEpoch != epochID {
					logtrace.Debug(ctx, "storage challenge already ran this epoch; skipping", logtrace.Fields{"epoch_id": epochID})
					loggedAlreadyRanEpoch = epochID
				}
				continue
			}

			anchorResp, err := s.lumera.Audit().GetEpochAnchor(ctx, epochID)
			if err != nil || anchorResp == nil || anchorResp.Anchor.EpochId != epochID {
				// Anchor may not be committed yet at epoch boundary; retry on next tick.
				continue
			}
			anchor := anchorResp.Anchor

			challengers := deterministic.SelectChallengers(anchor.ActiveSupernodeAccounts, anchor.Seed, epochID, params.ScChallengersPerEpoch)
			if !containsString(challengers, s.identity) {
				if loggedNotSelectedEpoch != epochID {
					logtrace.Debug(ctx, "storage challenge: not selected challenger; skipping", logtrace.Fields{
						"epoch_id": epochID,
						"identity": s.identity,
						"selected": len(challengers),
						"sc_param": params.ScChallengersPerEpoch,
					})
					loggedNotSelectedEpoch = epochID
				}
				lastRunEpoch = epochID
				lastRunOK = true
				continue
			}

			// Bound jitter by a conservative estimate of epoch duration (assume ~1s blocks).
			// This is intentionally simple and is primarily to avoid sleeping past the epoch window.
			jitterMaxMs := scStartJitterMs
			epochBudgetMs := uint64(params.EpochLengthBlocks) * 1000
			if epochBudgetMs > 0 && epochBudgetMs/2 < jitterMaxMs {
				jitterMaxMs = epochBudgetMs / 2
			}

			jitterMs := deterministic.DeterministicJitterMs(anchor.Seed, epochID, s.identity, jitterMaxMs)
			if jitterMs > 0 {
				logtrace.Debug(ctx, "storage challenge jitter sleep", logtrace.Fields{
					"epoch_id":      epochID,
					"jitter_ms":     jitterMs,
					"jitter_max_ms": jitterMaxMs,
					"challenger_id": s.identity,
				})
				timer := time.NewTimer(time.Duration(jitterMs) * time.Millisecond)
				select {
				case <-ctx.Done():
					timer.Stop()
					return ctx.Err()
				case <-timer.C:
				}
			}

			if err := s.runEpoch(ctx, anchor, params, lookbackEpochs, respTimeout, affirmTimeout); err != nil {
				logtrace.Warn(ctx, "storage challenge epoch run error", logtrace.Fields{
					"epoch_id": epochID,
					"error":    err.Error(),
				})
				lastRunEpoch = epochID
				lastRunOK = false
				continue
			}

			lastRunEpoch = epochID
			lastRunOK = true
		}
	}
}
The storage challenge service uses lastRunOK and lastRunEpoch variables to track whether a challenge has already run for an epoch (lines 155-156). However, these variables are not protected by any synchronization mechanism. If the Run method could be called concurrently (though it appears to be designed for single-threaded use), this could lead to race conditions. Consider documenting that Run must not be called concurrently, or add synchronization if concurrent calls are expected.
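If concurrent calls are not expected, a doc comment plus an atomic guard makes that contract explicit and cheap to enforce. A sketch, assuming a new `running atomic.Bool` field on the service (not present in the code today); the loop body is elided:

```go
package storagechallenge // illustrative sketch; field name and error text are assumptions

import (
	"context"
	"errors"
	"sync/atomic"
)

type Service struct {
	running atomic.Bool
	// ... existing fields elided
}

// Run executes the epoch-driven challenge loop. It must not be called
// concurrently; a second overlapping call fails fast instead of racing
// on lastRunEpoch/lastRunOK.
func (s *Service) Run(ctx context.Context) error {
	if !s.running.CompareAndSwap(false, true) {
		return errors.New("storage challenge: Run already in progress")
	}
	defer s.running.Store(false)

	// ... the existing ticker loop would run here; for this sketch we
	// simply block until the context is cancelled.
	<-ctx.Done()
	return ctx.Err()
}
```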
}

func RegisterStorageChallengeServiceServer(s grpc.ServiceRegistrar, srv StorageChallengeServiceServer) {
	// If the following call pancis, it indicates UnimplementedStorageChallengeServiceServer was
Typo in generated code comment: "pancis" should be "panics". This appears to be in generated code from protoc-gen-go-grpc. Consider updating the generator version or reporting this upstream if it persists.
Suggested change:
-	// If the following call pancis, it indicates UnimplementedStorageChallengeServiceServer was
+	// If the following call panics, it indicates UnimplementedStorageChallengeServiceServer was
}

// parseHostAndPort parses a "host" or "host:port" string and returns a host and port.
// If a port is not present, defaultPort is returned. If a port is present but invalid,
The function comment states "If a port is not present, defaultPort is returned. If a port is present but invalid," but the sentence is incomplete. The comment should be completed to explain what happens when a port is present but invalid (the code shows it returns the host with defaultPort).
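Based on the quoted implementation, one possible completion of the doc comment (the wording is a suggestion, not the author's):

```go
// parseHostAndPort parses a "host" or "host:port" string and returns a host and port.
// If a port is not present, defaultPort is returned. If a port is present but invalid
// (non-numeric, zero, or greater than 65535), the host is returned with defaultPort.
// ok is false only when the input is empty or the host part of "host:port" is empty.
```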
func (s *Server) persistRecipientProof(ctx context.Context, req *supernode.GetSliceProofRequest, resp *supernode.GetSliceProofResponse) {
	if s.store == nil {
		return
	}

	challenge := types.MessageData{
		ChallengerID: req.ChallengerId,
		RecipientID:  req.RecipientId,
		Observers:    append([]string(nil), req.ObserverIds...),
		Challenge: types.ChallengeData{
			FileHash:   req.FileKey,
			StartIndex: int(req.RequestedStart),
			EndIndex:   int(req.RequestedEnd),
			Timestamp:  time.Now().UTC(),
		},
	}
	challengeBz, _ := json.Marshal(challenge)
	_ = s.store.InsertStorageChallengeMessage(types.StorageChallengeLogMessage{
		MessageType:     int(types.ChallengeMessageType),
		ChallengeID:     req.ChallengeId,
		Data:            challengeBz,
		Sender:          s.identity,
		SenderSignature: []byte{},
	})

	response := types.MessageData{
		ChallengerID: req.ChallengerId,
		RecipientID:  req.RecipientId,
		Observers:    append([]string(nil), req.ObserverIds...),
		Response: types.ResponseData{
			Hash:      resp.ProofHashHex,
			Timestamp: time.Now().UTC(),
		},
	}
	responseBz, _ := json.Marshal(response)
	_ = s.store.InsertStorageChallengeMessage(types.StorageChallengeLogMessage{
		MessageType:     int(types.ResponseMessageType),
		ChallengeID:     req.ChallengeId,
		Data:            responseBz,
		Sender:          s.identity,
		SenderSignature: []byte{},
	})

	logtrace.Debug(ctx, "storage challenge proof served", logtrace.Fields{
		"challenge_id": req.ChallengeId,
		"file_key":     req.FileKey,
		"start":        resp.Start,
		"end":          resp.End,
	})
}

func (s *Server) persistObserverVerification(ctx context.Context, req *supernode.VerifySliceProofRequest, resp *supernode.VerifySliceProofResponse) {
	if s.store == nil {
		return
	}

	eval := types.MessageData{
		ChallengerID: req.ChallengerId,
		RecipientID:  req.RecipientId,
		Observers:    []string{s.identity},
		ObserverEvaluation: types.ObserverEvaluationData{
			IsEvaluationResultOK: resp.Ok,
			Reason:               resp.Error,
			TrueHash:             req.ProofHashHex,
			Timestamp:            time.Now().UTC(),
		},
	}
	bz, _ := json.Marshal(eval)
	_ = s.store.InsertStorageChallengeMessage(types.StorageChallengeLogMessage{
		MessageType:     int(types.AffirmationMessageType),
		ChallengeID:     req.ChallengeId,
		Data:            bz,
		Sender:          s.identity,
		SenderSignature: []byte{},
	})

	logtrace.Debug(ctx, "storage challenge proof verified", logtrace.Fields{
		"challenge_id": req.ChallengeId,
		"ok":           resp.Ok,
	})
}
Error handling in persistRecipientProof and persistObserverVerification silently ignores errors from json.Marshal and InsertStorageChallengeMessage. While these operations are best-effort logging, consider at least logging these errors at debug level for troubleshooting purposes, similar to how other errors are handled in this codebase.
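A sketch of the debug-level handling this suggests, reusing the logtrace.Fields pattern already present in the handler. The helper itself is hypothetical; the identifiers it relies on (s.store, s.identity, types, logtrace, json) are taken from the quoted code, so this is a fragment rather than a standalone program:

```go
// persistOrLog is a hypothetical helper for the handler: it marshals a
// message, stores it, and logs failures at debug level instead of
// silently discarding them.
func (s *Server) persistOrLog(ctx context.Context, challengeID string, msgType int, payload any) {
	bz, err := json.Marshal(payload)
	if err != nil {
		logtrace.Debug(ctx, "storage challenge: marshal log message failed", logtrace.Fields{
			"challenge_id": challengeID,
			"error":        err.Error(),
		})
		return
	}
	if err := s.store.InsertStorageChallengeMessage(types.StorageChallengeLogMessage{
		MessageType:     msgType,
		ChallengeID:     challengeID,
		Data:            bz,
		Sender:          s.identity,
		SenderSignature: []byte{},
	}); err != nil {
		logtrace.Debug(ctx, "storage challenge: persist log message failed", logtrace.Fields{
			"challenge_id": challengeID,
			"error":        err.Error(),
		})
	}
}
```

persistRecipientProof and persistObserverVerification could then route their three store calls through this helper without changing the best-effort semantics.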
func (s *Service) buildStorageChallengeObservations(ctx context.Context, epochID uint64, requiredOpenPorts []uint32, targets []string) []*audittypes.StorageChallengeObservation {
	if len(targets) == 0 {
		return nil
	}

	out := make([]*audittypes.StorageChallengeObservation, len(targets))

	type workItem struct {
		index  int
		target string
	}

	work := make(chan workItem)
	done := make(chan struct{})

	worker := func() {
		defer func() { done <- struct{}{} }()
		for item := range work {
			out[item.index] = s.observeTarget(ctx, epochID, requiredOpenPorts, item.target)
		}
	}

	workers := maxConcurrentTargets
	if workers > len(targets) {
		workers = len(targets)
	}
	for i := 0; i < workers; i++ {
		go worker()
	}

	for i, t := range targets {
		work <- workItem{index: i, target: t}
	}
	close(work)

	for i := 0; i < workers; i++ {
		<-done
	}

	// ensure no nil elements (MsgSubmitEpochReport rejects nil observations)
	final := make([]*audittypes.StorageChallengeObservation, 0, len(out))
	for i := range out {
		if out[i] != nil {
			final = append(final, out[i])
		}
	}
	return final
}
The buildStorageChallengeObservations function uses goroutines to probe targets concurrently but writes to a shared out slice without synchronization. While the workers process distinct indices making this technically safe, consider using proper synchronization primitives (like sync.Mutex) or channels to make the concurrency pattern more explicit and maintainable. This would prevent potential data races if the code is refactored in the future.
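A self-contained sketch of the more explicit pattern the comment has in mind: workers send (index, value) pairs over a channel and a single collector loop owns the output slice, so the ownership rule survives refactoring. The function name and generic signature are illustrative, not the service's API:

```go
package observe // illustrative standalone sketch

import "sync"

// probeAll fans work out to at most `workers` goroutines and gathers the
// results through a channel; only the collector loop writes to out.
func probeAll[T any](targets []string, workers int, probe func(string) T) []T {
	type result struct {
		index int
		value T
	}
	if workers < 1 {
		workers = 1
	}
	if workers > len(targets) {
		workers = len(targets)
	}

	work := make(chan int)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for idx := range work {
				results <- result{index: idx, value: probe(targets[idx])}
			}
		}()
	}
	go func() {
		for i := range targets {
			work <- i
		}
		close(work)
		wg.Wait()
		close(results)
	}()

	out := make([]T, len(targets))
	for r := range results {
		out[r.index] = r.value
	}
	return out
}
```

The existing code is correct today (each index is written by exactly one worker), so this is purely about making that invariant structural rather than conventional.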
func SelectFileKeys(candidateKeys []string, seed []byte, epochID uint64, challengerID string, count uint32) []string {
	if count == 0 || len(candidateKeys) == 0 {
		return nil
	}

	keys := append([]string(nil), candidateKeys...)
	sort.Strings(keys)

	want := int(count)
	if want > len(keys) {
		want = len(keys)
	}

	out := make([]string, 0, want)
	seedHex := hex.EncodeToString(seed)

	for i := 0; i < want; i++ {
		msg := []byte("sc:files:" + challengerID + ":" + strconv.FormatUint(epochID, 10) + ":" + seedHex + ":" + strconv.Itoa(i))
		sum := blake3.Sum256(msg)
		idx := int(binary.BigEndian.Uint64(sum[0:8]) % uint64(len(keys)))
		out = append(out, keys[idx])

		// remove selected key (stable, deterministic)
		keys = append(keys[:idx], keys[idx+1:]...)
		if len(keys) == 0 {
			break
		}
	}

	return out
}
The SelectFileKeys function removes selected keys from the candidate list to avoid duplicates (line 127). However, it modifies the slice while the selection loop is running, which can be error-prone. The current implementation appears correct (it operates on a sorted copy of candidateKeys and reassigns keys after each removal), but consider adding a comment explaining this deduplication behavior to make the intent clearer for future maintainers.
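A possible wording for that comment, anchored on the removal line (a suggestion only, describing what the quoted code already does):

```go
// Remove the chosen key so later draws cannot select it again. keys is a
// sorted copy of candidateKeys (made above), so the caller's slice is not
// mutated, and removing by index preserves order, keeping the selection
// fully deterministic for a given seed/epoch/challenger.
keys = append(keys[:idx], keys[idx+1:]...)
```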
func (s *Service) tick(ctx context.Context) {
	epochResp, err := s.lumera.Audit().GetCurrentEpoch(ctx)
	if err != nil || epochResp == nil {
		return
	}
	epochID := epochResp.EpochId
	reachability.SetCurrentEpochID(epochID)

	anchorResp, err := s.lumera.Audit().GetEpochAnchor(ctx, epochID)
	if err != nil || anchorResp == nil || anchorResp.Anchor.EpochId != epochID {
		// Anchor may not be committed yet at the epoch boundary; retry on next tick.
		return
	}

	// Idempotency: if a report exists for this epoch, do nothing.
	if _, err := s.lumera.Audit().GetEpochReport(ctx, epochID, s.identity); err == nil {
		return
	} else if status.Code(err) != codes.NotFound {
		return
	}

	assignResp, err := s.lumera.Audit().GetAssignedTargets(ctx, s.identity, epochID)
	if err != nil || assignResp == nil {
		return
	}

	storageChallengeObservations := s.buildStorageChallengeObservations(ctx, epochID, assignResp.RequiredOpenPorts, assignResp.TargetSupernodeAccounts)

	hostReport := audittypes.HostReport{
		// Intentionally submit 0% usage for CPU/memory so the chain treats these as "unknown".
		// Disk usage is reported accurately (legacy-aligned) so disk-based enforcement can work.
		CpuUsagePercent: 0,
		MemUsagePercent: 0,
	}
	if diskUsagePercent, ok := s.diskUsagePercent(ctx); ok {
		hostReport.DiskUsagePercent = diskUsagePercent
	}

	if _, err := s.lumera.AuditMsg().SubmitEpochReport(ctx, epochID, hostReport, storageChallengeObservations); err != nil {
		logtrace.Warn(ctx, "epoch report submit failed", logtrace.Fields{
			"epoch_id": epochID,
			"error":    err.Error(),
		})
		return
	}

	logtrace.Info(ctx, "epoch report submitted", logtrace.Fields{
		"epoch_id":                             epochID,
		"storage_challenge_observations_count": len(storageChallengeObservations),
	})
}
The tick function queries the epoch anchor and checks for an existing report, but there's a race condition window between the idempotency check (line 120) and the report submission (line 143). If multiple instances or goroutines call tick simultaneously, they could both pass the idempotency check and attempt to submit duplicate reports. Consider using a mutex or atomic flag to prevent concurrent execution of the tick logic for the same epoch.
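A hedged sketch of a per-epoch guard; the epochGate type and its use inside tick are assumptions, not existing code:

```go
package hostreporter // illustrative standalone sketch

import "sync"

// epochGate lets exactly one caller work on a given epoch at a time and
// remembers the last epoch whose report was submitted successfully, so
// overlapping ticks cannot both pass the on-chain idempotency check.
type epochGate struct {
	mu       sync.Mutex
	inFlight bool
	current  uint64
	lastDone uint64
	haveDone bool
}

// tryBegin reports whether the caller may submit for epochID.
func (g *epochGate) tryBegin(epochID uint64) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.haveDone && g.lastDone == epochID {
		return false
	}
	if g.inFlight && g.current == epochID {
		return false
	}
	g.inFlight, g.current = true, epochID
	return true
}

// finish records the outcome; a failed submission leaves the epoch
// eligible for a retry on the next tick.
func (g *epochGate) finish(epochID uint64, ok bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.current == epochID {
		g.inFlight = false
	}
	if ok {
		g.lastDone, g.haveDone = epochID, true
	}
}
```

tick would call gate.tryBegin(epochID) right after resolving the epoch and gate.finish(epochID, err == nil) once SubmitEpochReport returns.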