feat(secrets): Add new secrets management package #797
Conversation
Force-pushed from cf118e6 to 24ec0f7
Force-pushed from 694bf26 to ef24c3d
@pintohutch @bernot-dev @bwplotka PTAL, should be ready!
bwplotka left a comment
Thanks, amazing work!
Generally, it's great. Maybe the first big question is whether we want SecretField to store so much state, or whether we want to move some of this state to the manager (and/or the providers). That might be more composable and easier to reason about; Prometheus discovery does some of this. I mentioned one idea (A), parts 1 and 2, in the comments.
However, it's not a blocker. Even in the current state, I would say we could try this out on Prometheus (and someone might try it on the AM side!). What matters is that I see a clean YAML surface format, clean code, idiomatic struct reflection to find fields, and a healthy amount of tests, so amazing! With this we can iterate.
Whether you want to try (A) or skip it for now, I think I would check these before merging:
- Limit the global variable spread (https://github.com/prometheus/common/pull/797/files#r2414525826)
- Whether we can skip the validator and the timeout-based validator state per field, if it's YAGNI (see https://github.com/prometheus/common/pull/797/files#r2414536374)
- Use testable examples: https://github.com/prometheus/common/pull/797/files#r2414499436
Then I would like to review in more depth the manager code, but generally... we could start with this! 🎉 Thanks!
```go
}

type SecretFieldSettings struct {
	RefreshInterval time.Duration `yaml:"refreshInterval,omitempty"`
```
question: Hmm, good question whether refresh is really relevant to all providers... E.g. it's not for inline and... also not for file-based? Refreshing in general is a key concept here, so we at least need good commentary in the code for this field on how it's supposed to be implemented and consumed.
Hmm, I guess it is not relevant to inline but could be relevant to file. Although we might want to think about listening to filesystem changes for a file-based provider instead; I found that complicates the API quite a bit...
Again, a comment would be nice. I am worried it's YAGNI: we don't know if refresh in this form would even be useful for other providers (we are guessing). Perhaps moving this inside the provider config is safer?
secrets/manager.go (outdated)

```go
	return secret.secret
}

func (m *Manager) triggerRefresh(s *SecretField) {
```
note to self: Manager code needs a deeper review; quite a lot of locks 🙈 prone to errors, but perhaps needed.
Yeah, it was very complicated before with the verify logic; I think it should be a bit simpler now. Still needs some cleanup there, but it should be more readable.
bwplotka left a comment
Looks solid, thanks!
Still a few comments to address, but overall it looks close to being merged!
Force-pushed from 37e45e0 to 3abfb21
bwplotka left a comment
Solid work! Did an early pass; will do a deeper review next week, thanks! 💪🏽
```diff
@@ -0,0 +1,18 @@
+# Secret Management
```
Nice!

One thing to discuss/put in the README: how does it relate to the existing `config.Secret` (`type Secret string`, config/config.go line 28 in d80d854)? What would be the plan forward? Deprecate `config.Secret`?
```go
// PolicyFunc returns true if secrets should be printed (insecure),
// and false if they should be scrubbed (secure).
type PolicyFunc func() bool
```
suggestion: We have an epic opportunity to deglobalize this and have a Field.redact bool field, with perhaps a method to set it on or off for all fields per config. WDYT?
Also, if we go with a bool, we could just do a simpler global as in https://github.com/prometheus/common/blob/d80d8544703e59a080a204b6f7429ac6561fb24f/config/config.go#L33C5-L33C23
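A rough sketch of the per-field option: a redact flag on the field itself, consulted by its Stringer/marshaler, instead of a process-global policy function. `Field` below is a stand-in, not the PR's actual type:

```go
package main

import "fmt"

// Field is a hypothetical secret field carrying its own redaction policy,
// so no global PolicyFunc is consulted at print/marshal time.
type Field struct {
	value  string
	redact bool
}

// String implements fmt.Stringer; marshalers would follow the same rule.
func (f Field) String() string {
	if f.redact {
		return "<secret>"
	}
	return f.value
}

func main() {
	f := Field{value: "hunter2", redact: true}
	fmt.Println(f) // scrubbed
	f.redact = false
	fmt.Println(f) // printed
}
```

The manager (or a config hook) could then flip `redact` on all discovered fields in one pass, giving the same "all on / all off" behavior as the global bool without the global.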
bwplotka left a comment
Thanks for this work! Solid, high quality code.
Current Algorithm
To review properly, I researched the algorithm and took notes along the way. Perhaps useful for other reviewers:
- `Unmarshal` instantiates secret fields in a raw state. It does NOT unmarshal the content of the secret YAML yet; we keep it raw in `Field.rawConfig`.
- `secrets.Manager` "registers" the unmarshalled config structure:
  - It uses value reflection (not type reflection) to gather all configured fields into a map by "path".
  - It goes through the paths (fields) and unmarshals them properly (`parseRawConfig`):
    - It uses a hack, `convertConfig`, to marshal `any` to YAML and unmarshal it back into a concrete struct (wonder if there's an alternative, but good for now I guess?).
    - For complex providers (not inline) it constructs provider data and settings:
      - It retrieves the providerConfig to unmarshal.
      - It unmarshals the providerConfig.
      - It unmarshals the settings (refresh interval, if any).
      - It packs everything into a `fieldState`.
  - It registers each secret field state as a `managedSecret` by path+provider (or updates the refresh interval).
- `Manager.Start` starts goroutines that actually fetch the secrets asynchronously.
- You have to ask for readiness using `manager.SecretsReady(&cfg)`:
  - It goes through all paths again (why? We have everything in `m.secrets`).
  - "// This method should be called before accessing any secret values to ensure that they are available."
- You can get a value without an error.
High Level Suggestions
Sorry for complaining. I do think we could slightly improve the flow. It's mostly great, but I suspect it might be awkward to fit into the Prometheus ApplyConfig reloading pattern. The main bottlenecks IMO are:
- (5) `Get` tries to avoid returning an error, but I wonder if this use case is even worth optimizing for. In the current flow you have to ask for `SecretsReady` (4) before every `Get` (reading from the comments)? That returns an error, so why not have `Get() (ret, error)` wait until ready or return the old value 🤔
- Prometheus dynamic reloading means we unmarshal the config repeatedly, sometimes every minute. Here it seems you need a new manager for every unmarshal? That might be a problem.
- Do we envision problems with duplicate provider names vs field-level settings?
- Smaller nits (inlined comments).
How to progress
Some of the above suggestions are opinionated; I might be wrong. So... have you tried adopting this PR in Prometheus? I think it would be the ultimate test of how it fits. Maybe you did and it fits well. Can we have a quick draft PR that imports this commit? I think that's a blocker to really merging this PR (proof this code will work). WDYT?
Also, such long iteration cycles make it harder to review (context gathering). Plus it's easy to complain. I wonder if we could do some pair programming, or I could propose changes in a PR against your PR? Would that help make progress? Or would you prefer to continue with async reviews only?
```go
// convertConfig takes a yaml-parsed any and unmarshals it into a typed struct.
// It achieves this by first marshalling the input to YAML and then unmarshalling
// it into the target struct.
func convertConfig[T any](source any, target T) error {
```
Perhaps more readable?
```diff
-func convertConfig[T any](source any, target T) error {
+func unmarshalTo[T any](source any, target T) error {
```
```go
// It achieves this by first marshalling the input to YAML and then unmarshalling
// it into the target struct.
func convertConfig[T any](source any, target T) error {
	bytes, err := yaml.Marshal(source)
```
This is interesting. I thought an `any` was internally just the bytes of the unmarshalled blob; sounds like it's not so simple?
```go
// splitProviderAndSettings separates provider-specific configuration from the generic SecretField settings.
func splitProviderAndSettings(provReg *ProviderRegistry, baseMap mapType) (providerName string, providerData interface{}, settingsMap mapType, err error) {
	settingsMap = make(mapType)
```
nit: In Go we typically just inline the map instantiation; no need for mapType.
```go
		providerName = k
		providerData = v
	} else {
		// If it's not a provider key, treat it as a setting.
```
For readability (I was confused reading this code):

```diff
-		// If it's not a provider key, treat it as a setting.
+		// If it's not a provider key, treat it as a setting. We will detect missing provider later.
```
```go
		providerData = v
	} else {
		// If it's not a provider key, treat it as a setting.
		settingsMap[k] = v
```
Does this mean settings fields can conflict with a provider name? Is that a problem? Do we need settings at all? The example does not show them. Can we apply YAGNI and descope (add it later when needed)?
I also wouldn't mind if we killed FieldSettings and duplicated refresh and other logic in the provider. We can share common things once we have at least two complex providers. (:
I guess to avoid/handle dups one could try to unmarshal FieldSettings here? Anyway, it's a bit painful to add more fields; you never know if one conflicts with a dynamic provider someone might implement in a fork. Perhaps easier to inline it with the provider?
secrets/manager.go (outdated)

```go
		metricLabels: labels,
		provider:     provider,
	}
	m.secrets[state.id()] = ms
```
Can we call id() once and save it in a local var?
```go
	m.secrets[state.id()] = ms
	m.secretState.With(labels).Set(stateInitializing)
	m.fetchSuccessTotal.With(labels).Add(0)
	m.fetchFailuresTotal.With(labels).Add(0)
```
return early?
secrets/example_test.go (outdated)

```go
	// Create a secret manager. This discovers and manages all Fields in cfg.
	// The manager will handle refreshing secrets in the background.
	manager, err := secrets.NewManager(promRegisterer, secrets.Providers, &cfg)
```
question: What's the process for refreshing secrets ON reconfiguration (very common in Prometheus: ApplyConfig)? It sounds like with this abstraction we need to recreate the manager (secrets.NewManager has to be re-run)?
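If recreating the manager per reload turns out to be a problem, a hypothetical ApplyConfig-style method could diff secret ids and keep unchanged entries (and their refresh goroutines and cached values) alive across reloads. A toy sketch of just the diffing step; none of these names come from the PR:

```go
package main

import "fmt"

// applyConfig diffs the old and new sets of managed-secret ids
// (path+provider keys mapped to a fingerprint of their config), so a
// reload only starts/stops goroutines for what actually changed.
func applyConfig(old, next map[string]string) (added, removed, kept []string) {
	for id, fp := range next {
		if oldFP, ok := old[id]; ok && oldFP == fp {
			kept = append(kept, id) // unchanged: keep goroutine and cached value
		} else {
			added = append(added, id) // new or reconfigured: (re)start
		}
	}
	for id := range old {
		if _, ok := next[id]; !ok {
			removed = append(removed, id) // gone: stop goroutine
		}
	}
	return
}

func main() {
	old := map[string]string{"a.password/file": "x", "b.token/inline": "y"}
	next := map[string]string{"a.password/file": "x", "c.key/file": "z"}
	added, removed, kept := applyConfig(old, next)
	fmt.Println(len(added), len(removed), len(kept))
}
```

This is only the bookkeeping half; the real question above (whether the manager's API should expose such a method at all) stands.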
Signed-off-by: Henrique Matulis <hmatulis@google.com>
Pull request overview
This PR introduces a new secrets package to prometheus/common, providing a unified and extensible framework for managing secrets in Prometheus configuration files. The implementation supports multiple secret providers (inline and file built-in, with custom provider support) and includes comprehensive observability through Prometheus metrics.
Key Changes
- New secrets management framework with Field type for configuration structs
- Provider registry system supporting pluggable secret providers (inline and file providers included)
- Manager component that discovers, manages, and automatically refreshes secrets with configurable intervals
- Comprehensive test coverage including unit tests and example tests
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 16 comments.
Show a summary per file
| File | Description |
|---|---|
| secrets/field.go | Core Field type with YAML/JSON marshaling and visibility policy support |
| secrets/provider.go | Provider and ProviderConfig interfaces for extensibility |
| secrets/registry.go | Provider registry for registering and retrieving secret providers |
| secrets/internal_providers.go | Built-in inline and file providers with init registration |
| secrets/manager.go | Secret manager with automatic discovery, refresh loops, and Prometheus metrics |
| secrets/resolve.go | Reflection-based field discovery for traversing configuration structs |
| secrets/lifecycle.go | Secret preparation hooks for config preprocessing |
| secrets/util.go | Utility functions for counting non-zero values |
| secrets/*_test.go | Comprehensive test coverage for all components |
| secrets/doc.go | Package documentation |
| secrets/README.md | User-facing documentation with usage examples |
| config/secrets.go | Integration with existing config package visibility policy |
| go.mod | Dependency update moving client_golang from indirect to direct |
```go
func NewManager(ctx context.Context, r prometheus.Registerer, providers *ProviderRegistry, config interface{}) (*Manager, error) {
	manager := &Manager{
		secrets:   make(map[string]*managedSecret),
		providers: providers,
```
Copilot AI, Dec 4, 2025
The refreshC channel is never initialized in NewManager. Sending on a nil channel blocks forever, so triggerRefresh (line 272) will deadlock. Add refreshC: make(chan struct{}, 1) to the Manager initialization.
```diff
 		providers: providers,
+		refreshC:  make(chan struct{}, 1),
```
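Worth noting that sending on a nil channel blocks forever rather than panicking, so the symptom of the missing initialization would be a deadlock (or, with a select/default, silently dropped triggers). A minimal sketch of the buffered-channel trigger pattern, with illustrative names rather than the PR's actual code:

```go
package main

import "fmt"

// Manager holds a refresh trigger channel; it must be created with make,
// otherwise every send on it blocks forever (nil channel semantics).
type Manager struct {
	refreshC chan struct{}
}

func NewManager() *Manager {
	// Buffer of 1: at most one refresh can be pending at a time.
	return &Manager{refreshC: make(chan struct{}, 1)}
}

// triggerRefresh queues a refresh without blocking. If one is already
// pending, the new request coalesces with it and false is returned.
func (m *Manager) triggerRefresh() bool {
	select {
	case m.refreshC <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	m := NewManager()
	fmt.Println(m.triggerRefresh()) // queued
	fmt.Println(m.triggerRefresh()) // coalesced: buffer already full
}
```

The select/default keeps callers non-blocking; a background goroutine draining refreshC would perform the actual fetches.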
```go
	})

	t.Run("FetchSecret_NotFound", func(t *testing.T) {
		badFPC := &FileProviderConfig{Path: filepath.Join(tempDir, "non-existant.txt")}
```
Copilot AI, Dec 4, 2025
Spelling error: "non-existant" should be "non-existent".
```diff
-		badFPC := &FileProviderConfig{Path: filepath.Join(tempDir, "non-existant.txt")}
+		badFPC := &FileProviderConfig{Path: filepath.Join(tempDir, "non-existent.txt")}
```
```diff
@@ -0,0 +1,79 @@
+package secrets
```
Copilot AI, Dec 4, 2025
Missing copyright header. All other files in the package have a copyright header starting with // Copyright 2025 The Prometheus Authors. This file should include the same header for consistency.
```diff
@@ -0,0 +1,12 @@
+package secrets
```
Copilot AI, Dec 4, 2025
Missing copyright header. All other files in the package have a copyright header starting with // Copyright 2025 The Prometheus Authors. This file should include the same header for consistency.
```go
	if err != nil {
		return fmt.Errorf("%s: %w", path, err)
	}
	ctx, cancel := context.WithTimeout(ctx, time.Millisecond)
```
Copilot AI, Dec 4, 2025
The timeout of time.Millisecond for the initial fetch attempt is extremely short and may be insufficient for most providers. This could cause unnecessary failures during initialization. Consider using a more reasonable timeout (e.g., 1-5 seconds) or using fetchTimeout constant defined elsewhere in the file.
```diff
-	ctx, cancel := context.WithTimeout(ctx, time.Millisecond)
+	ctx, cancel := context.WithTimeout(ctx, fetchTimeout)
```
```go
		val:   val,
		depth: w.depth + 1,
	}

```
Copilot AI, Dec 4, 2025
There's an unnecessary blank line before the closing brace. Consider removing it for consistency with Go code style.
```go
func (h Hydrator) OrFromFile(
	secretField **Field, filePath string) error {
	if h.manager == nil || h.manager.hydrating == false {
		panic("secret field is sealed by the manager; call this from PrepareSecrets")
```
Copilot AI, Dec 4, 2025
The panic message "secret field is sealed by the manager; call this from PrepareSecrets" is unclear. Consider rewording to something like "OrFromFile can only be called during PrepareSecrets; the manager has not yet started hydration" for better clarity.
```diff
-		panic("secret field is sealed by the manager; call this from PrepareSecrets")
+		panic("OrFromFile can only be called during PrepareSecrets; the manager has not yet started hydration")
```
```go
	m.mtx.RUnlock()

	if !ok {
		promslog.NewNopLogger()
```
Copilot AI, Dec 4, 2025
The promslog.NewNopLogger() call has no effect since its result is not used or assigned. This appears to be leftover debug code. Consider either removing it or logging a warning message about the secret not being found.
```diff
-		promslog.NewNopLogger()
+		promslog.Warnf("triggerRefresh: secret with id %s not found; refresh not triggered", s.state.id())
```
```go
}

func (h Hydrator) OrFromFile(
```
Copilot AI, Dec 4, 2025
The function name OrFromFile is unclear. Consider renaming to something more descriptive like SetFromFieldOrFile or HydrateFromFieldOrFile to better convey that it's setting a field value from either the field itself or a file path.
```diff
-func (h Hydrator) OrFromFile(
+func (h Hydrator) SetFromFieldOrFile(
```
```go
	m.fetchDuration = promauto.With(r).NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "prometheus_remote_secret_fetch_duration_seconds",
			Help:    "Duration of secret fetch attempts.",
			Buckets: prometheus.DefBuckets,
		},
		labels,
	)
```
Copilot AI, Dec 4, 2025
The fetchDuration metric is registered but never used in the code to record actual fetch durations. Consider adding timing instrumentation in the fetchAndStoreSecret method to observe the time taken for secret fetches.
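One common shape for the missing instrumentation: time the fetch and observe the elapsed seconds into the histogram regardless of outcome. Sketched against a local observer interface so the example stays dependency-free; with client_golang the idiomatic equivalent is prometheus.NewTimer(m.fetchDuration.With(labels)) with a deferred ObserveDuration() inside the fetch method:

```go
package main

import (
	"fmt"
	"time"
)

// observer mirrors the single-method shape of prometheus.Observer so the
// timing pattern can be shown without the client_golang dependency.
type observer interface{ Observe(float64) }

type recordingObserver struct {
	called bool
	secs   float64
}

func (o *recordingObserver) Observe(v float64) { o.called, o.secs = true, v }

// timedFetch wraps a fetch and records its duration, even when the fetch
// fails, which is exactly what a fetch-duration histogram should capture.
func timedFetch(o observer, fetch func() error) error {
	start := time.Now()
	defer func() { o.Observe(time.Since(start).Seconds()) }()
	return fetch()
}

func main() {
	rec := &recordingObserver{}
	err := timedFetch(rec, func() error { time.Sleep(time.Millisecond); return nil })
	fmt.Println(err == nil, rec.called, rec.secs > 0)
}
```

Recording in a deferred call is the important part: failed fetches are often the slow ones, and skipping them would bias the histogram.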
This PR introduces a new package, secrets, to prometheus/common. This package provides a unified way to handle secrets within configuration files for Prometheus and its ecosystem components. It is designed to be extensible and observable. See the proposal here