This section covers operations-related features for running and managing operators in production.
1 - Configurations
The Java Operator SDK (JOSDK) provides abstractions that work great out of the box. However, we recognize that default behavior isn’t always suitable for every use case. Numerous configuration options help you tailor the framework to your specific needs.
Configuration options operate at several levels:
- Operator-level using ConfigurationService
- Reconciler-level using ControllerConfiguration
- DependentResource-level using the DependentResourceConfigurator interface
- EventSource-level, where some event sources (like InformerEventSource) need fine-tuning to identify which events trigger the associated reconciler
Operator-Level Configuration
Configuration that impacts the entire operator is performed via the ConfigurationService class. ConfigurationService is an abstract class with different implementations based on which framework flavor you use (e.g., Quarkus Operator SDK replaces the default implementation). Configurations initialize with sensible defaults but can be changed during initialization.
For example, to disable CRD validation on startup and configure leader election:
Operator operator = new Operator( override -> override
.checkingCRDAndValidateLocalModel(false)
.withLeaderElectionConfiguration(new LeaderElectionConfiguration("bar", "barNS")));
Reconciler-Level Configuration
While reconcilers are typically configured using the @ControllerConfiguration annotation, you can also override configuration at runtime when registering the reconciler with the operator. You can either:
- Pass a completely new ControllerConfiguration instance
- Override specific aspects using a ControllerConfigurationOverrider Consumer (preferred)
Operator operator;
Reconciler reconciler;
...
operator.register(reconciler, configOverrider ->
configOverrider.withFinalizer("my-nifty-operator/finalizer").withLabelSelector("foo=bar"));
Dynamically Changing Target Namespaces
Besides watching the namespace in which it is deployed or the whole cluster, a controller can be
configured to watch a specific set of namespaces. The framework supports
dynamically changing the list of these namespaces while the operator is running.
When a reconciler is registered, an instance of
RegisteredController
is returned, providing access to the methods allowing users to change watched namespaces as the
operator is running.
A typical scenario would probably involve extracting the list of target namespaces from a
ConfigMap or some other input but this part is out of the scope of the framework since this is
use-case specific. For example, reacting to changes to a ConfigMap would probably involve
registering an associated Informer and then calling the changeNamespaces method on
RegisteredController.
public static void main(String[] args) {
  KubernetesClient client = new DefaultKubernetesClient();
  Operator operator = new Operator(client);
  RegisteredController registeredController = operator.register(new WebPageReconciler(client));
  operator.installShutdownHook();
  operator.start();
  // call registeredController further while operator is running
}
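As a rough illustration of the ConfigMap scenario described above, the sketch below parses a comma-separated namespace list and hands it to a callback. The NamespaceUpdater class, the comma-separated format and the decision to ignore an empty list are assumptions made for this example, not part of JOSDK. In a real operator the callback would delegate to the changeNamespaces method on RegisteredController, and the raw string would come from an informer event.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical helper, not part of JOSDK.
final class NamespaceUpdater {

  // Stands in for a call to changeNamespaces on the RegisteredController.
  private final Consumer<Set<String>> namespaceChanger;
  private Set<String> current = Set.of();

  NamespaceUpdater(Consumer<Set<String>> namespaceChanger) {
    this.namespaceChanger = namespaceChanger;
  }

  // Splits "a, b,c" into {a, b, c}, dropping blanks and duplicates.
  static Set<String> parse(String raw) {
    Set<String> result = new LinkedHashSet<>();
    if (raw == null) {
      return result;
    }
    for (String part : raw.split(",")) {
      String namespace = part.trim();
      if (!namespace.isEmpty()) {
        result.add(namespace);
      }
    }
    return result;
  }

  // Invokes the callback only when the parsed set is non-empty and differs
  // from the set that was applied last. Returns true if the callback ran.
  boolean onConfigChanged(String raw) {
    Set<String> parsed = parse(raw);
    if (parsed.isEmpty() || parsed.equals(current)) {
      return false;
    }
    current = parsed;
    namespaceChanger.accept(parsed);
    return true;
  }

  public static void main(String[] args) {
    NamespaceUpdater updater =
        new NamespaceUpdater(ns -> System.out.println("now watching " + ns));
    updater.onConfigChanged("team-a, team-b");
    updater.onConfigChanged("team-b,team-a"); // same set, callback is skipped
  }
}
```

Skipping the callback when the set is unchanged avoids needlessly restarting informers for the same namespaces.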
If watched namespaces change for a controller, it might be desirable to propagate these changes to
InformerEventSources associated with the controller. In order to express this,
InformerEventSource implementations interested in following such changes need to be
configured appropriately so that the followControllerNamespaceChanges method returns true:
@ControllerConfiguration
public class MyReconciler implements Reconciler<TestCustomResource> {

  @Override
  public Map<String, EventSource> prepareEventSources(
      EventSourceContext<TestCustomResource> context) {

    InformerEventSource<ConfigMap, TestCustomResource> configMapES =
        new InformerEventSource<>(
            InformerEventSourceConfiguration.from(ConfigMap.class, TestCustomResource.class)
                .withNamespacesInheritedFromController(context)
                .build(),
            context);

    return EventSourceUtils.nameEventSources(configMapES);
  }
}
As seen in the above code snippet, the informer initially inherits its namespaces from the controller and also adjusts its target namespaces whenever they change for the controller.
See also the integration test for this feature.
DependentResource-Level Configuration
It is possible to define custom annotations to configure custom DependentResource implementations. In order to provide
such a configuration mechanism for your own DependentResource implementations, they must be annotated with the
@Configured annotation. This annotation defines three fields that tie everything together:
- by, which specifies which annotation class will be used to configure your dependents,
- with, which specifies the class holding the configuration object for your dependents, and
- converter, which specifies the ConfigurationConverter implementation in charge of converting the annotation specified by the by field into objects of the class specified by the with field.
ConfigurationConverter instances implement a single configFrom method, which will receive, as expected, the
annotation instance annotating the dependent resource instance to be configured, but it can also extract information
from the DependentResourceSpec instance associated with the DependentResource class so that metadata from it can be
used in the configuration, as well as the parent ControllerConfiguration, if needed. The role of
ConfigurationConverter implementations is to extract the annotation information, augment it with metadata from the
DependentResourceSpec and the configuration from the parent controller on which the dependent is defined, to finally
create the configuration object that the DependentResource instances will use.
However, one last element is required to finish the configuration process: the target DependentResource class must
implement the ConfiguredDependentResource interface, parameterized with the configuration class defined by the
@Configured annotation's with field. The framework calls this interface to inject the configuration at the
appropriate time and to retrieve the configuration, if it’s available.
For example, KubernetesDependentResource, a core implementation that the framework provides, can be configured via the
@KubernetesDependent annotation. This setup is configured as follows:
@Configured(
by = KubernetesDependent.class,
with = KubernetesDependentResourceConfig.class,
converter = KubernetesDependentConverter.class)
public abstract class KubernetesDependentResource<R extends HasMetadata, P extends HasMetadata>
extends AbstractEventSourceHolderDependentResource<R, P, InformerEventSource<R, P>>
implements ConfiguredDependentResource<KubernetesDependentResourceConfig<R>> {
// code omitted
}
The @Configured annotation specifies that KubernetesDependentResource instances can be configured by using the
@KubernetesDependent annotation, which gets converted into a KubernetesDependentResourceConfig object by a
KubernetesDependentConverter. That configuration object is then injected by the framework in the
KubernetesDependentResource instance, after it’s been created, because the class implements the
ConfiguredDependentResource interface, properly parameterized.
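To make the flow from annotation to injected configuration more concrete, here is a self-contained analogy written with plain JDK types. Every type in it (CacheSettings, CacheConfig, Converter, Configurable, CachedDependent) is a local stand-in invented for this sketch. The real JOSDK interfaces have richer signatures, for instance configFrom also receives the DependentResourceSpec and the parent ControllerConfiguration, so treat this as an illustration of the idea rather than of the actual API.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// All types here are local stand-ins for illustration; none are JOSDK classes.
final class ConfiguredSketch {

  // Plays the role of the annotation named by the "by" field.
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.TYPE)
  @interface CacheSettings {
    int maxEntries() default 100;
  }

  // Plays the role of the configuration class named by the "with" field.
  record CacheConfig(int maxEntries, String controllerName) {}

  // Plays the role of the converter: annotation plus parent context in, config out.
  interface Converter<A extends Annotation, C> {
    C configFrom(A annotation, String controllerName);
  }

  // Plays the role of the interface through which the config is injected.
  interface Configurable<C> {
    void configureWith(C config);
  }

  @CacheSettings(maxEntries = 5)
  static class CachedDependent implements Configurable<CacheConfig> {
    CacheConfig config;

    @Override
    public void configureWith(CacheConfig config) {
      this.config = config;
    }
  }

  // Roughly what a framework does: read the annotation, convert it, inject the result.
  static <A extends Annotation, C> void configure(
      Configurable<C> target, Class<A> annotationType,
      Converter<A, C> converter, String controllerName) {
    A annotation = target.getClass().getAnnotation(annotationType);
    if (annotation != null) {
      target.configureWith(converter.configFrom(annotation, controllerName));
    }
  }

  public static void main(String[] args) {
    CachedDependent dependent = new CachedDependent();
    configure(dependent, CacheSettings.class,
        (a, name) -> new CacheConfig(a.maxEntries(), name), "my-controller");
    System.out.println(dependent.config);
  }
}
```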
For more information on how to use this feature, we recommend looking at how this mechanism is implemented for
KubernetesDependentResource in the core framework, SchemaDependentResource in the samples or CustomAnnotationDep
in the BaseConfigurationServiceTest test class.
Loading Configuration from External Sources
JOSDK ships a ConfigLoader that bridges any key-value configuration source to the operator and
controller configuration APIs. This lets you drive operator behaviour from environment variables,
system properties, YAML files, or any config library (MicroProfile Config, SmallRye Config,
Spring Environment, etc.) without writing glue code by hand.
Architecture
The system is built around two thin abstractions:
- ConfigProvider: a single-method interface that resolves a typed value for a dot-separated key:
  public interface ConfigProvider {
    <T> Optional<T> getValue(String key, Class<T> type);
  }
- ConfigLoader: reads all known JOSDK keys from a ConfigProvider and returns Consumer<ConfigurationServiceOverrider> / Consumer<ControllerConfigurationOverrider<R>> values that you pass directly to the Operator constructor or operator.register().
The default ConfigLoader (no-arg constructor) stacks environment variables over system
properties: environment variables win, system properties are the fallback.
// uses env vars + system properties out of the box
Operator operator = new Operator(ConfigLoader.getDefault().applyConfigs());
Built-in Providers
| Provider | Source | Key mapping |
|---|---|---|
| EnvVarConfigProvider | System.getenv() | dots and hyphens → underscores, upper-cased (josdk.check-crd → JOSDK_CHECK_CRD) |
| PropertiesConfigProvider | java.util.Properties or .properties file | key used as-is; use PropertiesConfigProvider.systemProperties() to read Java system properties |
| YamlConfigProvider | YAML file | dot-separated key traverses nested mappings |
| AggregatePriorityListConfigProvider | ordered list of providers | first non-empty result wins |
All string-based providers convert values to the target type automatically.
Supported types: String, Boolean, Integer, Long, Double, Duration (ISO-8601, e.g. PT30S).
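The key mapping and type conversion described above can be pictured with a few lines of plain Java. This is only a sketch of the documented behaviour: the ConfigValueSketch class and its method names are made up for this example and say nothing about how the built-in providers are implemented internally.

```java
import java.time.Duration;
import java.util.Locale;

// Hypothetical helper, not part of JOSDK.
final class ConfigValueSketch {

  // Dots and hyphens become underscores, then the key is upper-cased.
  static String toEnvVarName(String key) {
    return key.replace('.', '_').replace('-', '_').toUpperCase(Locale.ROOT);
  }

  // Converts a raw string into one of the supported target types.
  static <T> T convert(String raw, Class<T> type) {
    Object value;
    if (type == String.class) {
      value = raw;
    } else if (type == Boolean.class) {
      value = Boolean.parseBoolean(raw);
    } else if (type == Integer.class) {
      value = Integer.parseInt(raw);
    } else if (type == Long.class) {
      value = Long.parseLong(raw);
    } else if (type == Double.class) {
      value = Double.parseDouble(raw);
    } else if (type == Duration.class) {
      value = Duration.parse(raw); // ISO-8601, e.g. PT30S
    } else {
      throw new IllegalArgumentException("Unsupported type: " + type.getName());
    }
    return type.cast(value);
  }

  public static void main(String[] args) {
    System.out.println(toEnvVarName("josdk.check-crd"));
    System.out.println(convert("PT30S", Duration.class).toSeconds());
  }
}
```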
Plugging in Any Config Library
ConfigProvider is a single-method interface, so adapting any config library takes only a few
lines. As an example, here is an adapter for
SmallRye Config:
public class SmallRyeConfigProvider implements ConfigProvider {

  private final SmallRyeConfig config;

  public SmallRyeConfigProvider(SmallRyeConfig config) {
    this.config = config;
  }

  @Override
  public <T> Optional<T> getValue(String key, Class<T> type) {
    return config.getOptionalValue(key, type);
  }
}
The same pattern applies to MicroProfile Config, Spring Environment, Apache Commons
Configuration, or any other library that can look up typed values by string key.
Wiring Everything Together
Pass the ConfigLoader results when constructing the operator and registering reconcilers:
// Load operator-wide config from a YAML file via SmallRye Config
URL configUrl = MyOperator.class.getResource("/application.yaml");
var configLoader = new ConfigLoader(
new SmallRyeConfigProvider(
new SmallRyeConfigBuilder()
.withSources(new YamlConfigSource(configUrl))
.build()));
// applyConfigs() → Consumer<ConfigurationServiceOverrider>
Operator operator = new Operator(configLoader.applyConfigs());
// applyControllerConfigs(name) → Consumer<ControllerConfigurationOverrider<R>>
operator.register(new MyReconciler(),
configLoader.applyControllerConfigs(MyReconciler.NAME));
Only keys that are actually present in the source are applied; everything else retains its programmatic or annotation-based default.
You can also compose multiple sources with explicit priority using
AggregatePriorityListConfigProvider:
var configLoader = new ConfigLoader(
new AggregatePriorityListConfigProvider(List.of(
new EnvVarConfigProvider(), // highest priority
PropertiesConfigProvider.systemProperties(),
new YamlConfigProvider(Path.of("config/operator.yaml")) // lowest priority
)));
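The "first non-empty result wins" rule can be sketched without any JOSDK dependency. The example below redeclares an interface with the same shape as ConfigProvider and combines two map-backed providers. The PriorityLookupSketch class, fromMap and firstNonEmpty are names invented for this illustration and are not the actual AggregatePriorityListConfigProvider code.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration, not part of JOSDK.
final class PriorityLookupSketch {

  // Same shape as the ConfigProvider interface, redeclared locally.
  interface ConfigProvider {
    <T> Optional<T> getValue(String key, Class<T> type);
  }

  // Minimal map-backed provider: returns a value only if it already has the requested type.
  static ConfigProvider fromMap(Map<String, Object> values) {
    return new ConfigProvider() {
      @Override
      public <T> Optional<T> getValue(String key, Class<T> type) {
        Object value = values.get(key);
        return type.isInstance(value) ? Optional.of(type.cast(value)) : Optional.empty();
      }
    };
  }

  // Asks each provider in order and returns the first value that is present.
  static ConfigProvider firstNonEmpty(List<ConfigProvider> providers) {
    return new ConfigProvider() {
      @Override
      public <T> Optional<T> getValue(String key, Class<T> type) {
        for (ConfigProvider provider : providers) {
          Optional<T> value = provider.getValue(key, type);
          if (value.isPresent()) {
            return value;
          }
        }
        return Optional.empty();
      }
    };
  }

  public static void main(String[] args) {
    ConfigProvider high = fromMap(Map.of("josdk.check-crd", false));
    ConfigProvider low = fromMap(Map.of("josdk.check-crd", true));
    ConfigProvider combined = firstNonEmpty(List.of(high, low));
    System.out.println(combined.getValue("josdk.check-crd", Boolean.class));
  }
}
```

A key that is missing from the high-priority source simply falls through to the next provider in the list.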
Operator-Level Configuration Keys
All operator-level keys start with the josdk. prefix.
General
| Key | Type | Description |
|---|---|---|
| josdk.check-crd | Boolean | Validate CRDs against local model on startup |
| josdk.close-client-on-stop | Boolean | Close the Kubernetes client when the operator stops |
| josdk.use-ssa-to-patch-primary-resource | Boolean | Use Server-Side Apply to patch the primary resource |
| josdk.clone-secondary-resources-when-getting-from-cache | Boolean | Clone secondary resources on cache reads |
Reconciliation
| Key | Type | Description |
|---|---|---|
| josdk.reconciliation.concurrent-threads | Integer | Thread pool size for reconciliation |
| josdk.reconciliation.termination-timeout | Duration | How long to wait for in-flight reconciliations to finish on shutdown |
Workflow
| Key | Type | Description |
|---|---|---|
| josdk.workflow.executor-threads | Integer | Thread pool size for workflow execution |
Informer
| Key | Type | Description |
|---|---|---|
| josdk.informer.cache-sync-timeout | Duration | Timeout for the initial informer cache sync |
| josdk.informer.stop-on-error-during-startup | Boolean | Stop the operator if an informer fails to start |
Dependent Resources
| Key | Type | Description |
|---|---|---|
| josdk.dependent-resources.ssa-based-create-update-match | Boolean | Use SSA-based matching for dependent resource create/update |
Leader Election
Leader election is activated when at least one josdk.leader-election.* key is present.
josdk.leader-election.lease-name is required when any other leader-election key is set.
Setting josdk.leader-election.enabled=false suppresses leader election even if other keys are
present.
| Key | Type | Description |
|---|---|---|
| josdk.leader-election.enabled | Boolean | Explicitly enable (true) or disable (false) leader election |
| josdk.leader-election.lease-name | String | Required. Name of the Kubernetes Lease object used for leader election |
| josdk.leader-election.lease-namespace | String | Namespace for the Lease object (defaults to the operator’s namespace) |
| josdk.leader-election.identity | String | Unique identity for this instance; defaults to the pod name |
| josdk.leader-election.lease-duration | Duration | How long a lease is valid (default PT15S) |
| josdk.leader-election.renew-deadline | Duration | How long the leader tries to renew before giving up (default PT10S) |
| josdk.leader-election.retry-period | Duration | How often a candidate polls while waiting to become leader (default PT2S) |
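The activation rules stated above can be summarized as a small decision function. This is a sketch of the documented rules only: the LeaderElectionKeysSketch class is hypothetical, and both the exception used to report a missing lease name and the choice to check enabled=false before the lease name are assumptions of this example.

```java
import java.util.Map;

// Hypothetical illustration of the documented rules, not JOSDK code.
final class LeaderElectionKeysSketch {

  static final String PREFIX = "josdk.leader-election.";

  // Returns true if leader election would be turned on for the given keys.
  // Throws if it would be on but the required lease name is missing.
  static boolean isActivated(Map<String, String> config) {
    boolean anyKeyPresent = config.keySet().stream().anyMatch(k -> k.startsWith(PREFIX));
    if (!anyKeyPresent) {
      return false; // no leader-election key at all
    }
    if ("false".equalsIgnoreCase(config.get(PREFIX + "enabled"))) {
      return false; // explicitly suppressed, other keys are ignored
    }
    if (!config.containsKey(PREFIX + "lease-name")) {
      throw new IllegalStateException(PREFIX + "lease-name is required");
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isActivated(Map.of(PREFIX + "lease-name", "my-operator-lease")));
  }
}
```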
Controller-Level Configuration Keys
All controller-level keys are prefixed with josdk.controller.<controller-name>., where
<controller-name> is the name of the controller (typically set via
@ControllerConfiguration(name = "...")).
General
| Key | Type | Description |
|---|---|---|
| josdk.controller.<name>.finalizer | String | Finalizer string added to managed resources |
| josdk.controller.<name>.generation-aware | Boolean | Skip reconciliation when the resource generation has not changed |
| josdk.controller.<name>.label-selector | String | Label selector to filter watched resources |
| josdk.controller.<name>.max-reconciliation-interval | Duration | Maximum interval between reconciliations even without events |
| josdk.controller.<name>.field-manager | String | Field manager name used for SSA operations |
| josdk.controller.<name>.trigger-reconciler-on-all-events | Boolean | Trigger reconciliation on every event, not only meaningful changes |
Informer
| Key | Type | Description |
|---|---|---|
| josdk.controller.<name>.informer.label-selector | String | Label selector for the primary resource informer (alias for label-selector) |
| josdk.controller.<name>.informer.list-limit | Long | Page size for paginated informer list requests; omit for no pagination |
Retry
If any retry.* key is present, a GenericRetry is configured starting from the
default limited exponential retry.
Only explicitly set keys override the defaults.
| Key | Type | Description |
|---|---|---|
| josdk.controller.<name>.retry.max-attempts | Integer | Maximum number of retry attempts |
| josdk.controller.<name>.retry.initial-interval | Long (ms) | Initial backoff interval in milliseconds |
| josdk.controller.<name>.retry.interval-multiplier | Double | Exponential backoff multiplier |
| josdk.controller.<name>.retry.max-interval | Long (ms) | Maximum backoff interval in milliseconds |
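To get a feel for how these four settings interact, the sketch below computes the delay before each retry for a capped exponential backoff. It shows the general technique, not the GenericRetry source; the BackoffSketch class is hypothetical and the numbers in main are chosen purely for illustration, not taken from the JOSDK defaults.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of capped exponential backoff, not JOSDK code.
final class BackoffSketch {

  // Delay in milliseconds before each retry: the interval starts at
  // initialIntervalMs, is multiplied after every attempt, and never exceeds maxIntervalMs.
  static List<Long> delays(
      int maxAttempts, long initialIntervalMs, double multiplier, long maxIntervalMs) {
    List<Long> result = new ArrayList<>();
    double next = initialIntervalMs;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      result.add(Math.min((long) next, maxIntervalMs));
      next = Math.min(next * multiplier, maxIntervalMs);
    }
    return result;
  }

  public static void main(String[] args) {
    // max-attempts=5, initial-interval=2000, interval-multiplier=1.5, max-interval=10000
    System.out.println(delays(5, 2000, 1.5, 10000));
  }
}
```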
Rate Limiter
The rate limiter is only activated when rate-limiter.limit-for-period is present and has a
positive value. rate-limiter.refresh-period is optional and falls back to the default of 10 s.
| Key | Type | Description |
|---|---|---|
| josdk.controller.<name>.rate-limiter.limit-for-period | Integer | Maximum number of reconciliations allowed per refresh period. Must be positive to activate the limiter |
| josdk.controller.<name>.rate-limiter.refresh-period | Duration | Window over which the limit is counted (default PT10S) |
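The relationship between limit-for-period and refresh-period can be pictured with a simple fixed-window counter. This is one common way to implement such a limit and is given here only as an assumption-laden sketch: FixedWindowLimiterSketch is a hypothetical class, and the limiter that ships with JOSDK may track its windows differently (for example per resource).

```java
import java.time.Duration;

// Hypothetical fixed-window limiter, not JOSDK code.
final class FixedWindowLimiterSketch {

  private final int limitForPeriod;
  private final long refreshPeriodNanos;
  private long windowStart;
  private int used;

  FixedWindowLimiterSketch(int limitForPeriod, Duration refreshPeriod) {
    this.limitForPeriod = limitForPeriod;
    this.refreshPeriodNanos = refreshPeriod.toNanos();
  }

  // The current time is passed in (e.g. System.nanoTime()) so the logic is easy to test.
  synchronized boolean tryAcquire(long nowNanos) {
    if (limitForPeriod <= 0) {
      return true; // a non-positive limit means the limiter is not active
    }
    if (used == 0 || nowNanos - windowStart >= refreshPeriodNanos) {
      windowStart = nowNanos; // open a new window
      used = 0;
    }
    if (used < limitForPeriod) {
      used++;
      return true;
    }
    return false; // limit reached for the current window
  }

  public static void main(String[] args) {
    FixedWindowLimiterSketch limiter = new FixedWindowLimiterSketch(2, Duration.ofSeconds(10));
    long now = System.nanoTime();
    System.out.println(limiter.tryAcquire(now));
    System.out.println(limiter.tryAcquire(now));
    System.out.println(limiter.tryAcquire(now));
  }
}
```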
2 - Logging
Contextual Info for Logging with MDC
Logging is enhanced with additional contextual information using MDC. The following attributes are available in most parts of reconciliation logic and during the execution of the controller:
| MDC Key | Value added from primary resource |
|---|---|
| resource.apiVersion | .apiVersion |
| resource.kind | .kind |
| resource.name | .metadata.name |
| resource.namespace | .metadata.namespace |
| resource.resourceVersion | .metadata.resourceVersion |
| resource.generation | .metadata.generation |
| resource.uid | .metadata.uid |
For more information about MDC see this link.
MDC entries during event handling
Although most users will not need them in their day-to-day workflow, additional MDC entries are
managed during event handling. They are typically useful in logs related to your
SecondaryToPrimaryMapper implementations.
For InformerEventSource and ControllerEventSource the following information is present:
| MDC Key | Value from Resource from the Event |
|---|---|
| eventsource.event.resource.name | .metadata.name |
| eventsource.event.resource.uid | .metadata.uid |
| eventsource.event.resource.namespace | .metadata.namespace |
| eventsource.event.resource.kind | resource kind |
| eventsource.event.resource.resourceVersion | .metadata.resourceVersion |
| eventsource.event.action | action name (e.g. ADDED, UPDATED, DELETED) |
| eventsource.name | name of the event source |
Note on null values
If a resource doesn’t provide a value for one of the specified keys, the key is omitted and not added to the MDC
context. There is one notable exception: the resource’s namespace. Instead of omitting that key, we emit
the MDCUtils.NO_NAMESPACE value. This makes it easier to search the logs for resources without a namespace
(notably, cluster-scoped resources).
Disabling MDC support
MDC support is enabled by default. If you want to disable it, you can set the JAVA_OPERATOR_SDK_USE_MDC environment
variable to false when you start your operator.
3 - Metrics
Runtime Info
RuntimeInfo is used mainly to check the actual health of event sources. Based on this information it is easy to implement custom liveness probes.
It is typically used together with the stopOnInformerErrorDuringStartup setting, which usually needs to be set to false in order to control the exact liveness properties.
See also an example implementation in the WebPage sample.
Metrics
JOSDK provides built-in support for reporting metrics on what is happening with your reconcilers through
the Metrics interface. You can implement this interface to connect to your metrics provider of choice; JOSDK calls
its methods as it goes about reconciling resources. By default, a no-operation implementation is used, which is a
no-cost, sane default. A micrometer-based implementation is also provided.
You can use a different implementation by overriding the default one provided by the default ConfigurationService, as
follows:
Metrics metrics; // initialize your metrics implementation
Operator operator = new Operator(client, o -> o.withMetrics(metrics));
MicrometerMetricsV2
MicrometerMetricsV2
is the recommended micrometer-based implementation. It is designed with low cardinality in mind:
all meters are scoped to the controller, not to individual resources. This avoids unbounded cardinality growth as
resources come and go.
The simplest way to create an instance:
MeterRegistry registry; // initialize your registry implementation
Metrics metrics = MicrometerMetricsV2.newBuilder(registry).build();
Optionally, include a namespace tag on per-reconciliation counters (disabled by default to avoid unexpected
cardinality increases in existing deployments):
Metrics metrics = MicrometerMetricsV2.newBuilder(registry)
.withNamespaceAsTag()
.build();
You can also supply a custom timer configuration for reconciliations.execution.duration:
Metrics metrics = MicrometerMetricsV2.newBuilder(registry)
.withExecutionTimerConfig(builder -> builder.publishPercentiles(0.5, 0.95, 0.99))
.build();
MicrometerMetricsV2 metrics
All meters use controller.name as their primary tag. Counters optionally carry a namespace tag when
withNamespaceAsTag() is enabled.
| Meter name (Micrometer) | Type | Tags | Description |
|---|---|---|---|
| reconciliations.active | gauge | controller.name | Number of reconciler executions currently executing |
| reconciliations.queue | gauge | controller.name | Number of resources currently queued for reconciliation |
| custom_resources | gauge | controller.name | Number of custom resources tracked by the controller |
| reconciliations.execution.duration | timer | controller.name | Reconciliation execution duration with explicit bucket histogram |
| reconciliations.started.total | counter | controller.name, namespace* | Number of reconciliations started (including retries) |
| reconciliations.success.total | counter | controller.name, namespace* | Number of successfully finished reconciliations |
| reconciliations.failure.total | counter | controller.name, namespace* | Number of failed reconciliations |
| reconciliations.retries.total | counter | controller.name, namespace* | Number of reconciliation retries |
| events.received | counter | controller.name, event, action, namespace* | Number of events received by the controller |
* namespace tag is only included when withNamespaceAsTag() is enabled.
The execution timer uses explicit boundaries (10ms, 50ms, 100ms, 250ms, 500ms, 1s, 2s, 5s, 10s, 30s) to ensure
compatibility with histogram_quantile() queries in Prometheus. This is important when using the OpenTelemetry Protocol (OTLP) registry, where
publishPercentileHistogram() would otherwise produce Base2 Exponential Histograms that are incompatible with classic
_bucket queries.
Note on Prometheus metric names: The exact Prometheus metric name suffix depends on the
MeterRegistry in use. For PrometheusMeterRegistry the timer is exposed as reconciliations_execution_duration_seconds_*. For OtlpMeterRegistry (metrics exported via OpenTelemetry Collector), it is exposed as reconciliations_execution_duration_milliseconds_*.
Grafana Dashboard
A ready-to-use Grafana dashboard is available at
observability/josdk-operator-metrics-dashboard.json.
It visualizes all of the metrics listed above, including reconciliation throughput, error rates, queue depth, active
executions, resource counts, and execution duration histograms and heatmaps.
The dashboard is designed to work with metrics exported via OpenTelemetry Collector to Prometheus, as set up by the observability sample (see below).
Exploring metrics end-to-end
The
metrics-processing sample operator
includes a full end-to-end test,
MetricsHandlingE2E,
that:
- Installs a local observability stack (Prometheus, Grafana, OpenTelemetry Collector) via observability/install-observability.sh. The script also imports the Grafana dashboards.
- Runs two reconcilers that produce both successful and failing reconciliations over a sustained period
- Verifies that the expected metrics appear in Prometheus
This is a good starting point for experimenting with the metrics and the Grafana dashboard in a real cluster without having to deploy your own operator.
MicrometerMetrics (Deprecated)
Deprecated: MicrometerMetrics (V1) is deprecated as of JOSDK 5.3.0. Use MicrometerMetricsV2 instead. V1 attaches resource-specific metadata (name, namespace, etc.) as tags to every meter, which causes unbounded cardinality growth and can lead to performance issues in your metrics backend.
The legacy MicrometerMetrics implementation is still available. To create an instance that behaves as it historically
has:
MeterRegistry registry; // initialize your registry implementation
Metrics metrics = MicrometerMetrics.newMicrometerMetricsBuilder(registry).build();
To collect metrics on a per-resource basis, deleting the associated meters after 5 seconds when a resource is deleted, using up to 2 threads:
MicrometerMetrics.newPerResourceCollectingMicrometerMetricsBuilder(registry)
.withCleanUpDelayInSeconds(5)
.withCleaningThreadNumber(2)
.build();
Operator SDK metrics (V1)
The V1 micrometer implementation records the following metrics:
| Meter name | Type | Tag names | Description |
|---|---|---|---|
| operator.sdk.reconciliations.executions.<reconciler name> | gauge | group, version, kind | Number of executions of the named reconciler |
| operator.sdk.reconciliations.queue.size.<reconciler name> | gauge | group, version, kind | How many resources are queued to get reconciled by named reconciler |
| operator.sdk.<map name>.size | gauge map size | | Gauge tracking the size of a specified map (currently unused but could be used to monitor caches size) |
| operator.sdk.events.received | counter | <resource metadata>, event, action | Number of received Kubernetes events |
| operator.sdk.events.delete | counter | <resource metadata> | Number of received Kubernetes delete events |
| operator.sdk.reconciliations.started | counter | <resource metadata>, reconciliations.retries.last, reconciliations.retries.number | Number of started reconciliations per resource type |
| operator.sdk.reconciliations.failed | counter | <resource metadata>, exception | Number of failed reconciliations per resource type |
| operator.sdk.reconciliations.success | counter | <resource metadata> | Number of successful reconciliations per resource type |
| operator.sdk.controllers.execution.reconcile | timer | <resource metadata>, controller | Time taken for reconciliations per controller |
| operator.sdk.controllers.execution.cleanup | timer | <resource metadata>, controller | Time taken for cleanups per controller |
| operator.sdk.controllers.execution.reconcile.success | counter | controller, type | Number of successful reconciliations per controller |
| operator.sdk.controllers.execution.reconcile.failure | counter | controller, exception | Number of failed reconciliations per controller |
| operator.sdk.controllers.execution.cleanup.success | counter | controller, type | Number of successful cleanups per controller |
| operator.sdk.controllers.execution.cleanup.failure | counter | controller, exception | Number of failed cleanups per controller |
All V1 metrics start with the operator.sdk prefix. <resource metadata> refers to resource-specific metadata and
depends on the considered metric and how the implementation is configured: group?, version, kind, [name, namespace?], scope, where tags in square brackets ([]) won’t be present when per-resource collection is disabled and tags followed
by a question mark are omitted if the value is empty. In the context of controllers’ execution metrics, these tag names
carry the resource. prefix.
Aggregated Metrics
The AggregatedMetrics class provides a way to combine multiple metrics providers into a single metrics instance using
the composite pattern. This is particularly useful when you want to simultaneously collect metrics data from different
monitoring systems or providers.
You can create an AggregatedMetrics instance by providing a list of existing metrics implementations:
// create individual metrics instances
Metrics micrometerMetrics = MicrometerMetrics.withoutPerResourceMetrics(registry);
Metrics customMetrics = new MyCustomMetrics();
Metrics loggingMetrics = new LoggingMetrics();
// combine them into a single aggregated instance
Metrics aggregatedMetrics = new AggregatedMetrics(List.of(
micrometerMetrics,
customMetrics,
loggingMetrics
));
// use the aggregated metrics with your operator
Operator operator = new Operator(client, o -> o.withMetrics(aggregatedMetrics));
This approach allows you to easily combine different metrics collection strategies, such as sending metrics to both Prometheus (via Micrometer) and a custom logging system simultaneously.
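For readers unfamiliar with the composite pattern, the following self-contained sketch shows the idea with a deliberately tiny interface. ReconcileMetrics and CompositeSketch are hypothetical names used only here; the real Metrics interface has more callbacks, and AggregatedMetrics may handle details such as errors in individual delegates differently.

```java
import java.util.List;

// Hypothetical illustration of the composite pattern, not JOSDK code.
final class CompositeSketch {

  // Tiny stand-in for a metrics callback interface.
  interface ReconcileMetrics {
    void reconciliationFinished(String controllerName, boolean success);
  }

  // Returns a single instance that forwards every call to all delegates, in order.
  static ReconcileMetrics aggregate(List<ReconcileMetrics> delegates) {
    List<ReconcileMetrics> copy = List.copyOf(delegates);
    return (controllerName, success) ->
        copy.forEach(d -> d.reconciliationFinished(controllerName, success));
  }

  public static void main(String[] args) {
    ReconcileMetrics logging =
        (name, ok) -> System.out.println(name + " finished, success=" + ok);
    ReconcileMetrics noop = (name, ok) -> {};
    aggregate(List.of(logging, noop)).reconciliationFinished("my-controller", true);
  }
}
```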
4 - Leader Election
When running multiple replicas of an operator for high availability, leader election ensures that only one instance actively reconciles resources at a time. JOSDK uses Kubernetes Lease objects for leader election.
Enabling Leader Election
Programmatic Configuration
var operator = new Operator(o -> o.withLeaderElectionConfiguration(
new LeaderElectionConfiguration("my-operator-lease", "operator-namespace")));
Or using the builder for full control:
import static io.javaoperatorsdk.operator.api.config.LeaderElectionConfigurationBuilder.aLeaderElectionConfiguration;
var config = aLeaderElectionConfiguration("my-operator-lease")
.withLeaseNamespace("operator-namespace")
.withIdentity(System.getenv("POD_NAME"))
.withLeaseDuration(Duration.ofSeconds(15))
.withRenewDeadline(Duration.ofSeconds(10))
.withRetryPeriod(Duration.ofSeconds(2))
.build();
var operator = new Operator(o -> o.withLeaderElectionConfiguration(config));
External Configuration
Leader election can also be configured via properties (e.g. environment variables or a config file).
See the Leader Election keys on the configurations page for details.
How It Works
- When leader election is enabled, the operator starts but does not process events until it acquires the lease.
- Once leadership is acquired, event processing begins normally.
- If leadership is lost (e.g. the leader pod becomes unresponsive), another instance acquires the lease and takes over reconciliation. The instance that lost the lead is terminated (System.exit()).
Identity and Namespace Inference
If not explicitly set:
- Identity is resolved from the HOSTNAME environment variable, then the pod name, falling back to a random UUID.
- Lease namespace defaults to the namespace the operator pod is running in.
RBAC Requirements
The operator’s service account needs permissions to manage Lease objects:
- apiGroups: ["coordination.k8s.io"]
  resources: ["leases"]
  verbs: ["create", "update", "get"]
JOSDK checks for these permissions at startup and throws a clear error if they are missing.
Sample E2E Test
A complete working example is available in the
leader-election sample operator,
including multi-replica deployment manifests and an E2E test that verifies failover behavior.
5 - Generic Helm Chart
A generic, reusable Helm chart for deploying Java operators built with JOSDK is available at
helm/generic-helm-chart.
It is intended as a template for operator developers — a starting point that covers common deployment patterns so you don’t have to write a chart from scratch. The chart is maintained on a best-effort basis. Contributions are more than welcome.
The chart is used in the
metrics-processing sample operator E2E test
to deploy the operator to a cluster via Helm.
What the Chart Provides
- Deployment with security defaults (non-root user, read-only filesystem, no privilege escalation)
- Dynamic RBAC (ClusterRole, ClusterRoleBinding, ServiceAccount): permissions are generated automatically from the primary and secondary resources you declare in values.yaml
- ConfigMap for operator configuration (config.yaml) and logging (log4j2.xml), mounted at /config
- Leader election support (opt-in)
- Extensibility via extra containers, init containers, volumes, and environment variables
Key Configuration
The most important values to set when adapting the chart for your operator:
image:
  repository: my-operator-image # required
  tag: "latest"
# Custom resources your operator reconciles
primaryResources:
  - apiGroup: "sample.javaoperatorsdk"
    resources:
      - myresources
# Kubernetes resources your operator manages
secondaryResources:
  - apiGroup: ""
    resources:
      - configmaps
      - services
Primary resources get read/watch/patch permissions and status sub-resource access. Secondary resources get full CRUD permissions. Default verbs can be overridden per resource entry.
Operator Environment
The chart injects OPERATOR_NAMESPACE automatically. You can optionally set WATCH_NAMESPACE to
restrict the operator to a single namespace, and add arbitrary environment variables:
operator:
  watchNamespace: "" # empty = all namespaces
  env:
    - name: MY_CUSTOM_VAR
      value: "some-value"
Resource Defaults
resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 100m
    memory: 128Mi
See the full
values.yaml
for all available options.
Usage Example
A working example of how to use the chart can be found in the metrics-processing sample operator’s
helm-values.yaml:
image:
  repository: metrics-processing-operator
  pullPolicy: Never
  tag: "latest"
nameOverride: "metrics-processing-operator"
resources: {}
primaryResources:
  - apiGroup: "sample.javaoperatorsdk"
    resources:
      - metricshandlingcustomresource1s
      - metricshandlingcustomresource2s
Install with:
helm install my-operator ./helm/generic-helm-chart -f my-values.yaml --namespace my-ns
Testing the Chart
The chart includes unit tests using the helm-unittest plugin. Run them with:
./helm/run-tests.sh