Posts

Welcome, read-cache-after-write consistency!

TL;DR: In version 5.3.0 we introduced strong consistency guarantees for updates with a new API. You can now update resources (both your custom resource and managed resources) and the framework will guarantee that these updates will be instantly visible when accessing resources from caches, and naturally also for subsequent reconciliations.

I briefly talked about this topic at KubeCon last year.

public UpdateControl<WebPage> reconcile(WebPage webPage, Context<WebPage> context) {
    
    ConfigMap managedConfigMap = prepareConfigMap(webPage);
    // apply the resource with new API
    context.resourceOperations().serverSideApply(managedConfigMap);
    
    // fresh resource instantly available from our update in the caches
    var upToDateResource = context.getSecondaryResource(ConfigMap.class);
    
    // from now on built-in update methods by default use this feature;
    // it is guaranteed that resource changes will be visible for next reconciliation
    return UpdateControl.patchStatus(alterStatusObject(webPage));
}

In addition to that, the framework will automatically filter events for your own updates, so they don’t trigger the reconciliation again.

This post will deep dive into this topic, exploring the details and rationale behind it.

See the related umbrella issue on GitHub.

Informers and eventual consistency

First, we have to understand a fundamental building block of Kubernetes operators: Informers. Since there is plentiful accessible information about this topic, here’s a brief summary. Informers:

  1. Watch Kubernetes resources — the K8S API sends events to the client through a websocket whenever a resource changes. An event usually contains the whole resource. (There are some exceptions; see Bookmarks.) See details about watch as a K8S API concept in the official docs.
  2. Cache the latest state of the resource.
  3. If an informer receives an event in which the metadata.resourceVersion is different from the version in the cached resource, it calls the event handler, thus in our case triggering the reconciliation.

A controller is usually composed of multiple informers: one tracking the primary resource, and additional informers registered for each (secondary) resource we manage. Informers are great since we don’t have to poll the Kubernetes API — it is push-based. They also provide a cache, so reconciliations are very fast since they work on top of cached resources.
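A stdlib-only toy model can illustrate rule 3 above (hypothetical types, not the actual informer implementation): the event handler fires only when the incoming resourceVersion differs from the cached one.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of an informer cache: the event handler is called only when the
// incoming resourceVersion differs from the cached one (rule 3 above).
public class ToyInformer {
    // resource name -> resourceVersion of the last cached state
    private final Map<String, String> cache = new HashMap<>();
    private int handlerCalls = 0;

    public void onEvent(String name, String resourceVersion) {
        String cached = cache.get(name);
        if (!resourceVersion.equals(cached)) {
            cache.put(name, resourceVersion); // store the latest state
            handlerCalls++;                   // trigger reconciliation
        }
    }

    public int handlerCalls() { return handlerCalls; }

    public static void main(String[] args) {
        ToyInformer informer = new ToyInformer();
        informer.onEvent("pod-a", "100"); // new resource -> handler fires
        informer.onEvent("pod-a", "100"); // duplicate event -> ignored
        informer.onEvent("pod-a", "101"); // changed -> handler fires again
        System.out.println(informer.handlerCalls()); // prints 2
    }
}
```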

Now let’s take a look at the flow when we update a resource:

graph LR
    subgraph Controller
        Informer:::informer
        Cache[(Cache)]:::teal
        Reconciler:::reconciler
        Informer -->|stores| Cache
        Reconciler -->|reads| Cache
    end
    K8S[⎈ Kubernetes API Server]:::k8s

    Informer -->|watches| K8S
    Reconciler -->|updates| K8S

    classDef informer fill:#C0527A,stroke:#8C3057,color:#fff
    classDef reconciler fill:#E8873A,stroke:#B05E1F,color:#fff
    classDef teal fill:#3AAFA9,stroke:#2B807B,color:#fff
    classDef k8s fill:#326CE5,stroke:#1A4AAF,color:#fff

It is easy to see that the cache of the informer is eventually consistent with the update we sent from the reconciler. It usually takes only a very short time (a few milliseconds) to sync the caches and everything is fine. Well, sometimes it isn’t. The websocket can be disconnected (which actually happens on purpose sometimes), the API Server can be slow, etc.

The problem(s) we try to solve

Let’s consider an operator with the following requirements:

  • we have a custom resource PrefixedPod where the spec contains only one field: podNamePrefix
  • the goal of the operator is to create a Pod with a name that has the prefix and a random suffix
  • it should never run two Pods at once; if the podNamePrefix changes, it should delete the current Pod and then create a new one
  • the status of the custom resource should contain the generatedPodName

How the code would look in 5.2.x:


public UpdateControl<PrefixedPod> reconcile(PrefixedPod primary, Context<PrefixedPod> context) {

    Optional<Pod> currentPod = context.getSecondaryResource(Pod.class);

    if (currentPod.isPresent()) {
        if (podNameHasPrefix(primary.getSpec().getPodNamePrefix(), currentPod.get())) {
            // all ok, we can return
            return UpdateControl.noUpdate();
        } else {
            // delete the current pod with the different name pattern
            context.getClient().resource(currentPod.get()).delete();
            // just return; the pod delete event will trigger the reconciliation
            return UpdateControl.noUpdate();
        }
    } else {
        // create a new pod
        var newPod = context.getClient().resource(createPodWithOwnerReference(primary)).serverSideApply();
        return UpdateControl.patchStatus(setGeneratedPodNameToStatus(primary, newPod));
    }
}

@Override
public List<EventSource<?, PrefixedPod>> prepareEventSources(EventSourceContext<PrefixedPod> context) {
    // Code omitted for adding InformerEventSource for the Pod
}

That is quite simple: if there is a Pod with a different name prefix we delete it, otherwise we create the Pod and update the status. The Pod is created with an owner reference, so any update on the Pod will trigger the reconciliation.

Now consider the following sequence of events:

  1. We create a PrefixedPod with spec.podNamePrefix: first-pod-prefix.
  2. Concurrently:
    • The reconciliation logic runs and creates a Pod with a generated name suffix: “first-pod-prefix-a3j3ka”; it also sets this in the status and updates the custom resource status.
    • While the reconciliation is running, we update the custom resource to have the value second-pod-prefix.
  3. The update of the custom resource triggers the reconciliation.

When the spec change triggers the reconciliation in point 3, there is absolutely no guarantee that:

  • the created Pod will already be visible — currentPod might simply be empty
  • the status.generatedPodName will be visible

Since both are backed by an informer and the caches of those informers are only eventually consistent with our updates, the next reconciliation would create a new Pod, violating the requirement to not have two Pods running at the same time. In addition, the controller would override the status. Although in the case of a Kubernetes resource we can still find the existing Pods later via owner references, if we were managing a non-Kubernetes (external) resource we would not notice that we had already created one.

So can we have stronger guarantees regarding caches? It turns out we can now…

Achieving read-cache-after-write consistency

When we send an update (this also applies to various create and patch requests) to the Kubernetes API, the response contains the up-to-date resource with the resource version that is the most recent at that point. The idea is that we can store this response in an additional cache on top of the Informer’s cache. We call this cache TemporaryResourceCache (TRC); besides storing such responses, it also plays a role in event filtering, as we will see later.

Note that the challenge in the past was knowing when to evict this response from the TRC. Eventually, we will receive an event in the informer and the informer cache will be populated with an up-to-date resource. But it was not possible to reliably tell whether an event contained a resource that resulted from an update made before or after our own update, because the Kubernetes documentation stated that metadata.resourceVersion should be treated as an opaque string and matched only with equality. With optimistic locking, however, we were able to overcome this issue — see this blog post.

From this point the idea of the algorithm is very simple:

  1. After updating a Kubernetes resource, cache the response in the TRC.
  2. When the informer propagates an event, check if its resource version is greater than or equal to the one in the TRC. If yes, evict the resource from the TRC.
  3. When the controller reads a resource from cache, it checks the TRC first, then falls back to the Informer’s cache.

sequenceDiagram
    box rgba(50,108,229,0.1)
        participant K8S as ⎈ Kubernetes API Server
    end
    box rgba(232,135,58,0.1)
        participant R as Reconciler
    end
    box rgba(58,175,169,0.1)
        participant I as Informer
        participant IC as Informer Cache
        participant TRC as Temporary Resource Cache
    end

    R->>K8S: 1. Update resource
    K8S-->>R: Updated resource (with new resourceVersion)
    R->>TRC: 2. Cache updated resource in TRC

    I-)K8S: 3. Watch event (resource updated)
    I->>TRC: On event: event resourceVersion ≥ TRC version?
    alt Yes: event is up-to-date
        I-->>TRC: Evict resource from TRC       
    else No: stale event        
        Note over TRC: TRC entry retained
    end

    R->>TRC: 4. Read resource from cache
    alt Resource found in TRC
        TRC-->>R: Return cached resource
    else Not in TRC
        R->>IC: Read from Informer Cache
        IC-->>R: Return resource
    end
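The three steps above can be sketched as a stdlib-only toy model (illustrative names, not the actual JOSDK implementation; resource versions are compared numerically here purely for illustration):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the TemporaryResourceCache (TRC) algorithm described above.
public class ToyTrc {
    record Resource(String name, long resourceVersion) {}

    private final Map<String, Resource> informerCache = new HashMap<>();
    private final Map<String, Resource> trc = new HashMap<>();

    // 1. After an update, cache the API server's response in the TRC.
    public void onUpdateResponse(Resource updated) {
        trc.put(updated.name(), updated);
    }

    // 2. On an informer event, evict the TRC entry if the event is at least
    //    as new as our cached update; stale events leave it intact.
    public void onInformerEvent(Resource fromEvent) {
        informerCache.put(fromEvent.name(), fromEvent);
        Resource cached = trc.get(fromEvent.name());
        if (cached != null && fromEvent.resourceVersion() >= cached.resourceVersion()) {
            trc.remove(fromEvent.name());
        }
    }

    // 3. Reads check the TRC first, then fall back to the informer cache.
    public Resource get(String name) {
        Resource cached = trc.get(name);
        return cached != null ? cached : informerCache.get(name);
    }
}
```

With this in place a read right after an update always returns at least the version from the update response, even while the informer cache still holds an older version.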

Filtering events for our own updates

When we update a resource, eventually the informer will propagate an event that would trigger a reconciliation. However, this is mostly not desired. Since we already have the up-to-date resource at that point, we would like to be notified only if the resource is changed after our change. Therefore, in addition to caching the resource, we also filter out events that contain a resource version older than or equal to our cached resource version.

Note that the implementation of this is relatively complex, since while performing the update we want to record all the events received in the meantime and decide whether to propagate them further once the update request is complete.

However, this way we significantly reduce the number of reconciliations, making the whole process much more efficient.

The case for instant reschedule

We realize that some of our users might rely on the fact that reconciliation is triggered by their own updates. To support backwards compatibility, or rather a migration path, we now provide a way to instruct the framework to queue an instant reconciliation:

public UpdateControl<WebPage> reconcile(WebPage webPage, Context<WebPage> context) {
 
    // omitted reconciliation logic
    
   return UpdateControl.<WebPage>noUpdate().reschedule();
}

Additional considerations and alternatives

An alternative approach would be to not trigger the next reconciliation until the target resource appears in the Informer’s cache. The upside is that we don’t have to maintain an additional cache of the resource, just the target resource version; therefore this approach might have a smaller memory footprint, but not necessarily. See the related KEP that takes this approach.

On the other hand, when we make a request, the response object is always deserialized regardless of whether we are going to cache it or not. This object in most cases will be cached for a very short time and later garbage collected. Therefore, the memory overhead should be minimal.

Having the TRC has an additional advantage: since we have the resource instantly in our caches, we can elegantly continue the reconciliation in the same pass and reconcile resources that depend on the latest state. More concretely, this also helps with our Dependent Resources / Workflows which rely on up-to-date caches. In this sense, this approach is much more optimal regarding throughput.

Conclusion

I personally worked on a prototype of an operator that depended on an unreleased version of JOSDK already implementing these features. The most obvious gain was how much simpler the reasoning became in some cases and how it reduced the corner cases that we would otherwise have to solve with the expectation pattern or other facilities.

Special thanks

I would like to thank all the contributors who directly or indirectly contributed, including metacosm, manusa, and xstefank.

Last but certainly not least, special thanks to Steven Hawkins, who maintains the Informer implementation in the fabric8 Kubernetes client and implemented the first version of the algorithms. We then iterated on it together multiple times. Covering all the edge cases was quite an effort. Just as a highlight, I’ll mention the last one.

Thank you!

How to guarantee allocated values for next reconciliation

We recently released v5.1 of Java Operator SDK (JOSDK). One of the highlights of this release relates to the topic of so-called allocated values.

To describe the problem, let’s say that our controller needs to create a resource that has a generated identifier, i.e. a resource whose identifier cannot be directly derived from the custom resource’s desired state as specified in its spec field. To record the fact that the resource was successfully created, and to avoid attempting to recreate the resource again in subsequent reconciliations, it is typical for this type of controller to store the generated identifier in the custom resource’s status field.

The Java Operator SDK relies on the informers’ cache to retrieve resources. These caches, however, are only guaranteed to be eventually consistent. If some other event triggers a new reconciliation before the status update we made has had a chance to propagate to the cluster and back to the informer cache, the resource in that cache will not contain the latest version as modified by the reconciler. The new reconciliation would then see a status missing the generated identifier, and the reconciler would attempt to create the resource again, which is not what we’d like.

Java Operator SDK now provides a utility class PrimaryUpdateAndCacheUtils to handle this particular use case. Using that overlay cache, your reconciler is guaranteed to see the most up-to-date version of the resource on the next reconciliation:


@Override
public UpdateControl<StatusPatchCacheCustomResource> reconcile(
        StatusPatchCacheCustomResource resource,
        Context<StatusPatchCacheCustomResource> context) {

    // omitted code

    var freshCopy = createFreshCopy(resource); // need fresh copy just because we use the SSA version of update
    freshCopy
            .getStatus()
            .setValue(statusWithAllocatedValue());

    // using the utility instead of update control to patch the resource status
    var updated =
            PrimaryUpdateAndCacheUtils.ssaPatchStatusAndCacheResource(resource, freshCopy, context);
    return UpdateControl.noUpdate();
}

How does PrimaryUpdateAndCacheUtils work? There are multiple ways to solve this problem, but ultimately we provide only the solution described below. If you want to dig deeper into the alternatives, see this PR.

The trick is to intercept the resource that the reconciler updated and cache that version in an additional cache on top of the informer’s cache. Subsequently, if the reconciler needs to read the resource, the SDK first checks the overlay cache and reads the resource from there if present, otherwise it reads it from the informer’s cache. If the informer receives an event with a fresh resource, we always remove the resource from the overlay cache, since that is a more recent version. This works, however, only if the reconciler updates the resource using optimistic locking. If the update fails on a conflict (because the resource was already updated on the cluster before our update got in), we simply wait and poll the informer cache until the new resource version from the server appears there, and then try to apply our update again on top of that version, again with optimistic locking.

So why is optimistic locking required? We hinted at it above, but the gist is that if another party updates the resource before we get a chance to, we wouldn’t be able to handle the resulting situation correctly in all cases. The informer might receive that other party’s event before our own update gets a chance to propagate. Without optimistic locking, there would be no fail-proof way to determine which update should prevail (i.e. which occurred first), in particular if the informer loses its connection to the cluster, or in other edge cases (the joys of distributed computing!).

Optimistic locking simplifies the situation and provides us with stronger guarantees: if the update succeeds, then we can be sure we have the proper resource version in our caches. The next event will contain our update in all cases. Because we know that, we can also be sure that we can evict the cached resource in the overlay cache whenever we receive a new event. The overlay cache is only used if the SDK detects that the original resource (i.e. the one before we applied our status update in the example above) is still in the informer’s cache.

The following diagram sums up the process:

flowchart TD
    A["Update Resource with Lock"] --> B{"Is Successful"}
    B -- Fails on conflict --> D["Poll the Informer cache until resource updated"]
    D --> A
    B -- Yes --> n2{"Original resource still in informer cache?"}
    n2 -- Yes --> C["Cache the resource in overlay cache"]
    n2 -- No --> n3["Informer cache already contains up-to-date version, do not use overlay cache"]
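The retry loop in the diagram can be sketched roughly as follows (a stdlib-only simulation with illustrative names, not the actual PrimaryUpdateAndCacheUtils code):

```java
import java.util.function.UnaryOperator;

// Sketch of the retry loop from the diagram: apply an update with optimistic
// locking; on conflict, wait for the informer cache to catch up and re-apply
// the change on top of the fresh version.
public class ToyLockingUpdater {
    record Resource(long resourceVersion, String status) {}

    static class ConflictException extends RuntimeException {}

    interface Cluster {
        // Succeeds only if expectedVersion matches the server's version.
        Resource updateWithLock(Resource desired, long expectedVersion);
        // Stand-in for "poll the informer cache until the resource updated".
        Resource latestFromInformerCache();
    }

    public static Resource updateWithRetry(Cluster cluster, Resource base,
                                           UnaryOperator<Resource> statusUpdate) {
        Resource current = base;
        while (true) {
            try {
                return cluster.updateWithLock(statusUpdate.apply(current),
                        current.resourceVersion());
            } catch (ConflictException e) {
                // Someone else updated first: re-read and re-apply our change.
                current = cluster.latestFromInformerCache();
            }
        }
    }
}
```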

From legacy approach to server-side apply

From version 5 of Java Operator SDK, server-side apply is a first-class feature and is used by default to update resources. As we will see, unfortunately (or fortunately), using it requires changes to your reconciler implementation.

For this reason, we prepared a feature flag, which you can flip if you are not prepared to migrate yet: ConfigurationService.useSSAToPatchPrimaryResource

Setting this flag to false makes the operations done by UpdateControl use the former (non-SSA) approach. Similarly, the finalizer handling won’t utilize SSA. The plan is to keep this flag and to allow the use of the former non-SSA approach also in future releases.

For dependent resources, a separate flag exists (this was true also before v5) to use SSA or not: ConfigurationService.ssaBasedCreateUpdateMatchForDependentResources

Resource handling without and with SSA

Until version 5, changing primary resources through UpdateControl did not use server-side apply. So usually, the implementation of the reconciler looked something like this:


@Override
public UpdateControl<WebPage> reconcile(WebPage webPage, Context<WebPage> context) {

    reconcileLogicForManagedResources(webPage);
    webPage.setStatus(updatedStatusForWebPage(webPage));

    return UpdateControl.patchStatus(webPage);
}

In other words, after the reconciliation of managed resources, the reconciler updates the status of the primary resource passed as an argument to the reconciler. Such changes on the primary are fine since we don’t work directly with the cached object; the argument is already cloned.

So, how does this change with SSA? For SSA, the updates should contain (only) the “fully specified intent”. In other words, we should only fill in the values we care about. In practice, it means creating a fresh copy of the resource and setting only what is necessary:


@Override
public UpdateControl<WebPage> reconcile(WebPage webPage, Context<WebPage> context) {

    reconcileLogicForManagedResources(webPage);

    WebPage statusPatch = new WebPage();
    statusPatch.setMetadata(new ObjectMetaBuilder()
            .withName(webPage.getMetadata().getName())
            .withNamespace(webPage.getMetadata().getNamespace())
            .build());
    statusPatch.setStatus(updatedStatusForWebPage(webPage));

    return UpdateControl.patchStatus(statusPatch);
}

Note that we only filled in the status here since we are patching the status (not the resource spec). Since the status is a sub-resource in Kubernetes, the patch will only update the status part.

Every controller you register will have its default field manager. You can override the field manager name using ControllerConfiguration.fieldManager. That will set the field manager for the primary resource and dependent resources as well.

Migrating to SSA

Using either the legacy or the new SSA way of resource management works well. However, migrating existing resources to SSA might be a challenge. We strongly recommend testing the migration, i.e. implementing an integration test where a custom resource is created using the legacy approach and then managed by the new approach.

We prepared an integration test to demonstrate how such migration, even in a simple case, can go wrong, and how to fix it.

To fix some cases, you might need to strip managed fields from the custom resource.

See StatusPatchSSAMigrationIT for details.

Feel free to report common issues, so we can prepare some utilities to handle them.

Optimistic concurrency control

When you create a patch for SSA as mentioned above, the framework will apply your changes even if the underlying resource or status subresource changed while the reconciliation was running. First, it always forces conflicts, as advised in the Kubernetes docs; in addition, since the resource version is not set, it won’t do optimistic locking. If you still want optimistic locking for the patch, use the resource version of the original resource:

@Override
public UpdateControl<WebPage> reconcile(WebPage webPage, Context<WebPage> context) {

    reconcileLogicForManagedResources(webPage);

    WebPage statusPatch = new WebPage();
    statusPatch.setMetadata(new ObjectMetaBuilder()
            .withName(webPage.getMetadata().getName())
            .withNamespace(webPage.getMetadata().getNamespace())
            .withResourceVersion(webPage.getMetadata().getResourceVersion())
            .build());
    statusPatch.setStatus(updatedStatusForWebPage(webPage));

    return UpdateControl.patchStatus(statusPatch);
}

Using k8s' ETCD as your application DB

FAQ: Is Kubernetes’ ETCD the Right Database for My Application?

Answer

While the idea of moving your application data to Custom Resources (CRs) aligns with the “Cloud Native” philosophy, it often introduces more challenges than benefits. Let’s break it down:


Top Reasons Why Storing Data in ETCD Through CRs Looks Appealing

  1. Storing application data as CRs enables treating your application’s data like infrastructure:
    • GitOps compatibility: Declarative content can be stored in Git repositories, ensuring reproducibility.
    • Infrastructure alignment: Application data can follow the same workflow as other infrastructure components.

Challenges of Using Kubernetes’ ETCD as Your Application’s Database

Technical Limitations:

  • Data Size Limitations 🔴:

    • Each CR is capped at 1.5 MB by default. Raising this limit is possible but impacts cluster performance.
    • Kubernetes ETCD has a storage cap of 2 GB by default. Adjusting this limit affects the cluster globally, with potential performance degradation.
  • API Server Load Considerations 🟡:

    • The Kubernetes API server is designed to handle infrastructure-level requests.
    • Storing application data in CRs might add significant load to the API server, requiring it to be scaled appropriately to handle both infrastructure and application demands.
    • This added load can impact cluster performance and increase operational complexity.
  • Guarantees 🟡:

    • Efficient queries are hard to implement, as there is no built-in support for them.
    • ACID properties are hard to leverage; most guarantees effectively hold only for read operations.

Operational Impact:

  • Lost Flexibility 🟡:

    • Modifying application data requires complex YAML editing and full redeployment.
    • This contrasts with traditional databases that often feature user-friendly web UIs or APIs for real-time updates.
  • Infrastructure Complexity 🟠:

    • Backup, restore, and lifecycle management for application data are typically separate from deployment workflows.
    • Storing both in ETCD mixes these concerns, complicating operations and standardization.

Security:

  • Governance and Security 🔴:
    • Sensitive data stored in plain YAML may lack adequate encryption or access controls.
    • Applying governance policies over text-based files can become a significant challenge.

When Might Using CRs Make Sense?

For small, safe subsets of data—such as application configurations—using CRs might be appropriate. However, this approach requires a detailed evaluation of the trade-offs.


Conclusion

While it’s tempting to unify application data with infrastructure control via CRs, this introduces risks that can outweigh the benefits. For most applications, separating concerns by using a dedicated database is the more robust, scalable, and manageable solution.


A Practical Example

A typical “user” described in JSON:

{
  "username": "myname",
  "enabled": true,
  "email": "myname@test.com",
  "firstName": "MyFirstName",
  "lastName": "MyLastName",
  "credentials": [
    {
      "type": "password",
      "value": "test"
    },
    {
      "type": "token",
      "value": "oidc"
    }
  ],
  "realmRoles": [
    "user",
    "viewer",
    "admin"
  ],
  "clientRoles": {
    "account": [
      "view-profile",
      "change-group",
      "manage-account"
    ]
  }
}

This example represents about 0.5 KB of data, meaning (with standard settings) a maximum of ~2000 users can be defined in the same CR. Additionally:

  • It contains sensitive information, which should be securely stored.
  • Regulatory rules (like GDPR) apply.

Releases

Version 5.3 Released!

We’re pleased to announce the release of Java Operator SDK v5.3.0! This minor version brings two headline features — read-cache-after-write consistency and a new metrics implementation — along with a configuration adapter system, MDC improvements, and a number of smaller improvements and cleanups.

Key Features

Read-cache-after-write Consistency and Event Filtering

This is the headline feature of 5.3. Informer caches are inherently eventually consistent: after your reconciler updates a resource, there is a window of time before the change is visible in the cache. This can cause subtle bugs, particularly when storing allocated values in the status sub-resource and reading them back in the next reconciliation.

From 5.3.0, the framework provides two guarantees when you use ResourceOperations (accessible from Context):

  1. Read-after-write: Reading from the cache after your update — even within the same reconciliation — returns at least the version of the resource from your update response.
  2. Event filtering: Events produced by your own writes no longer trigger a redundant reconciliation.

UpdateControl and ErrorStatusUpdateControl use this automatically. Secondary resources benefit via context.resourceOperations():

public UpdateControl<WebPage> reconcile(WebPage webPage, Context<WebPage> context) {

    ConfigMap managedConfigMap = prepareConfigMap(webPage);
    // update is cached and will suppress the resulting event
    context.resourceOperations().serverSideApply(managedConfigMap);

    // fresh resource instantly available from the cache
    var upToDateResource = context.getSecondaryResource(ConfigMap.class);

    makeStatusChanges(webPage);
    // UpdateControl also uses this by default
    return UpdateControl.patchStatus(webPage);
}

If your reconciler relied on being re-triggered by its own writes, a new reschedule() method on UpdateControl lets you explicitly request an immediate re-queue.

Note: InformerEventSource.list(..) bypasses the additional caches and will not reflect in-flight updates. Use context.getSecondaryResources(..) or InformerEventSource.get(ResourceID) instead.

See the related blog post and reconciler docs for details.

MicrometerMetricsV2

A new micrometer-based Metrics implementation designed with low cardinality in mind. All meters are scoped to the controller, not to individual resources, avoiding unbounded cardinality growth as resources come and go.

MeterRegistry registry; // initialize your registry
Metrics metrics = MicrometerMetricsV2.newBuilder(registry).build();
Operator operator = new Operator(client, o -> o.withMetrics(metrics));

Optionally attach a namespace tag to per-reconciliation counters (disabled by default):

Metrics metrics = MicrometerMetricsV2.newBuilder(registry)
        .withNamespaceAsTag()
        .build();

The full list of meters:

Meter                                Type     Description
reconciliations.active               gauge    Reconciler executions currently running
reconciliations.queue                gauge    Resources queued for reconciliation
custom_resources                     gauge    Resources tracked by the controller
reconciliations.execution.duration   timer    Execution duration with explicit histogram buckets
reconciliations.started.total        counter  Reconciliations started
reconciliations.success.total        counter  Successful reconciliations
reconciliations.failure.total        counter  Failed reconciliations
reconciliations.retries.total        counter  Retry attempts
events.received                      counter  Kubernetes events received

The execution timer uses explicit bucket boundaries (10ms–30s) to ensure compatibility with histogram_quantile() in both PrometheusMeterRegistry and OtlpMeterRegistry.

A ready-to-use Grafana dashboard is included at observability/josdk-operator-metrics-dashboard.json.

The metrics-processing sample operator provides a complete end-to-end setup with Prometheus, Grafana, and an OpenTelemetry Collector, installable via observability/install-observability.sh. This is a good starting point for verifying metrics in a real cluster.

Deprecated: The original MicrometerMetrics (V1) is deprecated as of 5.3.0. It attaches resource-specific metadata as tags to every meter, causing unbounded cardinality. Migrate to MicrometerMetricsV2.

See the observability docs for the full reference.

Configuration Adapters

A new ConfigLoader bridges any key-value configuration source to the JOSDK operator and controller configuration APIs. This lets you drive operator behaviour from environment variables, system properties, YAML files, or any config library without writing glue code by hand.

The default instance stacks environment variables over system properties out of the box:

Operator operator = new Operator(ConfigLoader.getDefault().applyConfigs());

Built-in providers: EnvVarConfigProvider, PropertiesConfigProvider, YamlConfigProvider, and AggregatePriorityListConfigProvider for explicit priority ordering.
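The priority stacking can be illustrated with a minimal stdlib sketch (hypothetical names, not the actual JOSDK classes); the first provider in the list that yields a value wins:

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Minimal sketch of priority-ordered config providers: the first provider
// that yields a value for a key wins. Names are illustrative only.
public class ToyConfig {
    interface Provider {
        Optional<String> getValue(String key);
    }

    static Provider fromMap(Map<String, String> map) {
        return key -> Optional.ofNullable(map.get(key));
    }

    // Mirrors "environment variables stacked over system properties":
    // earlier providers in the list take precedence.
    static Optional<String> resolve(List<Provider> providers, String key) {
        return providers.stream()
                .map(p -> p.getValue(key))
                .filter(Optional::isPresent)
                .map(Optional::get)
                .findFirst();
    }
}
```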

ConfigProvider is a single-method interface, so adapting any config library (MicroProfile Config, SmallRye Config, etc.) takes only a few lines:

public class SmallRyeConfigProvider implements ConfigProvider {
    private final SmallRyeConfig config;

    @Override
    public <T> Optional<T> getValue(String key, Class<T> type) {
        return config.getOptionalValue(key, type);
    }
}

Pass the results when constructing the operator and registering reconcilers:

var configLoader = new ConfigLoader(new SmallRyeConfigProvider(smallRyeConfig));

Operator operator = new Operator(configLoader.applyConfigs());
operator.register(new MyReconciler(), configLoader.applyControllerConfigs(MyReconciler.NAME));

See the configuration docs for the full list of supported keys.

Note: This new configuration mechanism is useful when using the SDK by itself. Framework (Spring Boot, Quarkus, …) integrations usually provide their own configuration mechanisms, which should be used instead.

MDC Improvements

MDC in workflow execution: MDC context is now propagated through workflow (dependent resource graph) execution threads, not just the top-level reconciler thread. Logging from dependent resources now carries the same contextual fields as the primary reconciliation.

NO_NAMESPACE for cluster-scoped resources: Instead of omitting the resource.namespace MDC key for cluster-scoped resources, the framework now emits MDCUtils.NO_NAMESPACE. This makes log queries for cluster-scoped resources reliable.

De-duplicated Secondary Resources from Context

When multiple event sources manage the same resource type, context.getSecondaryResources(..) now returns a de-duplicated stream. When the same resource appears from more than one source, only the copy with the highest resource version is returned.

Record Desired State in Context

Dependent resources now record their desired state in the Context during reconciliation. This allows reconcilers and downstream dependents in a workflow to inspect what a dependent resource computed as its desired state and guarantees that the desired state is computed only once per reconciliation.

Informer Health Checks

Informer health checks no longer rely on isWatching. For readiness and startup probes, you should primarily use hasSynced; isWatching only becomes meaningful for liveness checks once an informer has started.

Additional Improvements

  • Annotation removal using locking: Finalizer and annotation management no longer uses createOrReplace; a locking-based createOrUpdate avoids conflicts under concurrent updates.
  • KubernetesDependentResource uses ResourceOperations directly, removing an indirection layer and automatically benefiting from the read-after-write guarantees.
  • Skip namespace deletion in JUnit extension: The JUnit extension now supports a flag to skip namespace deletion after a test run, useful for debugging CI failures.
  • ManagedInformerEventSource.getCachedValue() deprecated: Use context.getSecondaryResource(..) instead.
  • Improved event filtering for multiple parallel updates: The filtering algorithm now handles cases where multiple parallel updates are in flight for the same resource.
  • exitOnStopLeading is being prepared for removal from the public API.

Migration Notes

JUnit module rename

<!-- before -->
<artifactId>operator-framework-junit-5</artifactId>
<!-- after -->
<artifactId>operator-framework-junit</artifactId>

Metrics interface renames

The following methods were renamed between v5.2 and v5.3:

  • reconcileCustomResource → reconciliationSubmitted
  • reconciliationExecutionStarted → reconciliationStarted
  • reconciliationExecutionFinished → reconciliationSucceeded
  • failedReconciliation → reconciliationFailed
  • finishedReconciliation → reconciliationFinished
  • cleanupDoneFor → cleanupDone
  • receivedEvent → eventReceived

reconciliationFinished(..) is extended with RetryInfo. monitorSizeOf(..) is removed.

ResourceAction relocated

ResourceAction in io.javaoperatorsdk.operator.processing.event.source.controller has been removed. Use io.javaoperatorsdk.operator.processing.event.source.ResourceAction instead.

See the full migration guide for details.

Getting Started

<dependency>
    <groupId>io.javaoperatorsdk</groupId>
    <artifactId>operator-framework</artifactId>
    <version>5.3.0</version>
</dependency>

All Changes

See the comparison view for the full list of changes.

Feedback

Please report issues or suggest improvements on our GitHub repository.

Happy operator building! 🚀

Version 5.2 Released!

We’re pleased to announce the release of Java Operator SDK v5.2! This minor version brings several powerful new features and improvements that enhance the framework’s capabilities for building Kubernetes operators. This release focuses on flexibility, external resource management, and advanced reconciliation patterns.

Key Features

ResourceIDMapper for External Resources

One of the most significant improvements in 5.2 is the introduction of a unified approach to working with custom ID types across the framework through ResourceIDMapper and ResourceIDProvider.

Previously, when working with external resources (non-Kubernetes resources), the framework assumed resource IDs could always be represented as strings. This limitation made it challenging to work with external systems that use complex ID types.

Now, you can define custom ID types for your external resources by implementing the ResourceIDProvider interface:

public class MyExternalResource implements ResourceIDProvider<MyCustomID> {
    @Override
    public MyCustomID getResourceID() {
        return new MyCustomID(this.id);
    }
}

This capability is integrated across multiple components:

  • ExternalResourceCachingEventSource
  • ExternalBulkDependentResource
  • AbstractExternalDependentResource and its subclasses

If you cannot modify the external resource class (e.g., it’s generated or final), you can provide a custom ResourceIDMapper to the components above.
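For illustration, here is a sketch of such a mapper, with the mapper contract stubbed in locally (the SDK's actual ResourceIDMapper interface may differ in method name and shape; all type names below are made up):

```java
// Stubbed stand-in for the SDK's mapper contract (illustrative only)
interface ResourceIDMapper<R, ID> {
    ID resourceIDFor(R resource);
}

// A generated/final external resource class we cannot modify
record GeneratedDatabase(String region, String name) {}

// The custom ID type used by the external system
record DatabaseID(String region, String name) {}

// Maps the resource to its ID without touching the resource class itself
class GeneratedDatabaseIDMapper implements ResourceIDMapper<GeneratedDatabase, DatabaseID> {
    @Override
    public DatabaseID resourceIDFor(GeneratedDatabase db) {
        return new DatabaseID(db.region(), db.name());
    }
}
```

The mapper instance would then be passed to the event source or dependent resource in place of implementing ResourceIDProvider on the resource class.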

See the migration guide for detailed migration instructions.

Trigger Reconciliation on All Events

Version 5.2 introduces a new execution mode that provides finer control over when reconciliation occurs. By setting triggerReconcilerOnAllEvents to true, your reconcile method will be called for every event, including Delete events.

This is particularly useful when:

  • Only some primary resources need finalizers (e.g., some resources create external resources, others don’t)
  • You maintain custom in-memory caches that need cleanup without using finalizers
  • You need fine-grained control over resource lifecycle

When enabled:

  • The reconcile method receives the last known state even if the resource is deleted
  • Deletion status can be checked via Context.isPrimaryResourceDeleted()
  • Retry, rate limiting, and rescheduling work as usual
  • Finalizers are managed explicitly using PrimaryUpdateAndCacheUtils

Example:

@ControllerConfiguration(triggerReconcilerOnAllEvents = true)
public class MyReconciler implements Reconciler<MyResource> {

    @Override
    public UpdateControl<MyResource> reconcile(MyResource resource, Context<MyResource> context) {
        if (context.isPrimaryResourceDeleted()) {
            // Handle deletion
            cleanupCache(resource);
            return UpdateControl.noUpdate();
        }
        // Normal reconciliation
        return UpdateControl.patchStatus(resource);
    }
}

See the detailed documentation and integration test.

Expectation Pattern Support (Experimental)

The framework now provides built-in support for the expectations pattern, a common Kubernetes controller design pattern that ensures secondary resources are in an expected state before proceeding.

The expectation pattern helps avoid race conditions and ensures your controller makes decisions based on the most current state of your resources. The implementation is available in the io.javaoperatorsdk.operator.processing.expectation package.

Example usage:

public class MyReconciler implements Reconciler<MyResource> {

    private final ExpectationManager<MyResource> expectationManager = new ExpectationManager<>();

    @Override
    public UpdateControl<MyResource> reconcile(MyResource primary, Context<MyResource> context) {
        // Exit early if expectation is not yet fulfilled or timed out
        if (expectationManager.ongoingExpectationPresent(primary, context)) {
            return UpdateControl.noUpdate();
        }

        var deployment = context.getSecondaryResource(Deployment.class);
        if (deployment.isEmpty()) {
            createDeployment(primary, context);
            expectationManager.setExpectation(
                primary, Duration.ofSeconds(30), deploymentReadyExpectation());
            return UpdateControl.noUpdate();
        }

        // Check if expectation is fulfilled
        var result = expectationManager.checkExpectation("deploymentReady", primary, context);
        if (result.isFulfilled()) {
            return updateStatusReady(primary);
        } else if (result.isTimedOut()) {
            return updateStatusTimeout(primary);
        }

        return UpdateControl.noUpdate();
    }
}

This feature is marked as @Experimental as we gather feedback and may refine the API based on user experience. Future versions may integrate this pattern directly into Dependent Resources and Workflows.

See the documentation and integration test.

Field Selectors for InformerEventSource

You can now use field selectors when configuring InformerEventSource, allowing you to filter resources at the server side before they’re cached locally. This reduces memory usage and network traffic by only watching resources that match your criteria.

Field selectors work similarly to label selectors but filter on resource fields like metadata.name or status.phase:

@Informer(
    fieldSelector = @FieldSelector(
        fields = @Field(key = "status.phase", value = "Running")
    )
)

This is particularly useful when:

  • You only care about resources in specific states
  • You want to reduce the memory footprint of your operator
  • You’re watching cluster-scoped resources and only need a subset

See the integration test for examples.

AggregatedMetrics for Multiple Metrics Providers

The new AggregatedMetrics class implements the composite pattern, allowing you to combine multiple metrics implementations. This is useful when you need to send metrics to different monitoring systems simultaneously.

// Create individual metrics instances
Metrics micrometerMetrics = MicrometerMetrics.withoutPerResourceMetrics(registry);
Metrics customMetrics = new MyCustomMetrics();
Metrics loggingMetrics = new LoggingMetrics();

// Combine them into a single aggregated instance
Metrics aggregatedMetrics = new AggregatedMetrics(List.of(
    micrometerMetrics,
    customMetrics,
    loggingMetrics
));

// Use with your operator
Operator operator = new Operator(client, o -> o.withMetrics(aggregatedMetrics));

This enables hybrid monitoring strategies, such as sending metrics to both Prometheus and a custom logging system.

See the observability documentation for more details.

Additional Improvements

GenericRetry Enhancements

  • GenericRetry no longer provides a mutable singleton instance, improving thread safety
  • Configurable duration for initial retry interval

Test Infrastructure Improvements

  • Ability to override test infrastructure Kubernetes client separately, providing more flexibility in testing scenarios

Fabric8 Client Update

Updated to Fabric8 Kubernetes Client 7.4.0, bringing the latest features and bug fixes from the client library.

Experimental Annotations

Starting with this release, new features marked as experimental will be annotated with @Experimental. This annotation indicates that while we intend to support the feature, the API may evolve based on user feedback.

Migration Notes

For most users, upgrading to 5.2 should be straightforward. The main breaking change involves the introduction of ResourceIDMapper for external resources. If you’re using external dependent resources or bulk dependents with custom ID types, please refer to the migration guide.

Getting Started

Update your dependency to version 5.2.0:

<dependency>
    <groupId>io.javaoperatorsdk</groupId>
    <artifactId>operator-framework</artifactId>
    <version>5.2.0</version>
</dependency>

All Changes

You can see all changes in the comparison view.

Feedback

As always, we welcome your feedback! Please report issues or suggest improvements on our GitHub repository.

Happy operator building! 🚀

Version 5 Released!

We are excited to announce that Java Operator SDK v5 has been released. This major release is the result of a significant effort: it contains various features and enhancements accumulated since the last major release, along with required changes to our APIs. In this post, we will go through the main changes, help you upgrade to the new version, and provide the rationale behind the changes where necessary.

We will omit descriptions of changes that should only require simple code updates; please do contact us if you encounter issues nonetheless.

You can see an introduction to some of the important changes, and the rationale behind them, in our talk from KubeCon.

Various Changes

  • From this release, the minimal Java version is 17.
  • Various deprecated APIs are removed. Migration should be easy.

All Changes

You can see all changes here.

Changes in low-level APIs

Server Side Apply (SSA)

Server Side Apply is now a first-class citizen in the framework and the default approach for patching resources. This means that patching a resource or its status through UpdateControl, as well as adding the finalizer in the background, will use SSA.

Migration from non-SSA-based patching to an SSA-based one can be problematic. Make sure you test the transition when you migrate from an older version of the framework. To continue using a non-SSA-based approach, set ConfigurationService.useSSAToPatchPrimaryResource to false.
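Assuming the usual ConfigurationService override mechanism used when constructing the operator, opting out looks roughly like the sketch below (verify the exact overrider method name against your SDK version):

```java
// Sketch: disable SSA-based patching of the primary resource
// (overrider method name is an assumption, not verified against the API)
Operator operator = new Operator(
    override -> override.withUseSSAToPatchPrimaryResource(false));
```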

See some identified problematic migration cases and how to handle them in StatusPatchSSAMigrationIT.

For a more detailed description, see our blog post on SSA.

Multi-cluster support in InformerEventSource

InformerEventSource now supports watching remote clusters. Simply pass a KubernetesClient instance configured to connect to a cluster other than the one where the controller runs when configuring your event source. See InformerEventSourceConfiguration.withKubernetesClient.

Such an informer behaves exactly as a regular one. Owner references won’t work in this situation, though, so you have to specify a SecondaryToPrimaryMapper (probably based on labels or annotations).
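The mapping logic itself is straightforward; below is a sketch of a label-based lookup with the pure part separated out (the label keys are made up). In the real setup this would be wrapped in a SecondaryToPrimaryMapper that turns a non-empty result into Set.of(new ResourceID(name, namespace)) and an empty result into an empty set.

```java
import java.util.Map;
import java.util.Optional;

class PrimaryRef {
    // Resolves the owning primary's (name, namespace) pair from the labels of a
    // secondary resource; returns empty if the labels are absent.
    // The label keys are hypothetical, chosen for illustration.
    static Optional<Map.Entry<String, String>> fromLabels(Map<String, String> labels) {
        if (labels == null) {
            return Optional.empty();
        }
        String name = labels.get("example.com/owner-name");
        String namespace = labels.get("example.com/owner-namespace");
        if (name == null || namespace == null) {
            return Optional.empty();
        }
        return Optional.of(Map.entry(name, namespace));
    }
}
```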

See related integration test here

SecondaryToPrimaryMapper now checks resource types

The owner-reference-based mappers now check the type (kind and apiVersion) of the resource when resolving the mapping. This is important since a resource may have owner references to a different resource type with the same name.

See implementation details here

There are multiple smaller changes to InformerEventSource and related classes:

  1. InformerConfiguration is renamed to InformerEventSourceConfiguration
  2. InformerEventSourceConfiguration doesn’t require EventSourceContext to be initialized anymore.

All EventSources are now ResourceEventSources

The EventSource abstraction is now always aware of the resources it handles, providing access to the (cached) resources, filtering, and additional capabilities. Before v5, these capabilities were present only in a subclass called ResourceEventSource; we decided to merge the two and remove ResourceEventSource, which simplified the architecture of other parts of the system.

If you still need to create an EventSource that only supports triggering of your reconciler, see TimerEventSource for an example of how this can be accomplished.

Naming event sources

EventSources are now named. This reduces the ambiguity that might have existed when trying to refer to an EventSource.

You no longer have to annotate the reconciler with the @ControllerConfiguration annotation. This annotation is (one) way to override the default properties of a controller; if the annotation is not present, its default values are used.

PR: https://github.com/operator-framework/java-operator-sdk/pull/2203

In addition, informer-related configuration is now extracted into a separate @Informer annotation within @ControllerConfiguration. This hopefully makes explicit which part of the configuration affects the informer associated with the primary resource. The same @Informer annotation is used to configure the informer associated with a managed KubernetesDependentResource via the @KubernetesDependent annotation.

EventSourceInitializer and ErrorStatusHandler are removed

Both the EventSourceInitializer and ErrorStatusHandler interfaces are removed, and their methods moved directly under Reconciler.

If possible, we try to avoid such marker interfaces since it is hard to deduce related usage just by looking at the source code. You can now simply override those methods when implementing the Reconciler interface.

Cloning when accessing secondary resources

When accessing secondary resources using Context.getSecondaryResource(s)(...), the resources are no longer cloned by default, since cloning could impact performance. Be aware that any changes you make are now applied directly to the underlying cached resource. This should be avoided, since the same resource instance may be used in other reconciliation cycles and would then no longer represent the state on the server.

If you want to still clone resources by default, set ConfigurationService.cloneSecondaryResourcesWhenGettingFromCache to true.
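The hazard here is plain aliasing: without cloning, every caller sees the same instance, so a mutation in one reconciliation leaks into later ones. A minimal model of this, with no SDK types involved:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for an informer cache holding live resource objects
class InformerCacheModel {
    private final Map<String, Map<String, String>> store = new HashMap<>();

    void put(String key, Map<String, String> resource) {
        store.put(key, resource);
    }

    // Returns the cached instance itself, as the SDK now does by default
    Map<String, String> get(String key) {
        return store.get(key);
    }

    // Returns a defensive copy, as the SDK did before v5
    Map<String, String> getClone(String key) {
        return new HashMap<>(store.get(key));
    }
}
```

With real resources, the same isolation is achieved by copying before mutating, e.g. via the Fabric8 builders, or by enabling the cloneSecondaryResourcesWhenGettingFromCache option mentioned above.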

Removed automated observed generation handling

The automatic observed generation handling feature was removed. It is easy to implement inside the reconciler, but it made the framework implementation much more complex, especially since the framework would have to support it for both server-side apply and client-side apply.

You can check a sample implementation of how to do it manually in this integration test.
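The manual pattern boils down to two steps: on entry, compare metadata.generation with status.observedGeneration to tell whether the spec changed since the last fully processed version; on a successful reconciliation, copy the generation into the status before patching it. A minimal sketch with stand-in types (the real resource classes look different):

```java
// Stand-ins for the relevant parts of a custom resource (illustrative only)
record MetadataModel(long generation) {}

class StatusModel {
    Long observedGeneration; // null until the first successful reconciliation
}

class ObservedGeneration {
    // True if the spec has changed since the last observed generation
    static boolean specChanged(MetadataModel metadata, StatusModel status) {
        return status.observedGeneration == null
                || status.observedGeneration != metadata.generation();
    }

    // Record the generation just reconciled, typically right before patching status
    static void record(MetadataModel metadata, StatusModel status) {
        status.observedGeneration = metadata.generation();
    }
}
```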

ResourceDiscriminator removed

The primary reason ResourceDiscriminator was introduced was to cover the case when there is more than one dependent resource of a given type associated with a given primary resource. In this situation, JOSDK needed a generic mechanism to identify which resources on the cluster should be associated with which dependent resource implementation. We improved this association mechanism, thus rendering ResourceDiscriminator obsolete.

As a replacement, the dependent resource will select the target resource based on the desired state. See the generic implementation in AbstractDependentResource. Calculating the desired state can be costly and might depend on other resources. For KubernetesDependentResource, it is usually enough to provide the name and namespace (if namespace-scoped) of the target resource, which is what the KubernetesDependentResource implementation does by default. If you can determine which secondary resource to target via its associated ResourceID without computing the desired state, we encourage you to override the targetSecondaryResourceID() method, as shown in this example.

Read-only bulk dependent resources

Read-only bulk dependent resources are now supported; this was a request from multiple users, but it required changes to the underlying APIs. Please check the documentation for further details.

See also the related integration test.

Multiple Dependents with Activation Condition

Until now, activation conditions had a limitation: only one condition was allowed per resource type. For example, two ConfigMap dependent resources, both with activation conditions, were not allowed. The underlying issue was the informer registration process: when an activation condition evaluates as “met”, the informer for the target resource type is registered dynamically in the background. However, we need to avoid registering multiple informers of the same kind; to prevent this, the dependent resources must specify the name of the informer.

See the complete example here.

getSecondaryResource is Activation condition aware

When an activation condition for a resource type is not met, no associated informer might be registered for that resource type. Previously, calling Context.getSecondaryResource and its alternatives in this situation would throw an exception. This was rather confusing; a better user experience is to return an empty value instead of throwing an error. We changed this behavior in v5: attempting to retrieve a secondary resource that is gated by an unmet activation condition now returns an empty value, as if the associated informer existed.

See related issue for details.

@Workflow annotation

The managed workflow definition is now a separate @Workflow annotation; it is no longer part of @ControllerConfiguration.

See sample usage here

Explicit workflow invocation

Before v5, the managed dependents of a workflow were always reconciled before the primary Reconciler's reconcile or cleanup methods were called. It is now possible to explicitly ask for a workflow reconciliation in your primary Reconciler, thus allowing you to control when the workflow is reconciled. This means you can perform all kinds of operations, typically validations, before executing the workflow, as shown in the sample below:


@Workflow(explicitInvocation = true,
        dependents = @Dependent(type = ConfigMapDependent.class))
@ControllerConfiguration
public class WorkflowExplicitCleanupReconciler
        implements Reconciler<WorkflowExplicitCleanupCustomResource>,
        Cleaner<WorkflowExplicitCleanupCustomResource> {

    @Override
    public UpdateControl<WorkflowExplicitCleanupCustomResource> reconcile(
            WorkflowExplicitCleanupCustomResource resource,
            Context<WorkflowExplicitCleanupCustomResource> context) {

        context.managedWorkflowAndDependentResourceContext().reconcileManagedWorkflow();

        return UpdateControl.noUpdate();
    }

    @Override
    public DeleteControl cleanup(WorkflowExplicitCleanupCustomResource resource,
                                 Context<WorkflowExplicitCleanupCustomResource> context) {

        context.managedWorkflowAndDependentResourceContext().cleanupManageWorkflow();
        // this can be checked
        // context.managedWorkflowAndDependentResourceContext().getWorkflowCleanupResult()
        return DeleteControl.defaultDelete();
    }
}

To turn on this mode of execution, set the explicitInvocation flag to true in the managed workflow definition.

See the following integration tests for invocation and cleanup.

Explicit exception handling

If an exception occurs during a workflow reconciliation, the framework rethrows it by default. You can now set handleExceptionsInReconciler to true for a workflow and check the thrown exceptions explicitly in the execution results.


@Workflow(handleExceptionsInReconciler = true,
        dependents = @Dependent(type = ConfigMapDependent.class))
@ControllerConfiguration
public class HandleWorkflowExceptionsInReconcilerReconciler
        implements Reconciler<HandleWorkflowExceptionsInReconcilerCustomResource>,
        Cleaner<HandleWorkflowExceptionsInReconcilerCustomResource> {

    private volatile boolean errorsFoundInReconcilerResult = false;
    private volatile boolean errorsFoundInCleanupResult = false;

    @Override
    public UpdateControl<HandleWorkflowExceptionsInReconcilerCustomResource> reconcile(
            HandleWorkflowExceptionsInReconcilerCustomResource resource,
            Context<HandleWorkflowExceptionsInReconcilerCustomResource> context) {

        errorsFoundInReconcilerResult = context.managedWorkflowAndDependentResourceContext()
                .getWorkflowReconcileResult().erroredDependentsExist();

        // check errors here:
        Map<DependentResource, Exception> errors = context.managedWorkflowAndDependentResourceContext()
                .getWorkflowReconcileResult().getErroredDependents();

        return UpdateControl.noUpdate();
    }
}

See integration test here.

CRDPresentActivationCondition

Activation conditions are typically used to check if the cluster has specific capabilities (e.g., is cert-manager available). Such a check can be done by verifying if a particular custom resource definition (CRD) is present on the cluster. You can now use the generic CRDPresentActivationCondition for this purpose; it checks whether the CRD of the target resource type of a dependent resource exists on the cluster.
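Wiring it up is a matter of referencing the condition from the dependent declaration. A sketch (the dependent class name is made up, and the activationCondition attribute name should be checked against your version):

```java
// Sketch: gate a hypothetical cert-manager-backed dependent on its CRD being present
@Workflow(dependents = @Dependent(
        type = CertManagerCertificateDependent.class,          // hypothetical dependent
        activationCondition = CRDPresentActivationCondition.class))
@ControllerConfiguration
public class WebPageReconciler implements Reconciler<WebPage> { /* ... */ }
```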

See usage in integration test here.

Fabric8 client updated to 7.0

The Fabric8 client has been updated to version 7.0.0. This is a new major version, which implies that some APIs might have changed. Please take a look at the Fabric8 client 7.0.0 migration guide.

CRD generator changes

Starting with v5.0 (in accordance with changes made to the Fabric8 client in version 7.0.0), the CRD generator uses a Maven plugin instead of the annotation processor that was previously used. In many instances, you can simply configure the plugin by adding the following stanza to your project’s POM build configuration:

<plugin>
    <groupId>io.fabric8</groupId>
    <artifactId>crd-generator-maven-plugin</artifactId>
    <version>${fabric8-client.version}</version>
    <executions>
      <execution>
        <goals>
          <goal>generate</goal>
        </goals>
      </execution>
    </executions>
</plugin>

NOTE: If you use the SDK’s JUnit extension for your tests, you might also need to configure the CRD generator plugin to access your test CustomResource implementations as follows:


<plugin>
    <groupId>io.fabric8</groupId>
    <artifactId>crd-generator-maven-plugin</artifactId>
    <version>${fabric8-client.version}</version>
    <executions>
        <execution>
            <goals>
                <goal>generate</goal>
            </goals>
            <phase>process-test-classes</phase>
            <configuration>
                <classesToScan>${project.build.testOutputDirectory}</classesToScan>
                <classpath>WITH_ALL_DEPENDENCIES_AND_TESTS</classpath>
            </configuration>
        </execution>
    </executions>
</plugin>

Please refer to the CRD generator documentation for more details.

Experimental

Check if the following reconciliation is imminent

You can now check whether the subsequent reconciliation will happen right after the current one because the SDK has already received an event that will trigger a new reconciliation. This information is available from the Context.

This can be useful, for example, when a heavy task would be repeated in the follow-up reconciliation: in the current reconciliation you can check this flag and return early to avoid unneeded processing. Note that this is a semi-experimental feature, so please let us know if you find it helpful.


@Override
public UpdateControl<MyCustomResource> reconcile(
        MyCustomResource resource, Context<MyCustomResource> context) {

    if (context.isNextReconciliationImminent()) {
        // a newer event is already queued; skip the heavy work for now
        return UpdateControl.noUpdate();
    }
    // ... heavy reconciliation logic
    return UpdateControl.noUpdate();
}

See related integration test.

Version 5 Released! (beta1)

See release notes here.