Good Practices
What is “Reconciliation” in Operators?
When you create a project with Kubebuilder, the scaffolded code under cmd/main.go initializes a Manager, and the project relies on the controller-runtime framework. The Manager manages Controllers, each of which provides a reconcile function that synchronizes resources until the cluster reaches the desired state.
Reconciliation is an ongoing loop that executes necessary operations to maintain the desired state, adhering to Kubernetes principles, such as the control loop. For further information, check out the Operator patterns documentation from Kubernetes to better understand those concepts.
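The loop described above can be sketched without any Kubernetes dependencies. The example below is a toy model only: `Cluster`, `Result`, `Reconcile`, and `RunUntilSettled` are hypothetical names invented for illustration, and a real controller would implement the reconcile function provided by controller-runtime instead. It shows the observe, compare, act, requeue cycle converging on a desired replica count.

```go
package main

import "fmt"

// Cluster is a hypothetical stand-in for the observed state of the cluster.
type Cluster struct {
	Replicas int
}

// Result loosely models a reconcile result: Requeue asks for another pass.
type Result struct {
	Requeue bool
}

// Reconcile compares observed state with desired state and takes at most
// one step toward it. It reports whether more work remains.
func Reconcile(desired int, c *Cluster) Result {
	switch {
	case c.Replicas < desired:
		c.Replicas++ // scale up by one per pass
	case c.Replicas > desired:
		c.Replicas-- // scale down by one per pass
	}
	return Result{Requeue: c.Replicas != desired}
}

// RunUntilSettled calls Reconcile until no requeue is requested and
// returns the number of passes that were needed.
func RunUntilSettled(desired int, c *Cluster) int {
	passes := 0
	for {
		passes++
		if !Reconcile(desired, c).Requeue {
			return passes
		}
	}
}

func main() {
	c := &Cluster{Replicas: 0}
	passes := RunUntilSettled(3, c)
	fmt.Println("replicas:", c.Replicas, "passes:", passes)
}
```

Each pass only inspects the current state, so the loop can be interrupted and restarted at any point and still converge.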
Why should reconciliations be idempotent?
When developing operators, the controller’s reconciliation loop needs to be idempotent. By following the Operator pattern, we create controllers that provide a reconcile function responsible for synchronizing resources until the desired state is reached on the cluster. Idempotent solutions allow the reconciler to respond correctly to generic or unexpected events and to handle application startup or upgrade easily. More explanation on this is available here.
Writing reconciliation logic that depends on specific events breaks the recommendation of the Operator pattern and goes against the design principles of controller-runtime. This may lead to unforeseen consequences, such as resources becoming stuck and requiring manual intervention.
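As a rough illustration of idempotency, the sketch below uses an in-memory map in place of the cluster. `Store` and `EnsureDeployment` are hypothetical names, and the image tag is a placeholder. The function never asks which event triggered it. It looks only at the current and desired state, so repeated calls are harmless and a missed event is repaired on the next call.

```go
package main

import "fmt"

// Store is a hypothetical in-memory stand-in for the cluster,
// mapping a Deployment name to its container image.
type Store map[string]string

// EnsureDeployment is level-based: it compares current state with desired
// state and acts only on the difference. It returns true if it changed
// anything, false if the state already matched.
func EnsureDeployment(s Store, name, image string) bool {
	current, exists := s[name]
	if exists && current == image {
		return false // already in the desired state, nothing to do
	}
	s[name] = image // create or update, whichever is needed
	return true
}

func main() {
	s := Store{}
	fmt.Println(EnsureDeployment(s, "web", "nginx:1.27")) // true: created
	fmt.Println(EnsureDeployment(s, "web", "nginx:1.27")) // false: no-op on repeat

	delete(s, "web") // simulate an out-of-band deletion the controller never saw

	fmt.Println(EnsureDeployment(s, "web", "nginx:1.27")) // true: recreated
}
```

An event-driven variant that only created the Deployment on a "created" event would leave the resource missing after the out-of-band deletion.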
Understanding Kubernetes APIs and following API conventions
Building your operator commonly involves extending the Kubernetes API itself. It is helpful to understand precisely how Custom Resource Definitions (CRDs) interact with the Kubernetes API. Also, the Kubebuilder documentation on Groups and Versions and Kinds may be helpful to understand these concepts better as they relate to operators.
Additionally, we recommend checking the documentation on Operator patterns from Kubernetes to better understand the purpose of the standard solutions built with Kubebuilder.
Why you should adhere to the Kubernetes API conventions and standards
Embracing the Kubernetes API conventions and standards is crucial for maximizing the potential of your applications and deployments. By adhering to these established practices, you can benefit in several ways.
Firstly, adherence ensures seamless interoperability within the Kubernetes ecosystem. Following conventions allows your applications to work harmoniously with other components, reducing compatibility issues and promoting a consistent user experience.
Secondly, sticking to API standards enhances the maintainability and troubleshooting of your applications. Adopting familiar patterns and structures makes debugging and supporting your deployments easier, leading to more efficient operations and quicker issue resolution.
Furthermore, leveraging the Kubernetes API conventions empowers you to harness the platform’s full capabilities. By working within the defined framework, you can leverage the rich set of features and resources offered by Kubernetes, enabling scalability, performance optimization, and resilience.
Lastly, embracing these standards future-proofs your native solutions. By aligning with the evolving Kubernetes ecosystem, you ensure compatibility with future updates, new features, and enhancements introduced by the vibrant Kubernetes community.
In summary, by adhering to the Kubernetes API conventions and standards, you unlock the potential for seamless integration, simplified maintenance, optimal performance, and future-readiness, all contributing to the success of your applications and deployments.
Why should one avoid a system design where a single controller is responsible for managing multiple CRDs (Custom Resource Definitions), for example an ‘install_all_controller.go’?
Avoid a design solution where the same controller reconciles more than one Kind. Having many Kinds (such as CRDs) that are all managed by the same controller usually goes against the design proposed by controller-runtime. Furthermore, this might hurt concepts such as encapsulation, the Single Responsibility Principle, and Cohesion. Damaging these concepts may cause unexpected side effects and increase the difficulty of extending, reusing, or maintaining the operator. Having one controller manage many Custom Resources (CRs) in an Operator can lead to several issues:
- Complexity: A single controller managing multiple CRs can increase the complexity of the code, making it harder to understand, maintain, and debug.
- Scalability: Each controller typically manages a single kind of CR for scalability. If a single controller handles multiple CRs, it could become a bottleneck, reducing the overall efficiency and responsiveness of your system.
- Single Responsibility Principle: Following this principle from software engineering, each controller should ideally have only one job. This approach simplifies development and debugging, and makes the system more robust.
- Error Isolation: If one controller manages multiple CRs and an error occurs, it could potentially impact all the CRs it manages. Having a single controller per CR ensures that an issue with one controller or CR does not directly affect others.
- Concurrency and Synchronization: A single controller managing multiple CRs could lead to race conditions and require complex synchronization, especially if the CRs have interdependencies.
In conclusion, while it might seem efficient to have a single controller manage multiple CRs, it often leads to higher complexity, lower scalability, and potential stability issues. It’s generally better to adhere to the single responsibility principle, where each CR is managed by its own controller.
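The one-controller-per-Kind idea can be modeled in plain Go. This is a sketch with no Kubernetes dependencies: `Reconciler`, `Registry`, and the `Memcached` and `Backup` Kinds are hypothetical names chosen for illustration. In a real project, each controller is typically registered with the Manager for a single Kind through controller-runtime's builder (`ctrl.NewControllerManagedBy(mgr).For(...)`).

```go
package main

import (
	"errors"
	"fmt"
)

// Reconciler is a simplified stand-in for a controller's reconcile function.
type Reconciler interface {
	Reconcile(name string) error
}

// MemcachedReconciler handles only the hypothetical Memcached Kind.
type MemcachedReconciler struct{}

func (MemcachedReconciler) Reconcile(name string) error { return nil }

// BackupReconciler handles only the hypothetical Backup Kind.
// It always fails here, to demonstrate error isolation.
type BackupReconciler struct{}

func (BackupReconciler) Reconcile(name string) error {
	return errors.New("backup storage unavailable")
}

// Registry maps each Kind to exactly one dedicated reconciler.
type Registry map[string]Reconciler

// Register refuses to attach a second controller to the same Kind.
func (r Registry) Register(kind string, rec Reconciler) error {
	if _, taken := r[kind]; taken {
		return fmt.Errorf("kind %q already has a controller", kind)
	}
	r[kind] = rec
	return nil
}

// Dispatch routes a request to the controller that owns the Kind.
func (r Registry) Dispatch(kind, name string) error {
	rec, ok := r[kind]
	if !ok {
		return fmt.Errorf("no controller registered for kind %q", kind)
	}
	return rec.Reconcile(name)
}

func main() {
	reg := Registry{}
	_ = reg.Register("Memcached", MemcachedReconciler{})
	_ = reg.Register("Backup", BackupReconciler{})

	// A failure in the Backup controller does not affect Memcached.
	fmt.Println("Memcached:", reg.Dispatch("Memcached", "cache-1"))
	fmt.Println("Backup:", reg.Dispatch("Backup", "nightly"))
}
```

Because each reconciler owns one Kind, its logic, errors, and tests stay separate from the others.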
Why You Should Adopt Status Conditions
We recommend you manage your solutions using Status Conditions following the Kubernetes API conventions because:
- Standardization: Conditions provide a standardized way to represent the state of an Operator’s custom resources, making it easier for users and tools to understand and interpret the resource’s status.
- Readability: Conditions can clearly express complex states by using a combination of multiple conditions, making it easier for users to understand the current state and progress of the resource.
- Extensibility: As new features or states are added to your Operator, conditions can be easily extended to represent these new states without requiring significant changes to the existing API or structure.
- Observability: Status conditions can be monitored and tracked by cluster administrators and external monitoring tools, enabling better visibility into the state of the custom resources managed by the Operator.
- Compatibility: By adopting the common pattern of using conditions in Kubernetes APIs, Operator authors ensure their custom resources align with the broader ecosystem, which helps users to have a consistent experience when interacting with multiple Operators and resources in their clusters.
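To make the pattern concrete, here is a simplified, dependency-free model of a conditions list. `Condition` and `SetCondition` are hypothetical and only approximate the shape of `metav1.Condition` and the `meta.SetStatusCondition` helper from k8s.io/apimachinery, which real operators would normally use. The key behavior sketched here is that the transition time changes only when the status value flips.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a simplified model of a status condition.
type Condition struct {
	Type               string // e.g. "Available"
	Status             string // "True", "False", or "Unknown"
	Reason             string // short machine-readable cause
	Message            string // human-readable detail
	LastTransitionTime time.Time
}

// SetCondition adds the condition or updates the existing one of the same
// Type. LastTransitionTime is updated only when Status changes.
// It returns true if anything was modified.
func SetCondition(conds *[]Condition, c Condition, now time.Time) bool {
	for i := range *conds {
		existing := &(*conds)[i]
		if existing.Type != c.Type {
			continue
		}
		changed := false
		if existing.Status != c.Status {
			existing.Status = c.Status
			existing.LastTransitionTime = now
			changed = true
		}
		if existing.Reason != c.Reason {
			existing.Reason = c.Reason
			changed = true
		}
		if existing.Message != c.Message {
			existing.Message = c.Message
			changed = true
		}
		return changed
	}
	c.LastTransitionTime = now
	*conds = append(*conds, c)
	return true
}

func main() {
	var conds []Condition
	t0 := time.Unix(0, 0)

	SetCondition(&conds, Condition{Type: "Available", Status: "False",
		Reason: "Reconciling", Message: "Creating Deployment"}, t0)
	SetCondition(&conds, Condition{Type: "Available", Status: "True",
		Reason: "Ready", Message: "Deployment is ready"}, t0.Add(time.Minute))

	for _, c := range conds {
		fmt.Println(c.Type, c.Status, c.Reason)
	}
}
```

Keeping one entry per condition type lets users and tools read the resource's state without parsing free-form text.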
You Should Adopt K8s Conventions for Instrumentation and Observability
Proper logging is essential for observability in Kubernetes-native applications. However, it’s important to understand which logging conventions to apply based on the context of your code.
Understanding Go vs. Kubernetes Logging Conventions
When developing with Go, you may be familiar with the Go Code Review Comments guidelines, which state that error strings should not be capitalized and should not end with punctuation. These conventions are designed for error messages that are often composed into larger contexts:
// Go conventions (for general Go code, libraries, CLI tools)
return fmt.Errorf("something bad happened") // lowercase, no period
log.Printf("failed to connect: %v", err) // lowercase
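The composition point can be seen in a small, self-contained sketch. The names `ErrTimeout`, `connect`, and `loadConfig`, and the address used, are hypothetical. Each layer wraps the error below it with `%w`, and the lowercase, unpunctuated fragments join into one readable line.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTimeout is a hypothetical sentinel error for this sketch.
var ErrTimeout = errors.New("connection timed out")

// connect always fails here, wrapping the sentinel with context.
func connect(addr string) error {
	return fmt.Errorf("failed to connect to %s: %w", addr, ErrTimeout)
}

// loadConfig adds its own context on top of the lower-level error.
func loadConfig(addr string) error {
	if err := connect(addr); err != nil {
		return fmt.Errorf("could not load config: %w", err)
	}
	return nil
}

func main() {
	err := loadConfig("db:5432")
	// The fragments read as one sentence because none is capitalized
	// or ends with a period.
	fmt.Println(err)
	// Wrapping with %w keeps the original error discoverable.
	fmt.Println(errors.Is(err, ErrTimeout))
}
```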
However, when developing Kubernetes-native solutions (controllers, operators, webhooks) that run on the cluster, you should follow the Kubernetes Logging Conventions for better observability and consistency with the Kubernetes ecosystem.
Kubernetes Logging Conventions
For controllers, operators, and webhooks, follow these guidelines:
- Start with a capital letter.
- Do not end the message with a period.
- Use active voice. Use complete sentences when there is an acting subject (“A could not do B”) or omit the subject if the subject would be the program itself (“Could not do B”).
- Use past tense (“Could not delete B” instead of “Cannot delete B”).
- When referring to an object, state what type of object it is. (“Deleted Pod” instead of “Deleted”)
- Use structured logging with balanced key-value pairs.
Examples:
// Kubernetes conventions (for controllers, operators, webhooks)
log.Info("Starting reconciliation") // Capital letter, no period
log.Info("Creating Deployment", "name", name, "namespace", ns) // Specify object type, structured logging
log.Info("Created Deployment", "name", deploy.Name) // Past tense, specify type
log.Error(err, "Failed to create Pod", "name", name) // Past tense, specify type
log.Info("Deployment could not create Pod", "deployment", name) // Acting subject
log.Info("Could not delete Pod", "name", name) // Subject is the program itself
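Some of these guidelines are mechanical enough to check in a unit test. The helpers below are a hypothetical sketch, not part of any Kubernetes library: `CheckLogMessage` covers only the capital-letter and trailing-period rules, and `CheckKeyValues` verifies that key-value arguments are balanced and that keys are strings. Tense, voice, and object type still need human review.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// CheckLogMessage returns the convention violations found in msg.
// Messages whose first character is not an uppercase letter
// (including digits or symbols) are flagged.
func CheckLogMessage(msg string) []string {
	if msg == "" {
		return []string{"message is empty"}
	}
	var problems []string
	first, _ := utf8.DecodeRuneInString(msg)
	if !unicode.IsUpper(first) {
		problems = append(problems, "should start with a capital letter")
	}
	if strings.HasSuffix(msg, ".") {
		problems = append(problems, "should not end with a period")
	}
	return problems
}

// CheckKeyValues verifies that structured logging arguments come in
// key-value pairs and that every key is a string.
func CheckKeyValues(kv ...any) error {
	if len(kv)%2 != 0 {
		return fmt.Errorf("odd number of key-value arguments: %d", len(kv))
	}
	for i := 0; i < len(kv); i += 2 {
		if _, ok := kv[i].(string); !ok {
			return fmt.Errorf("key at position %d is not a string", i)
		}
	}
	return nil
}

func main() {
	fmt.Println(CheckLogMessage("Created Deployment"))
	fmt.Println(CheckLogMessage("failed to create pod."))
	fmt.Println(CheckKeyValues("name", "web", "namespace"))
}
```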
Why Different Conventions?
- Go conventions are optimized for error messages that get composed into larger contexts and displayed inline with other text.
- Kubernetes conventions are optimized for structured logging in distributed systems where logs are:
  - Aggregated from multiple components across the cluster
  - Parsed by log collectors (Fluentd, Fluent Bit, Loki, etc.)
  - Displayed in monitoring dashboards and UIs
  - Used for alerting and troubleshooting in production
Following these conventions ensures your logs integrate seamlessly with Kubernetes observability tools and provide clear, actionable information for cluster operators and SREs.