Sunday, October 13, 2024

How Does Longhorn Use Kubernetes Worker Node Storage as PV?

Longhorn installs as a set of microservices within a Kubernetes cluster and treats each worker node as a potential storage provider. It uses disk paths available on each node to create storage pools and allocates storage from these pools to dynamically provision Persistent Volumes (PVs) for applications. By default, Longhorn uses /var/lib/longhorn/ on each node, but you can specify custom paths if you have other storage paths available.

Configuring Longhorn to Use a Custom Storage Path

To configure Longhorn to use existing storage paths on the nodes (e.g., /mnt/disks), follow these steps:

1. Install Longhorn in the Kubernetes Cluster:

Install Longhorn using Helm or by applying the Longhorn YAML manifest (for production installs, consider pinning the URL to a released tag instead of master):

kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml

You can also install Longhorn through the Rancher Apps & Marketplace or a similar Kubernetes application catalog.

2. Access the Longhorn UI:

Once installed, access the Longhorn UI to configure and manage your Longhorn setup.

By default, Longhorn is accessible through a Service of type ClusterIP, but you can change it to NodePort or LoadBalancer if needed.


kubectl get svc -n longhorn-system
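
If you prefer NodePort access, one option is to create an additional Service that targets the Longhorn UI pods. The sketch below is only an illustration: it assumes the UI pods carry the label app: longhorn-ui and listen on port 8000, which matches recent Longhorn releases, but you should verify both against your longhorn-frontend Service before applying it, and the nodePort value is arbitrary.

apiVersion: v1
kind: Service
metadata:
  name: longhorn-ui-nodeport
  namespace: longhorn-system
spec:
  type: NodePort
  selector:
    app: longhorn-ui        # assumed label on the Longhorn UI pods; verify with kubectl get pods --show-labels
  ports:
    - port: 80
      targetPort: 8000      # assumed UI container port; check the longhorn-frontend Service
      nodePort: 30080       # any free port in the NodePort range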


3. Add a New Storage Path on Each Node:

Before configuring Longhorn, ensure that the desired storage paths are created and available on each node. For example, you might want to use /mnt/disks as your custom storage directory:

mkdir -p /mnt/disks

You may want to mount additional disks or directories to this path for greater storage capacity.

4. Configure Longhorn to Use the New Storage Path:

Open the Longhorn UI (<Longhorn-IP>:<Port>) and navigate to Node settings.

Select the node where you want to add a new disk path.

Click Edit Node and Disks, and then Add Disk.

Specify the Path (e.g., /mnt/disks) and Tags (optional).

Set the Storage Allow Scheduling option to true to enable Longhorn to schedule storage volumes on this disk.

Repeat this process for each node in the cluster that should contribute storage.

5. Verify Storage Path Configuration:

After adding the new storage paths, Longhorn will automatically create storage pools based on these paths. Check the Nodes section in the Longhorn UI to see the updated disk paths and available storage.

6. Create a Persistent Volume (PV) Using Longhorn:

Now that Longhorn is using your custom storage paths, you can create Persistent Volumes that utilize this storage.

Either create a new PersistentVolumeClaim (PVC) that dynamically provisions a PV using the Longhorn StorageClass or use the Longhorn UI to manually create volumes.
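
For example, the following PVC requests storage from the longhorn StorageClass that ships with the installation; the claim name and size are just placeholders:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn   # StorageClass created by the Longhorn installation
  resources:
    requests:
      storage: 2Gi

Once a Pod references this PVC, Longhorn provisions a volume backed by the disk paths configured above and binds it to the claim.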

Example: Configuring a Node's Storage for Longhorn

Below is an example YAML configuration for adding a disk path (/mnt/disks) to a node, which can also be done through the UI:

apiVersion: longhorn.io/v1beta1
kind: Node
metadata:
  name: <node-name>
  namespace: longhorn-system
spec:
  disks:
    disk-1:
      path: /mnt/disks
      allowScheduling: true
      storageReserved: 0
  tags: []

path: Specifies the custom path on the node where Longhorn will allocate storage.

allowScheduling: Enables Longhorn to schedule volumes on this disk.

storageReserved: (Optional) The amount of disk space (in bytes) to set aside on this disk for the system and other uses; Longhorn will not schedule volume data into the reserved portion.


Important Considerations When Using Node Storage for Longhorn:

1. Data Redundancy and Availability:

Longhorn provides replication for data redundancy. When using node-local storage, make sure enough replicas are configured (e.g., 3 replicas for high availability) so that data remains safe even if one node goes down; see the StorageClass sketch after this list for one way to set this.

This means you need enough storage capacity across multiple nodes to accommodate these replicas.

2. Storage Path Consistency:

Ensure that the same storage path (/mnt/disks) is present on each node where you want Longhorn to store data.

If a node does not have the specified path, Longhorn will not be able to use it, leading to scheduling failures.

3. Handling Node Failures:

If the node with the custom storage path fails or becomes unavailable, the volumes stored on that node may be temporarily inaccessible.

Consider setting up anti-affinity rules and replication strategies in Longhorn to handle such scenarios gracefully.

4. Storage Permissions:

Make sure the Kubernetes worker node's storage directory has the appropriate permissions for Longhorn to read/write data.

5. Longhorn's Built-in Backup and Restore:

Utilize Longhorn’s built-in backup and restore capabilities to safeguard data if you are using node-local storage paths, as this storage may not be as reliable as network-based or cloud-backed storage solutions.
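
As noted in the first consideration above, the replica count is one of the settings you can control per StorageClass. Below is a rough sketch of a custom Longhorn StorageClass requesting three replicas; the StorageClass name is illustrative, while numberOfReplicas and staleReplicaTimeout are standard Longhorn StorageClass parameters:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-3-replicas
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"       # keep three copies of every volume across nodes
  staleReplicaTimeout: "30"   # minutes before a failed replica is cleaned up

PVCs that reference this class (via storageClassName: longhorn-3-replicas) will get volumes whose replicas are spread across the nodes that expose the configured disk paths.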

How to create a Kubernetes Operator?

Creating a Kubernetes operator involves building a controller that watches Kubernetes resources and takes action based on their state. The most common approach is to use the kubebuilder framework or the Operator SDK, but you can also build one directly with the Kubernetes API client.

Below, I'll show an example of a simple operator using the client-go library, which is the official Kubernetes client for Go. This operator will watch a custom resource called Foo and log whenever a Foo resource is created, updated, or deleted.

Prerequisites

Go programming language installed.

Kubernetes cluster and kubectl configured.

client-go and apimachinery libraries installed.


To install these dependencies, run:

go get k8s.io/client-go@v0.27.1
go get k8s.io/apimachinery@v0.27.1

Step 1: Define a Custom Resource Definition (CRD)

Create a foo-crd.yaml file to define a Foo custom resource:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: foos.samplecontroller.k8s.io
spec:
  group: samplecontroller.k8s.io
  versions:
    - name: v1
      served: true
      storage: true
      schema:                 # apiextensions.k8s.io/v1 requires a schema for each version
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true
  scope: Namespaced
  names:
    plural: foos
    singular: foo
    kind: Foo
    shortNames:
      - fo

Apply this CRD to the cluster:

kubectl apply -f foo-crd.yaml

Step 2: Create a Go File for the Operator

Create a new Go file named main.go:

package main

import (
    "flag"
    "fmt"
    "log"
    "os"
    "os/signal"
    "syscall"

    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime/schema"
    "k8s.io/client-go/dynamic"
    "k8s.io/client-go/dynamic/dynamicinformer"
    "k8s.io/client-go/tools/cache"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Load the Kubernetes configuration from ~/.kube/config (or the path given via --kubeconfig)
    kubeconfig := flag.String("kubeconfig", clientcmd.RecommendedHomeFile, "Path to the kubeconfig file")
    flag.Parse()

    config, err := clientcmd.BuildConfigFromFlags("", *kubeconfig)
    if err != nil {
        log.Fatalf("Error building kubeconfig: %v", err)
    }

    // Create a dynamic client, which can work with arbitrary (including custom) resources
    dynClient, err := dynamic.NewForConfig(config)
    if err != nil {
        log.Fatalf("Error creating dynamic client: %v", err)
    }

    // Define the GVR (GroupVersionResource) for the Foo custom resource
    gvr := schema.GroupVersionResource{
        Group:    "samplecontroller.k8s.io",
        Version:  "v1",
        Resource: "foos",
    }

    // Build a shared informer for Foo resources in all namespaces (0 = no periodic resync)
    factory := dynamicinformer.NewDynamicSharedInformerFactory(dynClient, 0)
    informer := factory.ForResource(gvr).Informer()

    // Register handlers for Add, Update, and Delete events on Foo resources
    informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) {
            foo := obj.(*unstructured.Unstructured)
            fmt.Printf("New Foo Added: %s\n", foo.GetName())
        },
        UpdateFunc: func(oldObj, newObj interface{}) {
            foo := newObj.(*unstructured.Unstructured)
            fmt.Printf("Foo Updated: %s\n", foo.GetName())
        },
        DeleteFunc: func(obj interface{}) {
            // The object can be a tombstone if the watch missed the actual delete event
            if tombstone, ok := obj.(cache.DeletedFinalStateUnknown); ok {
                obj = tombstone.Obj
            }
            foo := obj.(*unstructured.Unstructured)
            fmt.Printf("Foo Deleted: %s\n", foo.GetName())
        },
    })

    // Run the informer until the stop channel is closed
    stopCh := make(chan struct{})
    defer close(stopCh)
    go informer.Run(stopCh)

    // Wait for the local cache to be populated before reporting readiness
    if !cache.WaitForCacheSync(stopCh, informer.HasSynced) {
        log.Fatal("Timed out waiting for caches to sync")
    }
    fmt.Println("Foo operator is watching for events...")

    // Block until a termination signal is received, then shut down gracefully
    sigCh := make(chan os.Signal, 1)
    signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
    <-sigCh
    fmt.Println("Stopping the Foo operator...")
}

Step 3: Running the Operator

1. Build and run the Go program:

go run main.go


2. Create a sample Foo resource to test:

# Save this as foo-sample.yaml
apiVersion: samplecontroller.k8s.io/v1
kind: Foo
metadata:
  name: example-foo

Apply this resource:

kubectl apply -f foo-sample.yaml

Step 4: Check the Output

You should see logs in the terminal indicating when Foo resources are added, updated, or deleted:

New Foo Added: example-foo
Foo Updated: example-foo
Foo Deleted: example-foo

Explanation

1. Dynamic Client: The operator uses the dynamic client to interact with the custom resource since Foo is a CRD.


2. Informer: A dynamic shared informer (created via dynamicinformer.NewDynamicSharedInformerFactory) lists and watches Foo resources and keeps a local cache in sync.


3. Event Handlers: The handlers registered on the informer react to Add, Update, and Delete events for the Foo resource.


4. Signal Handling: It gracefully shuts down on receiving a termination signal.



Further Enhancements

Use a code generation framework like kubebuilder or Operator SDK for complex operators.

Implement reconcile logic to manage the desired state.

Add leader election for high availability.


This example demonstrates the basic structure of an operator using the Kubernetes API. For production-grade operators, using a dedicated framework is recommended.

The need for ExternalName service type in Kubernetes

In Kubernetes, the Service resource defines a way to expose applications running in pods. There are several Service types (ClusterIP, NodePort, LoadBalancer, etc.), and one of them is ExternalName. This service type is unique because it maps a service name to an external DNS name instead of providing access to an IP address.

Understanding Service Type: ExternalName

The ExternalName service maps a Service name to an external DNS name rather than proxying traffic. It doesn't allocate a cluster-internal IP or program any forwarding rules; instead, the cluster DNS returns a CNAME record with the value specified in the externalName field.

Use Case

The ExternalName type is used when you want Kubernetes to act as a DNS alias for services that are external to the cluster (e.g., a service running outside of Kubernetes, in another cluster, or even a third-party service).

Example Configuration

Here’s a sample YAML for a service of type ExternalName:

apiVersion: v1
kind: Service
metadata:
  name: my-external-service
spec:
  type: ExternalName
  externalName: example.com

Key Fields Explained:

1. type: ExternalName: Specifies that the service type is ExternalName.

2. externalName: example.com: This is the external DNS name the service maps to. DNS lookups for my-external-service inside the cluster resolve to example.com.

How It Works

When pods within the same namespace try to access my-external-service (e.g., via my-external-service:port), Kubernetes will resolve this to the example.com address. It acts like a DNS CNAME record, and no cluster IP or load balancer is created.
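
A quick way to observe this behavior is to run a throwaway Pod in the same namespace and look up the service name; the Pod name and busybox image tag below are just illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: dns-test
spec:
  restartPolicy: Never
  containers:
    - name: lookup
      image: busybox:1.36
      command: ["nslookup", "my-external-service"]   # resolves the ExternalName service

Checking the Pod's logs (kubectl logs dns-test) should show the name resolving, via a CNAME, to example.com.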

Limitations

This service type does not provide load balancing.

There’s no IP address or port assignment.

It only supports DNS name resolution.

It cannot be used to point at an IP address directly; the externalName field must be a valid DNS name.

This type is primarily used for use cases where external dependencies need to be aliased using the Kubernetes DNS system.

Data Flow for Kubernetes Service Networking

1. External Traffic Ingress:

If you are using a LoadBalancer service or NodePort, traffic from the External Network first hits the load balancer or the node’s external IP at the specified port.

The external traffic is directed to the Kubernetes Service (via the LoadBalancer or NodePort).

2. Service:

The Service uses a ClusterIP to expose an internal, stable endpoint for communication within the cluster.

The service acts as a load balancer that forwards requests to the correct Pods. The Kube Proxy ensures that traffic gets routed correctly.

3. Kube Proxy:

The Kube Proxy running on each node maintains iptables (or IPVS) rules to ensure that traffic destined for a particular service (i.e., its ClusterIP) is routed to the corresponding Pods.

It balances requests between different Pods based on the service’s configuration.

4. Pod Communication:

Inside the cluster, Pods communicate with each other using the ClusterIP. The service ensures that traffic is routed to the appropriate Pods, which may be distributed across different nodes.

The Kube Proxy facilitates this internal communication between services and Pods within the cluster.
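
To make the flow described above concrete, here is a minimal NodePort Service sketch; the name, label, and port numbers are all placeholders:

apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector:
    app: web              # traffic is forwarded to Pods carrying this label
  ports:
    - port: 80            # ClusterIP port used by clients inside the cluster
      targetPort: 8080    # container port on the backend Pods
      nodePort: 30080     # port opened on every node for external traffic

An external client connects to <any-node-IP>:30080, kube-proxy forwards the connection to one of the matching Pods on port 8080, and in-cluster clients can keep using the ClusterIP on port 80.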

Example Traffic Flow:

An external user makes a request from the External Network (e.g., via a browser or API).

If the service is of type LoadBalancer or NodePort, the request enters the cluster via the load balancer or node port.

The service routes the request to the appropriate Pods using its ClusterIP, with the Kube Proxy forwarding the traffic to the specific Pods based on the current Pod status.

The Pod processes the request, and the response is sent back to the user through the same path.

This architecture allows for seamless load balancing, internal Pod communication, and external access depending on the service type, all managed through the Kubernetes network infrastructure.

Key Components in Kubernetes Service Networking

1. External Network: This is any external user or system that wants to access your Kubernetes services (e.g., via a browser or API request).


2. Service: A Kubernetes resource that defines a stable endpoint to expose your application, abstracting the underlying Pods. There are different service types: ClusterIP (default, internal only), NodePort, and LoadBalancer (for external access).


3. Pods: The smallest deployable units in Kubernetes, hosting one or more containers. Each Pod has its own IP address, but it’s ephemeral, meaning it can change when Pods are recreated.


4. Kube Proxy: A component running on each node that ensures proper routing of traffic between services and Pods. It watches the Kubernetes API for new services and endpoints and maintains the network rules.


5. ClusterIP: The internal IP address assigned to a service. It acts as a virtual IP that directs traffic from the service to the appropriate Pods.


6. LoadBalancer/NodePort: These expose the service to external networks:

LoadBalancer: Automatically provisions an external load balancer (typically on cloud platforms) and assigns it a public IP.

NodePort: Opens a specific port on each node to allow external traffic to enter.

Why Kubernetes: The Need for Container Orchestration

In recent years, Kubernetes has become one of the most popular tools in the tech industry, especially when it comes to managing containerized applications. But why exactly has Kubernetes gained such a significant foothold? What problem does it solve, and why do so many organizations choose it as their go-to platform for container orchestration? Let’s explore the reasons behind Kubernetes' growing adoption.

1. The Rise of Containers

Before understanding why Kubernetes is important, it's essential to grasp the role of containers in modern software development. Containers package an application and its dependencies into a single, lightweight unit that can run reliably across different computing environments. They are portable, fast to start, and require fewer resources than traditional virtual machines (VMs).

While containers provide flexibility, scaling, and isolation, managing them across large, distributed environments becomes increasingly complex as more containers are deployed. This is where Kubernetes comes in.

2. Automation at Scale

In dynamic production environments, manually managing and scaling containers isn’t feasible. Kubernetes automates this process, making it possible to manage and orchestrate hundreds or even thousands of containers efficiently.

Kubernetes handles:

Automated scheduling: It decides which servers (nodes) should run which containers based on resource availability and performance requirements.

Scaling: As demand for an application increases or decreases, Kubernetes automatically scales containers up or down to meet performance goals without wasting resources (see the autoscaler sketch after this list).

Self-healing: If a container or node fails, Kubernetes automatically replaces it, ensuring your application remains available with minimal disruption.
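
As a concrete illustration of the scaling behavior mentioned above, a HorizontalPodAutoscaler can grow or shrink a Deployment based on observed CPU utilization; the names and thresholds below are placeholders:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp               # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add Pods when average CPU exceeds 70%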


3. Portability and Multi-Cloud Compatibility

One of Kubernetes' most powerful features is its ability to run across different cloud environments and on-premises infrastructure, making it a true multi-cloud solution. You are no longer tied to a single cloud provider or limited by your on-premises hardware. This portability allows organizations to avoid vendor lock-in, migrate workloads between clouds, or adopt hybrid cloud strategies easily.

4. Microservices Architecture

Kubernetes is a natural fit for applications following a microservices architecture, where each component of an application (e.g., user authentication, database, front-end) runs as a separate service. In such architectures, Kubernetes simplifies managing these services, orchestrating how they communicate with one another, handling load balancing, and providing mechanisms to manage network traffic between services.

This microservices model is essential for modern, scalable applications, and Kubernetes is the go-to platform to manage such environments.

5. DevOps and Continuous Deployment

Kubernetes works seamlessly with DevOps practices and CI/CD pipelines. It allows developers to:

Rapidly deploy new code updates.

Automate testing, integration, and deployment processes.

Roll back to previous versions easily in case of failure.


With Kubernetes, you can set up automated deployment strategies like blue-green deployments or canary releases, ensuring that new features are rolled out smoothly without downtime.
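
As a rough illustration of a canary release (all names, images, and replica counts here are hypothetical), one common pattern is to run a small canary Deployment next to the stable one; because both sets of Pods share the label the Service selects on, traffic splits roughly in proportion to replica counts:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-canary
spec:
  replicas: 1                  # ~10% of traffic if the stable Deployment runs 9 replicas
  selector:
    matchLabels:
      app: myapp
      track: canary
  template:
    metadata:
      labels:
        app: myapp             # same label the Service selects on
        track: canary
    spec:
      containers:
        - name: myapp
          image: registry.example.com/myapp:v2   # the new version under test
          ports:
            - containerPort: 8080

If the canary behaves well, you scale it up (or promote the image to the stable Deployment) and remove it otherwise.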

6. Community Support and Ecosystem

Kubernetes benefits from an enormous open-source community backed by major players like Google, Red Hat, IBM, Microsoft, and others. This means that there is a wealth of resources, tools, and plugins available for integration. The ecosystem surrounding Kubernetes, from monitoring tools like Prometheus to service meshes like Istio, is vast and continues to grow, allowing you to extend Kubernetes in numerous ways based on your needs.

7. Flexibility and Extensibility

Kubernetes provides flexibility by supporting a wide variety of workloads and programming languages. Whether you’re running stateless or stateful applications, batch processing, or streaming data, Kubernetes can handle it. Additionally, with its custom resource definitions (CRDs) and operators, Kubernetes is highly extensible, allowing you to automate even more advanced use cases and integrate it with other tools.

Conclusion

Kubernetes is more than just a container orchestrator; it is a key enabler of modern cloud-native applications. Its automation capabilities, scalability, portability, and alignment with microservices architectures make it an essential tool for organizations that want to innovate quickly and manage their infrastructure efficiently. As the industry continues to shift toward containerization and multi-cloud strategies, Kubernetes will likely remain at the forefront of container orchestration for years to come.

By adopting Kubernetes, organizations can reduce operational overhead, scale efficiently, and ensure that their applications are ready for the challenges of today’s complex, distributed environments.