Part 3: Talos Configuration Management - GitOps for Infrastructure

TL;DR

Learn how to customize and manage your Talos Linux cluster configurations using GitOps principles. This article covers machine configuration structure, applying patches, managing secrets, network customization, disk encryption, system extensions, and versioning your infrastructure as code.

Key Takeaways:

  • Machine configurations are declarative YAML files that define your entire cluster state
  • Configuration patches allow incremental changes without modifying base configs
  • GitOps enables version-controlled, auditable infrastructure changes
  • Talos upgrades are seamless when configurations are properly managed
  • Secrets and certificates are managed securely through the Talos API

Introduction

Why This Matters

In Part 2: Talos Installation, you learned how to generate and apply basic machine configurations to bootstrap your cluster. However, real-world clusters require customization: static IPs, disk encryption, network VLANs, kernel parameters, and more.

This article teaches you how to:

  • Understand and customize machine configuration structure
  • Apply configuration patches for incremental changes
  • Manage secrets and certificates securely
  • Configure advanced networking (static IPs, VLANs)
  • Enable disk encryption
  • Add system extensions
  • Version control your infrastructure configurations
  • Perform seamless Talos upgrades

What You’ll Learn

  • Machine configuration structure and schema
  • Configuration patching strategies
  • Secrets and certificate management
  • Network configuration (static IPs, VLANs, bonding)
  • Disk partitioning and encryption
  • Kernel parameters and system tuning
  • System extensions installation
  • GitOps workflow for infrastructure
  • Upgrading Talos Linux safely

Prerequisites

Before starting, you should have:

  • Completed Part 2: Talos Installation
  • A running Talos Linux cluster (minimum 1 control plane, 1 worker)
  • talosctl installed and configured
  • kubectl configured with cluster access
  • Basic understanding of YAML syntax
  • Git installed (for version control)
  • A Git repository (GitLab, GitHub, or self-hosted)

Understanding Machine Configurations

Configuration Structure

Machine configurations are YAML files that define the complete state of a Talos node. They follow a hierarchical structure with several key sections:

version: v1alpha1
machine:
  # Machine-specific configuration
  type: controlplane  # or worker
  token: <bootstrap-token>
  ca:
    crt: <certificate>
    key: <private-key>
  # ... machine settings
cluster:
  # Cluster-wide configuration
  id: <cluster-id>
  secret: <cluster-secret>
  # ... cluster settings

Key Configuration Sections

  1. Machine Section

    • Node type (controlplane/worker)
    • Installation settings (disk, image)
    • Network configuration
    • Kernel parameters
    • System extensions
    • Features and settings
  2. Cluster Section

    • Cluster identity and secrets
    • API server configuration
    • Discovery settings
    • Proxy configuration
    • etcd configuration
    • Scheduler and controller manager settings

Viewing Current Configuration

# List the machine configuration resource on a node
talosctl get machineconfig --nodes <NODE_IP>

# Print the full machine configuration as YAML
talosctl get machineconfig --nodes <NODE_IP> -o yaml

Example Output:

The following shows the actual machine configuration retrieved from a control plane node. This demonstrates the structure and content of a Talos machine configuration:

# Machine Configuration - Control Plane Node
# Retrieved via: talosctl get machineconfig --nodes <CONTROL_PLANE_IP> -o yaml

version: v1alpha1
debug: false
persist: true

# Machine-specific configuration
machine:
  type: controlplane  # Defines the role of the machine within the cluster

  # Machine token used to join the PKI of the cluster
  token: <ANONYMIZED_TOKEN>

  # Root certificate authority of the PKI
  ca:
    crt: <ANONYMIZED_CERTIFICATE>
    key: <ANONYMIZED_PRIVATE_KEY>

  # Extra certificate subject alternative names
  certSANs: []

  # Kubelet configuration
  kubelet:
    image: ghcr.io/siderolabs/kubelet:v1.34.3
    defaultRuntimeSeccompProfileEnabled: true
    disableManifestsDirectory: true

  # Network configuration (empty = using defaults/DHCP)
  network: {}

  # Installation settings
  install:
    disk: /dev/sda
    image: ghcr.io/siderolabs/installer:v1.11.5
    wipe: false

  # Talos features
  features:
    rbac: true
    stableHostname: true
    apidCheckExtKeyUsage: true
    diskQuotaSupport: true
    kubePrism:
      enabled: true
      port: 7445
    hostDNS:
      enabled: true
      forwardKubeDNSToHost: true

  # Node labels
  nodeLabels:
    node.kubernetes.io/exclude-from-external-load-balancers: ""

# Cluster-specific configuration
cluster:
  # Globally unique identifier for this cluster
  id: <ANONYMIZED_CLUSTER_ID>

  # Shared secret of cluster
  secret: <ANONYMIZED_CLUSTER_SECRET>

  # Control plane endpoint
  controlPlane:
    endpoint: https://<CONTROL_PLANE_IP>:6443

  # Cluster name
  clusterName: discworld-homelab

  # Network configuration
  network:
    dnsDomain: cluster.local
    podSubnets:
      - 10.244.0.0/16
    serviceSubnets:
      - 10.96.0.0/12

  # Bootstrap token used to join the cluster
  token: <ANONYMIZED_BOOTSTRAP_TOKEN>

  # Secretbox encryption secret for Kubernetes secrets at rest
  secretboxEncryptionSecret: <ANONYMIZED_ENCRYPTION_SECRET>

  # Root certificate authority used by Kubernetes
  ca:
    crt: <ANONYMIZED_K8S_CA_CERT>
    key: <ANONYMIZED_K8S_CA_KEY>

  # Aggregator certificate authority for front-proxy
  aggregatorCA:
    crt: <ANONYMIZED_AGGREGATOR_CA_CERT>
    key: <ANONYMIZED_AGGREGATOR_CA_KEY>

  # Service account private key
  serviceAccount:
    key: <ANONYMIZED_SERVICE_ACCOUNT_KEY>

  # API server configuration
  apiServer:
    image: registry.k8s.io/kube-apiserver:v1.34.3
    certSANs:
      - <CONTROL_PLANE_IP>
    disablePodSecurityPolicy: true

    # Pod Security Standards admission control
    admissionControl:
      - name: PodSecurity
        configuration:
          apiVersion: pod-security.admission.config.k8s.io/v1alpha1
          defaults:
            audit: restricted
            audit-version: latest
            enforce: baseline
            enforce-version: latest
            warn: restricted
            warn-version: latest
          exemptions:
            namespaces:
              - kube-system
            runtimeClasses: []
            usernames: []
          kind: PodSecurityConfiguration

    # Audit policy
    auditPolicy:
      apiVersion: audit.k8s.io/v1
      kind: Policy
      rules:
        - level: Metadata

  # Controller manager configuration
  controllerManager:
    image: registry.k8s.io/kube-controller-manager:v1.34.3

  # Kube-proxy configuration
  proxy:
    image: registry.k8s.io/kube-proxy:v1.34.3

  # Scheduler configuration
  scheduler:
    image: registry.k8s.io/kube-scheduler:v1.34.3

  # Cluster member discovery
  discovery:
    enabled: true
    registries:
      kubernetes:
        disabled: true
      service: {}

  # etcd configuration
  etcd:
    ca:
      crt: <ANONYMIZED_ETCD_CA_CERT>
      key: <ANONYMIZED_ETCD_CA_KEY>

  # Additional manifests (empty in this case)
  extraManifests: []
  inlineManifests: []

Configuration Analysis:

  • The configuration shows a standard control plane node setup
  • Network configuration is minimal (using DHCP/defaults)
  • Pod Security Standards are configured with baseline enforcement
  • Audit logging is enabled at Metadata level
  • All default Talos features are enabled (RBAC, stable hostname, etc.)
  • KubePrism is enabled for local load balancing
  • Host DNS caching is enabled

Configuration Patching

Why Use Patches?

Instead of modifying base configurations directly, Talos uses configuration patches to apply incremental changes. This approach:

  • Preserves base configuration integrity
  • Enables rollback capabilities
  • Supports GitOps workflows
  • Allows multiple patches to be applied in sequence

Creating Configuration Patches

Example: Adding Static IP Configuration

# patch-static-ip.yaml
machine:
  network:
    interfaces:
      - interface: enp1s0
        addresses:
          - 192.168.178.55/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.178.1
    # Nameservers are set at the network level, not per interface
    nameservers:
      - 192.168.178.1
      - 8.8.8.8

Applying a Patch:

# Apply patch to a node
talosctl patch machineconfig \
  --nodes <NODE_IP> \
  --patch @patch-static-ip.yaml

# Apply a shared patch to multiple nodes
# (never do this with a static IP patch: every node needs its own address)
talosctl patch machineconfig \
  --nodes <NODE_IP_1>,<NODE_IP_2> \
  --patch @patch-kernel-params.yaml
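
Before changing a live node, it can help to preview what a patch would do. The following is a minimal sketch that wraps talosctl patch machineconfig with its --dry-run flag; the function name preview_patch is made up for this example and is not part of talosctl.

```shell
# preview_patch NODE_IP PATCH_FILE
# Prints what the patch would change on the node without applying it.
# "preview_patch" is a hypothetical helper name; --dry-run is a talosctl flag.
preview_patch() {
  local node="$1" patch_file="$2"
  talosctl patch machineconfig \
    --nodes "$node" \
    --patch "@${patch_file}" \
    --dry-run
}
```

For example, preview_patch <NODE_IP> patch-static-ip.yaml prints the proposed change and leaves the node untouched.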

Alternative Approach: DHCP Reservations

For homelab setups, using DHCP reservations at your router can be simpler than static IP patches. This approach provides the stability of static IPs without requiring manual network configuration on each node. In this setup, all nodes have reserved IPs configured in the router (192.168.178.55 for control plane, 192.168.178.56 and 192.168.178.57 for workers).

Patch Best Practices

  1. Keep Patches Focused: One patch per concern (network, storage, extensions)
  2. Use Descriptive Names: patch-static-ip.yaml, patch-disk-encryption.yaml
  3. Version Control: Store patches in Git with clear commit messages
  4. Test First: Apply patches to a single node before rolling out
  5. Document Changes: Include comments explaining why patches are needed
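
The "test first" practice can be sketched as a small rollout helper that patches nodes one at a time and stops at the first failure. This is an illustration only: rollout_patch is a hypothetical name, and a real rollout would probably also wait for each node to become healthy before continuing.

```shell
# rollout_patch PATCH_FILE NODE_IP...
# Applies the patch to each node in the order given and stops at the first
# failure, so a broken patch reaches at most one node.
rollout_patch() {
  local patch_file="$1"
  shift
  local node
  for node in "$@"; do
    echo "Applying ${patch_file} to ${node}"
    if ! talosctl patch machineconfig --nodes "$node" --patch "@${patch_file}"; then
      echo "Patch failed on ${node}; stopping rollout" >&2
      return 1
    fi
  done
}
```

List the node you consider the safest test target first, for example a single worker.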

Secrets and Certificate Management

Understanding Talos Secrets

Talos manages several types of secrets:

  • Machine secrets: Node-specific certificates and keys
  • Cluster secrets: Shared cluster certificates
  • Kubernetes secrets: API server, etcd, service account certificates

Viewing Secrets

# List the resource types registered on a node
# (look for the secret-related types your Talos version provides)
talosctl get rd --nodes <NODE_IP>

Note: talosctl get secrets fails with “resource ‘secrets’ is not registered” because no resource type with that name exists. Secret material is spread across several separate resource types, and the command above shows which ones your Talos version registers.

Alternative ways to verify secrets/certificates:

  • Check when the client certificate in your talosconfig expires: talosctl config info
  • View the machine configuration (which embeds the CA certificates and keys): talosctl get machineconfig --nodes <NODE_IP> -o yaml
  • List Kubernetes certificate signing requests: kubectl get csr (kubectl get certificates only works if cert-manager is installed)

Certificate Rotation

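Talos automatically rotates the server-side certificates for etcd, Kubernetes, and the Talos API. The client certificate in your talosconfig and the cluster root CAs are not rotated automatically. Do not use talosctl reset for this: reset wipes the node and removes it from the cluster; it does not rotate certificates.

# Preview a rotation of the Talos and Kubernetes root CAs
# (rotate-ca only performs a dry run unless --dry-run=false is passed)
talosctl rotate-ca --nodes <CONTROL_PLANE_IP>

# Issue a new client talosconfig from an existing admin context
talosctl config new talosconfig-new --roles os:admin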

Network Configuration

Static IP Configuration

Creating a Static IP Patch:

# patch-static-ip.yaml
machine:
  network:
    interfaces:
      - interface: enp1s0
        addresses:
          - 192.168.178.55/24
        routes:
          - network: 0.0.0.0/0
            gateway: 192.168.178.1
    # Nameservers are set at the network level, not per interface
    nameservers:
      - 192.168.178.1
      - 8.8.8.8

Applying Static IP Configuration:

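# Apply to control plane
talosctl patch machineconfig \
  --nodes 192.168.178.55 \
  --patch @patch-static-ip.yaml

# Apply to worker nodes: each node needs its own patch file with its own address
talosctl patch machineconfig \
  --nodes 192.168.178.56 \
  --patch @patch-static-ip-worker-1.yaml
talosctl patch machineconfig \
  --nodes 192.168.178.57 \
  --patch @patch-static-ip-worker-2.yaml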
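
Because a static IP patch contains the node's own address, every node needs its own patch file. One way to avoid hand-editing copies is to generate them from a single template. The sketch below assumes the interface name (enp1s0) and the second nameserver (8.8.8.8) from the example above; write_static_ip_patch is a hypothetical helper name.

```shell
# write_static_ip_patch ADDRESS_CIDR GATEWAY OUT_FILE
# Writes a per-node static IP patch. The interface name and the second
# nameserver are assumptions carried over from the article's example.
write_static_ip_patch() {
  local address="$1" gateway="$2" out_file="$3"
  cat > "$out_file" <<EOF
machine:
  network:
    interfaces:
      - interface: enp1s0
        addresses:
          - ${address}
        routes:
          - network: 0.0.0.0/0
            gateway: ${gateway}
    nameservers:
      - ${gateway}
      - 8.8.8.8
EOF
}
```

For example, write_static_ip_patch 192.168.178.56/24 192.168.178.1 patch-static-ip-worker-1.yaml produces the patch for the first worker.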

VLAN Configuration

VLAN Patch Example:

# patch-vlan.yaml
machine:
  network:
    interfaces:
      - interface: enp1s0
        vlans:
          - vlanId: 100
            addresses:
              - 10.0.100.10/24
            routes:
              - network: 10.0.100.0/24
                gateway: 10.0.100.1

Network Bonding

Bond Configuration Example:

# patch-bond.yaml
machine:
  network:
    interfaces:
      - interface: bond0
        bond:
          interfaces:
            - enp1s0
            - enp2s0
          mode: 802.3ad  # LACP
          lacpRate: fast
        addresses:
          - 192.168.178.55/24
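
After applying any of the network patches above, the resulting state can be inspected through the Talos API. A minimal sketch, assuming the node is still reachable on the address you pass in; show_network_state is a hypothetical helper name, while links, addresses, and routes are Talos resource types.

```shell
# show_network_state NODE_IP
# Prints the link, address, and route resources reported by a node.
show_network_state() {
  local node="$1" resource
  for resource in links addresses routes; do
    echo "== ${resource} on ${node} =="
    talosctl get "$resource" --nodes "$node"
  done
}
```

A bond or VLAN that was configured correctly should show up as its own link with the expected address.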

Disk Partitioning and Encryption

Disk Partitioning

Additional Disk Layout:

# patch-disk-partitioning.yaml
# machine.disks configures additional (non-system) disks.
# /dev/sdb and the mount point are examples; adjust them to your hardware.
machine:
  disks:
    - device: /dev/sdb
      partitions:
        # size omitted: the partition occupies the whole disk
        - mountpoint: /var/mnt/data

Note: Talos partitions the system disk itself during installation. That layout (system partitions such as EFI, META, and STATE, plus EPHEMERAL, which takes the remaining space) is not customized through machine.disks. With single-disk nodes, the default partitioning is usually sufficient.

Disk Encryption

Enabling Disk Encryption:

# patch-disk-encryption.yaml
machine:
  systemDiskEncryption:
    state:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0
    ephemeral:
      provider: luks2
      keys:
        - nodeID: {}
          slot: 0

Applying Encryption:

# Apply encryption patch
talosctl patch machineconfig \
  --nodes <NODE_IP> \
  --patch @patch-disk-encryption.yaml

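# Encryption takes effect when a partition is (re)created, not in place

Important Notes:

  • Talos does not encrypt an existing partition in place; STATE or EPHEMERAL must be wiped and recreated before the encryption settings take effect
  • Wiping EPHEMERAL discards local data such as container images and pod data; wiping STATE discards the node's stored machine configuration
  • Ensure you have backups (including the machine configuration) before enabling encryption
  • Test on a single node first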

Kernel Parameters

Configuring Kernel Parameters

Kernel Parameter Patch:

# patch-kernel-params.yaml
machine:
  kernel:
    modules:
      - name: br_netfilter
  # sysctls live directly under machine, not under machine.kernel
  sysctls:
    net.ipv4.ip_forward: "1"
    net.ipv4.conf.all.forwarding: "1"
    vm.swappiness: "10"

Applying Kernel Parameters:

talosctl patch machineconfig \
  --nodes <NODE_IP> \
  --patch @patch-kernel-params.yaml

Note: Talos already sets the kernel parameters that Kubernetes requires (such as IP forwarding) by default, so a custom kernel parameter patch is typically only needed for specific tuning such as vm.swappiness.
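
To confirm that a sysctl actually took effect, the value can be read back from /proc/sys on the node with talosctl read. A minimal sketch with hypothetical helper names (sysctl_path, read_sysctl); the path conversion assumes the key itself contains no dots other than separators.

```shell
# sysctl_path KEY
# Converts a sysctl key such as net.ipv4.ip_forward into its /proc/sys path.
sysctl_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# read_sysctl NODE_IP KEY
# Reads the current value of a sysctl from a node through the Talos API.
read_sysctl() {
  talosctl read "$(sysctl_path "$2")" --nodes "$1"
}
```

For example, read_sysctl <NODE_IP> vm.swappiness reads /proc/sys/vm/swappiness on that node.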

System Extensions

What Are System Extensions?

System extensions add additional functionality to Talos without modifying the base image:

  • Hardware drivers (GPU, network cards)
  • Filesystem tools
  • Debugging utilities
  • Custom kernel modules

Installing System Extensions

Extension Patch Example:

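# patch-extensions.yaml
# Extensions are baked into the installer image. Build a schematic in the
# Talos Image Factory (https://factory.talos.dev) that lists the extensions
# you need (for example siderolabs/amd-ucode, siderolabs/intel-ucode,
# siderolabs/iscsi-tools), then point machine.install.image at the result.
# The older machine.install.extensions list is deprecated.
machine:
  install:
    image: <INSTALLER_IMAGE_FROM_IMAGE_FACTORY>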

Available Extensions:

# List the extensions installed on a node
talosctl get extensions --nodes <NODE_IP>

# Common extensions:
# - siderolabs/amd-ucode: AMD microcode updates
# - siderolabs/intel-ucode: Intel microcode updates
# - siderolabs/iscsi-tools: iSCSI utilities
# - siderolabs/gvisor: gVisor container runtime

Applying Extensions:

talosctl patch machineconfig \
  --nodes <NODE_IP> \
  --patch @patch-extensions.yaml

# Extensions only take effect after the node is upgraded or reinstalled
# with an installer image that contains them

Note: Only add the extensions you actually need, for example CPU microcode updates for your hardware, iSCSI tools for iSCSI-backed storage, or gVisor as an additional container runtime.

Configuration Versioning in Git

GitOps Workflow

Storing configurations in Git enables:

  • Version control and history
  • Collaboration and code review
  • Automated deployments
  • Disaster recovery
  • Compliance and auditing

Repository Structure

talos-configs/
├── README.md
├── clusters/
│   └── discworld-homelab/
│       ├── controlplane.yaml
│       ├── worker.yaml
│       └── patches/
│           ├── static-ip.yaml
│           ├── disk-encryption.yaml
│           └── extensions.yaml
└── .gitignore

Setting Up Git Repository

# Initialize repository
mkdir talos-configs
cd talos-configs
git init

# Create directory structure
mkdir -p clusters/discworld-homelab/patches

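# Add configurations
# NOTE: controlplane.yaml and worker.yaml embed cluster secrets (CA keys, tokens).
# Only commit them to a private repository, ideally encrypted (for example
# with SOPS), or keep them out of Git and commit just the patches.
cp controlplane.yaml clusters/discworld-homelab/
cp worker.yaml clusters/discworld-homelab/
cp patch-*.yaml clusters/discworld-homelab/patches/

# Create .gitignore
cat > .gitignore <<'EOF'
# Sensitive files
talosconfig
secrets.yaml
*.key
*.crt
*.pem
# Temporary files
*.tmp
*.bak
EOF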

# Initial commit
git add .
git commit -m "Initial Talos configuration repository"
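
The .gitignore patterns only match file names, so they do not catch key material embedded inside YAML files such as controlplane.yaml. A rough check before committing can look for PEM headers and for key: fields whose value starts with the base64 encoding of "-----BEGIN". This is a heuristic sketch, not a complete secret scanner, and find_embedded_keys is a hypothetical helper name.

```shell
# find_embedded_keys DIR
# Lists YAML files under DIR that appear to contain private key material:
# either a PEM private key header, or a "key:" field whose value starts with
# LS0tLS1CRUdJTi (the base64 encoding of "-----BEGIN").
# Returns 1 if any such file is found, 0 otherwise.
find_embedded_keys() {
  local dir="$1" matches
  matches="$(grep -rlE --include='*.yaml' \
    -e '-----BEGIN [A-Z0-9 ]*PRIVATE KEY-----' \
    -e 'key:[[:space:]]*LS0tLS1CRUdJTi' "$dir" 2>/dev/null || true)"
  if [ -n "$matches" ]; then
    printf '%s\n' "$matches"
    return 1
  fi
  return 0
}
```

Running find_embedded_keys clusters/ before git add gives a quick list of files that should be encrypted or left out of the commit.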

Git Workflow Example

# Make configuration changes
vim clusters/discworld-homelab/patches/static-ip.yaml

# Test the change on a single node (add --dry-run to preview it first)
talosctl patch machineconfig \
  --nodes <NODE_IP> \
  --patch @clusters/discworld-homelab/patches/static-ip.yaml

# Commit changes
git add clusters/discworld-homelab/patches/static-ip.yaml
git commit -m "Add static IP configuration for worker nodes"

# Push to remote
git push origin main

Best Practices

Configuration Management

  • Use Patches: Always use patches instead of modifying base configs
  • Version Control: Store all configurations in Git
  • Test First: Test patches on a single node before rolling out
  • Document Changes: Include clear commit messages and documentation
  • Backup Regularly: Export and backup configurations before major changes

Security

  • Secure Storage: Never commit secrets or private keys to Git
  • Access Control: Limit who can modify configurations
  • Audit Trail: Use Git history as an audit log
  • Certificate Rotation: Regularly rotate certificates
  • Encryption: Enable disk encryption for sensitive data

Network Configuration

  • Static IPs: Use static IPs or DHCP reservations for stability
  • Documentation: Document network topology and IP assignments
  • Testing: Test network changes on a single node first
  • Rollback Plan: Have a plan to revert network changes if needed

Upgrades

  • Staged Upgrades: Upgrade control plane before workers
  • Health Checks: Monitor cluster health during upgrades
  • Backup First: Always backup configurations before upgrading
  • Test Environment: Test upgrades in a lab environment first
  • Rollback Ready: Know how to rollback if upgrade fails
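
The staged upgrade order above can be sketched as a loop that upgrades one node at a time and checks cluster health through a control plane node after each step. This is an illustration under assumptions: upgrade_nodes is a hypothetical helper name, and the installer image must match the version (and any extensions) you intend to run.

```shell
# upgrade_nodes CONTROL_PLANE_IP INSTALLER_IMAGE NODE_IP...
# Upgrades the given nodes one at a time, in order, and runs a cluster health
# check via CONTROL_PLANE_IP after each upgrade. Stops at the first failure.
# Pass control plane nodes before workers.
upgrade_nodes() {
  local health_node="$1" image="$2"
  shift 2
  local node
  for node in "$@"; do
    echo "Upgrading ${node} to ${image}"
    talosctl upgrade --nodes "$node" --image "$image" || return 1
    talosctl health --nodes "$health_node" || return 1
  done
}
```

For the cluster in this series, a call might look like upgrade_nodes 192.168.178.55 ghcr.io/siderolabs/installer:<VERSION> 192.168.178.55 192.168.178.56 192.168.178.57.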

Troubleshooting

Common Issue 1: Patch Application Fails

Problem: talosctl patch command fails with validation errors

Solution:

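# Preview the patch against the live node without applying it
talosctl patch machineconfig \
  --nodes <NODE_IP> \
  --patch @patch.yaml \
  --dry-run

# Or render the patched config offline and validate the result
talosctl machineconfig patch controlplane.yaml \
  --patch @patch.yaml \
  --output patched.yaml
talosctl validate --config patched.yaml --mode metal

# Check current configuration
talosctl get machineconfig --nodes <NODE_IP> -o yaml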

Common Issue 2: Network Configuration Breaks Connectivity

Problem: After applying network patch, node becomes unreachable

Solution:

# If the node is still reachable on another address, check link and address status
talosctl --nodes <NODE_IP> get links
talosctl --nodes <NODE_IP> get addresses

# Otherwise, revert to the previous configuration
# (requires console or physical access to the node)

Prevention:

  • Test network patches on a single node first
  • Have console/physical access available
  • Keep backup of working configuration

Common Issue 3: Upgrade Fails

Problem: Talos upgrade fails or node doesn’t come back online

Solution:

# Check node status
talosctl --nodes <NODE_IP> get members

# Check logs
talosctl --nodes <NODE_IP> logs machined

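# If the node is reachable but misbehaving, roll back to the previous Talos version
talosctl rollback --nodes <NODE_IP>

# If the node is unreachable, check its console output;
# you may need to reinstall with the previous version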

Common Issue 4: Configuration Drift

Problem: Node configuration doesn’t match Git repository

Solution:

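# Export current configuration (as a resource, including metadata)
talosctl get machineconfig --nodes <NODE_IP> -o yaml > current-config.yaml

# Show how the Git version differs from the running config, without applying it
talosctl apply-config \
  --nodes <NODE_IP> \
  --file clusters/discworld-homelab/controlplane.yaml \
  --dry-run

# Reapply configuration from Git
# (the file must already include your patches, or they will be reverted)
talosctl apply-config \
  --nodes <NODE_IP> \
  --file clusters/discworld-homelab/controlplane.yaml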

Summary

Key takeaways from configuration management:

  • Machine configurations are declarative and version-controllable
  • Configuration patches enable incremental, safe changes
  • GitOps provides audit trail and collaboration
  • Proper configuration management simplifies upgrades
  • Security best practices protect sensitive data

What We Accomplished:

  • Understood machine configuration structure with machine and cluster sections
  • Learned how to create and apply configuration patches
  • Explored network configuration options (static IPs, VLANs, bonding)
  • Reviewed disk partitioning and encryption options
  • Examined kernel parameters and system extensions
  • Set up Git repository for version-controlled infrastructure

Next Steps

Now that you can manage Talos configurations:

  • Part 4: High Availability Setup - Build a production-grade multi-control-plane cluster
  • Explore advanced configuration options
  • Set up automated configuration deployment (CI/CD)

Series Navigation

Previous: Part 2 - Talos Installation - Building Your First Cluster

Current: Part 3 - Talos Configuration Management - GitOps for Infrastructure

Next: Part 4 - High Availability Setup - Production-Grade Cluster

Full Series:

  1. Talos Linux Introduction
  2. Talos Installation - Building Your First Cluster
  3. Talos Configuration Management - GitOps for Infrastructure (You are here)
  4. High Availability Setup - Production-Grade Cluster
  5. Storage Configuration - Persistent Storage for Kubernetes (Coming Soon)
  6. Networking - CNI, Load Balancing, and Ingress (Coming Soon)
  7. Security Hardening - Securing Your Homelab Cluster (Coming Soon)
  8. Monitoring and Maintenance - Keeping Your Cluster Healthy (Coming Soon)

This article is part of the “Talos Linux Homelab” series. Follow along as we build a production-grade Kubernetes homelab from the ground up.
