Skip to content

Aotanami Architecture

Overview

Aotanami is a Kubernetes Operator built with Kubebuilder and controller-runtime. It runs as a single deployment in your cluster with read-only access to cluster resources, using Agentic AI (BYO LLM keys) to autonomously detect, diagnose, and remediate issues via GitOps.

System Architecture

graph TB
    subgraph "Kubernetes Cluster — Read-Only Access"
        Events[K8s Events]
        Logs[Pod Logs]
        Nodes[Node Conditions]
        Net[Network Telemetry]
        Metrics[Resource Metrics]
        CRDs[Aotanami CRDs]
    end

    subgraph "Aotanami Operator"
        direction TB

        subgraph "Detection Layer"
            Monitor[Real-Time Monitor]
            Scanner[Security Scanner]
            ComplianceEng[Compliance Engine]
            CostEng[Cost Optimizer]
            AnomalyDet[Anomaly Detector]
            ThreatDet[Threat Detector]
            DriftDet[Drift Detector]
        end

        subgraph "Intelligence Layer"
            Correlator[Incident Correlator]
            PolicyEng[Policy Engine — CEL]
            LLMEng[LLM Engine — BYO Keys]
        end

        subgraph "Action Layer"
            Remediation[GitOps Fix Generator]
            Notifier[Notification Router]
        end

        subgraph "Platform Layer"
            Dashboard[Embedded Dashboard]
            API[REST API]
            MetricsExp[Prometheus / OTEL]
        end
    end

    subgraph "External Integrations"
        GitHub[GitHub App — PRs]
        Slack[Slack / Teams / PagerDuty]
        AlertMgr[AlertManager]
        Telegram[Telegram / WhatsApp]
    end

    Events & Logs & Nodes & Net & Metrics --> Monitor
    CRDs --> Scanner & ComplianceEng & CostEng
    Monitor --> AnomalyDet & ThreatDet
    Scanner --> DriftDet
    AnomalyDet & Scanner & CostEng & ThreatDet & ComplianceEng & DriftDet --> Correlator
    Correlator --> PolicyEng --> LLMEng
    LLMEng --> Remediation --> GitHub
    LLMEng --> Notifier --> Slack & AlertMgr & Telegram
    Monitor & LLMEng --> Dashboard
    API --> Dashboard
    Monitor --> MetricsExp

Operator Lifecycle

sequenceDiagram
    participant K8s as Kubernetes API
    participant Mon as Monitor
    participant Cor as Correlator
    participant Pol as Policy Engine
    participant LLM as LLM Engine
    participant Rem as Remediation
    participant Not as Notifier

    K8s->>Mon: Watch events, logs, metrics
    Mon->>Mon: Detect anomaly / threat
    Mon->>Cor: Forward findings
    Cor->>Cor: Deduplicate & correlate
    Cor->>Pol: Evaluate against policies
    Pol->>Pol: CEL expression evaluation

    alt Complex / Novel Issue
        Pol->>LLM: Request AI diagnosis
        LLM->>LLM: Analyze with structured output
        LLM-->>Rem: Generate fix recommendation
    end

    alt Protect Mode
        Rem->>Rem: Generate manifest patch
        Rem->>K8s: Create PR via GitHub App
    end

    Pol->>Not: Route alert
    Not->>Not: Rate limit & aggregate
    Not-->>Not: Send to configured channels

Core Components

Controllers (Kubebuilder-generated)

Each CRD has a dedicated reconciliation controller:

Controller Watches Reconciles
SecurityPolicyReconciler SecurityPolicy Configures scanner rules, triggers evaluations
RemediationPolicyReconciler RemediationPolicy Manages GitOps PR generation settings
ClusterScanReconciler ClusterScan Schedules and executes scans
ScanReportReconciler ScanReport Manages scan result lifecycle
CostPolicyReconciler CostPolicy Configures cost monitoring thresholds
MonitoringPolicyReconciler MonitoringPolicy Configures real-time monitoring
NotificationChannelReconciler NotificationChannel Validates and activates notification channels
AotanamiConfigReconciler AotanamiConfig Applies global configuration changes
GitOpsRepositoryReconciler GitOpsRepository Onboards repos, manages sync lifecycle

Internal Packages

Layer Package Purpose
Intelligence llm BYO LLM client with token optimization
Intelligence anomaly Statistical anomaly detection
Intelligence correlator Incident dedup & correlation
Intelligence policy CEL-based policy evaluation
Detection monitor Real-time K8s event/log watcher
Detection scanner Security & config scanning
Detection compliance CIS, NSA, PCI-DSS, SOC2, HIPAA
Detection supplychain SBOM, image signatures, CVEs
Detection threat Runtime threat detection
Detection drift Config drift vs. GitOps repo
Detection costoptimizer Resource rightsizing & cost analysis
Actions remediation GitOps fix generator
Actions gitops Repo onboarding & sync
Actions github GitHub App client
Actions notifier Multi-channel alert routing
Platform dashboard Embedded web UI (htmx + SSE)
Platform api REST API (OpenAPI)
Platform metrics Prometheus + OTEL export
Platform multicluster Cross-cluster federation

Data Flow

flowchart LR
    A[K8s Events/Logs/Metrics] -->|Read-Only| B[Monitor]
    B --> C{Correlator}
    C -->|Deduplicated| D[Policy Engine]
    D -->|Complex Issues| E[LLM Engine]
    D -->|Simple Issues| F[Notifier]
    E -->|Protect Mode| G[GitOps PR]
    E -->|Audit Mode| F
    E --> H[Dashboard]
    B --> I[Prometheus/OTEL]

Security Model

  • Read-only cluster access: Aotanami uses only get, list, watch verbs on cluster resources
  • No direct mutations: All fixes are delivered as GitOps PRs, never applied directly
  • API key isolation: LLM API keys stored in Kubernetes Secrets, never logged or exposed
  • Non-root container: Runs as UID 65532 in a distroless image with read-only rootfs
  • Signed artifacts: All container images and Helm charts are Cosign-signed with SBOM attestations