
Kyuubi

Kyuubi runs on each env-dataops data-platform EKS cluster and exposes Spark SQL through HiveServer2-compatible endpoints. The current service root is terraform/kyuubi/env/<env>.

Current Shape

Each environment has three warehouses:

Warehouse      Namespace             Main Use
default        kyuubi                dbt, Airflow ETL, regular service jobs
maintenance    kyuubi-maintenance    Snowpack, Iceberg table health, maintenance jobs
interactive    kyuubi-interactive    human and BI-style ad hoc SQL

The endpoint pattern is:

kyuubi[-maintenance|-interactive].data-platform.us-east-1.<env>-dataops.fetchrewards.com:10009
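
As a minimal sketch of how the pattern expands, the helper below builds the hostname for a given warehouse and environment. The function names and the suffix table are hypothetical, introduced only for illustration. The `jdbc:hive2://` scheme is assumed from the HiveServer2-compatible endpoints; any authentication or session parameters your client needs are not shown.

```python
# Hypothetical helpers illustrating the endpoint pattern; not part of the Terraform roots.

# Hostname suffix per warehouse, following kyuubi[-maintenance|-interactive].
WAREHOUSE_SUFFIX = {
    "default": "",
    "maintenance": "-maintenance",
    "interactive": "-interactive",
}

KYUUBI_PORT = 10009


def kyuubi_host(warehouse: str, env: str) -> str:
    """Return the Kyuubi hostname for a warehouse in a given environment."""
    if warehouse not in WAREHOUSE_SUFFIX:
        raise ValueError(f"unknown warehouse: {warehouse!r}")
    suffix = WAREHOUSE_SUFFIX[warehouse]
    return f"kyuubi{suffix}.data-platform.us-east-1.{env}-dataops.fetchrewards.com"


def kyuubi_jdbc_url(warehouse: str, env: str) -> str:
    """Return a HiveServer2-style JDBC URL (assumed scheme, no auth parameters)."""
    return f"jdbc:hive2://{kyuubi_host(warehouse, env)}:{KYUUBI_PORT}/"


if __name__ == "__main__":
    for name in WAREHOUSE_SUFFIX:
        print(name, kyuubi_jdbc_url(name, "dev"))
```

The default warehouse maps to an empty suffix, so its hostname starts with plain `kyuubi`.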

Catalogs

Kyuubi configures the same-account Glue catalog as lakehouse_<env>, and it also configures remote catalogs for cross-env reads. Remote catalogs intentionally follow the same lakehouse_<env> naming pattern rather than carrying a _ro suffix, so a catalog's name alone does not tell you whether it is writable. Read-only behavior for remote catalogs is enforced by Lake Formation and S3 permissions; the one exception is the explicit dev-write exception on interactive Kyuubi.
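
The naming convention can be sketched as follows. This is an illustrative example only: the helper names, and the `analytics` schema and `events` table, are hypothetical and do not refer to real objects. The three-part catalog.schema.table identifier is standard Spark SQL catalog syntax.

```python
# Hypothetical helpers showing the lakehouse_<env> catalog naming convention.


def catalog_name(env: str) -> str:
    """Catalog name for an environment; local and remote catalogs share this pattern."""
    return f"lakehouse_{env}"


def qualified_table(env: str, schema: str, table: str) -> str:
    """Fully qualified Spark SQL identifier: catalog.schema.table."""
    return f"{catalog_name(env)}.{schema}.{table}"


if __name__ == "__main__":
    # A session in one environment reading another environment's catalog.
    # The schema and table names are placeholders, not real objects.
    query = f"SELECT COUNT(*) FROM {qualified_table('prod', 'analytics', 'events')}"
    print(query)
```

Because the name carries no read-only marker, a write against a remote catalog is rejected by permissions at execution time rather than by the catalog name.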

Cost And Observability

Kyuubi writes Spark event logs into the environment lakehouse bucket. The spark-cost-tracking job reads those event logs and populates data_platform_usage tables used by Spark Query History and Grafana dashboards.

Legacy Reference

The older Kyuubi clusters reference has useful warehouse tuning context, but many URLs and account references are test-dataops-era. Prefer this page and live Terraform roots for env-dataops deployment details.

Checked Against

  • terraform/kyuubi/env/dev/main.tf and the corresponding stage, preprod, and prod roots on origin/main.
  • terraform/config/services.yaml.
  • implementations/2026-05-21-dl-419-prod-dataops-runtime-progress.md.
  • implementations/2026-05-22-dl-419-dev-catalog-interactive-write-exception-progress.md.