Spark Query History

Spark Query History is deployed to each <env>-dataops cluster from the Terraform root terraform/spark-query-history/env/<env>.

Current Shape

The app runs in the kyuubi namespace and uses an IRSA role named:

data-platform-<env>-spark-query-history

The host pattern is:

spark-query-history.data-platform.us-east-1.<env>-dataops.fetchrewards.com
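The naming conventions above can be expanded per environment with a small helper. This is a minimal sketch that only formats strings. It assumes `<env>` is substituted verbatim and that the valid environments are the four listed under Checked Against. The helper `naming_for_env` and the `SpqhNaming` type are hypothetical and are not part of the Terraform code.

```python
from dataclasses import dataclass

# Assumption: the environments with a terraform/spark-query-history/env/<env> root,
# taken from the "Checked Against" list.
KNOWN_ENVS = ("dev", "stage", "preprod", "prod")


@dataclass(frozen=True)
class SpqhNaming:
    terraform_root: str
    namespace: str
    irsa_role: str
    host: str


def naming_for_env(env: str) -> SpqhNaming:
    """Hypothetical helper: expand the documented <env> patterns for one environment."""
    if env not in KNOWN_ENVS:
        raise ValueError(f"unknown env {env!r}; expected one of {KNOWN_ENVS}")
    return SpqhNaming(
        terraform_root=f"terraform/spark-query-history/env/{env}",
        namespace="kyuubi",
        irsa_role=f"data-platform-{env}-spark-query-history",
        host=f"spark-query-history.data-platform.us-east-1.{env}-dataops.fetchrewards.com",
    )


if __name__ == "__main__":
    for env in KNOWN_ENVS:
        n = naming_for_env(env)
        print(f"{env}: role={n.irsa_role} host={n.host}")
```

The helper rejects unknown environments rather than formatting them, so a typo such as `prd` fails loudly instead of producing a hostname that does not resolve.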

Data Sources

Spark Query History queries the environment’s data_platform_usage database through Athena and Glue. It depends on the Kyuubi Spark event log pipeline and the spark-cost-tracking aggregation job.

Core tables include:

  • data_platform_usage.spark_query_costs
  • data_platform_usage.spark_query_operations
  • data_platform_usage.iceberg_table_health
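To confirm that the upstream pipeline has populated these tables, a row count can be run through Athena before any web-app debugging. The sketch below is a hedged example. It takes a boto3 Athena client, or any object with the same three methods, and uses the real `start_query_execution`, `get_query_execution` and `get_query_results` calls. The function name `count_rows`, the `primary` workgroup default and the optional S3 output location are assumptions. Substitute whatever the environment actually uses.

```python
import time
from typing import Any, Optional

DATABASE = "data_platform_usage"
CORE_TABLES = ("spark_query_costs", "spark_query_operations", "iceberg_table_health")


def count_rows(athena: Any, table: str, workgroup: str = "primary",
               output_location: Optional[str] = None,
               poll_seconds: float = 1.0, timeout_seconds: float = 120.0) -> int:
    """Run SELECT COUNT(*) on one table through Athena and return the count.

    Hypothetical helper. The workgroup and output location are assumptions; if the
    workgroup has no result location configured, pass an s3:// output_location.
    """
    kwargs = {
        "QueryString": f'SELECT COUNT(*) FROM "{DATABASE}"."{table}"',
        "QueryExecutionContext": {"Database": DATABASE},
        "WorkGroup": workgroup,
    }
    if output_location:
        kwargs["ResultConfiguration"] = {"OutputLocation": output_location}
    qid = athena.start_query_execution(**kwargs)["QueryExecutionId"]

    # Poll until the query reaches a terminal state or the timeout expires.
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]
        state = status["State"]
        if state == "SUCCEEDED":
            break
        if state in ("FAILED", "CANCELLED"):
            reason = status.get("StateChangeReason", "")
            raise RuntimeError(f"{table}: query {state}: {reason}")
        if time.monotonic() > deadline:
            raise TimeoutError(f"{table}: query {qid} still {state}")
        time.sleep(poll_seconds)

    rows = athena.get_query_results(QueryExecutionId=qid)["ResultSet"]["Rows"]
    # For SELECT queries, row 0 holds the column header and row 1 holds the COUNT(*) value.
    return int(rows[1]["Data"][0]["VarCharValue"])
```

Usage, assuming credentials for the target environment: `athena = boto3.client("athena", region_name="us-east-1")`, then `{t: count_rows(athena, t) for t in CORE_TABLES}`. A zero count points at the event log pipeline or the cost-tracking job rather than the app. COUNT(*) scans the table, so on large tables a filtered query may be cheaper.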

Validation

Use /api/health, /queries?limit=5, and /costs?days=7 for basic runtime checks. If data is missing, first verify that Kyuubi produced Spark event logs and that the cost-tracking job has populated the Iceberg tables, and only then debug the web app.
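The three runtime checks can be scripted with the standard library. This is a minimal sketch. The function names are hypothetical, and the response shapes are assumptions: the code treats an empty JSON array or object as "ok (empty)" and tolerates non-JSON bodies. The fetcher is injectable so the logic can be exercised without network access.

```python
import json
import urllib.request
from typing import Callable, Dict, Tuple

# The three runtime checks named above, relative to the app host.
CHECK_PATHS = ("/api/health", "/queries?limit=5", "/costs?days=7")

Fetcher = Callable[[str], Tuple[int, bytes]]


def http_get(url: str, timeout: float = 10.0) -> Tuple[int, bytes]:
    """Plain GET with the standard library; returns (status, body)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def smoke_check(base_url: str, fetch: Fetcher = http_get) -> Dict[str, str]:
    """Hit each check path and summarise the result per path (hypothetical helper)."""
    results: Dict[str, str] = {}
    for path in CHECK_PATHS:
        url = base_url.rstrip("/") + path
        try:
            status, body = fetch(url)
        except Exception as exc:  # connection errors, timeouts, HTTPError for 4xx/5xx
            results[path] = f"error: {exc}"
            continue
        if status != 200:
            results[path] = f"HTTP {status}"
            continue
        try:
            payload = json.loads(body)
        except ValueError:
            results[path] = "ok (non-JSON body)"
            continue
        # Assumption: an empty array/object usually means the upstream tables are
        # empty, which points at the data pipeline rather than the web app.
        results[path] = "ok (empty)" if not payload else "ok"
    return results
```

For example, `smoke_check("https://spark-query-history.data-platform.us-east-1.dev-dataops.fetchrewards.com")` returns one summary string per path. An "ok (empty)" on /queries or /costs alongside a healthy /api/health is the case where the upstream checks above should come first.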

Checked Against

  • terraform/spark-query-history/env/<env>/main.tf for dev, stage, preprod, and prod on origin/main.
  • terraform/modules/spark-cost-tracking.
  • implementations/2026-05-28-dl-474-iceberg-table-health-all-envs-progress.md.