BigQuery

Monitoring Kit · v1

11 metrics. Every BigQuery
project. One setup.

Every BigQuery project exposes the same INFORMATION_SCHEMA.JOBS_BY_PROJECT and TABLE_STORAGE_BY_PROJECT views. These 11 queries turn them into live alerts with smart comparison — each metric checks against a 30-day rolling baseline so you only get paged when something is genuinely wrong. Query the region-us.INFORMATION_SCHEMA variant to match your project's region.

Start monitoring free →

Production-ready metrics

Categories covered

30-day

Smart comparison baseline

10 min

From connect to first alert

BigQuery Kit · 11 metrics · Live

MetricCategorySchedule

Slot Usage Spike — Last 24h

CostHourly

Bytes Billed per User Spike — Last 24h

CostDaily

Dataset Storage — Weekly Snapshot

CostWeekly

Long-Term Storage Accumulation

CostWeekly

Long-Running Jobs — Last 1h

PerformanceHourly

Job Wait Time Anomaly — Last 1h

PerformanceHourly

Job Failure Rate Spike — Last 1h

FailuresHourly

Failed Jobs — Last 1h

FailuresHourly

Stale Tables — 90-Day No Query

OptimizationWeekly

Peak Slot Usage — Yesterday

OptimizationDaily

Large Bytes Scanned — Last 24h

OptimizationDaily

Permissions

2 IAM roles. Read-only. That's it.

Lighthouse only reads metadata — job history, storage stats, slot usage. It cannot access your actual table data, run queries, create objects, or modify anything in your BigQuery project.

Job History Access

roles/bigquery.resourceViewer

One-time IAM grant. Covers all current and future jobs automatically.

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:lighthouse@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.resourceViewer"

INFORMATION_SCHEMA.JOBS_BY_PROJECT
JOBS_TIMELINE_BY_PROJECT
Slot usage & reservation data
Failure data & error messages
Job wait time & bytes billed

Storage Metadata Access

roles/bigquery.metadataViewer

Read-only metadata only — no access to actual table data.

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="serviceAccount:lighthouse@PROJECT_ID.iam.gserviceaccount.com" \
  --role="roles/bigquery.metadataViewer"

INFORMATION_SCHEMA.TABLE_STORAGE_BY_PROJECT
Dataset and table metadata
Stale table detection
Active vs long-term storage breakdown

BigQuery Read Session (optional)

roles/bigquery.readSessionUser

Optional — confirms read-only posture. Lighthouse never writes to your BigQuery project.

-- No additional grant needed
-- roles/bigquery.resourceViewer and metadataViewer
-- are both read-only by definition

Confirms read-only access posture
No write permissions granted
Lighthouse never writes to your BigQuery project

Read-only, always → Both IAM roles grant read-only access to operational metadata. Run them once and you're done — new jobs, tables, datasets, and activity are automatically covered. Lighthouse cannot write data, drop tables, create datasets, or access the actual contents of your tables — only operational metadata like job durations, slot consumption, and storage sizes.

How it works

We configure it. You get the alerts.

Connect BigQuery

Service account with read-only IAM roles, 3 clicks. Lighthouse reads your job metadata and storage stats — never your actual data. Takes under 5 minutes.

Deploy the kit

All 11 monitoring metrics go live in your Lighthouse workspace — pre-configured, pre-tuned, with 30-day smart comparison baselines. No SQL required.

Get Slack alerts

When costs spike, jobs fail, or slot usage anomalies appear, you get a Slack message with value, baseline, and context — before anyone has to ask.

Time windows explained → JOBS_BY_PROJECT retains 180 days of history. Real-time metrics use a -2h to -1h window to avoid partially-written rows — BigQuery can take up to 60 minutes to fully populate job records. TABLE_STORAGE_BY_PROJECT is a point-in-time snapshot updated hourly — weekly cadence queries read it as-is. No partial reads, no false drops.

💰Cost & Billing4 ⚡Performance2 🚨Failures2 🎯Optimization3

💰

Cost & Billing

4 metrics

01 / COSTReal-timeHourly

Slot Usage Spike — Last 24h

Project-level slot consumption exceeds the 30-day daily max by more than 30%.

Why monitor

Compares today's slot consumption against a 30-day rolling baseline — alerts only when compute is genuinely anomalous, not just on normally busy days.
Slot-hours is the true cost signal — unlike bytes billed, slot usage reflects actual compute consumed, including flat-rate reservation waste.
Project-level segmentation isolates which workload is responsible when you run multiple projects under one billing account.

SQL · BigQuery

WITH
  main_metric AS (
    SELECT
      project_id,
      ROUND(SUM(total_slot_ms) / 3600000.0, 2) AS METRIC_VALUE
    FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
    WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 25 HOUR)
      AND creation_time <  TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
      AND state = 'DONE'
    GROUP BY project_id
  ),
  baseline_windows AS (
    SELECT
      TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY),
        INTERVAL (day_offset + 1) DAY) AS window_start,
      TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY),
        INTERVAL day_offset DAY)       AS window_end
    FROM UNNEST(GENERATE_ARRAY(1, 30)) AS day_offset
  ),
  baseline_metric AS (
    SELECT
      j.project_id,
      w.window_start,
      ROUND(SUM(j.total_slot_ms) / 3600000.0, 2) AS METRIC_VALUE
    FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j
    JOIN baseline_windows w
      ON j.creation_time >= w.window_start
     AND j.creation_time <  w.window_end
    WHERE j.state = 'DONE'
    GROUP BY j.project_id, w.window_start
  ),
  baseline AS (
    SELECT
      project_id,
      MAX(METRIC_VALUE)       AS COMPARED_VALUE,
      MAX(METRIC_VALUE) * 1.3 AS COMPARED_VALUE_THRESHOLD
    FROM baseline_metric
    GROUP BY project_id
  )
SELECT
  m.project_id,
  m.METRIC_VALUE,
  ROUND(b.COMPARED_VALUE, 2)           AS COMPARED_VALUE,
  ROUND(b.COMPARED_VALUE_THRESHOLD, 2) AS COMPARED_VALUE_THRESHOLD,
  CASE WHEN m.METRIC_VALUE > b.COMPARED_VALUE_THRESHOLD
    THEN 'SPIKE' ELSE 'Normal'
  END AS status
FROM main_metric m
LEFT JOIN baseline b ON m.project_id = b.project_id
ORDER BY m.METRIC_VALUE DESC
LIMIT 1000

Sample alert

LighthouseToday at 9:41 AMAPP

⚠️ Slot Usage Spike — my-project used 847 slot-hours in the last 24h. 30-day max: 312 slot-hours · 2.7× above baseline.

02 / COSTReal-timeDaily

Bytes Billed per User Spike — Last 24h

A user's bytes billed exceeds their 30-day daily max by more than 30%. Estimated cost included.

Why monitor

BigQuery-specific: a single SELECT * on a large unpartitioned table can generate a surprise bill — this catches it the same day, not on the invoice.
Per-user segmentation tells you exactly who to talk to — and whether it's a one-time mistake or a recurring pattern.
Estimated cost in USD ($6.25/TB at on-demand rates) makes the alert immediately actionable for finance and engineering alike.

SQL · BigQuery

WITH
  main_metric AS (
    SELECT
      user_email,
      ROUND(SUM(total_bytes_billed) / POW(10, 12), 4) AS METRIC_VALUE
    FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
    WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 25 HOUR)
      AND creation_time <  TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
      AND state = 'DONE'
      AND total_bytes_billed > 0
    GROUP BY user_email
  ),
  baseline_windows AS (
    SELECT
      TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY),
        INTERVAL (day_offset + 1) DAY) AS window_start,
      TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY),
        INTERVAL day_offset DAY)       AS window_end
    FROM UNNEST(GENERATE_ARRAY(1, 30)) AS day_offset
  ),
  baseline_metric AS (
    SELECT
      j.user_email,
      w.window_start,
      ROUND(SUM(j.total_bytes_billed) / POW(10, 12), 4) AS METRIC_VALUE
    FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j
    JOIN baseline_windows w
      ON j.creation_time >= w.window_start
     AND j.creation_time <  w.window_end
    WHERE j.state = 'DONE'
      AND j.total_bytes_billed > 0
    GROUP BY j.user_email, w.window_start
  ),
  baseline AS (
    SELECT
      user_email,
      MAX(METRIC_VALUE)       AS COMPARED_VALUE,
      MAX(METRIC_VALUE) * 1.3 AS COMPARED_VALUE_THRESHOLD
    FROM baseline_metric
    GROUP BY user_email
  )
SELECT
  m.user_email,
  m.METRIC_VALUE                          AS METRIC_VALUE_TB,
  ROUND(m.METRIC_VALUE * 6.25, 2)         AS ESTIMATED_COST_USD,
  ROUND(b.COMPARED_VALUE, 4)              AS COMPARED_VALUE,
  ROUND(b.COMPARED_VALUE_THRESHOLD, 4)    AS COMPARED_VALUE_THRESHOLD
FROM main_metric m
LEFT JOIN baseline b ON m.user_email = b.user_email
WHERE m.METRIC_VALUE > COALESCE(b.COMPARED_VALUE_THRESHOLD, 0)
ORDER BY m.METRIC_VALUE DESC
LIMIT 1000

Sample alert

LighthouseToday at 9:41 AMAPP

⚠️ Bytes Billed Spike — analyst@company.com billed 4.8 TB in the last 24h (est. $30). 30-day max: 0.9 TB · 5.3× above baseline.

03 / COSTAuditWeekly

Dataset Storage — Weekly Snapshot

Current storage by dataset — active vs long-term bytes, table count, and long-term percentage.

Why monitor

TABLE_STORAGE_BY_PROJECT is a live snapshot — shows exactly what you're paying for right now, split between active (higher rate) and long-term (lower rate) storage.
High long-term % signals stale data — data sitting untouched for 90+ days is automatically moved to long-term pricing, indicating cleanup opportunity.
Weekly cadence surfaces gradual storage growth before it shows up as a surprise line item on the bill.

SQL · BigQuery

SELECT
  table_catalog AS project_id,
  table_schema  AS dataset_id,
  COUNT(*)      AS table_count,
  ROUND(SUM(total_logical_bytes)     / POW(1024, 3), 2) AS total_gb,
  ROUND(SUM(active_logical_bytes)    / POW(1024, 3), 2) AS active_gb,
  ROUND(SUM(long_term_logical_bytes) / POW(1024, 3), 2) AS long_term_gb,
  ROUND(SAFE_DIVIDE(
    SUM(long_term_logical_bytes),
    SUM(total_logical_bytes)
  ) * 100, 1) AS long_term_pct
FROM `region-us.INFORMATION_SCHEMA.TABLE_STORAGE_BY_PROJECT`
WHERE total_logical_bytes > 0
GROUP BY 1, 2
ORDER BY total_gb DESC
LIMIT 1000

Sample alert

LighthouseToday at 9:41 AMAPP

📦 Storage Snapshot — analytics_prod: 2.4 TB total (68% long-term). raw_events: 890 GB, 100% active. 3 datasets exceed 500 GB.

04 / COSTAuditWeekly

Long-Term Storage Accumulation

Tables > 10 GB where more than 90% of storage is long-term (untouched for 90+ days).

Why monitor

BigQuery auto-promotes data to long-term after 90 days of no modification — long-term storage is cheaper but accumulates silently.
Tables with 90%+ long-term bytes are candidates for expiry policies or archiving — catching them weekly prevents bill creep.
Size filter (> 10 GB) keeps the alert focused — ignores small tables where the cost impact is negligible.

SQL · BigQuery

SELECT
  table_catalog AS project_id,
  table_schema  AS dataset_id,
  table_name,
  ROUND(total_logical_bytes     / POW(1024, 3), 3) AS total_gb,
  ROUND(long_term_logical_bytes / POW(1024, 3), 3) AS long_term_gb,
  ROUND(SAFE_DIVIDE(
    long_term_logical_bytes,
    total_logical_bytes
  ) * 100, 1) AS long_term_pct
FROM `region-us.INFORMATION_SCHEMA.TABLE_STORAGE_BY_PROJECT`
WHERE total_logical_bytes     > POW(1024, 3) * 10     -- tables > 10 GB
  AND long_term_logical_bytes > total_logical_bytes * 0.9
ORDER BY long_term_gb DESC
LIMIT 1000

Sample alert

LighthouseToday at 9:41 AMAPP

📦 Long-Term Storage — events_2023: 340 GB, 97% long-term. orders_archive: 89 GB, 100% long-term. Consider partitioning or expiry policies.

⚡

Performance

2 metrics

05 / PERFReal-timeHourly

Long-Running Jobs — Last 1h

Jobs exceeding 30 minutes of execution time in the last hour, with bytes processed and slot seconds.

Why monitor

30 minutes is the threshold — anything above that in BigQuery is either a data volume regression, missing partition filter, or an expensive cross-join.
Returns job_id, user, and bytes processed — you can pull the full query plan from BigQuery console and pinpoint the bottleneck immediately.
Hourly cadence with -2h to -1h window — catches runaway jobs within the hour, not the next morning.

SQL · BigQuery

SELECT
  job_id,
  user_email,
  job_type,
  TIMESTAMP_DIFF(end_time, start_time, MINUTE)    AS duration_minutes,
  ROUND(total_bytes_processed / POW(1024, 3), 2)  AS bytes_processed_gb,
  ROUND(total_slot_ms / 1000.0, 1)                AS slot_seconds,
  query
FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 HOUR)
  AND creation_time <  TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  AND state = 'DONE'
  AND TIMESTAMP_DIFF(end_time, start_time, MINUTE) > 30
ORDER BY duration_minutes DESC
LIMIT 1000

Sample alert

LighthouseToday at 9:41 AMAPP

⚠️ Long-Running Job — 2 jobs exceeded 30 min in the last hour. Longest: 52 min by analyst@company.com · 1.2 TB processed.

06 / PERFReal-timeHourly

Job Wait Time Anomaly — Last 1h

A job's queue wait time (creation → start) exceeds the 30-day daily max by more than 30%.

Why monitor

Wait time = time from query submission to execution start — spikes indicate slot contention, reservation pressure, or a quota hit.
Compares against 30-day daily max — filters out normal scheduling jitter, only alerts on genuine queuing anomalies.
Hourly detection means you catch reservation bottlenecks while the workload is still running, not after a slow day has already happened.

SQL · BigQuery

WITH
  main_metric AS (
    SELECT
      job_id,
      user_email,
      TIMESTAMP_DIFF(start_time, creation_time, SECOND) AS METRIC_VALUE
    FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
    WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 HOUR)
      AND creation_time <  TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
      AND state = 'DONE'
      AND start_time > creation_time
  ),
  baseline_windows AS (
    SELECT
      TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY),
        INTERVAL (day_offset + 1) DAY) AS window_start,
      TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY),
        INTERVAL day_offset DAY)       AS window_end
    FROM UNNEST(GENERATE_ARRAY(1, 30)) AS day_offset
  ),
  baseline_metric AS (
    SELECT
      w.window_start,
      MAX(TIMESTAMP_DIFF(j.start_time, j.creation_time, SECOND)) AS METRIC_VALUE
    FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j
    JOIN baseline_windows w
      ON j.creation_time >= w.window_start
     AND j.creation_time <  w.window_end
    WHERE j.state = 'DONE'
      AND j.start_time > j.creation_time
    GROUP BY w.window_start
  ),
  baseline AS (
    SELECT
      MAX(METRIC_VALUE)       AS COMPARED_VALUE,
      MAX(METRIC_VALUE) * 1.3 AS COMPARED_VALUE_THRESHOLD
    FROM baseline_metric
  )
SELECT
  m.job_id,
  m.user_email,
  m.METRIC_VALUE                AS wait_seconds,
  b.COMPARED_VALUE,
  b.COMPARED_VALUE_THRESHOLD
FROM main_metric m
CROSS JOIN baseline b
WHERE m.METRIC_VALUE > b.COMPARED_VALUE_THRESHOLD
ORDER BY m.METRIC_VALUE DESC
LIMIT 1000

Sample alert

LighthouseToday at 9:41 AMAPP

⚠️ Job Wait Time Anomaly — job abc123 waited 4.2 min to start. 30-day max: 0.8 min · 5.3× above baseline. Possible slot contention.

Want Slack alerts when any of these fire?

Connect BigQuery to Lighthouse in 10 minutes — free, read-only, no SQL required.

Start free →

🚨

Failures

2 metrics

07 / FAILReal-timeHourly

Job Failure Rate Spike — Last 1h

Project-level job failure rate exceeds the 7-day hourly max by more than 30%.

Why monitor

Failure rate vs absolute count — a spike from 2% to 34% is a real incident; 10 failures during a high-traffic hour might not be.
Compares against 7-day hourly max — catches genuine regressions while ignoring naturally noisy hours.
Includes total and failed counts so you can immediately assess impact scope before digging into logs.

SQL · BigQuery

WITH
  main_metric AS (
    SELECT
      COUNTIF(error_result IS NOT NULL) AS failed_count,
      COUNT(*)                          AS total_count,
      ROUND(SAFE_DIVIDE(
        COUNTIF(error_result IS NOT NULL),
        COUNT(*)
      ) * 100, 2) AS METRIC_VALUE
    FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
    WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 HOUR)
      AND creation_time <  TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
      AND state = 'DONE'
  ),
  baseline_windows AS (
    SELECT
      TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY),
        INTERVAL (day_offset + 1) DAY) AS window_start,
      TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY),
        INTERVAL day_offset DAY)       AS window_end
    FROM UNNEST(GENERATE_ARRAY(1, 7)) AS day_offset
  ),
  baseline_metric AS (
    SELECT
      w.window_start,
      ROUND(SAFE_DIVIDE(
        COUNTIF(j.error_result IS NOT NULL),
        COUNT(*)
      ) * 100, 2) AS METRIC_VALUE
    FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j
    JOIN baseline_windows w
      ON j.creation_time >= w.window_start
     AND j.creation_time <  w.window_end
    WHERE j.state = 'DONE'
    GROUP BY w.window_start
  ),
  baseline AS (
    SELECT
      MAX(METRIC_VALUE)       AS COMPARED_VALUE,
      MAX(METRIC_VALUE) * 1.3 AS COMPARED_VALUE_THRESHOLD
    FROM baseline_metric
  )
SELECT
  m.METRIC_VALUE,
  m.failed_count,
  m.total_count,
  b.COMPARED_VALUE,
  b.COMPARED_VALUE_THRESHOLD
FROM main_metric m
CROSS JOIN baseline b
LIMIT 1000

Sample alert

LighthouseToday at 9:41 AMAPP

🚨 Job Failure Rate Spike — 34% of jobs failed in the last hour (41 of 121). 7-day max: 4.2% · 8.1× above baseline.

08 / FAILReal-timeHourly

Failed Jobs — Last 1h

All failed jobs in the last hour — with error reason, message, and user.

Why monitor

Returns every failure individually — error reason and message included so you know whether it's a quota hit, syntax error, or infrastructure issue.
Fixed threshold (> 0) — any failed job is worth knowing about, no baseline comparison needed.
Pairs with Q7 — Q7 tells you the rate is spiking; Q8 tells you exactly which jobs failed and why.

SQL · BigQuery

SELECT
  job_id,
  user_email,
  job_type,
  creation_time,
  end_time,
  error_result.reason  AS error_reason,
  error_result.message AS error_message,
  query
FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 2 HOUR)
  AND creation_time <  TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  AND state = 'DONE'
  AND error_result IS NOT NULL
ORDER BY creation_time DESC
LIMIT 1000

Sample alert

LighthouseToday at 9:41 AMAPP

🚨 Failed Job — analyst@company.com · resourcesExceeded: Query exceeded resource limits. Job: bqjob_r123. 3 similar failures in the last hour.

🎯

Optimization

3 metrics

09 / OPTAuditWeekly

Stale Tables — 90-Day No Query

Tables with no recorded job reference in 90 days. Requires JOBS_BY_PROJECT history (180-day lookback).

Why monitor

Uses JOBS_BY_PROJECT referenced_tables to detect actual query activity — not just modification time, which can be updated by pipelines even when data is never read.
Cross-references TABLE_STORAGE_BY_PROJECT for size — so you can prioritize which stale tables to clean up first by storage cost.
Weekly cadence — tables accumulate gradually; a weekly audit is enough to stay ahead of storage waste.

SQL · BigQuery

WITH last_referenced AS (
  SELECT
    ref.project_id AS table_project,
    ref.dataset_id  AS table_dataset,
    ref.table_id    AS table_name,
    MAX(j.creation_time) AS last_referenced_at
  FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT` j,
  UNNEST(j.referenced_tables) AS ref
  WHERE j.creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 180 DAY)
    AND j.state = 'DONE'
  GROUP BY 1, 2, 3
)
SELECT
  s.table_catalog   AS project_id,
  s.table_schema    AS dataset_id,
  s.table_name,
  ROUND(s.total_logical_bytes / POW(1024, 3), 3) AS storage_gb,
  s.row_count,
  r.last_referenced_at,
  DATE_DIFF(CURRENT_DATE(), DATE(r.last_referenced_at), DAY) AS days_since_last_use
FROM `region-us.INFORMATION_SCHEMA.TABLE_STORAGE_BY_PROJECT` s
LEFT JOIN last_referenced r
  ON  s.table_catalog = r.table_project
  AND s.table_schema  = r.table_dataset
  AND s.table_name    = r.table_name
WHERE (
  r.last_referenced_at IS NULL
  OR r.last_referenced_at < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
)
  AND s.total_logical_bytes > 0
ORDER BY days_since_last_use DESC NULLS FIRST
LIMIT 1000

Sample alert

LighthouseToday at 9:41 AMAPP

📋 Stale Tables — events_2022 (340 GB) last queried 127 days ago. orders_backup (89 GB) never queried. 8 tables flagged this week.

10 / OPTHistoricalDaily

Peak Slot Usage — Yesterday

Hourly slot usage breakdown for yesterday — identifies peak hours and compute concentration.

Why monitor

JOBS_TIMELINE_BY_PROJECT gives period-level granularity — you see slot consumption hour by hour, not just total-per-day.
Peak hour identification enables reservation right-sizing — if your peak is 3× your average, you may be over-provisioned or under-provisioned at different times.
Daily report cadence — yesterday's usage is fully settled, no partial-period noise.

SQL · BigQuery

SELECT
  TIMESTAMP_TRUNC(period_start, HOUR)           AS hour_bucket,
  ROUND(SUM(period_slot_ms) / 3600000.0, 2)    AS slot_hours,
  COUNT(DISTINCT job_id)                         AS job_count,
  ROUND(AVG(period_slot_ms) / 3600000.0, 4)    AS avg_slot_hours_per_job
FROM `region-us.INFORMATION_SCHEMA.JOBS_TIMELINE_BY_PROJECT`
WHERE period_start >= TIMESTAMP_SUB(TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY), INTERVAL 1 DAY)
  AND period_start <  TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY)
GROUP BY 1
ORDER BY slot_hours DESC
LIMIT 1000

Sample alert

LighthouseToday at 9:41 AMAPP

📊 Slot Usage Report — peak yesterday: 14:00 UTC · 2,840 slot-hours. 6 jobs ran > 500 slot-hours each. Off-peak (22:00-06:00): 12% of total.

11 / OPTReal-timeDaily

Large Bytes Scanned — Last 24h

Individual queries that processed more than 100 GB — likely full table scans or missing partition filters.

Why monitor

100 GB per query is the signal threshold — most well-written BigQuery queries scan far less; anything above that is usually a missing partition filter or an accidental SELECT *.
Returns the query text — you can see immediately whether it was a full table scan, a missing WHERE clause, or an intentional one-time export.
Estimated cost per query at on-demand rates ($6.25/TB) — makes it easy to prioritize which queries to optimize first.

SQL · BigQuery

SELECT
  job_id,
  user_email,
  ROUND(total_bytes_processed / POW(10, 12), 4)  AS bytes_processed_tb,
  ROUND(total_bytes_processed / POW(10, 12) * 6.25, 2) AS estimated_cost_usd,
  ROUND(total_slot_ms / 1000.0, 1)               AS slot_seconds,
  TIMESTAMP_DIFF(end_time, start_time, SECOND)   AS duration_seconds,
  query
FROM `region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT`
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 25 HOUR)
  AND creation_time <  TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  AND state = 'DONE'
  AND total_bytes_processed > 100 * POW(1024, 3)  -- > 100 GB processed
ORDER BY bytes_processed_tb DESC
LIMIT 1000

Sample alert

LighthouseToday at 9:41 AMAPP

⚠️ Large Scan — analyst@company.com scanned 2.1 TB in a single query (est. $13.13). Missing partition filter on events table.

Want Slack alerts when any of these fire?

Connect BigQuery to Lighthouse in 10 minutes — free, read-only, no SQL required.

Start free →

These metrics are running in Lighthouse right now

Want alerts when any
of these fire?

Connect your BigQuery project and get all 11 metrics live in under 10 minutes — free, read-only, with smart comparison out of the box.

Already have metrics in mind? Describe them in plain English — Lighthouse writes the SQL.

Start free · 10 min setup →

No credit card required · Read-only access · Cancel anytime

11 metrics. Every BigQueryproject. One setup.

2 IAM roles. Read-only. That's it.

We configure it. You get the alerts.

Connect BigQuery

Deploy the kit

Get Slack alerts

Cost & Billing

Slot Usage Spike — Last 24h

Bytes Billed per User Spike — Last 24h

Dataset Storage — Weekly Snapshot

Long-Term Storage Accumulation

Performance

Long-Running Jobs — Last 1h

Job Wait Time Anomaly — Last 1h

Failures

Job Failure Rate Spike — Last 1h

Failed Jobs — Last 1h

Optimization

Stale Tables — 90-Day No Query

Peak Slot Usage — Yesterday

Large Bytes Scanned — Last 24h

Want alerts when anyof these fire?

11 metrics. Every BigQuery
project. One setup.

Want alerts when any
of these fire?