Snowflake Monitoring Kit · v2

11 metrics. Every Snowflake account. One setup.

Every Snowflake account exposes the same INFORMATION_SCHEMA table functions and ACCOUNT_USAGE views. These 11 queries turn them into live alerts with smart comparison — each metric checks against a 30-day rolling baseline, so you only get paged when something is genuinely wrong.

Start monitoring free →
11 · Production-ready metrics
4 · Categories covered
30-day · Smart comparison baseline
10 min · From connect to first alert
Snowflake Kit · 11 metrics · Live
Metric | Category | Schedule
Warehouse Credit Spike — Last 24h | Cost | Hourly
Serverless Credit Spike — Last 24h | Cost | 12h
Database Storage Spike — Last 7d | Cost | Daily
Failsafe Storage Spike — Last 7d | Cost | Daily
Long-Running Queries — Last 1h | Performance | Hourly
Query Queue Anomaly — Last 1h | Performance | Hourly
Remote Disk Spill Anomaly — Yesterday | Performance | Daily
ETL Failure Rate Spike — Last 1h | Failures | Hourly
Failed Task — Last 1h | Failures | Hourly
Stale Tables — 90-Day No Query | Optimization | Weekly
Warehouse Utilization — Yesterday | Optimization | Daily

Permissions

3 read-only grants. That's it.

Lighthouse only reads metadata — query history, metering stats, storage usage. It cannot access your actual table data, run DML, create objects, or modify anything.

ACCOUNT_USAGE Access

GRANT IMPORTED PRIVILEGES ON DATABASE SNOWFLAKE TO ROLE <your_role>;
  • Storage history
  • Serverless costs
  • Disk spill data
  • Access history
  • Warehouse load history
  • Historical query data
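
To verify the grant took effect, a quick smoke test run as the granted role — any row back means ACCOUNT_USAGE is readable (a minimal check, not part of the kit):

USE ROLE <your_role>;
-- QUERY_HISTORY is one of the views this grant unlocks.
SELECT query_id, start_time
FROM snowflake.account_usage.query_history
LIMIT 1;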

Real-Time Warehouse Access

Run once per warehouse you want to monitor.

GRANT MONITOR ON WAREHOUSE <warehouse_name> TO ROLE <your_role>;
  • INFORMATION_SCHEMA.query_history
  • INFORMATION_SCHEMA.warehouse_metering_history
  • INFORMATION_SCHEMA.task_history
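
With many warehouses, one way to avoid typing the grant repeatedly is to generate the statements from SHOW WAREHOUSES and run the output (a sketch, assuming your role can see all warehouses):

-- Generate one GRANT per warehouse; copy the output and execute it.
SHOW WAREHOUSES;
SELECT 'GRANT MONITOR ON WAREHOUSE ' || "name" || ' TO ROLE <your_role>;' AS grant_stmt
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));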

Account-Level Monitoring

GRANT MONITOR USAGE ON ACCOUNT TO ROLE <your_role>;
  • warehouse_load_history (utilization metric)

Read-only, always → All three grants are read-only metadata privileges. Lighthouse cannot write data, drop tables, create warehouses, or read the contents of your tables — only operational metadata such as query durations, credit consumption, and storage sizes.

How it works

We configure it. You get the alerts.

01

Connect Snowflake

Read-only credentials, 3 clicks. Lighthouse reads your schema — never your raw data. Takes under 5 minutes.

02

Deploy the kit

All 11 monitoring metrics go live in your Lighthouse workspace — pre-configured, pre-tuned, with 30-day smart comparison baselines. No SQL required.

03

Get Slack alerts

When cost spikes, queries fail, or warehouses misbehave, you get a Slack message with value, baseline, and context — before anyone has to ask.

Time windows explained → Real-time metrics use INFORMATION_SCHEMA with a -2h to -1h window to absorb Snowflake's 30–45 min propagation lag. Historical metrics use ACCOUNT_USAGE with a -30h to -6h window so data is fully populated. No partial reads, no false drops.
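
As a concrete illustration of the two window shapes (a sketch — the per-metric queries below are the authoritative versions):

-- Real-time pattern: INFORMATION_SCHEMA, shifted back 1h so the
-- 30–45 min propagation lag never clips the read.
SELECT COUNT(*) AS recent_queries
FROM TABLE(information_schema.query_history(
    dateadd('hours', -2, current_timestamp()),
    dateadd('hours', -1, current_timestamp())
));

-- Historical pattern: ACCOUNT_USAGE, window ends 6h ago so view
-- latency is fully absorbed before the metric reads it.
SELECT COUNT(*) AS historical_queries
FROM snowflake.account_usage.query_history
WHERE start_time >= dateadd('hours', -30, current_timestamp())
  AND start_time <  dateadd('hours', -6,  current_timestamp());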

💰 Cost & Credits

4 metrics
01 / COST · Real-time · Hourly

Warehouse Credit Spike — Last 24h

Warehouse spend exceeds the 30-day daily max by more than 30%.

Why monitor

  • Compares today's credits per warehouse against a 30-day rolling baseline — alerts only when spend is genuinely anomalous, not just on normally busy days.
  • Uses INFORMATION_SCHEMA for real-time data — no 45-minute ACCOUNT_USAGE lag, catches runaway warehouses within the hour.
  • Per-warehouse segmentation isolates which workload (ETL, BI, ad-hoc) is responsible for the spike.
SQL · Snowflake
SELECT * FROM (
  WITH
    -- Current value: per-warehouse credits over the last 24h,
    -- lagged 1h so metering data is fully propagated.
    "main_metric" AS (
      SELECT
        warehouse_name,
        SUM(credits_used)                AS "METRIC_VALUE",
        SUM(credits_used_compute)        AS compute_credits,
        SUM(credits_used_cloud_services) AS cloud_credits
      FROM TABLE(information_schema.warehouse_metering_history(
        dateadd('hours', -25, current_timestamp()),
        dateadd('hours', -1,  current_timestamp())
      ))
      GROUP BY warehouse_name
    ),

    -- Baseline: thirty rolling 24h windows covering the prior 30 days.
    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('day', -SEQ4() - 1, current_timestamp())            AS "WINDOW_END",
        DATEADD('hours', -24, DATEADD('day', -SEQ4() - 1,
          current_timestamp()))                                          AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 30))
    ),

    -- Credits per warehouse within each baseline window.
    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.warehouse_name,
        t."WINDOW_END",
        SUM(h.credits_used) AS "METRIC_VALUE"
      FROM TABLE(information_schema.warehouse_metering_history(
        dateadd('day', -31, current_timestamp()),
        dateadd('hours', -1,  current_timestamp())
      )) h
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START" <= h.start_time
       AND t."WINDOW_END"   >= h.start_time
      GROUP BY h.warehouse_name, t."WINDOW_END"
    ),

    -- 30-day max per warehouse, plus the +30% alert threshold.
    "Compared Trend 1" AS (
      SELECT
        warehouse_name,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY warehouse_name
    )

  -- Flag warehouses whose last-24h spend beat the threshold.
  SELECT
    m.warehouse_name,
    m."METRIC_VALUE",
    m.compute_credits,
    m.cloud_credits,
    c."COMPARED_VALUE",
    c."COMPARED_VALUE_THRESHOLD",
    CASE WHEN m."METRIC_VALUE" > c."COMPARED_VALUE_THRESHOLD"
      THEN 'SPIKE' ELSE 'Normal'
    END AS status
  FROM "main_metric" m
  LEFT JOIN "Compared Trend 1" c ON m.warehouse_name = c.warehouse_name
  ORDER BY m."METRIC_VALUE" DESC
) AS subquery
LIMIT 1000

Sample alert

Lighthouse APP · Today at 9:41 AM

⚠️ Warehouse Credit Spike — COMPUTE_WH used 12.4 credits in the last 24h. 30-day max: 3.1 credits · 4× above baseline.

02 / COST · Historical · 12h

Serverless Credit Spike — Last 24h

Serverless task spend exceeds the 30-day daily max by more than 30%. Auto-clustering and materialized-view refresh credits live in separate ACCOUNT_USAGE views — see the extension sketch after the query.

Why monitor

  • Serverless costs don't appear under any warehouse — they accumulate silently and show up as a surprise line on the bill.
  • Auto-clustering and MV refreshes can balloon 10× overnight when an upstream table churns unexpectedly.
  • Compares against 30-day daily baseline — only fires when serverless spend is genuinely out of pattern.
SQL · Snowflake
SELECT * FROM (
  WITH
    "main_metric" AS (
      SELECT SUM(credits_used) AS "METRIC_VALUE"
      FROM snowflake.account_usage.serverless_task_history
      WHERE start_time >= dateadd('hours', -30, current_timestamp())
        AND start_time <= dateadd('hours', -6,  current_timestamp())
    ),
    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('day', -SEQ4() - 1, dateadd('hours', -6, current_timestamp())) AS "WINDOW_END",
        DATEADD('hours', -24, DATEADD('day', -SEQ4() - 1,
          dateadd('hours', -6, current_timestamp())))                          AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 30))
    ),
    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        t."WINDOW_END",
        SUM(credits_used) AS "METRIC_VALUE"
      FROM snowflake.account_usage.serverless_task_history
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START" <= start_time AND t."WINDOW_END" >= start_time
      GROUP BY t."WINDOW_END"
    ),
    "Compared Trend 1" AS (
      SELECT
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
    )
  SELECT
    m."METRIC_VALUE",
    c."COMPARED_VALUE",
    c."COMPARED_VALUE_THRESHOLD"
  FROM "main_metric" m
  LEFT JOIN "Compared Trend 1" c ON TRUE
) AS subquery
LIMIT 1000
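
The query above counts serverless task credits only. If you want auto-clustering and materialized-view refresh spend in the same number, a sketch of the main-metric CTE widened to all three ACCOUNT_USAGE views (the baseline CTEs would be extended the same way):

-- Sketch: total serverless credits across tasks, auto-clustering,
-- and MV refreshes for the same lag-adjusted 24h window.
SELECT SUM(credits_used) AS "METRIC_VALUE"
FROM (
  SELECT start_time, credits_used
  FROM snowflake.account_usage.serverless_task_history
  UNION ALL
  SELECT start_time, credits_used
  FROM snowflake.account_usage.automatic_clustering_history
  UNION ALL
  SELECT start_time, credits_used
  FROM snowflake.account_usage.materialized_view_refresh_history
) s
WHERE s.start_time >= dateadd('hours', -30, current_timestamp())
  AND s.start_time <= dateadd('hours', -6,  current_timestamp());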

Sample alert

Lighthouse APP · Today at 9:41 AM

⚠️ Serverless Credit Spike — 8.2 serverless credits used in the last 24h. 30-day max: 1.4 credits · 5.8× above baseline.

03 / COST · Historical · Daily

Database Storage Spike — Last 7d

Database storage exceeds the 30-day weekly max by more than 30%. Value in bytes.

Why monitor

  • Compares the last 7 days of storage against six prior weekly windows — catches structural growth, not just day-to-day table churn noise.
  • Per-database segmentation pinpoints which database is growing, not just the account total.
  • Storage creeps up silently — a bad pipeline can add terabytes before it shows on the bill.
SQL · Snowflake
SELECT * FROM (
  WITH
    "main_metric" AS (
      SELECT
        database_name,
        AVG(average_database_bytes) AS "METRIC_VALUE"
      FROM snowflake.account_usage.database_storage_usage_history
      WHERE usage_date >= (current_timestamp() - INTERVAL '7 days')::date
        AND usage_date <  current_date()
      GROUP BY database_name
    ),
    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('week', -SEQ4() - 1, current_timestamp())              AS "WINDOW_END",
        DATEADD('days', -7, DATEADD('week', -SEQ4() - 1,
          current_timestamp()))                                            AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 6))
    ),
    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.database_name,
        t."WINDOW_END",
        AVG(h.average_database_bytes) AS "METRIC_VALUE"
      FROM snowflake.account_usage.database_storage_usage_history h
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START"::date <= h.usage_date
       AND t."WINDOW_END"::date   >= h.usage_date
      GROUP BY h.database_name, t."WINDOW_END"
    ),
    "Compared Trend 1" AS (
      SELECT
        database_name,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY database_name
    )
  SELECT
    m.database_name,
    m."METRIC_VALUE",
    c."COMPARED_VALUE",
    c."COMPARED_VALUE_THRESHOLD"
  FROM "main_metric" m
  LEFT JOIN "Compared Trend 1" c ON m.database_name = c.database_name
) AS subquery
LIMIT 1000

Sample alert

Lighthouse APP · Today at 9:41 AM

⚠️ Database Storage Spike — ANALYTICS_DB averaged 18.2 GB over the last 7 days. 30-day weekly max: 3.1 GB · 5.9× above baseline.

04 / COST · Historical · Daily

Failsafe Storage Spike — Last 7d

Failsafe storage exceeds the 30-day weekly max by more than 30%. Value in bytes.

Why monitor

  • Failsafe is mandatory and fully billable — you can't turn it off, so fast growth means real money with no way to stop it mid-flight.
  • Rapid failsafe growth = high table churn — usually a pipeline recreating large tables on every run instead of doing incremental updates.
  • Same comparison pattern as regular storage so you get consistent, noise-free alerting for both signals.
SQL · Snowflake
SELECT * FROM (
  WITH
    "main_metric" AS (
      SELECT
        database_name,
        AVG(average_failsafe_bytes) AS "METRIC_VALUE"
      FROM snowflake.account_usage.database_storage_usage_history
      WHERE usage_date >= (current_timestamp() - INTERVAL '7 days')::date
        AND usage_date <  current_date()
      GROUP BY database_name
    ),
    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('week', -SEQ4() - 1, current_timestamp())              AS "WINDOW_END",
        DATEADD('days', -7, DATEADD('week', -SEQ4() - 1,
          current_timestamp()))                                            AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 6))
    ),
    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.database_name,
        t."WINDOW_END",
        AVG(h.average_failsafe_bytes) AS "METRIC_VALUE"
      FROM snowflake.account_usage.database_storage_usage_history h
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START"::date <= h.usage_date
       AND t."WINDOW_END"::date   >= h.usage_date
      GROUP BY h.database_name, t."WINDOW_END"
    ),
    "Compared Trend 1" AS (
      SELECT
        database_name,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY database_name
    )
  SELECT
    m.database_name,
    m."METRIC_VALUE",
    c."COMPARED_VALUE",
    c."COMPARED_VALUE_THRESHOLD"
  FROM "main_metric" m
  LEFT JOIN "Compared Trend 1" c ON m.database_name = c.database_name
) AS subquery
LIMIT 1000

Sample alert

Lighthouse APP · Today at 9:41 AM

⚠️ Failsafe Storage Spike — ANALYTICS_DB: 94 GB in failsafe. 30-day weekly max: 23 GB · Tables with frequent overwrites are driving retention costs.

Performance

3 metrics
05 / PERF · Real-time · Hourly

Long-Running Queries — Last 1h

Queries exceeding 30 minutes of execution time in the last hour.

Why monitor

  • 30 minutes is the threshold — anything above that is either a regression or a one-off heavy query worth knowing about.
  • Returns QUERY_ID per row so you can pull the full query profile from Snowflake and pinpoint the bottleneck.
  • INFORMATION_SCHEMA, no lag — alerts fire within the hour, not the next morning.
SQL · Snowflake
SELECT
    query_id,
    warehouse_name,
    user_name,
    ROUND(total_elapsed_time / 1000 / 60, 1) AS duration_minutes
FROM TABLE(information_schema.query_history(
    dateadd('hours', -2, current_timestamp()),
    dateadd('hours', -1, current_timestamp())
))
WHERE total_elapsed_time > 1800000   -- 30 minutes in ms
  AND execution_status = 'SUCCESS'
ORDER BY total_elapsed_time DESC

Sample alert

Lighthouse APP · Today at 9:41 AM

⚠️ Long-Running Query — 3 queries exceeded 30 min in the last hour. Longest: 47 min on ANALYTICS_WH by user ETL_SERVICE.

06 / PERF · Real-time · Hourly

Query Queue Anomaly — Last 1h

A query's queue time exceeds the 30-day daily max by more than 30%.

Why monitor

  • Real-time detection via INFORMATION_SCHEMA, historical baseline via ACCOUNT_USAGE — hybrid approach gives you the best of both: no lag on the alert, full 30-day history on the comparison.
  • Per-query comparison, not totals — fires when a specific query waited longer than any queue in the last 30 days for that warehouse.
  • Direct signal for right-sizing: persistent queue anomalies = scale up or split the workload.
SQL · Snowflake
SELECT * FROM (
  WITH
    "main_metric" AS (
      SELECT
        query_id, warehouse_name, user_name,
        ROUND(queued_overload_time / 1000 / 60, 2) AS "METRIC_VALUE"
      FROM TABLE(information_schema.query_history(
        dateadd('hours', -2, current_timestamp()),
        dateadd('hours', -1, current_timestamp())
      ))
      WHERE queued_overload_time > 0
    ),
    "history" AS (
      SELECT warehouse_name, start_time, queued_overload_time
      FROM snowflake.account_usage.query_history
      WHERE start_time >= dateadd('day', -31, current_timestamp())
        AND start_time <  dateadd('hours', -6, current_timestamp())
        AND queued_overload_time > 0
    ),
    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('day', -SEQ4() - 1, current_timestamp())          AS "WINDOW_END",
        DATEADD('hours', -24, DATEADD('day', -SEQ4() - 1,
          current_timestamp()))                                      AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 30))
    ),
    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.warehouse_name, t."WINDOW_END",
        MAX(ROUND(h.queued_overload_time / 1000 / 60, 2)) AS "METRIC_VALUE"
      FROM "history" h
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START" <= h.start_time AND t."WINDOW_END" >= h.start_time
      GROUP BY h.warehouse_name, t."WINDOW_END"
    ),
    "Compared Trend 1" AS (
      SELECT
        warehouse_name,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY warehouse_name
    )
  SELECT
    m.query_id, m.warehouse_name, m.user_name,
    m."METRIC_VALUE", c."COMPARED_VALUE", c."COMPARED_VALUE_THRESHOLD"
  FROM "main_metric" m
  LEFT JOIN "Compared Trend 1" c ON m.warehouse_name = c.warehouse_name
  WHERE m."METRIC_VALUE" > c."COMPARED_VALUE_THRESHOLD"
  ORDER BY m."METRIC_VALUE" DESC
) AS subquery
LIMIT 1000

Sample alert

Lighthouse APP · Today at 9:41 AM

⚠️ Query Queue Anomaly — a query on REPORTING_WH queued for 14 min in the last hour. 30-day max queue time: 2 min · 7× above baseline.

07 / PERF · Historical · Daily

Remote Disk Spill Anomaly — Yesterday

A query's remote disk spill exceeds the 30-day daily max by more than 30%.

Why monitor

  • Remote spill is 10–100× slower than in-memory — a spilling query burns proportionally more credits for the same output.
  • Uses ACCOUNT_USAGE because bytes_spilled columns don't exist in INFORMATION_SCHEMA — daily run is appropriate since spill isn't a real-time emergency.
  • Compares per-query spill against the 30-day per-warehouse max — only fires when a query is spilling more than anything seen recently.
SQL · Snowflake
SELECT * FROM (
  WITH
    "main_metric" AS (
      SELECT
        query_id, warehouse_name, user_name,
        ROUND(bytes_spilled_to_remote_storage / 1e9, 3) AS "METRIC_VALUE",
        ROUND(bytes_spilled_to_local_storage  / 1e9, 3) AS spilled_local_gb
      FROM snowflake.account_usage.query_history
      WHERE start_time >= dateadd('day', -1, current_date())
        AND start_time <  current_date()
        AND bytes_spilled_to_remote_storage > 0
    ),
    "history" AS (
      SELECT warehouse_name, start_time, bytes_spilled_to_remote_storage
      FROM snowflake.account_usage.query_history
      WHERE start_time >= dateadd('day', -31, current_date())
        AND start_time <  dateadd('day', -1, current_date())
        AND bytes_spilled_to_remote_storage > 0
    ),
    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('day', -SEQ4() - 1, current_date())              AS "WINDOW_END",
        DATEADD('day', -1, DATEADD('day', -SEQ4() - 1,
          current_date()))                                            AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 30))
    ),
    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.warehouse_name, t."WINDOW_END",
        MAX(ROUND(h.bytes_spilled_to_remote_storage / 1e9, 3)) AS "METRIC_VALUE"
      FROM "history" h
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START" <= h.start_time::date
       AND t."WINDOW_END"   >= h.start_time::date
      GROUP BY h.warehouse_name, t."WINDOW_END"
    ),
    "Compared Trend 1" AS (
      SELECT
        warehouse_name,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY warehouse_name
    )
  SELECT
    m.query_id, m.warehouse_name, m.user_name,
    m."METRIC_VALUE", m.spilled_local_gb,
    c."COMPARED_VALUE", c."COMPARED_VALUE_THRESHOLD"
  FROM "main_metric" m
  LEFT JOIN "Compared Trend 1" c ON m.warehouse_name = c.warehouse_name
  WHERE m."METRIC_VALUE" > c."COMPARED_VALUE_THRESHOLD"
  ORDER BY m."METRIC_VALUE" DESC
) AS subquery
LIMIT 1000
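
If the same warehouse shows up repeatedly, the usual first response is a one-size bump — a sketch using the warehouse from the sample alert below; whether resizing or rewriting the offending query is right depends on the workload:

-- Remote spill usually disappears once the working set fits in
-- memory plus local SSD on the larger size.
ALTER WAREHOUSE TRANSFORM_WH SET WAREHOUSE_SIZE = 'LARGE';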

Sample alert

Lighthouse APP · Today at 9:41 AM

⚠️ Remote Disk Spill — 12.4 GB spilled to remote storage yesterday. 30-day max: 0.3 GB · TRANSFORM_WH queries are undersized for this workload.

Want Slack alerts when any of these fire?

Connect Snowflake to Lighthouse in 10 minutes — free, read-only, no SQL required.

Start free →
🔴 Failures

2 metrics
08 / FAIL · Real-time · Hourly

ETL Failure Rate Spike — Last 1h

Pipeline query failure rate exceeds the 7-day daily max by more than 30%. Filters to ETL query types only: INSERT, MERGE, COPY, UPDATE, DELETE, CTAS.

Why monitor

  • Failure rate, not count — 2 failures out of 1,000 ETL queries is fine; 2 out of 3 is a disaster. Rate is the right signal.
  • ETL types only — syntax errors from ad-hoc queries and dev work are excluded so the alert only fires on pipeline breakage.
  • Compares against 7-day baseline using ACCOUNT_USAGE for history and INFORMATION_SCHEMA for the real-time current hour.
SQL · Snowflake
SELECT * FROM (
  WITH
    "all_queries" AS (
      SELECT warehouse_name, execution_status
      FROM TABLE(information_schema.query_history(
        dateadd('hours', -2, current_timestamp()),
        dateadd('hours', -1, current_timestamp())
      ))
      WHERE query_type IN (
        'INSERT', 'MERGE', 'COPY', 'UPDATE', 'DELETE', 'CREATE_TABLE_AS_SELECT'
      )
    ),
    "main_metric" AS (
      SELECT
        warehouse_name,
        COUNT(CASE WHEN execution_status = 'FAILED' THEN 1 END) AS failed_count,
        COUNT(*)                                                     AS total_count,
        ROUND(COUNT(CASE WHEN execution_status = 'FAILED' THEN 1 END)
          / NULLIF(COUNT(*), 0) * 100, 2)                       AS "METRIC_VALUE"
      FROM "all_queries"
      GROUP BY warehouse_name
    ),
    "history" AS (
      SELECT warehouse_name, start_time, execution_status
      FROM snowflake.account_usage.query_history
      WHERE start_time >= dateadd('day', -7, current_date())
        AND start_time <  dateadd('hours', -6, current_timestamp())
        AND query_type IN (
          'INSERT', 'MERGE', 'COPY', 'UPDATE', 'DELETE', 'CREATE_TABLE_AS_SELECT'
        )
    ),
    "Compared Trend 1_time_windows" AS (
      SELECT
        DATEADD('day', -SEQ4() - 1, current_timestamp())          AS "WINDOW_END",
        DATEADD('hours', -24, DATEADD('day', -SEQ4() - 1,
          current_timestamp()))                                      AS "WINDOW_START"
      FROM TABLE(GENERATOR(ROWCOUNT => 7))
    ),
    "Compared Trend 1_metric_per_time_window" AS (
      SELECT
        h.warehouse_name, t."WINDOW_END",
        ROUND(COUNT(CASE WHEN h.execution_status = 'FAILED' THEN 1 END)
          / NULLIF(COUNT(*), 0) * 100, 2) AS "METRIC_VALUE"
      FROM "history" h
      JOIN "Compared Trend 1_time_windows" t
        ON t."WINDOW_START" <= h.start_time AND t."WINDOW_END" >= h.start_time
      GROUP BY h.warehouse_name, t."WINDOW_END"
    ),
    "Compared Trend 1" AS (
      SELECT
        warehouse_name,
        MAX("METRIC_VALUE")              AS "COMPARED_VALUE",
        MAX("METRIC_VALUE") * (1 + 0.3) AS "COMPARED_VALUE_THRESHOLD"
      FROM "Compared Trend 1_metric_per_time_window"
      GROUP BY warehouse_name
    )
  SELECT
    m.warehouse_name, m.failed_count, m.total_count,
    m."METRIC_VALUE", c."COMPARED_VALUE", c."COMPARED_VALUE_THRESHOLD"
  FROM "main_metric" m
  LEFT JOIN "Compared Trend 1" c ON m.warehouse_name = c.warehouse_name
  WHERE m."METRIC_VALUE" > c."COMPARED_VALUE_THRESHOLD"
  ORDER BY m."METRIC_VALUE" DESC
) AS subquery
LIMIT 1000
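
When this fires, the fastest triage is pulling the actual error messages for the failed statements in the same window (a drill-down sketch, not part of the kit):

-- Error detail for the failed ETL statements in the alert window.
SELECT query_id, warehouse_name, error_code, error_message
FROM TABLE(information_schema.query_history(
    dateadd('hours', -2, current_timestamp()),
    dateadd('hours', -1, current_timestamp())
))
WHERE execution_status = 'FAILED'
  AND query_type IN (
    'INSERT', 'MERGE', 'COPY', 'UPDATE', 'DELETE', 'CREATE_TABLE_AS_SELECT'
  )
ORDER BY start_time DESC;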

Sample alert

Lighthouse APP · Today at 9:41 AM

⚠️ ETL Failure Rate Spike — 23% failure rate in the last hour (7 of 30 ETL queries failed on ETL_WH). 7-day max: 2% · Check recent schema changes.

09 / FAIL · Real-time · Hourly

Failed Task — Last 1h

A Snowflake task failed. Downstream data may be stale.

Why monitor

  • Tasks fail silently by default — no email, no Slack, just stale data downstream until a stakeholder notices.
  • Uses INFORMATION_SCHEMA.task_history for real-time detection — no 45-minute ACCOUNT_USAGE lag, so the alert fires within the hour.
  • Returns ERROR_MESSAGE per task so you know immediately whether it's a dependency failure, a schema change, or a data issue.
SQL · Snowflake
SELECT
    name,
    database_name,
    schema_name,
    state,
    error_code,
    error_message,
    scheduled_time,
    completed_time
FROM TABLE(information_schema.task_history(
    scheduled_time_range_start => dateadd('hours', -2, current_timestamp()),
    scheduled_time_range_end   => dateadd('hours', -1, current_timestamp())
))
WHERE state = 'FAILED'
ORDER BY scheduled_time DESC

Sample alert

Lighthouse APP · Today at 9:41 AM

⚠️ Failed Task — NIGHTLY_TRANSFORM failed at 2:14 AM (scheduled 2:00 AM). Error: SQL compilation error — object 'STAGE_DB.RAW.EVENTS' does not exist.

⚙️ Optimization

2 metrics
10 / OPT · Audit · Weekly

Stale Tables — 90-Day No Query

Tables with no recorded query access in 90 days. Requires Enterprise edition (access_history).

Why monitor

  • Uses LATERAL FLATTEN on access_history to detect any read access — not just direct queries, but joins and CTEs where the table is a base object.
  • Excludes Snowflake system tables (table_catalog != 'SNOWFLAKE') so only your data shows up in results.
  • Returns one row per table with storage_gb — sorted by size so the biggest cost savings surface first.
SQL · Snowflake
WITH accessed_90d AS (
    SELECT DISTINCT
        f.value:objectId::number AS object_id
    FROM snowflake.account_usage.access_history,
        LATERAL FLATTEN(input => base_objects_accessed) f
    WHERE query_start_time >= DATEADD('days', -90, CURRENT_TIMESTAMP())
      AND f.value:objectDomain::string = 'Table'
)
SELECT
    t.table_catalog,
    t.table_schema,
    t.table_name,
    ROUND(t.active_bytes / 1073741824.0, 3) AS storage_gb
FROM snowflake.account_usage.table_storage_metrics t
WHERE t.deleted = FALSE
  AND t.table_catalog != 'SNOWFLAKE'
  AND t.table_schema NOT IN ('INFORMATION_SCHEMA')
  AND t.id NOT IN (SELECT object_id FROM accessed_90d)
  AND t.active_bytes > 0
ORDER BY storage_gb DESC
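
To turn the audit into a savings estimate, the byte counts can be priced out — a sketch assuming $23 per TB-month on-demand storage (substitute your contract rate), shown without the staleness filter for brevity:

-- Rough monthly carrying cost per table; apply the same staleness
-- filter as the audit above to limit it to unused tables.
SELECT
    t.table_catalog,
    t.table_name,
    ROUND(t.active_bytes / 1073741824.0, 3)         AS storage_gb,
    ROUND(t.active_bytes / 1099511627776.0 * 23, 2) AS est_monthly_usd
FROM snowflake.account_usage.table_storage_metrics t
WHERE t.deleted = FALSE
  AND t.active_bytes > 0
ORDER BY est_monthly_usd DESC;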

Sample alert

Lighthouse APP · Today at 9:41 AM

🔍 Stale Tables Audit — 12 tables with no access in 90+ days. Total: 142 GB. Largest: LEGACY_EVENTS (89 GB) in ANALYTICS_DB.

11 / OPT · Historical · Daily

Warehouse Utilization — Yesterday

Ratio of paid running time that was actively processing queries (0–1). Below 0.3 = idle waste. Above 0.9 = capacity risk.

Why monitor

  • Filters to windows with actual load (avg_running > 0) — WAREHOUSE_LOAD_HISTORY includes idle rows too (warehouse started but no queries). Without this filter, running_minutes inflates to a full 1,440-minute day even when the warehouse did almost nothing.
  • DATEDIFF × avg_running per 5-minute window — a 3-second query in a 5-minute window counts as 3 seconds of active time. running_minutes = loaded windows; active_minutes = weighted processing time within them.
  • Alerts in both directions: below 0.3 means the warehouse is mostly idle while billing (oversized or auto-suspend too high); above 0.9 means you may need to scale up.
SQL · Snowflake
SELECT
    warehouse_name,
    ROUND(SUM(DATEDIFF('minute', start_time, end_time)), 1)         AS running_minutes,
    ROUND(SUM(DATEDIFF('minute', start_time, end_time) * avg_running), 2) AS active_minutes,
    ROUND(
        SUM(DATEDIFF('minute', start_time, end_time) * avg_running)
        / NULLIF(SUM(DATEDIFF('minute', start_time, end_time)), 0), 3
    )                                                                     AS utilization_ratio
FROM snowflake.account_usage.warehouse_load_history
WHERE start_time >= DATEADD('day', -1, CURRENT_DATE())
  AND start_time <  CURRENT_DATE()
  AND avg_running > 0     -- exclude idle windows (warehouse started but no queries)
GROUP BY warehouse_name
ORDER BY utilization_ratio ASC

Sample alert

Lighthouse APP · Today at 9:41 AM

⚠️ Low Warehouse Utilization — TRANSFORM_WH: 0.12 utilization yesterday. Only 12% of paid runtime was active · Consider reducing warehouse size or lowering auto-suspend.

Want Slack alerts when any of these fire?

Connect Snowflake to Lighthouse in 10 minutes — free, read-only, no SQL required.

Start free →
These metrics are running in Lighthouse right now

Want alerts when any of these fire?

Connect your Snowflake account and get all 11 metrics live in under 10 minutes — free, read-only, with smart comparison out of the box.

Already have metrics in mind? Describe them in plain English — Lighthouse writes the SQL.

No credit card required · Read-only access · Cancel anytime