Quick Start

This guide walks you through setting up CronJob monitoring in about five minutes.

Prerequisites

CronJob Guardian installed in your cluster
At least one CronJob running in your cluster
kubectl configured to access that cluster

Step 1: Create an AlertChannel

First, create an alert channel to receive notifications. This example uses Slack:

slack-channel.yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: AlertChannel
metadata:
  name: team-slack
spec:
  type: slack
  slack:
    webhookSecretRef:
      name: slack-webhook
      namespace: default
      key: url

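Create the secret with your Slack webhook URL. It must live in the namespace named by webhookSecretRef (default in this example):

kubectl create secret generic slack-webhook \
  --namespace default \
  --from-literal=url='https://hooks.slack.com/services/YOUR/WEBHOOK/URL'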
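
If you prefer to manage the secret declaratively, it can also be written as a manifest. This is a minimal sketch: the filename is hypothetical, and it assumes the secret belongs in the default namespace to match the webhookSecretRef in slack-channel.yaml.

```yaml
# slack-webhook-secret.yaml (hypothetical filename)
apiVersion: v1
kind: Secret
metadata:
  name: slack-webhook
  namespace: default        # same namespace as webhookSecretRef.namespace
type: Opaque
stringData:
  # stringData takes the plain-text value; Kubernetes stores it base64-encoded
  url: https://hooks.slack.com/services/YOUR/WEBHOOK/URL
```

Applying this file with kubectl apply -f slack-webhook-secret.yaml takes the place of the kubectl create secret command.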

Apply the channel:

kubectl apply -f slack-channel.yaml

Other Alert Channels

CronJob Guardian also supports PagerDuty, generic webhooks, and email. See Alerting Configuration for details.

Step 2: Create a CronJobMonitor

Now create a monitor to watch your CronJobs:

basic-monitor.yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: production-jobs
  namespace: production
spec:
  # Watch all CronJobs in this namespace
  selector: {}

  # Dead-man's switch: alert if jobs don't run
  deadManSwitch:
    enabled: true
    autoFromSchedule:
      enabled: true           # Detect expected interval from cron schedule
      missedScheduleThreshold: 2  # Alert after 2 missed runs

  # SLA tracking
  sla:
    minSuccessRate: 95      # Alert if success rate drops below 95%
    windowDays: 7           # Over a 7-day rolling window

  # Where to send alerts
  alerting:
    channelRefs:
      - name: team-slack

Apply the monitor:

kubectl apply -f basic-monitor.yaml

Step 3: Access the Dashboard

Port-forward to access the web UI:

kubectl port-forward -n cronjob-guardian svc/cronjob-guardian 8080:8080

Open http://localhost:8080 in your browser.

Dashboard Overview

The dashboard shows:

Overview: Summary cards, CronJob health table, active alerts
CronJob Details: Per-job metrics, execution history, charts
SLA: Compliance dashboard with breach tracking
Alerts: Alert history with filtering

Step 4: Test the Setup

Verify monitoring is working by checking the CronJobMonitor status:

kubectl get cronjobmonitor production-jobs -n production -o yaml

In the output, look for the status section, which lists the discovered CronJobs and their health.

To test alerting, you can:

Manually fail a job: Create a CronJob whose container exits with a non-zero status
Use the test button: In the dashboard, go to Channels and click "Test"
Wait for natural failures: The monitor reports real failures as they occur
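
For the manual option, a throwaway CronJob that always fails might look like the sketch below. The name guardian-test-fail, the filename, and the busybox image are illustrative choices, not part of CronJob Guardian. It is placed in the production namespace so that the production-jobs monitor from Step 2, which selects every CronJob there, picks it up.

```yaml
# failing-job.yaml (hypothetical test CronJob)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: guardian-test-fail
  namespace: production      # watched by the production-jobs monitor
spec:
  schedule: "*/5 * * * *"    # run every 5 minutes
  jobTemplate:
    spec:
      backoffLimit: 0        # do not retry the failed pod
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: fail
              image: busybox:1.36
              # Print a message, then exit non-zero so the Job is marked failed
              command: ["sh", "-c", "echo 'simulated failure'; exit 1"]
```

Apply it with kubectl apply -f failing-job.yaml, and remove it afterwards with kubectl delete cronjob guardian-test-fail -n production. Whether and when an alert fires depends on the monitor's thresholds, so a single failed run may not be enough to trigger one.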

Example: Watch Critical Jobs Only

Use label selectors to watch specific CronJobs:

critical-only.yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: critical-jobs
  namespace: production
spec:
  selector:
    matchLabels:
      tier: critical

  deadManSwitch:
    enabled: true
    autoFromSchedule:
      enabled: true

  sla:
    minSuccessRate: 99.9    # Stricter SLA for critical jobs
    windowDays: 30
    maxDuration: 1h         # Alert if jobs take longer than 1 hour

  alerting:
    channelRefs:
      - name: team-slack
    severityOverrides:
      deadManTriggered: critical    # Dead-man failures are critical
      slaBreached: warning          # SLA violations are warnings
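
For this monitor to pick up a job, the CronJob needs the matching label. The manifest below is a sketch of such a job. Its name, schedule, and container are placeholders, and it assumes the selector is evaluated against the labels on the CronJob object itself rather than on its pod template.

```yaml
# nightly-backup.yaml (hypothetical CronJob)
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
  namespace: production
  labels:
    tier: critical           # matches spec.selector.matchLabels in critical-only.yaml
spec:
  schedule: "0 2 * * *"      # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: busybox:1.36
              command: ["sh", "-c", "echo 'backup placeholder'"]
```

An existing CronJob can be given the same label with kubectl label cronjob nightly-backup -n production tier=critical.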

Example: Multi-Namespace Watch

A single monitor can watch CronJobs across several namespaces by listing them under namespaces:

multi-namespace.yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: all-production
  namespace: cronjob-guardian
spec:
  # Watch these namespaces
  namespaces:
    - production
    - staging
    - batch-jobs

  selector:
    matchLabels:
      monitored: "true"

  deadManSwitch:
    enabled: true
    autoFromSchedule:
      enabled: true

  alerting:
    channelRefs:
      - name: team-slack
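
On the CronJob side, the label must be set in each watched namespace. The fragment below shows only the metadata of a hypothetical CronJob named report-export. The value is quoted because Kubernetes label values must be strings, and an unquoted true would be parsed as a YAML boolean and rejected.

```yaml
# Fragment of a CronJob manifest in one of the watched namespaces
metadata:
  name: report-export        # hypothetical name
  namespace: staging
  labels:
    monitored: "true"        # quoted so YAML treats it as a string, not a boolean
```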

Next Steps

Features - Learn about all monitoring features
CronJob Selectors - Advanced selection patterns
Alert Configuration - Customize alert behavior
Examples - More monitor configurations
