API Reference

Packages

guardian.illenium.net/v1alpha1

Package v1alpha1 contains API Schema definitions for the guardian v1alpha1 API group.

Resource Types

ActiveAlert

ActiveAlert represents an active alert

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| type | string | Type of alert |  |  |
| severity | string | Severity of alert |  |  |
| message | string | Message describes the alert |  |  |
| since | Time | Since is when the alert became active |  |  |
| lastNotified | Time | LastNotified is when the alert was last sent |  |  |
| exitCode | integer | ExitCode from the failed container (for JobFailed alerts) |  |  |
| reason | string | Reason for the failure (e.g., OOMKilled, Error) |  |  |
| suggestedFix | string | SuggestedFix provides actionable guidance for resolving the alert |  |  |

ActiveJob

ActiveJob represents a currently running job

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| name | string | Name of the Job |  |  |
| startTime | Time | StartTime is when the Job started |  |  |
| runningDuration | Duration | RunningDuration is how long the job has been running |  |  |
| podPhase | string | PodPhase is the current phase of the job's pod (Pending, Running, etc.) |  |  |
| podName | string | PodName is the name of the pod running the job |  |  |
| ready | string | Ready indicates how many pods are ready vs total |  |  |

AlertChannel

AlertChannel is the Schema for the alertchannels API.

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| apiVersion | string | guardian.illenium.net/v1alpha1 |  |  |
| kind | string | AlertChannel |  |  |
| metadata | ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. |  |  |
| spec | AlertChannelSpec |  |  |  |
| status | AlertChannelStatus |  |  |  |

AlertChannelSpec

AlertChannelSpec defines the desired state of AlertChannel

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| type | string | Type of alert channel |  | Enum: [slack pagerduty webhook email] |
| slack | SlackConfig | Slack configuration |  |  |
| pagerduty | PagerDutyConfig | PagerDuty configuration |  |  |
| webhook | WebhookConfig | Webhook configuration |  |  |
| email | EmailConfig | Email configuration |  |  |
| rateLimiting | RateLimitConfig | RateLimiting prevents alert storms |  |  |
| testOnSave | boolean | TestOnSave sends a test alert when saved (default: false) |  |  |
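
As an illustration of how these fields fit together, the following is a minimal sketch of a Slack-backed AlertChannel. The field names come from this reference; the resource name, the Secret name, namespace, and key, and the Slack channel are assumptions chosen for the example. No metadata.namespace is set because AlertChannel CRs are described elsewhere in this reference as cluster-scoped.

```yaml
# Hypothetical AlertChannel manifest. Field names follow this reference;
# all names and values below are illustrative assumptions.
apiVersion: guardian.illenium.net/v1alpha1
kind: AlertChannel
metadata:
  name: team-slack              # assumed resource name
spec:
  type: slack                   # one of: slack, pagerduty, webhook, email
  slack:
    webhookSecretRef:
      name: slack-webhook       # assumed Secret name
      namespace: monitoring     # assumed Secret namespace
      key: url                  # assumed key holding the webhook URL
    defaultChannel: "#cron-alerts"
  rateLimiting:
    maxAlertsPerHour: 100       # documented default, written out explicitly
    burstLimit: 10              # documented default, written out explicitly
  testOnSave: true              # send a test alert when the resource is saved
```

Presumably only the configuration block that matches type needs to be populated; the reference does not state how non-matching blocks are treated.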

AlertChannelStatus

AlertChannelStatus defines the observed state of AlertChannel

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| ready | boolean | Ready indicates the channel is operational |  |  |
| lastTestTime | Time | LastTestTime is when the channel was last tested |  |  |
| lastTestResult | string | LastTestResult is the result of the last test |  | Enum: [success failed] |
| lastTestError | string | LastTestError is the error from the last test |  |  |
| alertsSentTotal | integer | AlertsSentTotal is total alerts successfully sent via this channel |  |  |
| lastAlertTime | Time | LastAlertTime is when the last alert was successfully sent |  |  |
| alertsFailedTotal | integer | AlertsFailedTotal is total alerts that failed to send via this channel |  |  |
| lastFailedTime | Time | LastFailedTime is when the last alert failed to send |  |  |
| lastFailedError | string | LastFailedError is the error message from the last failed send |  |  |
| consecutiveFailures | integer | ConsecutiveFailures is the number of consecutive failed sends. Resets to 0 on successful send. |  |  |
| conditions | Condition array | Conditions represent latest observations |  |  |

AlertContext

AlertContext specifies what context to include in alerts

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| logs | boolean | Logs includes pod logs (default: true) |  |  |
| logLines | integer | LogLines is number of log lines to include (default: 50) |  | Minimum: 1; Maximum: 10000 |
| logContainerName | string | LogContainerName specifies container for logs (default: first container) |  |  |
| includeInitContainerLogs | boolean | IncludeInitContainerLogs includes init container logs (default: false) |  |  |
| events | boolean | Events includes Kubernetes events (default: true) |  |  |
| podStatus | boolean | PodStatus includes pod status details (default: true) |  |  |
| suggestedFixes | boolean | SuggestedFixes includes fix suggestions (default: true) |  |  |

AlertingConfig

AlertingConfig configures alerting behavior

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| enabled | boolean | Enabled turns on alerting (default: true) |  |  |
| channelRefs | ChannelRef array | ChannelRefs references cluster-scoped AlertChannel CRs |  |  |
| includeContext | AlertContext | IncludeContext specifies what context to include in alerts |  |  |
| suppressDuplicatesFor | Duration | SuppressDuplicatesFor prevents re-alerting within this window (default: 1h) |  |  |
| alertDelay | Duration | AlertDelay delays alert dispatch to allow transient issues to resolve. If the issue resolves (e.g., next job succeeds) before the delay expires, the alert is cancelled and never sent. Useful for flaky jobs. Example: "5m" waits 5 minutes before sending failure alerts. |  |  |
| severityOverrides | SeverityOverrides | SeverityOverrides customizes severity for alert types |  |  |
| suggestedFixPatterns | SuggestedFixPattern array | SuggestedFixPatterns defines custom fix patterns for this monitor. These are merged with built-in patterns, with custom patterns taking priority. |  |  |
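
A sketch of how an alerting block might look inside a CronJobMonitor spec, combining channel routing, context, duplicate suppression, and delayed dispatch. The channel names are hypothetical and assume AlertChannel CRs with those names exist; the severity strings assume the same critical and warning values used by SeverityOverrides.

```yaml
# Hypothetical fragment of a CronJobMonitor spec; values are illustrative.
alerting:
  enabled: true
  channelRefs:
    - name: team-slack          # assumed AlertChannel name
      severities: [critical]    # only critical alerts go to this channel
    - name: ops-email           # assumed AlertChannel name; empty severities = all
  includeContext:
    logs: true
    logLines: 100               # must be between 1 and 10000
    events: true
    suggestedFixes: true
  suppressDuplicatesFor: 1h     # do not re-alert within one hour
  alertDelay: 5m                # cancel the alert if the issue resolves within 5 minutes
  severityOverrides:
    jobFailed: critical
    durationRegression: warning
```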

AutoScheduleConfig

AutoScheduleConfig configures automatic schedule detection

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| enabled | boolean | Enabled turns on auto-detection (default: false) |  |  |
| buffer | Duration | Buffer adds extra time to expected interval (default: 1h) |  |  |
| missedScheduleThreshold | integer | MissedScheduleThreshold alerts after this many missed schedules (default: 1) |  | Minimum: 1 |

ChannelRef

ChannelRef references an AlertChannel CR

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| name | string | Name of the AlertChannel CR |  |  |
| severities | string array | Severities to send to this channel (empty = all) |  |  |

CronJobMetrics

CronJobMetrics contains SLA metrics for a CronJob

Appears in:

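| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| successRate | float |  |  |  |
| totalRuns | integer |  |  |  |
| successfulRuns | integer |  |  |  |
| failedRuns | integer |  |  |  |
| avgDurationSeconds | float | Duration in seconds |  |  |
| p50DurationSeconds | float |  |  |  |
| p95DurationSeconds | float |  |  |  |
| p99DurationSeconds | float |  |  |  |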
CronJobMonitor

CronJobMonitor is the Schema for the cronjobmonitors API.

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| apiVersion | string | guardian.illenium.net/v1alpha1 |  |  |
| kind | string | CronJobMonitor |  |  |
| metadata | ObjectMeta | Refer to Kubernetes API documentation for fields of metadata. |  |  |
| spec | CronJobMonitorSpec |  |  |  |
| status | CronJobMonitorStatus |  |  |  |

CronJobMonitorSpec

CronJobMonitorSpec defines the desired state of CronJobMonitor

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| selector | CronJobSelector | Selector specifies which CronJobs to monitor |  |  |
| deadManSwitch | DeadManSwitchConfig | DeadManSwitch configures dead-man's switch alerting |  |  |
| sla | SLAConfig | SLA configures SLA tracking and alerting |  |  |
| suspendedHandling | SuspendedHandlingConfig | SuspendedHandling configures behavior for suspended CronJobs |  |  |
| maintenanceWindows | MaintenanceWindow array | MaintenanceWindows defines scheduled maintenance periods |  |  |
| alerting | AlertingConfig | Alerting configures alert channels and behavior |  |  |
| dataRetention | DataRetentionConfig | DataRetention configures data lifecycle management |  |  |
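
For orientation, a minimal sketch of a complete CronJobMonitor that uses a few of these sections. The monitor name, namespace, label, and channel name are assumptions for the example; the remaining values are taken from defaults and examples in this reference.

```yaml
# Hypothetical CronJobMonitor manifest; names and labels are illustrative.
apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: nightly-jobs            # assumed monitor name
  namespace: batch              # assumed namespace containing the CronJobs
spec:
  selector:
    matchLabels:
      tier: nightly             # assumed label on the monitored CronJobs
  deadManSwitch:
    enabled: true
    maxTimeSinceLastSuccess: 25h  # daily job plus a 1h buffer
  sla:
    enabled: true
    minSuccessRate: 95
    windowDays: 7
  alerting:
    channelRefs:
      - name: team-slack        # assumed AlertChannel name
```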

CronJobMonitorStatus

CronJobMonitorStatus defines the observed state of CronJobMonitor

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| observedGeneration | integer | ObservedGeneration is the generation last processed |  |  |
| phase | string | Phase indicates the monitor's operational state |  | Enum: [Initializing Active Degraded Error] |
| lastReconcileTime | Time | LastReconcileTime is when the controller last reconciled |  |  |
| summary | MonitorSummary | Summary provides aggregate counts |  |  |
| cronJobs | CronJobStatus array | CronJobs contains per-CronJob status |  |  |
| conditions | Condition array | Conditions represent the latest observations |  |  |

CronJobSelector

CronJobSelector specifies which CronJobs to monitor. An empty selector matches all CronJobs in the monitor's namespace.

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| matchLabels | object (keys:string, values:string) | MatchLabels selects CronJobs by labels |  |  |
| matchExpressions | LabelSelectorRequirement array | MatchExpressions selects CronJobs by label expressions |  |  |
| matchNames | string array | MatchNames explicitly lists CronJob names to monitor (only valid when watching a single namespace) |  |  |
| namespaces | string array | Namespaces explicitly lists namespaces to watch for CronJobs. If empty and namespaceSelector is not set, watches only the monitor's namespace. |  |  |
| namespaceSelector | LabelSelector | NamespaceSelector selects namespaces by labels. CronJobs in matching namespaces will be monitored. |  |  |
| allNamespaces | boolean | AllNamespaces watches CronJobs in all namespaces (except globally ignored ones). Takes precedence over namespaces and namespaceSelector. |  |  |
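
The scoping rules above can be sketched as three alternative selector fragments. The label keys, label values, and namespace names are assumptions; matchExpressions uses the standard Kubernetes LabelSelectorRequirement shape (key, operator, values).

```yaml
# Variant A: label match, scoped to the monitor's own namespace
# (no namespaces, namespaceSelector, or allNamespaces set).
selector:
  matchLabels:
    app: reports                # assumed label
---
# Variant B: explicit namespaces combined with a label expression.
selector:
  namespaces: [batch, etl]      # assumed namespace names
  matchExpressions:
    - key: tier                 # assumed label key
      operator: In
      values: [critical, standard]
---
# Variant C: every namespace except globally ignored ones.
# allNamespaces takes precedence over namespaces and namespaceSelector.
selector:
  allNamespaces: true
```

Because matchNames is only valid when a single namespace is watched, it would fit Variant A but not Variants B or C.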

CronJobStatus

CronJobStatus contains status for a single CronJob

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| name | string | Name of the CronJob |  |  |
| namespace | string | Namespace of the CronJob |  |  |
| status | string | Status indicates health |  | Enum: [healthy warning critical suspended unknown] |
| suspended | boolean | Suspended indicates if the CronJob is suspended |  |  |
| lastSuccessfulTime | Time | LastSuccessfulTime is when the last Job succeeded |  |  |
| lastFailedTime | Time | LastFailedTime is when the last Job failed |  |  |
| lastRunDuration | Duration | LastRunDuration is the duration of the last completed Job |  |  |
| nextScheduledTime | Time | NextScheduledTime is when the next Job will be created |  |  |
| metrics | CronJobMetrics | Metrics contains SLA metrics |  |  |
| activeJobs | ActiveJob array | ActiveJobs lists currently running jobs for this CronJob |  |  |
| activeAlerts | ActiveAlert array | ActiveAlerts lists current alerts for this CronJob |  |  |

DataRetentionConfig

DataRetentionConfig configures data lifecycle management for this monitor

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| retentionDays | integer | RetentionDays overrides global retention for this monitor's execution history. If not set, uses global history-retention.default-days setting. |  | Minimum: 1 |
| onCronJobDeletion | string | OnCronJobDeletion defines behavior when a monitored CronJob is deleted |  | Enum: [retain purge purge-after-days] |
| purgeAfterDays | integer | PurgeAfterDays specifies how long to wait before purging data. Only used when onCronJobDeletion is "purge-after-days". |  | Minimum: 0 |
| onRecreation | string | OnRecreation defines behavior when a CronJob is recreated (detected via UID change). "retain" keeps old history, "reset" deletes history from the old UID. |  | Enum: [retain reset] |
| storeLogs | boolean | StoreLogs enables storing job logs in the database. If nil, uses global --storage.log-storage-enabled setting. |  |  |
| logRetentionDays | integer | LogRetentionDays specifies how long to keep stored logs. If not set, uses the same value as retentionDays. |  | Minimum: 1 |
| maxLogSizeKB | integer | MaxLogSizeKB is the maximum log size to store per execution in KB. If not set, uses global --storage.max-log-size-kb setting. |  | Minimum: 1 |
| storeEvents | boolean | StoreEvents enables storing Kubernetes events in the database. If nil, uses global --storage.event-storage-enabled setting. |  |  |
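
A sketch of a dataRetention block that overrides the global settings for one monitor. All numeric values are illustrative assumptions that fall within the documented minimums.

```yaml
# Hypothetical fragment of a CronJobMonitor spec; values are illustrative.
dataRetention:
  retentionDays: 30                    # keep execution history for 30 days
  onCronJobDeletion: purge-after-days  # retain | purge | purge-after-days
  purgeAfterDays: 7                    # only used with purge-after-days
  onRecreation: reset                  # delete history tied to the old UID
  storeLogs: true                      # overrides the global log storage flag
  logRetentionDays: 14                 # falls back to retentionDays if omitted
  maxLogSizeKB: 256                    # per-execution cap on stored logs
  storeEvents: false                   # overrides the global event storage flag
```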

DeadManSwitchConfig

DeadManSwitchConfig configures dead-man's switch behavior

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| enabled | boolean | Enabled turns on dead-man's switch monitoring (default: true) |  |  |
| maxTimeSinceLastSuccess | Duration | MaxTimeSinceLastSuccess alerts if no success within this duration. Example: "25h" for daily jobs with 1h buffer. |  |  |
| autoFromSchedule | AutoScheduleConfig | AutoFromSchedule auto-calculates expected interval from cron schedule |  |  |
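
As a sketch of the automatic alternative to a hand-written maxTimeSinceLastSuccess, the fragment below enables schedule detection. For a daily CronJob the expected interval would be 24h, so a 1h buffer gives the same 25h tolerance as the manual example. The threshold value is an assumption, and how the two fields interact when both are set is not stated in this reference.

```yaml
# Hypothetical fragment of a CronJobMonitor spec; values are illustrative.
deadManSwitch:
  enabled: true
  autoFromSchedule:
    enabled: true               # derive the expected interval from the cron schedule
    buffer: 1h                  # extra time added on top of that interval
    missedScheduleThreshold: 2  # alert after two missed schedules (default: 1)
```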

EmailConfig

EmailConfig configures email notifications

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| smtpSecretRef | NamespacedSecretRef | SMTPSecretRef references Secret with host, port, username, password |  |  |
| from | string | From is the sender address |  |  |
| to | string array | To is the list of recipient addresses |  |  |
| subjectTemplate | string | SubjectTemplate is a Go template for subject |  |  |
| bodyTemplate | string | BodyTemplate is a Go template for body |  |  |
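
A minimal sketch of an email-backed AlertChannel. The resource name, Secret name and namespace, and addresses are assumptions. The templates are omitted because this reference does not list the variables available to them.

```yaml
# Hypothetical AlertChannel manifest; names and addresses are illustrative.
apiVersion: guardian.illenium.net/v1alpha1
kind: AlertChannel
metadata:
  name: ops-email               # assumed resource name
spec:
  type: email
  email:
    smtpSecretRef:
      name: smtp-credentials    # assumed Secret with host, port, username, password
      namespace: monitoring     # assumed Secret namespace
    from: guardian@example.com
    to:
      - oncall@example.com
```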

ExitCodeRange

ExitCodeRange defines a range of exit codes [Min, Max] inclusive

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| min | integer |  |  |  |
| max | integer |  |  |  |

MaintenanceWindow

MaintenanceWindow defines a scheduled maintenance period

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| name | string | Name identifies this maintenance window |  |  |
| schedule | string | Schedule is a cron expression for when window starts |  |  |
| duration | Duration | Duration of the maintenance window |  |  |
| timezone | string | Timezone for the schedule (default: UTC) |  |  |
| suppressAlerts | boolean | SuppressAlerts during this window (default: true) |  |  |
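
A sketch of a recurring maintenance window. The window name is hypothetical, and the timezone value assumes IANA zone names are accepted, which this reference does not confirm.

```yaml
# Hypothetical fragment of a CronJobMonitor spec; values are illustrative.
maintenanceWindows:
  - name: weekly-db-maintenance   # assumed window name
    schedule: "0 2 * * 0"         # cron: every Sunday at 02:00
    duration: 2h                  # window stays open for two hours
    timezone: America/New_York    # assumption: IANA zone name; default is UTC
    suppressAlerts: true
```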

MonitorSummary

MonitorSummary provides aggregate counts

Appears in:

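| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| totalCronJobs | integer |  |  |  |
| healthy | integer |  |  |  |
| warning | integer |  |  |  |
| critical | integer |  |  |  |
| suspended | integer |  |  |  |
| running | integer |  |  |  |
| activeAlerts | integer |  |  |  |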

NamespacedSecretKeyRef

NamespacedSecretKeyRef references a key in a namespaced Secret

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| name | string |  |  |  |
| namespace | string |  |  |  |
| key | string |  |  |  |

NamespacedSecretRef

NamespacedSecretRef references a namespaced Secret

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| name | string |  |  |  |
| namespace | string |  |  |  |

PagerDutyConfig

PagerDutyConfig configures PagerDuty notifications

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| routingKeySecretRef | NamespacedSecretKeyRef | RoutingKeySecretRef references the Secret containing routing key |  |  |
| severity | string | Severity is the default PagerDuty severity |  | Enum: [critical error warning info] |
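
A minimal sketch of a PagerDuty-backed AlertChannel. The resource name and the Secret name, namespace, and key are assumptions for the example.

```yaml
# Hypothetical AlertChannel manifest; names are illustrative.
apiVersion: guardian.illenium.net/v1alpha1
kind: AlertChannel
metadata:
  name: oncall-pagerduty        # assumed resource name
spec:
  type: pagerduty
  pagerduty:
    routingKeySecretRef:
      name: pagerduty-routing   # assumed Secret name
      namespace: monitoring     # assumed Secret namespace
      key: routing-key          # assumed key holding the routing key
    severity: error             # one of: critical, error, warning, info
```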

PatternMatch

PatternMatch defines what to match against for suggested fixes

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| exitCode | integer | ExitCode matches specific exit codes (e.g., 137 for OOM) |  |  |
| exitCodeRange | ExitCodeRange | ExitCodeRange matches a range [min, max] inclusive |  |  |
| reason | string | Reason matches container termination reason (exact match, case-insensitive) |  |  |
| reasonPattern | string | ReasonPattern matches reason using regex |  |  |
| logPattern | string | LogPattern matches log content using regex |  |  |
| eventPattern | string | EventPattern matches event messages using regex |  |  |

RateLimitConfig

RateLimitConfig configures rate limiting

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| maxAlertsPerHour | integer | MaxAlertsPerHour limits alerts per hour (default: 100) |  | Minimum: 1 |
| burstLimit | integer | BurstLimit limits alerts per minute (default: 10) |  | Minimum: 1 |

SLAConfig

SLAConfig configures SLA tracking

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| enabled | boolean | Enabled turns on SLA tracking (default: true) |  |  |
| minSuccessRate | float | MinSuccessRate is minimum acceptable success rate percentage (default: 95) |  | Minimum: 0; Maximum: 100 |
| windowDays | integer | WindowDays is the rolling window for success rate calculation (default: 7) |  | Minimum: 1 |
| maxDuration | Duration | MaxDuration alerts if job exceeds this duration |  |  |
| durationRegressionThreshold | integer | DurationRegressionThreshold alerts if P95 increases by this percentage (default: 50) |  | Minimum: 1; Maximum: 1000 |
| durationBaselineWindowDays | integer | DurationBaselineWindowDays for baseline calculation (default: 14) |  | Minimum: 1 |
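
A sketch of an sla block with a stricter success rate and a duration limit. The values are illustrative. If the regression threshold is read as a relative increase over the baseline P95, a baseline of 100 s with a threshold of 50 would alert once P95 exceeds 150 s; this reference does not spell out the exact formula, so treat that reading as an assumption.

```yaml
# Hypothetical fragment of a CronJobMonitor spec; values are illustrative.
sla:
  enabled: true
  minSuccessRate: 99              # percentage, between 0 and 100
  windowDays: 7                   # rolling window for the success rate
  maxDuration: 30m                # alert if a job runs longer than 30 minutes
  durationRegressionThreshold: 50 # percent increase in P95 that triggers an alert
  durationBaselineWindowDays: 14  # window used to compute the baseline
```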

SeverityOverrides

SeverityOverrides customizes alert severities. Only critical and warning are valid, since alerts are actionable notifications.

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| missedSchedule | string |  |  | Enum: [critical warning] |
| jobFailed | string |  |  | Enum: [critical warning] |
| slaBreached | string |  |  | Enum: [critical warning] |
| deadManTriggered | string |  |  | Enum: [critical warning] |
| durationRegression | string |  |  | Enum: [critical warning] |

SlackConfig

SlackConfig configures Slack notifications

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| webhookSecretRef | NamespacedSecretKeyRef | WebhookSecretRef references the Secret containing webhook URL |  |  |
| defaultChannel | string | DefaultChannel overrides webhook's default channel |  |  |
| messageTemplate | string | MessageTemplate is a Go template for message formatting |  |  |

SuggestedFixPattern

SuggestedFixPattern defines a pattern for suggesting fixes based on failure context

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| name | string | Name identifies this pattern (for overriding built-ins like "oom-killed") |  |  |
| match | PatternMatch | Match criteria - at least one must be specified |  |  |
| suggestion | string | Suggestion is the fix text (supports Go templates). Available variables: {{.Namespace}}, {{.Name}}, {{.ExitCode}}, {{.Reason}}, {{.JobName}} |  |  |
| priority | integer | Priority determines order (higher = checked first, default: 0). Built-in patterns use priorities 1-100; use >100 to override. |  |  |
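
A sketch of custom fix patterns under the alerting block. The first entry reuses the built-in name "oom-killed" with a priority above 100 so that it takes precedence; the other pattern names, the regex, the exit-code range, and the suggestion texts are assumptions. Only the template variables listed above are used.

```yaml
# Hypothetical fragment of a CronJobMonitor spec; patterns are illustrative.
alerting:
  suggestedFixPatterns:
    - name: oom-killed              # same name as a built-in, to override it
      match:
        exitCode: 137
      suggestion: "{{.JobName}} in {{.Namespace}} was OOM-killed (exit {{.ExitCode}}); consider raising its memory limit."
      priority: 150                 # above the built-in range of 1-100
    - name: db-connection-refused   # assumed custom pattern name
      match:
        logPattern: "connection refused"   # regex matched against log content
      suggestion: "Check database connectivity for {{.Namespace}}/{{.Name}}."
      priority: 110
    - name: app-specific-exit       # assumed custom pattern name
      match:
        exitCodeRange:
          min: 64                   # range is inclusive on both ends
          max: 78
      suggestion: "Exit code {{.ExitCode}} ({{.Reason}}) is in the application-defined range; check the job's own error table."
      priority: 105
```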

SuspendedHandlingConfig

SuspendedHandlingConfig configures behavior for suspended CronJobs

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| pauseMonitoring | boolean | PauseMonitoring pauses monitoring when CronJob is suspended (default: true) |  |  |
| alertIfSuspendedFor | Duration | AlertIfSuspendedFor alerts if suspended longer than this duration |  |  |
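
A sketch of a suspendedHandling block that pauses monitoring but still flags CronJobs left suspended for a week. The value 168h (7 days) assumes Go-style duration strings, consistent with the "25h" and "5m" examples elsewhere in this reference; whether a day suffix is accepted is not stated.

```yaml
# Hypothetical fragment of a CronJobMonitor spec; values are illustrative.
suspendedHandling:
  pauseMonitoring: true       # skip monitoring while the CronJob is suspended
  alertIfSuspendedFor: 168h   # alert if it stays suspended longer than 7 days
```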

WebhookConfig

WebhookConfig configures generic webhook notifications

Appears in:

| Field | Type | Description | Default | Validation |
| --- | --- | --- | --- | --- |
| urlSecretRef | NamespacedSecretKeyRef | URLSecretRef references the Secret containing webhook URL |  |  |
| method | string | Method is the HTTP method (default: POST) |  | Enum: [POST PUT] |
| headers | object (keys:string, values:string) | Headers to include in requests |  |  |
| payloadTemplate | string | PayloadTemplate is a Go template for JSON payload |  |  |
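
A minimal sketch of a generic webhook AlertChannel. The resource name, the Secret name, namespace, and key, and the header are assumptions. payloadTemplate is omitted because this reference does not list the variables available to it.

```yaml
# Hypothetical AlertChannel manifest; names are illustrative.
apiVersion: guardian.illenium.net/v1alpha1
kind: AlertChannel
metadata:
  name: generic-webhook         # assumed resource name
spec:
  type: webhook
  webhook:
    urlSecretRef:
      name: webhook-url         # assumed Secret name
      namespace: monitoring     # assumed Secret namespace
      key: url                  # assumed key holding the webhook URL
    method: POST                # POST (default) or PUT
    headers:
      Content-Type: application/json
```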