API Reference
Packages
guardian.illenium.net/v1alpha1
Package v1alpha1 contains API Schema definitions for the guardian v1alpha1 API group.
Resource Types
AlertChannel, CronJobMonitor
ActiveAlert
ActiveAlert represents an active alert
Appears in: CronJobStatus
| Field | Description | Default | Validation |
|---|---|---|---|
| `type` _string_ | Type of alert | | |
| `severity` _string_ | Severity of alert | | |
| `message` _string_ | Message describes the alert | | |
| `since` _Time_ | Since is when the alert became active | | |
| `lastNotified` _Time_ | LastNotified is when the alert was last sent | | |
| `exitCode` _integer_ | ExitCode from the failed container (for JobFailed alerts) | | |
| `reason` _string_ | Reason for the failure (e.g., OOMKilled, Error) | | |
| `suggestedFix` _string_ | SuggestedFix provides actionable guidance for resolving the alert | | |
ActiveJob
ActiveJob represents a currently running job
Appears in: CronJobStatus
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` _string_ | Name of the Job | | |
| `startTime` _Time_ | StartTime is when the Job started | | |
| `runningDuration` _Duration_ | RunningDuration is how long the job has been running | | |
| `podPhase` _string_ | PodPhase is the current phase of the job's pod (Pending, Running, etc.) | | |
| `podName` _string_ | PodName is the name of the pod running the job | | |
| `ready` _string_ | Ready indicates how many pods are ready vs total | | |
AlertChannel
AlertChannel is the Schema for the alertchannels API.
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `guardian.illenium.net/v1alpha1` | | |
| `kind` _string_ | `AlertChannel` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _AlertChannelSpec_ | | | |
| `status` _AlertChannelStatus_ | | | |
AlertChannelSpec
AlertChannelSpec defines the desired state of AlertChannel
Appears in: AlertChannel
| Field | Description | Default | Validation |
|---|---|---|---|
| `type` _string_ | Type of alert channel | | Enum: [slack pagerduty webhook email] |
| `slack` _SlackConfig_ | Slack configuration | | |
| `pagerduty` _PagerDutyConfig_ | PagerDuty configuration | | |
| `webhook` _WebhookConfig_ | Webhook configuration | | |
| `email` _EmailConfig_ | Email configuration | | |
| `rateLimiting` _RateLimitConfig_ | RateLimiting prevents alert storms | | |
| `testOnSave` _boolean_ | TestOnSave sends a test alert when saved (default: false) | | |
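
As an illustration, a Slack channel could be declared roughly as follows. This is a sketch assembled from the fields listed on this page, not a verified manifest: the resource name, Secret name, Secret namespace, and key are hypothetical placeholders, and `metadata.namespace` is omitted on the assumption that AlertChannel is cluster-scoped, as the ChannelRefs description indicates.

```yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: AlertChannel
metadata:
  name: team-slack              # hypothetical channel name
spec:
  type: slack                   # one of: slack, pagerduty, webhook, email
  slack:
    webhookSecretRef:           # Secret key that holds the Slack webhook URL
      name: slack-webhook       # hypothetical Secret name
      namespace: monitoring     # hypothetical Secret namespace
      key: url                  # hypothetical key inside the Secret
  rateLimiting:
    maxAlertsPerHour: 50        # documented default is 100
    burstLimit: 5               # per-minute limit; documented default is 10
  testOnSave: true              # send a test alert when the resource is saved
```

Only the block matching `type` is populated here; how the controller treats extra provider blocks is not stated in this reference.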
AlertChannelStatus
AlertChannelStatus defines the observed state of AlertChannel
Appears in: AlertChannel
| Field | Description | Default | Validation |
|---|---|---|---|
| `ready` _boolean_ | Ready indicates the channel is operational | | |
| `lastTestTime` _Time_ | LastTestTime is when the channel was last tested | | |
| `lastTestResult` _string_ | LastTestResult is the result of the last test | | Enum: [success failed] |
| `lastTestError` _string_ | LastTestError is the error from the last test | | |
| `alertsSentTotal` _integer_ | AlertsSentTotal is total alerts successfully sent via this channel | | |
| `lastAlertTime` _Time_ | LastAlertTime is when the last alert was successfully sent | | |
| `alertsFailedTotal` _integer_ | AlertsFailedTotal is total alerts that failed to send via this channel | | |
| `lastFailedTime` _Time_ | LastFailedTime is when the last alert failed to send | | |
| `lastFailedError` _string_ | LastFailedError is the error message from the last failed send | | |
| `consecutiveFailures` _integer_ | ConsecutiveFailures is the number of consecutive failed sends. Resets to 0 on a successful send. | | |
| `conditions` _Condition array_ | Conditions represent the latest observations | | |
AlertContext
AlertContext specifies what context to include in alerts
Appears in: AlertingConfig
| Field | Description | Default | Validation |
|---|---|---|---|
| `logs` _boolean_ | Logs includes pod logs (default: true) | | |
| `logLines` _integer_ | LogLines is number of log lines to include (default: 50) | | Minimum: 1, Maximum: 10000 |
| `logContainerName` _string_ | LogContainerName specifies container for logs (default: first container) | | |
| `includeInitContainerLogs` _boolean_ | IncludeInitContainerLogs includes init container logs (default: false) | | |
| `events` _boolean_ | Events includes Kubernetes events (default: true) | | |
| `podStatus` _boolean_ | PodStatus includes pod status details (default: true) | | |
| `suggestedFixes` _boolean_ | SuggestedFixes includes fix suggestions (default: true) | | |
AlertingConfig
AlertingConfig configures alerting behavior
Appears in: CronJobMonitorSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| `enabled` _boolean_ | Enabled turns on alerting (default: true) | | |
| `channelRefs` _ChannelRef array_ | ChannelRefs references cluster-scoped AlertChannel CRs | | |
| `includeContext` _AlertContext_ | IncludeContext specifies what context to include in alerts | | |
| `suppressDuplicatesFor` _Duration_ | SuppressDuplicatesFor prevents re-alerting within this window (default: 1h) | | |
| `alertDelay` _Duration_ | AlertDelay delays alert dispatch to allow transient issues to resolve. If the issue resolves (e.g., next job succeeds) before the delay expires, the alert is cancelled and never sent. Useful for flaky jobs. Example: "5m" waits 5 minutes before sending failure alerts. | | |
| `severityOverrides` _SeverityOverrides_ | SeverityOverrides customizes severity for alert types | | |
| `suggestedFixPatterns` _SuggestedFixPattern array_ | SuggestedFixPatterns defines custom fix patterns for this monitor. These are merged with built-in patterns, with custom patterns taking priority. | | |
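
A sketch of an `alerting` block inside a CronJobMonitor spec, using only the fields documented above. The channel names are hypothetical, and the `severities` value assumes the same `critical`/`warning` vocabulary that SeverityOverrides uses.

```yaml
alerting:
  enabled: true
  channelRefs:
    - name: team-slack          # hypothetical AlertChannel name
      severities: [critical]    # assumed severity value; empty list = all severities
    - name: team-email          # hypothetical; no severities, so it receives everything
  includeContext:
    logs: true
    logLines: 100               # must be within 1..10000
    events: true
  suppressDuplicatesFor: 2h     # do not re-alert within this window
  alertDelay: 5m                # alert is cancelled if the issue resolves within 5 minutes
  severityOverrides:
    jobFailed: critical
    durationRegression: warning
```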
AutoScheduleConfig
AutoScheduleConfig configures automatic schedule detection
Appears in: DeadManSwitchConfig
| Field | Description | Default | Validation |
|---|---|---|---|
| `enabled` _boolean_ | Enabled turns on auto-detection (default: false) | | |
| `buffer` _Duration_ | Buffer adds extra time to expected interval (default: 1h) | | |
| `missedScheduleThreshold` _integer_ | MissedScheduleThreshold alerts after this many missed schedules (default: 1) | | Minimum: 1 |
ChannelRef
ChannelRef references an AlertChannel CR
Appears in: AlertingConfig
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` _string_ | Name of the AlertChannel CR | | |
| `severities` _string array_ | Severities to send to this channel (empty = all) | | |
CronJobMetrics
CronJobMetrics contains SLA metrics for a CronJob
Appears in: CronJobStatus
| Field | Description | Default | Validation |
|---|---|---|---|
| `successRate` _float_ | | | |
| `totalRuns` _integer_ | | | |
| `successfulRuns` _integer_ | | | |
| `failedRuns` _integer_ | | | |
| `avgDurationSeconds` _float_ | Duration in seconds | | |
| `p50DurationSeconds` _float_ | | | |
| `p95DurationSeconds` _float_ | | | |
| `p99DurationSeconds` _float_ | | | |
CronJobMonitor
CronJobMonitor is the Schema for the cronjobmonitors API.
| Field | Description | Default | Validation |
|---|---|---|---|
| `apiVersion` _string_ | `guardian.illenium.net/v1alpha1` | | |
| `kind` _string_ | `CronJobMonitor` | | |
| `metadata` _ObjectMeta_ | Refer to Kubernetes API documentation for fields of `metadata`. | | |
| `spec` _CronJobMonitorSpec_ | | | |
| `status` _CronJobMonitorStatus_ | | | |
CronJobMonitorSpec
CronJobMonitorSpec defines the desired state of CronJobMonitor
Appears in: CronJobMonitor
| Field | Description | Default | Validation |
|---|---|---|---|
| `selector` _CronJobSelector_ | Selector specifies which CronJobs to monitor | | |
| `deadManSwitch` _DeadManSwitchConfig_ | DeadManSwitch configures dead-man's switch alerting | | |
| `sla` _SLAConfig_ | SLA configures SLA tracking and alerting | | |
| `suspendedHandling` _SuspendedHandlingConfig_ | SuspendedHandling configures behavior for suspended CronJobs | | |
| `maintenanceWindows` _MaintenanceWindow array_ | MaintenanceWindows defines scheduled maintenance periods | | |
| `alerting` _AlertingConfig_ | Alerting configures alert channels and behavior | | |
| `dataRetention` _DataRetentionConfig_ | DataRetention configures data lifecycle management | | |
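
Putting the top-level fields together, a small CronJobMonitor might look like the sketch below. The monitor name, namespace, label, and channel name are hypothetical; the manifest assumes the monitor is namespaced, which follows from the selector's reference to "the monitor's namespace".

```yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: nightly-jobs            # hypothetical monitor name
  namespace: batch              # hypothetical namespace
spec:
  selector:
    matchLabels:
      tier: critical            # hypothetical label on the monitored CronJobs
  deadManSwitch:
    enabled: true
    maxTimeSinceLastSuccess: 25h   # daily job plus a 1h buffer
  sla:
    enabled: true
    minSuccessRate: 99          # percentage, 0..100
    windowDays: 7
  alerting:
    channelRefs:
      - name: team-slack        # hypothetical AlertChannel name
```

Sections left out (`suspendedHandling`, `maintenanceWindows`, `dataRetention`) are presumed to fall back to the defaults noted in their field descriptions.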
CronJobMonitorStatus
CronJobMonitorStatus defines the observed state of CronJobMonitor
Appears in: CronJobMonitor
| Field | Description | Default | Validation |
|---|---|---|---|
| `observedGeneration` _integer_ | ObservedGeneration is the generation last processed | | |
| `phase` _string_ | Phase indicates the monitor's operational state | | Enum: [Initializing Active Degraded Error] |
| `lastReconcileTime` _Time_ | LastReconcileTime is when the controller last reconciled | | |
| `summary` _MonitorSummary_ | Summary provides aggregate counts | | |
| `cronJobs` _CronJobStatus array_ | CronJobs contains per-CronJob status | | |
| `conditions` _Condition array_ | Conditions represent the latest observations | | |
CronJobSelector
CronJobSelector specifies which CronJobs to monitor. An empty selector matches all CronJobs in the monitor's namespace.
Appears in: CronJobMonitorSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| `matchLabels` _object (keys:string, values:string)_ | MatchLabels selects CronJobs by labels | | |
| `matchExpressions` _LabelSelectorRequirement array_ | MatchExpressions selects CronJobs by label expressions | | |
| `matchNames` _string array_ | MatchNames explicitly lists CronJob names to monitor (only valid when watching a single namespace) | | |
| `namespaces` _string array_ | Namespaces explicitly lists namespaces to watch for CronJobs. If empty and `namespaceSelector` is not set, watches only the monitor's namespace. | | |
| `namespaceSelector` _LabelSelector_ | NamespaceSelector selects namespaces by labels. CronJobs in matching namespaces will be monitored. | | |
| `allNamespaces` _boolean_ | AllNamespaces watches CronJobs in all namespaces (except globally ignored ones). Takes precedence over `namespaces` and `namespaceSelector`. | | |
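
The three selector variants below illustrate the scoping rules described in the table. They are alternatives, shown as separate YAML documents, and every label, name, and namespace is a made-up placeholder. The `matchExpressions` entry assumes the standard Kubernetes LabelSelectorRequirement shape (`key`, `operator`, `values`).

```yaml
# Variant 1: explicit names; no namespaces set, so only the monitor's own namespace is watched
selector:
  matchNames: [nightly-report, weekly-cleanup]
---
# Variant 2: label expression across an explicit list of namespaces
selector:
  namespaces: [batch, etl]
  matchExpressions:
    - key: tier
      operator: In
      values: [critical, high]
---
# Variant 3: every namespace (except globally ignored ones), filtered by label
selector:
  allNamespaces: true           # takes precedence over namespaces and namespaceSelector
  matchLabels:
    monitored: "true"
```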
CronJobStatus
CronJobStatus contains status for a single CronJob
Appears in: CronJobMonitorStatus
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` _string_ | Name of the CronJob | | |
| `namespace` _string_ | Namespace of the CronJob | | |
| `status` _string_ | Status indicates health | | Enum: [healthy warning critical suspended unknown] |
| `suspended` _boolean_ | Suspended indicates if the CronJob is suspended | | |
| `lastSuccessfulTime` _Time_ | LastSuccessfulTime is when the last Job succeeded | | |
| `lastFailedTime` _Time_ | LastFailedTime is when the last Job failed | | |
| `lastRunDuration` _Duration_ | LastRunDuration is the duration of the last completed Job | | |
| `nextScheduledTime` _Time_ | NextScheduledTime is when the next Job will be created | | |
| `metrics` _CronJobMetrics_ | Metrics contains SLA metrics | | |
| `activeJobs` _ActiveJob array_ | ActiveJobs lists currently running jobs for this CronJob | | |
| `activeAlerts` _ActiveAlert array_ | ActiveAlerts lists current alerts for this CronJob | | |
DataRetentionConfig
DataRetentionConfig configures data lifecycle management for this monitor
Appears in: CronJobMonitorSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| `retentionDays` _integer_ | RetentionDays overrides global retention for this monitor's execution history. If not set, uses the global `history-retention.default-days` setting. | | Minimum: 1 |
| `onCronJobDeletion` _string_ | OnCronJobDeletion defines behavior when a monitored CronJob is deleted | | Enum: [retain purge purge-after-days] |
| `purgeAfterDays` _integer_ | PurgeAfterDays specifies how long to wait before purging data. Only used when `onCronJobDeletion` is "purge-after-days". | | Minimum: 0 |
| `onRecreation` _string_ | OnRecreation defines behavior when a CronJob is recreated (detected via UID change). "retain" keeps old history; "reset" deletes history from the old UID. | | Enum: [retain reset] |
| `storeLogs` _boolean_ | StoreLogs enables storing job logs in the database. If nil, uses the global `--storage.log-storage-enabled` setting. | | |
| `logRetentionDays` _integer_ | LogRetentionDays specifies how long to keep stored logs. If not set, uses the same value as `retentionDays`. | | Minimum: 1 |
| `maxLogSizeKB` _integer_ | MaxLogSizeKB is the maximum log size to store per execution, in KB. If not set, uses the global `--storage.max-log-size-kb` setting. | | Minimum: 1 |
| `storeEvents` _boolean_ | StoreEvents enables storing Kubernetes events in the database. If nil, uses the global `--storage.event-storage-enabled` setting. | | |
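
For illustration, a `dataRetention` block that keeps history for 30 days and purges it a week after the CronJob is deleted could be written as below. The numbers are arbitrary example values chosen to satisfy the documented minimums.

```yaml
dataRetention:
  retentionDays: 30                     # overrides the global default for this monitor
  onCronJobDeletion: purge-after-days   # one of: retain, purge, purge-after-days
  purgeAfterDays: 7                     # only consulted with purge-after-days
  onRecreation: retain                  # keep history from the old UID
  storeLogs: true                       # overrides the global log-storage setting
  logRetentionDays: 14                  # falls back to retentionDays when unset
  maxLogSizeKB: 256
  storeEvents: false
```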
DeadManSwitchConfig
DeadManSwitchConfig configures dead-man's switch behavior
Appears in: CronJobMonitorSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| `enabled` _boolean_ | Enabled turns on dead-man's switch monitoring (default: true) | | |
| `maxTimeSinceLastSuccess` _Duration_ | MaxTimeSinceLastSuccess alerts if no success within this duration. Example: "25h" for daily jobs with 1h buffer. | | |
| `autoFromSchedule` _AutoScheduleConfig_ | AutoFromSchedule auto-calculates expected interval from cron schedule | | |
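
A sketch of the auto-detection form, where the expected interval is derived from the CronJob's schedule instead of a fixed `maxTimeSinceLastSuccess`. Whether the two fields can be combined is not stated in this reference, so only one is shown; the values are examples.

```yaml
deadManSwitch:
  enabled: true
  autoFromSchedule:
    enabled: true                 # off by default
    buffer: 30m                   # extra slack on top of the schedule interval (default: 1h)
    missedScheduleThreshold: 2    # alert after two missed schedules (default: 1)
```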
EmailConfig
EmailConfig configures email notifications
Appears in: AlertChannelSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| `smtpSecretRef` _NamespacedSecretRef_ | SMTPSecretRef references Secret with host, port, username, password | | |
| `from` _string_ | From is the sender address | | |
| `to` _string array_ | To is the list of recipient addresses | | |
| `subjectTemplate` _string_ | SubjectTemplate is a Go template for subject | | |
| `bodyTemplate` _string_ | BodyTemplate is a Go template for body | | |
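
The pair of manifests below sketches an email channel together with its SMTP Secret. It assumes the Secret keys are named exactly `host`, `port`, `username`, and `password`, which is an inference from the field description rather than something this reference confirms. All names, addresses, and credentials are placeholders. The template fields are omitted because the variables available to them are not listed here.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: smtp-credentials        # hypothetical Secret name
  namespace: monitoring         # hypothetical namespace
stringData:
  host: smtp.example.com        # assumed key names: host, port, username, password
  port: "587"
  username: alerts@example.com
  password: change-me
---
apiVersion: guardian.illenium.net/v1alpha1
kind: AlertChannel
metadata:
  name: team-email              # hypothetical channel name
spec:
  type: email
  email:
    smtpSecretRef:
      name: smtp-credentials
      namespace: monitoring
    from: alerts@example.com
    to:
      - oncall@example.com
```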
ExitCodeRange
ExitCodeRange defines a range of exit codes [Min, Max] inclusive
Appears in: PatternMatch
| Field | Description | Default | Validation |
|---|---|---|---|
| `min` _integer_ | | | |
| `max` _integer_ | | | |
MaintenanceWindow
MaintenanceWindow defines a scheduled maintenance period
Appears in: CronJobMonitorSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` _string_ | Name identifies this maintenance window | | |
| `schedule` _string_ | Schedule is a cron expression for when window starts | | |
| `duration` _Duration_ | Duration of the maintenance window | | |
| `timezone` _string_ | Timezone for the schedule (default: UTC) | | |
| `suppressAlerts` _boolean_ | SuppressAlerts during this window (default: true) | | |
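
As an example, a weekly two-hour window could be expressed like this. The window name is hypothetical, and the schedule assumes the standard five-field cron syntax; this reference does not say which cron dialect the controller parses.

```yaml
maintenanceWindows:
  - name: weekly-db-maintenance   # hypothetical window name
    schedule: "0 2 * * 0"         # 02:00 every Sunday, assuming five-field cron
    duration: 2h                  # window stays open for two hours after it starts
    timezone: UTC                 # the documented default
    suppressAlerts: true          # the documented default
```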
MonitorSummary
MonitorSummary provides aggregate counts
Appears in: CronJobMonitorStatus
| Field | Description | Default | Validation |
|---|---|---|---|
| `totalCronJobs` _integer_ | | | |
| `healthy` _integer_ | | | |
| `warning` _integer_ | | | |
| `critical` _integer_ | | | |
| `suspended` _integer_ | | | |
| `running` _integer_ | | | |
| `activeAlerts` _integer_ | | | |
NamespacedSecretKeyRef
NamespacedSecretKeyRef references a key in a namespaced Secret
Appears in: PagerDutyConfig, SlackConfig, WebhookConfig
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` _string_ | | | |
| `namespace` _string_ | | | |
| `key` _string_ | | | |
NamespacedSecretRef
NamespacedSecretRef references a namespaced Secret
Appears in: EmailConfig
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` _string_ | | | |
| `namespace` _string_ | | | |
PagerDutyConfig
PagerDutyConfig configures PagerDuty notifications
Appears in: AlertChannelSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| `routingKeySecretRef` _NamespacedSecretKeyRef_ | RoutingKeySecretRef references the Secret containing routing key | | |
| `severity` _string_ | Severity is the default PagerDuty severity | | Enum: [critical error warning info] |
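
A PagerDuty channel could be sketched as follows. The channel name and the Secret coordinates are hypothetical placeholders; only the field names and the `severity` values come from the table above.

```yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: AlertChannel
metadata:
  name: oncall-pagerduty          # hypothetical channel name
spec:
  type: pagerduty
  pagerduty:
    routingKeySecretRef:
      name: pagerduty-routing     # hypothetical Secret name
      namespace: monitoring       # hypothetical Secret namespace
      key: routing-key            # hypothetical key inside the Secret
    severity: error               # one of: critical, error, warning, info
```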
PatternMatch
PatternMatch defines what to match against for suggested fixes
Appears in: SuggestedFixPattern
| Field | Description | Default | Validation |
|---|---|---|---|
| `exitCode` _integer_ | ExitCode matches specific exit codes (e.g., 137 for OOM) | | |
| `exitCodeRange` _ExitCodeRange_ | ExitCodeRange matches a range [min, max] inclusive | | |
| `reason` _string_ | Reason matches container termination reason (exact match, case-insensitive) | | |
| `reasonPattern` _string_ | ReasonPattern matches reason using regex | | |
| `logPattern` _string_ | LogPattern matches log content using regex | | |
| `eventPattern` _string_ | EventPattern matches event messages using regex | | |
RateLimitConfig
RateLimitConfig configures rate limiting
Appears in: AlertChannelSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| `maxAlertsPerHour` _integer_ | MaxAlertsPerHour limits alerts per hour (default: 100) | | Minimum: 1 |
| `burstLimit` _integer_ | BurstLimit limits alerts per minute (default: 10) | | Minimum: 1 |
SLAConfig
SLAConfig configures SLA tracking
Appears in: CronJobMonitorSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| `enabled` _boolean_ | Enabled turns on SLA tracking (default: true) | | |
| `minSuccessRate` _float_ | MinSuccessRate is minimum acceptable success rate percentage (default: 95) | | Minimum: 0, Maximum: 100 |
| `windowDays` _integer_ | WindowDays is the rolling window for success rate calculation (default: 7) | | Minimum: 1 |
| `maxDuration` _Duration_ | MaxDuration alerts if job exceeds this duration | | |
| `durationRegressionThreshold` _integer_ | DurationRegressionThreshold alerts if P95 increases by this percentage (default: 50) | | Minimum: 1, Maximum: 1000 |
| `durationBaselineWindowDays` _integer_ | DurationBaselineWindowDays for baseline calculation (default: 14) | | Minimum: 1 |
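
An `sla` block that tightens the defaults might look like the sketch below. The values are examples picked to stay inside the documented bounds.

```yaml
sla:
  enabled: true
  minSuccessRate: 99                # percent over the rolling window (default: 95)
  windowDays: 14                    # rolling window for the success rate (default: 7)
  maxDuration: 30m                  # alert when a single run exceeds 30 minutes
  durationRegressionThreshold: 75   # alert when P95 grows by 75% (default: 50)
  durationBaselineWindowDays: 30    # baseline window for the P95 comparison (default: 14)
```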
SeverityOverrides
SeverityOverrides customizes alert severities. Only critical and warning are valid; alerts are actionable notifications.
Appears in: AlertingConfig
| Field | Description | Default | Validation |
|---|---|---|---|
| `missedSchedule` _string_ | | | Enum: [critical warning] |
| `jobFailed` _string_ | | | Enum: [critical warning] |
| `slaBreached` _string_ | | | Enum: [critical warning] |
| `deadManTriggered` _string_ | | | Enum: [critical warning] |
| `durationRegression` _string_ | | | Enum: [critical warning] |
SlackConfig
SlackConfig configures Slack notifications
Appears in: AlertChannelSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| `webhookSecretRef` _NamespacedSecretKeyRef_ | WebhookSecretRef references the Secret containing webhook URL | | |
| `defaultChannel` _string_ | DefaultChannel overrides webhook's default channel | | |
| `messageTemplate` _string_ | MessageTemplate is a Go template for message formatting | | |
SuggestedFixPattern
SuggestedFixPattern defines a pattern for suggesting fixes based on failure context
Appears in: AlertingConfig
| Field | Description | Default | Validation |
|---|---|---|---|
| `name` _string_ | Name identifies this pattern (for overriding built-ins like "oom-killed") | | |
| `match` _PatternMatch_ | Match criteria; at least one must be specified | | |
| `suggestion` _string_ | Suggestion is the fix text (supports Go templates). Available variables: `{{.Namespace}}`, `{{.Name}}`, `{{.ExitCode}}`, `{{.Reason}}`, `{{.JobName}}` | | |
| `priority` _integer_ | Priority determines order (higher = checked first, default: 0). Built-in patterns use priorities 1-100; use >100 to override. | | |
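
The sketch below shows two custom patterns under `alerting.suggestedFixPatterns`. The first reuses the built-in name "oom-killed" with a priority above 100 so that it overrides the built-in; the second is a new, hypothetical pattern. The suggestion texts are illustrative and use only the template variables listed above. Each pattern sets a single match criterion, since this reference does not say how multiple criteria are combined.

```yaml
suggestedFixPatterns:
  - name: oom-killed              # same name as a built-in, to override it
    match:
      exitCode: 137               # exit code the reference associates with OOM
    suggestion: "Job {{.JobName}} in {{.Namespace}} exited with code {{.ExitCode}}; consider raising the memory limit."
    priority: 150                 # above the 1-100 range used by built-ins
  - name: db-connection-refused   # hypothetical custom pattern
    match:
      logPattern: "connection refused"   # regex matched against log content
    suggestion: "Check database connectivity for {{.Namespace}}/{{.Name}}."
    priority: 110
```

The suggestion strings are quoted because an unquoted value starting with `{` would be parsed as a YAML flow mapping.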
SuspendedHandlingConfig
SuspendedHandlingConfig configures behavior for suspended CronJobs
Appears in: CronJobMonitorSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| `pauseMonitoring` _boolean_ | PauseMonitoring pauses monitoring when CronJob is suspended (default: true) | | |
| `alertIfSuspendedFor` _Duration_ | AlertIfSuspendedFor alerts if suspended longer than this duration | | |
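
For example, to pause monitoring while a CronJob is suspended but still be told when it has been left suspended for a week, something like the following could be used. The duration is written in hours on the assumption that Duration fields accept Go-style duration strings, as the "5m" and "25h" examples elsewhere on this page suggest; Go durations have no day unit.

```yaml
suspendedHandling:
  pauseMonitoring: true         # the documented default
  alertIfSuspendedFor: 168h     # 7 days, expressed in hours
```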
WebhookConfig
WebhookConfig configures generic webhook notifications
Appears in: AlertChannelSpec
| Field | Description | Default | Validation |
|---|---|---|---|
| `urlSecretRef` _NamespacedSecretKeyRef_ | URLSecretRef references the Secret containing webhook URL | | |
| `method` _string_ | Method is the HTTP method (default: POST) | | Enum: [POST PUT] |
| `headers` _object (keys:string, values:string)_ | Headers to include in requests | | |
| `payloadTemplate` _string_ | PayloadTemplate is a Go template for JSON payload | | |
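
A generic webhook channel might be declared as in this sketch. The channel name, Secret coordinates, and header are hypothetical. `payloadTemplate` is left out because the variables it can reference are not documented here.

```yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: AlertChannel
metadata:
  name: ops-webhook               # hypothetical channel name
spec:
  type: webhook
  webhook:
    urlSecretRef:
      name: ops-webhook-url       # hypothetical Secret name
      namespace: monitoring       # hypothetical Secret namespace
      key: url                    # hypothetical key inside the Secret
    method: POST                  # POST (default) or PUT
    headers:
      Content-Type: application/json   # example header
```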