
A SOC at home (in a homelab) is not alert theater: it must give you signal when it matters, without noise.

This post documents how I run a proactive SOC agent in my homelab, built on four principles:

  • Real detection: problematic pods in Kubernetes.
  • SIEM signal: alerts with level ≥ 7 from Wazuh (via a proxy inside the cluster).
  • Attack surface: CRITICAL and HIGH CVEs only on exposed images.
  • Controlled output: Telegram only when there are findings or signal degradation, plus deduplication to avoid repeating the same issue in a loop.

The whole design lives under soc/ and acts as the repository’s source of truth.

This is not another dashboard. It is an operational loop:

cluster → SIEM → runtime → channel → action

Each iteration produces a clear state: OK, degraded, or requires action.

If you already use Wazuh in your lab, this connects with what’s already on the blog. The change here is focus: fewer alerts, more actionable context.

What this adds vs the usual approach

  • Scoped scanning → only what is exposed.
  • Actionable signal → explicit severity and policy, no “deferred noise”.
  • Deduplication → the channel does not become a loop of the same problem.
  • Visible degradation → if kubectl, Trivy, or connectivity is missing, it is reported so “silence” doesn’t look like “OK”.

Mental model

  • Time-bounded: each run has limits (number of images, timeout per scan) so the host doesn’t get stuck.
  • Priority (rough): availability > SIEM signal > CVEs.
  • Single channel: Telegram as an inbox, not a continuous stream; if something arrives, it deserves triage or closure with action.

What you will build (quick view)

  • Scripts under soc/scripts/:
    • soc_proactive.sh: generates the report, writes it to soc/reports/YYYY-MM-DD/…md, and sends it to Telegram.
    • send_telegram.sh: sends messages using the bot token (in my case raam_bot) read from a Kubernetes Secret.
    • trivy_scan_image.sh: utility to scan a specific image on demand.
  • Cron in soc/cron/soc-proactive.cron: runs the check every 6 hours.
  • Optional event-driven integration with Alertmanager: soc/k8s/alertmanagerconfig-soc-inbox.yaml routes warning or critical alerts to an n8n webhook.

Important: this is not a commercial product; it’s an honest assembly on top of what you already operate. Self-hosting and Telegram mean trusting your own configuration and your channel, not being “more secure” by magic. The goal is actionable visibility with limits on time and noise.


Prerequisites (it won’t work without these)

Host running the SOC checks

The design assumes a central host (in my setup, raam) with:

  • Kubernetes or k3s accessible via kubectl.
  • Docker accessible (docker ps).
  • Trivy installed for CVEs.
  • curl for Telegram and HTTP queries.

The script validates dependencies and, if something is missing, records it as signal degradation (you won’t believe “everything is green” if the scanner isn’t running).

Telegram

  • A channel to act as the SOC inbox.
  • The bot is an admin in that channel.
  • The numeric chat_id for the channel (typical format -100…).

The posture guide, including how to obtain the chat_id without breaking the bot webhook, is in soc/posture/telegram.md inside the repo.


Repository structure (minimum)

  • soc/README.md: goal and folder map.
  • soc/runbooks/: symptom → verification → cause → action.
  • soc/posture/: exposure checklist (“don’t open things by accident”).
  • soc/vuln/: CVE policy (thresholds, cadence, what should not hit the channel).
  • soc/reports/: Markdown output per run with findings.

Step 1 — Configure Telegram destination (chat_id)

soc_proactive.sh uses a placeholder by default (SOC_CHAT_ID="-YOURCHANNEL" or similar). To change it without editing the script, export SOC_CHAT_ID in the cron environment (or adjust the default value if you prefer centralizing it in the repo).
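For example, for a one-off manual run (the chat_id below is a placeholder; use your channel’s real one):

export SOC_CHAT_ID="-100xxxxxxxxxx"
bash soc/scripts/soc_proactive.sh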

To obtain the chat_id without breaking the bot webhook, the runbook in soc/posture/telegram.md suggests forwarding a message from the channel to the bot in a private chat and reading the chat_id from the bot logs.


Step 2 — Ensure the bot token exists in Kubernetes

Sending uses send_telegram.sh, which reads the token from:

  • a Secret in the right namespace (e.g. raam/raam-bot-secret)
  • key: BOT_TOKEN

If the Secret does not exist or is not readable, sending fails and the script exits with an error: better to fail loudly than to “publish silently” to nowhere.
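A minimal sketch of creating and verifying that Secret (namespace, name, and key as in the example above; the token value is a placeholder):

kubectl -n raam create secret generic raam-bot-secret --from-literal=BOT_TOKEN="123456:replace-me"
kubectl -n raam get secret raam-bot-secret -o jsonpath='{.data.BOT_TOKEN}' | base64 -d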


Step 3 — First manual run

From the repository root (where soc/ exists):

bash soc/scripts/soc_proactive.sh

Expected behavior:

  • If there are no findings, it prints nothing and exits with code 0.
  • If there are findings, it generates a file under soc/reports/YYYY-MM-DD/soc-proactive-<timestamp>.md and sends a Telegram message.

What the script checks

  • Kubernetes: lists pods in undesired states (Pending, CrashLoopBackOff, ImagePullBackOff, ErrImagePull, Error, Failed).
  • Wazuh: queries the wazuh-alerts-proxy service in the monitoring namespace and collects up to 5 recent alerts with level 7 or higher.
  • CVEs (Trivy):
    • Takes images from docker ps that have ports published to 0.0.0.0 or :: (real exposure to the outside).
    • Prioritizes “high value” images (n8n, Wazuh, AdGuard, web services, etc.), according to the prioritization logic in the script.
    • Scans only CRITICAL and HIGH, with --ignore-unfixed by default.
    • Limits number of images and time per image so the host doesn’t get stuck in a scanning loop.
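A rough shell approximation of those checks (illustrative commands, not the script itself; the image, thresholds, and timeout are examples):

# Pods in undesired states
kubectl get pods -A --no-headers | grep -E 'Pending|CrashLoopBackOff|ImagePullBackOff|ErrImagePull|Error|Failed'

# Containers with ports published to 0.0.0.0 or ::
docker ps --format '{{.Names}}\t{{.Image}}\t{{.Ports}}' | grep -E '0\.0\.0\.0|:::'

# Scoped scan of a single exposed image
trivy image --severity CRITICAL,HIGH --ignore-unfixed --timeout 120s docker.n8n.io/n8nio/n8n:2.16.0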

This aligns with the idea of prioritizing patches when Wazuh flags CVEs: the Telegram channel is not the place to review every MEDIUM from an internal database, but the place for what justifies action or awareness on your exposed perimeter.


Step 4 — Cron on the host (every 6 hours)

On the host that runs the scripts, as root, install the repo’s cron file (adjust the path to your clone):

cp /path/to/your/repo/soc/cron/soc-proactive.cron /etc/cron.d/soc-proactive
chmod 644 /etc/cron.d/soc-proactive

The cron runs bash soc/scripts/soc_proactive.sh and may write traces to soc/.soc_proactive.log (depending on the content of soc/cron/soc-proactive.cron).
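If you want to see what such a cron file can contain, a hypothetical version (the repo’s actual file may differ; adjust the path to your clone):

SHELL=/bin/bash
0 */6 * * * root cd /path/to/your/repo && bash soc/scripts/soc_proactive.sh >> soc/.soc_proactive.log 2>&1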


Step 5 — Tuning noise (CVE policy)

The detailed policy is in soc/vuln/POLITICA-CVE.md. Intent summary:

  • Treat CRITICAL seriously when the service is exposed or high value.
  • Treat HIGH when context (exposure or value) makes it plausible that it should hit the channel.
  • Keep MED/LOW for periodic review, not for daily Telegram spam.

Useful variables without touching code (example values):

  • SOC_TRIVY_SEVERITY: default CRITICAL,HIGH.
  • SOC_TRIVY_IGNORE_UNFIXED: default true (less noise when there is no fix).
  • SOC_TRIVY_MAX_IMAGES: limits how many images are scanned.
  • SOC_TRIVY_IMAGE_TIMEOUT_SECONDS: prevents one image from blocking the whole batch.
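For example, a tighter run without touching the script (values here are illustrative):

SOC_TRIVY_MAX_IMAGES=5 SOC_TRIVY_IMAGE_TIMEOUT_SECONDS=120 bash soc/scripts/soc_proactive.sh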

Step 6 (optional) — Alertmanager → n8n

Besides the cron check, you can route cluster alerts into the SOC inbox. The repo includes an AlertmanagerConfig:

  • File: soc/k8s/alertmanagerconfig-soc-inbox.yaml
  • Routes severity=warning or critical to a webhook receiver.
  • Usually ignores Watchdog (an “I’m alive” alert that adds no value to a human SOC inbox).
  • In my homelab it points to n8n on the host (not in Kubernetes), e.g. http://10.5.5.3:5678/webhook/soc-alertmanager, reachable from the monitoring namespace.
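A minimal sketch of what that manifest could look like (field names follow the monitoring.coreos.com/v1alpha1 AlertmanagerConfig CRD; the repo’s actual file may differ):

kubectl apply -f - <<'EOF'
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: soc-inbox
  namespace: monitoring
spec:
  route:
    receiver: soc-n8n
    matchers:
      - name: severity
        matchType: "=~"
        value: "warning|critical"
    routes:
      # Drop the Watchdog heartbeat before it reaches the inbox
      - receiver: "null"
        matchers:
          - name: alertname
            matchType: "="
            value: Watchdog
  receivers:
    - name: soc-n8n
      webhookConfigs:
        - url: http://10.5.5.3:5678/webhook/soc-alertmanager
    - name: "null"
EOF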

Networking caveat: if n8n runs in Docker on the host, IP and firewall must allow traffic from the Alertmanager pod to that URL. Details in soc/k8s/README.md in the repo.


How to read a report

Example filename: soc/reports/2026-04-26/soc-proactive-20260426T100210Z.md.

  • Kubernetes block: if it’s not “OK”, prioritize availability/stability before abstract CVEs.
  • Wazuh block: if you see lines like [L7] (or whatever level you use) with host and rule, it’s SIEM signal that deserves triage, not automatic panic.
  • CVEs (exposed Docker): if a Trivy table exists, prioritize by exposed service, severity (CRITICAL first), and whether a fix exists (update image/tag and redeploy).
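When a fix exists, remediation is usually a tag bump plus redeploy. For a container managed with docker compose, a hypothetical example (the service name is a placeholder; adapt to how you actually run the service):

# after bumping the image tag in your compose file
docker compose pull n8n
docker compose up -d n8n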

Runbooks: don’t improvise when something triggers

  • soc/runbooks/kubejobfailed.md: failed jobs.
  • soc/runbooks/docker-hub-mirror.md: ImagePullBackOff due to limits or blocks to Docker Hub.
  • soc/runbooks/wazuh-jwt-renewer.md: expired JWT or dashboards without data.
  • soc/runbooks/php-apache-ataque-web.md: typical web abuse signals.
  • soc/runbooks/n8n-exposicion-cve.md: focus on n8n (high value in many homelabs).

Extensions once it’s stable

  • Keep soc/inventory/ and soc/inventory/exposure_map.md up to date.
  • Use soc/posture/exposure-checklist.md recurrently.
  • Tune local rules in Wazuh with intent (see soc/wazuh/README.md in the repo).
  • Adjust SOC_TRIVY_IGNORE_UNFIXED or the image limit to reduce false positives without turning off the radar.

Quick commands (copy & paste)

Scan a specific image:

bash soc/scripts/trivy_scan_image.sh docker.n8n.io/n8nio/n8n:2.16.0

Test the channel manually (replace with the real chat_id):

bash soc/scripts/send_telegram.sh "-100xxxxxxxxxx" "SOC test: verification message"

Run the proactive check:

bash soc/scripts/soc_proactive.sh

Wrap-up: a proactive SOC in a homelab does not replace a 24/7 team, but it gives you rhythm: cron, SIEM signal, scoped scanning, and an inbox with clear rules. With runbooks and controlled noise, the next incident starts with facts in Markdown, not intuition at 3 AM.