
Jude Hilgendorf

Building SIEMForge: A Portable SIEM Detection Toolkit with Sigma, Sysmon, and MITRE ATT&CK

If you've ever tried to stand up detection content across more than one SIEM, you already know the pain. Sigma rules live in one repo, Sysmon config lives in another, your Wazuh custom rules are scattered across three local_rules.xml files, and MITRE mapping is an afterthought buried in a spreadsheet. I built SIEMForge to fix that.

SIEMForge is a single Python toolkit that bundles Sigma rules, a Sysmon configuration, and Wazuh custom rules — all mapped to MITRE ATT&CK — with an offline log scanner and a multi-backend rule converter. It runs from a home lab to a production SOC without changing the workflow.

Repo: github.com/TiltedLunar123/SIEMForge

Why I built it

I'm a cybersecurity student, and when I started building detections for a home lab, the tooling gap became obvious fast. Every tutorial assumed you had a Splunk license or a full Elastic stack. The actual Sigma ecosystem is great, but turning a Sigma rule into something that runs against real logs without spinning up a SIEM is friction that kills learning momentum.

I wanted three things:

  1. Write detection logic once in Sigma, deploy it anywhere
  2. Test rules offline against sample logs before pushing anything to production
  3. See MITRE ATT&CK coverage at a glance so I knew where the gaps were

SIEMForge is the tool I wished existed when I started.

What it does

Log Scanner Engine. Point it at a JSON, JSONL, syslog, or CSV log file and it runs your Sigma rules against it — no SIEM required. Human-readable alerts by default, JSON output if you want to pipe it somewhere.

Multi-Backend Rule Conversion. Convert one Sigma rule into Splunk SPL, Elasticsearch Lucene, or Kibana KQL. No vendor lock-in, no rewriting detections when you change platforms.

MITRE ATT&CK Coverage Matrix. Every bundled rule is tagged with techniques. Run one command and see exactly what you cover.

Rule Validation. Catches bad YAML, broken field-condition mappings, and malformed Sigma syntax before you deploy anything.
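As a rough illustration of the kinds of checks a validator like this performs, here is a minimal sketch. The function name, error strings, and required-key set are my own assumptions, not SIEMForge's actual API; only the Sigma field names come from the spec.

```python
# Hypothetical sketch of a Sigma rule validator -- not SIEMForge's real code.
import yaml

REQUIRED_KEYS = {"title", "logsource", "detection"}

def validate_rule(text):
    """Return a list of problems found in one Sigma rule (empty = OK)."""
    problems = []
    try:
        rule = yaml.safe_load(text)
    except yaml.YAMLError as exc:
        return [f"bad YAML: {exc}"]
    if not isinstance(rule, dict):
        return ["rule is not a mapping"]
    # Top-level keys every Sigma rule needs
    problems += [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - rule.keys())]
    # A detection block without a condition can never fire
    detection = rule.get("detection")
    if isinstance(detection, dict) and "condition" not in detection:
        problems.append("detection block has no condition")
    return problems
```

Running checks like these before deployment is cheap insurance: a rule that fails to parse in your SIEM fails silently, while a validator fails loudly.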

Sysmon + Wazuh bundled. A production-ready Sysmon config and Wazuh custom rules ship with the project, mapped to the same techniques as the Sigma rules.

Code walkthrough: the log scanner

The piece I'm proudest of is the offline log scanner. It lets you validate detection logic against raw logs without deploying anything. Here's how it works conceptually:

```shell
python -m siemforge --scan /var/log/sysmon/events.json
```

Under the hood, the scanner:

  1. Loads every Sigma rule from rules/sigma/ and parses the YAML into a rule object
  2. Opens the log file and streams records (auto-detects JSON / JSONL / syslog / CSV)
  3. For each record, walks each rule's detection block and evaluates the field-condition logic
  4. On a match, emits an alert with the rule name, MITRE technique, and the matching log line
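Steps 1 through 3 above can be sketched in a few lines. This is a simplified illustration with made-up names, not SIEMForge's actual internals; syslog parsing is omitted for brevity.

```python
# Minimal sketch of the record-streaming step (hypothetical names).
import csv
import json
from pathlib import Path

def stream_records(path):
    """Yield each log record as a dict, auto-detecting JSON / JSONL / CSV."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    stripped = text.lstrip()
    if stripped.startswith("["):          # JSON array of records
        yield from json.loads(stripped)
    elif stripped.startswith("{"):        # JSONL: one JSON object per line
        for line in stripped.splitlines():
            if line.strip():
                yield json.loads(line)
    else:                                 # fall back to CSV with a header row
        yield from csv.DictReader(text.splitlines())
```

Streaming with a generator matters here: a multi-gigabyte log file never has to fit in memory as parsed records, only the raw text does, and a line-by-line reader removes even that limit.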

The tricky part is Sigma's detection syntax — it supports wildcards, regex, contains-all, contains-any, logical AND/OR between selection groups, and negation. Getting the evaluator right is what the 138-test suite is mostly for. If you're building anything similar, test-drive it hard. Edge cases around null, case sensitivity, and list vs scalar matching will eat you alive otherwise.
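To make those edge cases concrete, here is a toy field matcher covering just the `contains` modifier, list-valued (match-any) specs, glob wildcards, and null tests. It's a sketch under my own simplifications, not the project's evaluator, which handles far more of the Sigma spec.

```python
# Toy Sigma field matcher (illustrative only -- not SIEMForge's evaluator).
import fnmatch

def field_matches(record, field_spec, expected):
    """Evaluate one `Field|modifier: value(s)` pair against a log record."""
    field, _, modifier = field_spec.partition("|")
    actual = record.get(field)
    # A list of values means "match any" (logical OR), per the Sigma spec
    values = expected if isinstance(expected, list) else [expected]
    for value in values:
        if value is None:                  # Sigma null test: field absent/null
            if actual is None:
                return True
        elif actual is None:               # field missing, non-null expected
            continue
        elif modifier == "contains":       # case-insensitive substring
            if str(value).lower() in str(actual).lower():
                return True
        elif modifier == "":               # bare field: equality or glob
            if fnmatch.fnmatch(str(actual).lower(), str(value).lower()):
                return True
    return False
```

Note how many decisions hide in even this toy version: lowering both sides for case insensitivity, treating a missing field as a non-match except against null, and short-circuiting on the first list hit. Each one is a test case waiting to happen.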

Here's what a PowerShell detection rule looks like after conversion to Splunk SPL:

```
index=windows EventCode=1 CommandLine="*powershell*"
AND (CommandLine="*-ep bypass*" OR CommandLine="*DownloadString*")
```

That SPL query came from a Sigma YAML that looks roughly like:

```yaml
detection:
  selection:
    EventID: 1
    CommandLine|contains: 'powershell'
  suspicious:
    CommandLine|contains:
      - '-ep bypass'
      - 'DownloadString'
  condition: selection and suspicious
```

One source of truth, three SIEM backends. That's the whole point.
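The core of a conversion like that is a per-backend rendering of each Sigma modifier. Here's a minimal sketch for the `contains` case, with a hypothetical function name and simplified escaping; the real converter has to handle every modifier and backend-specific quoting rules.

```python
# Sketch of per-backend rendering for `Field|contains: [...]` (hypothetical).
def contains_to_query(field, values, backend):
    """Render a contains-list as an OR clause for one SIEM backend."""
    if backend == "splunk":                        # SPL wildcard match
        terms = [f'{field}="*{v}*"' for v in values]
    elif backend == "lucene":                      # Elasticsearch wildcard
        terms = [f'{field}:*{v}*' for v in values]
    else:
        raise ValueError(f"unknown backend: {backend}")
    return "(" + " OR ".join(terms) + ")"
```

Keeping the backend differences isolated in one rendering layer is exactly what makes "write once, deploy anywhere" possible: the detection logic never changes, only the serialization.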

Install and try it

```shell
git clone https://github.com/TiltedLunar123/SIEMForge.git
cd SIEMForge
pip install pyyaml
```

Scan a log file:

```shell
python -m siemforge --scan samples/suspicious_powershell.json
```

Convert a rule:

```shell
python -m siemforge --convert splunk rules/sigma/proc_creation_suspicious_powershell.yml
```

Print MITRE coverage:

```shell
python -m siemforge --mitre rules/sigma/
```
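Conceptually, a coverage report like this is just an aggregation over rule tags. A sketch, assuming Sigma-style `attack.t1059.001` tags (the function name and tag layout are my assumptions, not the tool's internals):

```python
# Hypothetical sketch of building a coverage count from Sigma rule tags.
from collections import Counter

def coverage(rules):
    """Count rules per ATT&CK technique, from each rule's `tags` list."""
    counts = Counter()
    for rule in rules:
        for tag in rule.get("tags", []):
            # Technique tags look like `attack.t1059.001`; tactic tags
            # like `attack.execution` are skipped.
            if tag.startswith("attack.t"):
                counts[tag.removeprefix("attack.").upper()] += 1
    return counts
```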

What's next

v3.1 expanded the test suite, added Windows CI matrix testing, and fixed a Sigma spec edge case around SSH bruteforce detection. Next up: more rules covering cloud attack paths (T1078.004, T1580), a proper detection-as-code pipeline example, and maybe a web UI for the coverage matrix if there's demand.

Call to action

If you're studying blue team, running a home lab, or just want a way to prototype detections without spinning up a full SIEM — clone it, break it, open an issue. Stars help the project surface to more people who'd benefit.

Star SIEMForge on GitHub →

If you build something with it, tell me which rule you'd write first.

Top comments (2)

SimpleDrop-Free&Secure File Sharing

The offline scanner is the part that actually matters for learning. Every beginner tutorial assumes you have Splunk standing by. Nobody wants to spin up an Elastic stack just to test one detection rule. What's your approach for handling log fields that don't map cleanly between backends? That's usually where my Sigma rules fall apart.

PEACEBINFLOW

The offline log scanner is the idea that lands hardest. Not just because it's useful, but because it exposes a weird gap in how we teach defensive security. Every tutorial assumes you have a SIEM. But the people who need the most practice—students, home labbers, people trying to break into the field—are exactly the ones who don't have a Splunk license or an Elastic cluster running. So they're stuck learning Sigma syntax without ever seeing a rule fire against real data. It's like learning to cook by reading recipes but never tasting the food.

What I think this pattern does, maybe unintentionally, is make detection engineering feel more like software engineering. You write a rule. You test it locally against sample logs. You see it pass or fail. You iterate. That feedback loop is normal in every other kind of development, but in SIEM work it's weirdly rare. The default workflow is "write the rule, deploy it, wait to see if it breaks production." No wonder detection content is so brittle.

The MITRE coverage matrix being just a command-line flag is the kind of thing that seems small but actually changes behavior. When mapping is frictionless, people do it. When it's a manual spreadsheet exercise, it gets skipped. I've seen teams that only discover coverage gaps during an incident because nobody had the time to sit down and cross-reference rules against ATT&CK manually.

Makes me wonder: if the scanner can validate rules against sample logs now, how far is it from being able to generate those sample logs? A "--generate-attack" flag that spits out a synthetic log file matching a given technique, so you can test your rule without hunting for real samples. That'd close the loop entirely.