ZFS snapshots, IAM misuses, pg_repack, sidecars, kernel params, eBPF probes — because someone has to.

To build resilient platforms with real constraints

▍What we fix that others break

> GitOps that actually merges

> Terraform with DRY, not 3000 lines of variables

> CI/CD pipelines with sanity checks and rollbacks

> Kafka retries that don’t silently drop

> PostgreSQL that survives autovacuum hell

> Kernel params that matter: `dirty_ratio`, `swappiness`, `fs.inotify.max_user_watches`

> Cron replacement with failure alerts

> IAM policies that don’t blow up staging at 3 AM

▍Build Pattern

> AWS with Nitro, not EC2 defaults

> Kubernetes with sane CNI and predictable upgrades

> Observability with real SLOs and distributed tracing

> Go for tools, C++ for perf, Python for glue

> PostgreSQL tuned per workload — not per tutorial

> NixOS for immutability and reproducibility

> Infrastructure where root cause actually debugs

> Security with just enough policy — not just audit checkboxes

▍Infra Design Principles

> Design for rollback

> Never debug blind

> Metrics are for humans, logs are for grep

> No “eventually consistent” without a deadline

> More layers = more tracing

> Immutable where it matters, stateful where it must

> If it can’t be tested, it will be tested in prod

> Don’t trust managed services without knowing their failure mode