ZFS snapshots, IAM misuses, pg_repack, sidecars, kernel params, eBPF probes — because someone has to.
To build resilient platforms with real constraints
▍What we fix that others break
> GitOps that actually merges
> Terraform with DRY, not 3000 lines of variables
> CI/CD pipelines with sanity checks and rollbacks
> Kafka retries that don’t silently drop
> PostgreSQL that survives autovacuum hell
> Kernel params that matter: `dirty_ratio`, `swappiness`, `fs.inotify.max_user_watches`
> Cron replacement with failure alerts
> IAM policies that don’t blow up staging at 3 AM
▍Build Pattern
> AWS with Nitro, not EC2 defaults
> Kubernetes with sane CNI and predictable upgrades
> Observability with real SLOs and distributed tracing
> Go for tools, C++ for perf, Python for glue
> PostgreSQL tuned per workload — not per tutorial
> NixOS for immutability and reproducibility
> Infrastructure where root cause actually debugs
> Security with just enough policy — not just audit checkboxes
▍Infra Design Principles
> Design for rollback
> Never debug blind
> Metrics are for humans, logs are for grep
> No “eventually consistent” without a deadline
> More layers = more tracing
> Immutable where it matters, stateful where it must
> If it can’t be tested, it will be tested in prod
> Don’t trust managed services without knowing their failure mode