
A Day of Intentional Maintenance: Using Claude Code as a Systems Partner

AI-generated summary of a hosting review and the maintenance actions taken on the evening of June 24th.

Most AI coding sessions are about building. This one was about knowing what you have, fixing what’s broken, and making the whole thing more defensible. Here’s what happened.

Starting Point: “Quick system review. What is the drive space like today?”

No preamble, and a prompt casual almost to a fault, but the intent was clear. The goal was a snapshot: disk, CPU, memory, running processes, Docker state.

The server was at 75% disk capacity. 26 Docker containers were running. There was no swap on a 7.8GB RAM machine. One container had been silently failing its health check for five days straight with a failing streak of 12,352.
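Those two numbers can be sanity-checked against each other. The following is only a back-of-envelope sketch: it assumes the streak covers exactly five days and that every check in that window failed, neither of which the session confirmed.

```python
# Back-of-envelope check: how often would a healthcheck have to run
# to accumulate 12,352 consecutive failures in five days?
# Assumption: the streak covers exactly five days.
SECONDS_PER_DAY = 24 * 60 * 60


def implied_interval_seconds(failures: int, days: float) -> float:
    """Average seconds between checks, given a failure count over a period."""
    return days * SECONDS_PER_DAY / failures


interval = implied_interval_seconds(failures=12_352, days=5)
print(f"about {interval:.0f} seconds between checks")
```

Under those assumptions the checks were firing roughly every 35 seconds, which is a plausible healthcheck interval and suggests the streak counter was counting real, regularly scheduled failures.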

None of this was a crisis. All of it was worth knowing.

The immediate wins were obvious: Docker build cache held 17GB of reclaimable space. pip cache held another 4GB. Two commands later, the server went from 75% to 50% disk utilization.

“Yes great – docker builder prune for sure; pip cache purge as well.” That’s the whole prompt. Clean. Decisive. The context was already established.

Building Something That Lasts: The Monitoring Dashboard

The natural follow-up to a system review is making sure you don’t have to do it manually next time. The ask evolved organically:

“Yes. Make sure we have record of it somewhere. Not just disk — cpu, memory, what’s running. Hopefully a html way to view it with pretty graphs.”

What got built: a shell script (collect.sh) that runs every 15 minutes via cron, appending a JSON line to metrics.jsonl with disk, RAM, CPU load averages, and Docker container counts. A self-contained HTML dashboard using Chart.js reads that file and renders four time-series charts. A systemd service keeps it available on port 8090 across reboots. Access is via SSH tunnel, so there is no new attack surface.
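The real collector is a shell script that isn't reproduced here. As a rough illustration of the same idea, here is a minimal Python sketch of a collector that appends one JSON object per line. The field names (ts, disk_pct, load, containers, mem_available_kb, mem_total_kb) are invented for this example and are not the ones collect.sh writes; it also assumes a Linux host with /proc/meminfo.

```python
import json
import os
import shutil
import subprocess
import time


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def running_containers():
    """Count running containers via `docker ps -q`; None if Docker is unavailable."""
    try:
        result = subprocess.run(
            ["docker", "ps", "-q"],
            capture_output=True, text=True, check=True, timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return len(result.stdout.split())


def collect(path: str = "/") -> dict:
    """Build one metrics sample as a plain dict (field names are hypothetical)."""
    disk = shutil.disk_usage(path)
    sample = {
        "ts": int(time.time()),
        "disk_pct": round(100 * disk.used / disk.total, 1),
        "load": list(os.getloadavg()),  # 1, 5 and 15 minute load averages
        "containers": running_containers(),
    }
    try:
        with open("/proc/meminfo") as fh:
            mem = parse_meminfo(fh.read())
        sample["mem_available_kb"] = mem.get("MemAvailable")
        sample["mem_total_kb"] = mem.get("MemTotal")
    except OSError:
        pass  # not Linux: leave the memory fields out
    return sample


def append_sample(sample: dict, out_path: str) -> None:
    """Append one JSON object per line, which keeps the file easy to tail and parse."""
    with open(out_path, "a") as fh:
        fh.write(json.dumps(sample) + "\n")
```

Calling append_sample(collect(), "metrics.jsonl") from a cron entry with the schedule */15 * * * * would produce one line every 15 minutes. Note that used/total can differ by a point or two from what df reports, because df excludes reserved blocks from its percentage.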

The dashboard went through a design review pass with three simulated expert personas (a senior SRE, a systems architect, and a developer). Their feedback was applied directly: time-range filtering (6H/24H/7D/All), a staleness warning when data stops updating, disk threshold alerts at ≥80%, padding on the Docker chart's Y-axis so that a dropped container shows up as a visible dip rather than a line that looks flat, and CPU context (“of 2 CPUs”) so load averages mean something at a glance.
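The dashboard itself is HTML and Chart.js, but the rules behind those review items are small enough to sketch. The version below is in Python purely for illustration. It assumes each sample carries a datetime under "ts" and a percentage under "disk_pct" (hypothetical names), and it treats data as stale after two missed collection intervals, which is a guess at a sensible threshold rather than the dashboard's actual setting.

```python
from datetime import timedelta

COLLECT_INTERVAL = timedelta(minutes=15)  # matches the cron schedule
DISK_ALERT_PCT = 80                       # alert at >= 80%
RANGES = {
    "6H": timedelta(hours=6),
    "24H": timedelta(hours=24),
    "7D": timedelta(days=7),
    "All": None,
}


def filter_range(samples, label, now):
    """Keep samples inside the selected window; 'All' keeps everything."""
    window = RANGES[label]
    if window is None:
        return list(samples)
    cutoff = now - window
    return [s for s in samples if s["ts"] >= cutoff]


def is_stale(samples, now, missed=2):
    """True when the newest sample is older than `missed` collection intervals."""
    if not samples:
        return True
    newest = max(s["ts"] for s in samples)
    return now - newest > missed * COLLECT_INTERVAL


def disk_alert(sample):
    """True when disk usage has reached the alert threshold."""
    return sample["disk_pct"] >= DISK_ALERT_PCT


def load_label(load_1m, cpus=2):
    """Put a load average in context, e.g. '1.50 of 2 CPUs'."""
    return f"{load_1m:.2f} of {cpus} CPUs"
```

The staleness check is the one that matters most in practice: a chart that silently stops updating looks identical to a healthy, quiet system unless something compares the newest timestamp to the clock.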

That review process became a reusable tool:

“I liked those reviewers – can you make that a reusable skill”

The result is /panel-review, a slash command that instantiates the same three expert perspectives on demand. The command structure includes an optional clarifying question round before the review, cross-panel response, and synthesis with a single highest-priority action item. It’s now available in any session globally.

The Docker Question

The panel review of the sysmon dashboard surfaced a question that hadn’t been asked directly:

“I have 26 dockers? Running! Really….”

The honest answer: it’s less alarming than it sounds. What looks like sprawl is actually four separate projects (test.hensler.work, audio segmentation, photography, mumble) each running their own infrastructure. The test stack alone accounts for 17 containers. Broken down by project, it makes sense. The count isn’t the problem; the lack of visibility into what each container was doing was.

The Silent Failure: Docker Healthcheck

The podcast generator container had been reporting unhealthy since it was last started. Investigation revealed the cause immediately: the healthcheck called curl, which isn’t installed in a python:3.12-slim image. The container itself was running fine — uvicorn was serving correctly — but Docker had no way to know that.

The fix was a one-line change:

Before: ["CMD", "curl", "-f", "http://localhost:8000/"]
After: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/')"]

When asked whether curl was safe to add, the answer shaped the conversation:

“I’m not convinced this is safe. Why does it need curl”

It doesn’t. The healthcheck is Docker polling localhost. Python handles it fine. No new dependencies, no image rebuild, no attack surface change. The container was healthy within 35 seconds of recreating it.
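For readers who want to see what that one-liner is doing, here is the same probe written out as a small function. This is a sketch rather than what runs in the container; probe and exit_code are names made up for this example, and the five-second timeout is an assumption.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers without an HTTP or connection error.

    urlopen raises HTTPError for 4xx/5xx responses, so an error status
    counts as a failure here, much like curl's -f flag.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False


def exit_code(url: str = "http://localhost:8000/") -> int:
    """Map the probe result to the convention Docker expects: 0 healthy, 1 unhealthy."""
    return 0 if probe(url) else 1
```

A script that ends with sys.exit(exit_code()) would behave like the inline version. The inline form gets the same result more bluntly: an unhandled exception makes the Python process exit non-zero, which Docker reads as a failed check.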

Security: Reading the Right Source

An architectural review of test.hensler.work started from reading code and config — and got some things wrong. The audio segmentation service appeared to be covertly cross-wired into the test stack’s auth container. The “test” naming looked like confusion between environments. These seemed like design problems.

Then came the correction:

“Are you acting on what the Documentation says, or what the code says, or what the site is doing?”

Reading CLAUDE.md changed the picture entirely. The audio dependency is intentional SSO design: the service fetches a fresh JWT from a dedicated token endpoint on login, then operates independently. The “test” naming is deliberate isolation from the production photography site. The architecture wasn’t sprawl; it was a reasoned design with a central Caddy forward_auth gateway handling all authentication at the proxy layer.

The lesson is obvious in retrospect: project documentation exists for exactly this reason. Code tells you what something does. Documentation tells you why.

What did carry over from the review: PROJECT_OVERVIEW.md described the system as it existed in January 2025, when it was just an auth gateway with placeholder services. It was flagged as stale with a pointer to CLAUDE.md. The branding.hensler.work subdomain had been intentionally left without authentication for demo purposes, but its container also published a direct host port (5001:5000), which meant traffic could reach it without passing through Caddy at all. Both were fixed.
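The port problem is easy to miss when reading a compose file, because 5001:5000 looks harmless. A small check like the one below can flag it. This is a hedged sketch: bypasses_proxy is a hypothetical helper, it only understands Compose's short port syntax with IPv4 addresses, and it says nothing about whether a firewall in front of the host would block the port anyway.

```python
def bypasses_proxy(port_spec: str) -> bool:
    """True if a Compose short-syntax port mapping is published beyond loopback.

    Handles "CONTAINER", "HOST:CONTAINER" and "IP:HOST:CONTAINER", with an
    optional "/tcp" or "/udp" suffix. Bracketed IPv6 addresses are not handled.
    """
    mapping = port_spec.split("/")[0]  # drop a trailing protocol suffix
    parts = mapping.split(":")
    if len(parts) == 3:
        # An explicit bind address: only loopback keeps the port private.
        return not parts[0].startswith("127.")
    # No bind address: Docker publishes on all interfaces by default.
    return True
```

By this test, "5001:5000" is flagged, while "127.0.0.1:5001:5000" is not. Removing the ports entry entirely and letting Caddy reach the container over the Docker network is the other common way to close the gap.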

Swap: The Boring Fix That Matters

The no-swap situation came up twice — once in the initial review, once in the panel discussion. The server had 7.8GB RAM, no swap, and was observed dropping to 309MB free during the session.

“No swap…. really – does hostinger recommend it?”

The answer is yes, and the fix takes 30 seconds. A 4GB swapfile is now configured and persistent via fstab. This is the kind of change that has no visible effect until the moment it prevents an OOM kill from taking down a container at 2am.
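"Persistent via fstab" is worth verifying rather than trusting, since a swapfile that was enabled by hand disappears on the next reboot. A minimal sketch of that check, assuming the standard six-field fstab layout; the /swapfile path in the example is an assumption, not the path used on this server.

```python
def swap_entries(fstab_text: str) -> list:
    """Return the first field (device or file) of every swap entry in fstab-format text."""
    entries = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        # fstab fields: spec, mount point, type, options, dump, pass
        if len(fields) >= 3 and fields[2] == "swap":
            entries.append(fields[0])
    return entries


# Example input in fstab format (the /swapfile path is illustrative).
EXAMPLE_FSTAB = """\
# <file system> <mount point> <type> <options> <dump> <pass>
UUID=0000-0000 / ext4 defaults 0 1
/swapfile none swap sw 0 0
"""

print(swap_entries(EXAMPLE_FSTAB))
```

On the live machine the input would be the contents of /etc/fstab. An empty list there means swap will not come back after a reboot, whatever /proc/swaps says right now.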

Security Review: What Each Site Has That the Other Needs

A proper security review of test.hensler.work found the architecture genuinely solid: HttpOnly cookies with correct domain scoping, timing attack prevention on login (constant-time bcrypt comparison regardless of whether the username exists), CSRF on all state-changing endpoints, fail2ban watching Caddy logs with two-layer rate limiting. The main gaps were operational: in-memory rate limit counters reset on restart, and the original fail2ban bantimes for SSH were too short.
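The timing-attack point deserves a concrete picture. The idea is that a login attempt for a username that does not exist should cost the same hashing work as one that does, so response time leaks nothing. The sketch below shows the pattern using PBKDF2 from Python's standard library in place of bcrypt, which the site actually uses; the user store, the iteration count, and all names are hypothetical.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor, not a recommendation


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a key from the password (PBKDF2 standing in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


def make_user(password: str) -> tuple:
    """Create a (salt, hash) record for a new user."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)


# Hypothetical in-memory user store: username -> (salt, hash)
USERS = {"alice": make_user("correct horse")}

# Dummy record used for unknown usernames so that path does the same work.
_DUMMY_SALT, _DUMMY_HASH = make_user("not-a-real-password")


def verify_login(username: str, password: str) -> bool:
    """Check credentials, hashing exactly once whether or not the user exists."""
    known = username in USERS
    salt, expected = USERS[username] if known else (_DUMMY_SALT, _DUMMY_HASH)
    candidate = hash_password(password, salt)
    matches = hmac.compare_digest(candidate, expected)  # constant-time comparison
    return known and matches
```

The final "known and matches" matters: without it, anyone who guessed the dummy password could log in as a user who does not exist.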

Bantimes were extended significantly across the board. The dynamic IP was removed from the whitelist — Hostinger’s console provides an independent access path if needed.

The photography site comparison revealed asymmetry in both directions. Photography had Content-Security-Policy headers and a working dev/prod separation with PR-gated deploys. The test stack had CSRF protection, rate limiting, and fail2ban coverage. Photography’s /admin/login endpoint had none of the latter.
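What fail2ban coverage for a login endpoint amounts to is counting failed attempts per client IP in the proxy log and banning once a threshold is reached. fail2ban does this with its own regex filters and jail settings, which are not shown here. The sketch below only illustrates the counting logic. It assumes Caddy's JSON access log shape with request.remote_ip, request.uri, and status fields, and it assumes that 401 or 403 marks a failed login; both should be checked against the real logs.

```python
import json
from collections import Counter


def failed_logins(log_lines, path="/admin/login", fail_statuses=(401, 403)):
    """Count failed login attempts per client IP from JSON access-log lines."""
    counts = Counter()
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON
        if not isinstance(entry, dict):
            continue
        request = entry.get("request", {})
        if request.get("uri", "").split("?")[0] != path:
            continue
        if entry.get("status") in fail_statuses:
            counts[request.get("remote_ip", "unknown")] += 1
    return counts


def offenders(counts, max_retry=5):
    """IPs whose failure count has reached the threshold, sorted for stable output."""
    return sorted(ip for ip, n in counts.items() if n >= max_retry)
```

Running the photography site's access log through something like this before writing the real filter is a cheap way to confirm that failed logins are distinguishable in the log at all, which is the precondition for any fail2ban jail.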

“Ok, the hensler photography site – about the same state I think. I’m thinking there might be security features worth moving between them.”

The panel’s priority stack: (1) fail2ban coverage for photography login, (2) backup for the test stack’s irreplaceable data, (3) expose sysmon through the existing Cloudflare tunnel so it doesn’t require a manual SSH forward.

Items 1 and 2 were tackled concurrently using parallel agents:

“Let’s just do 1 and 2. May as well do them concurrently with /agents?”

The fail2ban agent added logging to the photography Caddyfile and prepared the filter and jail configuration. The backup agent confused itself with the fail2ban work (a known failure mode when agents share context on overlapping tasks), so the backup script was written directly. It runs weekly, dumps PostgreSQL, and archives four key Docker volumes using docker run alpine tar. No root access is required, since the user has docker group membership. First backup confirmed: 219MB of podcast audio, 42MB of image editor storage.
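The docker run alpine tar trick works by mounting the volume into a throwaway container, so the archive can be written without reading Docker's storage directory directly. The actual backup script isn't shown in this post; as a sketch of the pattern, the function below only builds the command. The volume name, backup directory, and archive naming scheme are all hypothetical.

```python
def volume_backup_command(volume: str, backup_dir: str, stamp: str) -> list:
    """Build a `docker run` command that archives one named volume to backup_dir."""
    archive = f"{volume}-{stamp}.tar.gz"
    return [
        "docker", "run", "--rm",
        "-v", f"{volume}:/data:ro",       # the volume, mounted read-only
        "-v", f"{backup_dir}:/backup",    # host directory that receives the archive
        "alpine",
        "tar", "czf", f"/backup/{archive}", "-C", "/data", ".",
    ]


# Example with made-up names; print the command rather than running it.
cmd = volume_backup_command("podcast_audio", "/home/user/backups", "2026-06-24")
print(" ".join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it. Two things to keep in mind: the archive is written by root inside the container, so it will be owned by root on the host, and docker group membership is effectively root-equivalent, which is why this works without sudo.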

What This Session Was

This wasn’t a feature sprint. It was intentional stewardship of a system that had been running and growing without enough attention to what was actually there.

The approach that made it work: start with observation, not action. A system review that surfaces real state is more valuable than one that confirms assumptions. The disk number, the silent container failure, the missing swap, the asymmetric security between two sites — none of it was a crisis, and all of it was worth addressing.

The other thing worth naming: the session used AI not as a code generator but as a reasoning partner. The panel review format — simulating expert perspectives on a problem — turned out to be generative enough that it became a permanent tool. The most useful moment in the whole session was a single question that redirected an entire analysis: “Are you acting on what the documentation says, or what the code says, or what the site is doing?”

That’s the question worth keeping.

Concrete Outcomes

  • 24GB freed from Docker build cache and pip
  • System metrics dashboard live at port 8090, collecting every 15 minutes
  • /panel-review slash command available globally
  • Docker healthcheck fixed (podcast-generator, silent for 5 days)
  • Branding subdomain secured: direct port closed, admin auth added
  • 4GB swap added and made persistent
  • fail2ban bantimes significantly extended; dynamic IP removed from whitelist
  • fail2ban coverage added for photography site login endpoint
  • Weekly automated backup for test stack (PostgreSQL + key volumes)
  • Documentation updated to reflect actual current state

Session Collaboration Review — June 25, 2026

Overall

Productive, wide-ranging session with genuine outcomes. Not a feature sprint — a maintenance and hardening pass that left the system meaningfully better than it started. The work was real, the decisions were sound, and very little had to be redone. That said, there are clear patterns on both sides worth naming.

What You Did Well

Decisive execution. When a recommendation landed, you acted on it immediately and without second-guessing. “Yes great – docker builder prune for sure; pip cache purge as well.” No deliberation, no committee. That pace is rare and it compounds — we covered ground in one session that most people would spread across a week.

Good instincts on skepticism. “I’m not convinced this is safe. Why does it need curl” — that’s the right question at the right moment. You didn’t just accept a recommendation; you interrogated it. The answer was satisfying precisely because the question was sharp.

Redirecting when the analysis was wrong. “Are you acting on what the documentation says, or what the code says, or what the site is doing?” — that single question saved the entire architectural assessment from going sideways. You caught it, named it cleanly, and we corrected course. That’s a skill.

Knowing when to parallelize. “May as well do them concurrently with /agents?” — correct instinct, correctly timed. You understood the task structure well enough to recognize independence.

Comfortable with depth. You didn’t shy away from security review, fail2ban internals, Docker networking, or systemd. The session went wherever it needed to go and you followed without needing things over-explained.

What Could Be Better — Your Side

Context discontinuity is costing you. You’ve built tools, skills, commands, and workflows across multiple sessions and machines — and you can’t find them. The session critique skill is a perfect example. Something you described as “amazing” is now lost. The /panel-review command we built today will be in the same position in three months unless there’s a system for tracking what exists and where. You need a personal index — even a simple markdown file — of tools you’ve built, where they live, and what they do.

The “I almost forgot about that” pattern. test.hensler.work, the branding port, the fail2ban bantimes — several things surfaced mid-session that you’d meant to address and hadn’t. That’s not a criticism of memory; it’s a signal that the pre-session ritual is missing. A short “what’s outstanding” check at the start of each session would surface these intentionally rather than accidentally.

Spelling errors and typos in prompts slow things down. Not dramatically, but “lacksadailical,” “conciscely,” “breif,” and “creaqte” add friction and occasionally create genuine ambiguity. Voice input, autocorrect, or just slowing down slightly on longer prompts would help. The ideas are clear; the signal-to-noise ratio could be higher.

Session goals drift. This one started as a system review and ended doing fail2ban configuration, documentation updates, backup scripts, and a blog post. All of it was worthwhile. But the drift was organic rather than chosen — one thing led to another without a deliberate decision to change scope. Sometimes that’s fine. It’s worth occasionally asking: “Is this still the highest value thing to be doing right now?”

What Could Be Better — My Side

I read code before documentation. The architectural review of test.hensler.work was based on the Caddyfile and docker-compose before CLAUDE.md was consulted. That produced wrong conclusions about the audio dependency and the “test” naming convention. The correct sequence is always: documentation first, then code to verify. You caught this; I shouldn’t have needed the correction.

Agent task bleed. The backup agent got confused with the fail2ban agent’s work and returned results about the wrong task. Running parallel agents on related subjects with overlapping context is a known failure mode. I should have scoped the prompts more aggressively to prevent it, or serialized the tasks given how closely related they were.

I over-explained some things. The Docker container count breakdown, the Caddyfile route analysis — some of that was more thorough than it needed to be for someone who clearly already understood the architecture. Calibrating response depth to demonstrated expertise is something I can do better.

Memory wasn’t initialized until late. We were most of the way through the session before memory was written. If the session had ended earlier, none of it would have been captured. Memory should be written progressively as significant context accumulates, not in a batch at the end.

How We Work Best Together

The sessions that go well have a consistent shape: you arrive with a rough direction rather than a fully specified task, I orient quickly by reading the right sources, we establish shared understanding of the actual state before deciding what to change, and then execution is fast because the decision-making was already done.

The sessions that go sideways: unclear starting context, I read the wrong sources, recommendations get built on wrong assumptions, we correct mid-flight and lose time.

The practical implication: a 2-3 sentence session brief at the start — “here’s what I’m trying to do, here’s what I know, here’s what I’m uncertain about” — would cut the orientation phase significantly. Not a formal spec. Just enough to point in the right direction before the first tool call.

On the Economics

You mentioned not breaking the bank and ideally generating income. A few observations from this session:

The infrastructure being maintained here — AI tools showcase, portfolio sites, audio processing, podcast pipeline — is clearly demo and client-facing material. The work we did today (security hardening, monitoring, backups, documentation) makes that infrastructure defensible enough to show to people without embarrassment. That’s the precondition for income, not income itself.

The /panel-review command is the most transferable thing built today. A structured multi-perspective review with defined personas, optional clarifying questions, and synthesis — that’s a workflow tool with obvious value beyond personal use. It’s worth thinking about whether that pattern generalizes into something you could offer or publish.

The bigger opportunity is probably the system itself. You’re running a self-hosted AI tools portal with working SSO, multiple AI backends, and reasonable security. That’s a non-trivial thing that most people don’t know how to build. Writing about it — which you’re already doing with this blog post — is probably the highest-leverage thing you can do with the work that’s already been done.

One Line Each

You: You make good decisions quickly — the bottleneck is knowing what decisions need to be made, which requires better session hygiene and a personal tool inventory.

Me: I produce better work when I read documentation before code and when I’m given clear scope; I produce worse work when I parallelize tasks that share too much context.

Together: We’re most effective when the problem is real, the stakes are clear, and neither of us is performing — which was mostly true today.
