How I Built an Automated WordPress Post Reviewer (Dead Links, Stale Content, Signal Alerts)
I run a personal blog and, like most people who’ve been publishing for a while, I have a graveyard of older posts. Some have dead links. Some reference AI models or tools that have since been superseded. Some just deserve a second look.
I built a small automated tool to handle this — a WordPress post reviewer that runs on a schedule, picks a post, checks its links, flags stale content, and sends me a summary on Signal. Here’s how it works and how I built it.
The Problem
Content rot is real. A post that was accurate and well-linked in 2023 might have three dead URLs and a reference to “the latest GPT-4 model” by 2026. Most bloggers never go back and check. I wanted a system that would do it for me automatically, without requiring me to remember to do it.
The requirements were simple:
- Pick a post automatically (with smart weighting — never-reviewed posts get priority)
- Check every external link for 404s and timeouts
- Have an AI review the content for stale quotes, outdated stats, and obsolete references
- Send me a summary I can actually act on
- Track which posts have been reviewed so nothing gets skipped indefinitely
How It Works
The system has two parts: a Python script and an AI agent pass, connected by a cron job in OpenClaw.
Part 1: The Python Script (wp_post_reviewer.py)
The script does the mechanical work:
- Fetches all published posts via the WordPress REST API
- Picks one using weighted randomization — posts that have never been reviewed get a 3× weight bonus, and older posts get an additional age bonus. This means neglected posts rise to the top without the selection feeling mechanical (sketched in code after this list).
- Checks up to 20 external links using HTTP HEAD requests, flagging anything that returns a 400+ status or times out (also sketched below)
- Logs the review to a local wp_audit_log.json file so it knows what's already been seen
- Outputs structured JSON — post metadata, link results, and content — ready for the agent to consume
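The weighted selection is the part worth showing. Here's a minimal sketch of the approach, assuming the audit log is a JSON object keyed by post ID; the age-bonus formula is illustrative, not the exact constants from my script:

```python
import random
from datetime import datetime, timezone

def pick_post(posts, audit_log):
    """Weighted random pick over published posts.

    posts     -- dicts from GET /wp-json/wp/v2/posts?status=publish
    audit_log -- parsed wp_audit_log.json, keyed by post ID (assumed shape)
    """
    now = datetime.now(timezone.utc)
    weights = []
    for post in posts:
        weight = 1.0
        if str(post["id"]) not in audit_log:
            weight *= 3.0  # never-reviewed posts get the 3x bonus
        published = datetime.fromisoformat(post["date_gmt"]).replace(tzinfo=timezone.utc)
        weight += (now - published).days / 365  # gentle age bonus (illustrative)
        weights.append(weight)
    return random.choices(posts, weights=weights, k=1)[0]
```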
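The link check is deliberately simple: one HEAD request per URL, capped at 20. A sketch using requests (the timeout value is illustrative):

```python
import requests

def check_links(urls, limit=20, timeout=10):
    """HEAD each external URL; 400+ responses and errors/timeouts are flagged dead."""
    results = []
    for url in urls[:limit]:
        try:
            resp = requests.head(url, timeout=timeout, allow_redirects=True)
            results.append({"url": url, "status": resp.status_code,
                            "dead": resp.status_code >= 400})
        except requests.RequestException:
            results.append({"url": url, "status": None, "dead": True})
    return results
```

One known gap: some servers reject HEAD with a 405, so a fallback GET on those would reduce false positives.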
Part 2: The AI Review Pass
The JSON output gets passed to Claude Sonnet, which reviews it for:
- Dead links — with exact URL and HTTP status code
- Stale content — time-anchored language (“recently”, “the latest model”, “last year”), specific outdated statistics, references to tools or products that have changed
- Revival potential — could this post be updated and re-promoted, or is it fine as-is?
- Tags and categories — are they still accurate?
The agent quotes specific text rather than giving vague summaries. If something is stale, it tells me exactly which sentence.
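I won't reproduce the exact prompt here, but the instruction doing the heavy lifting is "quote the exact text". Something in this spirit:

```python
REVIEW_PROMPT = """You are auditing a blog post for content rot.
The input is JSON: post metadata, link-check results, and the full content.

Report, quoting the exact sentence for anything you flag:
1. Dead links: exact URL and HTTP status code
2. Stale content: time-anchored phrases ("recently", "the latest model",
   "last year"), outdated statistics, superseded tools or products
3. Revival potential: worth updating and re-promoting, or fine as-is?
4. Tags and categories: still accurate?
"""
```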
Part 3: The Signal Summary
The result always arrives as a Signal message — even if everything is clean. A typical report looks like this:
📋 Weekly Post Audit
"AI Models Compared: March 2025 Edition" • Published 2025-03-12 • ~400 days old
https://adrianhensler.com/ai-models-march-2025
🔗 Links: 8 checked, 1 dead
❌ https://openai.com/blog/gpt-4-research → 404
📅 Potentially stale:
⚠️ "GPT-4 is currently the most capable model available" — paragraph 2
⚠️ "Claude 3 was just released" — paragraph 5
🔄 Revival?
Strong candidate — update the model comparison table and republish.
Core structure is solid; just the specifics have aged.
🏷️ Tags/Categories: ✅ Appropriate
📌 Overall: Update recommended
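In my setup, delivery goes through OpenClaw. If you were wiring the same pipeline up standalone, signal-cli covers it; a sketch with placeholder numbers:

```python
import subprocess

def send_signal(message: str, account: str, recipient: str) -> None:
    """Send the audit summary via signal-cli (the account must already be registered)."""
    subprocess.run(
        ["signal-cli", "-a", account, "send", "-m", message, recipient],
        check=True,
    )
```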
Bugs I Hit During Build
Two bugs showed up during the first real run, and they’re worth documenting because they’re easy to miss.
Bug 1: days_old Always Returned 0
The script was calculating post age by comparing a naive datetime (parsed from WordPress's date field, which carries no timezone info) against the timezone-aware datetime.now(timezone.utc). Python raises a TypeError when you compare naive and aware datetimes; the try/except caught it silently, and the fallback was days_old = 0.
Fix: switch to WordPress's date_gmt field (UTC) and compare against datetime.now(timezone.utc). A one-line change, but invisible until the output looked wrong.
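In miniature (the dates here are made up; date and date_gmt are the real WordPress REST API field names):

```python
from datetime import datetime, timezone

post = {"date": "2025-03-12T14:23:11", "date_gmt": "2025-03-12T18:23:11"}

# Before: naive (no tzinfo) vs aware -> TypeError, silently swallowed
# by the surrounding try/except, so days_old fell back to 0.
# published = datetime.fromisoformat(post["date"])

# After: parse the UTC field and make it explicitly aware.
published = datetime.fromisoformat(post["date_gmt"]).replace(tzinfo=timezone.utc)
days_old = (datetime.now(timezone.utc) - published).days
```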
Bug 2: Link Checker Found Zero Links
The link extractor used a regex for href="..." attributes. It worked fine for recent posts. But my older daily briefing posts used a plain-text link format — Link: https://... — not anchor tags. The regex correctly found zero href attributes, because there weren't any.
Fix: add a secondary regex pass for bare URLs in the content. Both patterns now run and results are de-duplicated.
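A sketch of the two-pass extraction, with simplified versions of both patterns:

```python
import re

HREF_RE = re.compile(r'href="(https?://[^"]+)"')
# Bare URLs: skip ones preceded by a quote or =, which are already inside href="..."
BARE_RE = re.compile(r'(?<!["\'=])(https?://[^\s<>"\']+)')

def extract_links(content):
    """Anchor hrefs first, then bare URLs; de-duplicated, order preserved."""
    found = HREF_RE.findall(content) + BARE_RE.findall(content)
    return list(dict.fromkeys(found))  # dedupe while keeping first-seen order
```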
The Schedule
The whole thing runs as a cron job in OpenClaw — isolated, on its own schedule, separate from the main session. It fires a few times a week, picks a post, runs the review, and sends the Signal message. I don’t have to think about it until something lands in my messages worth acting on.
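OpenClaw has its own job configuration, but the plain-crontab equivalent would look something like this (schedule and path are illustrative):

```
# Mon/Wed/Fri at 09:15: pick a post, review it, send the Signal summary
15 9 * * 1,3,5  /usr/bin/python3 /opt/scripts/wp_post_reviewer.py
```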
What I’d Add Next
- A “fix this post” command that opens a draft with suggested edits already applied
- Tracking which specific URLs have been flagged, so recurring dead links surface across multiple posts
- A monthly digest of everything reviewed and what actions were taken
For now, it does what I needed: it keeps an eye on the archive so I don’t have to.
The script runs on OpenClaw, a self-hosted personal AI gateway. The post selection, link checking, and AI review are all handled locally — no third-party content scanning service involved.