Playbook: Build a Log Anomaly Monitor Across Axiom and Sentry
Your team gets hundreds of log emails and error notifications every day from Axiom and Sentry. Most of them are noise -- known issues, transient errors, expected spikes. The important signals get buried. Someone has to scan through everything to find the things that actually need attention, and when they are busy, problems slip through. This playbook walks you through building tools that pull logs from Axiom and errors from Sentry, then setting up a scheduled agent that reads them, filters out the noise, and only notifies you when something genuinely unusual is happening.
What you will build
- A tool that queries recent logs from Axiom and returns structured results
- A tool that fetches recent issues and events from Sentry
- A scheduled agent that runs every hour, reads logs and errors from both systems, applies anomaly detection logic, and posts to Slack only when it finds something that deviates from normal patterns
- Persistent state that tracks baseline patterns so the agent learns what "normal" looks like over time
What you need before you start
- An Assist workspace with an MCP server set up. If you have not created one yet, follow Creating an MCP server.
- Your AI client (Claude Desktop, ChatGPT, Cursor) connected to the MCP server.
- An Axiom API token with query permissions. You can generate one in Axiom under Settings > API Tokens.
- A Sentry auth token with project read access. You can generate one in Sentry under Settings > Auth Tokens.
- A Slack webhook URL or Slack tools configured in Assist for posting alerts.
Step 1: Build the Axiom log query tool
Start a conversation in your AI client and describe what you need:
"I want to build a tool that queries logs from Axiom. Our Axiom API is at https://api.axiom.co. I have an API token for authentication. The tool should accept a dataset name, a time range (like 'last 1 hour' or 'last 24 hours'), and an optional filter query. It should return the log entries with timestamp, level, message, and any structured fields."
The AI creates the tool. Test it with a real dataset:
"Test the Axiom tool -- query the 'production-logs' dataset for the last hour."
Review the results. You should see structured log entries. Refine as needed:
"The response is too verbose. Limit it to 200 entries and only include timestamp, level, message, service, and status_code fields. Also add a summary at the top: total count, count by level (info, warn, error, fatal)."
"Add a parameter called 'min_level' that filters to only return entries at that severity or higher. So if I pass 'warn', it skips info-level logs."
Once the tool returns clean, filterable results, move on.
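To make the refinements above concrete, here is a minimal sketch of the filtering and summary logic the tool ends up with. The field names (`level`, `service`, `status_code`) and the severity ordering are assumptions; match them to your Axiom dataset's actual schema.

```python
# Assumed severity order, lowest to highest.
LEVELS = ["info", "warn", "error", "fatal"]

def summarize_logs(entries, min_level="info", limit=200):
    """Keep entries at min_level or higher, cap the list, and add a summary."""
    threshold = LEVELS.index(min_level)
    kept = [e for e in entries if LEVELS.index(e["level"]) >= threshold]
    by_level = {lvl: 0 for lvl in LEVELS}
    for e in kept:
        by_level[e["level"]] += 1
    return {
        "total": len(kept),
        "by_level": by_level,
        "entries": [
            {k: e.get(k) for k in ("timestamp", "level", "message", "service", "status_code")}
            for e in kept[:limit]
        ],
    }
```

Passing `min_level="warn"` drops info-level entries entirely, which keeps the response small enough for the agent to read in one pass.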
Step 2: Build the Sentry issues tool
Now build the Sentry integration:
"Create a tool that fetches recent issues from Sentry. Our Sentry API is at https://sentry.io/api/0/. I need to pass an organization slug and project slug. The tool should return issues from the last hour with: issue ID, title, culprit (where the error happened), count (how many times it occurred), first seen, last seen, and level. Sort by count descending."
Test it:
"Query Sentry for the 'acme-corp' organization, 'web-app' project, last hour."
Iterate based on what you see:
"Add a parameter for time range so I can check the last hour, last 4 hours, or last 24 hours. Also include the issue's short URL so I can click through to Sentry."
"Some of these issues have thousands of events. Add a field called 'is_regression' that checks if the issue was previously resolved but came back."
Build a second Sentry tool for more detail when needed:
"Create a tool called 'get_sentry_issue_details' that takes an issue ID and returns the latest event's stack trace, tags, and breadcrumbs. This is for when the agent needs to dig deeper into a specific error."
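The shaping step the first Sentry tool performs can be sketched roughly as follows. The input keys (`firstSeen`, `permalink`, `substatus`) mirror Sentry's issue API but should be treated as assumptions and verified against your actual payloads.

```python
def summarize_issues(raw_issues, known_issue_ids=()):
    """Shape raw Sentry issue JSON into the summary fields described above."""
    issues = []
    for i in raw_issues:
        issues.append({
            "id": i["id"],
            "title": i["title"],
            "culprit": i.get("culprit"),
            "count": int(i["count"]),          # Sentry returns counts as strings
            "first_seen": i.get("firstSeen"),
            "last_seen": i.get("lastSeen"),
            "level": i.get("level"),
            "url": i.get("permalink"),
            "is_regression": i.get("substatus") == "regressed",
            "is_known": i["id"] in known_issue_ids,
        })
    # Sort by event count descending so the noisiest issues come first.
    return sorted(issues, key=lambda x: x["count"], reverse=True)
```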
Step 3: Build the anomaly detection logic
The key insight: you do not want the agent to alert on every error. You want it to alert on errors that are unusual. That means the agent needs to know what "normal" looks like.
"Create a tool called 'get_baseline_metrics' that reads from a persistent collection called 'log_baselines'. Each record in the collection has: date, dataset, avg_error_count, avg_warn_count, p95_error_count, top_known_issues (list of Sentry issue IDs that are expected). If no baseline exists for today, return the most recent one."
"Create a tool called 'update_baseline_metrics' that writes a new baseline record to the 'log_baselines' collection. It should take the current counts and merge them into a rolling average with the previous baseline. Use exponential moving average so recent data is weighted more heavily."
The baseline collection gives the agent memory. After a week of running, it knows that 15 errors per hour from the payment service is normal, but 150 is not. It knows that issue #4521 fires constantly and can be ignored, but a new issue it has never seen before needs attention.
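The exponential-moving-average merge that `update_baseline_metrics` performs can be sketched in a few lines. The smoothing factor `alpha` is a tunable assumption: higher values weight recent hours more heavily.

```python
def merge_baseline(prev, current, alpha=0.3):
    """Blend the current hour's counts into the rolling baseline via EMA."""
    merged = {}
    for key in ("avg_error_count", "avg_warn_count", "p95_error_count"):
        merged[key] = alpha * current[key] + (1 - alpha) * prev[key]
    # Known issues accumulate rather than average.
    merged["top_known_issues"] = sorted(
        set(prev["top_known_issues"]) | set(current.get("top_known_issues", []))
    )
    return merged
```

With `alpha=0.3`, a single noisy hour nudges the baseline rather than replacing it, so one bad deploy does not redefine "normal".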
Step 4: Build the Slack notification tool
"Create a tool called 'post_anomaly_alert' that posts a message to our #ops-alerts Slack channel. The message should include: a severity level (warning or critical), a summary of what was detected, the specific metrics that triggered the alert, and links to the relevant Axiom query or Sentry issue."
"Format the Slack message with sections: a header with a warning or fire emoji based on severity, a summary line, a 'Details' section with bullet points for each anomaly, and a 'Baseline comparison' section showing expected vs actual numbers."
Test the notification:
"Send a test alert to Slack: warning level, summary 'Elevated error rate in payment service', detail 'Error count: 47 (baseline: 12, threshold: 36)', link to Axiom."
Verify the Slack message looks right. Adjust formatting until the team can scan it in seconds.
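As a reference for what "looks right" might mean, here is a sketch of the payload `post_anomaly_alert` could build using Slack's Block Kit. The emoji names and section layout are assumptions; adjust them to your team's taste.

```python
def build_alert(severity, summary, details, baseline_lines):
    """Build a Block Kit payload: header, summary, details, baseline comparison."""
    emoji = ":fire:" if severity == "critical" else ":warning:"
    return {
        "blocks": [
            {"type": "header",
             "text": {"type": "plain_text", "text": f"{emoji} {severity.upper()} anomaly detected"}},
            {"type": "section", "text": {"type": "mrkdwn", "text": summary}},
            {"type": "section",
             "text": {"type": "mrkdwn",
                      "text": "*Details*\n" + "\n".join(f"• {d}" for d in details)}},
            {"type": "section",
             "text": {"type": "mrkdwn",
                      "text": "*Baseline comparison*\n" + "\n".join(baseline_lines)}},
        ]
    }
# POST this JSON to your Slack webhook URL to deliver the alert.
```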
Step 5: Set up the scheduled agent
Now tie everything together with a scheduled trigger that runs the monitoring loop automatically.
"Create a scheduled agent that runs every hour. Here is what it should do:
Query Axiom for logs from the last hour across our production datasets: 'production-logs', 'api-gateway', and 'worker-jobs'. Filter to warn level and above.
Query Sentry for issues from the last hour across our active projects: 'web-app', 'api', and 'worker'.
Load the current baseline from the log_baselines collection.
Compare the current hour's metrics against the baseline:
- Error count more than 3x the baseline average: critical alert
- Error count more than 2x (but at most 3x) the baseline average: warning alert
- Any new Sentry issue not in the known issues list with more than 10 events: warning alert
- Any resolved Sentry issue that regressed: critical alert
- Any single service contributing more than 80% of all errors: warning alert (likely a localized problem)
If any anomalies are detected, post to Slack with the details.
If no anomalies are detected, do nothing. No 'all clear' messages -- the team should only hear from this agent when something needs attention.
Update the baseline with the current hour's data so the rolling average stays fresh."
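The threshold rules in the prompt above can be sketched as a pure function, which is also a useful way to reason about them before tuning in Step 6. The input shapes are assumptions, and the numbers mirror the prompt.

```python
def detect_anomalies(metrics, baseline, new_issues, regressions):
    """Return (severity, message) pairs for each rule that fires."""
    alerts = []
    avg = baseline["avg_error_count"]
    errors = metrics["error_count"]
    if errors > 3 * avg:
        alerts.append(("critical", f"Error count {errors} is >3x baseline ({avg})"))
    elif errors > 2 * avg:
        alerts.append(("warning", f"Error count {errors} is >2x baseline ({avg})"))
    for issue in new_issues:
        if issue["count"] > 10 and issue["id"] not in baseline["top_known_issues"]:
            alerts.append(("warning", f"New issue {issue['id']} with {issue['count']} events"))
    for issue in regressions:
        alerts.append(("critical", f"Resolved issue {issue['id']} regressed"))
    # Flag a single service dominating the error volume.
    by_service = metrics.get("errors_by_service", {})
    total = sum(by_service.values()) or 1
    for svc, n in by_service.items():
        if n / total > 0.8:
            alerts.append(("warning", f"{svc} accounts for {n}/{total} errors"))
    return alerts
```

If the function returns an empty list, the agent stays silent, which is exactly the "no 'all clear' messages" behavior the prompt asks for.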
The agent runs in Assist on a schedule. It does not depend on anyone having their AI client open. See Scheduled Triggers for more on how scheduled execution works.
Step 6: Tune the thresholds
After the agent runs for a day or two, you will have data on how it performs. Check the Slack channel:
If it is too noisy (alerting on things that are not real problems):
"Update the anomaly thresholds: change the warning from 2x baseline to 3x baseline. Also add these Sentry issues to the known issues list so they are ignored: #4521, #4877, #5102. They are expected and being tracked separately."
If it is too quiet (missing things the team cares about):
"Lower the threshold for new Sentry issues from 10 events to 5 events in the last hour. Also add a check: if any single endpoint returns more than 50 5xx errors in an hour, that should be a critical alert regardless of the overall error rate."
"Add a daily summary that runs at 9 AM and posts a brief overview: total errors in the last 24 hours, trend vs previous day, and any issues that have been elevated for more than 4 hours."
The daily summary is a second scheduled trigger. It complements the hourly anomaly check: the hourly check catches spikes as they happen, the daily summary provides context for the morning standup.
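The "trend vs previous day" line in the summary is a simple percent-change calculation; the function name and output format here are illustrative assumptions.

```python
def trend_vs_previous_day(today_errors, yesterday_errors):
    """Describe the day-over-day change in total error count."""
    if yesterday_errors == 0:
        return "no errors yesterday"
    pct = (today_errors - yesterday_errors) / yesterday_errors * 100
    direction = "up" if pct > 0 else "down"
    return f"{direction} {abs(pct):.0f}% vs previous day"
```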
Step 7: Add more data sources
The pattern works for any log or monitoring system. Extend it:
"Create a tool that queries our Datadog metrics API for the p99 latency of our top 10 endpoints over the last hour. Add a baseline comparison for latency too -- alert if any endpoint's p99 is more than 2x its baseline."
"Create a tool that checks our uptime monitor (Pingdom/UptimeRobot) for any services that went down or had elevated response times in the last hour."
"Add a Slack tool that queries our #incidents channel for any new threads in the last hour. If an incident was already posted by a human, the agent should NOT send a duplicate alert -- it should reference the existing thread instead."
Each new tool feeds into the same scheduled agent. The agent's logic expands to cover more signals, but the output stays focused: only alert when something is genuinely unusual.
Step 8: Share and iterate
The monitoring agent runs independently. The team sees alerts in Slack only when something needs attention. Over time:
- The baseline gets smarter as it accumulates data
- The known issues list grows as the team marks recurring issues as expected
- The thresholds get tuned based on real alert quality
If a team member wants to investigate manually, they can query the tools directly from their AI client:
"Show me all errors from the payment service in the last 4 hours."
"Get the details on Sentry issue #6234 -- what's the stack trace?"
"How does today's error rate compare to the baseline for the api-gateway dataset?"
The same tools the agent uses are available for ad-hoc investigation. The agent handles the routine monitoring. Humans handle the judgment calls.
What you built
You now have an automated log monitoring system:
- Axiom and Sentry tools that pull structured log and error data on demand
- Baseline tracking with persistent state that learns what "normal" looks like for your systems
- A scheduled agent that runs every hour, compares current metrics to baseline, and alerts only on genuine anomalies
- A daily summary agent that posts a 24-hour overview for the morning standup
- Tunable thresholds that the team adjusts through conversation as they learn what matters
- Ad-hoc access so anyone can query logs and errors from their AI client when investigating an issue
The team went from scanning hundreds of log emails a day to reviewing a handful of targeted Slack alerts. The agent handles the noise. The team handles the signal.