Monitoring webhook health
Key metrics to track
Monitor these metrics to ensure your webhook integration is healthy:

| Metric | Description | Alert Threshold |
|---|---|---|
| Success Rate | Percentage of webhooks returning 2xx | < 95% |
| Latency | Time from receipt to response | > 5 seconds |
| Error Rate | Percentage of 4xx/5xx responses | > 5% |
| Timeout Rate | Percentage timing out (>30s) | > 1% |
| Queue Depth | Pending webhooks in processing queue | Growing consistently |
Setting up monitoring
Application-level monitoring
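A minimal sketch of application-level tracking, assuming you record the status code and duration of every handled webhook. The `WebhookMetrics` class and its method names are hypothetical; the thresholds come from the metrics table in this guide.

```python
from dataclasses import dataclass, field


@dataclass
class WebhookMetrics:
    """In-memory counters for handled webhook requests (hypothetical helper)."""
    statuses: list = field(default_factory=list)
    timeouts: int = 0

    def record(self, status_code: int, duration_s: float) -> None:
        # Anything slower than 30 s is counted as a timeout.
        self.statuses.append(status_code)
        if duration_s > 30:
            self.timeouts += 1

    def _percent(self, count: int, default: float) -> float:
        return 100.0 * count / len(self.statuses) if self.statuses else default

    def success_rate(self) -> float:
        return self._percent(sum(1 for s in self.statuses if 200 <= s < 300), 100.0)

    def error_rate(self) -> float:
        return self._percent(sum(1 for s in self.statuses if s >= 400), 0.0)

    def timeout_rate(self) -> float:
        return self._percent(self.timeouts, 0.0)

    def breached(self) -> list:
        """Names of the metrics that are past their alert threshold."""
        alerts = []
        if self.success_rate() < 95:
            alerts.append("success_rate")
        if self.error_rate() > 5:
            alerts.append("error_rate")
        if self.timeout_rate() > 1:
            alerts.append("timeout_rate")
        return alerts
```

In a real deployment these counters would usually be exported to a metrics backend rather than kept in process memory, which is lost on restart.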
Logging best practices
Structured logging
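One possible approach is a JSON formatter on top of Python's standard `logging` module, so that each webhook produces one machine-readable line. The names `JsonFormatter` and `log_webhook`, and the fields `event`, `status`, and `duration_ms`, are illustrative assumptions rather than part of any ButterCMS SDK.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed as extra={"webhook": {...}} are attached to the record.
        entry.update(getattr(record, "webhook", {}))
        return json.dumps(entry, sort_keys=True)


def log_webhook(logger: logging.Logger, event: str, status: int, duration_ms: float) -> None:
    """Log one processed webhook with searchable fields."""
    logger.info(
        "webhook processed",
        extra={"webhook": {"event": event, "status": status, "duration_ms": duration_ms}},
    )
```

To use it, attach the formatter to a handler, for example `handler.setFormatter(JsonFormatter())`, and call `log_webhook` at the end of your request handler.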
Use structured logging to make webhook events searchable and analyzable.
Log aggregation
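As a rough sketch, a logging handler can buffer formatted lines and pass them on in batches. The `BatchingHandler` class and its `ship` callback are hypothetical; in practice `ship` would forward the batch to whichever aggregation service you use.

```python
import logging
from typing import Callable, List


class BatchingHandler(logging.Handler):
    """Buffer formatted log lines and hand them to `ship` in batches."""

    def __init__(self, ship: Callable[[List[str]], None], batch_size: int = 50):
        super().__init__()
        self.ship = ship              # hypothetical sender, e.g. an HTTP POST
        self.batch_size = batch_size
        self.buffer: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.buffer.append(self.format(record))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # Swap the buffer out first so a failing sender cannot resend old lines.
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.ship(batch)
```

Batching keeps the number of outbound requests low; call `flush()` on shutdown so the last partial batch is not lost.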
Send logs to a centralized logging service for analysis.
Debugging webhook issues
Common issues and solutions
Webhooks not being received
Symptoms: No requests arriving at your endpoint
Debugging steps:
- Verify the webhook is enabled in ButterCMS settings
- Check the endpoint URL is correct and publicly accessible
- Test the endpoint with curl:
  curl -X POST https://your-endpoint.com/webhooks/buttercms
- Check firewall rules and security groups
- For local development, ensure ngrok/tunnel is running
Webhook showing as failed in ButterCMS
Symptoms: ButterCMS indicates delivery failure
Debugging steps:
- Check server logs for incoming requests
- Verify your endpoint returns 2xx status codes
- Ensure response time is under 30 seconds
- Check for SSL certificate issues
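A common way to stay well under the 30-second limit is to acknowledge the request immediately and do the slow work in the background. The sketch below assumes a single worker thread and an in-process queue; `handle_request`, `worker`, and `start_worker` are hypothetical names.

```python
import queue
import threading

jobs = queue.Queue()


def handle_request(payload: dict) -> int:
    """Queue the payload and return the status code to send back at once."""
    jobs.put(payload)
    return 200


def worker(process) -> None:
    """Process queued payloads one at a time, forever."""
    while True:
        payload = jobs.get()
        try:
            process(payload)
        except Exception as exc:
            # Keep the worker alive when a single payload fails.
            print(f"webhook processing failed: {exc}")
        finally:
            jobs.task_done()


def start_worker(process) -> threading.Thread:
    thread = threading.Thread(target=worker, args=(process,), daemon=True)
    thread.start()
    return thread
```

With this layout, `jobs.qsize()` gives an approximate queue depth that you can export as a metric. An in-process queue loses pending work on restart, so a durable queue is the safer choice in production.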
Webhook arriving but not processing correctly
Symptoms: Request received but actions not happening
Debugging steps:
- Log the full webhook payload
- Verify payload structure matches expectations
- Check event type handling in your switch/if statements
- Verify downstream services (cache, database) are accessible
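Unhandled event types are easy to miss in a long if/else chain. The following sketch uses a dispatch table that reports unknown events explicitly. The payload shape (`data` plus `webhook.event`) and the event name `post.published` are assumptions here; confirm them against a payload you have logged.

```python
from typing import Callable, Dict

handlers: Dict[str, Callable[[dict], str]] = {}


def on(event: str):
    """Decorator that registers a handler for one event type."""
    def register(fn):
        handlers[event] = fn
        return fn
    return register


@on("post.published")  # assumed event name
def post_published(data: dict) -> str:
    return f"refresh post {data.get('id')}"


def dispatch(payload: dict) -> str:
    # Assumed payload shape: {"data": {...}, "webhook": {"event": "..."}}
    event = payload.get("webhook", {}).get("event")
    handler = handlers.get(event)
    if handler is None:
        # Report unhandled events instead of silently dropping them.
        return f"unhandled event: {event!r}"
    return handler(payload.get("data", {}))
```

Logging the "unhandled event" result makes a missing branch visible the first time that event arrives.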
Duplicate webhook deliveries
Symptoms: Same webhook processed multiple times
Debugging steps:
- Check if your endpoint is responding too slowly (causing retries)
- Look for timeout errors in logs
- Verify idempotency implementation
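A minimal idempotency sketch, assuming that two byte-identical payloads represent the same delivery. That assumption is a simplification, and the `Deduplicator` class is hypothetical; if your payloads carry a delivery ID or timestamp, key on that instead.

```python
import hashlib
import json


class Deduplicator:
    """Remember which deliveries were already handled (in-memory sketch)."""

    def __init__(self):
        self.seen = set()

    @staticmethod
    def key(payload: dict) -> str:
        # Assumption: identical payloads mean the same delivery.
        canonical = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()

    def first_time(self, payload: dict) -> bool:
        """Return True only the first time a given payload is seen."""
        k = self.key(payload)
        if k in self.seen:
            return False
        self.seen.add(k)
        return True
```

In a multi-process deployment the `seen` set would need to live in a shared store with an expiry, otherwise a legitimate later republish of the same content would also be skipped.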
Content not updating after webhook
Symptoms: Webhook processes but content remains stale
Debugging steps:
- Verify cache invalidation is actually running
- Check CDN cache headers and TTLs
- Confirm the correct cache keys are being cleared
- Test by manually clearing cache
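Stale content is often caused by the invalidation path building a different key than the read path. The sketch below, with a hypothetical `ContentCache` class and key format, shows one way to make a missed key visible by returning and printing whether anything was removed.

```python
class ContentCache:
    """Tiny in-memory cache used to illustrate key-based invalidation."""

    def __init__(self):
        self.store = {}

    @staticmethod
    def key_for(content_type: str, slug: str) -> str:
        # Reads and invalidations must build keys in exactly the same way.
        return f"{content_type}:{slug}"

    def set(self, content_type: str, slug: str, value) -> None:
        self.store[self.key_for(content_type, slug)] = value

    def invalidate(self, content_type: str, slug: str) -> bool:
        """Remove an entry; return False when the key was not present."""
        key = self.key_for(content_type, slug)
        removed = key in self.store
        self.store.pop(key, None)
        print(f"cache invalidate key={key} removed={removed}")
        return removed
```

A steady stream of `removed=False` lines for content you know is cached suggests the webhook handler and the read path disagree on the key.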
Debug mode
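One possible shape for a debug switch is an environment variable that turns on a full dump of headers and body. The variable name `WEBHOOK_DEBUG` and the function `debug_dump` are hypothetical.

```python
import json
import logging
import os

logger = logging.getLogger("webhooks")

# Hypothetical switch: set WEBHOOK_DEBUG=1 in development only.
DEBUG = os.environ.get("WEBHOOK_DEBUG") == "1"


def debug_dump(headers: dict, body: bytes, enabled: bool = DEBUG) -> str:
    """Log and return a readable dump of the request when debugging is on."""
    if not enabled:
        return ""
    try:
        pretty = json.dumps(json.loads(body), indent=2, sort_keys=True)
    except ValueError:
        pretty = repr(body)  # not valid JSON: show the raw bytes instead
    dump = f"headers={dict(headers)}\nbody={pretty}"
    logger.debug(dump)
    return dump
```

Full payloads and headers can contain secrets, so keep this switch off in production.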
Enable debug mode during development to see full payload details.
Testing webhooks locally
Using ngrok
Expose your local server to receive webhooks during development. ngrok gives you a public URL such as https://abc123.ngrok.io that forwards to your local server.
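If you do not yet have a local server to expose, a minimal receiver can be sketched with Python's standard library. The port `8000` and the names `WebhookHandler` and `make_server` are assumptions for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body or b"{}")
        except ValueError:
            # Malformed JSON: reject the request.
            self.send_response(400)
            self.end_headers()
            return
        print("received webhook:", payload)
        self.send_response(200)
        self.end_headers()


def make_server(port: int = 8000) -> HTTPServer:
    """Build a local receiver; call serve_forever() on the result to run it."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

Run `make_server().serve_forever()` in one terminal and `ngrok http 8000` in another, then use the forwarding URL that ngrok prints as the webhook target.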
Using webhook testing services
For quick testing without setting up your own endpoint, use a service such as Webhook.site.
Simulating webhooks
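A simulated payload lets you exercise your handler without waiting for a real event. In this sketch the payload shape (`data.id`, `webhook.event`, `webhook.target`) is an assumption to verify against a real logged delivery, and `handle` is a stand-in for your own handler.

```python
import json


def make_payload(event: str, content_id: str) -> bytes:
    """Build a simulated webhook body (assumed shape, see the note above)."""
    return json.dumps({
        "data": {"id": content_id},
        "webhook": {
            "event": event,
            "target": "http://localhost:8000/webhooks/buttercms",
        },
    }).encode("utf-8")


def handle(body: bytes) -> int:
    """Stand-in handler: return the HTTP status it would respond with."""
    try:
        payload = json.loads(body)
        event = payload["webhook"]["event"]
    except (ValueError, KeyError, TypeError):
        return 400
    print(f"handling {event}")
    return 200
```

Feeding the handler malformed input as well, such as `handle(b"not json")`, checks that bad requests are rejected rather than crashing the process.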
Test your handler with simulated webhook payloads.
Alerting
Setting up alerts
Configure alerts to notify you of webhook issues.
PagerDuty Integration
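A hedged sketch of triggering an incident through the PagerDuty Events API v2. The routing key comes from your own PagerDuty service integration; the function names and the `source` value are hypothetical.

```python
import json
import urllib.request

PAGERDUTY_URL = "https://events.pagerduty.com/v2/enqueue"


def build_pagerduty_event(routing_key: str, summary: str, severity: str = "critical") -> dict:
    """Build a trigger event for the PagerDuty Events API v2."""
    if severity not in {"critical", "error", "warning", "info"}:
        raise ValueError(f"unsupported severity: {severity}")
    return {
        "routing_key": routing_key,       # integration key of your service
        "event_action": "trigger",
        "payload": {
            "summary": summary,
            "source": "webhook-handler",  # hypothetical source name
            "severity": severity,
        },
    }


def send_pagerduty_event(event: dict) -> int:
    """POST the event and return the HTTP status code."""
    request = urllib.request.Request(
        PAGERDUTY_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping payload construction separate from sending makes the alert content easy to test without network access.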
Slack Notifications
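A minimal sketch for posting an alert to a Slack incoming webhook, which accepts a JSON body with a `text` field. The webhook URL is issued by your own Slack workspace; the function names and message format are assumptions.

```python
import json
import urllib.request


def build_slack_message(condition: str, severity: str) -> dict:
    """Build the JSON body for a Slack incoming webhook."""
    return {"text": f":warning: [{severity.upper()}] Webhook alert: {condition}"}


def send_slack_message(webhook_url: str, message: dict) -> int:
    """POST the message to Slack and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Treat the webhook URL as a secret and load it from configuration rather than hard-coding it.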
Alert conditions
Set up alerts for these conditions:

| Condition | Severity | Action |
|---|---|---|
| Error rate > 5% | Warning | Investigate within 1 hour |
| Error rate > 20% | Critical | Immediate investigation |
| Endpoint down > 5 min | Critical | Immediate response |
| Processing time > 20s | Warning | Optimize handler |
| Queue depth increasing | Warning | Scale processing |
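The conditions in this table can be sketched as a single evaluation function. The function name and its inputs are hypothetical, and "queue depth increasing" is interpreted here as at least three strictly rising samples, which is one assumption among several reasonable ones.

```python
from typing import List, Tuple


def evaluate(error_rate: float, down_minutes: float, processing_s: float,
             queue_depths: List[int]) -> List[Tuple[str, str]]:
    """Return the (condition, severity) pairs that currently apply."""
    alerts = []
    # Report only the more severe of the two error-rate conditions.
    if error_rate > 20:
        alerts.append(("Error rate > 20%", "critical"))
    elif error_rate > 5:
        alerts.append(("Error rate > 5%", "warning"))
    if down_minutes > 5:
        alerts.append(("Endpoint down > 5 min", "critical"))
    if processing_s > 20:
        alerts.append(("Processing time > 20s", "warning"))
    # Assumption: "increasing" means every sample is larger than the previous one.
    rising = all(a < b for a, b in zip(queue_depths, queue_depths[1:]))
    if len(queue_depths) >= 3 and rising:
        alerts.append(("Queue depth increasing", "warning"))
    return alerts
```

Each returned pair can then be routed by severity, for example critical alerts to a pager and warnings to a chat channel.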
Dashboard examples
Grafana dashboard
Create a dashboard to visualize webhook health.
Key dashboard panels
- Real-time success rate - Shows current webhook health
- Latency percentiles - P50, P95, P99 processing times
- Events by type - Distribution of webhook events
- Error breakdown - Categorized errors for debugging
- Processing queue depth - For async processing systems
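For the latency percentiles panel, the values can be computed from recorded processing times. This sketch uses `statistics.quantiles` from the Python standard library (Python 3.8 or later); the function name and the millisecond unit are assumptions.

```python
import statistics
from typing import Dict, List


def latency_percentiles(durations_ms: List[float]) -> Dict[str, float]:
    """Return P50, P95, and P99 of the given processing times."""
    if len(durations_ms) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(durations_ms, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}
```

Percentiles are more informative than an average here, because a small number of very slow requests can cause retries while the mean still looks healthy.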
Troubleshooting checklist
Use this checklist when debugging webhook issues.
Pre-flight checks
- Webhook enabled in ButterCMS settings
- Correct endpoint URL configured
- Correct event types selected
- Endpoint is publicly accessible
- SSL certificate is valid (for HTTPS)
Endpoint checks
- Server is running and healthy
- Endpoint accepts POST requests
- Endpoint returns 2xx status codes
- Response time is under 30 seconds
- Authentication is correctly configured
Processing checks
- Payload is being parsed correctly
- Event type is being handled
- Downstream services are accessible
- No exceptions being thrown
- Idempotency is implemented
Infrastructure checks
- Firewall allows inbound traffic
- Load balancer is routing correctly
- DNS is resolving correctly
- No rate limiting blocking requests
- Sufficient server resources