Bluejay helps engineering teams continuously audit alerts on infrastructure and application resources and automatically set up alerting best practices, leading to a 30% reduction in downtime. When an alert is triggered, Bluejay provides an AI-powered runbook to help engineers debug and resolve the issue faster.
Monitoring tools (like AWS CloudWatch and New Relic) collect infrastructure and application performance metrics, but alerts on these metrics have to be set up manually.
Current incident management tools come into play only after an alert is triggered. But if the alerts are not set up in the first place, then everything downstream becomes irrelevant.
So for us, incident management means reducing the likelihood of incidents rather than reacting to them. This requires building strong alerting mechanisms that eliminate manual work for engineers. It is also crucial to continually optimize thresholds to avoid alert fatigue for on-call engineers. None of today's incident management tools does this.
We believe incident management should not start after the outage. It has to start with comprehensive alerts and good on-call processes that prevent production incidents from happening.
Bluejay does exactly that:
- Identifies missing alerts on both infrastructure and application services using your existing monitoring tools
- Automates the setup and deployment of alerts with a single click
- Notifies on-call engineers via email, Slack, and phone when an alert is triggered
- Provides contextual instructions alongside the notification to debug, resolve, and mitigate the issue
- Continuously analyzes alerts and optimizes thresholds to detect potential incidents and prevent alert fatigue
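
To make the first step concrete, here is a minimal sketch of what an audit for missing alerts could look like. It is not Bluejay's actual implementation: the `RECOMMENDED_ALERTS` baseline, the `Resource` type, the `find_missing_alerts` function, and the sample resource names are all hypothetical, and in practice the inputs would come from your monitoring tool rather than hard-coded dictionaries.

```python
from dataclasses import dataclass

# Hypothetical baseline: metrics each resource type should have an alert on.
# These entries are illustrative assumptions, not Bluejay's actual rules.
RECOMMENDED_ALERTS = {
    "database": {"cpu_utilization", "free_storage_space", "connection_count"},
    "load_balancer": {"5xx_error_rate", "target_response_time"},
}


@dataclass(frozen=True)
class Resource:
    name: str
    kind: str  # key into RECOMMENDED_ALERTS


def find_missing_alerts(resources, existing_alerts):
    """Return {resource name: sorted recommended metrics that have no alert}.

    existing_alerts maps a resource name to the set of metrics that already
    have an alert configured in the monitoring tool.
    """
    missing = {}
    for resource in resources:
        recommended = RECOMMENDED_ALERTS.get(resource.kind, set())
        gaps = recommended - existing_alerts.get(resource.name, set())
        if gaps:
            missing[resource.name] = sorted(gaps)
    return missing


# Sample inventory (assumed data for illustration only).
resources = [
    Resource("orders-db", "database"),
    Resource("public-lb", "load_balancer"),
]
existing = {
    "orders-db": {"cpu_utilization"},
    "public-lb": {"5xx_error_rate", "target_response_time"},
}

print(find_missing_alerts(resources, existing))
# → {'orders-db': ['connection_count', 'free_storage_space']}
```

The audit is a plain set difference per resource, so the same comparison can be rerun on a schedule to catch newly created resources that have no alerts yet.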