Features

Everything you need to fix your incident, right when you need it the most!

Auto Diagnostics

Correlation Analysis

Auto Remediation

Incident Tracking

Incident Analytics

Integrations


Alert intelligence


Neptune interprets your alert, dynamically queries your APM, logging and graphing tools for more information, and presents it to you along with the alert.


Default aggregated diagnostics


Don't scramble across 10 different tools to make sense of your alert. Neptune aggregates key metrics that you look for 80% of the time and makes them available along with the alert.


Custom diagnostics


Everyone's environment is different. Neptune supports custom actions to suit your needs. You can execute:
  • Runbooks in any language
  • Graph snapshots
  • Log checks
  • Custom health checks
  • Third-party status checks


Noise suppression


Alert noise is a big problem. Now, with auto-diagnostics, you have the data to confidently snooze or suppress an alert.


Industry best practice runbooks


Learn how your peers are solving similar alerts and benefit from our continuously updated best practice runbooks and action templates.


Snapshots for developers


After fixing the alert, you can easily share heap dumps, stack traces, log errors and graph snapshots with your developers so that they can fix the root-cause issues permanently.


Correlate alerts across apps, hosts and events


Neptune dynamically finds other alerts that are happening on the same host or app and are closely linked to the incident you are trying to fix.
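
As a rough illustration of this kind of correlation (not Neptune's actual implementation), the sketch below picks out alerts that share a host or app with the incident and fired close to it in time. The Alert fields, the related_alerts helper and the 15-minute window are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape; real alert payloads will differ.
    name: str
    host: str
    app: str
    fired_at: datetime


def related_alerts(incident, alerts, window=timedelta(minutes=15)):
    """Return alerts on the same host or app that fired within `window` of the incident."""
    matches = []
    for alert in alerts:
        if alert == incident:
            continue  # skip the incident itself
        same_place = alert.host == incident.host or alert.app == incident.app
        close_in_time = abs(alert.fired_at - incident.fired_at) <= window
        if same_place and close_in_time:
            matches.append(alert)
    # Closest in time first, since those are the likeliest to be linked.
    return sorted(matches, key=lambda a: abs(a.fired_at - incident.fired_at))
```

A real relevancy score would likely weigh more signals than host, app and time; this only shows the basic filter.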


Relevancy factors


Neptune applies its machine learning heuristics to attach a relevancy factor to the alert so that you can get to the root cause in minutes instead of hours.


Temporal analysis


One root-cause alert can cause an avalanche of other peripheral alerts. A quick temporal and clustering analysis will help you find the needle in the haystack.
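
One simple way to picture a temporal analysis is to group alerts into bursts separated by quiet periods and look at the first alert in each burst. The sketch below does that; the function names and the two-minute gap are assumptions for illustration, and the technique is a plain gap-based grouping rather than whatever Neptune uses internally.

```python
from datetime import datetime, timedelta


def cluster_by_time(timestamps, max_gap=timedelta(minutes=2)):
    """Group timestamps into bursts; a gap larger than `max_gap` starts a new burst."""
    clusters = []
    for ts in sorted(timestamps):
        if clusters and ts - clusters[-1][-1] <= max_gap:
            clusters[-1].append(ts)  # continues the current burst
        else:
            clusters.append([ts])  # quiet period ended: start a new burst
    return clusters


def likely_root_causes(timestamps, max_gap=timedelta(minutes=2)):
    """The earliest alert in each burst is a candidate root cause, not a proven one."""
    return [cluster[0] for cluster in cluster_by_time(timestamps, max_gap)]
```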


Relevant incident history


Learn how your teammates fixed the same incident earlier. Neptune quickly shows you the entire incident history, past collaboration on the same incident and who did what previously.

Fix simple and mundane alerts automatically


You won't believe how frequently disk, memory and CPU alerts trigger. You can simply restart processes, reboot hosts or scale up your capacity automatically. Why wake up when you can automate those alerts? You have better things to do, don't you?


Execute a script on your hosts


You can execute any script (shell, Ruby, Python, etc.) on a single host or a cluster of hosts. Neptune will intelligently figure out the trigger host. If it's an application alert, Neptune will figure out all the hosts running the application and execute the script on all of them in a staggered fashion.
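
To make "staggered" concrete, here is a minimal sketch that splits hosts into batches and runs a callable on one batch at a time. The helper names, the batch size and the stop-on-failure rule are assumptions for the example; the page does not describe how Neptune schedules the batches.

```python
def staggered_batches(hosts, batch_size=1):
    """Split hosts into consecutive batches so only `batch_size` hosts are touched at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [hosts[i:i + batch_size] for i in range(0, len(hosts), batch_size)]


def run_staggered(hosts, run_on_host, batch_size=1):
    """Run `run_on_host` batch by batch and stop after the first batch with a failure."""
    completed = []
    for batch in staggered_batches(hosts, batch_size):
        results = [run_on_host(host) for host in batch]
        completed.extend(batch)
        if not all(results):
            break  # do not roll a failing script out to the remaining hosts
    return completed
```

Here run_on_host stands in for whatever actually executes the script on a host and returns True on success.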

Run a cloud API action


Start, stop, reboot or terminate your instances and VMs in your clouds in response to any alert. You can also do rolling reboots on a schedule by selecting CRON as the trigger instead of an alert.


Run a cloud CLI action


You can run any cloud CLI action in response to your alert. For example:

aws ec2 start-instances --instance-ids i-23ad323c

It's very powerful, and you can leverage the full power of your CLI in an event-driven fashion. The best part is that you do not need to install any agents.


Track all incidents by severity and engagement


See incidents sorted by severity, source and most activity or engagement. Then, you can quickly figure out which incidents need your attention the most, and which ones to prioritize.


Know everything about your incident in a one-stop report


Get all the details about your incident: where it came from, which host or app it triggered on, the alert JSON and, most importantly, the relevant graphs and context to quickly triage your incident.


See correlation analysis and history in one page


Neptune intelligently tells you what other alerts and events are happening on the same app or hosts that are relevant to fixing the current incident. You can also learn from what your colleagues did to fix the same incident before.


See who did what and when to resolve the incident


Track your entire incident timeline in one place. It will help you conduct a data-driven post-mortem to ensure the incident doesn't repeat and that you deliver all the required history and diagnostics to your developers for a permanent fix.


Find out your troublesome alerts


Neptune continuously tracks your top alerts by count and MTTR and keeps you informed, so that you can identify the top 20% of issues that cause 80% of your alerts.
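
The idea of ranking alerts by count and MTTR can be sketched in a few lines. The input shape (pairs of alert name and minutes to resolve), the helper names and the 80% cut-off are assumptions for this example, not Neptune's data model.

```python
from collections import defaultdict


def alert_stats(incidents):
    """Summarise (alert_name, minutes_to_resolve) pairs as count and mean time to resolve."""
    durations = defaultdict(list)
    for name, minutes in incidents:
        durations[name].append(minutes)
    stats = [
        {"alert": name, "count": len(mins), "mttr_minutes": sum(mins) / len(mins)}
        for name, mins in durations.items()
    ]
    # Most frequent first; ties broken by the slowest to resolve.
    return sorted(stats, key=lambda s: (-s["count"], -s["mttr_minutes"]))


def top_offenders(incidents, share=0.8):
    """Return the smallest set of top alerts that covers `share` of all incidents."""
    stats = alert_stats(incidents)
    total = sum(s["count"] for s in stats)
    covered, picked = 0, []
    for s in stats:
        if covered >= share * total:
            break
        picked.append(s["alert"])
        covered += s["count"]
    return picked
```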


Monitoring and Alerting tools


Neptune seamlessly integrates with all your existing monitoring tools. It's as simple as giving an API key and adding a Neptune webhook so that your alerts will start flowing into Neptune.





Integrate with webhooks


Don't see your monitoring tool above? No worries. You can use the Neptune webhook to send any custom alert. Also, send us a note about the missing integration and we can quickly add your monitoring tool.
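
Sending a custom alert to a webhook usually amounts to an HTTP POST with a JSON body. The sketch below builds such a request with Python's standard library without sending it. The URL is a placeholder and the payload field names are assumptions; the real webhook address and schema come from your Neptune account.

```python
import json
import urllib.request


def build_alert_request(webhook_url, title, host, severity="critical"):
    """Build (but do not send) a JSON POST carrying a custom alert."""
    # Field names are illustrative assumptions, not Neptune's schema.
    payload = {"title": title, "host": host, "severity": severity}
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL; substitute the webhook address from your own account.
request = build_alert_request(
    "https://example.invalid/hooks/custom", "Queue depth high", "worker-3"
)
# urllib.request.urlopen(request) would send it; left out so the sketch has no side effects.
```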


Logging tools


Neptune integrates with your logging tools so that you can easily grep for errors in your logs and run your custom saved searches in response to an alert or an event. You can also use an alert from a logging tool as a trigger. For example, when Papertrail alerts you to an R14 memory quota error, you can restart the Heroku dyno that was running out of memory.
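
The Papertrail example boils down to a rule that maps a log alert to a command. A minimal sketch, assuming the alert arrives as plain text and that the app name is known; the remediation_for helper is hypothetical, and the command it returns is the standard Heroku CLI restart.

```python
def remediation_for(log_alert_text, app_name):
    """Map a log alert to a CLI command (as an argument list), or None if no rule matches."""
    # Heroku logs memory quota problems as "Error R14 (Memory quota exceeded)".
    if "R14" in log_alert_text:
        return ["heroku", "ps:restart", "--app", app_name]
    return None
```

Returning an argument list rather than a shell string means the command can be passed straight to subprocess.run without shell quoting concerns.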














Infrastructure: both cloud and on-prem


Neptune works on all clouds and on-prem servers. You can execute any script with our agent. If you prefer not to install agents, you can run any API or CLI action supported by your cloud. For example, you can restart Heroku dynos or bring up more dynos using simple Heroku CLI commands like this:

heroku ps:restart --app my_heroku_app_name 


We built an incident response automation platform for AWS. Now, we are bringing it to everyone.

Start your free trial or book a demo slot