VictorOps: Making On-Call Suck Less
In today’s digital world, system failures, outages, and website glitches are not an option, and because information now moves almost instantly, fixes for those outages have to be nearly instantaneous as well. But what does this mean for the IT teams responsible for troubleshooting the incidents? It means long on-call hours, alerts and pages at all hours, and almost constant availability. Between waking up in the middle of the night, receiving pages at dinner, and having daily life interrupted for hours at a time, the life of the IT fix-it suddenly becomes less than glamorous. Enter VictorOps. Featuring on-call management, incident notifications, and live infrastructure timelines, VictorOps is the start-up with the software “to help make on-call suck less” and get IT fix-its back to bed and back to their lives.
Founded in 2012 by serial entrepreneur Todd Vernon, VictorOps strives to help businesses’ IT systems and their on-call teams troubleshoot and solve problems quickly and efficiently, from any location at any given moment. Having run into the issues of the on-call world himself, Todd was all too familiar with the difficulties of the job and wanted to create software that would improve the quality of life of on-call IT staff. Thus VictorOps was born.
So the burning question remains: how does this work?
VictorOps works across multiple platforms, whether it’s mobile phones, computers, or existing monitoring systems, to watch infrastructure while also handling people management. It notifies the right person and the right teams responsible for a problem so they can remediate it, which ensures that only the members needed for the incident are alerted. Through user-specific Notification Policies, alerts can travel through a variety of channels (SMS, push, phone, and email) depending on the type of alert and the settings you have chosen. Timing is configurable as well: Flexible Escalation policies help ensure that specific issues are dealt with within a designated time frame.
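The idea of notification and escalation policies described above can be pictured with a small sketch. This is not VictorOps's actual configuration format or API; the `Step` class, the `due_notifications` function, the user names, and the timings are all hypothetical and chosen only for illustration. The sketch assumes a policy is an ordered list of steps, each saying who is notified, through which channel, and after how many minutes an alert has gone unacknowledged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hypothetical escalation step (not a real VictorOps object)."""
    after_minutes: int  # minutes an alert may stay unacknowledged before this step fires
    target: str         # illustrative user or role name
    channel: str        # "push", "sms", "phone", or "email"


def due_notifications(policy, minutes_unacked):
    """Return the steps that should have fired by now, earliest first."""
    fired = [s for s in policy if s.after_minutes <= minutes_unacked]
    return sorted(fired, key=lambda s: s.after_minutes)


# Illustrative policy: push immediately, SMS after 5 minutes,
# then escalate to a second person by phone after 15 minutes.
database_policy = [
    Step(0, "primary_on_call", "push"),
    Step(5, "primary_on_call", "sms"),
    Step(15, "secondary_on_call", "phone"),
]

# Which notifications have gone out for an alert left unacknowledged for 6 minutes?
for step in due_notifications(database_policy, 6):
    print(f"{step.after_minutes:>2} min: notify {step.target} via {step.channel}")
```

In this sketch, acknowledging the alert would simply stop the clock, so later steps never fire; that is the sense in which an escalation policy keeps an issue from sitting unattended past its time frame.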
The live timeline is another key feature of VictorOps software. This timeline pulls together all information pertinent to an issue, as well as the chats between members working on the outage. Incoming team members can immediately get up to speed on the incident: what’s being done to resolve it, which steps are working, and which are not. The timeline also shows all other alerts going off in the system, some of which could be contributing to the problem you are trying to fix. With this information in one place, you can solve the problem faster and more easily.
With features such as live chat, the software also emphasizes team connectivity as an aid to problem solving. Other members of the IT team are easily reachable and can be pulled into what VictorOps calls ‘the firefight’ to help fix the outage, using tools like @messaging.
The VictorOps Transmogrifier and Post Mortem tool are perhaps two of the most valuable features of the software. Both capture the knowledge in people’s heads that was used to solve a glitch and make it accessible during future outages, so you have a better shot at solving the problem faster. The Transmogrifier allows users to annotate their troubleshooting process while they are fixing the issue as well as after it has been resolved. Team members can then store this information by attaching it to the alert for the incident. This documentation makes similar outages in the future smoother and quicker to resolve, because the alerts now arrive with solutions and suggestions attached.
The Post Mortem tool is another form of continuous documentation. It allows you to look more closely at specific incidents along the timeline, for example to analyze how they were resolved, and to save or print the results for future use. This enables users to go back and see exactly what was done to solve previous problems, so that solving them in the future becomes easier. Post Mortem documents are also a useful way for companies to show their clients exactly what steps their IT team took toward resolution during a critical outage.
VictorOps offers two editions of its software: Free and Basic. The Free Forever Edition includes up to 10 users, a single push notification to those users whenever there is an incident, and the live infrastructure timeline. The Basic Edition offers unlimited users and unlimited push, SMS, and phone notifications during an incident, as well as the following features: live infrastructure timeline, automated on-call scheduling, one-touch on-call handoffs, custom notification policies, custom escalation policies, mobile incident ack-ing, and a t-shirt. A free 14-day trial is also available to businesses that are interested in VictorOps but aren’t ready to fully commit.
CEO Todd Vernon understands that it’s inevitable that someone has to solve problems at any hour of the day or night. Someone will always be responsible, and there is no getting around it. However, he believes the company’s tagline resonates with users, who know there is an element of their job that sucks. If his company can promise them that it will suck less, and that being on-call can be drastically improved, then he has created a brand promise his company can keep.
VictorOps is currently made up of thirty passionate team members working hard and having fun in the heart of Boulder, Colorado. The company is always growing and looking for great employees to add to the team!