Incident Manager - Information Technology
Job Description
The Application Operations (AO) organization’s overall responsibility is to ensure the availability of more than 100 applications used nationwide by the organization’s exclusive agents, independent agents, customers, and customer service advocates.
Team members are responsible for proactively increasing system availability, process efficiency, and the overall quality of the end-user experience.
Team members work with IT and Business Partners across the company’s landscape.
Job Responsibilities
- Be a contributing member of the Incident Management team within Application Operations, focusing on support activities for agent-, customer-, and employee-facing applications.
- Manage the lifecycle of production incidents, restoring services as quickly as possible when interruptions occur.
- Apply and enforce ITSM / ITIL standards for the incident management process (identification, logging, categorization, prioritization, response, diagnosis, escalation, resolution, and closure).
- Work with the Change Management and Problem Management teams to proactively reduce issue occurrence / recurrence.
- Work with the Problem Management team to continually improve runbooks and other technical documentation used during the incident management process.
- Work closely with the other Application Operations Teams as well as with the Application Development, Infrastructure Operations, Information Security, and Business teams, as needed.
- Drive high-severity incident calls. Address and resolve problems encountered by users, whether they require a quick fix or a major collaborative effort across various departments. This includes providing clear, regular status updates and metrics for ongoing issues.
- Create / maintain / update documentation for tracking issues, errors, application changes, infrastructure-related changes, incident resolution, etc.
- Learn and master the functionality of key applications across the company landscape and their underlying processes.
- Drive instrumentation, automation, and process improvements.
- Collect impact and other data during the incident resolution process, draft initial RCA documentation, and provide an effective handoff to the Problem Management team. As needed, participate in follow-up activities to identify the root cause, update knowledge resources, and implement preventive measures.
Experience
- Problem-solving / analytical skills: Must excel at performing detailed analysis and resolving user problems of varying magnitude.
- Technical expertise: Must demonstrate the ability to understand large-scale IT ecosystems and to associate business processes with the IT systems that support them. Should have at least one strong area of technical expertise, such as software development, infrastructure, telephony, or databases.
- Multitasking: Experience handling technical issues from multiple customers at the same time and assessing / prioritizing the associated impact.
- Communications: Excellent verbal and written communication, including emails, reports, presentations, etc. Must have the ability to calmly focus a group of resources on the tasks required to recover business services when they are down.
- Customer service: Must be able to communicate effectively with customers and co-workers at varying levels within an organization.
- Process improvements: Experience identifying and driving process improvements that result in efficiency gains and / or an improved customer experience.
Required Education
- Bachelor’s degree in computer science, engineering, mathematics, or a related discipline, OR commensurate experience.