Manager, IP4G Platform Operations (remote)
Converge Technology Solutions
2024-11-05 08:40:17
Atlanta, Georgia, United States
Job type: Full-time
Job industry: Other
Job description
Job Summary:
This role is full-time and permanent with Converge Technology Solutions. We are searching for a detail-oriented and analytical Platform Operations Manager to oversee daily operations of the Converge IBM Power for Google Cloud (IP4G) platform. You will lead a team of cloud engineers, supporting customers through onboarding, incident management, and build activities, while collaborating with Product, Engineering, and Development teams to ensure operational goals align with customer outcomes. We are experiencing explosive growth and invest heavily in our team members.
What you will accomplish:
First 30 Days:
- Familiarize yourself with the organization's strategy, objectives, and priorities
- Build relationships with the cloud operations team, engineers, and other cross-functional teams to understand their challenges, strengths, and priorities
- Identify and document current operational processes, key workloads, and performance metrics (uptime, incident response times, etc.)
- Identify and document any urgent risks or critical performance issues that need attention
First 60 Days:
- Address any high-priority issues or risks identified during the first 30 days
- Define key performance indicators (KPIs) for cloud operations, e.g., uptime/availability, incident response metrics, status of internal initiatives, cost optimization, and team performance metrics
- Aggregate relevant data sources to establish reporting mechanisms that support the business-critical KPIs
- Begin assessing team members' skills and identifying any gaps or training needs
First 90 Days:
- Create a detailed strategy for cloud operations that aligns with the overall business goals and organizational imperatives
- Ensure the team is aligned with the new processes and that they are empowered to take ownership of critical areas
- Establish a regular reporting structure to keep stakeholders informed on cloud operations team performance, Customer satisfaction, cost, SLO/SLA adherence, project status, and other items as identified
- Establish individual development plans, aligned to organizational MBOs and strategic business objectives
Key Responsibilities:
- Oversee the response to and resolution of platform- and Customer-impacting incidents, minimizing service disruptions and downtime, with an emphasis on clear, concise, and structured Customer communications
- Optimize and streamline incident response processes, tooling, and capabilities
- Establish and oversee a capacity management and planning framework that considers both the technical and commercial aspects of platform scaling
- Curate detailed and actionable operational and capacity data visualizations for stakeholders, including executive leadership and technical audiences
- Manage adherence to security controls within the platform environment, including SOC, PCI-DSS, and others as appropriate
- Collaborate with Product Development and Platform Engineering teams to support platform deployments and ensure frictionless operations and maintenance activities
- Lead process optimization efforts to drive efficiency, reduce manual intervention, and enhance the Customer experience through automation and orchestration
Qualifications:
- 7+ years of experience in cloud operations, infrastructure management, or a related IT operations role
- Proven leadership experience managing cloud teams and complex multi-disciplinary projects
- Demonstrated ability to lead teams of cloud engineers and operations staff with a focus on developing technical skills and operational excellence
- Experience managing cross-functional teams and working closely with developers, product owners, and business stakeholders
- Proven track record of managing large-scale cloud migrations or multi-cloud environments
- Experience working in a managed or cloud service provider business
- Proven experience in cost management and financial optimization in cloud environments
- Strong incident management and troubleshooting skills
- Experience with monitoring and observability tools like CloudWatch, Datadog, Prometheus, Grafana, or similar
Work Environment:
- Remote within the United States
Total Rewards:
- We offer a comprehensive total rewards package that includes base salary, healthcare benefits, 401k match, stock match program, PTO/holidays, training/development, promotional opportunities, and so much more.
Converge Technology Solutions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws. This policy applies to all terms and conditions of employment, including recruiting, hiring, placement, promotion, termination, layoff, recall, transfer, leaves of absence, compensation and training.