Senior Monitoring Engineer
Posted on: April 5, 2021
At Altais, we're looking for bold and curious innovators who
share our passion for enabling better healthcare experiences and
revolutionizing the healthcare system for physicians, patients, and
the clinical community. Doctors today are faced with the reality of
spending more time on administrative tasks than caring for
patients. Physician burnout and fatigue are an epidemic, and the
healthcare experience and quality suffer as a result. At Altais,
we're building breakthrough clinical support tools, technology, and
services to let doctors do what they do best: care for people. Come
join us as an early member of our passionate and growing team as we
change the game for the future of healthcare and enable the
experience that people need and deserve.
Description
Do you enjoy working with a highly motivated and
talented team to deliver mission-critical healthcare solutions that
change the way healthcare is delivered? Altais is growing our Site
Reliability Engineering team to help deploy, manage, troubleshoot,
and enhance our complex cloud-based services for our customers.
We are looking for a highly technical, hands-on Monitoring Engineer
with experience using Datadog, Gremlin, Prometheus, and Jaeger. You
will be managing our Kubernetes Lifecycle: deployments, upgrades,
monitoring, and uptime of all K8S clusters. You will help to
advance the deployment process of software into Kubernetes with
CodeFresh + Terraform at massive scale. Everything is Code!
Your focus will be on building out our Monitoring infrastructure,
reporting, incident management and business continuity. Team
members all participate in an on-call rotation.
You will build innovative automated solutions and tools to help
debug and resolve problems in production and prevent them from
recurring. Further, you will proactively seek out system weaknesses
and find ways to fix them before they cause production issues using
monitoring data, watching trends, and using Chaos Engineering.
This position is located in our San Francisco Montgomery Street
location and will move to our brand-new Oakland City Center
location in Fall 2020.
- Accountable for the scalability, resilience, and security of the
Monitoring infrastructure, ensuring delivery of
Monitoring/Automations services
- Provides technical leadership, guidance, and training for the
Monitoring/Automations team, including managing priorities,
execution timelines, and quality of work
- Automating work including infrastructure needs, testing,
- Application and architecture performance and code
- Developing CI/CD processes to improve monitoring cadence
- Working closely with internal partners and teams to ensure that
we ship software that meets security, SLA, and performance
- Debugging complex problems across an entire stack and creating
- Post-incident reviews to find out what's working and what's not,
and improving the process by filling the gaps
- Writing and updating user documentation, including
- Using Chaos Engineering to test what you build under real-world
- Running monthly Chaos Engineering "Game Days"
- 7 years of experience with software engineering, software
development, or system operations
- Expert experience with monitoring, dashboarding, and
observability with Datadog, Prometheus, and Mobile Application
monitoring (iOS and Android)
- Expert experience with building incident management and
integrations between Datadog, Jira Service Management, Ops Genie,
- Experience designing, building, and operating large-scale
production Software-as-a-Service platforms
- In depth practical knowledge of infrastructure management,
access, and monitoring systematic flows
- Production experience with DevOps or site reliability
engineering running web and/or mobile applications
- Excellent communication skills, both verbal and written
- Advanced experience with Terraform
- Hands-on expert experience with the AWS cloud platform
- Experience debugging complex problems, including applications
running on the Kubernetes platform and EC2 instances
- Knows their way around a Unix/Linux shell, can write shell
scripts, and understands Linux internals
- A solid understanding of NodeJS and Java
- Moderate understanding of how databases work, writing queries
to interact with databases, and troubleshooting complex data
layers. Open-source databases (MySQL, Postgres, Redis, Cassandra,
- A solid understanding of networking and core Internet protocols
(e.g. TCP/IP, DNS, SMTP, HTTP, and distributed networks)
- Understands networking and messaging, especially between
- Has hands-on experience using source control (Git, GitHub) and
feature branching strategies
- Has a track record of embedding security into the fabric of an
organization and infrastructure.
You Share our Mission & Values:
- You are passionate about improving the healthcare experience
and want to be part of the Altais mission.
- You are bold and curious, willing to take risks, try new things
and be creative.
- You take pride in your work and are accountable for the quality
of everything you do, holding yourself and others to a high
- You are compassionate and are known as someone who demonstrates
emotional intelligence, considers others when making decisions and
always tries to do the right thing.
- You co-create, knowing that we can be better as a team than
individuals. You work well with others, collaborating and valuing
diversity of thought and perspective.
- You build trust with your colleagues and customers by
demonstrating that you are someone who values honesty and
Keywords: Altais, Oakland, Senior Monitoring Engineer, Engineering, Oakland, California