OpenX, a leading provider of digital and mobile advertising technology, seeks a Senior Google Cloud SRE (Software Release Engineer). OpenX serves 50+ billion requests per day from thousands of servers across a worldwide data center footprint, and we are migrating that infrastructure to Google Cloud Platform (GCP). You will be primarily responsible for the performance, uptime, and growth of various OpenX systems and services on GCP. Experience at large web scale is desirable, though we are willing to train people with the right skills and attitude.
Similar to Google’s approach to SRE, you should adhere to the engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. Much of your software development focuses on optimizing existing systems, building infrastructure and eliminating work through automation.
While the ideal candidate has extensive experience with on-demand burstable virtualized environments hosted on public cloud providers, you should also be comfortable managing physical servers on-premises at scale in the interim. Excellent communication skills are required in order to successfully interact with globally distributed OpenX teams operating in a 24×7 manner. Developing and supporting our infrastructure presents many interesting technical challenges. We especially desire candidates with a passion for open-source software and an interest in the latest technology trends.
- Design, write, and deliver software to implement and support web-scale, highly performant, highly available infrastructure on GCP
- Demonstrate and promote best practices for teams deploying and supporting our infrastructure on GCP
- Monitor infrastructure, respond to incidents, correct and improve systems to prevent incidents, and plan capacity
- Support system deployments and product releases
- Tune large-scale clusters for optimal performance and efficiency
- Participate in on-call rotation
- Work closely with engineering, project management, and operational peers to develop innovative technical tools and solutions
- Identify tactical issues and react to emerging areas of concern
- Adhere to the DevOps philosophy by evangelizing communication, collaboration, and integration with software development teams
- Think long-term and be unsatisfied with band-aids
- Identify unnecessary complexity and remove it
- Extensive experience maintaining a large production infrastructure hosted on GCP, AWS, or equivalent public cloud providers
- Extensive understanding of how to manage public cloud services and tasks, such as: VPC; load balancing; relational and non-relational datastores (e.g., Google Cloud SQL, Memorystore, AWS RDS); storage (e.g., GCS, AWS S3); monitoring (e.g., GCP Stackdriver, AWS CloudWatch, Prometheus); serverless computing (e.g., GCF, AWS Lambda); and auto-scaling
- Solid experience with software development life cycle (SDLC) best practices, such as test-driven development (TDD), algorithms, data structures, complexity analysis, CI/CD, and software design
- Automate tasks that are scalable, maintainable, and repeatable by utilizing APIs and practicing Infrastructure-as-Code through GitOps
- Experience managing large-scale Kubernetes clusters in a containerized microservices environment using Docker
- Develop and manage resource utilization, billing, and optimization processes in GCP
- Analyze performance bottlenecks in our GCP-hosted platform using monitoring data
- Configure and manage security policies, resource auditing, compliance policies, and access controls to resources in GCP
- Develop and operate backup and restoration procedures in GCP
- Strong experience in a SysAdmin, SRE, DevOps, or equivalent role
- Solid knowledge of the *NIX command-line and architecture
- Strong knowledge of core protocols and tech such as: TCP/IP, HTTP, DNS, load balancers, distributed file systems, relational and non-relational datastores
- Automate tasks in at least one language (other than Bash), ideally Python, Go, or Perl
- Demonstrated experience in network and large-scale *NIX system troubleshooting and maintenance practices
- Excellent organizational skills and the ability to work in a fast-paced and hectic work environment
- Capable of technical deep-dives into code, networking, systems, and storage with very bright, experienced engineers
- Must be willing to work some shifts outside of normal business hours, i.e., afternoon/evening, weekends, and holidays
- Humility and integrity
- Solid experience with cloud management platforms, such as Terraform
- Solid experience building big data platforms on GCP with services such as DataProc, BigQuery, and Pub/Sub
- Understanding of programming languages, such as Java, Erlang, C/C++, or others
- Experience with configuration management tools such as SaltStack, Puppet, Chef, or Ansible
- Self-starter with the ability to independently identify and act on areas of improvement
- Knowledge of and interest in the latest system architecture trends
- Ability to rapidly learn and understand new systems
- Ability to communicate effectively and write accurate, clear documentation
Company at a Glance
OpenX is focused on unleashing the full economic potential of digital media companies. We do this by building digital advertising markets and technologies designed to deliver optimal value to publishers and advertisers on every ad served across all screens.
At OpenX, we have built a team that is uniquely experienced in designing and operating high-scale ad marketplaces, and we are constantly on the lookout for thoughtful, creative executors who are as fascinated as we are about finding new ways to apply a blend of market design, technical innovation, operational excellence, and empathetic partner service to the frontiers of digital advertising.
Our five company values form a solid bedrock serving to define us as a group and guide the company. Our values remind us that how we do things often matters as much as what we do.
We are one
One team. No exceptions. We are a group of strong and diverse individuals unified by a clear common purpose.
Our customers define us
We know our business flourishes or dies because of our customers.
OpenX is mine
We are all owners of OpenX. We stake our personal and professional reputations on the excellence of our work.
We are an open book
We are eager to teach and share what we know with others.
We evolve fast
We take risks and confront failure openly. We recognize and repeat success aggressively. We actively seek out and provide constructive criticism. Defensiveness is for weaklings!