Sr Cloud Operations SRE

Kraków, Małopolskie, Poland

OpenX, a leading provider of digital and mobile advertising technology, seeks a Senior Cloud Operations SRE. While OpenX serves hundreds of thousands of requests per second from thousands of servers across a worldwide data center footprint, we are expanding our infrastructure footprint into public clouds, such as AWS and GCP. You will be primarily responsible for the performance, uptime, and growth of various OpenX systems and services on the public cloud. Experience at large scale is desirable, though we are willing to train people with the right skills and attitude.

While the ideal candidate has extensive experience with on-demand burstable virtualized environments hosted on public cloud providers, you should also be comfortable managing physical servers on-premises at scale. Excellent communication skills are required in order to successfully interact with the rest of OpenX Engineering. Developing and supporting our infrastructure presents many interesting technical challenges. We especially desire candidates with a passion for open-source software and an interest in the latest technology trends.

Key Responsibilities

  • Design, implement, and support highly performant, highly available infrastructure on public clouds such as AWS, GCP, or equivalent providers
  • Demonstrate and promote best practices for teams deploying and supporting our infrastructure on public clouds
  • Monitor infrastructure, respond to incidents, correct and improve systems to prevent incidents, and plan capacity
  • Support system deployments and product releases
  • Tune large-scale clusters for optimal performance and efficiency
  • Participate in on-call rotation
  • Work closely with engineering, project management, and operations peers to develop innovative technical tools and solutions
  • Identify tactical issues and react to emerging areas of concern
  • Adhere to the DevOps philosophy by evangelizing communication, collaboration, and integration with software development teams
  • Think long-term and be unsatisfied with band-aids
  • Identify unnecessary complexity and remove it

Required Qualifications

  • Extensive experience maintaining a large production infrastructure hosted on AWS, GCP, or equivalent public cloud providers
  • Extensive understanding of how to manage public cloud services and tasks, such as auto-scaling, load balancing, VPC, and serverless computing (Lambda, GCF)
  • Automate tasks in a scalable, maintainable, and repeatable way by using public cloud providers' APIs
  • Develop and manage resource utilization, billing, and optimization processes in AWS, GCP, or equivalent cloud providers
  • Analyze performance bottlenecks of our platform hosted in AWS, GCP, or equivalent public cloud providers based on monitoring data
  • Configure and manage security policies, resource auditing, compliance policies, and access controls to resources in AWS, GCP, or equivalent public cloud providers
  • Develop and operate backup/restoration procedures in AWS, GCP, or equivalent public cloud providers
  • Strong experience in a SysAdmin, SRE, DevOps, or equivalent role
  • Solid knowledge of the UNIX command-line and architecture
  • Solid knowledge of core protocols and technologies such as TCP/IP, HTTP, DNS, load balancers, distributed file systems, key-value and relational databases
  • Solid experience with configuration management tools such as SaltStack, Puppet, Chef, or Ansible
  • Automate tasks in at least one language (other than Bash), ideally Python, Go, or Perl
  • Demonstrated experience in network and large-scale UNIX system troubleshooting and maintenance practices
  • Excellent organizational skills and the ability to work in a fast-paced and hectic work environment
  • Capable of technical deep-dives into code, networking, systems, and storage with very bright, experienced engineers
  • Must be willing to work some shifts outside of normal business hours, i.e., afternoons/evenings, weekends, and holidays
  • Humility and Integrity

Nice to Have

  • Solid experience using container technology such as Docker and cluster management with Kubernetes
  • Experience with cloud management platforms such as Terraform
  • Experience building cloud data pipelines using EMR, Big Data, Kinesis, or other technology
  • Understanding of programming languages such as Erlang, Java, Go, C/C++, or others is a plus
  • Self-starter with the ability to independently identify and act on areas of improvement
  • Knowledge of and interest in the latest system architecture trends
  • Ability to rapidly learn and understand new systems
  • Ability to communicate effectively and write accurate, clear documentation

OpenX Values

Our five company values form a solid bedrock serving to define us as a group and guide the company. Our values remind us that how we do things often matters as much as what we do.

We are one

One team. No exceptions. We are a group of strong and diverse individuals unified by a clear common purpose.

Our customers define us

We know our business flourishes or dies because of our customers.

OpenX is mine

We are all owners of OpenX. We stake our personal and professional reputations on the excellence of our work.

We are an open book

We are eager to teach and share what we know with others.

We evolve fast

We take risks and confront failure openly. We recognize and repeat success aggressively. We actively seek out and provide constructive criticism. Defensiveness is for weaklings!