Site Reliability Engineer (SRE)

Full-Time
Engineering

UrbanSDK is a modern data analytics and machine learning company working in the exciting "smart city" space. We are a high-growth, venture-backed analytics company focused on delivering great data, tools, applications and solutions that allow modern cities to operate with greater efficiency and insight. Site Reliability Engineering (SRE) combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that our services, both internal and customer-visible, have reliability, usability and uptime appropriate to users' needs, along with a fast rate of improvement. SREs oversee testing and sprint development QA, certify releases, and monitor our application capacity and overall performance.

As the lead SRE, you’ll have the opportunity to implement testing automation, manage QA, and tackle the challenges of scaling a web-based big data platform. We’re an early-stage, venture-backed company, so the role requires self-direction to work on top-priority projects, and we promote a creative environment for engineers to learn and grow. SREs own the full stack, from cross-browser debugging to enhancing performance and maintaining a release schedule.

Responsibilities:

  • Manage reliability for production, staging, test and development environments for our applications
  • Establish and maintain testing procedures and reliability processes
  • Manage the SRE stack (Selenium, Ansible, Spinnaker, etc.)
  • Resolve and develop tickets for the product development team in GitLab
  • Certify bi-weekly sprint releases
  • Build automation by creating tools using Python or bash
  • Provide metrics to improve the stability, security, efficiency and scalability of systems
  • Determine future needs for capacity and investigate new products and/or features


Qualifications:

  • Experience with deploying and managing Jenkins CI/CD pipelines
  • Experience with microservices architecture, containers, and container orchestration, such as Docker, Kubernetes, or another orchestration framework
  • The ability to design, author, and release code using languages and frameworks like Python and React
  • Experience with database technologies (MongoDB, PostgreSQL)
  • Understanding of web applications, the Linux operating system, and standard networking protocols and components
  • Experience with deploying, supporting and monitoring new and existing services, platforms, and application stacks
  • Experience with scale testing, disaster recovery, and capacity planning
  • Strong experience in QA testing and troubleshooting complex production issues
  • Previous experience working with remote teams


UrbanSDK is an equal opportunity employer. 
