Site Reliability Engineer

Seeking a generalist who is strong in solving scalability, performance, security and deployment challenges on cloud infrastructure platforms.


We have a variety of applications and services deployed across a mix of cloud infrastructure platforms, including AWS and Heroku, and in on-premises customer environments. We need someone to wrangle our production applications. Our Engineering and Production Application Support teams have built out the present infrastructure, including programmatic cookbooks for configuration management, but much of our deployment and provisioning process is still manual, and our provisioned resources are not being used efficiently. We would like to monitor production more proactively and resolve problems before our customers are aware of them.

We are looking for someone to evaluate what we have, keep what’s working, and implement changes to what’s not.

What you need:

  • Must have a solid background in programming principles and software design
  • Two to three years of professional experience with one or more of the following: Java, Ruby, Go, Python (or a similar alternative language).
  • Must be familiar with a Unix/Linux shell
  • Experience with AWS or an equivalent infrastructure-as-a-service platform, including managing AWS services (or their equivalents) such as CloudFront, S3, EC2, and RDS.
  • Experience with programmatic infrastructure as code and other DevOps practices.
  • Experience with Chef or an equivalent programmatic configuration management tool.
  • Experience maintaining Jenkins or equivalent continuous integration/deployment tools.
  • Solid monitoring skills including log analysis and comprehensive application monitoring.
  • Willingness to collaborate and communicate with a geographically distributed team across three time zones using asynchronous tools.
  • Ability to debug code and automate routine tasks with code.
  • Experience with algorithms, data structures and principles of sound software design.

What you’ll do:

  • Own our AWS and Heroku infrastructure, which consists of a few hundred servers and three applications.
  • Maintain production services once they are live by measuring and monitoring availability, latency, overall system health, and software issues.
  • Triage and help investigate production performance and other software issues across services and levels of our stack, in collaboration with our product engineers.
  • Take part in rotations to respond to availability incidents.
  • Implement, improve, and own our continuous integration (CI) and continuous deployment (CD) process, including our Chef infrastructure and codebase.
  • Plan for growth of our infrastructure to deliver better reliability, performance and consistently excellent customer experiences.
  • Maintain and own the security of our applications and infrastructure, including overseeing third-party audits of both.
  • Evaluate industry-standard technologies like containers and serverless architectures as opportunities for innovation.

Hannon Hill Corporation

3423 Piedmont Road NE, Suite 520
Atlanta, GA 30305

Phone: 678.904.6900
Toll Free: 1.800.407.3540
Fax: 678.904.6901