Modern Digital Watch

Junior SRE Engineer

System Administrator II


**IAT Level II Certification Required. Candidates without the required certification will not be considered.**


In this role, candidates will establish and maintain site reliability engineering (SRE) operations on multi-user High Performance Computing (HPC) systems, using a variety of configuration management, IT monitoring, and automation tools within a Linux environment (Red Hat, CentOS). Candidates will build a new Nagios Alerting Database and a new SRE Database, and develop an effective, consistent SRE automation protocol.

Candidates should preferably have a Bachelor’s degree in Computer Science or a related field and five years of demonstrable experience in systems administration and support of a large client-server IT enterprise.


Candidates will have experience with, or exposure to, automation tools including Puppet, Salt, Ansible, and Chef. Candidates shall also have experience scripting in Bash, Python, and/or Perl.


Additionally, candidates will have experience with, or exposure to, XFS/ZFS file systems and NFS/block storage file system sharing; SSH, TMUX, PDSH, and CLUSH for system access; VI, EMACS, AWK/SED, and CRON for system editing; and Nagios, Ganglia, and SNMP for IT monitoring.

Salt Lake City, UT or Annapolis Junction, MD

REQUIRES ACTIVE TS/SCI CLEARANCE WITH POLY