Job Description

Job Title: Site Reliability Engineer

Location: Bellevue, WA / Frisco, TX / Atlanta, GA / Overland Park, KS - Hybrid

Term: Contract

Key Responsibilities:

Kubernetes Management: Deploy, manage, and optimize Kubernetes clusters in production and staging environments, ensuring high availability and efficient resource utilization.
AWS Infrastructure: Leverage AWS cloud services (EC2, S3, RDS, EKS, Lambda, etc.) to build, manage, and scale cloud-native infrastructure.
Automation & Infrastructure as Code: Develop and maintain automated workflows using Infrastructure as Code (IaC) tools like Terraform, CloudFormation, or Ansible to provision, configure, and manage cloud infrastructure.
CI/CD Pipeline Support: Build, optimize, and maintain CI/CD pipelines to enable seamless code delivery and deployments, using tools like Jenkins, GitLab CI, or CircleCI.
Monitoring & Observability: Implement and maintain monitoring, alerting, and logging solutions using tools such as Prometheus, Grafana, CloudWatch, or ELK stack to ensure system health and availability.
Incident Response: Lead and support incident response efforts, conduct root cause analysis, and implement post-incident reviews to improve system resilience.
Performance Optimization: Identify and resolve performance bottlenecks, improve system efficiency, and ensure applications and infrastructure are optimized for both cost and performance.
Security & Compliance: Work with security teams to implement best practices for securing Kubernetes clusters, AWS resources, and platform infrastructure, including access controls, network policies, and encryption.
Collaboration & Documentation: Work closely with development, DevOps, and infrastructure teams to align on best practices, improve automation, and document procedures for infrastructure management and troubleshooting.

Required Qualifications:

Kubernetes Expertise: Strong expertise in managing and scaling Kubernetes clusters, including experience with Kubernetes networking, storage, and multi-cluster architectures.
AWS Cloud Expertise: Proficiency with AWS services such as EC2, S3, EKS, RDS, VPC, Lambda, IAM, CloudWatch, and others. Experience with AWS best practices for scalability, security, and cost management.
Infrastructure as Code (IaC): Hands-on experience with IaC tools such as Terraform, AWS CloudFormation, or Ansible for provisioning and managing cloud infrastructure.
CI/CD Pipelines: Experience building and maintaining continuous integration and continuous deployment (CI/CD) pipelines using Jenkins, GitLab CI, or similar tools.
Scripting & Automation: Proficiency in scripting languages such as Python, Bash, or Go to automate operational tasks and improve workflows.
Monitoring & Logging: Experience with monitoring, logging, and alerting tools like Prometheus, Grafana, CloudWatch, ELK stack, or similar tools.
Troubleshooting & Incident Management: Ability to troubleshoot complex issues in distributed systems, conduct root cause analysis, and implement solutions to prevent recurrence.
Collaboration Skills: Strong communication skills with the ability to work collaboratively with developers, operations, and product teams.

Key Skills:

Kubernetes, AWS Cloud, Reliability Engineer, CI/CD, IaC

Job Tags

Contract work

Site Reliability Engineer Job at VDart Inc, Atlanta, GA