Job Description
Client: Royal Caribbean Cruise Lines
Location: Miami, FL
Duration: 6+ month contract
Site Reliability Engineer
Description: The consultant will play a critical role in ensuring the reliability, performance, and seamless operation of our digital ecosystem. This includes our guest-facing mobile apps, websites, and the backend systems that power them. You will work collaboratively with development, operations, and product teams to build and maintain a highly resilient and scalable digital experience for our guests.
Essential Duties and Responsibilities:
- Incident Response and Resolution: Respond to and resolve production incidents, prioritizing guest-facing issues to minimize disruption. Conduct root cause analysis with guidance from senior team members and implement preventive measures to avoid recurrence.
- Monitoring and Observability: Build, maintain, and enhance monitoring tools and dashboards (using Prometheus, Grafana, or similar) to provide visibility into system health, performance, and guest impact. Proactively detect and address potential issues.
- Automation and Tooling: Develop and implement automation scripts and tools to streamline operations, reduce manual intervention, and improve system reliability. Utilize configuration management tools and infrastructure as code principles.
- Collaboration: Work closely with product teams to incorporate reliability principles into new feature development. Collaborate with operations teams to ensure smooth deployments and transitions.
- Documentation and Knowledge Sharing: Create and maintain clear documentation on system architecture, troubleshooting guides, and incident postmortems. Share knowledge and best practices with the team.
- On-Call Support: Participate in on-call rotation as defined by team needs, primarily focusing on acknowledging and escalating incidents, with guidance from senior team members.
- Working Hours: Non-standard working hours are expected, including morning, night, and weekend rotations.
Knowledge and Skills:
- Technical Expertise: Strong knowledge of mobile (iOS, Android) and web technologies, backend systems, cloud infrastructure (AWS, Azure, etc.), and database technologies.
- Programming: Proficiency in one or more programming languages (e.g., Python, Java, Go) for scripting and automation, along with automation tooling such as Jenkins. Working knowledge of Kubernetes is a strong plus.
- Monitoring and Observability: Experience with tools like Prometheus, Grafana, Splunk, or similar.
- Incident Management: Experience with incident management tools like PagerDuty, ServiceNow, or similar.
- Security: Understanding of security best practices, vulnerability identification, and incident response.
- Communication: Excellent written and verbal communication skills for collaborating with diverse teams and stakeholders.
- Customer Service: Understands and is aligned with the purpose of providing a great client experience (client-focused attitude).
- Detail Oriented: The ability to understand and appreciate fine, granular details.
- SQL Database: Ability to work with large volumes of customer data. Ability to use Oracle SQL (or similar) to query databases and perform edits to SQL queries.
Preferred Qualifications:
- 5+ years of demonstrated proficiency in one or more scripting languages such as Python, Go, etc.
- 3+ years of experience with Kubernetes or equivalent
- 5+ years of software development experience in Java, JavaScript, etc.
- 3+ years of experience with containers and container orchestrators - Docker, Kubernetes
- 5+ years of demonstrated experience debugging and fixing system/infrastructure and application issues
- 5+ years of experience working with monitoring tools such as Prometheus, Grafana, Splunk, Google Stackdriver, etc.
- 5+ years of experience with databases (SQL or NoSQL)
- 5+ years of experience with log analysis and building dashboards
- At least 6 years in a Reliability Engineering, DevOps, or infrastructure-focused role
- Advanced experience with programming languages (Python, Java)
- Deep systems and infrastructure knowledge
- Excellent troubleshooting and problem-solving skills
- Experience with high-traffic, guest-facing systems
Our benefits package includes:
- Comprehensive medical benefits
- Competitive pay
- 401(k) retirement plan
- ...and much more!
About INSPYR Solutions
Technology is our focus and quality is our commitment. As a national expert in delivering flexible technology and talent solutions, we strategically align industry and technical expertise with our clients' business objectives and cultural needs. Our solutions are tailored to each client and include a wide variety of professional services, project, and talent solutions. By always striving for excellence and focusing on the human aspect of our business, we work seamlessly with our talent and clients to match the right solutions to the right opportunities. Learn more about us at inspyrsolutions.com.
INSPYR Solutions provides Equal Employment Opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, national origin, age, disability, or genetics. In addition to federal law requirements, INSPYR Solutions complies with applicable state and local laws governing nondiscrimination in employment in every location in which the company has facilities.
24-10702
Job Tags
Contract work, Local area, Flexible hours, Night shift, Day shift