- B2B
- Early Stage: Startup in initial stages
Senior Reliability Engineer
- 4 years of experience
- Full Time
Available
In office - WFH flexibility
Mahesh Viraktamath
About the job
Responsibilities
• Help build a Site Reliability Engineering culture across the organization by sharing your best practices, approaches, documentation, and code with other engineering teams
• Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually
• Troubleshoot complicated, cross-platform issues involving the OS, networking, and databases in a cloud-based SaaS environment; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices
• Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation
• Conduct system analysis and configuration management, and develop improvements for system software performance, availability, and reliability
• Design, write, ship, and motivate the creation of software and systems to increase observability, product reliability and organizational efficiency
• Work closely with software engineers and testers to ensure the system meets non-functional requirements such as performance, security, and availability
• Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it
• Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure
• Keep up to date with security developments and proactively identify, diagnose, and solve complex security issues
Requirements
• 4+ years’ experience as an SRE/DevOps Engineer - Mandatory
• Experience working closely with engineering teams to understand their product requirements and how they build, test, and deploy their software applications - Mandatory
• Demonstrable experience in containerization (Docker) and orchestration (Kubernetes) - Mandatory
• Demonstrable experience in CI/CD tools such as Bitbucket, Bamboo, Nexus, and Helm - Mandatory
• Experience with Infrastructure as Code (Terraform, CloudFormation, Ansible)
• Knowledge and proven hands-on experience in large-scale databases and distributed technologies, such as Kafka and Confluent Platform Kafka
• Basic programming and scripting skills (preferably Golang, Bash, or other shell scripting)
• Ability to provide advice, best practices and recommendations for the operation and deployment of Microsoft Azure
• Experience in monitoring and analyzing infrastructure performance using standard performance monitoring tools such as Prometheus/Grafana, Nagios, New Relic, Perfmon, PerfView, ProcDump, and DebugDiag
• Familiarity with Linux and UNIX systems (e.g. CentOS, Red Hat) and command-line system administration tools such as Bash, Vim, and SSH
• Hands-on experience in configuration management of server farms (using tools such as Puppet, Chef, or Ansible)
• Knowledge of network routing, load balancing, and networking protocols, including a base knowledge of TCP/IP and an understanding of HTTP and DNS
• Familiarity with SRE and Agile methodologies