- Growing fast: showed strong hiring growth in the past month
DevOps Engineer Intern - Cloud & AI/ML Infrastructure
- ₹1L – ₹1.5L
- Remote
- No experience required
- Full Time
Not Available
Remote only
Ratha Natarajan
About the job
DevOps Engineer Intern - Cloud & ML Infrastructure
About the Role
We're seeking a DevOps Engineer Intern to help design, implement, and maintain our cloud infrastructure across Azure and GCP, with a focus on ML operations and observability. This role combines traditional DevOps responsibilities with hands-on LLM deployment work.
About Easework AI
https://www.linkedin.com/in/ratharamanan/
https://www.linkedin.com/company/easeworkai/?viewAsMember=true
https://youtu.be/8h9Qw08kyME?si=Hnj8A7qmwgn2xqak
What we're building:
- AI solutions crafted with real procurement experience
- Smart automation that reflects actual procurement workflows
- Connected data that provides recommendations for strategic decision-making
Our mission is to revolutionize procurement by combining practitioner expertise with AI capabilities, delivering:
✓ Solutions that speak procurement teams' language
✓ Technology that fits real-world workflows
✓ Measurable business impact from day one
Key Responsibilities
CI/CD & Automation
- Design and maintain CI/CD pipelines using industry-standard tools
- Implement automated testing, deployment, and rollback strategies
- Create and maintain Infrastructure as Code (IaC) using tools like Terraform
- Automate routine operational tasks and maintenance procedures
Cloud Infrastructure Management
- Manage multi-cloud infrastructure across Azure and GCP
- Configure and optimize virtual networks, load balancers, and VMs
- Implement and maintain robust security practices across cloud services
- Monitor and optimize cloud resource utilization and costs
Container & Kubernetes Operations
- Design and maintain container orchestration using Kubernetes
- Create and optimize Docker images for various services
- Manage container registries and implement container security practices
- Configure and maintain Kubernetes clusters across cloud providers
ML Operations & Observability
- Implement monitoring and observability solutions for LLM deployments
- Set up logging, metrics collection, and alerting systems
- Monitor model performance and infrastructure health
- Optimize ML inference and training infrastructure
Security & Compliance
- Implement security best practices across all infrastructure
- Maintain compliance with industry standards
- Manage access controls and authentication systems
- Conduct security audits and implement improvements
Required Qualifications
- Experience in DevOps or Site Reliability Engineering
- Experience with Azure and GCP cloud platforms
- Proficiency in Kubernetes and container orchestration
- Experience with CI/CD tools and practices
- Strong coding skills in Python and shell scripting
- Experience with Infrastructure as Code (Terraform, ARM templates)
Preferred Qualifications
- Experience with ML model deployment and MLOps
- Familiarity with LLM infrastructure and observability
- Cloud certifications (Azure, GCP)
- Experience with monitoring tools (Prometheus, Grafana)
- Knowledge of service mesh technologies (Istio)
Technical Skills
- Cloud Platforms: Azure, GCP
- Containerization: Docker, Kubernetes
- CI/CD: Jenkins, GitLab CI, GitHub Actions
- IaC: Terraform, ARM templates
- Monitoring: Prometheus, Grafana, ELK Stack
- Networking: Load balancing, Virtual Networks, DNS
- Security: IAM, Secret Management, Security Groups
- Languages: Python, Bash, YAML
About Our Team
We're a collaborative, highly innovative team focused on building robust, scalable infrastructure for AI/ML applications. We value continuous learning, knowledge sharing, and high standards in our infrastructure and processes.
Benefits
- Internship converts to a full-time role in 3-6 months
- Remote work options
- Learn and practice cutting-edge tech
- Innovate, fail, learn, succeed, and iterate
- Build a world-class AI platform from scratch