Hyve AI Labs
Actively Hiring
Hyve specializes in AI-based product development. We offer a wide range of IT services.

Python Data Scientist (Web Crawling & LLM)

  • ₹6L – ₹15L
  • Remote
  • 3 years of experience
  • Full Time
Posted: 1 month ago
Visa Sponsorship

Available

Remote Work Policy

Remote only

Hires remotely in
Preferred Timezones
Dubai Time
Relocation
Allowed
Skills
Python
Databases
PostgreSQL
Web Scraping
API
Scrapy
Python/Django/Flask
Databases (SQL and NoSQL)
Python Web Scraping (Beautiful Soup/Scrapy)
LLMs
Large Language Models (LLMs)
LLM Frameworks (LangChain, Claude, LlamaIndex), RAG Technologies, Embedding Models, Vect

About the job

We are looking for a motivated and detail-oriented Junior Python Data Scientist to join our data science team. The ideal candidate will have hands-on experience in web crawling, data cleansing, and data transformation, along with knowledge of building and training machine learning models. Experience with Large Language Models (LLMs) is a plus. You will collaborate with senior data scientists and engineers to support the collection, cleaning, processing, and analysis of large datasets that will drive business insights and model development.

Key Responsibilities:
Web Crawling & Data Collection:
Build and maintain web crawlers to extract large volumes of structured and unstructured data from various online sources using Python libraries like Scrapy, BeautifulSoup, or Selenium.
Data Cleansing & Preprocessing:
Clean, preprocess, and standardize raw data from various sources (e.g., scraped data, databases, APIs). Handle missing data, data inconsistencies, and outliers, ensuring the data is ready for analysis and modeling.
Data Transformation & Feature Engineering:
Apply data transformation techniques, such as normalization, aggregation, and encoding, to convert raw data into useful features for machine learning models. Work on feature extraction and engineering from textual and numerical data sources.
Exploratory Data Analysis (EDA):
Perform exploratory data analysis to uncover patterns, trends, and insights in the data. Generate visualizations using libraries like Matplotlib, Seaborn, or Plotly to summarize and communicate key findings.
Machine Learning Model Training:
Assist in building, training, and optimizing machine learning models for predictive analytics, classification, regression, or clustering using Python frameworks like scikit-learn, TensorFlow, or PyTorch.
Working with Large Language Models (LLMs):
Support senior team members in fine-tuning and deploying Large Language Models (LLMs), such as GPT, BERT, or similar, for NLP tasks like text classification, sentiment analysis, or entity recognition.
Model Evaluation & Optimization:
Evaluate model performance using metrics like accuracy, precision, recall, and F1-score. Assist in optimizing models using techniques such as hyperparameter tuning and cross-validation.
Documentation & Reporting:
Document your data pipelines, methodologies, and model outputs in a clear and structured manner. Communicate results and findings to both technical and non-technical stakeholders through reports, presentations, or dashboards.

Required Skills & Qualifications:
Education:
Bachelor’s degree in Computer Science, Data Science, Statistics, Mathematics, or a related field.
Programming Languages:
Proficiency in Python and its data-related libraries, including Pandas, NumPy, scikit-learn, and Matplotlib.
Web Crawling:
Experience with web scraping tools and libraries, such as Scrapy, BeautifulSoup, or Selenium, and handling the challenges of web data collection.
Data Cleansing & Preprocessing:
Strong skills in data wrangling and cleansing, including handling missing data, outliers, and data inconsistencies in large datasets.
Machine Learning:
Familiarity with training basic machine learning models for classification, regression, and clustering using libraries like scikit-learn.
Large Language Models (LLMs):
Understanding of NLP techniques and working knowledge of LLMs (e.g., GPT, BERT) or an eagerness to learn and work with LLM-based tasks.
Data Transformation:
Experience with data transformation techniques, such as feature engineering, scaling, and encoding, to prepare data for model training.
Version Control:
Knowledge of version control systems such as Git for collaborative development.

Preferred Qualifications:
Experience with Databases:
Basic experience with SQL or NoSQL databases for querying and retrieving data.
NLP & LLM Experience:
Hands-on experience working with natural language processing (NLP) tasks like sentiment analysis, named entity recognition, or language generation using pre-trained models or custom solutions.
Cloud & Deployment Tools:
Familiarity with cloud platforms like AWS, Google Cloud, or Azure and experience in deploying models into production.
Data Visualization:
Experience with data visualization tools like Tableau, Power BI, or similar platforms for creating dashboards or reports.

Key Competencies:
Analytical Thinking:
Ability to think critically and analytically to solve problems related to data collection, cleansing, and transformation.
Attention to Detail:
Strong attention to detail, especially in dealing with large datasets, to ensure data accuracy and quality.
Adaptability:
Ability to learn and adapt quickly to new tools, libraries, and processes, especially in the fast-evolving data science and AI landscape.
Team Collaboration:
Work effectively within a team environment, collaborating with senior data scientists, engineers, and stakeholders.

About the company

Company Size
11-50
Company Type
Startup · Portal · Consumer Technology · Software Development
Company Industries
Portals · B2B · SaaS · Mobile · Artificial Intelligence / Machine Learning
