Data Scientist
- £70k – £80k
- Remote
- Full Time
Onsite or remote
Amber Jayne
About the job
Location: London or surrounding areas
Team: Data Science
Reports To: Data Science Lead
*Job Overview:*
We are seeking a highly skilled and innovative Data Scientist to join our team at RevEng.ai, an AI-driven startup specialising in cybersecurity. The ideal candidate will have expertise in building and maintaining large-scale datasets and optimising data pipelines to facilitate the training and serving of cutting-edge ML models.
You will be responsible for constructing, maintaining, and analysing extensive datasets of binaries to support the development of advanced AI models for identifying and mitigating cyber threats. You will work closely with the ML team to understand data requirements and build high-quality labelled binary datasets, drawn from diverse sources, for various architectures and operating systems.
*What We Offer:*
- A comprehensive benefits package including a top-tier private healthcare plan.
- Share options in a rapidly growing AI cybersecurity company backed by some of the best tech venture capitalists in the world.
- Opportunity to work on innovative projects that address real-world cybersecurity challenges.
- A collaborative and inclusive work environment that values creativity and innovation.
- Professional development opportunities, including training and conferences.
- Flexible working hours and hybrid working as standard.
- Weekly team lunches.
- Access to a vibrant office with social events in the heart of London.
- An extra day off as holiday on your birthday!
Key Responsibilities of the role:
Building high-quality binary datasets:
- Contribute to building a diverse set of binaries from various sources, covering different operating systems, architectures, and compilers.
Data Processing and Analysis:
- Implement distributed processing of large datasets, extracting and labelling the key features required for training ML models.
- Identify, engineer, and select relevant features from raw data to improve the performance and accuracy of AI models for identifying security threats.
- Apply domain knowledge to develop innovative feature sets that capture the complexities of cybersecurity data.
Data Pipeline Optimisation:
- Manage and optimise pipelines for processing large datasets.
Compliance and Ethical AI Practices:
- Ensure that all data handling, model development, and deployment comply with relevant regulations (e.g., GDPR) and cybersecurity standards.
- Address potential biases in AI models to maintain ethical AI practices and prevent unintended consequences in security-related decision-making.
*Documentation and Knowledge Sharing:*
- Maintain clear documentation of model development processes, data pipelines, and experimental results.
- Share knowledge and insights gained from research and model development with the broader team to promote best practices and continuous learning.
*Preferred Qualifications:*
- Strong programming skills; experience with C/C++ and Python is desirable.
- Strong knowledge of SQL.
- Strong understanding of compiler toolchains such as GCC, LLVM and MSVC (Visual Studio).
- Proficiency in working with cloud computing platforms (e.g., AWS, GCP, Azure).
- Experience with version control systems (e.g., Git) and containerisation technologies (e.g., Docker).
- Desirable - Knowledge of big data technologies (e.g., Spark, Hadoop).
- Desirable - Experience building/compiling complex software projects on Windows and Linux.
- Desirable - Experience working with very large (terabyte-scale) datasets.
- Desirable - Experience with MLOps practices and tools for model deployment and lifecycle management.
- Desirable - Understanding of regulatory requirements and compliance in cybersecurity, such as GDPR or ISO 27001.