We’re looking for a skilled Data Scientist with a focus on AI data to power the next generation of intelligent solutions. This role is perfect for someone who thrives on tackling complex challenges through machine learning, advanced analytics, and coding, and who’s eager to drive innovation through cutting-edge AI methodologies.
Responsibilities
- Semantic Modeling: Design and maintain semantic representations (e.g., ontologies, entity relationships) to enhance structured data queryability and AI-driven reasoning
- Natural Language to Structured Query Mapping: Design and evaluate approaches that interpret natural language questions and accurately map them to structured data queries (e.g., SQL or semantic equivalents)
- Data-to-Text Interpretation: Design and evaluate techniques for interpreting tabular data and generating human-readable natural language explanations
- LLM Prompt Strategy: Assist in developing and refining prompt engineering strategies to ensure accurate, interpretable, and relevant responses from large language models
- AI Algorithm Development: Design and implement AI algorithms that optimize key system processes and improve overall product performance
- Model Evaluation and Enhancement: Continuously evaluate, refine, and improve existing AI models and algorithms to enhance accuracy, scalability, and computational efficiency
- Research & Innovation: Stay current on advancements in AI, GenAI, NLP, and machine learning, incorporating promising techniques and ideas into product solutions
- Ethics & Compliance: Ensure all model development follows data privacy, governance, and ethical AI standards throughout the lifecycle
Requirements
- Master’s degree in Computer Science, Data Science, or Applied Mathematics is required; a Ph.D. is strongly preferred
- 8+ years of experience in Data Science, with hands-on work in NLP, GenAI, and large-scale AI model development
- Strong problem-solving skills and the ability to translate complex business requirements into AI research tasks and deliverables
- Excellent communication skills (English), both verbal and written, with the ability to convey complex technical concepts to cross-functional teams
Technical Expertise
- Proven track record in applying data science, machine learning, and AI methodologies in real-world scenarios
- Strong MLOps experience building, deploying, and maintaining machine learning models in production, plus familiarity with cloud platforms (e.g., AWS, GCP, Azure)
- Understanding of model versioning, monitoring, and lifecycle management
- Proficiency in Python and common AI/ML libraries (e.g., PyTorch, TensorFlow, scikit-learn, Hugging Face Transformers, LangChain, LangGraph)
- Solid understanding of LLMs, RAG architectures, GraphRAG, and techniques for extracting knowledge from structured and unstructured data
- Experience with agentic workflows, agentic system design, and orchestration of LLM-powered agents
Research Execution
- Ability to define research goals, design experiments, analyze results, and iterate independently and collaboratively
- Publications or accepted papers in top AI/ML conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP, CVPR) in areas related to Generative AI
- Strong performance in GenAI competitions (e.g., Hugging Face Open LLM Leaderboard, AIcrowd, EvalHack)
- Participation in leading GenAI hackathons or innovation challenges (e.g., those sponsored by OpenAI, Hugging Face, Google, or Anthropic)
- Top rankings or meaningful contributions to Kaggle competitions focused on GenAI/NLP
- Demonstrated ability to rapidly prototype and scale GenAI solutions in applied settings (internal tools, open-source repos, agents, or copilots)
What we offer
- Competitive salary and benefits package
- Medical insurance
- Fully remote work
- Collaborative and innovative work environment
- Career growth and development opportunities
- A chance to work with a talented and driven team of professionals
About the project
Our client develops a unique in-memory platform built on innovative Machine Learning technologies. The product helps businesses meet their data and analytics processing needs at the highest speed and delivers real-time performance by replicating companies’ data to an in-memory data store. The client is a fast-growing company that partners with leading enterprises worldwide across industries including healthcare, telecommunications, and retail.