AI response evaluation, ranking, and preference labeling. Chatbot quality rating, hallucination detection, red teaming, and prompt-response quality assessment for large language models.
Reviewing AI-generated code for correctness, efficiency, and style in Python, JavaScript, TypeScript, Solidity, and AssemblyScript.
Building and refining prompts for LLM systems (Claude, Grok, GPT). Iterative optimization and structured response tuning.
Text labeling, entity recognition, sentiment classification, intent tagging, bounding box annotation, image classification, and metadata generation.
Built AI-powered production systems using LLM APIs (Claude, Grok, GPT). Structured output parsing, iterative prompt refinement, and response quality tuning for deployed agents.
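A minimal sketch of the kind of structured output parsing described above. The helper name and record schema are illustrative, and no specific LLM client library is assumed; real deployments would add retries against the model when parsing fails.

```python
import json

def parse_structured_reply(raw: str, required_keys: set[str]) -> dict:
    """Parse an LLM reply expected to contain a JSON object; validate required keys.

    Hypothetical helper for illustration only.
    """
    # Models often wrap JSON in Markdown code fences; strip them before parsing.
    cleaned = (
        raw.strip()
        .removeprefix("```json")
        .removeprefix("```")
        .removesuffix("```")
        .strip()
    )
    data = json.loads(cleaned)
    missing = required_keys - data.keys()
    if missing:
        raise ValueError(f"reply missing keys: {sorted(missing)}")
    return data

reply = '```json\n{"label": "positive", "confidence": 0.92}\n```'
parsed = parse_structured_reply(reply, {"label", "confidence"})
print(parsed["label"])  # positive
```

Validating keys up front, rather than trusting the model's output shape, is what keeps a deployed agent from silently passing malformed data downstream.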
Automated QC workflows, validation rules, and accuracy monitoring. 95%+ accuracy across all projects. Clean, model-ready data at scale.
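The QC workflows mentioned above can be sketched roughly as rule-based validation plus an accuracy metric. The record schema and label set here are hypothetical, for illustration only:

```python
# Illustrative QC pass over annotation records.
# Hypothetical schema: each record has "text" and "label" fields.
ALLOWED_LABELS = {"positive", "negative", "neutral"}

def validate_record(record: dict) -> list[str]:
    """Return a list of rule violations for one annotation record."""
    errors = []
    if not record.get("text", "").strip():
        errors.append("empty text")
    if record.get("label") not in ALLOWED_LABELS:
        errors.append(f"unknown label: {record.get('label')!r}")
    return errors

def qc_pass_rate(records: list[dict]) -> float:
    """Fraction of records that pass every validation rule."""
    passed = sum(1 for r in records if not validate_record(r))
    return passed / len(records) if records else 0.0

batch = [
    {"text": "Great product", "label": "positive"},
    {"text": "", "label": "neutral"},       # fails: empty text
    {"text": "Meh", "label": "unsure"},     # fails: unknown label
    {"text": "Terrible", "label": "negative"},
]
print(f"pass rate: {qc_pass_rate(batch):.0%}")  # pass rate: 50%
```

Running a pass like this on every batch, and tracking the rate over time, is one simple way to monitor annotation accuracy at scale.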
AI data trainer with an engineering mind. I build the datasets that make models smarter.
B.Eng in Mechanical Engineering. 3+ years working remotely across time zones. I have annotated and curated 3,333+ structured dataset items, built automated QC pipelines, and evaluated AI-generated text and code for correctness.
My technical background in Python, JavaScript, TypeScript, and SQL means I understand how labeled data flows into ML pipelines and the quality standards models need to perform well.